During the Drupal plugin/update manager discussions I had an aha moment. One of those weird and wonderful ideas came back to me: what if most of the code lived in the database? One would be able to arrange the co-habitation of several concurrent versions of the same website relatively easily, and backups would just mean database backups.
Funnily enough, this could help two opposite (scale-wise) types of users: those on the cheapest or free hosting at the bottom end, and the load-balanced crowd.
Why "back"? Well... I have had this idea ever since user streams appeared in PHP, version 4.3 or thereabouts, but it just nestled cosily in the back of my mind, waiting for love, the shy little thing.
Ok. So what is this about? Since PHP allows you to write stream wrappers, and include* and require* can load code from arbitrary streams, one should be able to put the code in a database, load it from there and execute it. The big, obvious downside is that it is probably slow. But how slow?
I decided to benchmark it. I've prepared a micro-benchmark to test the idea and to see how significant the difference in performance would be. One should note that, since this is mostly an IO-bound task, the difference will show up mostly as higher response times rather than CPU load. Bear in mind that the benchmarks were performed on a tiny Acer Aspire One netbook with 512MB RAM and its standard SSD drive.
I've prepared three small programs. The first just includes 20 PHP files. The second includes the same 20 files, but also contains the streams code, so it has a similar parsing-time profile. The third includes the same code from SQLite3 via streams. The files are attached to this post; if you want to run them yourselves, just rename them and assign the appropriate permissions.
I've used the criterion Haskell library to gather and process the statistics for me and to draw the nice plots below.
The Haskell program is simple. It just declares and executes the three benchmarks:
import Criterion.Main (defaultMain, bench, bgroup)
import System.Cmd (system)

main = defaultMain
  [ bgroup "php includes"
    [ bench "standard/clean" $ system "./clean.php"
    , bench "standard/mixed" $ system "./non-stream1.php"
    , bench "streams"        $ system "./stream1.php"
    ]
  ]
To compile it, use:
ghc --make bench
I've written a barebones TestStream class adhering to the streams API, registered it with stream_wrapper_register() and called include_once 20 times. Each included file contains one print statement, a la hello world.
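The attached files carry the real code; below is only a minimal sketch of the idea. The table layout (code(name, source)), the dbcode:// scheme and the in-memory database are my illustrative assumptions, not the attached implementation.

```php
<?php
// Minimal sketch: serve PHP source out of SQLite through a stream wrapper.
// Assumed schema: CREATE TABLE code (name TEXT PRIMARY KEY, source TEXT).
class TestStream {
    public static $db;       // shared SQLite3 handle, set up below
    private $source = '';    // source text of the requested "file"
    private $position = 0;   // current read offset

    public function stream_open($path, $mode, $options, &$opened_path) {
        // dbcode://inc1.php -> look up "inc1.php" in the code table
        $name = substr($path, strlen('dbcode://'));
        $stmt = self::$db->prepare('SELECT source FROM code WHERE name = :n');
        $stmt->bindValue(':n', $name, SQLITE3_TEXT);
        $row = $stmt->execute()->fetchArray(SQLITE3_ASSOC);
        if ($row === false) {
            return false;    // no such "file" in the database
        }
        $this->source = $row['source'];
        return true;
    }

    public function stream_read($count) {
        $chunk = substr($this->source, $this->position, $count);
        $this->position += strlen($chunk);
        return $chunk;
    }

    public function stream_eof() {
        return $this->position >= strlen($this->source);
    }

    public function stream_stat() {
        return array('size' => strlen($this->source));
    }
}

// Populate an in-memory database with 20 tiny hello-world scripts.
TestStream::$db = new SQLite3(':memory:');
TestStream::$db->exec('CREATE TABLE code (name TEXT PRIMARY KEY, source TEXT)');
for ($i = 1; $i <= 20; $i++) {
    $stmt = TestStream::$db->prepare('INSERT INTO code VALUES (:n, :s)');
    $stmt->bindValue(':n', "inc{$i}.php", SQLITE3_TEXT);
    $stmt->bindValue(':s', "<?php print \"hello from inc{$i}\\n\";", SQLITE3_TEXT);
    $stmt->execute();
}

stream_wrapper_register('dbcode', 'TestStream');

// Load the stored code through the wrapper, exactly as if it lived on disk.
for ($i = 1; $i <= 20; $i++) {
    include_once "dbcode://inc{$i}.php";
}
```

Since the wrapper is registered without the STREAM_IS_URL flag, include works on it under the default allow_url_include setting.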
The non-stream versions
The base case "standard/clean" just includes the 20 files. The "standard/mixed" includes the 20 files and has a useless copy of the TestStream class to bulk up the code to judge the significance of the parsing overhead.
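Roughly, the base case boils down to the sketch below; the file names and the temp-directory setup are mine, added so the example is self-contained rather than a copy of the attached clean.php.

```php
<?php
// Baseline: twenty plain disk includes, no stream wrapper involved.
$dir = sys_get_temp_dir() . '/clean-bench';
@mkdir($dir);

// Generate the 20 hello-world include files on disk first.
for ($i = 1; $i <= 20; $i++) {
    file_put_contents("{$dir}/inc{$i}.php",
        "<?php print \"hello from inc{$i}\\n\";");
}

// The part being benchmarked: standard include_once from the filesystem.
for ($i = 1; $i <= 20; $i++) {
    include_once "{$dir}/inc{$i}.php";
}
```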
The benchmark results
benchmarking php includes/standard/clean
collecting 100 samples, 2 iterations each, in estimated 12.13241 s
bootstrapping with 100000 resamples
mean: 58.12652 ms, lb 57.14786 ms, ub 60.15813 ms, ci 0.950
std dev: 6.912029 ms, lb 4.108045 ms, ub 13.29588 ms, ci 0.950
found 6 outliers among 100 samples (6.0%)
  2 (2.0%) high mild
  4 (4.0%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers

benchmarking php includes/standard/mixed
collecting 100 samples, 2 iterations each, in estimated 11.08999 s
bootstrapping with 100000 resamples
mean: 58.86753 ms, lb 57.81748 ms, ub 60.82246 ms, ci 0.950
std dev: 7.118014 ms, lb 4.625828 ms, ub 12.58350 ms, ci 0.950
found 8 outliers among 100 samples (8.0%)
  5 (5.0%) high mild
  3 (3.0%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers

benchmarking php includes/streams
collecting 100 samples, 2 iterations each, in estimated 14.42270 s
bootstrapping with 100000 resamples
mean: 76.48482 ms, lb 74.66795 ms, ub 78.86988 ms, ci 0.950
std dev: 10.60164 ms, lb 8.515426 ms, ub 13.80536 ms, ci 0.950
found 8 outliers among 100 samples (8.0%)
  7 (7.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers
As expected, the streams code is slower: it adds around 1 ms per included file ((76.5 - 58.1) ms spread over 20 includes is roughly 0.9 ms each). If you compare the probability density estimates, you will see a small, albeit probably insignificant, overlap between the standard and stream versions. The results also suggest that in larger programs the effect will be far less significant. All in all, the results are encouraging. This technique definitely merits further investigation: running it against MySQL, the database most widely deployed alongside PHP, and, if time permits, against a patched version of Drupal.