programming

Iterator design pattern in php (revisited)

In this post I'll try to sketch a number of related ideas about working with sequential collections, all of them expressed in php. Enumerations, enumerables, interators, iteratees and their cousins have been tortured to death, but as Eric Meijer and his team working on LINQ and Rx frameworks for .net show, it pays off doing it systematically. I'll revisit the Iterator and Subject Observer design patterns and will derive them systematically from first principles.

In php iterators come handy for all kinds of work on sequences - usually using

foreach( $sequence in $key,value) { ... }

The problem with foreach is that it is not composable, but there are a lot of functions and operations on iterators which are, at least in theory. I will try to show how to achieve composability by, first, composing the processing of each individual step and, later, by composing iterators. This can lead to easily reconfigurable applications and a more declarative style of programming.

A look at state management in http

HTTP is a stateless protocol. This fact has confused and tripped many aspiring web developers. Mostly stateless. It was stateless until Netscape, back in the misty days of 1994, baked cookies and released them for public consumption. Later the Internet power mind, IETF, published the standard baking recipes in their RFC 2109, followed later by RFC 2965, titled "HTTP state management mechanism".

The standard is clear and straight forward. Not surprisingly, it is a translation to network protocol language of one of the most widely used approaches of pure functional programming to managing state - when you have to care about state, thread it. In this post I'm trying to transcribe it. For my amusement, public confusion as well as for brevity I will use haskell like notation when trying to present the modelling of the utility of the http cookies as a carrier of the http state.

More binary and interpolation search in haskell

The array code in the previous post on binary search in haskell is messy. Mostly due to my lack of style and the use of arrays, which add a layer of indirection and bounds checks.

This time, I'm going to rewrite the examples to be slightly more readable and I'll implement a linear interpolation search routine. The latter could be handy if function evaluation or array access (access to data on disk?) are relatively slow, and the data is sorted and relatively linear. Although the implementations could be coded even tighter, with less checks, I skip that optimisation for the sake of readability.

Binary search - haskell arrays

I've been looking more into haskell, as part of some personal projects. There I need some constant array like structures, that is ones which have integer indexing and there are some long winded reasons I'm not using a Map. As a first step I've decided look at haskell arrays, by writing some search functions. It is mostly a syntax exercise.

Micro-benchmarking php streams includes part 2

In part 1 I've reported the results of a micro-benchmark designed to compare the performance of plain php includes vs includes via streams from a sqlite database. In this post I extend the test cases with two more no-includes same work, to serve as a base case and includes from a mysql database, the code is the same as in sqlite case, the connector differs. You can find the code attached at the end previous post.

Overall, having in mind the random factors like netbook load (CPU and IO), the differences between the different test cases are insignificant on the test machine - Acer Aspire One netbook.

Please let me know if you decide to run these benchmarks. I will be especially interested to see what are the differences.

These results are encouraging enough to merit developing a more substantial test. I'm interested in benchmarking drupal. What scenarios would you suggest?

Micro-benchmarking php streams includes from database vs standard includes

During the drupal plugin/update manager discussions I had an aha moment. One of those weird and wonderful ideas came back to me. What if most of the code lived in the db? One would be able to arrange the co-habitation of several concurrent versions of the same website relatively easy. Backups would mean database backup.

Funnily enough, this can help two opposite (scale-wise) types of users - the bottom end, cheapest or free hosting ones and the load balanced crowd.

Why "back"? Well... I had this idea ever since the user streams appeared in php, version 4.3 or there abouts, but it just nestled cosily in the back of my mind, waiting for love, the shy little thing.

He Sells Shell Scripts Not Just to Intersect Sets

He Sells Shell Scripts to Intersect Sets is a good read. Set and multiset operations from the command line or in .sh. Rather clean, straightforward shell magic.

Science paper of the month, IMHO

In this paper we will propose the first — to our knowledge — method that finds the global optimum of the TSP by reducing the NP complexity of the problem to N2 complexity. We achieve this by using white light interferometry
(WLI)[14]. From the mathematical point of view we unfortunately haven’t disapproved the commonly believed conjecture that NP 6= P (the Clay mathematics institute set out a price of 1.000.000 US-$ for doing that) but we apparently reduced the complexity by a trick, namely replacing operations by photons (and the physics of interference).

In section 2 we introduce the method as a gedankenexperiment[thought experiment in friendly german]....
http://www.opticsexpress.org/abstract.cfm?id=140598

Just cool. It reminds me of the often forgotten analogue computers. They are very rarely used in practice these days, although they could be used to solve predefined probles fast, furious and sparkling.

With the introduction of uncertainty and loops you could get some very interesting results. Just think about the offspring of a theremin and a miniMoog on acid. Raving BEAM bots? Pacifist fighter drones?

An exercise with continuations, prompts and scheme macros

control resumed inspired me to take a small exercise.

"Code a minimal approximation of the python yield operator based generators"
Here follow the results I managed to get with help from Matthias Felleisen, via the plt mailing list. You can see the non-edited exchange via the mailing list archives and control and macors. All control&prompt are forms specific to plt scheme's control.ss module. The first attempt looks something like this:

(require (lib "control.ss"))

(define (make-step)
  (define (control-state)
    (yield 1)
    (yield 2)
    (yield 3)
    'finished)
  (define (yield value) (control resume-here (set! control-state resume-here) value))
  (define (generator) (prompt (control-state)))
  generator)

(define step (make-step))

(step) (step) 

It does the job. Kind of. For me the interesting bit is how the interplay of prompt and control achieve maintaining the current state of execution before the return and allowing to continue from that interruption. It can be done with call/cc but it looks a helluva a lot more complicated and is far more unreadable.

There is a problem with the code above though. As Matthias notes in his email:

LESSON: never 'test' by running things at the top-level prompt. It IS a prompt. -- Matthias

So comes version 2:

(require (lib "control.ss"))

(define (make-step)
  (define (control-state)
    (yield 1)
    (yield 2)
    (yield 3)
    'finished)
  (define (yield value) 
    (control resume-here 
             (set! control-state 
                   (lambda () (prompt (resume-here)))) 
             value))
  (define (generator) (prompt (control-state)))
  generator)

(define step (make-step))

(equal? '(1 2 3) (list (step) (step) (step)))

control-state is a variable holding a reference to a function with no arguments, doing what the python function would do. yield is a variable holding a reference to a function which does the interrupt magic. generator is a variable holding a reference to a (sugary) function which does the initial step.

The logic of the above algorithm is roughly as follows - mark thy state, do what you need to do, when you encounter control stop, remember your state by wrapping it in a function (resume-here), remember where do you what to continue from ( the set! control-state ...). Notice that the value of control-state is replaced. This is possible since the resume-here continuation is a function and a first class value.

So what next. Boilerplate code is tedious to write. And there is a lot of boilerplate code there. And every function, which needs to follow this recipe, will need to duplicate it.

So scheme macros to the rescue. It is not a trivial problem. A 'generator' function can look something like this:

Drupal, computer science, random bits

There are quite a few interesting computer science artefacts in Drupal. I'm going to try to highlight a few of them. Sorry, this is unfinished, and might not be finished ever, but I'm just trying to capture a snapshot of confused meandering thoughts.

Very late, dynamic binding

Late binding came to popular life with the advent of object oriented languages. Essentially it means binding of values, for example functions, to names at object creation, as opposed to compile or link time. In drupal, this can happen at any time, more even, it is algorithmic - that is you can change at runtime what is to be executed at a specific control point. The hook system is one of the ways to do it.

Powered by Drupal, an open source content management system