Models, records, databases, friends and foes

the beginnings of a long on DB related pattern discussion.

Problems of active record and friends

After a lot of silence caused by lot's of work (good, my bills are happy) and a continuous diversion into scheme, haskell, dylan and other interesting languages, I'm back into php speaking land. Funny feeling that. So here comes the beginning of something I've been continuously rubbing my (leftovers of) brains against.

copout This is not php specific, but since it discusses frequent problems occuring in php apps, I've labeled it with as php related as well. Makes it easier maintaining the site you see.

I have a problem with active record as a pattern. It is a logical one. Essentially it turn a (SQL) database row into an object. The class of the object represents the SQL table or view. Of course you can add behaviour to those classes and objects, i.e. activating those records.

All that is nice and dandy, but when you start talking about relations and constructing dynamically queries and corresponding objects, we start hitting the limitations of what I will call from now on the naive active record.

The problem is not trivial at all. It boils down to how to make peace between the host language type system (for example classes) and the SQL's dynamic compound types - the relations expressed as SELECT variants for example. It gets even harder when we decide to reverse the direction, so that we actulally want to update the database. Yuk! As though somebody actually does that!

A first look into the meaning of SQL higher types

What are the SQL tables, views and joins?

Records actually. The sql tables are simple records with named fields. The name of the record type is the name of the table. The type definition is the table definition. In SQL you have the usual projectsion operators, like table.filed_name. More interestingly you can actually construct new temporary record types. For example in the statement:
SELECT n.one, n.two, n.three FROM n;
We construct a new (anonymous) type whose slots inherit the names from the record where they are copied from. Well, you could assign a different name if you so desire.

Joins

If we think of tables as record types and rows as records, then JOIN is an operator on record types not records. It is more complicated than that, but for the time being let's stick to this meaning.

Consider the following example:
A::{ a, b, c} JOIN B::{b, c, d} -> {A.a, A.b, A.c, B.b, B.c, B.d}
as used in
SELECT A.*, B.* FROM A,B
;; alternatively
SELECT A.*, B.* FROM A CROSS JOIN B;

The result type is a union, or more precisely a cartesian product, of the arguments of the JOIN op. This resembles multiple inheritance, with a crucial difference - you preserve all ops of the parents (the tables arguments of the JOIN). This causes one of the majour problems with active record - we don't have a language, let alone php doing it, that allows us to naturally model this using the language itself, rather than constructing strange hacks ourselves. The opposite direction - folding inheritance and friends into SQL meets similar kinds of problems.

The rest of the JOINS are essentially specialised products (unions) which can be modelled with some extra operators on types, which are dutifully skipped for the time being.

Views are 'named' joins, so they are not much more interesting. What is interesting is the fact that you can use anywhere, where you can use tables.

So what do we have? Some kind of a mental model, meta-language, if you like, expressing how to construct records, or what are these records made of, (which is the same). If we were to implement that language, mapping as much as possible native language constructs, we would have the means to reason about our objects and automagically construct database queries of the appropriate kind. Maybe even optimise the queries with respect of what is required for our application, rather that simply relying on the generic SQL optimiser or a handy local SQL guru.

Active record comes to play after we know the record. The question is how to get to that point. The answer lies in the tools of the native language. I'll describe later some possible approaches in php, probably not good, but you are welcome to improve them and tell me how to do it better.

First sketches

Ok, Let's recap:

  • Relational database (SQL) tables are records. We can view them (in our heads) as record type definitions.
  • JOINS define (some kind of) derived union types.
  • SQL CROSS JOINS are full cartesian products, i.e. union types where no name folding/aliasing is done by the imaginary type system

How can we model that in php?

Classes (and objects) are records. Arrays are records too. Both are candidates for doing the job. We need to be able to somehow represent this (meta) type information and manipulate it. The aim is to have a natural feeling/looking abstraction of the database in php. It should be flexible enough, for us to modify the relations at runtime, as we need. It should allow us in the long run to have a near optimal speed and not too much complications in the end code. That is a tough cookie.

pre 1: table A is - A::{ a, b, c}
pre 2: table B is - B::{ b, c, d}
case 1: view C is - C::{ A.a, A.b, A.c, B.a, B.b, B.c } , i.e. a cross join of A & B
case 2: view D is - D::{ A.a, A.b, A.c, B.c | A.b = B.b }, same as above with a guard

Let's use the above cases as basis for reasoning about building our php models. We can extend them with extras later.

table as active record


class A_Record {
var a,b,c ;
.....
}
function load_A {
$result = db_query('SELECT * from A');
while( $a[] = new( db_get( $result) );
return $a;
}

Skipping most of the details, the above code outlines the basic idea of aquiring/loading into memory all values of type A, make sure it is not an endless loop, the real life implementation of the database abstraction layer you are using. The methods of the A_record class will implement the behaviours of these records, making it an active record. load_A is external to the class for obvious reasons - it is a type property, as opposed to a class one. It belong to the type. We could, in theory make it a static method, but it's not clear what benefits this will give. At least for now.

First attempts at joins

//case 2: C = A JOIN B -> C::{ A.a, A.b, A.c, B.a, B.b, B.c }
class C_Record {
var a_a, a_b, a_c;
var b_a, b_b, b_c;
...
}
//load similar to previous example, just add sql join

The problems with this approach are that it is hardcoded, inflexible, hard to maintain.
The good side is that it is simple and straightforward. Assuming that we can recognise (and we can using reflection) we can use a polymorphic call to load.

Let's go for take 2 (we can't use inheritance =( ):
class C_record {
private A; //we need to ensure that only variables of A_record can end up here
private B; //we need to ensure that only variables of B_record can end up here
....
//use your bag of tricks to keep objects of this class type safe:
// mostly _call(), _get(), _set() and various design patterns :)
}

Essentially we implement the association (join) operation using a special class. This is fine, but we now introduce an indirection via A and B and the extra calls to _call, _get or _set, which will slow down the whole thing.

Time for take 3.
$op['product'][] = 'A_record';
$op['product'][] = 'B_record';
....
$res = op_eval($op);
// Array('fields' => Array( ... ),
// 'load' => 'SELECT a.*, b.* FROM a,b',
// 'insert' => ..., 'update' => ... );

Basically we (somehow) build the type ops AST, produce load, update, insert, delete, .... SQL code. Cons - slow, but we can cache the result array for reuse, which will make the whole execution times faster, at least competitive with takes 1&2. Pros - flexible.

Take N. I might add more of these later.

The choice depends on the problem and flexibility at hand. There is no 'best' solution.