Going mad talking about relations

This is a rough, mathematically imprecise dialogue between me and myself on "general" relations in Drupal.

What is a relation

Something taking the form (subject, predicate, object), or in its general form (subjects, predicate, objects). The predicate can be viewed as a label denoting the relation. In many cases, the predicate of a relation is implied.

Yes, but what does that mean?

Well, with relations we can express things like "X is part of Y" or "x, y, z are Y".

OK, but give me some more concrete examples

Fine. Let's say we have Mike, a person who likes reading sci-fi and is a programmer who knows C, C++ and PHP. We can describe him as:


  Mike - is a  - person
  Mike - likes - reading sci-fi
  Mike - is a  - programmer
  Mike - knows - C, C++, PHP
  subject-predicate-object

We can use these declarations to ask questions about Mike, to infer things about him, etc.
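To make this concrete, here is a rough sketch in Python. Python is chosen only for illustration, and the `ask` helper is hypothetical. It stores the statements about Mike as (subject, predicate, object) triples, with one triple per known language, and answers questions by pattern matching. `None` acts as a wildcard:

```python
from typing import Optional

# A minimal in-memory triple store: each fact is (subject, predicate, object).
Triple = tuple[str, str, str]

facts: set[Triple] = {
    ("Mike", "is a", "person"),
    ("Mike", "likes", "reading sci-fi"),
    ("Mike", "is a", "programmer"),
    # "knows C, C++, PHP" is split into one triple per language.
    ("Mike", "knows", "C"),
    ("Mike", "knows", "C++"),
    ("Mike", "knows", "PHP"),
}

def ask(subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None) -> list[Triple]:
    """Return all facts matching the pattern; None matches anything."""
    return sorted(
        t for t in facts
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

# "What does Mike know?"
print([o for _, _, o in ask("Mike", "knows")])
# "Who is a programmer?"
print([s for s, _, _ in ask(predicate="is a", obj="programmer")])
```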

This reminds me of Prolog

Indeed. We could use Prolog to store and infer information about Mike. The questions we can ask can be formalised as functions. In other languages we will need to build our own inference engines, optimised for the particular use cases we are interested in.
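Outside Prolog, even simple inference has to be written by hand. The sketch below is a hypothetical illustration, not a real inference engine. It applies one rule, the transitivity of "is part of", by naive forward chaining until no new facts appear:

```python
Triple = tuple[str, str, str]

def transitive_closure(facts: set[Triple], predicate: str) -> set[Triple]:
    """Infer (a, p, c) from (a, p, b) and (b, p, c) until a fixed point."""
    known = set(facts)
    while True:
        # Join every p-fact with every p-fact that starts where it ends.
        derived = {
            (a, predicate, c)
            for (a, p1, b) in known if p1 == predicate
            for (b2, p2, c) in known if p2 == predicate and b2 == b
        }
        new = derived - known
        if not new:
            return known
        known |= new

facts = {
    ("x", "is part of", "Y"),
    ("Y", "is part of", "Z"),
    ("Z", "is part of", "W"),
}
inferred = transitive_closure(facts, "is part of")
print(sorted(t for t in inferred if t[0] == "x"))
```

Each round re-scans all facts, so this only shows the idea. A real engine would index facts and handle many rules.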

Actually RDF is built around such concepts as well.

Yeah, yeah, but what about Drupal? It's in PHP. Isn't it going to be complex to do this?

Not necessarily. Let's see. Nodes are already predefined relations: relations between the fields of a node. We can ask "What is the title of the node with nid=1234?". Taxonomy terms define another type of relation. We can ask "Which terms relate to a node?" and "Which nodes relate to a term?". So we already have some pretty basic inference. It would be interesting and useful to generalise this, so that we can ask more complicated questions. That could result in more interesting and useful sites with richer navigation.

Give me examples

For simplicity let's assume everything is a node (or has a node reflection).

Suppose we define relations as sets of functions, for example relation_has_terms($node) and relation_nodes_with_term($term). With these two questions we can construct more complicated questions from the answers to earlier ones. If we think of these functions as filters, a series of questions becomes a sequence of filters. The filters should be executable directly in SQL. We could run them as a sequence of SQL statements, or combine them into a single statement by injecting SQL fragments into the query, similar to the permissions system.
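Here is a minimal sketch of the "questions as filters" idea. It uses Python's built-in sqlite3 module and a hypothetical `term_node (nid, tid)` table, loosely modelled on Drupal's node/term association. The function names follow the text, but here they take and return SQL fragments, which is an assumption. Because each function returns a subquery rather than a result, nesting two questions produces one SQL statement:

```python
import sqlite3

# Hypothetical schema: one row per (node, term) pair.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE term_node (nid INTEGER, tid INTEGER)")
db.executemany("INSERT INTO term_node VALUES (?, ?)",
               [(1, 10), (1, 20), (2, 10), (3, 20), (3, 30)])

def relation_has_terms(nids_sql: str) -> str:
    """SQL for the terms attached to any node returned by nids_sql."""
    return f"SELECT DISTINCT tid FROM term_node WHERE nid IN ({nids_sql})"

def relation_nodes_with_term(tids_sql: str) -> str:
    """SQL for the nodes tagged with any term returned by tids_sql."""
    return f"SELECT DISTINCT nid FROM term_node WHERE tid IN ({tids_sql})"

def run(sql: str) -> list[int]:
    """Execute a composed question and return the sorted ids."""
    return sorted(row[0] for row in db.execute(sql))

# "Which nodes share a term with node 2?" sent as a single SQL statement.
query = relation_nodes_with_term(relation_has_terms("SELECT 2"))
print(query)
print(run(query))
```

Building SQL by string concatenation is only acceptable here because our own code generates the fragments. Anything coming from users would have to go through placeholders.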

A general $node-$relation_type-$node relation can be expressed as an adjacency list (one row per edge). This has the following advantages:

  • can be applied to any relationship structure
  • is widely used, so there is a lot of knowledge about it out there in the wild
  • has an implied order (actually it could be a disadvantage in the case of sets without order)
  • you can derive simple distance metrics based on this representation (it is true for all graph models)

Cons:

  • Slow, since getting "all children" (recursively) or "all after me" nodes requires a lot of sequential queries
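The cost is easy to see in a sketch. Assume a hypothetical `relation_edge (src, dst)` table in SQLite. Collecting all descendants then takes one query per level of depth, plus a final query that finds nothing:

```python
import sqlite3

# Hypothetical edge table for one relation type: one row per (src -> dst) edge.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE relation_edge (src INTEGER, dst INTEGER)")
db.executemany("INSERT INTO relation_edge VALUES (?, ?)",
               [(1, 2), (1, 3), (2, 4), (4, 5), (3, 6)])

def all_descendants(root: int) -> tuple[set[int], int]:
    """Walk the edges one level at a time.

    Returns the reachable nodes and the number of queries issued: one per
    level, plus a final query that finds nothing.
    """
    found: set[int] = set()
    frontier = [root]
    queries = 0
    while frontier:
        placeholders = ",".join("?" * len(frontier))
        rows = db.execute(
            f"SELECT dst FROM relation_edge WHERE src IN ({placeholders})",
            frontier,
        ).fetchall()
        queries += 1
        # Skip nodes already seen, so cycles cannot loop forever.
        frontier = sorted({dst for (dst,) in rows} - found)
        found.update(frontier)
    return found, queries

print(all_descendants(1))
```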

Unfortunately, no single model works well for all kinds of relationships (by structure), so I would suggest using specialised indexing schemes for the different kinds of structures.

What types of structures?

  • unordered sets
  • ordered sets
  • trees (and forests)
  • general unordered and ordered graphs

For unordered sets, we can simply flag or tag all elements with a set identifier, similar to tagging nodes with taxonomy terms: (nid, fid)

For ordered sets, add weights, similar to the way we build forms: (nid, fid, weight)
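Here is a sketch that assumes hypothetical `set_member (nid, fid)` and `ordered_member (nid, fid, weight)` tables, where fid is the set identifier. The two set structures differ only in the weight column and the ORDER BY:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Unordered set: membership only, like tagging nodes with a taxonomy term.
db.execute("CREATE TABLE set_member (nid INTEGER, fid INTEGER, PRIMARY KEY (nid, fid))")
# Ordered set: the same pairs plus a weight that fixes each member's position.
db.execute("CREATE TABLE ordered_member (nid INTEGER, fid INTEGER, weight INTEGER)")

db.executemany("INSERT INTO set_member VALUES (?, ?)", [(7, 1), (3, 1), (5, 2)])
db.executemany("INSERT INTO ordered_member VALUES (?, ?, ?)",
               [(7, 1, 2), (3, 1, -1), (9, 1, 0)])

def members(fid: int) -> set[int]:
    """Unordered: the answer is a set, so no ORDER BY is needed."""
    return {nid for (nid,) in db.execute(
        "SELECT nid FROM set_member WHERE fid = ?", (fid,))}

def ordered_members(fid: int) -> list[int]:
    """Ordered: lighter weights come first, ties broken by nid."""
    return [nid for (nid,) in db.execute(
        "SELECT nid FROM ordered_member WHERE fid = ? ORDER BY weight, nid",
        (fid,))]

print(members(1))
print(ordered_members(1))
```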

For trees, use nested sets: (nid, left, right)
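Here is a sketch of how the (nid, left, right) numbers could be assigned and used. It assumes the tree arrives as a hypothetical parent-to-children mapping. Every descendant's interval lies strictly inside its ancestor's. That turns "all descendants" into a single range test (in SQL, roughly `WHERE lft > :left AND rgt < :right`) instead of one query per level:

```python
Tree = dict[int, list[int]]  # parent nid -> child nids (hypothetical input)

def nested_sets(tree: Tree, root: int) -> dict[int, tuple[int, int]]:
    """Number nodes in a depth-first walk: left on entry, right on exit."""
    bounds: dict[int, tuple[int, int]] = {}
    counter = 0

    def visit(nid: int) -> None:
        nonlocal counter
        counter += 1
        left = counter
        for child in tree.get(nid, []):
            visit(child)
        counter += 1
        bounds[nid] = (left, counter)

    visit(root)
    return bounds

def descendants(bounds: dict[int, tuple[int, int]], nid: int) -> list[int]:
    """All descendants via one interval comparison, no recursion at query time."""
    left, right = bounds[nid]
    return sorted(n for n, (l, r) in bounds.items() if left < l and r < right)

tree = {1: [2, 3], 2: [4], 4: [5], 3: [6]}
bounds = nested_sets(tree, 1)
print(bounds[1], bounds[2])
print(descendants(bounds, 2))
```

The price is paid on writes. Inserting a node means renumbering everything to its right, which is why nested sets suit trees that are read far more often than they change.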

For graphs, use adjacency lists, or accept data duplication. Optimisations are possible, but they would require restricting certain characteristics of the graphs and would complicate the code unnecessarily: (nid, nid)
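General graphs can contain cycles, like 1 → 2 → 3 → 1 in the made-up edges below, and cycles are exactly what breaks nested-set numbering. With plain (nid, nid) pairs, a simple distance metric is still easy to compute. This sketch uses the number of directed hops, found by breadth-first search:

```python
from collections import deque
from typing import Optional

# A general graph stored as (nid, nid) pairs, i.e. an adjacency list.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 6)]

def neighbours(edges: list[tuple[int, int]]) -> dict[int, set[int]]:
    """Group the pairs by source node."""
    adj: dict[int, set[int]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
    return adj

def distance(edges: list[tuple[int, int]], start: int, goal: int) -> Optional[int]:
    """Fewest directed hops from start to goal, or None if unreachable."""
    adj = neighbours(edges)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        nid, d = queue.popleft()
        if nid == goal:
            return d
        for nxt in adj.get(nid, ()):
            if nxt not in seen:  # the cycle 1 -> 2 -> 3 -> 1 is visited once
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

print(distance(edges, 1, 4))
print(distance(edges, 1, 6))
```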

I would suggest having common functionality to implement the different indexing schemes, which could be applied to each individual relation type. I think in many cases we will be able to arrive at declarative statements, so the relationship types could be defined interactively via the web interface.
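Here is a sketch of the declarative side. It assumes a hypothetical naming scheme (`relation_<type>` tables) and the column sets listed above, with two changes. `left`/`right` become `lft`/`rgt`, because LEFT and RIGHT are SQL keywords. The graph's two nid columns also get distinct names:

```python
import sqlite3

# Index columns per structure, following the (nid, ...) tuples above.
INDEX_COLUMNS = {
    "unordered_set": ("nid", "fid"),
    "ordered_set": ("nid", "fid", "weight"),
    "tree": ("nid", "lft", "rgt"),      # LEFT/RIGHT are SQL keywords
    "graph": ("src_nid", "dst_nid"),    # two nid columns need distinct names
}

def create_index_table(db: sqlite3.Connection, relation_type: str, structure: str) -> str:
    """Turn a declared relation type into its index table (hypothetical naming)."""
    if structure not in INDEX_COLUMNS:
        raise ValueError(f"unknown structure: {structure}")
    if not relation_type.isidentifier():
        raise ValueError(f"bad relation type name: {relation_type}")
    columns = ", ".join(f"{c} INTEGER" for c in INDEX_COLUMNS[structure])
    table = f"relation_{relation_type}"
    db.execute(f"CREATE TABLE {table} ({columns})")
    return table

# Declarations that could come from a web form; the names are illustrative.
declarations = [("book_pages", "ordered_set"), ("category", "tree")]
db = sqlite3.connect(":memory:")
for name, structure in declarations:
    print(create_index_table(db, name, structure))
```

A web form would then only need to collect a name and a structure. The rest follows from the table of columns.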

Wait a moment. Why do you need to store everything as adjacency graphs?

Adjacency lists are not necessary in the first two cases (unordered and ordered sets). They might be handy for trees, to speed up some of the queries. For example, "all siblings of node X" is straightforward and fast with this model, as opposed to nested sets.
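Here is a sketch of the hybrid approach for trees. It assumes a hypothetical `tree` table that stores a parent column next to the nested-set bounds. The lft/rgt values are what a depth-first numbering of this small tree produces. Siblings take one self-join on `parent`, while descendants still use the interval test. With nested sets alone, finding siblings would first require locating the parent as the tightest enclosing interval:

```python
import sqlite3

# Hypothetical tree table: parent column (adjacency) plus nested-set bounds.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tree (nid INTEGER PRIMARY KEY, parent INTEGER, lft INTEGER, rgt INTEGER)")
db.executemany("INSERT INTO tree VALUES (?, ?, ?, ?)", [
    (1, None, 1, 12), (2, 1, 2, 7), (4, 2, 3, 6),
    (5, 4, 4, 5), (3, 1, 8, 11), (6, 3, 9, 10),
])

def siblings(nid: int) -> list[int]:
    """Adjacency: nodes sharing my parent, in a single self-join."""
    return [r[0] for r in db.execute(
        "SELECT s.nid FROM tree AS me JOIN tree AS s "
        "ON s.parent = me.parent AND s.nid <> me.nid "
        "WHERE me.nid = ? ORDER BY s.nid", (nid,))]

def descendants(nid: int) -> list[int]:
    """Nested sets: everything strictly inside my (lft, rgt) interval."""
    return [r[0] for r in db.execute(
        "SELECT c.nid FROM tree AS me JOIN tree AS c "
        "ON c.lft > me.lft AND c.rgt < me.rgt "
        "WHERE me.nid = ? ORDER BY c.nid", (nid,))]

print(siblings(2), descendants(2))
```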

And what about many-to-many relations?

We can always transform those into relations between different sets, regardless of their structures. This leads me to believe that we should view the relations themselves as nodes, so we can express such statements without too much extra code. They are just "dynamic nodes"®. Basically, we could replace most of the "_page" functions with nodes in the database. This can be handy for Drupal customisations. But let's not get carried away.
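Here is a rough model of "relations as nodes". It uses a hypothetical `RelationNode` class and a made-up nid sequence starting at 100. Because each relation gets its own nid, a many-to-many link is just one node joining a set of subjects to a set of objects. Another relation can then point at that node:

```python
from dataclasses import dataclass, field
from itertools import count

_nids = count(100)  # hypothetical nid sequence for "dynamic nodes"

@dataclass
class RelationNode:
    """A relation stored as a node of its own (hypothetical model)."""
    label: str
    subjects: set[int] = field(default_factory=set)
    objects: set[int] = field(default_factory=set)
    nid: int = field(default_factory=lambda: next(_nids))

    def pairs(self) -> set[tuple[int, int]]:
        """Expand the set-to-set relation into (subject, object) pairs."""
        return {(s, o) for s in self.subjects for o in self.objects}

# Many-to-many: two authors, two nodes, one relation node.
authors = RelationNode("wrote", subjects={1, 2}, objects={10, 11})
print(authors.nid, sorted(authors.pairs()))

# A relation about a relation: its object is another relation's nid.
review = RelationNode("reviewed", subjects={3}, objects={authors.nid})
print(review.nid, review.pairs())
```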

Hmm. But isn't SQL all about defining relations anyway? Why do we need to do all this?

Yes, it is. All these acrobatics are just to optimise the different queries we might want, and to provide developers and site admins with an API that is good enough for the majority of cases. They will still need to put in more effort if their requirements are really esoteric. Drupal already has quite a few relations of this kind:

  • a node is a relation between all its fields, labelled by the nid
  • there are relations between nodes and terms; their exact meaning depends on context
  • we already have a node ownership relation: a user owns a node
  • events are another kind of relation, basically an extension of the node relation
  • we could extract node-links-to-node relations if we really wanted to
  • search is another example of a dynamic/algorithmic relation
  • ...

This might sound odd, but why do we need all this relations stuff?

To organise the content better, generally speaking. To build fancy, flexible sites. To express and display interesting interactions between different nodes. To be cool. To make life harder. To make Drupal's learning curve even steeper. There are many more reasons out there. We will be able to define aggregate relations or use computed ones, and define different similarity metrics. Add your own reason.

Let's model the "relations" language then

Let's try

If we accept that relations are nodes, then we can have a relation node type. The different types of relations (by structure) can be expressed via nodeapi. It is an open question what the title should be, and whether we need one at all, but let's skip that for the moment.

We want to be able to express node(a) relation(A) node(b), so the relation node type should provide relation_subjects($relation_node, $node) and relation_objects($relation_node, $node).

Since we want to be able to express composite relations like relation_subjects($relation, relation_subjects($relation1, $object)), the functions should accept arrays as their arguments.

An alternative notation would be to define a function relation_compose($relation1, $relation2, ...) that returns (or performs) the composition of the relations.
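Here is a minimal sketch of these functions, treating a relation as a set of (subject, object) pairs. The names follow the text, but the signatures and behaviour are assumptions. Each function accepts either a single nid or a collection of nids, so calls can be nested. `relation_compose` chains relations from left to right:

```python
Relation = set[tuple[int, int]]

def _as_set(nodes) -> set[int]:
    """Accept a single nid or any iterable of nids."""
    return {nodes} if isinstance(nodes, int) else set(nodes)

def relation_subjects(relation: Relation, objects) -> set[int]:
    """Subjects related to any of the given objects."""
    wanted = _as_set(objects)
    return {s for s, o in relation if o in wanted}

def relation_objects(relation: Relation, subjects) -> set[int]:
    """Objects related to any of the given subjects."""
    wanted = _as_set(subjects)
    return {o for s, o in relation if s in wanted}

def relation_compose(*relations: Relation) -> Relation:
    """Left to right: (a, c) whenever a R1 b and b R2 c, and so on."""
    result = relations[0]
    for nxt in relations[1:]:
        result = {(a, c) for a, b in result for b2, c in nxt if b == b2}
    return result

wrote = {(1, 10), (2, 11)}                # user -> node (made-up data)
tagged = {(10, 50), (11, 50), (11, 60)}   # node -> term (made-up data)

# Nested form, as in the text: "who wrote nodes tagged with term 50?"
print(relation_subjects(wrote, relation_subjects(tagged, 50)))
# Composed form: every (user, term) pair at once.
print(sorted(relation_compose(wrote, tagged)))
```

Both forms answer the same question. Nesting filters step by step, while composition builds the combined relation once and can be reused for any term.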

The specialised, structure-dependent indexing is applied via the appropriate nodeapi hooks.
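Here is a sketch of index maintenance driven by node operations, loosely in the spirit of Drupal's hook_nodeapi (which receives operations such as 'insert', 'update' and 'delete'). The `relation_nodeapi` function, the node dictionary and the `relation_index` table are all hypothetical. For brevity, every relation is indexed as plain (subject, object) pairs rather than with a structure-specific scheme:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE relation_index (rel_nid INTEGER, subject INTEGER, object INTEGER)")

def relation_nodeapi(node: dict, op: str) -> None:
    """Keep the index in sync with relation nodes (hypothetical hook)."""
    if node.get("type") != "relation":
        return  # other node types are not indexed here
    if op in ("update", "delete"):
        # Drop the old index rows for this relation node.
        db.execute("DELETE FROM relation_index WHERE rel_nid = ?", (node["nid"],))
    if op in ("insert", "update"):
        # Write one row per (subject, object) pair.
        db.executemany(
            "INSERT INTO relation_index VALUES (?, ?, ?)",
            [(node["nid"], s, o) for s in node["subjects"] for o in node["objects"]],
        )

rel = {"nid": 100, "type": "relation", "subjects": [1, 2], "objects": [10]}
relation_nodeapi(rel, "insert")
rel["objects"] = [11]
relation_nodeapi(rel, "update")
print(db.execute("SELECT subject, object FROM relation_index ORDER BY subject").fetchall())
```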

What about the 'old way'? Do I need to change my good old hand-crafted code?

No, I don't think so. The hand-optimised code shouldn't need changing. If you want to mix it with the "new" code, then some wrappers could probably be written.

I'll try to add code describing the indexes and put more meat in the pan, so we can have a better, less theoretical discussion.

What do you think?

I liked the first half

... In fact, I liked it so much I thought I was reading part of an essay I wrote this week!

I've followed up my own thoughts (regarding the 'generalized relationship' discussion on Drupal) with a big post of my own.

Anyway, I thought your approach here was going well, until you skipped the bit about the "meaning" of the terms (ontology) and went straight to 'types of structures'.

I see from your thoughts on relation modules that you are coming at this from a database-table sort of angle, so you want to lock down and solve the rules early on.

That's cool, but I think you might want to find a more general solution for how the sets work. Have a look at OWL: it might not actually be useful to use directly, but it covers the problem you are looking at, shows how some folk are solving it, and gives you some useful terminology too.

.dan.

let's try not to overcook the broth :)

It is a longer discussion, so I wrote a new post instead of a comment:
