This is a very rough and mathematically imprecise dialogue between me and myself on "general" relations in Drupal.
Something that takes the form (subject, predicate, object), or in its general form (subjects, predicate, objects). The predicate can be viewed as a label denoting the relation. In a lot of cases, the predicate of a relation is implied.
Well, with relations we can express things like "X is part of Y" or "x, y, z are Y".
Fine. Let's say we have Mike, a person who likes reading sci-fi and is a programmer who knows C, C++ and PHP. We can describe him as:
Mike - is a - person
Mike - likes - reading sci-fi
Mike - is a - programmer
Mike - knows - C, C++, PHP
subject-predicate-object
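As a minimal sketch (in Python, purely for illustration), the statements about Mike can be stored as (subject, predicate, object) tuples and queried by pattern, with an unspecified part acting as a wildcard. The names TRIPLES and match are hypothetical, and "knows C, C++, PHP" is split into one triple per language, which is one way to read the plural general form.

```python
# Each statement is a (subject, predicate, object) tuple.
# "Mike - knows - C, C++, PHP" is split into one triple per language.
TRIPLES = {
    ("Mike", "is a", "person"),
    ("Mike", "likes", "reading sci-fi"),
    ("Mike", "is a", "programmer"),
    ("Mike", "knows", "C"),
    ("Mike", "knows", "C++"),
    ("Mike", "knows", "PHP"),
}


def match(subject=None, predicate=None, obj=None, triples=TRIPLES):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return {
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    }


# "What does Mike know?"
languages = sorted(o for (_, _, o) in match("Mike", "knows"))
print(languages)  # ['C', 'C++', 'PHP']
```

Other questions ("What is Mike?", "Who knows PHP?") are the same call with different wildcards.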
We can use these declarations to ask questions about Mike, to infer things about him, etc.
Indeed. We could use Prolog to store and infer information about Mike. The questions we can ask can be formalised as functions. In other languages we will need to build our own inference engines, optimised for the particular use cases we are interested in.
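To give a flavour of what an inference engine adds on top of plain lookups, here is a minimal forward-chaining sketch for a single rule: "is part of" is transitive. It keeps deriving new facts until nothing changes. The function name infer_transitive and the book/series facts are hypothetical; in Prolog the same rule would be a short recursive definition and the search would come for free.

```python
def infer_transitive(triples, predicate):
    """Forward-chain the rule (a P b) and (b P c) => (a P c) until a fixpoint."""
    facts = set(triples)
    while True:
        # All (subject, object) pairs currently known for this predicate.
        pairs = {(s, o) for (s, p, o) in facts if p == predicate}
        derived = {
            (a, predicate, c)
            for (a, b) in pairs
            for (b2, c) in pairs
            if b == b2
        }
        new = derived - facts
        if not new:  # nothing new can be inferred: stop
            return facts
        facts |= new


# Hypothetical facts, for illustration only.
facts = {
    ("chapter 1", "is part of", "book"),
    ("book", "is part of", "series"),
    ("series", "is part of", "library"),
}
closed = infer_transitive(facts, "is part of")
print(("chapter 1", "is part of", "library") in closed)  # True
```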
Actually RDF is built around such concepts as well.
Not necessarily so. Let's see. Nodes are already predefined relations: relations between the fields of a node. We can ask "What is the title of the node with nid=1234?" Taxonomy terms define another type of relation: we can ask "Which terms relate to a node?" and "Which nodes relate to a term?" So we already have some pretty basic inference. It would be interesting, and useful, to generalise this so that we can ask more complicated questions. That could result in more interesting and useful sites, with richer navigation.
For simplicity let's assume everything is a node (or has a node reflection).
Suppose we define relations as sets of functions, for example relation_has_terms($node) and relation_nodes_with_term($term). These two questions let us construct more complicated questions from the answers to earlier ones. If we think of these functions as filters, a series of questions becomes a sequence of filters. They should be executable directly in SQL, either as a sequence of SQL statements or as a single statement built by injecting extra clauses into the query, similar to the permissions system.
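A sketch of the two questions as filters over a term_node-like (nid, tid) table, run in SQLite through Python's built-in sqlite3 module so the example is self-contained. The table layout, sample rows and Python function bodies are assumptions for illustration (Drupal's real schema and query layer differ). It shows both options: a sequence of statements where one answer feeds the next query, and a single statement where the first question is injected as a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term_node (nid INTEGER, tid INTEGER)")
# Hypothetical data: node 1 has terms 10 and 11, node 2 has 11, node 3 has 12.
conn.executemany(
    "INSERT INTO term_node (nid, tid) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 11), (3, 12)],
)


def relation_has_terms(nid):
    """Which terms relate to a node?"""
    rows = conn.execute("SELECT tid FROM term_node WHERE nid = ?", (nid,))
    return {tid for (tid,) in rows}


def relation_nodes_with_term(tids):
    """Which nodes relate to any of the given terms?"""
    tids = list(tids)
    if not tids:
        return set()
    placeholders = ",".join("?" * len(tids))  # e.g. "?,?" for two terms
    rows = conn.execute(
        f"SELECT DISTINCT nid FROM term_node WHERE tid IN ({placeholders})", tids
    )
    return {nid for (nid,) in rows}


# Option 1: a sequence of statements; the first answer is the second input.
related = relation_nodes_with_term(relation_has_terms(1))

# Option 2: one statement, with the first filter injected as a subquery.
single = {
    nid
    for (nid,) in conn.execute(
        "SELECT DISTINCT nid FROM term_node "
        "WHERE tid IN (SELECT tid FROM term_node WHERE nid = ?)",
        (1,),
    )
}
print(sorted(related), sorted(single))  # [1, 2] [1, 2]
```

The single-statement form lets the database do all the filtering in one round trip, at the cost of building the SQL dynamically.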
A general $node-$relation_type-$node relation can be expressed as an adjacency list (or adjacency matrix). This has the following advantages:
Cons:
Unfortunately we won't find a good model for all kinds (by structure) of relationships, so I would suggest using specialised indexing schemes for the different kinds of structures.
For unordered sets we can simply flag or tag all elements with a set identifier, similar to tagging nodes with taxonomy terms. (nid, fid)
For ordered sets, add weights, similar to the way we build forms. (nid, fid, weight)
For trees, use nested sets. (nid, left, right)
For graphs, use adjacency lists or suffer data duplication. Optimisations are possible, but they would require limiting certain characteristics of the graphs and would complicate the code unnecessarily. (nid, nid)
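Nested sets are the least obvious of these schemes, so here is a minimal sketch: a depth-first walk assigns each node a (left, right) pair, and "all descendants of X" becomes a range test with no recursion at query time. The children mapping and node ids are hypothetical; in the database the pairs would be stored as (nid, left, right) rows.

```python
def nested_sets(children, root):
    """Assign (left, right) numbers to every node by a depth-first walk."""
    bounds = {}
    counter = 0

    def visit(nid):
        nonlocal counter
        counter += 1
        left = counter          # number on the way down
        for child in children.get(nid, []):
            visit(child)
        counter += 1            # number on the way back up
        bounds[nid] = (left, counter)

    visit(root)
    return bounds


def descendants(bounds, nid):
    """Nodes whose interval lies strictly inside nid's interval."""
    left, right = bounds[nid]
    return {n for n, (l, r) in bounds.items() if left < l and r < right}


# Hypothetical tree: 1 has children 2 and 3; 2 has child 4.
children = {1: [2, 3], 2: [4]}
bounds = nested_sets(children, 1)
print(bounds)                          # {4: (3, 4), 2: (2, 5), 3: (6, 7), 1: (1, 8)}
print(sorted(descendants(bounds, 2)))  # [4]
```

The price is paid on writes: inserting or moving a node means renumbering part of the tree.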
I would suggest having common functionality that implements the different indexing schemes and can be applied to each individual relation type. I think that in a lot of cases we will be able to get by with declarative statements, which means the relationship types could be defined interactively via the web interface.
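A minimal sketch of that idea, assuming each relation type is declared as plain data (the kind of record an admin form could save) and one shared function dispatches to the indexing scheme its structure needs. The registry layout, relation type names and row builders are hypothetical; the row shapes simply mirror the (nid, fid), (nid, fid, weight) and (nid, nid) tuples listed above.

```python
# Hypothetical declarations, e.g. as saved from an admin form.
RELATION_TYPES = {
    "related_topics": {"structure": "unordered_set"},
    "book_chapters": {"structure": "ordered_set"},
    "links_to": {"structure": "graph"},
}


def index_unordered(set_id, members):
    return [(nid, set_id) for nid in members]                   # (nid, fid)


def index_ordered(set_id, members):
    return [(nid, set_id, w) for w, nid in enumerate(members)]  # (nid, fid, weight)


def index_graph(_set_id, edges):
    return [(a, b) for a, b in edges]                           # (nid, nid)


# Shared dispatch table; a nested-set builder for trees would plug in here too.
INDEXERS = {
    "unordered_set": index_unordered,
    "ordered_set": index_ordered,
    "graph": index_graph,
}


def build_index(relation_type, set_id, data):
    """Look up the declared structure and build the matching index rows."""
    structure = RELATION_TYPES[relation_type]["structure"]
    return INDEXERS[structure](set_id, data)


print(build_index("book_chapters", 7, [42, 43, 44]))
# [(42, 7, 0), (43, 7, 1), (44, 7, 2)]
```

Adding a new relation type then means adding a declaration, not new code, as long as its structure is one of the supported kinds.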
Adjacency lists are not necessary for unordered or ordered sets. They might be handy for trees, to speed up some of the queries. For example, "all siblings of node X" is straightforward and fast with an adjacency list, as opposed to nested sets.
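To illustrate the trade-off, here is a sketch that answers "all siblings of node X" both ways on the same hypothetical tree. With an adjacency list (a parent pointer per node) it is a single lookup; with nested sets alone you first have to find the immediate parent (the enclosing interval with the largest left value) and then keep only the nodes directly under it. All function names and data are illustrative.

```python
# The same hypothetical tree in both representations:
# 1 is the root with children 2, 3 and 5; 2 has child 4.
PARENT = {2: 1, 3: 1, 4: 2, 5: 1}                                  # adjacency: nid -> parent
BOUNDS = {1: (1, 10), 2: (2, 5), 4: (3, 4), 3: (6, 7), 5: (8, 9)}  # nested sets: nid -> (left, right)


def siblings_adjacency(nid):
    """Other nodes with the same parent: one pass over the parent pointers."""
    parent = PARENT.get(nid)
    return {n for n, p in PARENT.items() if p == parent and n != nid}


def parent_nested(nid):
    """The immediate parent is the enclosing interval with the largest left value."""
    l, r = BOUNDS[nid]
    ancestors = [n for n, (al, ar) in BOUNDS.items() if al < l and r < ar]
    return max(ancestors, key=lambda n: BOUNDS[n][0]) if ancestors else None


def siblings_nested(nid):
    """Other nodes whose immediate parent is the same: parent search per node."""
    parent = parent_nested(nid)
    if parent is None:
        return set()
    return {n for n in BOUNDS if n != nid and parent_nested(n) == parent}


print(sorted(siblings_adjacency(3)), sorted(siblings_nested(3)))  # [2, 5] [2, 5]
```

Storing a depth column next to (left, right) is a common way to soften this, but it is one more thing to keep in sync.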
We can always transform that into relations between different sets (regardless of their structures). This leads me to believe that we should view the relations themselves as nodes, so we can express such statements without too much extra code. They are just "dynamic nodes"®. Basically, we could replace most of the "_page" functions with nodes in the db. This can be handy for Drupal customisations. But let's not get carried away.
Yes, it is. And all these acrobatics are just to optimise the different queries we might want, and to provide developers and site admins with an API that is good enough for the majority of cases. They will still need to put in more effort if their requirements are really esoteric.
To organise the content better, generally speaking. To be able to do fancy, flexible sites. To express and display interesting interactions between different nodes. To be cool. To make life harder. To make Drupal's learning curve even steeper. Many more reasons out there. We will be able to define aggregate relations or use computed ones. Define different similarity metrics. Add your own reason.
Let's try.
If we accept that relations are nodes, then we can have a relation node type. The different types of relations (by structure) can be expressed via nodeapi. It is an open question what the title should be, and whether we need one at all, but let's skip that for the moment.
We want to be able to express node(a) relation(A) node(b), so the relation node type should provide the functions relation_subjects($relation_node, $node) and relation_objects($relation_node, $node).
Since we want to be able to express composite relations like relation_subjects($relation, relation_subjects($relation1, $object)), the functions should be able to accept arrays as their arguments.
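Assuming relation_subjects(relation, node) returns every s with (s, relation, node) and relation_objects(relation, node) returns every o with (node, relation, o), a sketch of array-accepting versions could look like this. A relation is modelled as a plain set of (subject, object) pairs rather than a real relation node, nodes are strings instead of nids for readability, and the sample relations are hypothetical.

```python
from typing import Iterable, Set, Tuple, Union

Relation = Set[Tuple[str, str]]  # (subject, object) pairs of one relation
Nodes = Union[str, Iterable[str]]


def _as_set(nodes: Nodes) -> Set[str]:
    """Accept a single node or an array of nodes."""
    return {nodes} if isinstance(nodes, str) else set(nodes)


def relation_subjects(relation: Relation, objects: Nodes) -> Set[str]:
    """All s such that (s, relation, o) holds for some o in objects."""
    targets = _as_set(objects)
    return {s for (s, o) in relation if o in targets}


def relation_objects(relation: Relation, subjects: Nodes) -> Set[str]:
    """All o such that (s, relation, o) holds for some s in subjects."""
    sources = _as_set(subjects)
    return {o for (s, o) in relation if s in sources}


# Hypothetical relations.
knows = {("Mike", "PHP"), ("Ann", "PHP"), ("Ann", "C")}
mentors = {("Bob", "Mike"), ("Eve", "Ann")}

# "Who mentors someone who knows PHP?"; the inner call returns an array.
print(sorted(relation_subjects(mentors, relation_subjects(knows, "PHP"))))
# ['Bob', 'Eve']
```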
An alternative notation would be to define a function relation_compose($relation, $relation, ...) that returns, or performs, the composition of the relations.
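With relations modelled as sets of (subject, object) pairs, composition has a standard definition: (a, c) belongs to the composition when some b has (a, b) in the first relation and (b, c) in the next. Below is a sketch of a variadic relation_compose on that model; the left-to-right argument order and the sample data are assumptions, since the text leaves them open.

```python
from functools import reduce
from typing import Set, Tuple

Relation = Set[Tuple[str, str]]  # (subject, object) pairs


def _compose2(first: Relation, second: Relation) -> Relation:
    """(a, c) such that (a, b) is in first and (b, c) is in second for some b."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}


def relation_compose(*relations: Relation) -> Relation:
    """Chain relations left to right: relation_compose(r1, r2) follows r1, then r2."""
    if not relations:
        raise ValueError("relation_compose needs at least one relation")
    return reduce(_compose2, relations)


# Hypothetical relations.
mentors = {("Bob", "Mike"), ("Eve", "Ann")}
knows = {("Mike", "PHP"), ("Ann", "PHP"), ("Ann", "C")}

# Which languages are known by the people each mentor mentors?
print(sorted(relation_compose(mentors, knows)))
# [('Bob', 'PHP'), ('Eve', 'C'), ('Eve', 'PHP')]
```

Compared with nesting relation_subjects calls, the composed relation is a value in its own right, so it could itself be stored as a (computed) relation node.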
The specialised, structure-dependent indexing is applied via the appropriate nodeapi hooks.
No, I don't think so. The hand-optimised code shouldn't need changing. If you want to mix it with the "new" code, then some wrappers could probably be written.
I'll try and add code to describe the indexes, and put more meat in the pan, so we can have a better, non theoretical discussion.
What do you think?
... In fact, I liked it so much I thought I was reading part of the essay I wrote this week!
I've followed up my own thoughts (regarding the 'generalized relationship' discussion on Drupal) with a big post of my own.
Anyway, I thought your approach here was going well, until you skipped the bit about the "meaning" of the terms (ontology) and went straight to 'types of structures'.
I see from your thoughts on relation modules that you are coming at this from a database-table sort of angle, so you want to lock down and solve the rules early on.
That's cool, but I think you might want to find a more general solution for how the sets work. Have a look at OWL: it might not actually be useful to use, but it covers the problem you are looking at, shows how some folk are solving it, and gives you some useful terminology too.
.dan.