Rewriting macros - the peculiar case of PHP

Without going into theoretical details, some of which are quite alien to me, I'll try to describe some of the challenges that pattern matching rewriting macros might pose for a language like PHP. After a brief explanation of what kind of beast this is, I'll explore some of the finer points that might cause problems. The intent of this post is to sketch a design and highlight some of the possible issues.

Pattern matching rewrite-only macros - a bird's-eye view

The idea is simple, really. Imagine that you have something, usually a parser, which produces a tree representation of an input text. This initial tree is passed to a transformer, which knows some transformation rules. A transformation rule maps an input tree to an output tree. When the transformer applies a rule, its output becomes the input of the next application, and this repeats until no rule changes the tree any more. The following pseudo-PHP snippet explains it best:
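$act = true;
while ($act) {
    $rtree = transformer($itree);                   // apply the rules once
    if (equal_tree($rtree, $itree)) $act = false;  // nothing changed: fixed point reached
    $itree = $rtree;                                // the output feeds the next application
}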
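Here is a minimal, runnable PHP sketch of the same loop. It assumes that trees are nested arrays (compound nodes) of strings (leaves) and that a rule is a closure returning the rewritten node, or null when it doesn't apply; `rewrite_once()`, `rewrite_fixpoint()` and the `unless` rule are made up for illustration. Nothing stops a badly written rule set from rewriting forever.

```php
<?php
// Sketch of the fixed-point rewrite loop. Trees are nested arrays of strings;
// a rule is a closure returning a replacement node, or null if it doesn't fire.
// All names here are hypothetical.

// One pass: apply the first rule that fires at this node, then recurse into
// the children of the (possibly rewritten) node.
function rewrite_once($tree, array $rules)
{
    foreach ($rules as $rule) {
        $out = $rule($tree);
        if ($out !== null) {           // the rule matched and produced a replacement
            $tree = $out;
            break;
        }
    }
    if (is_array($tree)) {
        foreach ($tree as $i => $child) {
            $tree[$i] = rewrite_once($child, $rules);
        }
    }
    return $tree;
}

// Repeat passes until the tree stops changing. This may loop forever if the
// rules keep producing new macro shapes.
function rewrite_fixpoint($tree, array $rules)
{
    while (true) {
        $next = rewrite_once($tree, $rules);
        if ($next === $tree) {         // === compares arrays recursively (order and types)
            return $tree;
        }
        $tree = $next;
    }
}

// Example rule: (unless C B) => (if (! C) B), a macro shape turned into core shapes.
$unless = function ($node) {
    if (is_array($node) && count($node) === 3 && $node[0] === 'unless') {
        return ['if', ['!', $node[1]], $node[2]];
    }
    return null;
};

$result = rewrite_fixpoint(['unless', '$done', ['unless', '$x', 'work()']], [$unless]);
echo json_encode($result), "\n";
// ["if",["!","$done"],["if",["!","$x"],"work()"]]
```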

The decision which rule to apply is made by matching the pattern shapes against shapes present in the input tree, following some tree-walking strategy. There also needs to be a strategy for resolving pattern conflicts, i.e. when more than one rule matches the same tree.
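A sketch of the matching step, under a few assumptions of my own: pattern variables are leaves starting with `?`, the walk is pre-order (outermost shapes are tried first), and conflicts are settled by plain definition order. The names `match_shape()` and `find_match()` are hypothetical, not from an existing library.

```php
<?php
// Pattern variables are leaves such as '?c'; trees are nested arrays of strings.

function is_var($p): bool
{
    return is_string($p) && strlen($p) > 1 && $p[0] === '?';
}

// Returns the variable bindings on success, or null if the shapes differ.
function match_shape($pattern, $tree, array $bindings = []): ?array
{
    if (is_var($pattern)) {
        if (array_key_exists($pattern, $bindings) && $bindings[$pattern] !== $tree) {
            return null;               // same variable bound to two different subtrees
        }
        $bindings[$pattern] = $tree;
        return $bindings;
    }
    if (is_string($pattern) || is_string($tree)) {
        return $pattern === $tree ? $bindings : null;   // leaves must be identical
    }
    if (count($pattern) !== count($tree)) {
        return null;                   // compound nodes must have the same arity
    }
    foreach ($pattern as $i => $sub) {
        $bindings = match_shape($sub, $tree[$i], $bindings);
        if ($bindings === null) {
            return null;
        }
    }
    return $bindings;
}

// Pre-order walk: every rule is tried at a node (in definition order) before
// descending into its children. Returns [rule name, path, bindings] or null.
function find_match(array $rules, $tree, array $path = []): ?array
{
    foreach ($rules as $name => $pattern) {
        $b = match_shape($pattern, $tree);
        if ($b !== null) {
            return [$name, $path, $b];
        }
    }
    if (is_array($tree)) {
        foreach ($tree as $i => $child) {
            $m = find_match($rules, $child, array_merge($path, [$i]));
            if ($m !== null) {
                return $m;
            }
        }
    }
    return null;
}

$rules = ['unless' => ['unless', '?c', '?body'], 'twice' => ['twice', '?e']];
$m = find_match($rules, ['f', ['twice', ['unless', '$a', 'g()']]]);
// $m is ['twice', [1], ['?e' => ['unless', '$a', 'g()']]]
```

The outer `twice` shape wins over the `unless` shape nested inside it only because of the walking strategy; an innermost-first walk would pick the other one.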

This general tree transformation process roughly describes what is meant here by macros.

Transformation time

In order to understand what a macro does and how it could be used, we need to know when such transformations occur during a program's lifetime. If we divide that lifetime into read, analyse and evaluate phases, the macro transformations are part of the analysis phase. In some systems they can leak into both the read and the evaluate phases, but those cases are not considered here.

A special case is how included files should be treated, since they might introduce new macro definitions after the transformation has run. For the time being this issue is filed at the back of the queue using the classic cop-out: if you don't want to deal with it, ignore it. That is, for now the assumption is that macros can't create new macros; if they do, the result is undefined.

Let's see what form the program code takes during the different phases:

before read: linear text
after read: a rich syntax tree
during transform: a rich syntax tree
after the transform: a pure syntax tree
evaluation: a pure syntax tree
pretty print: a linearisation of the pure syntax tree, in our case a "proper" PHP program string/file

A rich tree may still contain macro shapes.
A pure tree contains only core-language (PHP) shapes.
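The rich/pure distinction is easy to check mechanically. The sketch below assumes that a compound node's first leaf names its shape and that the transformer knows the list of macro names; both are my assumptions, not part of the design above, and `is_pure()` is a hypothetical helper.

```php
<?php
// A tree is "pure" when no compound node is headed by a known macro name.

function is_pure($tree, array $macroNames): bool
{
    if (!is_array($tree)) {
        return true;                   // a bare leaf is never a macro shape here
    }
    if (isset($tree[0]) && is_string($tree[0]) && in_array($tree[0], $macroNames, true)) {
        return false;                  // this node is still a macro shape
    }
    foreach ($tree as $child) {
        if (!is_pure($child, $macroNames)) {
            return false;
        }
    }
    return true;
}

$macros = ['unless', 'twice'];
var_dump(is_pure(['if', ['!', '$a'], 'f()'], $macros));             // bool(true)
var_dump(is_pure(['if', '$a', ['unless', '$b', 'g()']], $macros));  // bool(false)
```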

Basic/skeleton shapes and intermediate shapes

The most primitive shapes describing a tree are the leaf and the compound node, and the reader needs to be able to recognise at least these two. For the time being I won't consider mixing macro definitions with the program source; the extension is trivial.

Consider this JavaScript-like snippet:
$z = function ($x,$y) { $x; }
Its skeleton syntax tree in s-expression form:
(z = function ($x $y) ( '{' $x '}' ))
The nested parentheses of the s-expression conveniently show how elements nest in the syntax tree. The quotes denote non-obvious punctuation; everything else should be read as string objects with optional (invisible) attributes.
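A reader for this notation fits in a few lines of PHP. This sketch tokenises with a regular expression, turns each parenthesised group into an array (a compound node), strips the quotes from punctuation leaves and keeps everything else as plain string leaves. The invisible attributes are ignored, and `read_sexpr()`/`read_node()` are hypothetical names.

```php
<?php
// Minimal s-expression reader: '(' ... ')' => array, 'x' => punctuation leaf,
// any other token => plain string leaf.

function read_sexpr(string $src): array
{
    preg_match_all("/\\(|\\)|'[^']*'|[^\\s()']+/", $src, $m);
    $tokens = $m[0];
    $pos = 0;
    $tree = read_node($tokens, $pos);
    if ($pos !== count($tokens)) {
        throw new RuntimeException('trailing input after the first expression');
    }
    return $tree;
}

function read_node(array $tokens, int &$pos)
{
    if ($pos >= count($tokens)) {
        throw new RuntimeException('unexpected end of input');
    }
    $tok = $tokens[$pos++];
    if ($tok === '(') {                // compound node: read children until ')'
        $node = [];
        while (true) {
            if ($pos >= count($tokens)) {
                throw new RuntimeException("missing ')'");
            }
            if ($tokens[$pos] === ')') {
                $pos++;
                return $node;
            }
            $node[] = read_node($tokens, $pos);
        }
    }
    if ($tok === ')') {
        throw new RuntimeException("unexpected ')'");
    }
    if ($tok[0] === "'") {             // quoted punctuation leaf: drop the quotes
        return substr($tok, 1, -1);
    }
    return $tok;                       // plain leaf
}

$src = <<<'SRC'
(z = function ($x $y) ( '{' $x '}' ))
SRC;
$tree = read_sexpr($src);
echo json_encode($tree), "\n";
// ["z","=","function",["$x","$y"],["{","$x","}"]]
```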

And a possible pure text form:
function z($x, $y) { return $x; }
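For this particular shape, one hand-written rule is enough to go from the skeleton tree to the pure text form. The sketch assumes the array layout a reader would produce for the s-expression above and handles only a body with a single expression, which becomes the return value. `closure_to_function()` is a hypothetical name, and for brevity it emits text directly instead of building a pure tree first.

```php
<?php
// Rewrites (NAME = function (PARAMS...) ('{' EXPR '}')) into
// "function NAME(PARAMS) { return EXPR; }", or returns null if the shape differs.

function closure_to_function($node): ?string
{
    if (!is_array($node) || count($node) !== 5
        || $node[1] !== '=' || $node[2] !== 'function'
        || !is_array($node[3]) || !is_array($node[4])) {
        return null;                   // not the closure-assignment shape
    }
    $name = $node[0];
    $params = $node[3];
    $body = $node[4];
    if (!is_string($name) || count($body) !== 3
        || $body[0] !== '{' || $body[2] !== '}' || !is_string($body[1])) {
        return null;                   // only single-expression bodies are handled
    }
    // The body's only expression becomes the return value.
    return sprintf('function %s(%s) { return %s; }', $name, implode(', ', $params), $body[1]);
}

echo closure_to_function(['z', '=', 'function', ['$x', '$y'], ['{', '$x', '}']]), "\n";
// function z($x, $y) { return $x; }
```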

Code generation issues specific to PHP

In PHP some declarations, such as named functions and classes, are not first-class values/objects, and they can't be nested in a useful way: a class can't be declared inside another class, and a named function declared inside a function body still ends up in the global scope. This means that a useful macro system needs a way to specify that a shape (usually a definition) should appear in the global scope. Further, since PHP doesn't allow redeclaring functions or classes, we need to provide init (define-time) templates, which are guaranteed to appear only once at the top level of the syntax tree. The template types are:

local/in place: replaces a matched shape
top: adds the template to the top 'scope' (the root of the tree)
init: adds the template to the init 'scope', guaranteed to be instantiated once and only once
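One way the three template kinds could be applied, sketched under my own assumptions: the program is a list of top-level nodes, each init template carries a key that identifies it, local output replaces the matched node, top output is appended to the root, and init output is kept once per key. The `Expansion` class and the `memo_get` helper in the example are hypothetical.

```php
<?php
// Collects top and init templates while local templates replace matched nodes.

final class Expansion
{
    public $top = [];    // templates hoisted to the root of the tree
    public $init = [];   // key => template, so each init template appears only once

    // Records one rule result and returns its local replacement (null if none).
    public function apply(array $result)
    {
        foreach ($result['top'] ?? [] as $t) {
            $this->top[] = $t;
        }
        foreach ($result['init'] ?? [] as $key => $t) {
            if (!array_key_exists($key, $this->init)) {
                $this->init[$key] = $t;   // first instantiation wins, later ones are dropped
            }
        }
        return $result['local'] ?? null;
    }

    // Final program: init templates first, then hoisted ones, then the rewritten body.
    public function assemble(array $body): array
    {
        return array_merge(array_values($this->init), $this->top, $body);
    }
}

$exp = new Expansion();
$body = [];
foreach (['$a', '$b'] as $arg) {
    // What a hypothetical (memo ARG) rule might return: a call in place, plus a
    // helper definition that must be declared exactly once at the top level.
    $body[] = $exp->apply([
        'local' => ['memo_get', $arg],
        'init'  => ['memo_get' => ['function', 'memo_get', ['$k'], ['{', 'return $k;', '}']]],
    ]);
}
$program = $exp->assemble($body);
// the helper definition appears once, followed by the two rewritten calls
```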

Conflict resolution between patterns needs to be specified with care. I suppose patterns should be weighted by specificity and definition order, but the exact ordering rules are still unclear.
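As one possible ordering (an assumption, not a settled rule), specificity could be the number of literal leaves in a pattern, with pattern variables (written `?name` here) counting zero; the more specific pattern wins and ties go to the pattern defined first. `specificity()` and `order_candidates()` are hypothetical helpers.

```php
<?php
// Number of literal leaves in a pattern; variables such as '?f' count zero.
function specificity($pattern): int
{
    if (is_array($pattern)) {
        $n = 0;
        foreach ($pattern as $p) {
            $n += specificity($p);
        }
        return $n;
    }
    return (strlen($pattern) > 1 && $pattern[0] === '?') ? 0 : 1;
}

// $candidates: rule name => pattern, in definition order, all matching the same
// tree. Returns the rule names, best candidate first.
function order_candidates(array $candidates): array
{
    $names = array_keys($candidates);
    $rank = array_flip($names);        // definition position of each rule
    usort($names, function ($a, $b) use ($candidates, $rank) {
        $bySpec = specificity($candidates[$b]) <=> specificity($candidates[$a]);
        return $bySpec !== 0 ? $bySpec : $rank[$a] <=> $rank[$b];
    });
    return $names;
}

$order = order_candidates([
    'generic' => ['call', '?f', '?arg'],
    'print'   => ['call', 'print', '?arg'],
    'any'     => '?node',
]);
echo implode(', ', $order), "\n";   // print, generic, any
```

Counting literals is only a heuristic: two patterns with the same count but different shapes still fall back to definition order.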

Hygiene

The transformation language defined so far can't guarantee that it won't introduce conflicting variable or other symbol definitions into the pure language. In fact, it knows nothing about the underlying language at all. In the interest of hygiene it would be good to provide unique name generation, so that the programmer can explicitly maintain hygiene when required.
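A minimal unique-name generator (the classic gensym) could look like the following. It assumes a fresh name only has to avoid the leaves already present in the tree being expanded and the names it has handed out before; the `$__tmp_N` format and the `NameGenerator` class are made up for the example.

```php
<?php
// Hands out variable names that occur nowhere in the given tree.

final class NameGenerator
{
    private $used = [];
    private $counter = 0;

    public function __construct($tree)
    {
        $this->collect($tree);
    }

    // Records every leaf of the tree as an already used name.
    private function collect($node): void
    {
        if (is_array($node)) {
            foreach ($node as $child) {
                $this->collect($child);
            }
        } else {
            $this->used[$node] = true;
        }
    }

    // Returns a fresh name such as $__tmp_2, skipping names already in use.
    public function fresh(string $base = 'tmp'): string
    {
        do {
            $name = '$__' . $base . '_' . (++$this->counter);
        } while (isset($this->used[$name]));
        $this->used[$name] = true;
        return $name;
    }
}

$gen = new NameGenerator(['=', '$__tmp_1', '42']);
echo $gen->fresh(), ' ', $gen->fresh(), "\n";   // $__tmp_2 $__tmp_3
```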

A rough macro shape outline

macro macro_name {
  init { template }    //optional: if any pattern matches, place this template into the init environment
  match { pattern } => {
    init { template }  //optional: if this pattern matches, place this template into the init environment
    top { template }   //optional: if this pattern matches, place this template at the top (root) of the tree
    { template }       //optional: if this pattern matches, replace the matched shape with this template
  }
} //the templates can be empty - this reduces the result to a reflection
  // of the big bucket in the sky
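The outline maps naturally onto plain PHP data. The sketch below writes a hypothetical `swap` macro that way and shows template instantiation, i.e. replacing each pattern variable (`?name`, an assumed convention) with the subtree a matcher bound to it. The array layout is my assumption, not a fixed format.

```php
<?php
// A macro definition as data: (swap A B) => [B, A] = [A, B]
$swap = [
    'name'  => 'swap',
    'match' => ['swap', '?a', '?b'],
    'init'  => [],                                 // optional, empty here
    'top'   => [],                                 // optional, empty here
    'local' => ['=', ['?b', '?a'], ['?a', '?b']],  // replaces the matched shape
];

// Copies a template, substituting bound subtrees for pattern variables.
function instantiate($template, array $bindings)
{
    if (is_array($template)) {
        $out = [];
        foreach ($template as $t) {
            $out[] = instantiate($t, $bindings);
        }
        return $out;
    }
    return array_key_exists($template, $bindings) ? $bindings[$template] : $template;
}

// Bindings as a matcher would produce them for the input (swap $x $y).
$bindings = ['?a' => '$x', '?b' => '$y'];
echo json_encode(instantiate($swap['local'], $bindings)), "\n";
// ["=",["$y","$x"],["$x","$y"]]
```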

Output

Initially this should be a pure preprocessor, similar to cpp and the like. Later? Who knows.

Status

At the moment I'm experimenting with convenient parsers, tree representations and different priority algorithms.
