Most of my work on this is on paper, and some code sketeches. There quite a few non-trivial issues, and hard decisions to be made. Since I have the knack of loosing or recycling the various paper bits lying around, I decided to keep at least some of them in my diary, aka this blog.
initial requirements
- convenient syntax and semantics
- convenience matters. The syntax should be obvious. The meaning of the syntax should be easy to grasp.
- minimal core
- the core of the macro processor should be minimal. This will help managing the code quality. The minimal core should be enriched using macros.
- retargetable/multi-targets
- using one and the same 'surface syntax' to produce different target code. It should be possible to have a one to many code generation. Example application - one definition producing javascript, php and sql code. Alternative example: one definition - three code targets - debug, deployment + a set of autogenerated tests
- target and syntax agnostic
- no assumptions should be made (at least at top level) about the target or syntax. We should be able to accomodate any theoretically parsable syntax, within reason, of course. Subject to parsing strategy selection, etc...
processor core, first observations
Take a step back, close your eyes and try to imagine a macro transformer. Good. Now let's look what is going on.
The thing magically finds a pattern in a program tree and replaces it with a template substituting any found variables defined in the pattern in the process. There are a few possible followup strategies from here.
- deep evaluation
- Always normalise the result of the macro call, i.e. immediatley try and perform all valid substitutions in the result of a macro call. It is equivalent to inserting the result of the macro call into the imput stream.
- shallow evaluation
- don't modify the input of the parser, but chain parsers. The two are possibly equivalent, but that requires proof.
A single expansion pass stops when there is no more input left.
The expansion is looped until there is nothing to do in the output(s). There is definitely a possibility of non convergent macros.
Some interesting and very useful observations and comments, which should help with the specifications.
- Rule application has behavioural similarities to lambda application (as in lambda calculus).
- Named collections of rules - are similar to types, can we derive operations on macros based on operations on types? union types? derived types? ....
- The combination of repeated evaluation and multiple targets, essentially introduces namespaces. How could we exploit that? What underlying abstract machine to use to reflect these observations?
- Macro libraries?
parser(s)
The grammar of the parser should be changeable. We need to be able to add, and maybe redefine and remove grammar rules. This will allow for embedding the macros directly into the parsed grammar. The first obvious candidate is probably coming from the domain of combinator parsers. We should strive to minimise the backtracking though.
Ambiguties between rules
It is obvious that we need an ambiguity resolution strategy. There is no simple answer to that. We need to order all matching rules and select the best one. We need to strike a balance between convenience and speed. Obviously shortest match will remove backtracking for most if not all cases. There is always the option to investigate the packrat (or variant of it) strategy as well. There is a need for a measure which incorporates match length, specificity, possibly position or some other order, and as a last resort priority. The problem with fixed priority levels is that they are static, and counter intuitive. We could introduce relative order, that is being able to state something like rule1 is to be considered with a higher priority than rule2.
How to deal with left and right associative rules since it has direct consequences for pattern matching?
A freak idea - since there are quite a few parameters for resolving ambiguities, we could code a learning parser, that is a parser, that learns the proper order of the rules space based on examples and feedback. To keep the things standardised, we will need to simply give standard axioms, maybe axioms for namespaces and/or domains.
You are crazy for attempting
You are crazy for attempting this. Good on you!
You know, you might be
You know, you might be right, but when you have that nagging thing in your top box, you can do nothing but follow