Monday, January 19, 2009

Xtext - new parser backend

So far, Xtext's parsing was based on Antlr 3.0, a very popular Java-based parser generator.
Unfortunately we've had an IP-issue with Antlr. As Xtext is an Eclipse project, all used libraries need to get IP-approved by the EMO. This has been done for the runtime of Antlr 3.0 but not for the generator. The generator is implemented in StringTemplate, Terence Parr's template language. The problem is that StringTemplate itself is implemented in Antlr 2.x, which won't get an approvement, because it is not clear where all it's code orginally comes from.
As Terence won't be able to migrate StringTemplate to Antlr 3.0 before this summer, we had to solve this somehow. :-(

The good news are, that we found a very nice solution:
We're implementing our own parser generator :-)

Actually Sebastian already did so and is still working on things like error recovery and error reporting. The parser backend is based on a packrat parsing algorithm, which means it is scanner-less. We've read all kinds of related papers (Actually, I was surprised how good the quality of those papers are) and played around with prototypes.

The new implementation is three times faster. This does not mean that Antlr is three times slower, but that we've used it in a rather expensive way :-). The replacement of a lexer-based parser with a packrat algorithm also allows for much simpler composition of grammars (a.k.a. grammar modularization). And we can provide nice hooks for error recovery and handling.

We now develop both solutions in parallel, so that the dependency to Antlr's generator is only 'works-with' as oppposed to 'pre-req'. But we hope that we're soon able to remove the Antlr implementation completely.

Stay tuned.