So far, Xtext's parsing was based on Antlr 3.0, a very popular Java-based parser generator.
Unfortunately we've had an IP-issue with Antlr. As Xtext is an Eclipse project, all used libraries need to get IP-approved by the EMO. This has been done for the runtime of Antlr 3.0 but not for the generator. The generator is implemented in StringTemplate, Terence Parr's template language. The problem is that StringTemplate itself is implemented in Antlr 2.x, which won't get an approvement, because it is not clear where all it's code orginally comes from.
As Terence won't be able to migrate StringTemplate to Antlr 3.0 before this summer, we had to solve this somehow. :-(
The good news are, that we found a very nice solution:
We're implementing our own parser generator :-)
Actually Sebastian already did so and is still working on things like error recovery and error reporting. The parser backend is based on a packrat parsing algorithm, which means it is scanner-less. We've read all kinds of related papers (Actually, I was surprised how good the quality of those papers are) and played around with prototypes.
The new implementation is three times faster. This does not mean that Antlr is three times slower, but that we've used it in a rather expensive way :-). The replacement of a lexer-based parser with a packrat algorithm also allows for much simpler composition of grammars (a.k.a. grammar modularization). And we can provide nice hooks for error recovery and handling.
We now develop both solutions in parallel, so that the dependency to Antlr's generator is only 'works-with' as oppposed to 'pre-req'. But we hope that we're soon able to remove the Antlr implementation completely.
Stay tuned.
Are you sure there was no alternative? I know Antlr is used and also LPG. I tend to cringe everytime we try to re-invent the wheel here.
ReplyDeleteWe don't want to use LPG, because it's based on an algorithm (LR), which makes error recovery and error handling complicated. It is also complicated and very limited to mix LPG-based languages. Antlr has been used but won't get IP-approved until the generator has been migrated.
ReplyDeleteUsual clients of Antlr just need the already approved runtime where we also need the generator. That's why there are Eclipse projects which don't have the IP-problems with Antlr we have.
So we wanted a packrat-based parser in order to allow grammar composition and good error handling. Unfortunately the only more or less mature Java implementation we found (rats!) is shipped with a GPL licence.
Anyway, I think even if there was a parser generator with a good algorithm and compatible license, I'ld still prefer having an Xtext specific implementation. Parsing is a central aspect of Xtext and having this under control allows us to do nice things in future :-).
But in general you're absolutely right and of course we should always scan the net for possible reuse before re-inventing the wheel.
"Parsing is a central aspect of Xtext and having this under control allows us to do nice things in future"
ReplyDeleteI totally agree. You shouldn't re-invent the wheel, but if you build your core aspects yourself you have a lot more flexibility. I'm curious to see what improvements we will see in the future!
Did you guys try approaching the developers of Rats! and ask whether they would consider re-licensing it under the EPL so it could be used in xText (and other Eclipse projects in general)?
ReplyDeleteI find it really sad that the only parser generator that is currently 100% kosher for Eclipse projects is written in C.
Could you add an extension-point to allow selection of parser generator and then move forward with both?
ReplyDeleteI ran into the same IP-issue with Gymnast (the tool used to generate Emfatic). I added an extension-point, refactored the original Antlr generator into one extension and then added another for JavaCC. I looked at LPG too but gave up there - like you I found it hard to make it fit.
IMHO having the extensibility here is nice because you can continue to compare benchmarks and even add new PGs later if it makes sense (maybe oslo/MGrammar :-).
@Rafael:
ReplyDeleteNo we didn't ask Robert Grimm (the rats! developer) to change the license. But note, that Antlr 3 is also IP approved at least if you don'T need to hip the generator, which usually is not necessary.
@Chris:
Although we have the hook (not via extension point but via dependency injection) we won't explicitly support switching parser generators. This is because it's pretty hard to adapt all the things Xtext expects from a parser. It has to create a parse tree in the expected form, it has to create the AST in the expected form. It has to be based on a packrat algorithm and must support all the things we have in our grammar language (like grammar mixins, hidden tokens, scanner-less parsing). Although it might be possible to replace our parser with a different one, possibilities are very limited (Oslo's parser wouldn't fit in because it's based on an SLR algorithm).
Rafael,
ReplyDeleteI'm not a lawyer, not even an IP clearance enthusiast ;-) But as I understand the EPL an attempt to make a third party component available under the EPL itself would just move the impossible burden, not remove. The problematic part of the license is that the original author/copyright holder of every line of code must be known and EPL-compatible. I'm sure the EMO legal folks have already asked the Rats! folks for this sort of information, and failed ;-(
Funny: Last week I looked into the Lua programming language. One of the modules that fascinated me is the LPeg pattern-matching library, which implements Parsing Expression Grammars For Lua.
ReplyDeleteThen I looked for java implementations and I found rats! which is GPL and therefore not really interesting. What I really liked is the virtual mashine described in A Text Pattern-Matching Tool based on Parsing Expression Grammars. I thought that's something I'd like to see in Java....
Will the parser be usable outside Xtext?
Hi Michael,
ReplyDeletethanks for the pointers.
The parser's frontend is Xtext's grammar language so you have to interface with it somehow if you want to use the parser generator. Sebastian made the parser itself highly configurable, so it should be reusable for scenarios we haven't thought of yet. However, reusing it outside of Xtext has no priority as for now.
As Terence won't be able to migrate StringTemplate to Antlr 3.0 before this summer, we had to solve this somehow. :-(
ReplyDeleteDoes this mean that your own parser generator will stand as adhoc replacement? Are you guys thinking on reverting to ANTLR if StringTemplate's new version comes out using ANTLRv3.0?
We don't have plans to go back to Antlr. The new implementation is much nicer, but as outlined hasn't fully caught up with everything Antlr can do. Especially the generic error recovery Antlr provides is really nice and relatively complicated to implement with a packrat parser (but we're working on it).
ReplyDeleteSo for now we have both. It's our desire that our own implementation gets better than Antlr in all aspects so that we can remove the Antlr dependency completely.
Would you care? And if yes, why would you care?
I need xtext to parse the follwoing:
ReplyDelete{ javaCode }
the problem, xtext parsing rule:
terminal -> terminal can not work with curly brackets.
any ideas?
I would recommend having a look at ometa, too. Also based on PEG but extends its capabilities. There are some interessting implementations in Squeak, JavaScript, Scheme/Lisp, Python, Ruby, C# and there is an ongoing approach in Boo. Boo is based on ANTLR, too but there are plans to substitute it with ometa.
ReplyDeletehttp://tinlizzie.org/ometa/
I know ometa and have considered using it, or at least reusing some of the interesting parts (e.g. the left recursion algorithm).
ReplyDeleteWhich is the recommended parser for the future? I just downloaded xtext 2.0, and it seems to be recommending Antlr, however, at least 2 years ago, that was meant to be phased out.
ReplyDeleteHas that decision been reversed? Is Antlr now the way to go?
Yes, Antlr is the parser technology used in Xtext.
ReplyDelete