Tuesday, November 20, 2007

Definition of the term "Language Workbench"

I just had another look at Martin Fowler's article about language workbenches. I read it when it first came out, but couldn't remember how it defined the term "Language Workbench".

Here is what the article states:
  • Users can freely define new languages which are fully integrated with each other.
  • The primary source of information is a persistent abstract representation.
  • Language designers define a DSL in three main parts: schema, editor(s), and generator(s).
  • Language users manipulate a DSL through a projectional editor.
  • A language workbench can persist incomplete or contradictory information in its abstract representation.

I think this point is an implementation detail:
  • The primary source of information is a persistent abstract representation.

Why should the source be the AST? What is the benefit from a user's perspective? Shouldn't this be transparent anyway?
I don't want to "feel" like I am editing the AST. That's what concrete syntaxes are for: they are closer to what I am thinking about.
It's possible to store the information in a concrete syntax (e.g. a Java file) but view and edit it using other representations (e.g. outline views, call hierarchies, refactoring actions, etc.).
Eclipse JDT, for example, has the complete AST in memory to help the user while editing and exploring the code. It doesn't need to store the Java files as XML or in a database. So if you know how to define a projection on an AST (like Intentional does), you could do this against a temporary, in-memory AST as well.
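
To make the idea concrete, here is a minimal sketch of "text as the stored form, AST only in memory". It is an illustration, not how JDT is implemented: it uses Python's standard `ast` module as a stand-in, and the `SOURCE` snippet and the `outline` helper are names made up for this example. The outline view is a projection computed from a temporary AST, and nothing but the text is ever persisted.

```python
import ast

# The persisted artifact is plain text (the concrete syntax); nothing else is stored.
SOURCE = '''
class Account:
    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        self.balance -= amount

def audit(account):
    return account.balance
'''


def outline(source: str) -> list[str]:
    """Build a temporary AST from the text and project it into an outline view."""
    tree = ast.parse(source)  # in-memory AST, thrown away after the call
    entries = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            entries.append(f"class {node.name}")
            for member in node.body:
                if isinstance(member, ast.FunctionDef):
                    entries.append(f"  method {member.name}")
        elif isinstance(node, ast.FunctionDef):
            entries.append(f"function {node.name}")
    return entries


for entry in outline(SOURCE):
    print(entry)
```

Other views (a call hierarchy, say) would be further functions over the same temporary tree; the file on disk stays ordinary text that diff and any editor can handle.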

IMHO storing programs in their abstract form is a bad idea.
UML tool users know why:
You can't use existing text tools (diff, editors, etc.). You are always bound to the "Language Workbench". And you have to come up with your own version control system.
People who've used graphical DSL frameworks like GMF know that, in addition, it is much harder to migrate your code/model to newer versions of your DSL, because you have to do that using the syntax of the stored AST (ugly XML at its best). (Of course, Intentional may have developed a special "migration tool" for this as well ;-))

So I'd like to change the requirements like so:
  • Users can freely define new languages which are fully integrated with each other.
  • Language designers define a DSL in the following parts: abstract syntax, concrete syntax(es), tooling (editor(s), view(s), wizard(s)), and generator(s) or interpreter(s).
  • Language users can manipulate a DSL using the defined tooling, which can be different editors, actions or wizards.
  • A language workbench can persist incomplete or contradictory information.


  1. Thanks for including interpreters as a part of the DSL. I'm strongly biased towards generic interpretation of models in favor of dull, repetitive, generated code, so I'm glad someone finally mentions that :)

    But my main point is this: Using the AST as the persistent model representation may be an implementation detail, all right. But not using it will require the DSL designer to define a concrete syntax that is a bijective, full projection of the AST (like Java source files are for the Java AST). This must always represent the full content of the model, even if you modify the model in some other projections (views) of the model. In other words: Storing a bijective projection of the AST is effectively the same as storing the AST itself (whatever "storing the AST itself" means technically).

    So you can consider XMI a concrete, textual notation for a given abstract syntax, albeit - and there I agree with you - one that can be very ugly to handle in practice. How can we get rid of it? Maybe provide an EMF ResourceImpl that reads and writes the concrete, human readable syntax instead of XMI.

  2. Yes, I think you are right that writing a bidirectional mapping may be a bit more complicated. But on the other hand, you don't have the mentioned problems regarding versioning, migration, etc.

    Maybe a human-readable but generic syntax for storing ASTs would be a compromise (like HUTN - http://www.omg.org/technology/documents/formal/hutn.htm).

    Anyway, how an IDE stores the code internally shouldn't be listed in the requirements; the IDE should just make sure that the user is always able to work with the code using a concrete syntax or a projection.
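
The "bijective projection" point from the first comment can be sketched with a toy example. Everything here is an assumption made for illustration: the tiny `entity` DSL, the `Field` and `Entity` node types, and the `parse`/`unparse` functions are invented, and have nothing to do with EMF or XMI. The sketch only shows what the round-trip property looks like when the human-readable text is the stored form.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str


@dataclass(frozen=True)
class Entity:
    name: str
    members: tuple  # tuple of Field nodes


ENTITY_RE = re.compile(r"entity\s+(\w+)\s*\{([^}]*)\}")
FIELD_RE = re.compile(r"(\w+)\s*:\s*(\w+)")


def parse(text: str) -> tuple:
    """Concrete syntax -> abstract syntax (a tuple of Entity nodes)."""
    return tuple(
        Entity(name, tuple(Field(n, t) for n, t in FIELD_RE.findall(body)))
        for name, body in ENTITY_RE.findall(text)
    )


def unparse(model: tuple) -> str:
    """Abstract syntax -> canonical concrete syntax."""
    blocks = []
    for entity in model:
        lines = [f"entity {entity.name} {{"]
        lines += [f"  {f.name}: {f.type}" for f in entity.members]
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Hand-written text with sloppy layout, as a user might type it.
text = "entity Person { name: String  age: Int }\nentity Order{total:Decimal}"
model = parse(text)

# Round trip: writing the model out and reading it back loses nothing.
assert parse(unparse(model)) == model
print(unparse(model))
```

Note that the mapping is only bijective between the AST and the canonical text: the user's original layout is lost on the way in. A real workbench would have to decide whether to preserve formatting and comments, which is part of why the bidirectional mapping is more work than dumping the AST.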