sven efftinge's blog: April 2006

Friday, April 21, 2006

Textual DSL Framework "xText"

As you might know (at least if you've read some of my previous posts) model driven software development is one of my special subjects. On the other hand I must say that I don't like describing things using UML (or other graphical languages). Hmmm... wait a minute. Isn't "model" driven all about models? And aren't models usually represented in a graphical manner?

Yes, most of the time. But describing things with boxes and lines is not always the best idea (to be honest, I think it's a bad idea in most cases). Graphical notations are great for providing abstract views of a model or for sketching ideas on a whiteboard, but using them for programming (which is what modelling in the sense of MDSD actually is) is very inconvenient, more or less so depending on the tool you use. I think all the UML stuff is a hangover from the old CASE tool days, and of course the OMG needed to put some value into its MDA "standard".
If you have a real DSL (where the domain is the customer's domain), you should of course use the syntax your customer prefers. Most of the time, though, the DSL is software-centric, so you should pick a syntax your domain expert (a programmer in this case) is used to. This might be UML, but in my case it is not.

To cut a long story short, I needed good and simple tooling for textual DSL design.

So I developed a framework called 'xText'. It's based on openArchitectureWare 4.1 (not yet released), EMF and Antlr. With xText one can describe textual DSLs using an EBNF-like grammar language. From such a grammar the framework generates an EMF-based AST metamodel, a corresponding parser and an Eclipse text editor plugin.

The generated parser instantiates dynamic EMF models (the AST), and the editor provides syntax highlighting, an outline view, syntax checking and semantic checking (based on an additional Check file).

Here is an example of a grammar for a textual state machine language (shown in the bootstrapped grammar editor):

Here is a CD-Player described using the textual DSL (represented in the generated editor):

Grammar

The Grammar language consists of two core abstractions: Rules and Tokens.

Tokens
For now there are the following token types available:
- keyword or symbol (e.g. "state", "{")
- ID (an identifier)
- STRING (a string)

Rules
Rules are the main concept. Let's look at an example:


State :
   "state" name=ID "{"
      (entryActions+=Action)*
      (transitions+=Transition)*
      (exitActions+=Action)*
   "}";

Each rule has a name (State). By convention this is also the name of the corresponding AST type.
In our example a state starts with the keyword "state", followed by an identifier (i.e. ID) which is assigned to the property 'name' of the AST type.
Then an opening curly bracket (i.e. "{") is expected.
Next, zero or more Actions (described in their own rule) are assigned to the reference 'entryActions'. The '+=' operator specifies that 'entryActions' is a list and that each Action should be added to it.
Then zero or more Transitions are added to the 'transitions' reference, before any following Actions are added to the 'exitActions' reference.
The description of a state is terminated by the closing curly bracket ("}").

The AST type 'State' needs to have:
- a property 'name': String
- a reference 'entryActions': List
- a reference 'transitions' : List
- a reference 'exitActions' : List

Note that the AST metamodel can be derived from the grammar.
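
To make the shape of such a derived type concrete, here is a plain-Java approximation. It is only a sketch: xText instantiates dynamic EMF models rather than hand-written classes, and the class names `StateNode`, `ActionNode` and `TransitionNode` as well as the sample values are hypothetical. Only the feature names come from the State rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the AST types of the Action and Transition rules.
class ActionNode {
    final String text;
    ActionNode(String text) { this.text = text; }
}

class TransitionNode {
    final String target;
    TransitionNode(String target) { this.target = target; }
}

// Plain-Java approximation of the AST type derived from the State rule.
class StateNode {
    String name;                                                 // name=ID
    final List<ActionNode> entryActions = new ArrayList<>();     // entryActions+=Action
    final List<TransitionNode> transitions = new ArrayList<>();  // transitions+=Transition
    final List<ActionNode> exitActions = new ArrayList<>();      // exitActions+=Action
}

class StateNodeDemo {
    // Builds by hand what a parser might build for a state with made-up content.
    static StateNode sample() {
        StateNode s = new StateNode();
        s.name = "Playing";                                // '=' assigns a single value
        s.entryActions.add(new ActionNode("startMotor"));  // '+=' appends to a list
        s.transitions.add(new TransitionNode("Stopped"));
        return s;
    }

    public static void main(String[] args) {
        StateNode s = sample();
        System.out.println(s.name + " has " + s.transitions.size() + " transition(s)");
    }
}
```

The point of the sketch is the difference between the two assignment operators: `=` fills a single-valued property, `+=` appends to a list-valued reference.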

Abstract rules
Abstract rules are another type of rule.


Abstract AbstractState :
   State |
   CompositeState;

An abstract rule (preceded by the Abstract keyword) points to an abstract AST type (AbstractState). The body of the rule consists of alternative calls to other rules (State and CompositeState). The AST types of the called rules must be compatible with this rule's abstract AST type (AbstractState).
If the metamodel is generated, xText automatically creates a corresponding type hierarchy. Additionally, xText 'normalizes' the types (i.e. it moves properties contained in all subtypes to the abstract supertype).
The derived metamodel of the state machine example shows how all the general features of State and CompositeState have been moved to their common supertype AbstractState:

Note that the generation of the AST metamodel is optional! You could design it by hand, if you want to.
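
The 'normalization' step described above can be pictured as a set intersection. The following is a minimal sketch, not xText's actual implementation: it compares features by name only (a real implementation would also have to compare their types), and the `subStates` feature of CompositeState is an assumption made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class NormalizeSketch {
    // Returns the features shared by all subtypes and removes them from each subtype,
    // as if they had been moved up to the common abstract supertype.
    static Set<String> pullUpCommonFeatures(Map<String, Set<String>> subtypeFeatures) {
        Set<String> common = null;
        for (Set<String> features : subtypeFeatures.values()) {
            if (common == null) {
                common = new LinkedHashSet<>(features);
            } else {
                common.retainAll(features); // keep only features present in every subtype
            }
        }
        if (common == null) {
            return new LinkedHashSet<>(); // no subtypes, nothing to pull up
        }
        for (Set<String> features : subtypeFeatures.values()) {
            features.removeAll(common);
        }
        return common;
    }

    // Feature names of State follow the grammar above; 'subStates' is hypothetical.
    static Map<String, Set<String>> sample() {
        Map<String, Set<String>> subtypes = new LinkedHashMap<>();
        subtypes.put("State", new LinkedHashSet<>(
                Arrays.asList("name", "entryActions", "transitions", "exitActions")));
        subtypes.put("CompositeState", new LinkedHashSet<>(
                Arrays.asList("name", "entryActions", "transitions", "exitActions", "subStates")));
        return subtypes;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> subtypes = sample();
        System.out.println("AbstractState: " + pullUpCommonFeatures(subtypes));
        System.out.println("remaining: " + subtypes);
    }
}
```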

String rules
String rules can be used to describe more complex strings. There is no string rule in our state machine example, so here is what a rule for a Java-like fully qualified name (e.g. my.namespace.Type) looks like:


String fqn :
   ID ("." ID)*;

For string rules the tokens are simply concatenated. That's all.
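
What "simply concatenated" means can be sketched in a few lines of Java. This is an illustration under assumptions, not the generated Antlr parser: the input is already split into tokens, an ID is assumed to look like a Java identifier, and the sketch insists on consuming the whole token list.

```java
import java.util.Arrays;
import java.util.List;

class FqnRuleSketch {
    // Sketch of  String fqn : ID ("." ID)* ;  over an already tokenized input.
    // Returns the concatenated token texts, or throws if the tokens do not match.
    static String parseFqn(List<String> tokens) {
        if (tokens.isEmpty() || !isId(tokens.get(0))) {
            throw new IllegalArgumentException("expected ID");
        }
        StringBuilder result = new StringBuilder(tokens.get(0));
        int i = 1;
        while (i < tokens.size() && tokens.get(i).equals(".")) {
            if (i + 1 >= tokens.size() || !isId(tokens.get(i + 1))) {
                throw new IllegalArgumentException("expected ID after '.'");
            }
            result.append('.').append(tokens.get(i + 1)); // tokens are simply concatenated
            i += 2;
        }
        if (i != tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + tokens.get(i));
        }
        return result.toString();
    }

    // Assumption: an ID has the shape of a Java identifier.
    static boolean isId(String token) {
        if (token.isEmpty() || !Character.isJavaIdentifierStart(token.charAt(0))) {
            return false;
        }
        for (int k = 1; k < token.length(); k++) {
            if (!Character.isJavaIdentifierPart(token.charAt(k))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(parseFqn(Arrays.asList("my", ".", "namespace", ".", "Type")));
    }
}
```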

Semantic checking

The editor (and the parser) automatically check whether your description is syntactically correct. If you want additional semantic constraints checked (and you should :-)), you just have to write a corresponding Check file.

The screenshot with the CD-Player example shows an error based on this check:

Checks are evaluated by the editor (in the save cycle) and by the generated parser.

What's next

For now the framework is just in CVS (openArchitectureWare on sourceforge). I'm going to get some more experience with it in the next weeks, so I can remove / add / clean up some concepts.
If you have some ideas or thoughts about it, let me know!
Feedback is highly appreciated!

Saturday, April 15, 2006

Model2Model transformation with Xtend

Xtend is a language contained in the new openArchitectureWare 4.0 release. It is normally used to define extensions on your metamodel types.
An extension looks like the following:


attributes(Entity this) :
   features.typeSelect(Attribute);

One can use those extensions like static functions or in a member-style syntax:


myEntity.attributes() == attributes(myEntity)

I've thought about bringing some new features to Xtend so that it becomes a useful transformation language.

The first one is the keyword 'cached'. The concept is borrowed from Arno's wombat language and means the following:
For each set of parameters the expression is evaluated only the first time; every later call returns that same result.

Elements contained in a model are usually referenced multiple times. Consider the following model structure:


   P
  / \
 C1  C2
  \ /
   R

A package P contains two classes C1 and C2. C1 contains a reference R of type C2 (R references C2).
We could write the following extensions in order to transform an Ecore (EMF) model to our metamodel (Package, Class, Reference).


// x is the Ecore source element; p, c and r are the newly created target elements
toPackage(EPackage x) :
  let p = new Package :
      p.classifiers.addAll(x.eClassifiers.toClass()) ->
      p;

toClass(EClass x) :
  let c = new Class :
      c.attributes.addAll(x.eReferences.toReference()) ->
      c;

toReference(EReference x) :
  let r = new Reference :
      r.setType(x.eType.toClass()) ->
      r;

For an ecore model of the structure from above, the result would be:


   P
  / \
 C1  C2
 |
 R - C2

What happened? The C2 class was created twice (once for the package containment and once more as the type of the reference).
We can solve the problem by adding the 'cached' keyword to the second extension:


cached toClass(EClass x) :
  let c = new Class :
      c.attributes.addAll(x.eReferences.toReference()) ->
      c;

The process goes like this:
- start create P
- start create C1
- start create R
- start create C2
- end & cache C2
- end R
- end C1
- start get cached C2
- end P

So this works very well. We will get the intended structure.
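
In Java terms, the 'cached' behaviour amounts to memoization keyed on the argument. The following is a minimal sketch under assumptions, not the Xtend implementation: `Source` and `Target` are hypothetical stand-ins for a source and a target model element, and the cache compares arguments by identity.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class CachedSketch {
    // Hypothetical stand-ins for a source element and its transformed counterpart.
    static class Source {
        final String name;
        Source(String name) { this.name = name; }
    }

    static class Target {
        final String name;
        Target(String name) { this.name = name; }
    }

    private final Map<Source, Target> cache = new IdentityHashMap<>();
    int creations = 0; // counts how many target elements were actually built

    // Uncached: every call builds a new target element.
    Target toClass(Source x) {
        creations++;
        return new Target(x.name);
    }

    // 'cached' behaviour: evaluated once per argument, later calls return the stored result.
    Target cachedToClass(Source x) {
        Target hit = cache.get(x);
        if (hit == null) {
            hit = toClass(x);
            cache.put(x, hit); // the result is stored only after evaluation has finished
        }
        return hit;
    }

    public static void main(String[] args) {
        CachedSketch s = new CachedSketch();
        Source c2 = new Source("C2");
        System.out.println(s.toClass(c2) == s.toClass(c2));             // two different objects
        System.out.println(s.cachedToClass(c2) == s.cachedToClass(c2)); // one shared object
    }
}
```

Note that the result enters the cache only after the evaluation has returned.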
But what about circular dependencies?
For instance, C2 could contain a Reference R2 of type C1 (bidirectional references).

The transformation would proceed like this:
- start create P
- start create C1
- start create R1 (references C2)
- start create C2
- start create R2 (references C1)
- start create C1 ... OOPS!

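C1 is still under construction and will not complete until the stack unwinds. We are stuck in an endless recursion!
The problem is that the cache stores the return value, but C1 has not been returned yet because it is still being constructed. So the nested call finds nothing in the cache and starts creating C1 all over again.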

The solution: create extensions
Today I added so called "create extensions" to Xtend.
The syntax is as follows:


create Package p toPackage(EPackage x) :
   p.classifiers.addAll(x.eClassifiers.toClass());

create Class c toClass(EClass x) :
   c.attributes.addAll(x.eReferences.toReference());

create Reference r toReference(EReference x) :
   r.setType(x.eType.toClass());

This is not only a shorter syntax, it also has the required semantics:
The created model element is added to the cache before the body is evaluated. The return value is always the created element.
I know, this is not functional style, because the contained expression is only useful if it has side effects (i.e. assigning stuff to the new model element). So maybe we should have an imperative style here (object initialization is inherently imperative, isn't it?).
But I didn't want to complicate things by adding another syntax; we already have the chain expression, which evaluates expressions sequentially.
So just think of the arrows '->' as statement terminators (i.e. ';').
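
The effect of registering the element before evaluating the body can be sketched in plain Java. This is an illustration of the semantics described above, not the Xtend implementation; the classes `SrcClass` and `TgtClass` are hypothetical, and references are simplified to direct links between classes (there is no separate Reference element).

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class CreateExtensionSketch {
    // Hypothetical source model: a class with references to other classes.
    static class SrcClass {
        final String name;
        final List<SrcClass> referenced = new ArrayList<>();
        SrcClass(String name) { this.name = name; }
    }

    // Hypothetical target model with the same shape.
    static class TgtClass {
        final String name;
        final List<TgtClass> referenced = new ArrayList<>();
        TgtClass(String name) { this.name = name; }
    }

    private final Map<SrcClass, TgtClass> cache = new IdentityHashMap<>();

    // Mirrors the 'create' semantics: the new element is put into the cache
    // BEFORE its body runs, so a nested call for the same source returns the
    // element under construction instead of creating it again.
    TgtClass toClass(SrcClass x) {
        TgtClass existing = cache.get(x);
        if (existing != null) {
            return existing;
        }
        TgtClass c = new TgtClass(x.name);
        cache.put(x, c);                       // register first ...
        for (SrcClass ref : x.referenced) {    // ... then run the initializing body
            c.referenced.add(toClass(ref));
        }
        return c;                              // the result is always the created element
    }

    // Builds the circular case: C1 references C2 and C2 references C1.
    static TgtClass sample() {
        SrcClass c1 = new SrcClass("C1");
        SrcClass c2 = new SrcClass("C2");
        c1.referenced.add(c2);
        c2.referenced.add(c1);
        return new CreateExtensionSketch().toClass(c1);
    }

    public static void main(String[] args) {
        TgtClass c1 = sample();
        TgtClass c2 = c1.referenced.get(0);
        System.out.println(c2.referenced.get(0) == c1); // the cycle is closed on the same object
    }
}
```

The only difference to a plain memoizing cache is the position of the `cache.put` call: it happens before the body is evaluated rather than after it.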

The workflow configuration of the Xtend component would look like this:


<component class="oaw.xtend.XtendComponent">
   <metaModel class="oaw.type.emf.EmfMetaModel">
      <metaModelFile value="mymetamodel.ecore"/>
   </metaModel>
   <metaModel class="oaw.type.emf.EmfMetaModel">
      <metaModelPackage value="org.eclipse.emf.ecore.EcorePackage"/>
   </metaModel>
   <invoke value="oaw::tdslg::Tdsl2Ecore.toEPackage(tdslFile)"/>
   <outputSlot value="ecoreModel"/>
</component>

Note that this stuff is not contained in oAW 4.0, but will be available in oAW 4.1.

Friday, April 07, 2006

Inferred Types (context-sensitive)

In statically typed languages, the syntax is often bloated by type information.
In Java, for example, one writes:


public String sayHello(String name) {
   String hello = "Hello ";
   return hello+name;
}

C# has recently (as of 3.0) gained so-called inferred types. With them, the example above looks like this:


public String sayHello(String name) {
   var hello = "Hello ";
   return hello+name;
}

Doesn't help that much, does it?
(By the way, C# 3.0 does really cool type inference in other areas (lambda expressions, anonymous types), but more on that another time.)
The following would actually be enough for the compiler to obtain the same information:


public sayHello(String name) {
   hello = "Hello ";
   return hello+name;
}

The type information could be derived from the expressions:
- A string literal ("Hello ") is of type String
- A string concatenation (hello+name) is of type String

The openArchitectureWare 4 languages (Xpand, Xtend, Check) are statically typed as well.
While developing these languages, however, I tried to find a syntax that is as terse as possible.
Written as an extension (these are essentially static functions), the example above would look like this:


sayHello(String name) : 
   "Hello "+name;

The return type is derived from the expression. Nothing unusual so far.
Xtend goes one step further, though: the type is derived context-sensitively.
Example:


singletonList(Object o) : 
   {o};

A list containing the passed object is created and returned.
Normally the type here would be List[Object] (in Java syntax: List<Object>).
In fact, however, it depends on the actual parameter type:


doStuff() : 
   singletonList("Test"); // return type is List[String]

doStuff1() : 
   singletonList(4711); // return type is List[Integer]

doStuff2() : 
   singletonList(true); // return type is List[Boolean]

This is really very useful. Another example:


slice(List l, Integer start,Integer end) :
   l.select(e|l.indexOf(e)>=start && l.indexOf(e)<=end);

With the extension above I can get sublists without losing the static type information:


 {1,2,3}.slice(1,3) // static return type is List[Integer]
 {"1","2","3"}.slice(1,3) // static return type is List[String]

Incidentally, the problem could also be solved with generics.
In the oAW expressions, however, only collections (which are first-class concepts of the language) can be parameterized with types.
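
For comparison, this is roughly what the generics-based solution looks like in Java. It is a sketch, not a translation of the oAW type checker: the type parameter T has to be declared explicitly, and `slice` works on index positions directly, which matches the select expression above only for lists without duplicate elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GenericsSketch {
    // The element type T is taken from the argument, so the result type follows the call site.
    static <T> List<T> singletonList(T o) {
        List<T> result = new ArrayList<>();
        result.add(o);
        return result;
    }

    // Keeps the elements whose index lies in [start, end], both bounds inclusive;
    // positions outside the list are ignored.
    static <T> List<T> slice(List<T> l, int start, int end) {
        List<T> result = new ArrayList<>();
        for (int i = Math.max(start, 0); i <= end && i < l.size(); i++) {
            result.add(l.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> s = singletonList("Test");                 // statically List<String>
        List<Integer> n = slice(Arrays.asList(1, 2, 3), 1, 3);  // statically List<Integer>
        System.out.println(s + " " + n);
    }
}
```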

Wednesday, April 05, 2006

Domain models

The domain model is the heart of many software systems.
The common core abstractions are:
- Entity
- Data type
- Enumeration
- Attribute
- Reference

Oddly enough, domain models are very rarely actually described with these means. In EJB 3.0, for example, they are described in Java with classes, fields, methods and all the other concepts:

@Entity
@Name("user")
@Table(name="users")
public class User implements Serializable {
  private static final long serialVersionUID = 1881413500711441951L;

  private String username;
  private String password;
  private String name;

  public User(String name, String password, String username) {
     this.name = name;
     this.password = password;
     this.username = username;
  }

  public User() {}

  public String getPassword() {
     return password;
  }

  public void setPassword(String password) {
     this.password = password;
  }

  @NotNull
  public String getName() {
     return name;
  }

  public void setName(String name) {
     this.name = name;
  }

  public String getUsername() {
     return username;
  }

  public void setUsername(String username) {
     this.username = username;
  }

}

In Ruby on Rails, domain models are defined with a mixture of DDL (from the SQL standard) and Ruby:


CREATE TABLE `users` (
  `username` varchar(20),
  `password` varchar(20),
  `name` varchar(20) NOT NULL
);

class User < ActiveRecord::Base

end

That is already a bit more compact, but unfortunately it is two artifacts.
In addition, relationships between two domain objects, for example, have to be described in the Ruby class, while the foreign key constraint must additionally be stored in the database (via DDL). A violation of the DRY principle.

But why insist on reusing some existing language? We can define our own. How about this, for example:

entity User {
  String username;
  String password;
  notNull String name;
}

From this, the EJB 3.0 entity above can be derived, for example. Even better: I generate a Java class that contains all kinds of "features" that are interesting for (my) domain objects.
To name just a few (getters and setters go without saying):
- managed references
- observer pattern
- containment references
- tree walker / visitor
- a special reflection layer (one that knows the concepts Entity, Attribute, Reference, ...)

Good idea? This way -> http://www.openarchitectureware.org

What makes a good framework?

The right abstraction
To me a framework is only usable if its abstractions go "in my direction". Hibernate, for example, is of course only an option for me if I want to store object data in a relational database.

The right level of abstraction
If frameworks abstract too much, they are not flexible enough (the CASE problem). If they abstract too little, I still have too much to do, because I have to abstract the rest myself.

Don't Repeat Yourself
I "feed" a framework with information, so to speak, by using its configuration languages and APIs, in other words by using its programming model.
Almost all frameworks have problems here. It is often rather difficult to create a programming model that does not force me to repeat information, at least when the API is defined with an existing language (e.g. Java or Ruby). Dynamically typed languages (e.g. Python, Smalltalk, Ruby, Lisp) clearly have considerable advantages here over statically typed languages (C#, Java). But even the much-acclaimed Ruby on Rails does not define the domain model nearly as compactly as I would like.

Configuration/Programming by Exception
The idea here is to assume the most common cases as the default, so that an explicit configuration has to be given only in exceptional cases (exceptions).
Java, for example, is a bad role model here:
The access modifiers (public, private, protected, default) are defined for class members in such a way that the modifiers used most often (public, private) always have to be written out. The default (no keyword), on the other hand, is needed very rarely.

Configuration/Programming by Convention
'Programming by Convention' means using conventions instead of describing information explicitly.
For example, Rails has the convention that the name of the domain object (DO) is the singular of the name of the database table.
So when programming the DO, there is no need to state explicitly which database table it is mapped to. This information can be derived from the name. In exceptional cases (Programming by Exception) the name can of course be given explicitly.
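
The interplay of convention and exception can be sketched in a few lines of Java. This is a hypothetical helper, not Rails code: the class `TableNameConvention` and its methods are made up for illustration, and the plural is formed naively by appending "s" rather than by Rails' real inflection rules.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class TableNameConvention {
    // Explicit mappings, needed only for the exceptional cases.
    private final Map<String, String> overrides = new HashMap<>();

    // Programming by Exception: state the table name explicitly where the convention does not fit.
    void override(String domainObject, String table) {
        overrides.put(domainObject, table);
    }

    // Programming by Convention: the table name is derived from the domain object's name
    // (lower-cased name plus "s", a deliberately naive plural).
    String tableFor(String domainObject) {
        String explicit = overrides.get(domainObject);
        if (explicit != null) {
            return explicit;
        }
        return domainObject.toLowerCase(Locale.ROOT) + "s";
    }

    public static void main(String[] args) {
        TableNameConvention convention = new TableNameConvention();
        convention.override("Person", "people");
        System.out.println(convention.tableFor("User"));
        System.out.println(convention.tableFor("Person"));
    }
}
```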

Besides the abstraction and the programming model, other things such as clear, orthogonal concepts and high test coverage are of course important as well.
