Thursday, January 08, 2009

Xtext Scopes and EMF Index

There is a new proposal for a so-called EMF Index. At ESE I got the impression that a lot of people are looking for such a thing or have already built their own. To make clear what we expect from such a project, I'll try to explain why and how TMF Xtext needs such an "Index".

The main difference between Xtext and Oslo's MGrammar or other parser generators is that Xtext provides abstractions (mostly DSLs) not only for describing the syntax of a language but also for implementing other aspects. One of them is linking. Where other frameworks create a tree, Xtext also takes care of the cross-links and hence creates a graph (a.k.a. a model).

How does this work?

Let me explain this by example.
Assume you want to parse the following model:
entity Animal
entity Dog extends Animal
That is, two declarations of something we call 'entity', one 'extending' the other. The extends declaration 'extends Animal' cross-links to the actual declaration 'entity Animal',
so that we're able to write something like this when working on the parsed model later:
myDog.getExtends().getName().equals("Animal")

What do we need to do to get this working?

First of all, one has to specify the syntax of the language, including the syntax for the cross-link. With Xtext one not only specifies the syntax but also writes down how a model is created during parsing:

MyModel : (entities+=Entity)*;
Entity : 'entity' name=ID ('extends' extends=[Entity|ID])?;
This would result in an ecore model of the following structure:

EPackage {
    EClass MyModel {
        containment entities : Entity[]
    }
    EClass Entity {
        name : EString
        extends : Entity // the cross-link
    }
}
Naturally a parser is only able to create a tree, so parsing an instance of the DSL defined above would result in an unlinked model, which has to be linked in a second phase using the provided ID (which was 'Animal' in the introductory example).
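
To make the two phases concrete, here is a minimal sketch in plain Java. It is not the Xtext linker: the Entity class is a hand-written stand-in for the generated EClass, and parse() and link() are hypothetical helpers that only understand the two-keyword toy language from above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "parse a tree first, link it afterwards".
class LinkingSketch {

    // Hand-written stand-in for the generated Entity EClass.
    static final class Entity {
        final String name;
        final String extendsName; // raw ID text kept by the parser, may be null
        Entity extendsRef;        // the resolved cross-link, set in phase two

        Entity(String name, String extendsName) {
            this.name = name;
            this.extendsName = extendsName;
        }
    }

    // Phase one: build the unlinked tree from lines like "entity Dog extends Animal".
    static List<Entity> parse(String text) {
        List<Entity> entities = new ArrayList<>();
        for (String line : text.split("\n")) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length >= 2 && tokens[0].equals("entity")) {
                boolean hasSuper = tokens.length == 4 && tokens[2].equals("extends");
                entities.add(new Entity(tokens[1], hasSuper ? tokens[3] : null));
            }
        }
        return entities;
    }

    // Phase two: resolve each stored ID against the named elements of the file.
    static void link(List<Entity> entities) {
        Map<String, Entity> byName = new HashMap<>();
        for (Entity e : entities) {
            byName.put(e.name, e);
        }
        for (Entity e : entities) {
            if (e.extendsName != null) {
                e.extendsRef = byName.get(e.extendsName); // stays null if unresolved
            }
        }
    }

    public static void main(String[] args) {
        List<Entity> model = parse("entity Animal\nentity Dog extends Animal");
        link(model);
        System.out.println(model.get(1).extendsRef.name.equals("Animal")); // prints true
    }
}
```

Note that the lookup map in link() is exactly the part that scoping replaces: instead of one flat map per file, the linker asks a scope.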

So how do I find an Entity which is 'identifiable' by the text 'Animal'?
By default Xtext assumes that the name of an EObject (if its EClass has such an EAttribute) is the identifier. All the named elements within the same file are visible (as long as they have a unique name). We also have a very simple import mechanism:
If you have an EObject containing a string in an EAttribute called 'importURI', Xtext automatically creates an outer scope containing the contents of the referenced EMF Resource. "Outer scope", what's that?

Scoping
In Xtext, scopes (IScope) are nested. Each scope makes EObjects visible by an identifier (String).
Assume we have added the import feature described above:

MyModel :
(imports+=Import)*
(entities+=Entity)*;

Import :
'import' importURI=STRING;

Entity :
'entity' name=ID ('extends' extends=[Entity|ID])?;
... we would be able to have two files:

myModel1.dsl

entity Animal

and otherModel.dsl
import "myModel1.dsl"
entity Dog extends Animal
The scope used to do the linking in the declaration of entity 'Dog' would have an outer scope containing the definitions from the imported file ('->' means outer):
scope(elements from otherModel.dsl) -> scope(elements from myModel1.dsl)

If we added further import statements, we would get additional outer scopes in the order of declaration:

import "myModel1.dsl"
import "myModel2.dsl"
import "myModel3.dsl"
entity Dog extends Animal
results in

scope(elements from local resource) ->
scope(elements from myModel1.dsl) ->
scope(elements from myModel2.dsl) ->
scope(elements from myModel3.dsl)
So the linker would ask the innermost scope for an element called 'Animal'. If it contains such an element, it returns it; if not, it asks its outer scope.
This means that an inner scope overlays elements from the outer scope. So it would be OK to have a declaration of 'Animal' in the local file, but the one imported from 'myModel1.dsl' wouldn't be referenceable anymore.

import "myModel1.dsl"
entity Dog extends Animal
entity Animal // this one overlays the definition imported from myModel1.dsl
If you don't want to allow overwriting things, you'll have to add constraints, which is of course possible but is a different topic.
Ok, I hope you have an idea of how linking in TMF Xtext basically works.
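
The lookup rule described above fits into a few lines of Java. This is only a sketch: SimpleScope is a made-up stand-in and does not mirror the actual IScope interface, and the elements are plain strings instead of EObjects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of nested scopes; not the real IScope API.
class ScopeSketch {

    static final class SimpleScope {
        private final Map<String, Object> elements = new LinkedHashMap<>();
        private final SimpleScope outer; // null for the outermost scope

        SimpleScope(SimpleScope outer) {
            this.outer = outer;
        }

        SimpleScope add(String name, Object element) {
            elements.put(name, element);
            return this;
        }

        // Ask this scope first; only delegate outwards if nothing is found here.
        Object lookup(String name) {
            if (elements.containsKey(name)) {
                return elements.get(name);
            }
            return outer == null ? null : outer.lookup(name);
        }
    }

    public static void main(String[] args) {
        // scope(elements from otherModel.dsl) -> scope(elements from myModel1.dsl)
        SimpleScope imported = new SimpleScope(null).add("Animal", "Animal from myModel1.dsl");
        SimpleScope local = new SimpleScope(imported).add("Dog", "Dog from otherModel.dsl");

        System.out.println(local.lookup("Animal")); // prints Animal from myModel1.dsl

        // A local declaration overlays the imported one.
        local.add("Animal", "Animal from otherModel.dsl");
        System.out.println(local.lookup("Animal")); // prints Animal from otherModel.dsl
    }
}
```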

Although the described default semantics might be sufficient in many cases, sometimes scoping and linking are a bit more sophisticated. We don't need (and currently don't have) something like an Index for this, but it might speed things up if one didn't have to load referenced resources while linking and could instead just ask something like an Index what's in a resource. The Index could provide a normalized EMF URI, which can then be set on a proxy.
Also there are IDE things like "Find Model Element" or code completion for available resources, which would be easy to implement on top of an EMF Index.

Advanced Scoping and Linking
Anyway, if you want to have something more file-system independent like Java's class path, where one imports namespaces instead of actual URIs, you would need some kind of repository (similar to the class path) containing all referenceable elements. This is because it would be far too expensive to "scan the world" each time you want to satisfy a link.

In fact I think that leveraging the Java class path is a very good idea, since it is well understood by Xtext users and is well supported in the development phase (Eclipse JDT, or even the OSGi support from PDE) and at runtime. That's why Xtext has a URIConverter introducing a class path scheme for EMF resources. So what we want to do most of the time is to scan the class path for EMF resources and index them.
We would need to index them per container (jar, class-folder, etc.), because the class path is also scoped hierarchically.
Such a hierarchy could look like this:
classpathScope{stuff from bin/} ->
classpathScope{stuff from foo.jar/} ->
... ->
classpathScope{stuff from JRE System Library}
And of course, we would like to have these global scopes, backed by the EMF Index, transparently integrated into our scoping hierarchy. This turns out to be very natural if we look at a final example, showing how we would implement the scoping for Java:
// file contents scope
import static my.Constants.STATIC;

public class ScopeExample { // class body scope
    private Object field = null;

    private void method(String param) { // method body scope
        String localVar = null;
        innerBlock: { // block scope
            String innerScopeVar = null;
            Object field = null;
            // ?SCOPE?
        }
    }
}

The object scope created in the inner block (// ?SCOPE?) would look like this:
blockScope{field,innerScopeVar}->
methodScope{localVar,param}->
classScope{field}->
fileScope{STATIC}-> //the static import
classpathScope{static fields from bin/} -> // (e.g. my.Constants.STATIC)
classpathScope{static fields from foo.jar/} ->
... ->
classpathScope{static fields from JRE System Library}
For performance reasons it would be useful to have some kind of database (EMF Index) backing the class path scopes, especially during development (modeling), because changed models have to be re-indexed.
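
As a rough illustration of why indexing per container pays off, here is a sketch of a class path lookup backed by such a database. Everything in it is an assumption made for illustration: the class ClasspathIndexSketch, its methods, and the URI-like strings are invented and not part of any existing EMF Index.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one index partition per class path container.
class ClasspathIndexSketch {

    // container (e.g. "bin/", "foo.jar") -> element name -> URI-like string
    private final Map<String, Map<String, String>> entriesByContainer = new LinkedHashMap<>();

    // Replaces the entries of a single container; the others stay untouched.
    void reindex(String container, Map<String, String> entries) {
        entriesByContainer.put(container, new LinkedHashMap<>(entries));
    }

    // Walks the containers in class path order; the first hit wins,
    // which matches the classpathScope -> classpathScope chain.
    String resolve(List<String> classpath, String name) {
        for (String container : classpath) {
            Map<String, String> entries = entriesByContainer.get(container);
            if (entries != null && entries.containsKey(name)) {
                return entries.get(name);
            }
        }
        return null; // not visible anywhere on this class path
    }

    public static void main(String[] args) {
        ClasspathIndexSketch index = new ClasspathIndexSketch();
        index.reindex("bin/", Map.of("Dog", "bin/otherModel.dsl#Dog"));
        index.reindex("foo.jar", Map.of("Animal", "foo.jar/myModel1.dsl#Animal"));
        List<String> classpath = List.of("bin/", "foo.jar");

        System.out.println(index.resolve(classpath, "Animal")); // prints foo.jar/myModel1.dsl#Animal

        // A model in bin/ was edited: only that container is re-indexed.
        index.reindex("bin/", Map.of("Animal", "bin/otherModel.dsl#Animal"));
        System.out.println(index.resolve(classpath, "Animal")); // prints bin/otherModel.dsl#Animal
    }
}
```

The point of the partitioning is that editing a model in bin/ never forces a rescan of the jars or the JRE.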

EMF Index
So mainly we want to have something which tells us what elements are available in a given 'world'. Such a 'world', like a Java class path, includes EObjects (from several EMF resources). It should be possible to define and configure arbitrary implementations of 'worlds' (databases, web, workspace, etc.). Elements contained in a world need to be selectable using an identifier (unique within a world). It should also be possible to add arbitrary additional information to such an entry.
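
To give these requirements a shape, one could imagine something like the following. This is purely a sketch of the wish list and not a proposed API: World, Entry, InMemoryWorld and the userData map are all hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the requirements; not an existing EMF Index API.
class WorldSketch {

    // One indexed element: identifier, where to find it, and free-form extra data.
    static final class Entry {
        final String id;
        final String uri;
        final Map<String, String> userData;

        Entry(String id, String uri, Map<String, String> userData) {
            this.id = id;
            this.uri = uri;
            this.userData = Collections.unmodifiableMap(new HashMap<>(userData));
        }
    }

    // A 'world' could be backed by a database, the web, the workspace, ...
    interface World {
        Optional<Entry> find(String id);
    }

    // Simplest possible implementation, keeping everything in memory.
    static final class InMemoryWorld implements World {
        private final Map<String, Entry> entries = new HashMap<>();

        // Identifiers must be unique within a world.
        void add(Entry entry) {
            if (entries.putIfAbsent(entry.id, entry) != null) {
                throw new IllegalArgumentException("Duplicate identifier: " + entry.id);
            }
        }

        @Override
        public Optional<Entry> find(String id) {
            return Optional.ofNullable(entries.get(id));
        }
    }

    public static void main(String[] args) {
        InMemoryWorld world = new InMemoryWorld();
        world.add(new Entry("Animal", "myModel1.dsl#Animal", Map.of("type", "Entity")));
        System.out.println(world.find("Animal").map(e -> e.uri).orElse("not found"));
        System.out.println(world.find("Cat").isPresent()); // prints false
    }
}
```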

As mentioned, IMHO such an Index is important to track changes during development (i.e. modeling). We also want to have code completion for globally available elements, look up model elements by name, etc. At runtime we need to load all the models anyway, so an index is less important there.

This has been a lengthy post (sorry). But if you made it to this point, it would be very helpful to hear what you think about this. Would the scope abstraction work for the languages you have in mind? What do you expect from an EMF Index? Answers to the latter question are probably better sent to the EMFT newsgroup :-)