Thursday, January 08, 2009

Xtext Scopes and EMF Index

There is a new proposal for a so-called EMF Index. At ESE I got the impression that a lot of people are looking for such a thing or have already built their own. To make clear what we expect from such a project, I'll try to explain why and how TMF Xtext needs such an "Index".

The main difference between Xtext and Oslo's MGrammar or other parser generators is that Xtext provides abstractions (mostly DSLs) not only for describing the syntax of a language, but also for implementing other aspects. One of them is linking. So where other frameworks create a tree, Xtext also takes care of the cross-links and hence creates a graph (a.k.a. a model).

How does this work?

Let me explain this by example.
Assume you want to parse the following model:
entity Animal
entity Dog extends Animal
That is, two declarations of something we call 'entity', one 'extending' the other. The extends declaration 'extends Animal' cross-links to the actual declaration 'entity Animal',
so that we're able to write something like this when working on the parsed model later:
myDog.getExtends().getName().equals("Animal")

What do we need to do to get this working?

First of all, one has to specify the syntax of the language including the syntax for the cross link. With Xtext one not only specifies the syntax but also writes down how a model is created during parsing:

MyModel : (entities+=Entity)*;
Entity : 'entity' name=ID ('extends' extends=[Entity|ID])?;
This would result in an ecore model of the following structure:

EPackage {
  EClass MyModel {
    containment entities : Entity[]
  }
  EClass Entity {
    name : EString
    extends : Entity // the crosslink
  }
}
Naturally a parser is only able to create a tree, so parsing an instance of the DSL defined above would result in an unlinked model, which has to be linked in a second phase using the provided ID (which was 'Animal' in the introductory example).
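
The two phases can be sketched in plain Java. This is only an illustration and not Xtext's actual linker: the class and member names (Entity, unresolvedExtends, resolvedExtends, link) are hypothetical, and the parsing phase is stubbed out by building the two entities by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Entity EClass shown above.
class Entity {
    final String name;
    final String unresolvedExtends; // text of the cross-reference as parsed, or null
    Entity resolvedExtends;         // filled in by the linking phase

    Entity(String name, String unresolvedExtends) {
        this.name = name;
        this.unresolvedExtends = unresolvedExtends;
    }
}

class TwoPhaseLinking {
    // Phase 1 (stubbed): what a parser could produce for the two-line example,
    // i.e. a tree whose cross-reference is still just the text "Animal".
    static List<Entity> parsedExample() {
        List<Entity> entities = new ArrayList<>();
        entities.add(new Entity("Animal", null));
        entities.add(new Entity("Dog", "Animal"));
        return entities;
    }

    // Phase 2: replace each textual reference by the Entity carrying that name.
    static void link(List<Entity> entities) {
        Map<String, Entity> byName = new HashMap<>();
        for (Entity e : entities) {
            byName.put(e.name, e);
        }
        for (Entity e : entities) {
            if (e.unresolvedExtends != null) {
                // Stays null if no entity with that name exists.
                e.resolvedExtends = byName.get(e.unresolvedExtends);
            }
        }
    }

    public static void main(String[] args) {
        List<Entity> entities = parsedExample();
        link(entities);
        Entity myDog = entities.get(1);
        System.out.println(myDog.resolvedExtends.name.equals("Animal"));
    }
}
```

The sketch uses a single flat name map, which corresponds to the simplest case where all named elements of one file are visible.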

So how do I find an Entity which is 'identifiable' by the text 'Animal'?
By default Xtext assumes that the name of an EObject (if its EClass has such an EAttribute) is the identifier. All the named elements within the same file are visible (as long as they have a unique name). We also have a very simple import mechanism:
If you have an EObject containing a string in an EAttribute called 'importURI', Xtext automatically creates an outer scope containing the content of the referred EMF Resource. "Outer scope"? What's that?

Scoping
In Xtext scopes (IScope) are nested. Each scope makes EObjects visible by an identifier (String).
Assume we have added the import feature described above:

MyModel :
(imports+=Import)*
(entities+=Entity)*;

Import :
'import' importURI=STRING;

Entity :
'entity' name=ID ('extends' extends=[Entity|ID])?;
... we would be able to have two files:

myModel1.dsl

entity Animal

and otherModel.dsl
import "myModel1.dsl"
entity Dog extends Animal
The scope used to do the linking in the declaration of entity 'Dog' would have an outer scope containing the definitions from the imported file ('->' means outer):
scope(elements from otherModel.dsl) -> scope(elements from myModel1.dsl)

If we added further import statements, we would get additional outer scopes in the order of declaration:

import "myModel1.dsl"
import "myModel2.dsl"
import "myModel3.dsl"
entity Dog extends Animal
results in

scope(elements from local resource) ->
scope(elements from myModel1.dsl) ->
scope(elements from myModel2.dsl) ->
scope(elements from myModel3.dsl)
So the linker would ask the innermost scope for an element called 'Animal'. If it contains such an element, it returns it; if not, it asks its outer scope.
This means that an inner scope overlays elements from the outer scope. So it would be ok to have a declaration of 'Animal' in the local file, but the one imported from 'myModel1.dsl' wouldn't be referenceable anymore.

import "myModel1.dsl"
entity Dog extends Animal
entity Animal // this one overlays the definition imported from myModel1.dsl
If you don't want to allow overwriting things, you'll have to add constraints, which is of course possible but is a different topic.
Ok, I hope you have an idea of how linking in TMF Xtext basically works.
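
The lookup through nested scopes described above can be sketched in a few lines of Java. SimpleScope is a hypothetical, heavily simplified class made up for this illustration; Xtext's real IScope interface looks different, and elements are represented as plain strings here instead of EObjects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified scope; not Xtext's IScope.
class SimpleScope {
    private final Map<String, Object> elements = new LinkedHashMap<>();
    private final SimpleScope outer; // null for the outermost scope

    SimpleScope(SimpleScope outer) {
        this.outer = outer;
    }

    SimpleScope add(String name, Object element) {
        elements.put(name, element);
        return this;
    }

    // Ask this scope first; fall back to the outer scope only on a miss.
    Object lookup(String name) {
        if (elements.containsKey(name)) {
            return elements.get(name);
        }
        return outer == null ? null : outer.lookup(name);
    }
}

class ScopeDemo {
    public static void main(String[] args) {
        SimpleScope imported = new SimpleScope(null)
                .add("Animal", "Animal from myModel1.dsl");
        SimpleScope local = new SimpleScope(imported)
                .add("Dog", "Dog from otherModel.dsl")
                .add("Animal", "Animal from otherModel.dsl");

        // The local declaration overlays the imported one.
        System.out.println(local.lookup("Animal"));
        // Asking the outer scope directly still finds the imported element.
        System.out.println(imported.lookup("Animal"));
        // Not declared in any scope of the chain.
        System.out.println(local.lookup("Cat"));
    }
}
```

Additional imports would simply become further SimpleScope instances chained via the outer reference, in the order of declaration.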

Although the described default semantics might be sufficient in many cases, scoping and linking sometimes need to be a bit more sophisticated. We don't strictly need something like an Index (and currently don't have one), but it might speed things up if one didn't have to load referenced resources while linking and could instead just ask something like an Index what's in a resource. The Index could provide a normalized EMF URI, which can then be set into a proxy.
Also there are IDE things like "Find Model Element" or code completion for available resources, which would be easy to implement on top of an EMF Index.

Advanced Scoping and Linking
Anyway, if you want to have something more file-system independent like Java's class path, where one imports name spaces instead of actual URIs, you would need some kind of repository (similar to the class path) containing all referenceable elements. This is because it would be far too expensive to "scan the world" each time you want to satisfy a link.

In fact I think that leveraging the Java class path is a very good idea, since it is well understood by Xtext users and is well supported in the development phase (Eclipse JDT, or even the OSGi support from PDE) and at runtime. That's why Xtext has a URIConverter introducing a class path scheme for EMF resources. So what we want to do most of the time is to scan the class path for EMF resources and index them.
We would need to index them per container (jar, class-folder, etc.), because the class path is also scoped hierarchically.
Such a hierarchy could look like so:
classpathScope{stuff from bin/} ->
classpathScope{stuff from foo.jar/} ->
... ->
classpathScope{stuff from JRE System Library}
And of course, we would like to have these global scopes, backed by the EMF Index, transparently integrated into our scoping hierarchy. This turns out to be very natural if we look at a final example, showing how we would implement the scoping for Java:
// file contents scope
import static my.Constants.STATIC;

public class ScopeExample { // class body scope
  private Object field = null;

  private void method(String param) { // method body scope
    String localVar = null;
    innerBlock: { // block scope
      String innerScopeVar = null;
      Object field = null;
      // ?SCOPE?
    }
  }
}

The object scope created in the inner block (//?SCOPE?) would look like so:
blockScope{field,innerScopeVar} ->
methodScope{localVar,param} ->
classScope{field} ->
fileScope{STATIC} -> // the static import
classpathScope{static fields from bin/} -> // (e.g. my.Constants.STATIC)
classpathScope{static fields from foo.jar/} ->
... ->
classpathScope{static fields from JRE System Library}
For performance reasons it would be useful to have some kind of database (EMF Index) backing the class path scopes. This matters especially during development (modeling), because changed models would have to be re-indexed.

EMF Index
So mainly we want to have something which tells us what elements are available in a given 'world'. Such a 'world', like a Java class path, includes EObjects (from several EMF resources). It should be possible to define and configure arbitrary implementations of 'worlds' (databases, web, workspace, etc.). Elements contained in a world need to be selectable using an identifier (unique within a world). It should also be possible to add arbitrary additional information to such an entry.
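
To make these expectations a bit more concrete, here is a rough Java sketch. It is not an API proposal for the EMF Index: World, IndexEntry, InMemoryWorld and all their members are hypothetical names invented for this illustration, and the element URI is a made-up string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical index entry: an identifier, the URI of the element,
// plus arbitrary additional information as key/value pairs.
class IndexEntry {
    final String identifier;
    final String elementUri;
    final Map<String, String> userData = new HashMap<>();

    IndexEntry(String identifier, String elementUri) {
        this.identifier = identifier;
        this.elementUri = elementUri;
    }
}

// Hypothetical abstraction over a 'world'. Implementations could be
// backed by a workspace, a database, the web, and so on.
interface World {
    Optional<IndexEntry> find(String identifier);
}

class InMemoryWorld implements World {
    private final Map<String, IndexEntry> entries = new HashMap<>();

    // Identifiers must be unique within a world, so duplicates are rejected.
    void add(IndexEntry entry) {
        if (entries.putIfAbsent(entry.identifier, entry) != null) {
            throw new IllegalArgumentException("Duplicate identifier: " + entry.identifier);
        }
    }

    @Override
    public Optional<IndexEntry> find(String identifier) {
        return Optional.ofNullable(entries.get(identifier));
    }
}

class WorldDemo {
    public static void main(String[] args) {
        InMemoryWorld world = new InMemoryWorld();
        IndexEntry animal = new IndexEntry("Animal", "myModel1.dsl#//@entities.0");
        animal.userData.put("type", "Entity");
        world.add(animal);

        // Look the element up by identifier without loading any resource.
        System.out.println(world.find("Animal").map(e -> e.elementUri).orElse("not found"));
    }
}
```

A scope backed by such a world would only need the entry's URI to create a proxy, so the referenced resource would not have to be loaded while linking.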

As mentioned, IMHO such an Index is important to track changes during development (i.e. modeling). We also want to have code completion for globally available elements, look model elements up by name, etc. At runtime we need to load all the models anyway, so the need for an index is not that important.

This has been a lengthy post (sorry). But if you made it to this point, it would be very helpful to hear what you think about this. Would the scope abstraction work for the languages you have in mind? What do you expect from an EMF Index? Maybe answers to the latter question better go to the EMFT news group :-)

15 comments:

  1. It seems to me that the core problem here is that a model element, e.g. Animal, is being represented simply as a sequence of characters A, N, I etc. While that allows us to use any text editor, who really uses notepad for things like this? If you later change the spelling of Animal in the place where it is defined, that leaves the old spelling of Animal in all the places where it is referenced. The various files need to be updated, often making one small semantic change cause many syntactic changes in many files.

    The root problem is that we are referring to the element by something that might later change: the name "Animal" is handy, but not permanent. Database people solved that problem years ago: a table relating Departments to Personnel doesn't refer to a person's name, but some unique ID. Often such IDs are just internally generated unique numbers, never shown to the user. You choose the person by selecting from a list (possibly found or filtered by typing a name), but what gets stored is the unique internal ID.

    This is also the approach used in modeling tools like MetaEdit+. When you refer to another object in a model, the link is a direct "pointer" to that object. If the object's name changes later, the references show the new name - with no work needed by the modeller or the tool.

    Of course sometimes we deliberately want an indirection: we want to point to something called "US President", and we want the referrer to look up something called "US President" on the fly, possibly giving a different result than before. If the object for Bush changes to "Ex-President", we want instead to pick up the object for Obama, which will now have been renamed to "US President". Cases like this are handled simply by using the string "US President" as the reference, rather than a direct link to a particular object.

    Historically only the graphical tools have been able to do this, but nowadays textual representations in things like Intentional's Domain Workbench and JetBrains' MPS seem to be offering them. Wouldn't it be better for Xtext to move to something like that? Autocomplete needs to find the real objects anyway, so storing the real object as well as its current name shouldn't be a problem. Of course the result isn't a simple text file anymore, but providing it looks like text on screen, and is editable like text, does that matter?

    Or to put it another way: wouldn't the benefits of direct, automatically maintained references outweigh the loss of the pure text format?

  2. > It seems to me that the core problem here is that a model element, e.g. Animal, is
    > being represented simply as a sequence of characters A, N, I etc. While that
    > allows us to use any text editor, who really uses notepad for things like this?

    I know a lot of people using emacs, vi or other cool text editors for their daily programming.

    > If you later change the spelling of Animal in the place where it is defined, that
    > leaves the old spelling of Animal in all the places where it is referenced. The
    > various files need to be updated, often making one small semantic change cause
    > many syntactic changes in many files.

    Actually it's just when you change the (qualified) name at least in most cases.
    Which is not a huge problem to solve with common text (rename refactoring).

    > The root problem is that we are referring to the element by something that might
    > later change: the name "Animal" is handy, but not permanent. Database people
    > solved that problem years ago: a table relating Departments to Personnel doesn't
    > refer to a person's name, but some unique ID. Often such IDs are just internally
    > generated unique numbers, never shown to the user. You choose the person by
    > selecting from a list (possibly found or filtered by typing a name), but what gets
    > stored is the unique internal ID.

    I'm of course aware of this style of storing models. It's just a completely different approach with lots of advantages but also disadvantages over how the programming language people usually do it.

    To cut it short: I haven't seen better language IDEs than JetBrains or Eclipse JDT, and since both are text-based (they ultimately store the program in *.java files), I think it's a decent approach. In contrast, tools storing the programs in an object-based manner (I call it object-based to abstract over XML, RDBMS, or other stuff) have always felt very mouse-based and somehow closed.

    I'd like to store my models in a textual file without any synthetic information (like UUIDs).

    > This is also the approach used in modeling tools like MetaEdit+. When you refer
    > to another object in a model, the link is a direct "pointer" to that object. If the
    > object's name changes later, the references show the new name - with no work
    > needed by the modeller or the tool.

    Yes, you do so for the graphical models which is very common for graphical modeling tools.
    But how do you store the code generator templates?

    > Historically only the graphical tools have been able to do this, but nowadays
    > textual representations in things like Intentional's Domain Workbench and
    > JetBrains' MPS seem to be offering them. Wouldn't it be better for Xtext to move to
    > something like that?

    For sure it can be done (even with Xtext, since it creates a parser and also a serializer from the grammar).
    But ultimately we're going the text based approach. Which I like much more.
    I want an Eclipse JDT for all of my languages.

    > Autocomplete needs to find the real objects anyway, so storing the real object as
    > well as its current name shouldn't be a problem. Of course the result isn't a simple
    > text file anymore, but providing it looks like text on screen, and is editable like
    > text, does that matter?

    Autocompletion as well as most other services in Xtext already work on the object level. So there wouldn't be a difference whether I use a text-based or object-based storage. 
    That said, also the scoping problem I described in my blog would remain the same, since I want to have proposals and I want to limit what's visible in graphical editors as well, wouldn't I? Actually the EMF Index is not intended to be specific for Xtext or textual syntaxes in general.

    > Or to put it another way: wouldn't the benefits of direct, automatically maintained
    > references outweigh the loss of the pure text format?

    No, because
    1) I consider broken links after renaming the declaration a feature, not a problem.
    2) and the pure text format is very important, because it integrates with existing project environments so much better (Diff, Merge, etc.) and is usable without tools.

  3. > I want an Eclipse JDT for all of my languages.

    Geez, so do I !

    > It should be possible to define and configure arbitrary implementations of 'worlds' (databases, web, workspace, etc.)

    A triple store !? How about representing RDF concepts using a textual DSL ?

    I don't know exactly how useful that would be but I like the idea of being able to reference a "world" of concepts using code completion :)

  4. > How about representing RDF concepts using a textual DSL ?

    To be honest, I don't know as I don't completely understand how RDFs work.
    But if you want to make sure it works, you should post it to the newsgroup or even wait for the first drafts of API :-)

  5. Using IDs for object references has serious practical drawbacks. For example, name-based references allow the use of model libraries in a component based way. You can replace libraries by others as long as the "interface" fits. This makes it much easier to scale up the use of models.

    This is a general issue: hooking everything together with IDs results in a monolithic view of the world, which has scalability problems.

    Note that name-based references are very successfully in use on a large scale in programming languages like Java.

    Managing really large models that are hooked together with IDs is also a problem. Managing large collections of files with name-based references is easier. You can use any proven version / configuration control system that you use for your source code.

    The "problem" of changing names has been solved by using refactoring. Refactoring is also more flexible because the modeler can actively choose which references need to change or not.

  6. Many languages allow special syntax to reference declarations in other scopes specifically (either to disambiguate or because it's required). I suppose the scope API allows for specific implementations to provide these implicit scope members.

    Obvious examples from Java:

    1. "super". I suppose a class would have an outer scope containing its base class. Yet "super" must reference that explicitly.

    2. OuterClass.InnerClass. Think of the example "OuterClass.InnerClass innerObject = outerObject.new InnerClass();".

    3. From within InnerClass I might also write "OuterClass.this".

    So to complete your example I suppose the classScope should also declare "method" and there should be another scope for the members of Object.class.

    Another problem that comes to mind is that there might be multiple methods with the same name (but different parameters) in one scope. But I am sure the design already allows for this :-)

  7. So this importedURI does not work anymore for the current xtext integrated in eclipse? is there an alternative?

  8. @Tjerk: the importURI feature is still available and fully working. It's just that working with the name spaces and an index is often much nicer and allows for more advanced IDE tooling.

  9. I understand your point. I was only asking if the importedURI feature was still supported... as i can't get it to work with xtext and eclipse... i'm probably doing something wrong as the models don't get imported...

    now i need to use one big model file..

  10. "now i need to use one big model file.."
    no you don't. just go to the TMF Xtext newsgroup/forum at eclipse and provide some details about your problem. I'm sure we can help.

  11. Ah found it via the forum,
    it's importURI instead of importedURI,
    only if you call your attribute importURI it will work.

    Whoops :-)

  12. I'm trying to develop a standalone generator based on a DSL defined using XText.

    If i try to use the described importURI process it works in the generated plugin for eclipse.

    However it is not working for my standalone app that uses the XText generated parser. References to fields declared in other files aren't being resolved.

    Is there anything else that needs to be done besides changing the MWE2?

    Thanks in advance.

  13. @FreeThinker please ask such questions in the forum / newsgroup at Eclipse and provide more details (i.e. what kind of URIs do you use, what does your mwe2 file look like)

    http://www.eclipse.org/forums/index.php?t=thread&frm_id=27&
