Thursday, September 30, 2010

Xbase - A new programming language?


No!

It's the basis for a plethora of new programming languages and domain-specific languages!

What is Xbase?

Xbase is a partial programming language implemented in Xtext. It is meant to be embedded and extended within other programming languages and domain-specific languages (DSLs) written in Xtext.

Why Xbase?

Developing textual modeling languages (aka DSLs) has become incredibly easy with Xtext. Structural languages which introduce new coarse-grained concepts, such as services, entities, value objects or state machines, can be developed in minutes. However, software systems do not consist of structure only. At some point a system needs to do something, hence we want to specify some behavior, which is usually done using so-called expressions. Expressions are the heart of every programming language and are not so easy to get right. That is why most people do not add support for expressions to their DSL, but try to solve this differently.

The most often used workaround is to define only the structural information in the DSL and add behavior in a second step by modifying or extending the generated code. Not only is it unpleasant to write, read and maintain closely related information in two different places, on two different levels of abstraction and in two different languages; this approach also only works for compilers (i.e. code generators) but not for interpreters. (Additionally, there are a lot of other reasons why mixing generated and handwritten code is problematic, which is not the topic of this blog post.)

Still, as of today this is the preferred solution, since adding support for expressions (and a corresponding compiler) to your language is hard - even with Xtext.

Actually, being able to call out to the host language is one big advantage internal DSLs have over external DSLs. With Xbase it will be possible to explicitly allow more complex programming at certain places within your DSL, while still having full control over the syntax and semantics of your language. You neither have to reinvent the wheel by implementing a full-blown programming language, nor will your language's users have a hard time understanding the expression language, since it is closely related to Java and well specified.

Also, the more Xbase-based languages we see, the more commonly known it will be.

Main Decisions

We want Xbase to be expressive and convenient to use, but at the same time easy to understand and easy to adapt. Understanding means not only learning how to use it but also understanding the language infrastructure, i.e. the parser, compiler, type checkers, etc., because people should be able to reuse and adapt these parts easily.

The main target audience for Xbase is Java developers. That is why an Xbase expression looks like a Java expression (or statement) at first glance. This means the most commonly used Java statements and expressions (e.g. string literals, if statement, foreach loop, method invocation, constructor call) are also available as is in Xbase. On the other hand, Java is a very complicated language, especially when it comes to the details. After all, the spec runs to over 600 pages, and while it is very precise, most of the text deals with exceptional conditions, often involving the special handling of built-in types, etc.

Xbase shall be significantly simpler, so we have to make some decisions.

Runs on the JVM

The JVM is a great, popular platform. In order to ship a compiler and an interpreter as well as static typing, Xbase needs to bind to some target platform.
Other platforms such as C/C++, Objective-C or JavaScript are also very interesting target platforms for Xtext languages, but for now the main focus of Xbase is the JVM. This seems to be a natural decision, since Xtext itself runs on the JVM. Also, we know a lot about this platform and the community.

Compiles to Java

The compiler will translate to Java instead of byte code directly. This is for the following reasons:

  1. Anybody should be able to integrate the expressions compiler with any Java code generator
  2. The output as well as the implementation of the compiler shall be as readable / understandable as possible
  3. The code can be used with non-JVM platforms like GWT or Android
  4. We want to leverage the optimizations coming with proven Java compilers

Another pragmatic reason is that while we plan to have a debugger for Xbase-based languages, it won't be part of next year's release. Therefore people will have to debug on the Java code level, which wouldn't be possible if we were generating byte code directly.

Interpreter

We also want to ship an interpreter in order to allow interpreted DSLs using Xbase.

Statically Typed

Xbase is statically typed. This means that there is a type checker and that the compiler uses static type information to do its job. Most important to users might be the rich tooling we can, and plan to, provide based on Xtext and Eclipse in general.

However, it should be possible to remove the type checking phase and change the compiler to do dynamic method invocations, etc.

Full Java Generics

Xbase uses fully-fledged Java generics and doesn't change anything here. While Java generics are not perfect, they have been understood (or at least people think they have ;-)) by a lot of people.
Introducing a different type model would hurt adoption. Under the hood this is backed by the JVM-Types we introduced with Xtext 1.0.

No built-in types

While the JVM-Types support every Java type, Xbase will automatically convert any references to built-in types to their corresponding wrapper types, and any references to array types to lists.
This means you can use built-in types in your languages if you want to (and you should be able to extend Xbase so that it can, too), but you don't have to.
The compiler might use built-in types in the generated Java code, but statically and conceptually everything is a subtype of java.lang.Object (i.e. pure OO).
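
To make the wrapper and list view a bit more concrete, here is a minimal Java sketch. It is not taken from the Xbase implementation: the class WrapperSketch and the helper asList are hypothetical names, used only to contrast a signature written with built-in and array types against the same signature expressed with wrapper types and lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Xbase or its runtime library.
class WrapperSketch {
    // A plain Java method using a built-in type and an array type.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Assumed helper: the list view of an int array, with each element boxed.
    static List<Integer> asList(int[] values) {
        List<Integer> boxed = new ArrayList<Integer>();
        for (int v : values) {
            boxed.add(v); // autoboxing: int -> Integer
        }
        return boxed;
    }

    // The same logic written against wrapper types and lists only.
    static Integer sumBoxed(List<Integer> values) {
        Integer total = 0;
        for (Integer v : values) {
            total = total + v;
        }
        return total;
    }
}
```

Both variants compute the same result; the second one simply never mentions a built-in type or an array in its signature.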

Closures

The main addition in Xbase is the concept of closures. While it looks like Java will have them one day, the lack of them is a major problem with Java.
Xbase comes with a small runtime library that includes interfaces for functions. Closures in Xbase are just sugar for anonymous classes implementing one of these function types.

For instance the following Java expression:

new Function1<String,String>() {
    public String apply(String s) {
        return s.toUpperCase();
    }
}

can be written like this in Xbase:

String s | s.toUpperCase()

Xbase also provides sugar for the types of functions. That is

(String)=>String

is a shorthand for

Function1<String,String>
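
For readers who want to try the Java side of this, the following is a small self-contained sketch. The Function1 interface declared here is only a stand-in: its shape is assumed from the example above, and the real interface in the Xbase runtime library may differ. ClosureSketch, toUpper and applyTo are hypothetical names.

```java
// Stand-in for the function interface from the Xbase runtime library.
// Its exact name, package and signature are assumptions for this sketch.
interface Function1<P, R> {
    R apply(P p);
}

class ClosureSketch {
    // What the closure `String s | s.toUpperCase()` corresponds to in Java.
    static Function1<String, String> toUpper() {
        return new Function1<String, String>() {
            @Override
            public String apply(String s) {
                return s.toUpperCase();
            }
        };
    }

    // A parameter of the Xbase type (String)=>String is a Function1 in Java.
    static String applyTo(String input, Function1<String, String> f) {
        return f.apply(input);
    }

    public static void main(String[] args) {
        System.out.println(applyTo("xbase", toUpper())); // prints XBASE
    }
}
```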

Type Inference

Type inference is another important feature of any modern statically typed language. Type inference basically means that the compiler doesn't force you to write redundant information about types. In Java, for example, the type of a local variable needs to be specified although it could be inferred from the initialization expression:

Map<String,Person> namesToPerson = new HashMap<String,Person>();

In Xbase you don't have to write the type signature twice, but can write the following instead:


val namesToPerson = new HashMap<String,Person>();

Of course namesToPerson would be of type HashMap<..> here. If you want to be explicit, you can add the type information optionally:


val Map<String,Person> namesToPerson = new HashMap<String,Person>();

Xbase does type inference for the argument types of closures as well. That is, the argument types don't need to be specified if they can be inferred from the current context.

Also note that the typing service of Xbase can be used in your language in order to do type inference (for instance for return types in method signatures).

Operator Overloading

Xbase comes with a fixed set of operators, with a fixed precedence and associativity. The difference to Java is that those operators are not bound to certain built-in operations on built-in types but are just shorthands (or sugar) for certain method invocations.

That is, if some type T has a method plus(T2), you can write either

myT.plus(myT2)

or

myT + myT2

This concept is known from Groovy (although it's slightly different there).
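
As a rough illustration of the method-call side of this mapping, here is a Java sketch. Money and OperatorSketch are hypothetical names invented for this example; the only thing taken from the text above is that an operator such as + stands for a call to a method named plus.

```java
// Hypothetical value type; any class with a suitably named method would do.
final class Money {
    final long cents;

    Money(long cents) {
        this.cents = cents;
    }

    // In Xbase, `a + b` would be shorthand for `a.plus(b)`.
    Money plus(Money other) {
        return new Money(this.cents + other.cents);
    }
}

class OperatorSketch {
    // Java has no operator overloading, so the call is written out explicitly.
    static long total(long a, long b) {
        Money sum = new Money(a).plus(new Money(b));
        return sum.cents;
    }
}
```

In an Xbase-based language the body of total could use the + operator on the two Money values directly, and it would resolve to the same plus method.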

Simplicity over Syntactical Flexibility

With operator overloading we could have gone a step further, as done in Scala. In Scala the operators aren't fixed keywords but words with certain characteristics (usually starting with a certain letter). That would allow operators which are not predefined in the language. However, this would have introduced a couple of additional lexer rules, which would have limited the available syntactical space dramatically. This would have made extending the language much harder (and even impossible in many cases).

In general we decided to prefer simplicity over syntactic flexibility. This is because with Xbase you already have the largest syntactic freedom. You just create a sublanguage and add or remove anything you want.

Languages like Scala really need to have all this flexibility, because they are designed to add new language features as a library. These special rules about identifiers and operators, and other syntactic flexibility like newlines as expression separators (and the situations when this doesn't work) as well as the different ways to invoke functions, are what make Scala syntactically flexible but complicated at the same time.

Xbase is designed to let you easily add new language features on the language level. If you need a certain syntax you can just have it. The base language remains simple.

Everything is an Expression

There's just no good reason to distinguish between expressions and statements. Although most statements are inherently imperative (i.e. about side effects), there's no reason to have this separation (which is a limitation) built into the language.
Instead, in Xbase everything is an expression, that is, everything returns something (and has a type at compile-time). This allows the typical imperative statement constructs to be used deeply nested, as in the following expression:

this.setFoo(if (isFoo) "foo" else "bar")

In Java we have the ternary operator to do branches within expressions. In Xbase you can use the if expression, but you can also have for and while loops, try-catch clauses or even the nice switch expression deeply nested.
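
For comparison, here is a small Java sketch of the two situations. The if case maps directly onto the ternary operator. For a try-catch used as an expression, Java has no direct counterpart; hoisting it into a temporary variable, as shown in parseOrZero, is just one plausible translation and not necessarily what the Xbase compiler will generate. ExpressionSketch and its methods are hypothetical names.

```java
class ExpressionSketch {
    // Xbase: if (isFoo) "foo" else "bar"   ->   Java ternary operator.
    static String pick(boolean isFoo) {
        return isFoo ? "foo" : "bar";
    }

    // A try-catch in expression position, rewritten by hand (an assumption):
    // the value of each branch is stored in a temporary variable.
    static int parseOrZero(String text) {
        int result;
        try {
            result = Integer.parseInt(text);
        } catch (NumberFormatException e) {
            result = 0;
        }
        return result;
    }
}
```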

Powerful Switch Expression

This is one of the new features we added. I like pattern matching, but think it is way too complex for many people to use and for most people to integrate into their language.
I also like polymorphic dispatching, which we have always had in Xpand and use a lot in Xtext.

On the other hand, the switch statement in Java is just stupid. It is complex (fall through) and limited (switching over strings finally arrives in Java 7?).

So what we do in Xbase is

  1. we remove fall through (first match wins)

  2. we allow switching over anything (based on equals)

  3. we introduce so-called type guards (which automatically apply downcasts)

Example:

val p = getMeSomeObject();

switch ( p ) {
    Foo case p.isSpecialFoo() : "SpecialFoo";
    Foo : "OrdinaryFoo";
    Bar : "It's a "+p.barKind()+" bar";
    default : "don't know";
}

I hope this is intuitive and readable. You can find the details in the Xbase language specification.
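
To show what the type guards save you from writing, here is a hand-written Java equivalent of the example. This is a sketch under assumptions, not the output of the Xbase compiler: Foo and Bar are hypothetical stand-ins with just the methods the example calls, and SwitchSketch.describe is an invented name.

```java
// Hypothetical stand-ins for the types used in the switch example.
class Foo {
    private final boolean special;

    Foo(boolean special) {
        this.special = special;
    }

    boolean isSpecialFoo() {
        return special;
    }
}

class Bar {
    private final String kind;

    Bar(String kind) {
        this.kind = kind;
    }

    String barKind() {
        return kind;
    }
}

class SwitchSketch {
    // First match wins, so the cases become a chain of checks with no
    // fall through. Each type guard turns into instanceof plus a downcast.
    static String describe(Object p) {
        if (p instanceof Foo && ((Foo) p).isSpecialFoo()) {
            return "SpecialFoo";
        }
        if (p instanceof Foo) {
            return "OrdinaryFoo";
        }
        if (p instanceof Bar) {
            return "It's a " + ((Bar) p).barKind() + " bar";
        }
        return "don't know";
    }
}
```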

Current state

The development of Xbase has just begun. We have a first draft of a language specification and grammars as well as some infrastructure, but we are still at a very early stage.

I hope this post made you interested in Xbase. Feedback is very welcome.

26 comments:

  1. Hello, I want to inform you that the term "xbase" is already used in the software industry: http://en.wikipedia.org/wiki/XBase.

    Maybe you should choose another name :).

  2. That was my first thought, too (I programmed in DBase, Clipper and FoxPro in the early nineties), all Xbase tools.

    Ideally, you guys should pick another name. However, the Xbase tool market is dead, and most developers under 30 never heard of it, so it might not be a big deal.

    Anyways, I am pretty sure you googled the name before you picked it and decided to go with it regardless.

  3. > Ideally, you guys should pick another name.
    > However, the Xbase tool market is dead, and most
    > developers under 30 never heard of it, so it might
    > not be a big deal.

    That is our impression as well.

    > Anyways, I am pretty sure you googled the name
    > before you picked it and decided to go with it
    > regardless.

    Actually someone told me, but yes, we know.

  4. Nice, a language spec in (La)TeX! :)

  5. It was very interesting to finally read about the much-rumored xbase. Hopefully it'll be possible to reuse just the functionality you need, e.g. just the grammar, grammar and inference or grammar, inference and interpreter/compiler. You and Itemis have shown you can create relevant and good software, in an open process, so I'm really excited about this.

    After reading through description of the language, I'd like to add two items to the wish list:
    - reuse Scala's shorthand for unary functions, where _ is used as a free variable and implicitly refers to the single argument, e.g. s | s.toUpper() could be written as _.toUpper().
    - look for opportunities for improving support for Ecore programming/scripting (also dynamic Ecore instances), like syntax for creating, initializing and modifying EObjects, operators for copying, comparing, serializing them etc.

  6. I just started reading the Xbase specs and chapter 2.3 "Integer Literals" made me think: If the language limits literals to Integer, it would not be usable for us, because we need Long.

    I guess this is somehow configurable. It also raises the question of floating point (double) arithmetic...

    Michael

  7. @Meinte: It is actually just translated to LaTeX but written in http://github.com/RvonMassow/xDoc/blob/master/org.eclipse.xtext.xdoc/src/org/eclipse/xtext/xdoc/Xdoc.xtext

    @Hallvard:
    1) I also like the implicit variable for one-arg closures, but unfortunately the syntax is hard to parse (no terminal explicitly making it a closure). While this could be solved for a single language, it wouldn't be fun for Xbase users to deal with the complicated implementation.

    2) EMF provides an excellent mapping to the Java type system by means of its generator. I have developed a language which allows different type systems (Xpand and Xtend) and have learnt that the compromises you have to deal with are not worth the extra value. In the end there are great mappings to Java for most interesting "type systems" (e.g. also XSD).

    @Michael:
    We have also discussed whether we should instantiate a Long instead. Shouldn't be a big deal. But note that we explicitly do not have float, double, whatever, because number crunching is not something which we think is worth supporting out of the box (not so many people do that these days).

    But we don't need to, because it will be very simple for clients to add a literal for, e.g., BigDecimal and have the typical arithmetic operators work for that type. :-)

    Thanks for your feedback!

  8. Looks like Xbase is coming along nicely! Do you also plan to include a library with Xbase containing some useful collections methods such as the getFirst() and arrayList() examples in section 4.4.3?

  9. @Knut: Yes, of course. Regarding collections we are aiming to integrate with google collections (guava) as well as possible, so this should be a thin layer.

  10. Great! As far as I understand it's the missing piece for my future idea.
    Marco LOMBARDO

  11. This language looks really interesting!

    Hope to be able to try that soon, and hopefully provide some feedback/development :)

    is the source for the current development available?

    Lore

  12. @Lore: yes, the code is in the git repository; the project is org.eclipse.xtext.xbase. We'll show a CSS-like language based on Xbase at ESE next week.

  13. Hi Sven,

    nice approach and I really like the whole idea - but it seems that it is kind of heading towards some sort of "Scala light" at the moment...

    I see that the tooling behind Xtext is a big advantage for customer projects. Do you think one could actually mimic a whole set of Clojure syntax with this and get a sort of "DSL specific Subset Editor" for Clojure libraries?

  14. @Martin: I think Matthias Köster is showing an Xtext-based implementation of Clojure in December in Berlin. See http://wiki.eclipse.org/Eclipse_DemoCamps_November_2010/Berlin

  15. So basically we're not creating DSLs from scratch, but rather use Xbase as "superclass" ?

    Is it possible to "remove" Xbase features that we don't want to have in our "subclass"?

  16. I've added (and slightly edited) your blog post to:

    http://wiki.eclipse.org/Xbase

    I hope this is okay...

  17. It is possible but so far a bit hacky. You'd have to override a rule and disable it by specifying a non parseable syntax.

    But we are thinking about a better mechanism.

  18. The problem with wikis is that people expect them to be up to date. This is not the case with a dated blog post. I'll eventually remove the wiki page if it becomes irrelevant. I hope that would be ok :-)

  19. @Sven:

    It will be out-of-date sooner or later. That was the case with the Xpand Project Plan wiki (http://wiki.eclipse.org/Xpand/Project_Plan), which I've read has confused a lot of people. So I simply made a few changes to reflect the latest state.

    Let me know when you've pages to update. I've had some fun "fixing" a bunch of Eclipsepedia pages lately. ;-)

    The reason is I think Eclipse, *especially* EMF projects, has a lot of potential and together they solve a big part of a programmer's problems. I'm pretty sure the world will be a better place when programmers avoid reinventing the wheel and reuse/build upon technologies such as DSLs. Which are, of course, excellently supported by Xtext and the rest of the EMF ecosystem. (and the upcoming Xbase!)

    The problem at hand, is if someone comes searching for information, and it's confusing, it won't help adoption.

    ...And more adoption is needed to get more people contributing to Wiki. ;)

    So at least, the initial barrier of entry must be low enough... (I have to say for Xtext especially, it's really low, with Xtext.org providing videos and tutorials and stuff, congrats! Not always the case for other EMF/T projects...)

  20. That sounds really good. We need more people like you in the community.

    Feel free to update any wiki pages about the projects I'm involved in :-)

  21. Is there any tutorial on how to use Xbase in a project?

    I know it's not finished, but I (as a newbie to the Xtext world) would like to check it out. Problem is: I have no clue how to do that.

  22. @EECOLOR There is no tutorial yet. There will be documentation when Xbase is officially released (June 2011). Also Xbase so far is far from finished.

  23. How does the work on Xbase relate to the work on typesystems in Xtext/TS?

    What do you think about Xtext/TS?

  24. @Jan: it is not related. In Xbase the type system needs to be far more flexible and powerful than what Xtext/TS provides. Also, I am personally not a fan of APIs where a new abstraction was invented for every little problem.

  25. Do you have an idea about similarities/differences between Xbase and the upcoming Alf standard?

  26. @Klaas: I don't know the ALF standard. But thanks for the pointer.
