Thursday, September 30, 2010

Xbase - A new programming language?


No!

It's the basis for a plethora of new programming languages and domain-specific languages!

What is Xbase?

Xbase is a partial programming language implemented in Xtext. It is meant to be embedded in, and extended by, other programming languages and domain-specific languages (DSLs) written in Xtext.

Why Xbase?

Developing textual modeling languages (aka DSLs) has become incredibly easy with Xtext. Structural languages that introduce new coarse-grained concepts, such as services, entities, value objects or state machines, can be developed in minutes. However, software systems do not consist of structure only. At some point a system needs to do something, so we want to specify behavior, which is usually done using so-called expressions. Expressions are the heart of every programming language and are not so easy to get right. That is why most people do not add support for expressions to their DSL, but try to solve this differently.

The most common workaround is to define only the structural information in the DSL and to add behavior in a second step by modifying or extending the generated code. Not only is it unpleasant to write, read and maintain closely related information in two different places, on two different levels of abstraction and in two different languages; this approach also works only for compilers (i.e. code generators), not for interpreters. (Additionally, there are a lot of other reasons why mixing generated and handwritten code is problematic, but that is not the topic of this blog post.)

Still, as of today this is the preferred solution, since adding support for expressions (and a corresponding compiler) to your language is hard, even with Xtext.

Actually, being able to call out to the host language is one big advantage internal DSLs have over external DSLs. With Xbase it will be possible to explicitly allow more complex programming at certain places within your DSL, while still having full control over the syntax and semantics of your language. You do not have to reinvent the wheel by implementing a full-blown programming language, and your language's users will not have a hard time understanding the expression language, since it is closely related to Java and well specified.

Also, the more Xbase-based languages we see, the more widely known it will become.

Main Decisions

We want Xbase to be expressive and convenient to use, but at the same time easy to understand and easy to adapt. Understanding means not only learning how to use it but also understanding the language infrastructure, i.e. the parser, compiler, type checkers, etc., because people should be able to reuse and adapt these parts easily.

The main target audience for Xbase is Java developers. That is why an Xbase expression looks like a Java expression (or statement) at first glance. This means the most commonly used Java statements and expressions (e.g. string literals, if statement, foreach loop, method invocation, constructor call) are also available as is in Xbase. On the other hand, Java is a very complicated language, especially when it comes to the details. After all, the spec runs to over 600 pages, and while it is very precise, most of the text deals with exceptional conditions, often involving the special handling of built-in types, etc.

Xbase shall be significantly simpler, so we have to make some decisions.

Runs on the JVM

The JVM is a great, popular platform. In order to ship a compiler, an interpreter and static typing, Xbase needs to bind to some target platform.
Other platforms such as C/C++, Objective-C or JavaScript are also very interesting targets for Xtext languages, but for now the main focus of Xbase is the JVM. This seems to be a natural decision, since Xtext itself runs on the JVM. Also, we know a lot about this platform and its community.

Compiles to Java

The compiler will translate to Java instead of byte code directly. This is for the following reasons:

  1. Anybody should be able to integrate the expressions compiler with any Java code generator
  2. The output as well as the implementation of the compiler shall be as readable / understandable as possible
  3. The code can be used with non-JVM platforms like GWT or Android
  4. We want to leverage the optimizations coming with proven Java compilers

Another pragmatic reason is that, while we plan to have a debugger for Xbase-based languages, it won't be part of next year's release. Therefore people will have to debug at the level of the Java code, which wouldn't be possible if we were generating byte code directly.

Interpreter

We also want to ship an interpreter in order to allow interpreted DSLs using Xbase.

Statically Typed

Xbase is statically typed. This means that there is a type checker and also that the compiler will use static type information to do its job. Most important to users might be the rich tooling we can, and plan to, provide based on Xtext and Eclipse in general.

However, it should be possible to remove the type checking phase and change the compiler to do dynamic method invocations, etc.

Full Java Generics

Xbase uses fully-fledged Java generics and doesn't change anything here. While Java generics are not perfect, they have been understood (or at least people think they have ;-)) by a lot of people.
Introducing a different type model would hurt adoption. Under the hood this is backed by the JVM-Types we introduced with Xtext 1.0.

No built-in types

While the JVM-Types support every Java type, Xbase will automatically convert any references to built-in types and array types to their corresponding wrapper types and lists, respectively.
This means you can use built-in types in your languages if you want to (and you should be able to extend Xbase in a way that it can, too), but you don't have to.
The compiler might use built-in types in the generated Java code, but statically and conceptually everything is a subtype of java.lang.Object (i.e. pure OO).
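
As a rough illustration of that conceptual view, the following Java sketch shows an int[] seen as a List<Integer>. It relies only on Java's own autoboxing; how Xbase actually performs the conversion is not described in this post, and the names WrapperSketch and asList are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class WrapperSketch {
    // Conceptual view only: an int[] presented as a List<Integer>.
    static List<Integer> asList(int[] values) {
        List<Integer> result = new ArrayList<>(values.length);
        for (int v : values) {
            result.add(v); // autoboxing: int becomes Integer
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = asList(new int[] {1, 2, 3});
        Object first = numbers.get(0); // every element is a java.lang.Object
        System.out.println(first instanceof Integer);
    }
}
```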

Closures

The main addition in Xbase is the concept of closures. While it looks like Java will get them one day, their absence is a major problem with Java today.
Xbase comes with a small runtime library that includes interfaces for Functions. Closures in Xbase are just sugar for anonymous classes implementing one of these Function types.

For instance the following Java expression:

new Function1<String,String>() {
    public String apply(String s) {
        return s.toUpperCase();
    }
}

can be written like this in Xbase:

String s | s.toUpperCase()

Xbase also provides sugar for the types of functions. That is

(String)=>String

is a shorthand for

Function1<String,String>
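
To make the desugaring concrete, here is a minimal, self-contained Java sketch. The Function1 interface below is a hypothetical stand-in for the one in the Xbase runtime library (its real package and exact signature are not given in this post), and ClosureSketch, toUpper and applyTo are illustrative names only.

```java
// Hypothetical stand-in for the Function1 interface of the Xbase runtime library.
interface Function1<P, R> {
    R apply(P p);
}

final class ClosureSketch {
    // The anonymous class that the closure `String s | s.toUpperCase()` is sugar for.
    static Function1<String, String> toUpper() {
        return new Function1<String, String>() {
            public String apply(String s) {
                return s.toUpperCase();
            }
        };
    }

    // A parameter of this type would be written (String)=>String in Xbase.
    static String applyTo(Function1<String, String> f, String value) {
        return f.apply(value);
    }

    public static void main(String[] args) {
        System.out.println(applyTo(toUpper(), "xbase"));
    }
}
```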

Type Inference

Type inference is another important feature of any modern statically typed language. Type inference basically means that the compiler doesn't force you to write redundant information about types. In Java for example the type of a local variable needs to be specified although it could be inferred from the initialization expression:

Map<String,Person> namesToPerson = new HashMap<String,Person>();

In Xbase you don't have to write the type signature twice, but can write the following instead:

val namesToPerson = new HashMap<String,Person>();

Of course, namesToPerson would be of type HashMap<String,Person> here. If you want to be explicit, you can optionally add the type information:

val Map<String,Person> namesToPerson = new HashMap<String,Person>();

Xbase does type inference for the argument types of closures as well. That is, the argument types don't need to be specified if they can be inferred from the current context.

Also note that the typing service of Xbase can be used in your language in order to do type inference (for instance for return types in method signatures).
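
For comparison only: Java itself later gained limited forms of the same idea, namely the diamond operator (Java 7) and var for local variables (Java 10). The sketch below uses Integer values instead of Person so that it stays self-contained; InferenceSketch and build are illustrative names and have nothing to do with Xbase's typing service.

```java
import java.util.HashMap;
import java.util.Map;

final class InferenceSketch {
    static Map<String, Integer> build() {
        // Java 10+: the type HashMap<String, Integer> is inferred from the initializer.
        var namesToLength = new HashMap<String, Integer>();
        namesToLength.put("Xbase", "Xbase".length());

        // Explicit declared type; the diamond (Java 7+) infers the type arguments.
        Map<String, Integer> explicit = new HashMap<>(namesToLength);
        return explicit;
    }
}
```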

Operator Overloading

Xbase comes with a fixed set of operators, with a fixed precedence and associativity. The difference from Java is that those operators are not bound to certain built-in operations on built-in types but are just shorthands (or sugar) for certain method invocations.

That is, if some type T has a method plus(T2), you can either write

myT.plus(myT2)

or

myT + myT2

This concept is known from Groovy (although it's slightly different there).
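
Seen from the Java side, a type only has to offer a suitably named method. In the sketch below, Money and OperatorSketch are made-up names; the Java code shows the method call that an Xbase expression such as a + b would stand for under the convention described above.

```java
final class Money {
    final long cents;

    Money(long cents) {
        this.cents = cents;
    }

    // Under the convention above, `a + b` in Xbase is shorthand for a.plus(b).
    Money plus(Money other) {
        return new Money(this.cents + other.cents);
    }
}

final class OperatorSketch {
    static long total() {
        Money a = new Money(150);
        Money b = new Money(275);
        // Xbase would allow: a + b. In Java we spell out the method call.
        return a.plus(b).cents;
    }
}
```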

Simplicity over Syntactical Flexibility

With operator overloading we could have gone a step further, as Scala does. In Scala the operators aren't fixed keywords but words with certain characteristics (usually starting with a certain letter). That would allow operators that are not predefined in the language. However, it would have introduced a couple of additional lexer rules, which would have limited the available syntactical space dramatically. This would have made extending the language much harder (and even impossible in many cases).

In general we decided to prefer simplicity over syntactic flexibility. This is because with Xbase you already have the largest syntactic freedom. You just create a sublanguage and add or remove anything you want.

Languages like Scala really need all this flexibility, because they are designed to add new language features as a library. The special rules about identifiers and operators, other syntactic flexibility like newlines as expression separators (and the situations where this doesn't work), and the different ways to invoke functions are what make Scala syntactically flexible but complicated at the same time.

Xbase is designed to let you easily add new language features on the language level. If you need a certain syntax you can just have it. The base language remains simple.

Everything is an Expression

There's just no good reason to distinguish between expressions and statements. Although most statements are inherently imperative (i.e. about side effects), there's no reason to have this separation (which is a limitation) built into the language.
Instead, in Xbase everything is an expression; that is, everything returns something (and has a type at compile time). This allows the typical imperative statement constructs to be used deeply nested, as in the following expression:

this.setFoo(if (isFoo) "foo" else "bar")

In Java we have the ternary operator to do branches within expressions. In Xbase you can use the if expression, but you can also have for and while loops, try-catch clauses or even the nice switch expression deeply nested.
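
For the if expression above, one plausible Java rendering uses the ternary operator. This is only a sketch: the actual output of the Xbase compiler is not shown in this post, and the class ExpressionSketch with its update method is invented for the example.

```java
final class ExpressionSketch {
    private String foo;

    void setFoo(String foo) {
        this.foo = foo;
    }

    String getFoo() {
        return foo;
    }

    void update(boolean isFoo) {
        // Xbase: this.setFoo(if (isFoo) "foo" else "bar")
        this.setFoo(isFoo ? "foo" : "bar");
    }
}
```

Constructs such as loops or try-catch have no expression form in Java, so a translation to Java would presumably have to introduce helper statements or temporary variables for them.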

Powerful Switch Expression

This is one of the new features we added. I like pattern matching, but think it is way too complex for many people to use and most people to integrate in their language.
Also I like polymorphic dispatching, like we always had in Xpand and use a lot in Xtext.

On the other hand, the switch statement in Java is just stupid. It is complex (fall through) and limited (switching over strings will finally arrive in Java 7?).

So what we do in Xbase is

  1. we remove fall through (first match wins)

  2. we allow to switch over anything (based on equals)

  3. we introduce so-called type guards (which automatically apply down casts)

Example:

val p = getMeSomeObject();

switch ( p ) {
Foo case p.isSpecialFoo() : "SpecialFoo";
Foo : "OrdinaryFoo";
Bar : "It's a "+p.barKind()+" bar";
default : "don't know";
}

I hope this is intuitive and readable. You can find the details in the Xbase language specification.
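
To see what the switch saves you, here is roughly what the same logic looks like in hand-written Java. Foo, Bar, isSpecialFoo() and barKind() are taken from the example above, but their implementations here are invented, and this is not necessarily the code the Xbase compiler would generate.

```java
final class Foo {
    private final boolean special;

    Foo(boolean special) {
        this.special = special;
    }

    boolean isSpecialFoo() {
        return special;
    }
}

final class Bar {
    String barKind() {
        return "chocolate"; // invented value for the example
    }
}

final class SwitchSketch {
    static String describe(Object p) {
        // Type guard plus case: instanceof check, down cast, then the predicate.
        if (p instanceof Foo && ((Foo) p).isSpecialFoo()) {
            return "SpecialFoo";
        }
        // First match wins, so this branch only sees non-special Foos.
        if (p instanceof Foo) {
            return "OrdinaryFoo";
        }
        if (p instanceof Bar) {
            Bar bar = (Bar) p; // the down cast the type guard applies automatically
            return "It's a " + bar.barKind() + " bar";
        }
        return "don't know";
    }
}
```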

Current state

The development of Xbase has just begun. We have a first draft of a language specification and grammars as well as some infrastructure, but we are still at a very early stage.

I hope this post made you interested in Xbase. Feedback is very welcome.