I want to write a few lines about the decision to have Xtext derive the meta model from the grammar.
Of course, I agree that it is best to come up with a set of abstractions (i.e. the meta model) first and afterwards define one (or two) concrete syntaxes based on it.
However, you need some concrete syntax to define the meta model in the first place, don't you? For me, the Xtext grammar language is such a concrete syntax for the meta model, enriched with the concrete syntax information of the DSL itself (which you can leave out if you don't want to care about it in the first iteration). And to me it is much handier and more convenient than creating the meta model with an EMF editor or, even worse, with a graphical editor.
Moreover, designing a DSL is not a big up-front task. You can't identify all the abstractions you will need first and then define a concrete syntax for them. Instead, a DSL needs to be developed in very short increments: you will add, remove, and rename concepts and features as your framework grows. Xtext focuses on a very short turnaround, enabled by the combined concrete/abstract syntax grammar language and the right abstractions.
Unfortunately, mixing the grammar and the meta model definition becomes complicated for more complex languages.
Therefore, I'm thinking about separating the two in one of the next versions, in order to support all languages an LL(*) (ANTLR 3) parser can handle.
This may lengthen turnaround times, but maybe we can compensate by providing meta model refactorings and the like.