Sunday, June 10, 2012

Xtend - The Movies Example

The next version of Xtend will be released in a couple of weeks (June 27). Although the technical (OSGi) version is 2.3, for me it's really more like a 1.0 release. It now has everything you need to write beautiful Java programs, like the little movies example we have been using in some recent workshops.

The movies example is also included in the example project that ships with the Eclipse plug-in. It reads a file of movie data and runs some queries on it.

The Data

The movie database is a plain text file (data.csv) with data sets describing movies. Here's an example data set:

Naked Lunch  1991  6.9  16578  Biography  Comedy  Drama  Fantasy

The values are separated by two spaces. The columns are:

  • title
  • year
  • rating
  • numberOfVotes
  • categories (where any number of categories is allowed)

Let's start by declaring a data type Movie reflecting the data set:

@Data class Movie {
  String title
  int year
  double rating
  long numberOfVotes
  Set<String> categories
}

The @Data annotation turns this class into a value object, that is, the compiler will create

  • a getter-method for each field,
  • a hashCode()/equals() implementation,
  • an implementation of Object.toString(), and
  • a constructor accepting values for all fields in the declared order.
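
To make the list above concrete, here is a hand-written Java sketch of roughly what such a value object amounts to. It is only an approximation: the code the Xtend compiler really emits differs in detail, and the toString() format shown here is an assumption.

```java
import java.util.Objects;
import java.util.Set;

// Hand-written approximation of the value object described above.
// The Java code the Xtend compiler really emits will differ in detail.
final class Movie {
  private final String title;
  private final int year;
  private final double rating;
  private final long numberOfVotes;
  private final Set<String> categories;

  // Constructor accepting values for all fields in the declared order.
  Movie(String title, int year, double rating, long numberOfVotes, Set<String> categories) {
    this.title = title;
    this.year = year;
    this.rating = rating;
    this.numberOfVotes = numberOfVotes;
    this.categories = categories;
  }

  // One getter per field, no setters.
  String getTitle() { return title; }
  int getYear() { return year; }
  double getRating() { return rating; }
  long getNumberOfVotes() { return numberOfVotes; }
  Set<String> getCategories() { return categories; }

  @Override public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof Movie)) return false;
    Movie that = (Movie) other;
    return year == that.year
        && Double.compare(rating, that.rating) == 0
        && numberOfVotes == that.numberOfVotes
        && Objects.equals(title, that.title)
        && Objects.equals(categories, that.categories);
  }

  @Override public int hashCode() {
    return Objects.hash(title, year, rating, numberOfVotes, categories);
  }

  // The exact output format is an assumption made for this sketch.
  @Override public String toString() {
    return "Movie [title=" + title + ", year=" + year + ", rating=" + rating
        + ", numberOfVotes=" + numberOfVotes + ", categories=" + categories + "]";
  }
}
```

That is around forty lines of Java for what takes seven lines in Xtend.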

Parsing The Data

Let's now define another class which reads the text file into a list of movies so that we can do some analysis on the data. We will access the data from within a JUnit test, so simply initializing a field is appropriate:

import java.io.FileReader
import java.util.Set
import static extension com.google.common.io.CharStreams.*

class Movies {

  val movies = new FileReader('data.csv').readLines.map[ line |
    val segments = line.split('  ').iterator
    return new Movie(
      segments.next, 
      Integer::parseInt(segments.next), 
      Double::parseDouble(segments.next), 
      Long::parseLong(segments.next), 
      segments.toSet
    )
  ]
}

The field's type (List<Movie>) is inferred from the expression on the right-hand side. We want the field to be final, so we declare it as a value using the keyword val.

The initialization on the right-hand side first creates a fresh instance of java.io.FileReader. Then the method readLines() is invoked on it. But if you have a look at FileReader you won't find such a method. It is in fact a static method from Google Guava's CharStreams class and is imported as an extension:

import static extension com.google.common.io.CharStreams.*

CharStreams.readLines(Readable) returns a List<String>, on which we call another extension method called map. That one is defined in Xtend's runtime library and is always imported, so it is automatically available on all lists. The map method expects a function as its parameter. It invokes that function for each value in the list and returns a list containing the results of the function invocations.
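
Seen from plain Java, an extension method is just a static method whose first argument is the receiver. The following sketch illustrates the idea with a hypothetical map helper written for this post; it is not the implementation from Xtend's runtime, which may for instance evaluate lazily where this one is eager.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class ExtensionSketch {

  // Hypothetical stand-in for a library method such as the map extension.
  // It applies fn to every element and collects the results in a new list.
  static <T, R> List<R> map(List<T> list, Function<? super T, ? extends R> fn) {
    List<R> result = new ArrayList<>();
    for (T element : list) {
      result.add(fn.apply(element));
    }
    return result;
  }

  static List<Integer> lengths(List<String> lines) {
    // Xtend with the extension in scope: lines.map[ line | line.length ]
    // Plain Java: the receiver becomes the first argument of the static call.
    return map(lines, line -> line.length());
  }
}
```

Calling ExtensionSketch.lengths on a list holding "ab" and "cde" yields a list holding 2 and 3.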

Function objects are created using lambda expressions (the code in square brackets). Within the lambda we process a single line from the text file and turn it into a movie by splitting the string using the separator and calling iterator() on the result. As you might know, java.lang.String.split(String) returns a string array (String[]). But as Xtend auto-converts arrays to lists when needed, we can call iterator() on it.

val segments = line.split('  ').iterator

Now we use the iterator to create an instance of Movie:

return new Movie (
  segments.next, 
  Integer::parseInt(segments.next), 
  Double::parseDouble(segments.next), 
  Long::parseLong(segments.next), 
  segments.toSet
)
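
For comparison, the same parsing step can be sketched in plain Java. The nested Movie class below is a trimmed-down stand-in with public fields, and the class and method names are made up for this illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

final class LineParserSketch {

  // Trimmed-down stand-in for the Movie value object.
  static final class Movie {
    final String title;
    final int year;
    final double rating;
    final long numberOfVotes;
    final Set<String> categories;

    Movie(String title, int year, double rating, long numberOfVotes, Set<String> categories) {
      this.title = title;
      this.year = year;
      this.rating = rating;
      this.numberOfVotes = numberOfVotes;
      this.categories = categories;
    }
  }

  static Movie parse(String line) {
    // Java needs the array-to-list conversion spelled out.
    Iterator<String> segments = Arrays.asList(line.split("  ")).iterator();
    String title = segments.next();
    int year = Integer.parseInt(segments.next());
    double rating = Double.parseDouble(segments.next());
    long numberOfVotes = Long.parseLong(segments.next());
    // Whatever is left in the iterator are the categories.
    Set<String> categories = new LinkedHashSet<>();
    while (segments.hasNext()) {
      categories.add(segments.next());
    }
    return new Movie(title, year, rating, numberOfVotes, categories);
  }
}
```

Because the iterator is consumed in order, the first four segments fill the fixed columns and the remainder, however many there are, ends up in the set.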

Answering Some Questions

Now that we have turned the text file into a List<Movie>, we are ready to run some queries on it. We use JUnit to make the individual expressions executable.

Question 1: How Many Action Movies Are Contained?

@Test def void numberOfActionMovies() {
  assertEquals(828, 
    movies.filter[categories.contains('Action')].size)
}

This uses the extension method filter to select the matching movies. The lambda expression checks whether the current movie's categories contain the entry 'Action'. Note that unlike the lambda we used to turn the lines of the file into movies, we haven't declared a parameter name this time. We could have given the parameter an explicit name 'movie' by writing the following:

assertEquals(828, movies.filter[movie | movie.categories.contains('Action')].size)

But if we leave out the name and the vertical bar, the variable is automatically named 'it', which (like this) is an implicit variable. That's why we can either write

assertEquals(828, movies.filter[it.categories.contains('Action')].size)

or just

assertEquals(828, movies.filter[categories.contains('Action')].size)

Lastly we call size on the resulting iterable, which again is an extension method (java.lang.Iterable doesn't define such a method).
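
If you are more at home with Java's stream API, the query corresponds roughly to the sketch below. Java has no implicit 'it', so the lambda parameter always has to be named. The Movie stand-in and the method name countByCategory are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ActionCountSketch {

  // Stand-in holding only what this query needs.
  static final class Movie {
    final String title;
    final Set<String> categories;

    Movie(String title, String... categories) {
      this.title = title;
      this.categories = new HashSet<>(Arrays.asList(categories));
    }
  }

  static long countByCategory(List<Movie> movies, String category) {
    // Xtend: movies.filter[categories.contains(category)].size
    return movies.stream()
        .filter(movie -> movie.categories.contains(category))
        .count();
  }
}
```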

Question 2: What's The Year The Best Movie From The 80s Was Released?

@Test def void yearOfBestMovieFrom80ies() {
  assertEquals(1989, 
    movies.filter[(1980..1989).contains(year)].sortBy[rating].last.year)
}

Here we filter out all movies whose year is not included in the range from 1980 to 1989 (the 80s). The range operator (..) again is an extension defined for two ints and returns an instance of org.eclipse.xtext.xbase.lib.IntegerRange.

The resulting iterable then is sorted by the rating of the movies. Since it's sorted in ascending order, we take the last movie from the list and return its year.

We could also have sorted in descending order and taken the head of the list (note the minus sign):

movies.filter[(1980..1989).contains(year)].sortBy[-rating].head.year

By the way, movie.year here and movie.categories in the previous example of course call the corresponding getter methods, which were generated because of the @Data annotation.
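
A rough Java counterpart of this query, again with a made-up Movie stand-in and method name, could look as follows. Instead of sorting and taking the last element it asks the stream for the maximum directly, so ties between equally rated movies may be resolved differently than in the Xtend version.

```java
import java.util.Comparator;
import java.util.List;

final class BestOfDecadeSketch {

  // Stand-in holding only what this query needs.
  static final class Movie {
    final int year;
    final double rating;

    Movie(int year, double rating) {
      this.year = year;
      this.rating = rating;
    }
  }

  static int yearOfBest(List<Movie> movies, int fromYear, int toYear) {
    return movies.stream()
        // Java has no range literal, so the bounds are compared explicitly.
        .filter(movie -> movie.year >= fromYear && movie.year <= toYear)
        .max(Comparator.comparingDouble((Movie movie) -> movie.rating))
        .map(movie -> movie.year)
        .orElseThrow(() -> new IllegalStateException("no movie in the given range"));
  }
}
```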

Question 3: The Sum Of All Votes Of The Top Two Movies

@Test def void sumOfVotesOfTop2() {
  assertEquals(47_229L, 
    movies.sortBy[-rating].take(2).map[numberOfVotes].reduce[a, b| a + b])
}

First the movies are sorted by rating, then we take the best two. Next the list of movies is turned into a list of their numberOfVotes using the map function. Now we have a List<Long> which can be reduced to a single Long by adding the values.
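
The same sort, take, map and reduce pipeline can be sketched with Java streams. The Movie stand-in and the method name sumOfVotesOfTop are assumptions for this illustration. One difference to keep in mind: this version starts the reduction from 0, so an empty input yields 0.

```java
import java.util.Comparator;
import java.util.List;

final class TopVotesSketch {

  // Stand-in holding only what this query needs.
  static final class Movie {
    final double rating;
    final long numberOfVotes;

    Movie(double rating, long numberOfVotes) {
      this.rating = rating;
      this.numberOfVotes = numberOfVotes;
    }
  }

  static long sumOfVotesOfTop(List<Movie> movies, int count) {
    return movies.stream()
        // Best rating first, like sortBy[-rating].
        .sorted(Comparator.comparingDouble((Movie movie) -> movie.rating).reversed())
        // Like take(count).
        .limit(count)
        // Like map[numberOfVotes].
        .map(movie -> movie.numberOfVotes)
        // Like reduce[a, b | a + b], but with an explicit start value.
        .reduce(0L, (a, b) -> a + b);
  }
}
```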