Wednesday, December 21, 2011

Groovy, Scala, Java, Xtend - an UPDATED stupid comparison

This is an update to yesterday's post, based on feedback I got so far:

Measurement

Henning suggested that the number of iterations is too small:

10000 iterations are probably not enough for a comparison: If I go from 10k to 100k iterations, the Scala version needs 5x time -- the Java version only 2x.

Indeed. Here's what I got when using 1M iterations:

  • Java (the pretty version):
    Took : 245 ms, number of elements : 3000000
  • Xtend :
    Took : 278 ms, number of elements : 3000000
  • Xtend (using switch - see below) :
    Took : 298 ms, number of elements : 3000000
  • Scala (using for comprehensions) :
    Took : 5140 ms, number of elements : 3000000
  • Scala (using while loops) :
    Took : 931 ms, number of elements : 3000000
  • Scala (using while loops and java.util.collections - see below) :
    Took : 315 ms, number of elements : 3000000
  • Groovy (using << operator - see below):
    Took : 1683 ms, number of elements : 3000000

Groovy

Thorsten and Jochen pointed out that I used the wrong operator for adding new elements to a list. I should use '<<' instead of '+=' since the latter would create a new list for the single elements before adding them. I've fixed this and the Groovy program is much faster now.

Jochen also pointed out that the comparison is unfair since, because of the dynamic dispatch Groovy does heavy caching assuming that a method is invoked for the same type of argument most of the time.

My initial idea was to do a comparison where Groovy has a strength (i.e. dynamic dispatch). After all the other examples also do dispatching at runtime. The Java and Scala version don't call another method, but since Xtend does so and is as fast as Java I'd say this extra call doesn't change much.

Jochen also states:

Imagine a class with those three foo methods written in Java. Then in Groovy you can still run your test. I dare to say that in Xtend you cannot make this work without a dispatcher method somewhere - and your dispatch keyword is implying that - thus I assume you cannot make this then work using a Java class.


I think this is actually a big problem with Groovy. Although it's syntax might look a bit like Java it behaves totally different.
In contrast Xtend binds statically just like Java and in fact a multi method in Xtend behaves always the same no matter if you call it from Java or Xtend.

He further says

But let us not stop here... let us assume that you write the base class in Java, where it knows foo(String) foo(Object) and foo(Boolean). Using this class the Integer variant would call the foo(Object) method in your code. Now extend that class in Java with a foo(Integer) method and Groovy would no longer call foo(Object) but foo(Integer).
I doubt you can easily simulate that case.


I can indeed not do that in Java. But I can do that with Xtend and it still behaves as expected no matter form what language you'd call that code.

Scala

Jan said:

I rewrote your little program to use while loops and the java utils collections and the scala version was as fast as java - but nearly as ugly to :-D

Here's the updated Scala version with while and java.util.collections, which is now almost as fast as the Java version:
object Scala extends Application {
  val absoluteResult = new ArrayList[Any]()
  val before = System.currentTimeMillis()
  var i =0
  while (i<1000000) {
    i = i+1
    val result = List("foo", 23, true).iterator
    while (result.hasNext) {
      absoluteResult.add(foo(result.next))
    }
  }
  println("Took : "+(System.currentTimeMillis() - before)
      +" ms, number of elements : "+absoluteResult.size)
  
  def foo(obj : Any) =
    obj match {
          case _:String => "String"
          case _:Boolean => "Boolean"
          case _:Integer => "Integer"
          case _ => throw new IllegalArgumentException()
    }
}

Clojure

Stefan did a nice little example in Clojure. It doesn't use any mutable collections but uses a pure functional style:

(defn foo [x]
(cond
(number? x) "Number"
(string? x) "String"
(= (class x) Boolean) "Boolean"))

(time (println (count 
(map foo 
(apply concat 
(repeat 10000 ["foo" 23 true]))))))
On his machine the Java example takes '31ms' and the Clojure program prints

Prints:
30000
"Elapsed time: 194.046477 msecs"

Xtend

An alternative way to do the dispatch in Xtend is using type guards in switch,
which doesn't make a huge difference performance-wise :

class XtendWithSwitch {
  def static void main(String[] args) {
    val absoluteResult = newArrayList()
    val before = System::currentTimeMillis()
    for (times : 0..1000000) {
      val result = newArrayList('foo', 23, true)
      for (y : result) {
        absoluteResult += foo(y)
      }
    }
    println("Took : "+(System::currentTimeMillis() as int - before)
      +" ms, number of elements : "+absoluteResult.size)
  }

  def static foo(Comparable comp) {
    switch(comp) {
      String : 'String'
      Boolean : 'Boolean'
      Integer : 'Integer'
      default : throw new IllegalArgumentException()
    }  
  }
}

Java


Finally based on Henning's suggestion I beautified the Java version a bit using Guava and static imports:

public static void main(String[] args) {
  List<Object> absoluteResult = newArrayList();
  long before = System.currentTimeMillis();
  for (int i=0; i < 1000000; i++) {
    for (Object y : newArrayList("foo", 23, true)) {
      absoluteResult.add(foo(y));
    }
  }
  System.out.println("Took : "+(System.currentTimeMillis() - before)
    +" ms, number of elements : "+absoluteResult.size());
}

static String foo(Object s) {
  if (s instanceof String) {
    return "String";
  } else if (s instanceof Boolean) {
    return "Boolean";
  } else if (s instanceof Integer) {
    return "Integer";
  } else {
    throw new IllegalArgumentException();
  }
}

19 comments:

  1. Fast, faster, Scala:

    https://gist.github.com/1505997

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Viktor,

    I get similar performance with this code which is much easier to read (no explicit use of recursion).

    Michel : Took : 452 ms, number of elements : 3000000

    Viktor : Took : 421 ms, number of elements : 3000000

    Java collections : Took : 748 ms, number of elements : 3000000

    ReplyDelete
  4. I punted and missed the call to foo in the loop.

    And interestingly it does not change the time at all !

    So what are we comparing here ... only the time to process a collection based on the collection implementation. The time to call the method is so small we don't see it ! Probably a JVM optimization (I'm on Java 6)

    ReplyDelete
  5. Michel,

    on my machine Viktor's version is almost twice as fast as yours. But your's is twice as comprehensive and even a bit faster than the Java version.
    Very cool! :-)

    ReplyDelete
  6. And that's the infamous Klang effect. Was just about to post something about using a List.newBuilder within an @tailrec function. Pretty much would cover the same. Besides, you don't append an item to the end of a list. That takes O(N) time for each list (O(N^2) total runtime.) Instead, you prepend a list of results and then reverse them. Takes O(N) in total.

    ReplyDelete
  7. Of course, I didn't try to write efficient but idiomatic code. Since my Scala knowledge is a bit old and mostly forgotten, I'm glad others helped out a bit.

    If you're looking for a challenge try to beat this Java version. It's approx. three times faster than Viktor's Scala version.

    ReplyDelete
  8. Finally, here is a Clojure version using mutable data:

    https://gist.github.com/1506973

    For large numbers it's runtime converges on that of the Java implementation. Not bad for a dynamically typed language.

    ReplyDelete
  9. Just for fun: A Clojure version using Arrays. About 5 times faster than the optimized Java version. Of course this is far away from functional style ;)

    https://gist.github.com/1507155

    ReplyDelete
  10. Here is a slightly more concise version.

    // Scala needs Java 7 to make use of new fork/join framework (JSR-166)
    object Scala extends Application {
    // collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism(4)
    val before = System.currentTimeMillis()
    val absoluteResult = for (
    _ <- (1 to 100000)/*.par*/; y <- List("foo", 23, true).map(foo)
    ) yield y
    println("Took : " + (System.currentTimeMillis() - before)
    + " ms, number of elements : " + absoluteResult.size)

    def foo(obj: Any) =
    obj match {
    case _: String => "String"
    case _: Boolean => "Boolean"
    case _: Integer => "Integer"
    case _ => throw new IllegalArgumentException()
    }
    }

    Wondering, if parallelism would make a difference (if using Java 7). But uncommenting the above code seems to result in a dead lock. (I'm no Scala expert...)

    - Daniel

    ReplyDelete
  11. Sven I think you did understand me a bit wrong. I wasn't about unfair, I tried to explain why Groovy does it that way and how the result from that then is.

    But I wonder what made you think that, performance wise, dynamic dispatch (or I should say multi method dispatch) is a strength?

    Anyway... I don't agree that the other examples do dispatch at runtime. For that there should be at least one more method call from your dispatcher method. When I say dispatcher method, then I mean a method, that does some kind of method invocation by calling another method. In your case that would be what follows the instanceof checks. Since you there simply return the value, you do what JIT would have done otherwise. But only if JIT can do that. And that is actually a critical point. If you call bigger methods, then the JIT won't inline the dispatcher anymore very fast. As result you loose inlining of the method you want to dispatch to for everything that calls the dispatcher method. And this then can have severe consequences. The only way around that would be to have the checks at the call sites - with other problems.

    I won't now conclude the Groovy version is better in that, I just try to point out a general problem of such dispatcher methods. And I don't know how Xtend does these things, but assuming that it will be only a little bit less fast than a direct method call in Java might in real life not be the real case. For your test here it is still true what Micheal said in the comments, you test mostly ArrayList performance. So at least for Scala, Java and Xtend, you don't test method call performance with this one at all.

    I also don't understand how you can claim Xtends doing multi method dispatch while at the same time claiming it will behave like Java in its calling logic... Java does no multi method dispatch. I don't think that Xtend can have per default multi method dispatch and Java dispatch at the same time.

    You should also note that Groovy has this multi method logic per default, while in your case, you have to use your keyword to enable it. But assuming I want to enable it... can I subclass that class in Java and add another foo method, that then will be part of the dynamic dispatch, without having to rewrite the dispatcher method? Your text seems to say, that this is possible. If that is possible, then I would be very interested in knowing how. I am always eager to learn how to make something work, that I thought is impossible to do efficient.

    ReplyDelete
  12. @Jochen:
    I understood that you think the other samples don't have to do dynamic dispatch. If that is really the case the comparison is unfair (despite from that it's actually really unfair since I just hacked away some code without specifying any kind of requirements or even have deeper knowledge of Groovy. And also my measurement technique was bogus...).

    But I still think all the examples do dynamic dispatch. Yes, only Groovy and Xtend have to call a different methods for each case, but that doesn't really make a big difference. Calling a method in Java is fast!. If I delegate each case to a method on it's own in the Java example and disable JIT it's only marginally slower (~60ms difference on my machine).

    Xtend binds statically just like Java. For a set of visible overloaded dispatch methods (i.e. a multi method) a synthetic dispatch method is inferred using the common super types of the arguments. So it is declaration site dispatching unlike in Groovy where dispatching is done on the call site.

    If I subclass such a class I can add new cases (within the bounds of the argument types) and the synthetic dispatch method will be overridden. More details

    ReplyDelete
  13. @Stefan: Did you mean 5 times faster than this Java snippet?

    ReplyDelete
  14. @Sven: your java snippet is unbeatable for me (so far). can we add additional languages - e.g. assembler ;-)

    ReplyDelete
  15. @Sven Sorry I did a small mistake which lead to wrong figures. It is actually exactly as fast as your optimized snippet.

    ReplyDelete
  16. I update both the java and scala versions of the code and got the fastest performance using pure arrays instead of lists.

    Both run in exactly the same time.

    Java ArrayList performance is similar to Scala ListBuffer.

    If you really want performance in scala, use arrays and while loops (there is an otherhead to use 1 to n construct since it generates a list).

    ReplyDelete
  17. If you change the code to use an ArrayBuffer (which is the equivalent to ArrayList) and use tail recursion the Scala version actually runs faster than the Java one.

    I cheated a little by adding the Scala Library to the bootclasspath - if you invoke it using the scala shell script the execution time is the same as for the Java version.

    ReplyDelete
  18. And by the way: the main reason why your Scala example is really slow is because you used the deprecated Application trait. The more recent App trait fixes the issues with Application.

    ReplyDelete
  19. This is a bit off topic, but are there any major Eclipse projects, not from those of you who brought us XText/XTend, that are coded in, or anticipating migrating to, XTend? XTend compiles to Java source code, so it can't be too great of a risk to migrate.

    ReplyDelete