Some articles on performance really make me angry. Especially, when they compare apples with oranges. But some even cheat.
Recently, I came along an article on kotlin (http://www.baeldung.com/kotlin-tail-recursion)
in which two formidable performance numbers are mentioned (1):
(sic)
Calculating factorial(50) 1,000,000 times without tail recursion takes ~70ms
Calculating factorial(50) 1,000,000 times with tail recursion takes ~45ms
In ST/X, there is not much of a difference when calculating factorials either recursive or iterative, the CPU will spend >95% multiplying large integers and the call overhead is negligible (and that is to be expected, unless some clever factorial tricks are used).
Wow, I thought, how do they achieve such an impressive performance?
First of all, the author does not mention the CPU, clock rate and configuration
of his system. Was there a warmup (for the JITTER) or was it run anew from the command line?
Not knowing much of kotlin's number handling,
I tried the following code on my Mac Powerbook, with a 2.6Ghz Core I7, 1600Mhz DDR3, Mid 2012 system:
compiled and ran it with:
fun recursiveFactorial(n: Long) : Long {
return if (n <= 1) {
n
} else {
n * recursiveFactorial(n - 1)
}
}
fun main(args: Array
"time java -jar fac.jar"
and got:
Not quite 70ms, but still impressive 240ms.
I assume that is due to the startup time, or the way the loop is constructed
(a while may be faster, perhaps).
Running similar code in ST/X:
real 0m0.244s
user 0m0.238s
sys 0m0.038s
gives me 2760ms.
Time millisecondsToRun:[
1000000 timesRepeat:[
50 factorial
]
]
Dam - "A factor of 10 slower", I thought.
But when thinking about it, the question to ask is: "is it possible (without cheating) to compute (50 factorial) in 70ns?"
It might be possible to perform 50 "normal" multiplications in 70ns (i.e. 64bit integer arithmetic and a very clever code generator, propably using mmx or other parallel instructions...)
But definitely not with a bignum library! Because (50 factorial) is "30414093201713378043612608166064768844377641568960512000000000000", definitly not fitting in 64bits! (not even 128 bits, by the way, as this integer needs 216 bits).
Is it possible, that kotlin uses plain 64bit integers and does no overflow checks?
Conclusion: So lets check!
Adding two lines to the kotlin main function:
results in the output:
print("fac(50) is: ");
println(recursiveFactorial(50));
So kotlin is using 64bit integer arithmetic, without overflow checking and
without falling back to largeInteger arithmetic in case of one.
And therefore, the whole computation is wrong.
fac(50) is: -3258495067890909184
The largest factorial which can be computed in 64 bit is "(20 factorial)",
and, in kotlin, that takes:
and in Smalltalk:
real 0m0.206s
user 0m0.198s
sys 0m0.039s
gives me 50ms (2)
Time millisecondsToRun:[
1000000 timesRepeat:[
20 factorial
]
]
(1) For fairness, this article is about the speedup of tail-recursion vs. regular recursion, so using factorial as an example is probably not a good choice, given that factorial will spend most of its time in multiplying huge numbers in both cases.
(2) I have to admit, that the factorial library function of ST/X is heavily tuned. If hand-programmed using recursion, I get 250ms, which is slightly slower than the kotlin/java code, but a very reasonable result, given that additional overflow checks are present in the generated code.
More to come.