Wednesday, July 6, 2011

How fast is PyPy really?

Martijn Faassen used to ask "how fast is PyPy?", so we added the --faassen
option to the compile toolchain to enable all the optimizations we had. Back then,
we didn't have a JIT and PyPy's interpreter was quite a bit slower than
CPython's (it still is), but the situation has changed quite drastically
since the introduction of the JIT. We even had to remove the --faassen
command-line option!

So, let's repeat Martijn's question: how fast is PyPy these days?
According to the speed website it's
3.9x faster than CPython, but there are benchmarks where it's 6-12x
faster, 3x faster, 20% slower and so on. In addition, people keep asking
when PyPy will be as fast as V8 (Google Chromium's
JS engine) or Tracemonkey (Firefox's JS engine).

To really answer this question, we have to consider the various categories of
benchmarks and applications we're running. I'll try to pinpoint the main groups,
along with PyPy's status and approach in each.


Computing Fibonacci numbers in a very inefficient way is the world's most
famous benchmark, and everybody bases their opinions on it. In Python
it goes like this:

    def fib(n):
        if n == 0 or n == 1:
            return 1
        return fib(n - 1) + fib(n - 2)

I won't comment much on this. The good news, however, is that
PyPy trunk is 9x faster on this
benchmark than PyPy 1.5, and we finally beat CPython!
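
If you want to try this benchmark yourself, a minimal timing sketch could look like the one below. The helper name time_fib, the input size 25 and the best-of-3 policy are my own illustrative choices, not part of any PyPy benchmark suite; run the same file under each interpreter and compare the numbers.

```python
import time


def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)


def time_fib(n, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs of fib(n).

    Hypothetical helper for illustration; taking the minimum reduces noise
    from other processes.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fib(n)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


if __name__ == "__main__":
    # 25 is an arbitrary size that finishes quickly on a slow interpreter.
    print("fib(25), best of 3: %.3f s" % time_fib(25))
```

Keep in mind that a JIT needs some warm-up, so very small inputs may understate what it can do.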

Algorithmic benchmarks

This is the broadest category, but also the easiest one in which to compare
various languages, hence websites like the Computer Language Shootout or
Attractive Chaos include mostly those. This is not really a bad thing
- it's just plain impossible to implement an equivalent, say, web server in
various languages, so we stick with simple, yet necessary things. Those
benchmarks tend to put very high pressure on numerical operations.

This is also the area where Python traditionally did not perform well,
and Python programmers tended to say that you should not benchmark Python
implementations on numerics since this is not how you use Python. This is
also an area where PyPy really shines compared to CPython, often showing
10-100x speedups, and where PyPy sometimes reaches the speed of C.

Personally, I disagree that this area should not be considered - Python
is not traditionally used for numerics
(because of poor performance), but maybe it's time to move on
and start using it? The real-time video processing demo at EuroPython (sorry,
no link yet) is one example where Python can be used where it was not feasible
before. However, this is not the only area where people should be concentrating
their efforts.

This is also the only area where PyPy can be compared
against V8, and indeed PyPy is usually slower (but it's also usually
not slower than Tracemonkey). [citation needed ;-)]

Everything else

This is the category of everything else. It can be anything - template
engines, network libraries, Django, our own translation toolchain -
seriously anything. There is a very common historical misconception here:
if CPython runs Fibonacci 60x slower than C, then rewriting, for example,
Twisted in C would lead to a 60x speedup. Going further, since PyPy speeds up
Twisted by at most 2x, there is supposedly 30x to go.

It sounds simple, but it's probably very untrue. There are operations, like
dictionary lookups, that take just as long in C as in Python, or that may even
be slower in C, because you're very unlikely to come up with a more
efficient dict implementation than CPython's or PyPy's.
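
One way to see why the "60x" does not carry over is Amdahl's law, which I'm bringing in here purely as an illustration: if only part of the runtime is the kind of code that gets faster, the overall gain is capped by the rest. The fractions below are made-up assumptions, not measurements of Twisted or anything else.

```python
def overall_speedup(fraction_sped_up, factor):
    """Amdahl's law: total speedup when only `fraction_sped_up` of the
    runtime (a value between 0 and 1) becomes `factor` times faster."""
    return 1.0 / ((1.0 - fraction_sped_up) + fraction_sped_up / factor)


if __name__ == "__main__":
    # Hypothetical fractions, chosen only to show the shape of the curve.
    for fraction in (0.25, 0.5, 0.9):
        print("%.0f%% of runtime made 60x faster -> %.2fx overall"
              % (fraction * 100, overall_speedup(fraction, 60)))
```

Under these assumptions, making half of the runtime 60x faster still leaves the whole program a bit under 2x faster.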

In this area using PyPy is usually a win, but nowhere near what
you get in algorithmic code. This is also the area where the bottleneck
can be anything, or it might not even exist in a single place at all, hence
rewriting in C, Shed Skin or Cython might simply not be feasible.

This is also why writing a fast interpreter in Python that speeds up everything
over CPython is hard - the bottleneck might be in string processing, regular
expressions, the bz2 module, file reading, executing complex functions
with *args and **kwds, JSON importing or even deepcopy. Every single
one of these should be fast enough, or preferably even faster. This is simply
something that's in my opinion unprecedented - people don't only want a fast
language that's pretty rich on its own - they also want a vast and fast
standard library.

So, how fast is PyPy really?

The answer is - it depends. You should go and measure it yourself, and you
should most certainly come to us and complain if it's too slow. We love
slow benchmarks and we'll be happy to help you.
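
As a starting point for measuring, here is a rough sketch using the standard timeit and platform modules. The workload function is a hypothetical stand-in, as are the helper name measure and the repeat counts; replace workload with a slice of your own application and run the same script under both CPython and PyPy.

```python
import platform
import timeit


def workload():
    """Stand-in workload (hypothetical): build a dict and look up every key."""
    table = {str(i): i for i in range(1000)}
    total = 0
    for i in range(1000):
        total += table[str(i)]
    return total


def measure(func, number=200, repeat=5):
    """Return the best per-call time in seconds across `repeat` batches."""
    times = timeit.repeat(func, number=number, repeat=repeat)
    return min(times) / number


if __name__ == "__main__":
    # The implementation name tells you which interpreter produced the number.
    print("%s %s: %.6f s per call" % (
        platform.python_implementation(),
        platform.python_version(),
        measure(workload),
    ))
```

A microbenchmark like this only tells you about the workload you picked, so the closer it is to your real code, the more the result means.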