Tuesday, February 14, 2012

PyPy and its future challenges

Obviously I'm biased, but I think PyPy is progressing fairly well. However,
I would like to mention some areas where I think PyPy is lagging: places
where it is not living up to its promises, or where design decisions simply
didn't turn out as well as we had hoped. In a fairly arbitrary order:

  • Whole-program type inference. This decision has been haunting the
    separate compilation effort for a while. It's also one of the reasons
    why RPython errors are confusing and why the compilation time is so long.
    This is less of a concern for users, but more of a concern for developers
    and potential developers.

  • Memory impact. We have never scientifically measured the memory
    impact of PyPy on real examples. There are reports of outrageous PyPy
    memory usage, but they're usually very cryptic ("my app uses 300M") and
    not reported in a way that's reproducible for us. We simply have to start
    measuring memory impact on benchmarks. You can definitely help by providing
    us with reproducible examples (they don't have to be small, but they have
    to be open source).
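
As a rough illustration of what a reproducible memory report could look
like, here is a minimal sketch. It is not an official PyPy tool: the
workload() function is a hypothetical stand-in for your own application
code, and the script relies on the Unix-only resource module. The same
file can be run unchanged under CPython and under PyPy so the numbers
can be compared.

```python
import platform
import resource


def peak_rss():
    """Peak resident set size of this process so far.

    The unit is platform dependent (kilobytes on Linux, bytes on macOS),
    so only compare numbers taken on the same machine.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def workload(n):
    # Hypothetical stand-in for the real application code.
    return [str(i) * 4 for i in range(n)]


def measure(n=200000):
    before = peak_rss()
    data = workload(n)
    after = peak_rss()
    return {
        "implementation": platform.python_implementation(),
        "items": len(data),
        "peak_before": before,
        "peak_after": after,
    }


if __name__ == "__main__":
    print(measure())
```

A report that includes the script, the interpreter version and the
printed numbers for both interpreters is something we can actually act on.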

The items in the next group are all connected. The fundamental question is:
what do we do in situations where the JIT does not help? There are many
causes, but in general PyPy is often slower than CPython in all of these
cases. A representative, difficult example is running tests. Ideally, for
perfect unit tests, each piece of code should be executed only once, so
nothing ever runs often enough for the JIT to pay off. There are other
examples, like short-running scripts. All of them can be addressed by
working on one or more of the following:

  • Slow runtime. Our runtime is slow. This is caused by a combination
    of using a higher-level language than C and relative immaturity
    compared to CPython. The former is at least partly a GCC problem: we
    emit code that does not look like hand-written C, and GCC does a worse
    job of optimizing it. A good
    example is operations on longs, which are about 2x slower than CPython's,
    partly because GCC is unable to effectively optimize code generated
    by PyPy's translator.

  • Overly long JIT warmup time. This is again a combination of issues.
    Partly this is a consequence of the design decision to trace at the
    meta-level, which takes more time, but partly this is an issue with our current
    implementation that can be addressed. It's also true that in some edge
    cases, like running large and complex programs with lots and lots
    of megamorphic call sites, we don't do a very good job tracing. Because
    a good example of this case is running PyPy's own test suite, I expect
    we will invest some work into this eventually.

  • Slow interpreter. This one is very similar to the slow runtime: it's
    a combination of using RPython and the fact that we did not spend much
    time optimizing it. Unlike the runtime, we might solve it by having an
    unoptimizing JIT or some other medium-level solution that would work well
    enough. Some effort has been invested, but, as usual, we lack the
    manpower to proceed as rapidly as we would like.
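
To get a feel for the warmup and short-running-script cases above, here is
a small sketch, not taken from PyPy's benchmark suite, that times the same
loop over several rounds. The function names and the loop body are
hypothetical. On a JIT-compiled interpreter one would typically expect the
first rounds to be slower than the later ones, while on CPython the rounds
should look roughly alike; the actual numbers depend entirely on the
machine and the interpreter version.

```python
import time


def hot_loop(n):
    # Simple arithmetic loop; a tracing JIT should compile this once it is hot.
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def time_rounds(rounds=10, n=100000):
    """Return the wall-clock time of each round, in seconds."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        hot_loop(n)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    for number, seconds in enumerate(time_rounds()):
        print("round %d: %.6f s" % (number, seconds))
```

A script that exits after the first round or two never gets past the
expensive part, which is the situation a test suite is in.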

Thanks for bearing with me this far. This blog post was partly influenced
by accusations that we're doing dishonest PR by claiming that PyPy is always
fast. I don't think this is the case, and I hope I have clarified some of the
weak spots, both here and on the performance page.

EDIT: For what it's worth, I don't mention interfacing with C here. That's not because I think it's irrelevant; it simply did not quite fit with the other material in this blog post. Consider the list non-exhaustive.




  1. Is there any reason why you couldn't persist the jit traces to disk and run starting with parts of the code traced? This has always been a frustration to me about jits, it seems like every program run, good work is done and then thrown away on process exit.

  2. Yes, there is a very good reason. JIT code contains references to actual memory in the assembler which might come from various strange places. It's very hard to check whether the initialization produced the same kind of objects and where.

  3. So, if you had a strong initialization semantics that fed forward into the jitted trace, e.g. like a loader for jit traces, it would work, but the problem is that at the time of the trace, you can't backtrack out the correct initialization procedure for all of the objects, you just rely on their existence in an initialized state, which is the only state the tracer ever sees them in?

  4. I'm looking at http://doc.pypy.org/en/latest/translation.html

    It looks like some caching can be done for early steps like type inference by the annotator, would that make compilation faster?

  5. I know no one would like this, but when finally the port to python 3.0 is done rpython could use the new function annotations and start using only local type inference. I (and most python programmers) love dynamic typing in python, but global type inference is a huge pain both to programmers and to the translation process.

  6. Leonardo you're completely missing the point. The problem is not really lack of *syntax* for expressing this. We're already using decorators and it works just fine. The problem is how to do all the semantics of type system and be able to specify types. Python 3 does not help with it at all.

  7. I was talking about changing to local type inference like scala for example http://www.scala-lang.org/node/127. There are other languages that do this and even c++ has something like that with the auto keyword. For local type inference you would have to type every function argument but then you get a much faster translation, simpler work to do separate compilation and less cryptic error messages. Sorry to talk about the syntax thing and get you focused on that.

  8. @Leonardo we have that, it's called annenforceargs. Grep the source. The problem is we can't express all the subtleties of our type system like that and this is something we want to work on short-to-mid term.

  9. s/it's/its/ - Please :-) [http://eli.thegreenplace.net/2008/12/13/lets-lets-its-its/]