Saturday, October 8, 2011

PyPy's future directions

The PyPy project was long criticised for being insufficiently
transparent about the direction of its development. This changed
drastically with the introduction of the PyPy blog, Twitter stream,
etc., but I think there is still a gap between the achievements
reported in the blog and our ongoing plans.

This post is an attempt to bridge that gap. Note, however, that it is
not a roadmap -- merely a personal opinion about some interesting
directions currently being pursued in the PyPy project. It is not
intended to be exhaustive.

NumPy for PyPy

Even though people might not quite believe that we can deliver it,
there is an ongoing effort to bring NumPy to PyPy by reimplementing in
RPython the interface pieces originally written in C. A lot of work
has recently been done by Justin Peel and Alex Gaynor, and there have
been many smaller contributions from various volunteers.
This is very exciting, since PyPy already shines at numerics, which means that
with the full power of NumPy we could provide a good alternative to
Matlab and similar tools. We also have a vague plan to leverage platform-level vector
instructions like SSE to provide an even faster NumPy. Stay tuned!

Concurrent GC

There is a branch where Armin is experimenting with a simple
concurrent GC. It would transparently offload GC work to another
thread running in the background. Besides improving performance, this
should also remove long GC pauses, which is crucial for real-time
applications like games.
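
The pauses in question are easy to observe. As a rough illustration, the sketch below times each run of a garbage collector. It is CPython-specific: it relies on the `gc.callbacks` hook (CPython 3.3 and later), which reports only CPython's cyclic collector, a different design from PyPy's GC. The names `record_pause` and `make_cycles` are hypothetical helpers for this example.

```python
import gc
import time

pauses = []        # duration of each cyclic-GC collection, in seconds
_started = [0.0]   # start time of the collection currently in progress

def record_pause(phase, info):
    # CPython invokes gc.callbacks with phase "start" just before a
    # collection and "stop" right after it; info holds generation stats.
    if phase == "start":
        _started[0] = time.perf_counter()
    elif phase == "stop":
        pauses.append(time.perf_counter() - _started[0])

def make_cycles(n):
    # Allocate n two-list reference cycles: reference counting alone
    # cannot free them, so the cyclic collector has to step in.
    for _ in range(n):
        a, b = [], []
        a.append(b)
        b.append(a)

gc.callbacks.append(record_pause)
try:
    make_cycles(200_000)
    gc.collect()  # force one full collection on top of the automatic ones
finally:
    gc.callbacks.remove(record_pause)

print(f"{len(pauses)} collections, longest pause {max(pauses) * 1e3:.2f} ms")
```

Every one of those pauses stops the program's own thread; a concurrent collector aims to move most of that work elsewhere.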

JSON improvements

There is ongoing work to make JSON encoding fast. We aim to beat the C
extension in CPython's standard library by using only pure Python.
Stay tuned, we'll get there. :-)
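
The post does not say how the pure-Python encoder gets its speed. As a minimal sketch of what a JSON encoder written entirely in Python looks like, assuming the common approach of accumulating small string pieces in a list and joining them once at the end, consider the following. The function names `encode`, `_encode` and `_encode_str` are hypothetical, not PyPy's code; only `str` keys are supported, and NaN and infinity are not handled.

```python
import json  # used only to check the result against the standard library

# Characters that must be escaped inside a JSON string literal.
_ESCAPES = {
    '"': '\\"', '\\': '\\\\', '\n': '\\n', '\r': '\\r',
    '\t': '\\t', '\b': '\\b', '\f': '\\f',
}

def _encode_str(s, emit):
    emit('"')
    for ch in s:
        if ch in _ESCAPES:
            emit(_ESCAPES[ch])
        elif ch < ' ':
            emit('\\u%04x' % ord(ch))  # remaining control characters
        else:
            emit(ch)
    emit('"')

def _encode(obj, emit):
    # True/False are checked before int, since bool is a subclass of int.
    if obj is None:
        emit('null')
    elif obj is True:
        emit('true')
    elif obj is False:
        emit('false')
    elif isinstance(obj, str):
        _encode_str(obj, emit)
    elif isinstance(obj, int):
        emit(int.__repr__(obj))
    elif isinstance(obj, float):
        emit(float.__repr__(obj))  # NaN and infinity are not handled here
    elif isinstance(obj, (list, tuple)):
        emit('[')
        for i, item in enumerate(obj):
            if i:
                emit(', ')
            _encode(item, emit)
        emit(']')
    elif isinstance(obj, dict):
        emit('{')
        for i, (key, value) in enumerate(obj.items()):
            if i:
                emit(', ')
            if not isinstance(key, str):
                raise TypeError('this sketch only supports str keys')
            _encode_str(key, emit)
            emit(': ')
            _encode(value, emit)
        emit('}')
    else:
        raise TypeError('cannot encode %r' % type(obj).__name__)

def encode(obj):
    # Collect string chunks in a list and join them once at the end,
    # instead of building the result by repeated concatenation.
    chunks = []
    _encode(obj, chunks.append)
    return ''.join(chunks)

sample = {"name": "PyPy", "tags": ["fast", "jit"], "ratio": 1.5,
          "ok": True, "missing": None, "text": 'a"b\\c\n\x01'}
assert encode(sample) == json.dumps(sample, ensure_ascii=False)
print(encode(sample))
```

On CPython, code like this is far slower than the C accelerator; the bet described above is that PyPy's JIT can make this style of plain Python competitive.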

GIL removal

There is another branch and an advertised plan to remove the GIL using
software transactional memory. While implementing an STM inside a
dynamic language with lots of side effects is clearly a research
project, the prospects look promising. There is a risk that the
overhead per thread will end up fairly high, but we hope to avoid this
(the JIT may help here) -- and Armin Rigo is well known for
delivering the impossible.
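
PyPy's branch aims to apply transactions to the interpreter itself, so that ordinary Python code can run on several threads; the toy below is not that. It only illustrates the core idea of software transactional memory at the level of explicit variables, assuming a deliberately simple design: each transaction records the version of everything it reads, buffers its writes privately, and at commit time validates its reads under a single lock, retrying from scratch on conflict. `TVar`, `Transaction` and `atomically` are hypothetical names borrowed from Haskell's STM vocabulary.

```python
import threading

class TVar:
    """A transactional variable: a value plus a version bumped on each commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0

# One global lock guards snapshots and commits; transaction bodies run unlocked.
_commit_lock = threading.Lock()

class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed on first read
        self.writes = {}  # TVar -> tentative value, invisible to other threads

    def read(self, tvar):
        if tvar in self.writes:          # see our own tentative write
            return self.writes[tvar]
        with _commit_lock:               # snapshot value and version together
            version, value = tvar.version, tvar.value
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            # Validate: abort if anything we read has been committed to since.
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    return False
            # Publish all tentative writes at once.
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True

def atomically(body):
    """Run body(tx) optimistically, retrying until it commits without conflict."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result

def transfer(src, dst, amount):
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)

a, b = TVar(1000), TVar(1000)

def worker():
    for _ in range(1000):
        transfer(a, b, 1)
        transfer(b, a, 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.value, b.value)  # each worker's transfers cancel out: 1000 1000
```

The read and write sets and the validation pass are pure bookkeeping on top of the real work, which gives a feel for where per-thread overhead in an STM can come from.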

Minor improvements left and right

Under the radar, PyPy is constantly improving itself. Current trunk is
faster than 1.6 and has fewer bugs. We're always looking at bug
reports and improving the speed of various common constructs, such
as str % tuple, str.join(list), itertools or the filter
function. Individually, these are minor changes, but together they
speed up applications quite significantly from release to release.
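
One way to see such improvements for yourself is to time the constructs mentioned above under two interpreters, for example the PyPy 1.6 release and a trunk build. The snippet below is an illustrative sketch, not PyPy's benchmark suite: the inputs and repetition counts are arbitrary, and it is written to run on both Python 2.7 (the language version PyPy 1.6 implements) and Python 3.

```python
import itertools
import timeit

words = ["pypy", "", "jit"] * 300

# Hypothetical micro-benchmarks for the constructs named above.
cases = {
    "str % tuple": lambda: "%s-%d-%s" % ("a", 42, "b"),
    "str.join(list)": lambda: ",".join(words),
    "itertools.chain": lambda: list(itertools.chain(words, words)),
    "filter": lambda: list(filter(None, words)),
}

for name in sorted(cases):
    # Best of 5 runs, each calling the case 1000 times; lower is faster.
    seconds = min(timeit.repeat(cases[name], number=1000, repeat=5))
    print("%-16s %8.2f ms per 1000 calls" % (name, seconds * 1e3))
```

Taking the minimum of several runs reduces noise from the rest of the system; on a JIT it also gives the compiler a chance to warm up before the best run.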

All of the above is ongoing work. Most of it will probably work out
one day, but there is no fixed deadline. Still, it is exciting to see so
many different opportunities arising within the PyPy project.



  1. Wow! I tried to think of something else to say, but I think Wow! says it all.

  2. This looks awesome. Could you sneak in embedding multiple PyPy VMs into a single process?

  3. Wow indeed, this project is incredibly active and going ahead with enormous speed. This is very good news for the Python community. Keep up the good work!

  4. NumPy running in PyPy would be incredible. I can't wait to see it! In our company we develop a lot using NumPy, SciPy and scikit-learn.

  5. If your company wants to get it done faster, get in touch with me.

  6. On that note: I see the donation progress bar for Python 3 support in PyPy is up! How is the donation page for Numpy coming?

  7. Thanks, this is really great!

    p.s.: MongoDB looks like one of the popular use cases for PyPy (maybe because the brave people who aren't afraid to use MongoDB and PyPy in production or in new projects are the same people :-), but the problem is that its BSON encoder is much slower on PyPy than its CPython/C version. So it would be great if BSON came next after you finish with JSON. Thanks :-)

  8. I'm afraid BSON is not part of the standard library (JSON is).