Lost in JIT: Making things happen one unittest at a time

Hello.

There has been a lot of discussion about our (PyPy's) plan to
reimplement numpy. I would like to give a slightly more personal view
of how things are going, as well as some arguments about the approach in general.

Maybe let's start with a bit of background: the numpy effort in PyPy is the work
of volunteers who either need to extend it a little or find it fun to work on.
As of now it implements a very small subset of numpy (single-dimensional float
arrays with a couple of ufuncs, to be precise) and is relatively fast.

There are two obvious questions: (1) whether the approach of reimplementing numpy
can work at all and (2) whether it makes sense from a long-term perspective.

The first part I'll leave alone. I would think that we have enough street cred
that we can build things that work reasonably well, but hey, predicting the future
is hard.

To answer the second part, there are two dimensions to the problem. One is the
actual technical perspective in the short, mid and long term; the other is how
likely people are to spend time on it. It's actually pretty crucial that
both conditions are fulfilled. Creating something impossible is hard
(it has been tried before), while creating something that's tedious from
the start makes people not want to work on it. That's maybe less of a problem
in a corporate environment, but in open source it's completely crucial.

Technical part

Everyone seems to agree, with varying degrees of trust, that a JITted numpy
is the way to go in the long term. What can a JIT give you? Faster array
manipulations (even faster than numexpr) and, most importantly, faster
array iterations without hacks like using Cython or weave. This is the
thing you get for free when you implement numpy in RPython and don't
get at all when using cpyext. Note that the RPython version will still reuse
all the parts of numpy and scipy that are written in C, which is most of it.
The only part that needs rewriting is the interface part.
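
To make "array iteration" concrete, here is a minimal sketch of the kind of element-by-element loop meant above. It is only an illustration: it uses the standard library's array module as a stand-in for a numpy array, and fused_multiply_add is a made-up name, not part of numpy or PyPy.

```python
from array import array


def fused_multiply_add(a, b, c):
    """Compute a[i] + b[i] * c[i] for every index in a plain Python loop.

    Hypothetical helper for illustration only. With numpy one would
    normally write ``a + b * c``; the explicit loop is the style that
    usually gets pushed into Cython or weave on an interpreter without
    a JIT.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all inputs must have the same length")
    # Preallocate a float64 ('d') output buffer of the right size.
    out = array("d", [0.0]) * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i] * c[i]
    return out


if __name__ == "__main__":
    x = array("d", [1.0, 2.0])
    y = array("d", [3.0, 4.0])
    z = array("d", [5.0, 6.0])
    print(list(fused_multiply_add(x, y, z)))
```

The point is not this particular function but the shape of the code: an ordinary loop over array elements, written in Python, with no extension module in between.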

With cpyext:

short term: nothing works, segfaults

mid term: crappy slow numpy, 100% compatible

long term: ? I really don't know, start from scratch?

With reimplementing parts in RPython:

short term: nice, clean and fast small subset of numpy

mid term: relatively complete numpy implementation, not everything though,
super fast, reusing most parts of pure C or Fortran

long term: complete JITted numpy, hopefully achieving a better split
of numpy into those parts that are CPython-specific and those that actually implement functionality.

If you present it like that, there is really not that much choice, is there?

To be fair, there is one missing part, which is that the first approach
gives you a much better cpyext, but that's not my goal for now :)

Personal part

I plan to spend some time in the near future working on making numpy on PyPy
happen, without any other day job. If you have a project that requires numpy
and would greatly benefit from a fast Python interpreter with a fast
numpy, this is the right time to contact me. Money can make some APIs
appear faster than others :)

Cheers,

fijal

Wednesday, June 22, 2011
