Hello
Recent PyPys effort to bring NumPy and the associated fundraiser
caused a lot of discussion in the SciPy community regarding PyPy, NumPy,
SciPy and the future of numeric computing in Python.
There were discussions on the topic as well as various blog posts
from the SciPy community who addressed few issues. It seems there was a lot
of talking past each other and I would like to clarify on a few points here,
although this should be taken as my personal opinion on the subject.
So, let's start from the beginning. There are no plans for PyPy to
reimplement everything that's out there in RPython. That has been pointed
out from the beginning as a fallacy of our approach -- we simply don't plan
to do that. We agree that Python is a great glue language and we would like
to keep it that way. PyPy can nicely interface with C using ctypes with
a slightly worse story for C++ (even though there were experiments).
What we know by now is that CPython C API is not a very good glue for PyPy,
it's too tied to CPython and it prevents a lot of interesting optimizations
from happening. The contenders are a few with Cython being a favorite
for now, however for Cython to be usable we need to have a story for C++
(I know Cython does have a story but it's unclear how that would work with
the PyPy backend).
Which brings me to second point that while a lot of code in packages like
SciPy or matplotlib should be reusable in PyPy, it's probably not in
the current form. Either a lot of it has to move to Cython or some other
way of interfacing with C will come across. This should make it clear that
we want to interface with SciPy and reuse as much as possible.
Another recurring topic that seems to pop up is why we just don't reuse Cython
for NumPy instead of reimplementing everything. The problem is that we need
a robust array type with all the interface before we can start using Cython
for anything. Since we're going to implement it anyway, why not go all the way
and implement the full NumPy module? And that is the topic of the current
funding proposal is exactly that -- to provide full NumPy module. That
would be a very good start for integrating the full stack of SciPy and
matplotlib and all other libraries out there.
But also the trick is that a robust array module can go a long way alone.
It allows you to prototype a lot of algorithms on it's own and generally has
it's uses, without having to worry "but if I read all the elements from the
array it's going to be dog slow".
The last accusation is that we're trying to split the community. The answer is
simply no. We have a relatively good roadmap how to get to support what's out
there in scientific community and ideally support all people out there. This
will however take some time and the group of people that can run their
stuff on top of PyPy will be growing over time. This is indeed precisely what
is happening in other areas of python world -- more and more stuff run on PyPy
and people find it more and more interesting to try and to adapt their
own stuff to run.
To summarize, I don't really think there is that much of a gap between us
and SciPy people. We'll start small (by providing full NumPy implementation)
and then gradually move forward reusing as much as possible from the entire
stack.
Cheers,
fijal