Monday, November 16, 2015

Python C API, PyPy and the road into the future

tl;dr; We are willing to commercially support the CPython C API in PyPy, so if you want PyPy to support library X, get in touch with me at fijal@baroquesoftware.com. Read further for more details.

Python owes a whole lot of its success to the ease of integration with existing POSIX APIs and legacy applications. For the most part, this means calling C (or Fortran) in various ways, with the Python C API being the most commonly used one, either directly or with the help of tools like Cython.

Historically, calling C was one of the sore points of PyPy. We originally had no way to do it, then we implemented ctypes, which is loved by few, hated by many. Next we went ahead and implemented cffi, which is both a better way to call C and a much better supported one. cffi has been a stunning success, becoming one of the most commonly used PyPI packages, with over 1.5 million downloads every month and growing.
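
To make the two approaches concrete, here is a minimal sketch of calling the C library's strlen both ways. The helper names strlen_ctypes and strlen_cffi are made up for illustration, the code assumes a POSIX system where the C library can be loaded dynamically, and the cffi half assumes the cffi package is installed (it ships with PyPy, but on CPython it has to be installed separately).

```python
import ctypes
import ctypes.util


def strlen_ctypes(data: bytes) -> int:
    """Call strlen() from the C library through ctypes (stdlib)."""
    # find_library may return None on minimal systems; CDLL(None) then
    # falls back to the symbols already loaded into the process (POSIX only).
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    # Declare the C signature by hand: size_t strlen(const char *s);
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    return libc.strlen(data)


def strlen_cffi(data: bytes) -> int:
    """Call strlen() through cffi in ABI mode (assumes cffi is installed)."""
    from cffi import FFI

    ffi = FFI()
    # With cffi the declaration is written as plain C.
    ffi.cdef("size_t strlen(const char *s);")
    # dlopen(None) opens the standard C library on POSIX systems.
    libc = ffi.dlopen(None)
    return libc.strlen(data)


if __name__ == "__main__":
    print(strlen_ctypes(b"hello"))
    try:
        print(strlen_cffi(b"hello"))
    except ImportError:
        print("cffi is not installed; skipping the cffi variant")
```

Neither variant touches the CPython C API: both load a shared library at runtime and call into it, which is why they work on PyPy without any C extension module being compiled against the interpreter.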

This addressed the basic problem of PyPy -- "how do I call C?". It has not, however, addressed a very important problem -- "how do I integrate with legacy applications?". We were willing to take a step further, implementing a subset of the CPython C API, labelling it "forever beta", "incomplete" and "slow", just to support the legacy software.

Now, to address the growing need, we're willing to take another step. We discussed ways to make the CPython C API faster and more stable, with the promise of supporting it completely in the future. However, since this is mostly about supporting legacy codebases, my company, Baroque Software, and I want to structure it as a commercial contract.

This is an open bid to find funding, primarily through commercial partners, to implement speed and completeness improvements to the CPython C API and pool them together into PyPy. The end product will be available for free to everyone under the MIT license, but the funding will be structured as a commercial contract with all the benefits of one.

Get in touch with me, preferably by email at fijal@baroquesoftware.com, for more details regarding what can be done for what sort of budget.

Best regards,
Maciej Fijalkowski


Friday, March 20, 2015

HippyVM goes to Y Combinator and fails

tl;dr; We decided to go to Y Combinator with HippyVM, our high performance PHP implementation, and we did not get through after two rounds of interviews.

But I suppose there is more to it, so keep reading....

The whole story started with a small disaster, but let's start at the beginning. We applied to Y Combinator a bit haphazardly in 2012 for the 2013 summer batch, without expecting to get as far as the interview. The main reason for me to apply was precisely the "7 ideas" talk given by Paul Graham as a keynote at PyCon US, which mentioned the "sufficiently smart compiler". For those readers who don't know, PyPy is a fast Python compiler, but we also developed a language and a framework called RPython that's suitable for implementing fast dynamic languages, so we decided to check if it works for PHP, which is how HippyVM was born.

Well, I thought, we have a framework that's as close as it gets these days to a "sufficiently smart compiler", so I decided to submit -- why not. When we got the Y Combinator invitation, I was in Europe, away from my usual place of residence, South Africa. We got tickets, went to the airport and.... it turned out my visa for the US had been left at home. Note: the US tries not to admit that it keeps visa information in any sort of system, so if you get a new passport you are either allowed to use your old passport or you need to apply for a new visa. There is no way to transfer a visa to a new passport. Oh well -- fortunately for the most part we live in the 21st century and a few calls, DHLs and tickets later, I landed in San Francisco for a weekend with the interview scheduled for Saturday. That ended with 3 hours of being detained at SFO, since nobody flies to SF for a weekend carrying two sets of clothes, a laptop and a sleeping bag.

The idea

The idea was simple - we have enough expertise in compilers to do hard things and PHP is the most widely deployed dynamic language. Also, people are selling various "PHP optimizers" for money that don't really do much. We can do better. At the time HHVM was really not working very well and there was no other competition.

The actual interview

We actually ended up having two sets of interviews, which I think is pretty unusual. The first team was probably very confused, so they sent us down to the second one. The positive part of the interview was that people (at least those that use Python) generally recognize our work. The negative part was that 3 months is nowhere near enough to bring any tangible results in the compiler world. We required 1-2 years of work to provide anything tangible, and that does not fit into their model. Paul Buchheit asked us half-jokingly why all the cool compiler guys are from Europe (which is as far as I know not true, but Europe is overrepresented). I didn't have an answer at the time, but later that day it became blatantly obvious that it's all about long-term vision. Compilers take more time than Americans typically have in their sights. PyPy is 10 years old and it's "the new kid on the block". We were told we should be home cranking out code until we can get to something showable. I walked out of the interview pretty sure we would not get in.

The aftermath

Unsurprisingly, we didn't get in. We ended up having a very good one-day PyPy sprint in San Francisco. We do not fit in the model. Now this brings me to an interesting question, which is what Lars Bak told me -- there is no money in infrastructure like programming languages. Very few people are willing to invest in such companies, and the contenders these days are all Open Source without a decent funding model, or backed by a large corporation (Oracle, Google, Microsoft), or both. I have no idea how to go about sponsoring research like PyPy or building a business model around it. Despite bringing a lot of value to the system (and I don't mean just PyPy; also CPython, Ruby etc.), there does not seem to be a good way to build a business model.

There are good reasons why you want your infrastructure to be either Open Source or backed by a large stable entity, and I'm very much for that. The world is a better place than it was during the ColdFusion days and we're all better off. However, we're missing a business model where infrastructure people can get attention from VCs and a revenue model that somehow corresponds to the value they're bringing.

Post-mortem

HippyVM got a little funding at the beginning to get us to some sort of prototype. Within a bit over a year we got to a point where we were able to run MediaWiki with a significant speedup over Zend PHP. However, the HHVM team these days counts between 30 and 60 people (that's what I can guess from the photo) and HHVM is available for free. Sure, it's tied to Facebook, but it seems to be enough to deter any business in this area. We would not be able to outcompete HHVM by enough (usually enough is 2x faster) on real life workloads with a fraction of the team and a fraction of their funding, so we went on to improving PyPy. We did achieve most of what HHVM does at a small percentage of the cost, but the difficulties in funding generally caused the HippyVM project to stall.

What now?

I do consulting. Most of it is PyPy-related, so I'm pretty happy; however, I'm still trying to find a model where basic research and infrastructure work can provide revenue that is related to the amount of value it brings to companies. Ideas welcome :-)