Tuesday, November 5, 2013

Interview questions and real software engineering

Note: This post started as an exploration of interview questions; however, the problem space is much bigger than that. I'm trying to really understand what software engineering  is and why some people are much better at it than others.

As far as my experience goes, there seem to be four kinds of hiring strategies employed by various companies. They often come in combinations, with different values attached to different parts.

  • First comes the classic HR questionnaire that you have to fill out and/or answer in person. It's about personal values, why you want to work for a company (hint: "because I have no money" is not the right answer), etc.. This can essentially be replaced by a single question: "are you happy being a replaceable gear?". Tends to be mostly absent these days in pure-tech companies, but is still employed by large corps with IT departments, e.g. banks or defence contractors.
  • Second comes the "I know you already" kind. Very popular among startups and companies entrenched in Open Source - they essentially view your work on GitHub and/or know you already from conferences and it all boils down to an informal chat to establish whether you are the right match. If there is a technical interview, it's very informal and mostly revolves around technology choices and the rationale behind them. These companies also tend to have a decent way of exploring the common ground in the first month or week, so the decision to hire is not permanent.
  • Third is the classic knowledge-based questionnaire. They'll ask you unusual tips & tricks in the technology that you know (or you don't), like handling of **args in Python or whether can you do some magic JOIN in SQL. Usually these answers can easily be googled, and while the rationale is "if you know all these quirks you have spent a lot of time on this platform", it heavily favours people like me with a photographic memory for obscure details. This photographic memory usually brings no value whatsoever - I know a lot of crazy details about all kinds of platforms that are completely useless. If you want to impress the potential interviewer, put the BUGS section from a random manual into your fortune cookie each time you start a terminal.
  • Last, but not least, is the classic "Google-style" interview. It involves asking you questions like this one. You have a well-defined problem and you have to find the right solution. There are a few good solutions, but only one that's really correct. The problem space is very finite. Repeat for the whole day in stressful conditions, until you're tired (and way after that).

So once we have categorized these groups, let's have some opinions. First I would like to dismiss the classic HR one as not applicable to the software development I know. I can't imagine a good project where people are treated as replaceable gears. The third kind, and to some extent the fourth kind have some sort of correlation to real knowledge, but there is definitely no causation -- the only thing you can say is that a person who does not know a platform at all will not be able to answer questions about its quirks. We have something to work with, but the problem with it is that your company does not exist in a perfectly uniform vacuum. If everyone does these kinds of interviews, the candidates who know these quirks will have a lot more job offers than those who don't. At the end of the day, if you're the only company who does not do that, you can hire all the good people who don't have photographic memory and you win.

Now, let's sidestep a bit and talk about how software development really works in my opinion. The sequence looks vaguely like this:

  • step 0 - customer problem "I want to send messages to my peers"
  • step 1 - technical problem "we need to send messages between computers with browsers"
  • step 2 - technology and architecture choice "AJAX program, JS logic, Python in the backend, HTTP"
  • step 3 - implementation overview "this is the backend API, JS has to respond to this..."
  • step 4 - actual implementation (bunch of code, usually a lot)

Where each step corresponds to moving from the previous state to the next. Eg. step 1 means "understand what customer wants and explain it in technical terms".

Now, as a person working for a company as a software engineer, especially a large company you might not participate in steps 1 or 2. Working for a startup or a small company will require you to deal with it much more, so the interview style where you just chat makes a lot of sense. After all, this will be your primary skill.

The problem with the rest is that step 3 is the hard step these days, since we don't code on punch cards in assembler. However, most interviews will focus on step 4 -- which is not quite, but almost, purely mechanical. In my opinion, the difference is as follows:

  • step 4 covers a problem space within constraints that make it finite and explorable. An interview concentrating on this will cover how quickly you can navigate a finite space. I believe this step can be solved completely by better programming languages.
  • step 3 covers a space which can be considered infinite - the number of possible combinations is just too large. Most of actual software engineering revolves around "I need to find a simple solution for this formal, but vague problem. How do I constraint my infinite space so that it's possible?".

Note that even in a completely formal environment -- like implementing PyPy, which is a Python interpreter -- step 3 is far more important than step 4. You're presented with a piece of code to be optimized. You need to find an optimizer model that's simple enough and correct to optimize this piece of code. While the problem is formal, the potential space is infinite, so you need to constrain it in various ways first, before making any real progress. Once you have done that, the actual task of e.g. writing a tree-walking algorithm correctly is trivial.

Now, to summarize, I don't really know how to conduct technical interviews that cover the most important parts of software engineering. I think a lot of interviews focus on the wrong things and only vaguely resemble what is actually happening. That's why we generally go with "let's have a chat and we'll see as we go along", but maybe there are better ways. I don't think enough research has been done on the human ability to explore complex systems, but on the other hand I might just be highly ignorant of that topic. Please mail me interesting papers if you know some.

On a sidenote - we're hiring. If you want to work with smart people on challenging problems, get in touch. Contact info findable courtesy of google ;-)

Maciej Fijalkowski

Friday, December 7, 2012

Why I am no longer a voting member of the Python Software Foundation

I am from Europe. It's a place behind a big body of water from the United States, generally in the direction of the east. There are few key differencies, among other things, approach to democracy. We would typically have a large body of people making all important decisions (like a parliment) and a small body of people making more mundane decisions (like a government). In a typical scenario, the government has seriously less power than the parliment, however it's also much more agile, hence more suited for making quick decisions. A good example is budget - the government would create a budget that would then be voted by the parliment. As far as I understand, the idea is to not vote all the details, but instead create a budget that will be approved by the parliment.

The PSF is almost like this. There is a large body of people (PSF members) and a small body of people (PSF board). There is one crucial difference - the PSF members have power only on paper. The only voting that ever happens are for either broadening the powers of the board, voting for the board or for voting in new members. The board make all the actual decisions.

This is not to say that the board makes bad decisions - I seriously cannot pinpoint a single time where it did happen. I'm very happy with the board and with it's policies. However I don't feel I have any voting power. I perfectly understand why it is that way - the PSF members is a big group and even finding a way for everyone to vote in a reasonable manner would be a mission. As a European, I would think it's a mission worth trying though, but as of now I would stay as a non-voting member (also known as emeritus) and wait for the board to make a decision on everything.


Thursday, July 12, 2012

Call for a new open source economy model

DISCLAIMER: This post is incredibly self-serving. It only makes sense if you believe that open source is a cost-effective way of building software and if you believe my contributions to the PyPy project are beneficial to the ecosystem as a whole. If you would prefer me to go and "get a real job", you may as well stop reading here.

There is a lot of evidence that startup creation costs are plummeting. The most commonly mentioned factors are the cloud, which brings down hosting costs, Moore's law, which does the same, ubiquitous internet, platforms and open source.

Putting all the other things aside, I would like to concentrate on open source today. Not because it's the most important factor -- I don't have enough data to support that -- but because I'm an open source platform provider working on PyPy.

Open source is cost-efficient. As Alex points out, PyPy is operating on a fraction of the funds and manpower that Google is putting into V8 or Mozilla into {Trace,Jaeger,Spider}Monkey, yet you can list all three projects in the same sentence without hesitation. You would call them "optimizing dynamic language VMs". The same can be said about projects like GCC.

Open source is also people - there is typically one or a handful of individuals who "maintain" the project. Those people are employed in a variety of professions. In my experience they either work on their own or for corporations (and corporate interests often take precedence over open source software), have shitty jobs (which don't actually require you to do a lot of work) or scramble along like me or Armin Rigo.

Let me step back a bit and explain what do I do for a living. I work on NumPy, which has managed to generate slightly above $40,000 in donations so far. I do consulting about optimization under PyPy. I look for other jobs and do random stuff. I think I've been relatively lucky. Considering that I live in a relatively cheap place, I can dedicate roughly half of my time to other pieces of PyPy without too much trouble. That includes stuff that noone else cares about, like performance tools, buildbot maintenance, release management, making json faster etc., etc..

Now, the main problem for me with regards to this lifestyle is that you can only gather donations for "large" and "sellable" projects. How many people would actually donate to "reviews, documentation and random small performance improvements"? The other part is that predicting what will happen in the near future is always very hard for me. Will I be able to continue contributing to PyPy or will I need to find a "real job" at some point?

I believe we can come up with a solution that both creates a reasonable economy that makes working on open source a viable job and comes with relatively low overheads. Gittip and Kickstarter are recent additions to the table and I think both fit very well into some niches, although not particularly the one I'm talking about.

I might not have the solution, but I do have a few postulates about such an economical model:

  • It cannot be project-based (like kickstarter), in my opinion, it's much more efficient just to tell individuals "do what you want". In other words -- no strings attached. It would be quite a lot of admin to deliver each simple feature as a kickstarter project. This can be more in the shades of gray "do stuff on PyPy" is for example a possible project that's vague enough to make sense.
  • It must be democratic -- I don't think a government agency or any sort of intermediate panel should decide.
  • It should be possible for both corporations and individuals to donate. This is probably the major shortcoming of Gittip.
  • There should be a cap, so we don't end up with a Hollywood-ish system where the privileged few make lots of money while everyone is struggling. Personally, I would like to have a cap even before we achieve this sort of amount, at (say) 2/3 of what you could earn at a large company.
  • It might sound silly, but there can't be a requirement that a receipent must reside in the U.S. It might sound selfish, but this completely rules out Kickstarter for me.

The problem is that I don't really have any good solution -- can we make startups donate 5% of their future exit to fund individuals who work on open source with no strings attached? I heavily doubt it. Can we make VCs fund such work? The potential benefits are far beyond their event horizon, I fear. Can we make individuals donate enough money? I doubt it, but I would love to be proven wrong.

Yours, leaving more questions than answers,

Thursday, April 19, 2012

Call for a global Immigration Reform

I'm a technology nomad. We changed camels for high powered, fossil fuel burning
jets. I'm working from any place that has internet connection which is
typically within hundreds of meters from any physical location I happen to
be at. I create open source software that brings value to various people,
using mostly loose change and scraps from large corporations for a living.
It's not that much value, after all, who uses PyPy, but the important part
is the sign - it's a small, albeit positive change in the open source ecosystem
that in turn makes it cheaper to create software stacks which ends up in
young companies trying to make a dent in the universe. I'm a plumber fixing
your pipes, one of the many.

And I need a visa for that. I want to have a stamp in my passport that will
state all of the above and provide few clues as to what it actually means:

  • I will not stay in your country for very long.

  • The exact place does not matter at all to me - it's all one big internet.

  • I'll not seek employment at McDonalds and I have a pretty good track record,
    go read my bitbucket contributions.

  • Open Source is software you run into everyday - and this is also because
    of people like me.

And yet I'm failing. People running immigration are so detached from my reality
we don't even send postcards to each other. Every single border officer is
suspicious and completely confused as to why and how I do all of that.

How can we change it? How can we end the madness of pointless paperwork?



Tuesday, February 14, 2012

PyPy and its future challenges

Obviously I'm biased, but I think PyPy is progressing fairly well. However,
I would like to mention some areas where I think pypy is lagging ---
not living up to its promises or the design decisions simply didn't
turn out as good as we hoped for them. In a fairly arbitrary order:

  • Whole program type inference. This decision has been haunting
    separate compilation effort for a while. It's also one of the reasons
    why RPython errors are confusing and why the compilation time is so long.
    This is less of a concern for users, but more of a concern for developers
    and potential developers.

  • Memory impact. We never scientifically measured
    memory impact of PyPy on examples. There are reports of outrageous pypy
    memory usage, but they're usually very cryptic "my app uses 300M" and not
    really reported in a way that's reproducible for us. We simply have to start
    measuring memory impact on benchmarks. You can definitely help by providing
    us with reproducible examples (they don't have to be small, but they have
    to be open source).

The next group all are connected. The fundamental question is: What to do
in the situation where the JIT does not help? There are many causes, but,
in general, PyPy often is inferior to CPython for all of the examples.
A representative, difficult exammple is running tests. Ideally, for
perfect unit tests, each piece of code should be executed only once. There
are other examples, like short running scripts. It all can
be addressed by one or more of the following:

  • Slow runtime. Our runtime is slow. This is caused by a combination
    of using a higher
    level language than C and a relative immaturity compared to CPython. The
    former is at least partly a GCC problem. We emit code that does not look
    like hand-written C and GCC is doing worse job at optimizing it. A good
    example is operations on longs, which are about 2x slower than CPython's,
    partly because GCC is unable to effectively optimize code generated
    by PyPy's translator.

  • Too large JIT warmup time. This is again a combination of issues.
    Partly this is one of the design decisions of tracing on the metalevel,
    which takes more time, but partly this is an issue with our current
    implementation that can be addressed. It's also true that in some edge
    cases, like running large and complex programs with lots and lots
    of megamorphic call sites, we don't do a very good job tracing. Because
    a good example of this case is running PyPy's own test suite, I expect
    we will invest some work into this eventually.

  • Slow interpreter. This one is very similar to the slow runtime - it's
    a combination of using RPython and the fact that we did not spend much
    time optimizing it. Unlike the runtime, we might solve it by having an
    unoptimizing JIT or some other medium-level solution that would work good
    enough. There were some efforts invested, but, as usual, we lack enough
    manpower to proceed as rapidly as we would like.

Thanks for bearing with me this far. This blog post was partly influenced
by accusations that we're doing dishonest PR that PyPy is always fast. I don't
think this is the case and I hope I clarified some of the weak spots, both here
and on the performance page.

EDIT:For what is worth I don't mention interfacing with C here and that's not because I think it's not relevant, it's simply because it did not quite fit with other stuff in this blog post. Consider the list non-exhaustive



Wednesday, December 28, 2011

Personal view on PyPy's future


PyPy has seen some dramatic progress in the past year with each release
delivering roughly 30% speed improvements over the previous one. I think
with a roughly 3-month release cycle, this is a pretty good achievement and
you haven't seen all of it yet :) We have quite a few improvements in process,
with the current trunk already showing faster performance than 1.7, even
though only a month has passed.

It might be worth mentioning that PyPy has many facets and progress is done
by multiple people in various directions. Examples include:

  • STM work done by Armin, read details

  • NumPy speed improvements, done mostly by Alex Gaynor and myself

  • NumPy completeness, done by a lot of people, including myself, Alex, Matti
    Picus and some others, if you're interested in this progressing faster, feel
    free to donate towards NumPy on PyPy

  • Specialized object-instances, worked by Carl Friedrich Bolz and
    Lukas Diekmann and various other specialized types

  • ARM JIT backend, work by David Schneider and Armin

  • PPC JIT backend, work by Sven Hager and David Edelsohn, with help from
    David Schneider

  • Speed improvements, done by everyone

Overall this is about 30 commits daily -- quite a bit of activity.

Things I can do professionally that can help the community

There is however another aspect of PyPy development, which is things that are
interesting, would work but potentially are useful only to some, but are
out of the current focus of core developers. Personally, I'm working full
time on PyPy, even though I did not receive any compensation for it for the
last half a year. I hope to make working on PyPy my full-time job from
both the donation-based income like numpy donations and pypy consulting.
I promise not to spend all the numpy-related donations on surfing addiction :-)

Examples of what I can professionally do include, but are not limited to:

  • Faster json decoder. Fast json encoder was done and details can be found.

  • Finshing matplotlib hacks to support full matplotlib and scipy.

  • Make your favorite module X work on PyPy.

  • Make your favorite module X faster on PyPy.

Additionally, I recently started offering consulting services for PyPy,
please consult my website for details.

Maciej FijaƂkowski

Monday, December 12, 2011

Talent shortage? What talent shortage?

I've been recently reading a lot of articles like talent shortage in Austin
or even more ridiculous ideas, like an offshore office platform on the coast of
california to battle American immigration law.

This all sounds ridiculous to me, since I don't think there is any shortage of
talent and I know plenty of software developers who are more than capable and
will be willing to do work for you, provided few simple things on your side.
The crucial part is understanding that productivity differs vastly between
different people and sometimes people are productive not in a way that you
think they are. They also usually tend to have experience how to make
themselves productive.

Let me explain a bit where I come from. I'm one of the core developers of the
PyPy project and I consider myself relatively competent at least in several
fields of software engineering. My work on PyPy is not only technical -- there
is a lot of helping people, reviewing patches, communicating with
the team, whcih is entirely distributed across the world and timezones. The main
project contributors are or have been living in Germany, United States, Poland,
France, Sweden, South Africa, Italy and whatever I forgot. It does not really
matter. Personally, I claim PyPy has been a very successful project -- it works
reasonably well and it's a good way to speed up most of Python programs. Consult
the speed website for benchmark details. And on top of that there are
constantly very bright people coming and contributing in a non-trivial way,
even though PyPy has next to no money.

My experience with PyPy makes me quite competent in all-things-Python,
open source project management and understanding of nitty gritty details
of performance for the entire software stack, but also made me next to
unemployable for most companies. Let me explain few simple things you
can do to your company
that would make it a dream place to work for me and people like me. This list is
pretty personal, but I suppose the amount of flexibility is pretty universal
for a lot of smart geeks out there.

Cool technologies. PyPy is a really cool technology and I want to stay
bleeding edge with such stuff. That usually require at least two
working days a week to work on open source stuff. Your company probably
uses tons of open source, so why not contribute back? Your geeks would very
likely boost their productivity that way anyway, since most of the time
they will be working on tools they use for themselves.

Tooling. In PyPy we spend about 20-40% of our time on tooling, using
my very unscientific measurments. It's all, from buildbot maintenance,
testing tools to things like jitviewer. Overall it pays off, but it does
not show up in the near term productivity reports. Also, geeks love tools.

Telecommuting. Moving to a new place is hard for many people, especially
when you have to deal with all the immigration b*shit. Personally, I'm also
an outdoor person, so living in Cape Town is pretty ideal, I would not change
that to downtown New York for example. This is a deal breaker for many people,
but even if you don't know how to make people who are 100% telecommuting
productive, chances are they know and they already have years of experience
doing that. It also cuts a lot of costs and noone is thrilled spending
extra 2h a day driving somewhere.

The list of those three items was pretty much a dealbreaker for most of the
companies I chatted with over the past few years. There are companies who
hire remotely, but don't allow you to work on open source or the other way
around. Personally, if someone satisfies all three, I would be even willing
to have a serious paycut, just so I can live the way I like it. Living in
a cheaper place also simplifies a lot in this regard. I also don't fully
understand, why not. If today your company will start working with geeks that
way, it would be able to attract top talent, even if the end product of yours
is not very exciting. It's also a self-fulfilling prophecy -- chances are,
even if your product is not exciting, the work will be because you have
the best people on board. I would welcome some comments from
actual employers why they don't want to do all of that. If someone is
willing to do it, I'm also available for a chat, write to me at fijall at gmail
(yes, double l, gmail has 6-characters-minimum limit) or find me on twitter.