Making Python massively parallel (& burgers)

20170819 Burgers

I like burgers. I suspect I start most of my blog articles with a similar sentence. Most burgers are sufficiently large, such that a single burger will suffice for a meal. However, occasionally you get burger sliders, mini burgers of different flavours, which are also easier to share. It is an obvious point that a plate of burger sliders is likely to end up getting finished quicker than a single burger. A plate of sliders can be easily consumed by several people at once, unlike a single burger. One person cannot take multiple bites of a burger at the same time.


Why do I cite this example? We can draw a parallel (and that pun is very much intended) with solving computational problems in finance. Let’s say we want to backtest a trading strategy which trades multiple currency pairs. The simplest way to do this is with a “one burger” strategy. We simply do all the computations for the backtest currency by currency. This is also usually the slowest way. The “burger slider” strategy, would involve conducting a backtest for each currency pair simultaneously, and then aggregating the result. Unfortunately, Python has the global interpreter lock, which basically means that only one computation can be done at a time, which is in place even if you use Python’s threading library. It should be noted though if you’re largely waiting for IO processes to return, using the threading library can still be useful (or Python’s asyncio library). However, what if you really want get true parallelisation in Python, so you can run computations on more than one computing core at the same time (and get around the GIL)?


The simplest approach is to use Python’s multiprocessing library. This basically kicks off new Python processes, when run on Windows, which can be used for true parallel computation. The implementation of multiprocessing for Python in Linux uses forking, which ends up resulting in a noticeably quicker startup time. It’s also worth checking out other similar libraries such as multiprocess, which has fewer restrictions on the types of Python objects that can be run in parallel (basically related to the way it serialises Python objects). The pathos project, which uses multiprocess library, also allows you to cache the process pools. It also allows you to use other machines to run your computations. If you Cythonise your code, you can also release the GIL (Cython basically allows you to write Python-like code which generates C code which can be compiled).


Another approach is to use Celery, which is a distributed task queue, which allows you to use multiple workers for computation. In between your Celery process and your original Python process, sits a message broker, typically something like RabbitMQ, which sends messages, back and forth (rabbits, celery, etc, I really hope this was an intentional joke on their part). Celery can be set to run your tasks asynchronously (ie. in the background) or synchronously (so your process waits for the return). Setting up Celery is a bit more involved than any of the other solutions above, however, once you’ve installed all the components, I’ve found it fairly easy to use. Celery scales relatively easily, so you can put workers on multiple machines, by changing the configuration settings (I guess that’s the distributed bit!). As a result, it is possible to make your computations massively parallel! Instagram use Celery combined with RabbitMQ extensively to compute many of their computation tasks, which is proof enough for me, that it scales!


So yes, the GIL in Python can provide barriers for doing parallel computation, but there are many ways around it and it is possible to make Python massively parallel. The precise one you choose though will of course depend on the problem at hand.