Tips and tricks to speed up Python

20180203 Python

I can’t walk 5km that quickly. I could just blame it on my legs. Alternatively, I could try running, and hey presto, I can cover 5km a lot quicker! Python isn’t the fastest language out there. It’s certainly quite a bit slower than C++ or Java. If my Python code is slow, I could just blame it on Python in every case or, as with my walking analogy, I could try to “run” (i.e. think of ways I can speed up my code). In practice, what are the things you can try in order to “run” with your Python code?

If you are analysing financial markets, in particular at higher frequencies, you can end up with large quantities of data which need to be number crunched. When prototyping a trading strategy, we want to be able to code it up quickly, but at the same time, we don’t want the execution of the backtest to last ages. We need to bear in mind that there is often a trade-off between readability in code and optimisation. The more we optimise, the less easy it is to read and maintain that code.

I’m going through a few ideas which you can try to speed up your Python code. The list is of course not exhaustive, so it’s more of a starting point, based on stuff I have read, and also based on my own Python coding.


Before even starting to optimise your code, find where the bottlenecks are in your code


I find it quite useful to use a code profiler, which gives a breakdown of the execution time for each method and also how many times it is called. I use the PyCharm IDE, which has a built-in version in the Professional Edition. Or you can use the timeit module in Python (sometimes I find simply looking at a timestamped log can be useful too). The bottleneck in the code is not always where you might think it is. If you don’t identify the bottleneck correctly, you’ll end up wasting time optimising the wrong bit of code (even if somehow it might make you feel smarter to reduce the number of lines of code!). The famous computer scientist Donald Knuth had something to say about this:


“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
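If you don’t have PyCharm’s profiler to hand, the standard library can do a similar job. Below is a minimal sketch using cProfile and timeit. The two functions (sum_squares_loop and sum_squares_builtin) and the helper profile_report are made up purely for illustration; swap in whatever you suspect is your own bottleneck.

```python
import cProfile
import io
import pstats
import timeit


def sum_squares_loop(n):
    """Illustrative candidate: builds a list element by element, then sums it."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def sum_squares_builtin(n):
    """Same calculation, feeding a generator straight into sum()."""
    return sum(i * i for i in range(n))


def profile_report(func, *args, top=5):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time,
    including how many times each function was called.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_report(sum_squares_loop, 100_000)
print(report)

# timeit runs each callable `number` times and returns the total seconds taken.
loop_time = timeit.timeit(lambda: sum_squares_loop(100_000), number=20)
builtin_time = timeit.timeit(lambda: sum_squares_builtin(100_000), number=20)
print(f"loop: {loop_time:.3f}s, builtin: {builtin_time:.3f}s")
```

The timings will differ from machine to machine, so treat the numbers as a guide to where to look, rather than as a benchmark.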


Using for loops: Vectorise your code with NumPy or try Numba


If you are doing calculations element by element with for loops in Python, your code will tend to be very slow. If you can rewrite your code in a vectorised fashion and do the calculation in NumPy, it should be a lot faster (given that much of NumPy is implemented in faster C code). Vectorising code is something that Matlab users are likely to be familiar with. Numba, which compiles Python functions just in time, can also help to speed up calculations you’d usually use a for loop for. You can target either CPU or GPU with it. GPUs will help when you are doing very large computations.
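As a rough illustration of what vectorising means, here is a small sketch that calculates simple returns from a price series twice: once with a for loop and once with NumPy slicing. The function names and the synthetic price series are my own assumptions for the example, and it only covers the NumPy route (not Numba).

```python
import timeit

import numpy as np


def simple_returns_loop(prices):
    """Element-by-element version: one Python-level division per data point."""
    out = []
    for i in range(1, len(prices)):
        out.append(prices[i] / prices[i - 1] - 1.0)
    return out


def simple_returns_vectorised(prices):
    """Vectorised version: divide two shifted views of the same array."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0


# Synthetic, strictly positive price series (illustrative data only).
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.001, size=100_000)))

loop_result = simple_returns_loop(prices)
vector_result = simple_returns_vectorised(prices)

loop_time = timeit.timeit(lambda: simple_returns_loop(prices), number=5)
vector_time = timeit.timeit(lambda: simple_returns_vectorised(prices), number=5)
print(f"loop: {loop_time:.4f}s, vectorised: {vector_time:.4f}s")
```

Both versions give the same numbers. The vectorised one pushes the loop down into NumPy’s compiled code, which is where the speed up comes from.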


Using pandas? Try using NumPy for some calculations instead


Pandas is a great library in Python for working with time series. However, it can often be slower to do calculations on pandas based time series than on the underlying NumPy arrays which are embedded in pandas dataframes. Accessing the NumPy arrays directly (usually using .values) and doing calculations on them, or accessing their individual elements, can often be much faster. You do, however, lose some of the checking which pandas does behind the scenes (if there are, for example, NaNs in your time series). I don’t use NumPy in this way everywhere, just where there are particular bottlenecks that need to be cleared up. Pandas based code tends to be higher level and a bit easier to follow. I haven’t done benchmarks, but some distributions of Python have optimised versions of NumPy (for example Intel).
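The sketch below shows the idea on a tiny made-up series, including the NaN caveat. I’ve used .to_numpy() here, which newer versions of pandas offer alongside .values; the data and variable names are purely illustrative.

```python
import numpy as np
import pandas as pd

# A tiny daily time series with one missing value (illustrative data only).
index = pd.date_range("2018-01-01", periods=5, freq="D")
series = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], index=index)

# Pull out the underlying NumPy array once, then work on it directly.
values = series.to_numpy()

# pandas skips NaNs by default when summing...
pandas_total = series.sum()

# ...whereas a plain NumPy sum propagates the NaN,
numpy_total = values.sum()

# so on the raw array you have to ask for NaN handling explicitly.
nan_safe_total = np.nansum(values)

# Individual elements can be read straight from the array by position.
fourth_value = values[3]

print(pandas_total, numpy_total, nan_safe_total, fourth_value)
```

This is the trade-off in miniature: the array is leaner to work with, but you take on the job of handling missing data yourself.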


Use your computing cores


Modern computers tend to have quite a few cores, so you can run calculations in parallel. After vectorising code, you might want to think about whether you can make it run in parallel as a further speed up. There are quite a few libraries that you can use to accomplish this. Because of the GIL (global interpreter lock), only one Python thread can execute Python code at a time, even if it looks like they are running in parallel. This is OK when your code is IO bound (e.g. web scraping). However, if you want to run compute intensive calculations, you need to resort to other ways. The multiprocessing library, for example, kicks off new Python processes to run parallel calculations. There is overhead in starting an additional process and in passing data to it. You can also use libraries like Celery for distributed computation. Obviously, if we distribute the computation across many machines, the speed of the network becomes a further constraint on the communication between processes. If you mostly use dataframe like structures, Dask is another choice. One thing I would say, though, is that you don’t always need to resort to making code parallel as a first step.
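Here is a minimal sketch of splitting a compute heavy job across processes with the standard library’s multiprocessing.Pool. The workload (a sum of squares), the helper names and the choice of four workers are all assumptions for the sake of the example.

```python
from multiprocessing import Pool


def sum_of_squares(bounds):
    """CPU-bound piece of work on one chunk: sum i*i for start <= i < stop."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def make_chunks(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) pairs."""
    step = -(-n // parts)  # ceiling division
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def parallel_sum_of_squares(n, workers=4):
    """Farm each chunk out to a separate process and add up the partial sums."""
    chunks = make_chunks(n, workers)
    with Pool(processes=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


# The guard matters: worker processes may re-import this module on start-up,
# and without it they would try to create pools of their own.
if __name__ == "__main__":
    n = 2_000_000
    print(parallel_sum_of_squares(n))
```

For a job this small, the cost of starting the processes could easily outweigh the gain, which is exactly why it’s worth timing the serial version first.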


Plotting can be slow, but we can speed it up


Plotting visualisations can be slow. If you need to speed them up, consider using GPU accelerated libraries, like VisPy. Plotly also has the GPU accelerated Scattergl trace type, which can do a few chart styles faster. I do really like Plotly, but constructing the JSON necessary to feed into the plotter can sometimes be slow when there are lots of points. I’ve sometimes found that I’ve needed to create my own versions of the Plotly code and optimise it to make it quicker. For example, when constructing Plotly’s JSON based representation of candlesticks, with a few NumPy tricks I was able to make generating the JSON close to 10 times quicker. I’m sure there must be lots of examples within the Plotly code base (and indeed other plotting libraries) where this might be the case.


Use a cache and be careful about import statements


Using a cache for stuff which takes a long time to load and calculate can be helpful. For example, you can use Redis to cache dataframes (after some compression), rather than reading from disk all the time, whether from a database or a flat file. There is a limit of course to this. If your dataset is a terabyte in size, caching it all in memory isn’t going to work, no matter how smart your compression is (unless you have absolutely loads of memory). We need to be aware of memory constraints! Also, when you persist objects in memory, you avoid the overhead of instantiating them again. If you import loads of modules in a script, it will take time to initialise. Hence, it’s better to keep that object in memory, so you don’t keep having to run those imports. Also, only import what you need.


You can rewrite some of the Python using Cython


Cython allows you to write code in a superset of the Python language, which is translated into C and then compiled into machine code, which is typically faster than interpreted Python. You can also specify types to help speed it up. Cython also allows you to “release the GIL” for parallel code. My background is primarily in typed languages (especially Java), so I’m not particularly fussed about having to specify types (I kind of prefer having to define data types!). However, if you’ve only used Python, this might be something slightly different to what you’re used to. Again, the aim is not to Cythonise your whole Python code base, but to pick specific areas which will benefit.


If you really need something very fast… it needs experience too


You’ll have to spend some time optimising the Python code. Obviously, if you end up spending inordinate amounts of time doing it, maybe you should have just used C++ (or maybe Java) in the first place! The main point I do want to stress is to use your time effectively to clear up bottlenecks. Optimising code which doesn’t need to be optimised just makes it more difficult to read, without the benefit of speeding anything up. Re-read the Donald Knuth quote again too! The above ideas are just a small selection of things to try, but ultimately, getting more experience in coding is what will give you ideas of what precisely to do to clear an execution bottleneck.