Tips for maintaining Python code

I like making cookies. It’s kind of fun, but it takes a while, mostly because I’m very slow at weighing. I also insist on chopping up the chocolate chips myself, which takes me ages: I still haven’t got the knack of chopping quickly. There are of course pre-made chocolate chips available, but the quality isn’t generally as good. However, it’s all worth it when the cookies are piping hot, and it’s time to eat them! When it comes to coding, developing is a bit like cooking and eating all in one: it’s the fun bit (well, I enjoy it)! Yes, it takes time, but it’s pretty cool once you’ve developed a solution that folks find useful. However, developing isn’t the only side of coding.

Maintenance is the flip side of coding. I don’t think I’ve ever met a developer who relishes maintaining code as much as writing it from scratch. Of course, maintaining code is incredibly important, and is necessary to keep it running properly. Maintenance can end up costing far more than the original development, and costs can balloon if you’re not careful. It’s probably the main reason that using outside software products can be so much cheaper than developing stuff in-house. Indeed, it’s one of the selling points of Cuemacro’s tcapy FX TCA library for clients. We maintain the library, so you don’t have to incur the massive costs of doing it yourself!

In Python, there are many things you can do to make your code easier to maintain (some of these points also apply to many other programming languages, and none of them are specific to finance). The list below isn’t exhaustive, but hopefully it’s a start.

Use conda to maintain your Python dependencies where possible

The great thing with Python is that there are lots of libraries like Pandas, which means you don’t have to reinvent the wheel, and they get regularly updated to fix bugs and add new functionality. The problem is that updates can sometimes break your code. Furthermore, some libraries you use might break when you update heavily used dependencies like Pandas or NumPy. The conda package manager, which comes with the Anaconda distribution of Python, helps to manage version conflicts between libraries. Version conflicts aren’t unique to Python (remember DLL hell?). Conda can also install binaries for libraries more easily, and it makes it pretty easy to roll back your environment to how it was before any changes. The most common Python library installer, pip, is more easily impacted by version conflicts. Some libraries can only be installed with pip. Don’t install the same library with both pip and conda though, unless you really want to pull all your hair out.
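
Whichever installer you use, it helps to know when the installed versions have drifted from the ones your code was tested against. Below is a minimal sketch of such a check using the standard library’s importlib.metadata (Python 3.8 or later). The function names and the pinned version numbers are hypothetical and only there for illustration; swap in the pins from your own project.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_drift(pinned):
    """Compare pinned versions with what is actually installed.

    Returns {name: (pinned version, installed version)} for every mismatch.
    """
    installed = installed_versions(pinned)
    return {
        name: (wanted, installed[name])
        for name, wanted in pinned.items()
        if installed[name] != wanted
    }


if __name__ == "__main__":
    # Hypothetical pins: replace with the versions your project was tested on.
    pins = {"pandas": "1.5.3", "numpy": "1.24.4"}
    for name, (wanted, got) in find_drift(pins).items():
        print(f"{name}: pinned {wanted}, installed {got}")
```

If a check like this flags drift after an update, conda list --revisions shows the history of the environment and conda install --revision N takes it back to an earlier state.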

Use different Python environments

Conda allows you to create different Python environments with different versions of Python and libraries. I’d strongly recommend you create these environments, rather than installing everything in your base environment. For pip, you have something similar called virtualenv. You can clone an environment and make changes in the copy, to help with testing. You might also consider using Docker to wrap up the non-Python dependencies you use, like databases. Upgrades to non-Python dependencies can break your code too.
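
It’s easy to lose track of which environment a script is actually running in. Here is a small sketch, using only the standard library, that reports it. It assumes conda’s usual behaviour of setting the CONDA_DEFAULT_ENV variable when an environment is activated; the function names are my own and purely illustrative.

```python
import os
import sys


def describe_environment():
    """Summarise which Python environment the current interpreter belongs to."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "prefix": sys.prefix,
        # Inside a venv (or a recent virtualenv) sys.prefix points at the
        # environment, while sys.base_prefix still points at the base install.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Set by conda on activation; None means no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def running_in_conda_base(env=None):
    """True if the active conda environment is the base one."""
    env = describe_environment() if env is None else env
    return env["conda_env"] == "base"


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
    if running_in_conda_base():
        print("Warning: running in the conda base environment")
```

Calling something like this at the top of a long-running job is a cheap way to catch the case where everything was accidentally installed into, or run from, the base environment.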

Write test cases to find bugs

Libraries like pytest make it easy to write test harnesses for your code. Basically, you write small code snippets that exercise functionality in your code and automatically check that the output is as expected. You can run these tests whenever changes are made, either to your code or to any dependencies, which should help to pinpoint any bugs the changes introduce. Recently I started work on a full-scale upgrade of the libraries I use for tcapy (including later versions of pandas, and upgrading to Python 3.6 too). The test cases helped me to identify the bits of code I would have to change as a result of updating the Python dependencies. Make sure the tests don’t take ages to run, and that they cover a lot of the functionality of the library.
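
To give a flavour of what such tests look like, here is a minimal sketch. The slippage_bps function is a hypothetical helper made up for this example (it is not part of tcapy); the functions whose names start with test_ are the kind pytest discovers and runs automatically when you point it at the file.

```python
import math


def slippage_bps(executed_price, benchmark_price, side):
    """Signed execution cost in basis points; positive means worse than benchmark.

    Hypothetical helper for illustration only, not part of tcapy.
    """
    if benchmark_price <= 0:
        raise ValueError("benchmark_price must be positive")
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    direction = 1.0 if side == "buy" else -1.0
    return direction * (executed_price - benchmark_price) / benchmark_price * 1e4


def test_buy_above_benchmark_is_a_cost():
    # Paying 1.1011 against a 1.1000 benchmark costs roughly 10 basis points.
    assert math.isclose(slippage_bps(1.1011, 1.1000, "buy"), 10.0, abs_tol=1e-6)


def test_sell_above_benchmark_is_a_gain():
    # Selling above the benchmark is a negative cost, i.e. a gain.
    assert math.isclose(slippage_bps(1.1011, 1.1000, "sell"), -10.0, abs_tol=1e-6)


def test_rejects_non_positive_benchmark():
    try:
        slippage_bps(1.1, 0.0, "buy")
    except ValueError:
        return
    raise AssertionError("expected ValueError for a zero benchmark")
```

The tests use plain assert statements so the file has no imports beyond the standard library; in a real test suite you would more likely write the last test with pytest.raises. Each test is tiny and runs in well under a second, which is what keeps a large suite fast enough to run after every dependency upgrade.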

Do you really need to maintain this Python code library? Find external code solutions instead

If you can replace your Python library with something off the shelf, have a think about doing so. It might be the case that many years ago there were no similar libraries available, so you had to build your own. Even if you have to buy a replacement, you’ll probably save yourself a bundle of money on maintenance over the years to come. Before Pandas was developed, many funds had their own time series libraries. Since then, many have replaced their in-house time series libraries. Why? It’s a lot cheaper to maintain. Plus, loads of people know how to use Pandas, so it’s easier to hire developers who can use it straight away, rather than wasting time training them to use a proprietary library. It’s always more challenging to maintain a proprietary library, which has fewer people looking at the code, than a library with many firms looking at the source code. There’s a tendency for quants to want to build everything. This is costly, and wholly unnecessary for more generic functionality.

Think ahead about software design when developing

It’s possible to hack together a bit of code to work on a specific problem. The difficulty, however, is designing your software in a reusable fashion, thinking about things with your software engineering hat on. If you spend time properly designing your software library at the start, it’ll be easier to make changes in the future. It isn’t easy, but it is worth the effort. It will save you time when it comes to maintenance and adding new features. Also write comments in your code, and proper documentation. No one will remember why they wrote code in a specific way in six months’ time, so comments help.
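
As a small illustration of both points, here is a sketch of a function written with reuse in mind. The function itself, round_to_tick, is a hypothetical example invented for this post: the tick size is a parameter rather than a hard-coded constant, the docstring says what the function does, and the comment records why it was written this way.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_tick(price: str, tick_size: str = "0.00001") -> Decimal:
    """Round a quoted price to the nearest multiple of tick_size.

    Both arguments are strings so that no precision is lost before the
    values reach Decimal. tick_size is a parameter, not a hard-coded
    constant, so the same function can serve other instruments.
    """
    tick = Decimal(tick_size)
    if tick <= 0:
        raise ValueError("tick_size must be positive")

    # Why Decimal and not float: binary floats cannot represent most decimal
    # fractions exactly, so float rounding can land on the wrong tick for
    # prices sitting on a half-tick boundary.
    ticks = (Decimal(price) / tick).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return ticks * tick
```

For example, round_to_tick("1.2345", "0.005") gives Decimal("1.235"). The "why" comment is the part you’ll be grateful for in six months, when someone is tempted to simplify the function back to floats.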