Installing Python for financial data science

What’s the most important part of a holiday? Well, if you can’t get to the airport on time, it’s unlikely you’ll be going anywhere. In the same way, when it comes to learning Python, the most important step is installing Python in the first place!

So how do you install Python, together with all the libraries you might need for financial data science? You could avoid installing Python on your own machine altogether and use something like Google Colab, which is a Jupyter notebook already set up in the cloud, or Repl.it, which is another online Python option. Many online installations of Python won’t necessarily have all the libraries you want, so you’ll still have to install the missing ones yourself, using tools like pip or conda, which we discuss later.
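If you’re not sure what a hosted Python installation already has, a quick check can help. Here is a minimal sketch, using only the standard library, that lists which libraries can’t be imported and builds (but doesn’t run) a pip command for them. The wish list of pandas and numpy is just an illustration, and the sketch assumes the import name matches the name you’d give to pip, which isn’t true for every library.

```python
import importlib.util
import sys
from typing import List


def missing_packages(import_names: List[str]) -> List[str]:
    """Return the names that cannot be imported in the current interpreter."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


def pip_install_command(package_names: List[str]) -> List[str]:
    """Build (but do not run) a pip command tied to the current interpreter."""
    return [sys.executable, "-m", "pip", "install", *package_names]


if __name__ == "__main__":
    wanted = ["pandas", "numpy"]  # hypothetical wish list, swap in your own
    missing = missing_packages(wanted)
    if missing:
        print("Missing:", missing)
        print("Could run:", " ".join(pip_install_command(missing)))
    else:
        print("Everything in the wish list is importable.")
```

Calling pip through sys.executable means the library ends up in the same Python you’re actually running, which matters when there’s more than one Python on the machine.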

Installing Python on your own machine (or a cloud machine)

There are many Python distributions you could use (and I have to admit I haven’t used them all!), and you’ll often find one is already installed on your computer. Whenever I teach Python, I tend to recommend downloading and installing the Anaconda Python distribution. One reason is that the standard Anaconda installation already includes a lot of data science libraries. It also has a few GUI tools, although I tend to prefer command line tools, which I find much quicker.

pip vs conda to install Python packages

Whatever Python distribution you choose, it’s likely you’ll need to install additional libraries. The most common way to do this is with pip, which automatically downloads a library and installs it. However, let’s say you want to install a library like blpapi, which is Bloomberg’s open source API. If you use pip to install it, you might need to do a number of additional steps, such as setting environment variables.

By contrast, if you use conda, which is included with Anaconda Python, to install blpapi, it’ll do a lot of these additional steps for you, making it much easier. conda also seems a bit better than pip at handling version conflicts between libraries. Furthermore, conda makes it a bit easier to manage different Python environments (if you don’t use conda, you can do something similar with pip and virtualenv).

It’s generally a very good idea to create your own conda environments with the libraries you need, rather than using the standard “base” environment. You can have multiple conda environments for different Python versions, and if you end up breaking one, you can easily delete it (with the “base” environment, you may need to resort to reinstalling Anaconda). conda also allows you to easily export your conda environment to a YML file, which anyone can use later to recreate the exact same conda environment.
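To make that workflow concrete, here is a rough sketch of the conda commands involved in creating, exporting, recreating and deleting an environment, written as small Python helpers that only build the command lines. The environment name myfinance, the Python version 3.10 and the file name environment.yml are made up for illustration, so substitute your own.

```python
import shlex
from typing import List


def create_env_cmd(env_name: str, python_version: str,
                   packages: List[str]) -> List[str]:
    """Command to create a fresh environment with a chosen Python version."""
    return ["conda", "create", "--name", env_name, "--yes",
            f"python={python_version}", *packages]


def export_env_cmd(env_name: str, yml_path: str) -> List[str]:
    """Command to write an environment's contents to a YML file."""
    return ["conda", "env", "export", "--name", env_name, "--file", yml_path]


def recreate_env_cmd(yml_path: str) -> List[str]:
    """Command to rebuild an environment from a YML file."""
    return ["conda", "env", "create", "--file", yml_path]


def remove_env_cmd(env_name: str) -> List[str]:
    """Command to delete an environment, e.g. after breaking it."""
    return ["conda", "env", "remove", "--name", env_name]


if __name__ == "__main__":
    # Hypothetical names, printed rather than executed.
    for cmd in (
        create_env_cmd("myfinance", "3.10", ["pandas"]),
        export_env_cmd("myfinance", "environment.yml"),
        recreate_env_cmd("environment.yml"),
        remove_env_cmd("myfinance"),
    ):
        print(shlex.join(cmd))
```

The helpers return argument lists, so you can either paste the printed commands into a terminal or pass a list to subprocess.run yourself. One thing to bear in mind is that an exported YML file can include platform-specific details, so a file exported on one operating system might not recreate cleanly on another.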

mamba to the rescue

conda is a great tool. However, it can become very slow when it’s trying to work out which libraries to install to avoid possible version conflicts. Sometimes this takes just a few minutes, and sometimes very much longer. mamba is a drop-in replacement for conda. The key difference is that it’s very fast, and indeed, when I’ve used it to install new packages, it has seemed much faster than conda. I only recently started to use mamba, and I really wish I’d started much earlier, as it would have saved me a lot of time.
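Since mamba is a drop-in replacement, one way to take advantage of it is to use it when it’s available and fall back to conda otherwise. Below is a small sketch of that idea. It assumes the tools are on your PATH, and the environment name myfinance and the package pandas are only examples.

```python
import shutil
from typing import Callable, List, Optional


def pick_installer(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Prefer mamba if it is on the PATH, otherwise fall back to conda."""
    for tool in ("mamba", "conda"):
        if which(tool):
            return tool
    raise RuntimeError("Neither mamba nor conda was found on the PATH.")


def install_cmd(
    packages: List[str],
    env_name: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Build (but do not run) an install command for the chosen tool."""
    return [pick_installer(which), "install", "--name", env_name,
            "--yes", *packages]


if __name__ == "__main__":
    try:
        # Hypothetical environment and package names.
        print(" ".join(install_cmd(["pandas"], "myfinance")))
    except RuntimeError as err:
        print(err)
```

The which parameter is only there so the choice can be tested without either tool installed. In normal use you’d leave it at the default.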

Is there a premade conda environment for financial data science? Yes!

For my Python teaching, which revolves around workshops on Python for finance, as well as more alt data oriented workshops, I’ve spent quite a bit of time creating a conda environment for my students, which I’ve checked against my course code (especially for any versioning issues). It has a lot of the packages you might use when doing data science and finance. I know setting up an environment can be a real hassle, hence I’ve open sourced my py37class conda environments for Windows, Linux and Mac. You can find full instructions on how to download and install these on my GitHub teaching site.

My py37class conda environment includes the standard Anaconda packages like pandas (for time series), as well as my own open source libraries: finmarketpy for backtesting trading strategies, findatapy for downloading market data and chartpy for visualization. It also includes other finance specific libraries like blpapi.

Conclusion

We discussed the benefits of installing Anaconda Python, noting how it’s designed for data science. We also talked about the differences between pip and conda for installing Python packages and managing different Python environments, as well as the speed advantages of mamba. Lastly, I talked about my own open sourced conda environment, which makes it easy to get started with financial data science in Python.