Which language for quant analysis?

This week I put up a poll on Twitter asking how people do their quant analysis of markets. The choices were R, Python, Excel and Other. In the end, the poll got over 130 votes. I also got a few other responses including “soothsaying” (well, it worked for the Romans I guess, so might be worth a go!). Admittedly, despite the relatively healthy number of votes for the poll, we should note that the sample is likely to have some bias in it (I suspect my Twitter followers are skewed towards quant). Despite that caveat, it’s still worth thinking about the results. In first place was Excel, but only just, with 38% of the vote. Second was Python with 37%. Then there was R on 16% and last was Other with 8%.
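As a rough back-of-the-envelope check on how close “only just” really is, here is a minimal sketch of the sampling error on the Excel share. It assumes exactly 130 votes (the poll had a little over that) and treats the voters as a simple random sample, which, given the bias mentioned above, they almost certainly are not. The function name `margin_of_error` is just something I made up for illustration.

```python
import math

n_votes = 130        # assumption: the poll had "over 130" votes, so this is a floor
excel_share = 0.38   # Excel's share of the vote
python_share = 0.37  # Python's share of the vote


def margin_of_error(share, n, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(share * (1.0 - share) / n)


excel_margin = margin_of_error(excel_share, n_votes)
gap = excel_share - python_share

print(f"Excel share: {excel_share:.0%} +/- {excel_margin:.1%}")
print(f"Gap between Excel and Python: {gap:.0%}")
```

Under those assumptions the margin of error comes out at roughly 8 percentage points, far bigger than the 1 point gap between Excel and Python, so the top two are best read as a tie.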



First, it seems clear that a lot of people still use Excel. This isn’t much of a surprise. After all, Excel is fairly easy to use and the barrier to entry for doing some analysis in it is pretty low, once you’ve got the hang of a few Excel functions like VLOOKUP. However, as your problems get more complex, Excel isn’t always the best tool.


At the same time, you could argue, on the flipside, that a lot of people also *don’t* use Excel. Indeed, the number of Python quant users is nearly on a par with Excel. Many quant funds are now focused on Python and in some cases they have even open sourced some of their (non-trading!) code. For example, AHL open sourced their Arctic library for storing pandas dataframes efficiently in MongoDB. The open source time series library pandas started life as a project at AQR. The barrier to entry is still higher for Python (you need to code!), but in terms of quant analysis there are many nice libraries out there to help you. Notably, there’s pandas for time series analysis, which we’ve already mentioned, along with NumPy and SciPy for number crunching. Of course, you also have the open source Python libraries which I’ve written (!): chartpy for funky visualisations, findatapy for downloading market data and finmarketpy for simplifying the task of backtesting trading strategies. Sure, Python isn’t as fast as C++, but it’s quicker to code up stuff in it, and with computing power getting cheaper and more available (via the cloud), you can reduce that bottleneck to some extent. The time you spend coding is the most expensive!
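To give a flavour of what pandas and NumPy look like in practice, here is a minimal sketch of a typical bit of time series analysis: daily returns and a rolling annualised volatility. The price series is synthetic (randomly generated, not real market data), and the 20 day window and 252 trading days per year are simply common conventions I’ve assumed for the example.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices on business days (made-up data, not a real market)
rng = np.random.default_rng(seed=0)
dates = pd.bdate_range("2017-01-02", periods=250)
log_returns = rng.normal(loc=0.0, scale=0.01, size=len(dates))
prices = pd.Series(100.0 * np.exp(np.cumsum(log_returns)), index=dates, name="price")

# Daily percentage returns (the first value is NaN, as there is no prior price)
returns = prices.pct_change()

# 20 day rolling volatility, annualised assuming 252 trading days a year
rolling_vol = returns.rolling(window=20).std() * np.sqrt(252)

# Collect everything in one DataFrame, aligned on the date index
summary = pd.DataFrame({"price": prices, "return": returns, "vol_20d": rolling_vol})
print(summary.tail())
```

Doing the same thing in a spreadsheet would mean dragging formulas down a column; here each step is a single line that applies to the whole series and stays aligned on dates.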


Several years ago there simply weren’t as many Python libraries geared towards data analysis, which explains why R was more popular in the past. R is still pretty popular, and still boasts a greater number of libraries geared towards statistics. If you are into very cutting edge statistics, you’re more likely to find those types of libraries in R.


Whilst I have coded a bit in R, I’ve always found the syntax of Python easier and a bit more “software engineer-ish” (which is a totally made up word). Furthermore, Python seems better suited to designing larger scale systems. It also seems easier to do non-maths stuff with Python, like web servers, which can be an endpoint for displaying your analysis.


So what could the “other” category be? There’s Matlab, which still has a large following, and indeed there’s a lot of legacy Matlab code. Whilst it isn’t open source, Matlab works very nicely with many other languages like Java and Python, which should help to keep it going for many more years. It does face increasing competition from Python, though, which being open source has the obvious advantage of being cheaper (at least from a licensing point of view). I’m obviously a bit biased when it comes to languages… I prefer using Python! However, what my survey shows is that there is still a lot of room for the Python user base to grow in financial markets.


I’m hoping I can help to persuade folks of the benefits of Python over Excel, especially when it comes to dealing with larger, more complex financial problems. If you’re interested in hearing about how Python (and my open source libraries) can help improve your quant analysis of markets, drop me a message!