Plot millions of points in Python 30x quicker

20170218 Tunnel

I like numbers. However, given a choice between staring at a table full of numbers and a chart, I’ll probably choose the chart. Very often when looking at markets, it’s easier to read a chart than to stare at pages of numbers. The great thing about interactive charts is that we can explore a dataset more easily, zooming in to delve into a time series more thoroughly, if we have enough data. In Python there are lots of great charting libraries. I’ve written my own library, chartpy, which creates a common interface for plotting in plotly, matplotlib and bokeh. Simply change a single keyword to switch between the libraries (rather than having to learn the different APIs of each library).

 

Let’s take, for example, a plot made up of minute data for EUR/USD. Sometimes we might simply want the bigger picture; at other times we might want to zoom in to look at specific episodes on an intraday basis. If we’re looking at several years of data though, this means plotting lots of data: over 10 years this would result in over 3 million points. If you try to plot millions of points in matplotlib, it’ll take a long time to display. Furthermore, try zooming in on these complex plots and they’ll also take time to update. So what’s the solution if you want to display millions of points quickly in Python?
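As a rough sanity check on that "over 3 million points" figure, here is a back-of-the-envelope sketch. It assumes FX trades around the clock, 5 days a week, 52 weeks a year, and ignores holidays; those are simplifying assumptions rather than exact market hours.

```python
# Rough estimate of how many 1-minute points EUR/USD produces over 10 years.
# Assumptions (approximate): 24-hour trading, 5 days a week, 52 weeks a year.
MINUTES_PER_DAY = 24 * 60
TRADING_DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 52


def estimate_minute_points(years=10):
    """Return the approximate number of 1-minute points over `years` years."""
    return MINUTES_PER_DAY * TRADING_DAYS_PER_WEEK * WEEKS_PER_YEAR * years


if __name__ == "__main__":
    print(f"{estimate_minute_points():,} points")
```

Under these assumptions, a single currency pair comes to roughly 3.7 million points per decade, and that is before adding any other series to the same chart.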

 

Computers are obviously capable of displaying very complex graphics: just think of all the cutting-edge computer games, which use the powerful GPUs on the graphics cards sitting in our computers. Can we tap into similar technology to plot complicated market data? Yes, we can! One library which makes it possible is VisPy, which leverages the GPUs on modern graphics cards for creating graphics. I have written a wrapper for VisPy in my chartpy library, so users just need to change a single keyword to switch from matplotlib to VisPy. I’ve also written a code example, at chartpy code here, to show how to plot line charts in chartpy via VisPy. In my code example, I generate 5 random time series, each with 10 million points. The example first plots these time series via VisPy and then in matplotlib, so we can benchmark the time difference. It takes around 20 seconds to plot all these points with the VisPy wrapper. It’s then possible to zoom in pretty quickly, by pressing the shift key and rolling the mouse wheel. With matplotlib, it took around 6 minutes for the chart window to appear, and it was not until around the 10 minute mark that it actually displayed the lines, i.e. it was 30x quicker to plot in VisPy! Attempting to zoom in on the matplotlib charts took even longer. Matplotlib is a great library when you have a relatively small dataset, and it has lots of great features. It is also quite a mature library compared to VisPy. However, when it comes to plotting millions of points, VisPy seems to have the upper hand and is worth trying out. VisPy also has some very impressive features for doing very complex graphics and animations (I’m still trying to work out how I could use some of these for representing financial data though!).
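If you want to run a similar comparison yourself, the shape of the benchmark is simple enough to sketch. This is not the chartpy example mentioned above, just a minimal pure-Python stand-in: the helper names (`make_random_walks`, `time_call`, `speedup`) are hypothetical, and the workload being timed is a placeholder you would swap for your own plotting call in each engine.

```python
import random
import time
from itertools import accumulate


def make_random_walks(n_series, n_points, seed=0):
    """Generate n_series random walks, each with n_points values."""
    rng = random.Random(seed)
    return [
        list(accumulate(rng.gauss(0.0, 1.0) for _ in range(n_points)))
        for _ in range(n_series)
    ]


def time_call(fn, *args, **kwargs):
    """Return the wall-clock seconds taken by fn(*args, **kwargs)."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start


def speedup(slow_seconds, fast_seconds):
    """How many times quicker the fast run was than the slow one."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    # Small sizes so the sketch runs quickly; the post used 5 series of 10 million points.
    series = make_random_walks(n_series=5, n_points=1000)
    # Placeholder workload: replace with the plotting call for each engine.
    elapsed = time_call(lambda: [max(s) for s in series])
    print(f"placeholder workload took {elapsed:.6f}s")
    # Timings reported in the post: ~10 minutes (matplotlib) vs ~20 seconds (VisPy).
    print(f"speedup: {speedup(10 * 60, 20):.0f}x")
```

The last line is just the arithmetic behind the headline figure: 600 seconds divided by 20 seconds gives the 30x.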

 

Obviously, one way around this problem is to downsample the data, so there are fewer points to plot, for example plotting daily data rather than 1 minute data. However, then we can’t zoom in and explore the time series in as much detail. I ran the tests on my MacBook Pro laptop, which is around 4 years old, so I would expect much better results on a newer computer. Admittedly, my current implementation of the VisPy wrapper in chartpy isn’t perfect: for example, I haven’t yet figured out how to add date labels with VisPy. However, it’s at least a start, and the massive speedup VisPy offers seems promising. If I get enough interest, I’ll try to make the wrapper more fully featured! If you want to plot millions of points and don’t want to write millions of lines of code, it is possible to get the job done quickly in Python by leveraging your GPU with chartpy and VisPy: it just takes a single line of code to plot a line chart in chartpy with the VisPy wrapper.
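To make the downsampling trade-off mentioned above concrete, here is a minimal sketch that reduces 1 minute data to one point per day by keeping the last observation of each calendar day. The function name `downsample_to_daily` and the synthetic data are purely illustrative; in practice you would more likely use the resampling tools in your data library of choice.

```python
from datetime import datetime, timedelta


def downsample_to_daily(points):
    """Keep the last (timestamp, value) pair of each calendar day.

    `points` must be sorted by timestamp.
    """
    last_per_day = {}
    for timestamp, value in points:
        # Later points on the same date overwrite earlier ones.
        last_per_day[timestamp.date()] = (timestamp, value)
    return list(last_per_day.values())


if __name__ == "__main__":
    start = datetime(2017, 1, 2)
    # Three days of synthetic 1-minute data (the values are just a counter).
    minute_data = [
        (start + timedelta(minutes=i), float(i)) for i in range(3 * 24 * 60)
    ]
    daily = downsample_to_daily(minute_data)
    print(len(minute_data), "->", len(daily))
```

The plot gets far lighter, but everything that happened within each day is gone, which is exactly the detail we wanted to be able to zoom into.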

 

If you’re interested in contributing to chartpy or the other libraries which I’ve written for analysing markets (findatapy or finmarketpy), let me know. Or if you’re interested in using any of these libraries in your workflow, feel free to drop me a message.