Keeping data vendor APIs simple

Just imagine if every time you went into a shop, you had to change your clothes because each one had a different dress code. Say your grocer insisted on customers wearing only yellow, whilst your supermarket required red clothes only. OK, that’s a somewhat ridiculous example, which would never happen. In practice, we can wear whatever we want (within reason) and go into whichever shop we’d like.

But when it comes to downloading market data, we basically end up having to “change” repeatedly. Different data vendors will usually have different APIs to access their data, so if we want to integrate a new data vendor into our financial processes, we’ll usually have to contend with a new API.

One way to deal with this is to rewrite large parts of your financial analysis code to accommodate each new vendor API. Obviously, this approach is going to be pretty time consuming, and it won’t prevent you from having to do the same thing again later. Furthermore, there are many instances where you’ll need to use data from several different vendors at once.

Another approach is to decouple your data vendor APIs from your financial analysis code. In between, we can create a data vendor independent API. I’ve created my own open source version of this called findatapy (free to download from GitHub). It allows you to download market data from many different data vendors using the common findatapy API. Underneath, it handles all the complexity of calling the various data vendor APIs, which are likely to look very different from one another. It also handles the mapping between standardised tickers and each data vendor’s own tickers. When your analysis code calls findatapy, no matter what the data vendor, the API “looks” the same. There are also additional features underneath which might not be present in every data vendor API, such as the ability to thread requests and to cache requests in memory.

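To give a flavour of what this looks like in code, here is a minimal sketch along the lines of the examples in the findatapy documentation. The exact parameter names and the data sources available can differ between versions, so treat it as indicative rather than definitive.

```python
# Download daily FX data through findatapy's common API
# (sketch based on the findatapy docs; parameters may vary by version)
from findatapy.market import Market, MarketDataGenerator, MarketDataRequest

market = Market(market_data_generator=MarketDataGenerator())

md_request = MarketDataRequest(
    start_date='01 Jan 2020', finish_date='30 Jun 2020',
    category='fx', fields=['close'],
    data_source='quandl',          # the vendor is just a keyword
    tickers=['EURUSD'])            # standardised ticker, mapped internally

df = market.fetch_market(md_request)   # returns a pandas DataFrame
print(df.tail())
```

The analysis code only ever sees the DataFrame that comes back; the vendor specific calls, threading and caching all happen behind fetch_market.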

If you want to change to a different data vendor, you just need to change one keyword in findatapy, and this could be something you keep in a configuration file. You don’t need to rewrite all your higher level analysis code, which you’d have to do if you called data vendor APIs directly.

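As a concrete (and hypothetical) illustration of that single keyword, you could keep the vendor name in your own configuration file and feed it straight into the request. The config file and its keys below are my own invention, not part of findatapy itself.

```python
# Hypothetical config-driven vendor selection: swapping vendor means editing
# one value in data_config.json, while the analysis code stays the same
import json

from findatapy.market import Market, MarketDataGenerator, MarketDataRequest

with open('data_config.json') as f:     # e.g. {"data_source": "quandl"}
    config = json.load(f)

market = Market(market_data_generator=MarketDataGenerator())

md_request = MarketDataRequest(
    start_date='01 Jan 2020', finish_date='30 Jun 2020',
    category='fx', fields=['close'],
    data_source=config['data_source'],  # change 'quandl' to 'bloomberg' etc. here
    tickers=['EURUSD'])

df = market.fetch_market(md_request)
```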

Of course, I’m not claiming that findatapy is unique; I’m pretty sure most funds and banks have something like this internally. The idea of abstracting away complexity is clearly not a new concept. My key point is that even if you don’t use findatapy, it is worth thinking about ways of decoupling your lower level APIs from your higher level analytics code. Without a “findatapy” like library, you’ll basically be tied to one vendor, unless you’re willing to rewrite your code each time you want to change. Even if you stick with the same data vendor API and it changes, coping with that change will be easier with a “findatapy” like library in between.

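Even if you roll your own layer rather than using findatapy, the decoupling itself is straightforward. Here is a generic sketch of the pattern; every class and function name in it is hypothetical and only there to show the shape of the idea.

```python
# Generic sketch of a vendor independent data layer (hypothetical names)
from abc import ABC, abstractmethod

import pandas as pd


class MarketDataSource(ABC):
    """The only interface the analysis code depends on."""

    @abstractmethod
    def fetch(self, ticker: str, start: str, finish: str) -> pd.DataFrame:
        ...


class VendorAAdapter(MarketDataSource):
    def fetch(self, ticker, start, finish):
        # map the standardised ticker to vendor A's ticker and call its API
        raise NotImplementedError


class VendorBAdapter(MarketDataSource):
    def fetch(self, ticker, start, finish):
        # map the standardised ticker to vendor B's ticker and call its API
        raise NotImplementedError


def get_data_source(name: str) -> MarketDataSource:
    """One keyword (e.g. read from a config file) picks the adapter."""
    return {'vendor_a': VendorAAdapter, 'vendor_b': VendorBAdapter}[name]()


# The analysis code stays identical whichever adapter sits behind the interface
source = get_data_source('vendor_a')
# prices = source.fetch('EURUSD', '01 Jan 2020', '30 Jun 2020')
```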

Other use cases where we might be tempted to create a vendor independent layer include databases. Indeed, I’ve done this with my tcapy open source library for transaction cost analysis, which needs databases to store trade/order data and also market tick data. I’ve implemented adaptors so it works with many different SQL variants for trade/order data. For market data, it works with several databases suited to tick data, such as MongoDB/Arctic, kdb+/q and InfluxDB.

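The storage side can be decoupled in the same way: the analysis code asks for “the tick database” and a single keyword decides which engine answers. The sketch below is hypothetical and deliberately simplified; it is not tcapy’s actual internals, just the shape of such an adaptor layer.

```python
# Hypothetical sketch of a database independent layer for tick data
from abc import ABC, abstractmethod


class TickDatabase(ABC):
    @abstractmethod
    def write(self, ticker, df): ...

    @abstractmethod
    def read(self, ticker, start, finish): ...


class ArcticTickDatabase(TickDatabase):
    """Adapter around MongoDB/Arctic (implementation omitted)."""
    def write(self, ticker, df): raise NotImplementedError
    def read(self, ticker, start, finish): raise NotImplementedError


class InfluxTickDatabase(TickDatabase):
    """Adapter around InfluxDB (implementation omitted)."""
    def write(self, ticker, df): raise NotImplementedError
    def read(self, ticker, start, finish): raise NotImplementedError


def get_tick_database(engine: str) -> TickDatabase:
    # one keyword (e.g. from a config file) chooses the backend
    return {'arctic': ArcticTickDatabase, 'influxdb': InfluxTickDatabase}[engine]()
```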

In practice, we cannot always create a vendor independent layer for every use case. In some instances, it’ll simply be too complicated and costly, so it makes sense just to use a vendor’s API directly, even if there’s some lock in. This can also be the case where we rely on very specific vendor features. However, in the case of fetching/downloading data, having a vendor independent layer does make sense.