Gauging how useful alternative data is

20180420 Holland Park
Hands up if you’ve heard the words “alternative data” being used in quant circles recently? I can see a lot of hands up! I could ask the same question about machine learning and artificial intelligence, and probably get the same reaction. The crux of any good trading strategy is using data as an input and some analytical technique to coax out a signal. There are of course many common forms of data, most notably price data. However, even with price data there are many variants: daily closing data, minute data, high frequency data, indicative quotes and so on. Alternative data refers to all those datasets which aren’t so commonly used in financial markets and are not always in numerical form. Much of this data originally consists of text, images, videos etc. as opposed to the nicely structured time series which traders like to use. Vendors can often structure the data to make it easier for traders to consume.


Of these alternative datasets, perhaps the best known are the structured datasets consisting of machine readable news, produced by vendors such as Bloomberg, Thomson Reuters and RavenPack. Aside from that, there are also a lot of social media based datasets derived from sources like Twitter or StockTwits. There are also numerous vendors who sell data based on satellite photography, which could be of relevance to commodities traders. I could give many further examples, because there are many different types of alternative dataset.


Let’s say we’re a quant fund. We are likely to be approached by vendors on a regular basis touting their alternative datasets. There are simply too many alternative datasets to test each one, so we need to choose which ones to test. If you’re a quant fund, what should you bear in mind when investigating alternative data? Below, I’ve listed a few points which I think could help in this process. Some of them are applicable to the whole research process around developing trading strategies too.


Every vendor is convinced they have valuable data: what about giving quantitative examples?

Ok, it is not surprising that a data vendor will say this! However, it is much more pertinent to ask: have they got any quantitative basis for this belief? One way a data vendor can demonstrate this is to create a research paper which delves into the data in the same way a practitioner would, giving case studies of how the data could be used to generate alpha from the perspective of a practitioner. Ok, I am somewhat talking my book, given I’ve been commissioned to do several research papers for data vendors such as Bloomberg! However, I do believe that putting yourself in the place of a client (quant fund) gives a data company an advantage over competitors who simply say their data is valuable, without any statistical justification. Quant funds don’t necessarily need to follow the research paper, but it nevertheless gives them a starting point in their testing process, which should speed that process up.


Think about how you could use an alternative dataset first, before sitting at a computer

One way to assess the value of an alternative dataset (or indeed any dataset) is to run some statistical tests on it. At the simplest level, this could be looking at correlations between various markets and the dataset. This won’t necessarily tell you that there’s a trading signal, but it can give you some insights. Even before doing any analysis or sitting at a computer, I would try to brainstorm *how* an alternative dataset could be useful for your strategy. Could you use it to improve the forecasting of a certain variable? I’m not only talking about directly forecasting market prices; it could be an economic variable or earnings too. Could the alternative dataset be used to enhance an existing factor somehow?
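To make the very simplest of these checks concrete, here is a minimal sketch in Python using pandas. Everything in it is synthetic and purely illustrative: the “sentiment” series standing in for an alternative dataset, the returns series, and the relationship between them are all made up for the example. The point is the shape of the check, comparing a contemporaneous correlation with a lagged one, since only the latter hints at anything you could actually trade on.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins: a daily "sentiment" score (our hypothetical
# alternative dataset) and a market's daily returns
rng = np.random.default_rng(42)
n = 250  # roughly a year of business days

dates = pd.bdate_range("2017-01-02", periods=n)
sentiment = pd.Series(rng.normal(size=n), index=dates, name="sentiment")

# Construct returns partly driven by *yesterday's* sentiment plus noise,
# so the lagged check below has something genuine to find
returns = 0.3 * sentiment.shift(1) + rng.normal(size=n)
returns.name = "returns"

df = pd.concat([sentiment, returns], axis=1).dropna()

# Contemporaneous correlation: co-movement, but not tradable by itself
contemporaneous = df["sentiment"].corr(df["returns"])

# Lagged correlation: does yesterday's sentiment relate to today's return?
lagged = df["sentiment"].shift(1).corr(df["returns"])

print(f"contemporaneous corr: {contemporaneous:.3f}")
print(f"lagged corr:          {lagged:.3f}")
```

In this constructed example the lagged correlation comes out clearly positive while the contemporaneous one hovers near zero; with real alternative data you would of course run this across many markets and lags, and treat any single correlation number with suspicion.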


If you can’t really think of how you could use the dataset during your brainstorm, then it might be a struggle to know how to approach extracting a trading signal from it. You can of course throw many datasets together and attempt to fit a solution, without any sort of initial direction, but I’m not sure this approach is going to yield the best results. At least with an understanding of the direction of research you want to follow, you are already pruning your search space (and hopefully removing possibly spurious solutions).


After testing, you might not find any use for an alternative dataset (just like any other dataset)

It is inevitable that many attempts to create trading strategies will not yield good results. Many alternative datasets you test are not going to yield any signals during your test period. It’s of course difficult to know whether this is because you haven’t spent sufficient time on the dataset (and have missed a potential approach to analysing the data), or whether there are no signals at all. As a general rule of thumb, I suspect that the more time I spend on a dataset, the more likely it is that my results are just going to be data mining! It’s always tricky to know which trading research projects to allocate your time to, but it’s clearly very important, and I do think you tend to get better at answering this time allocation question over time. Of course, with some datasets you might have to spend an inordinate amount of time cleaning and structuring the data before you can do anything vaguely related to actually extracting a signal (yes, this is the step we all hate, but it is necessary).


Finding no trading signal a lot of the time, however, is inevitable, and you just have to accept that! Also, a lot of the time you can end up with a model which is highly correlated with existing factors you already trade, which isn’t that helpful. Indeed, within FX, if you are not careful you can end up creating carry or trend models over and over again: the inputs might initially seem different, but the resulting models end up being highly correlated! The ideal case is obviously finding a signal which exhibits little correlation with your existing framework. Just because a dataset is unusual doesn’t necessarily mean it has value. The flipside is that if you do find a signal in an alternative dataset, it is less likely that others have the same signal, and hence it could be quite profitable.
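This overlap check is easy to mechanise: compare the backtest returns of a candidate model against the returns of the factors you already run. The sketch below is a hypothetical illustration on synthetic data; the factor names (“carry”, “trend”) and both candidate models are invented for the example, with one candidate deliberately built to be a disguised trend model.

```python
import numpy as np
import pandas as pd

# Synthetic daily strategy returns for two existing factors
rng = np.random.default_rng(0)
n = 500
carry = pd.Series(rng.normal(size=n), name="carry")
trend = pd.Series(rng.normal(size=n), name="trend")

# A "new" model that is secretly mostly trend plus a little noise...
disguised = 0.9 * trend + 0.1 * rng.normal(size=n)
disguised.name = "candidate_a"

# ...and one with genuinely independent returns
independent = pd.Series(rng.normal(size=n), name="candidate_b")

rets = pd.concat([carry, trend, disguised, independent], axis=1)
corr = rets.corr()

# A high correlation with an existing factor means little
# diversification benefit, however novel the input data looked
print(corr.round(2))
```

Here `candidate_a` shows a correlation with `trend` close to 1, flagging it as a repackaged trend model, while `candidate_b` stays near zero against both factors. In practice you would run this on out-of-sample backtest returns, and decide on a correlation threshold above which a candidate adds too little to the existing book.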



Alternative data can offer interesting signals for trading, and improve your existing models. However, it still takes time and effort to sort through the various datasets to find something which “works”. As mentioned, being unusual in itself does not necessarily make a dataset valuable for trading! There is no free lunch, but you might find a very tasty lunch (hopefully a burger!) in the process of searching alternative data for trading signals.