An alternative data meal needs to be cooked

20171001 Magic

Cooking is perhaps not my forte. My culinary abilities do not extend much beyond extremely simple dishes, but I’m nevertheless trying to improve, with the help of Jamie Oliver (well, his new book “5 Ingredients”, which seems to have been written for people like myself). I can rustle up a simple omelette. I can bake cookies, with copious amounts of chocolate and to my secret recipe. I can cook a steak, and get it to the right temperature. What little I’ve learnt is that the ingredients are important, coupled of course with an ability to cook them properly. An expensive cut of steak, cooked to within an inch of being burnt (which I would somewhat euphemistically call super well done), results in what can only be termed an incredibly chewy experience and a fantastic waste of money. If you haven’t quite mastered cooking a steak, it is probably best to experiment on inexpensive cuts first, rather than splashing out on those fancy fillet steaks in the supermarket. This subject has relevance for the following discussion…

 

I spoke on a panel this week at Bloomberg’s Enterprise Data Summit and I very much enjoyed taking part in it. The panel was hosted by Jeremy Baksht from Bloomberg, with panellists Rado Lipus from Neudata and Maurizio Luisi from Unigestion, as well as myself (see photo above). We discussed a number of subjects within the realm of artificial intelligence and machine learning in markets, including the topic of alternative data. We had a lively discussion about the general issues involved with using machine learning within investing, and trying to make a connection between the inputs of such an algorithm and its outputs. On alternative data, there was debate about how it could be used in trading. More broadly, the discussion got me thinking about the issues involved with analysing alternative data, which I feel are analogous to my food example above. Good data (including alternative data) provides the ingredients for a trading strategy. However, how it is prepared and analysed is just as important. Below is my short checklist of things to look out for when using alternative data. It is by no means an exhaustive list of how alternative datasets and machine learning can be used in trading. However, it is a start, and I’d love to hear if you have anything you’d like to add to my list!

 

Value of different alternative datasets varies

Just because data is seemingly unusual does not necessarily make it valuable for trading purposes. Let’s say someone tells you (very unrealistically), “I’ve got a dataset which measures the heartbeat of every ant in the world”. This ridiculous example sounds exceptionally cool. However, how could this be useful from a trading point of view? That is somewhat questionable! What makes a dataset valuable are the types of insights it can provide about markets, which perhaps we cannot obtain from more common datasets. Can we also come up with a good rationale for using the data (what do we want to learn from this dataset)? Below is a list of sample ideas for areas where alternative data can help us when trading.

  • Can it help us measure something better than conventional data? For example, can it help us to predict the monthly US employment rate? Can we measure FX flows as they happen, rather than waiting for lagged positioning data?
  • Is it possible to structure (and process) the dataset at all in a reasonable fashion? We might have an interesting dataset, but we cannot think of how to structure it into something more useful. It could be too big. Or the time and expense of cleaning it might be too great to make it useful. We might be collecting masses of data from an exchange at a tick level, but latency issues might make it difficult for us to run heavily compute-intensive analysis on it in a reasonable timeframe to trade at a high frequency.
  • There are obviously numerous other examples..!

A good hypothesis will help to cut down our search space of ideas, and reduce the chance that we are simply data mining. I would draw a parallel between using alternative datasets and the use of truffles in cooking. It is not helpful to stuff truffles into every dish, just because they are expensive. Truffle ice cream does not sound that appetising to me. However, careful application of truffles in the right setting (with fries!) will make a dish better!

 

Structuring an unstructured alternative dataset is key

Often alternative datasets can be extremely big and consist of unstructured data, such as text. Cleaning such a dataset can be very time consuming. Once we have cleaned the dataset, we need to think of the best ways to structure it into more usable forms, a process which often involves some element of machine learning to classify the data. For example, we can use natural language processing to apply sentiment analysis to text, converting it into numerical form, which can be combined into time series, the key building blocks for trading strategies. We can also add other tags to help describe our dataset, such as identifying entities in text. Simply throwing flour in an oven on its own is not going to result in lovely bread! We need to combine it with other ingredients first, creating dough etc., before baking it in the oven.
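As a rough illustration of the text-to-time-series step, here is a minimal Python sketch. The tiny lexicon, its weights and the headlines are all invented for illustration; a real pipeline would use proper NLP tooling (and far richer sentiment models) rather than simple word counting.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sentiment lexicon: word -> weight (illustration only)
LEXICON = {"beat": 1, "strong": 1, "miss": -1, "weak": -1, "cuts": -1}

def sentiment_score(text):
    """Score a headline as the sum of lexicon weights of its words."""
    return sum(LEXICON.get(w, 0) for w in text.lower().split())

def daily_sentiment(headlines):
    """Aggregate (date, headline) pairs into an average daily sentiment series."""
    totals, counts = defaultdict(float), defaultdict(int)
    for d, text in headlines:
        totals[d] += sentiment_score(text)
        counts[d] += 1
    return {d: totals[d] / counts[d] for d in sorted(totals)}

# Made-up headlines, purely to show the structuring step
headlines = [
    (date(2017, 10, 2), "Company beat estimates with strong sales"),
    (date(2017, 10, 2), "Rival reports weak quarter"),
    (date(2017, 10, 3), "Company cuts guidance"),
]
series = daily_sentiment(headlines)
```

The output is a numerical daily series, which could then be treated like any other time series input to a trading rule.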

 

Ready-made structured alternative datasets can save us time

Often we can save time if we use ready-made structured alternative datasets produced by vendors. This enables quants to concentrate on creating a trading rule from a much smaller dataset, rather than spending a lot of time cleaning data and structuring it into a more usable form. Even if your company has the capacity to convert an unstructured dataset into a structured dataset, the key question is whether that time (and money) could be better spent elsewhere. Is it really worth trying to make my own chocolate for baking into my chocolate chip cookies, when I can already buy excellent quality, good value chocolate (which is likely to be much better than anything I can make in the time I have)?

 

Alternative datasets should complement not replace conventional datasets

Conventional datasets, like those based on price data, are still very important. We should not throw away all the factors we have used for trading for years, simply because we have alternative data. Instead, we can use alternative datasets to create new factors (or features, as they tend to be called in machine learning) to complement existing datasets. If you haven’t sufficiently explored using conventional datasets, explore that arena first before delving into alternative datasets. I would also extend this to analytical techniques. There’s no point jumping to some funky machine learning technique, if you haven’t tried simpler analytical techniques first (like linear regression). You win no prizes for unnecessarily complicating a trading strategy.
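To make the “try linear regression first” point concrete, here is a minimal sketch using closed-form simple OLS in plain Python. The feature values and returns below are made-up numbers, and the single-feature setup is purely illustrative, not a recommended model.

```python
def ols_slope_intercept(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical feature (e.g. a structured alternative data score)
# alongside hypothetical next-day returns, in percent
feature = [0.5, -1.0, 2.0, 0.0, 1.5]
returns = [0.4, -0.9, 1.8, 0.1, 1.4]

slope, intercept = ols_slope_intercept(feature, returns)
```

If a simple fit like this already shows no relationship between the feature and returns, that tells us something before we reach for anything fancier.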

 

Still need to do conventional quant work to create actual trading strategy

Dealing with alternative data generally involves more steps, because we need to go through additional processes like structuring. However, once that is done, a quant still needs to spend time creating additional indicators and a trading rule. From that perspective, we still need the traditional quant skillset, which is both technical and trading orientated, in particular having an understanding of how markets behave. It certainly doesn’t mean we can suddenly let the data do the talking and ignore the market (disguised data mining).