Ideas on pricing alternative data

We all know fizzy drinks are awful. They’re acidic, thanks to all the CO2. They are usually packed with copious amounts of sugar. Your teeth are unlikely to be fans of either carbonic acid or sugar. Fizzy drinks make it very easy to ingest large amounts of sugar in a few gulps, which isn’t good for you. Despite all this, I admit that I do drink Coca-Cola every so often, because it tastes good! There is something I’ve noticed in pretty much every vending machine I’ve used. Coca-Cola is nearly always more expensive than Diet Coke, whether it’s in the US or Europe. Usually it’s something like 10 cents more. People clearly like Coca-Cola more than Diet Coke. One explanation is that, whilst Diet Coke doesn’t have sugar, it also tastes absolutely awful (I realise I may well get critical comments for this incredibly valid observation). I assume that the additional demand for Coca-Cola means the manufacturer can charge more.


Fizzy drinks are commoditised items. Hence, when it comes to pricing, there are likely to be billions of transactions and a huge amount of data. There are many other similar drinks (Pepsi!), which keeps price levels in a range. It is likely that there are lots of seasonal patterns in the data to help understand how much to supply to the market. Financial markets are all about pricing. What do you think an asset should be worth? Too cheap, buy it. Too expensive, sell it. There’s a market because everyone has different views on what the price should be. One of the main drivers for trading in markets is the use of data. Some of this data is very commoditised, and there is a big market for it. However, when it comes to alternative data, it is extremely difficult to know how to price it.


The problem of data pricing (in particular for alternative data) is one that we are tackling in The Book of Alternative Data, which Alex Denev and I are currently writing. It will be published by Wiley in 2020 (and can already be pre-ordered on Amazon). Whilst the topic is far too big to go into much detail here (and you’ll have to wait for the book to learn more), it is possible to make a few observations.


For alternative data, the number of transactions is a lot lower than for commoditised datasets. Furthermore, the datasets are all very different. On the one hand you might have datasets like satellite data, on the other you might have news data. In general the number of vendors producing each type of dataset is much smaller, and their offerings can also vary significantly. Hence, it is difficult to get market-based pricing when all the datasets are quite different and transactions are sparse, although we might try to use whatever transaction prices we do have as a very rough proxy. If we think our dataset is worth 10mm USD, and the most a fund has ever paid for a dataset is 100k USD, good luck with that! A lot of the transactions are likely to be bilateral as well, where pricing is not public, making price discovery more difficult. It is like trying to understand the price of something like NOK/SEK based on the moves of something totally unrelated, like BRL/CLP, and at the same time having little or no price data for either.


Different buyers are likely to place a different value on the same data. For some buyers, a dataset might be super useful, yet for others it is totally irrelevant. If I trade commodities like wheat and corn, a dataset which lets me predict Apple earnings is likely to be irrelevant compared to datasets which give me an idea of crop yields for wheat and corn. A quant fund is less excited by a dataset which can only be used for one company than by one that covers thousands of tickers. By contrast, a discretionary fund which has substantial holdings in that one company is likely to be much more interested. The extent to which a fund can monetise a dataset will vary considerably. Unsurprisingly, this will impact how much they are willing to pay for it.


The pricing of alternative data is a tricky problem to solve. Keep an eye out for the extensive discussion in our book about this. We delve into the problem in a lot of detail, with a lot of suggestions on how to do it effectively. In the meantime, if you have any ideas on the subject, feel free to drop me a message!