How much to pay for alternative data?

20180107 What should data cost

How much is something worth? The simplest definition is the price someone is willing to pay for it. Is a Leonardo da Vinci painting worth four hundred and fifty million dollars? Someone was willing to pay that much for it (and another party was willing to sell it to them for that amount). Such a market is of course hugely illiquid. We don’t see da Vinci paintings being bought and sold on a daily basis. This is not the market for EUR/USD! Let us consider a totally different asset, a company. What is the “capital” which makes up a company? (A disclaimer here: I’m not an equity guy, so if you are, I suspect the next few sentences are going to seem overly simplified, and they are not meant to be an exhaustive list.)


We can break down the capital of a company into several parts:

  • Fixed capital – The buildings, factories and actual items owned by a company.
  • Brand capital – The value of the company brand and the relationships it has with clients.
  • Liquid capital – Liquid assets held by the company, such as cash, equities and bonds.
  • Human capital – The folks who work there and give the company its value.
  • Analytical capital – This comprises the intellectual property developed by the company, which can consist, for example, of software, specialised manufacturing techniques etc.
  • Data capital – Data owned by the company.


I’m not going to attempt to discuss all of these. Instead I’m going to concentrate on the data capital part. Unfortunately, I can’t give an easy formula for this, as it is a difficult problem. However, we can at least think about the factors which influence how much data is worth. Every corporate possesses a certain amount of data. Often this is collected as a result of its daily business. For tech companies such as Google, the value of their data is huge and they have obviously been adept at utilising their datasets in many novel ways. The sheer amount of data they have collected is a massive asset: the users of Google’s search engine are essentially given access to it for free, in exchange for populating the dataset.


Let us instead think of more traditional companies, whose primary business is not tech. Take a delivery company for example. They might track all their deliveries through the day. They provide this data to their clients so they can see when their parcel will arrive. They can also use the data to optimise the routing of their deliveries to save delivery time and cut costs. Indeed, many delivery companies do this and it can save large companies hundreds of millions of dollars. Given that they are already spending a lot to collect this so-called “exhaust data”, we might ask: are there other, more novel ways they can monetise their dataset? Potentially, they could sell their data externally. Obviously, they can’t sell their whole dataset to just anyone (e.g. selling a detailed dataset to a competitor might not be the best idea!). Furthermore, they need to take into account the legal considerations of selling the data to “data takers”. In many cases, the data will need to be anonymised in some way, and also aggregated, before it can be used externally. One group of buyers of alternative data in recent years has been quant hedge funds, and increasingly discretionary traders too. They already have access to lots of traditional market data, but are keen on accessing more unusual datasets, so-called “alternative data”, which can give them an edge in understanding markets.
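
To make the anonymise-and-aggregate step a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the record layout, the sample deliveries, the function name `aggregate_deliveries` and the `MIN_CUSTOMERS` threshold are all made up, and a real data provider would take its rules from legal and compliance rather than from a blog post. The idea is simply to drop customer identifiers, publish totals per region and day, and suppress any group that is too small to hide an individual client.

```python
from collections import defaultdict

# Hypothetical raw delivery records: (customer_id, region, date, parcels).
RAW_DELIVERIES = [
    ("cust-001", "London", "2018-01-02", 3),
    ("cust-002", "London", "2018-01-02", 1),
    ("cust-003", "London", "2018-01-02", 5),
    ("cust-004", "Leeds", "2018-01-02", 2),
    ("cust-001", "London", "2018-01-03", 4),
    ("cust-005", "London", "2018-01-03", 2),
    ("cust-006", "London", "2018-01-03", 1),
]

# Assumed threshold, purely illustrative.
MIN_CUSTOMERS = 3


def aggregate_deliveries(records, min_customers=MIN_CUSTOMERS):
    """Return parcel totals per (region, date) with customer ids removed.

    Groups with fewer than `min_customers` distinct customers are
    suppressed, so that no small group of clients can be singled out.
    """
    parcels = defaultdict(int)
    customers = defaultdict(set)
    for customer_id, region, date, count in records:
        key = (region, date)
        parcels[key] += count
        customers[key].add(customer_id)
    return {
        key: total
        for key, total in parcels.items()
        if len(customers[key]) >= min_customers
    }
```

With the sample records above, the two London days survive aggregation, whilst the single Leeds delivery is suppressed because only one customer sits behind it.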


I’ve spent many years looking at datasets which can be considered alternative, trying to understand how they can be used to trade markets. The term “alternative data” was somehow coined much later though! Most recently I did a project for Bloomberg, to examine how their machine readable news can be used to trade FX (get paper here). News data is probably one of the most common alternative datasets. You also have social media data, data collected from satellite images, mobile phone tracking data etc. There is a very long list of alternative datasets derived from many sources! At Cuemacro we’ve also created several exciting datasets, including one which quantifies the sentiment of the Fed from the various speeches and statements they release. One question which always crops up in discussions in this space, both from vendors and from hedge funds, is what the price of this data should be. For traditional market data, price discovery can be easy, simply because it has been around for a long time and there are many vendors essentially selling very similar price data.


However, for alternative data, there are often fewer vendors for any one type of dataset. In some cases, there might be only a handful or just one. Vendors (understandably!) think their alternative datasets are all very valuable. On the flip side, for a hedge fund, uniqueness by itself doesn’t mean they should pay millions for every alternative dataset. There are several metrics to consider. One of the most important is whether the dataset enhances the alpha of the hedge fund. If there’s a big difference, then a hedge fund will be willing to pay a lot for it. If not, no matter how unique it might be, they won’t. This will also differ between funds: some might find an alternative dataset incredibly valuable and others simply won’t value it as much. It also depends on your asset class. A dataset which is great for FX might not be so good for single stock equities, and vice versa. Some might be easier to use in a purely systematic framework, whilst others could be more amenable to usage by discretionary macro traders. There are so many different ways to number crunch alternative datasets that different funds might come to different conclusions, given the limited time they have to evaluate a dataset. Time is of course a commodity, and typically a fund will allocate at most a few weeks to analysing a dataset, to see if it is worth buying. Hedge funds get approached about many different datasets on a regular basis. Hence, it can be resource intensive to evaluate them (and it is often not possible to go through every single dataset).
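
One simple way a fund might frame the “does it enhance alpha?” question is to compare the risk-adjusted returns of a strategy with and without the dataset. The sketch below is only that, a sketch: the return series are invented numbers, the function names are my own, the risk-free rate is assumed to be zero and the Sharpe ratio is just one of many metrics a fund could choose. A proper evaluation would use far longer histories and guard against overfitting.

```python
import math
import statistics

# Hypothetical daily strategy returns, made up for illustration only.
BASELINE_RETURNS = [0.001, -0.002, 0.0015, -0.001, 0.002, -0.0005]
WITH_DATASET_RETURNS = [0.0015, -0.001, 0.002, -0.0005, 0.002, 0.0]


def annualised_sharpe(returns, periods_per_year=252):
    """Annualised Sharpe ratio of periodic returns (risk-free rate assumed zero)."""
    mean = statistics.mean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    if stdev == 0:
        raise ValueError("returns have zero volatility")
    return mean / stdev * math.sqrt(periods_per_year)


def sharpe_uplift(baseline_returns, enhanced_returns, periods_per_year=252):
    """Difference in annualised Sharpe ratio attributable to the extra dataset."""
    return (
        annualised_sharpe(enhanced_returns, periods_per_year)
        - annualised_sharpe(baseline_returns, periods_per_year)
    )


if __name__ == "__main__":
    uplift = sharpe_uplift(BASELINE_RETURNS, WITH_DATASET_RETURNS)
    print(f"Sharpe uplift from the dataset: {uplift:.2f}")
```

A large positive uplift would support paying more for the dataset, whilst an uplift close to zero suggests that, however unique the data is, it is not adding much for that particular fund.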


There is also the question of how the data can be used in trading strategies. If it can only be traded in very low capacity strategies, then the value of the data will decline if it is sold to many different accounts, since the alpha edge might not be as significant in these instances. The size of the fund is also going to be a consideration. If a large number of traders have access to the dataset, it will likely be priced at a higher level. The versatility of the data can also be a factor in pricing. If it is a very large dataset with many potential uses, it will be easier to justify a higher cost. Data quality and history are very important. If there is no history, it can be difficult to evaluate the data. If the data quality is poor (e.g. lots of missing values and unclean values), it will also take extra time to pre-process (and perhaps it won’t be worth it in the end). These are all just basic guidelines, and there are many other factors which need to be considered, but these are just a few which come to mind.
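
The points on history and data quality are easy to check mechanically before committing weeks of research time. As a rough illustration, the Python sketch below reports how much history a dataset has and what fraction of its values are missing. The sample observations, the `(date, value)` layout and the function name `quality_report` are all hypothetical, and real datasets will need many more checks (outliers, revisions, timestamp accuracy and so on).

```python
from datetime import date

# Hypothetical dataset: (observation date, value); None marks a missing value.
SAMPLE_OBSERVATIONS = [
    (date(2018, 1, 1), 10.2),
    (date(2018, 1, 2), None),
    (date(2018, 1, 3), 10.9),
    (date(2018, 1, 4), 11.1),
]


def quality_report(observations):
    """Summarise length of history and share of missing values."""
    if not observations:
        raise ValueError("no observations")
    dates = sorted(d for d, _ in observations)
    missing = sum(1 for _, value in observations if value is None)
    return {
        "history_days": (dates[-1] - dates[0]).days,
        "n_observations": len(observations),
        "missing_fraction": missing / len(observations),
    }


if __name__ == "__main__":
    print(quality_report(SAMPLE_OBSERVATIONS))
```

A short history or a high missing fraction doesn’t automatically rule a dataset out, but it is a signal that evaluation and pre-processing will take longer, which feeds into what a buyer is prepared to pay.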


One other major source of data which could be useful for hedge funds and asset managers is of course their sell side counterparties. Sell side firms sit on massive datasets. They are not only market makers, but also “data makers”! Their market making desks have information about all the trades they have done with their counterparties. With MiFID II, they also need to record quotes that have been requested (there is also the question of transaction reporting, which is included in MiFID II). They have all the reports generated by their research desks, e-mail reports sent by their trading desks to clients etc. Obviously, any sort of data monetisation would need to be done in a way which respects client privacy. Typically, this would mean that data would need to be aggregated before being distributed.


There are no “correct” answers for gauging a price for alternative data and it is a difficult problem to solve. In particular, its value will vary from trader to trader. However, if we have an understanding of the factors which buyers and sellers look at, it can give us a better idea of the general direction. In the coming years, we are going to see many more corporates monetising their data by selling it to external parties, such as hedge funds. I can imagine a significant amount of revenue for corporates coming from selling their data. Maybe in a decade, we’ll think of banks not only as providers of liquidity but also as data companies. I also think more traditional data companies will increasingly be looking towards providing alternative datasets to their clients. I suspect we might see a bit of a blur between “data takers” and “data makers” too.