There are many points which crop up during presentations about data. Usually, there’s some discussion about how cleaning data takes 80% of the time and only 20% of the time is dedicated to the “fun” bit of analytics. Then there’ll be some of the attributes associated with big data, such as the Vs. Perhaps what gets discussed most is trying to understand what the value of data is. Here, however, there are no easy answers. It’s a subject which Alex Denev and I will be addressing at length in The Book of Alternative Data, which we’re currently coauthoring.
The key point when trying to understand the value of data (and alternative data) is that its value can differ depending on who is using it. Let’s take, for example, web scraped data. If we have a dataset made up of a large number of web scraped pages, it is difficult to say that this has no “value”. However, the value we can extract from it is likely to differ significantly and depends on our business model. If we take a firm like Google, they’ve obviously been able to utilise web content to create searches, and then monetise it through adverts. A financial data firm might be able to leverage web content to create alternative datasets for funds. A hedge fund can then monetise that through trading. We can see that the data source (web content) was the same in all cases, but the way it is monetised (and the scale at which it can be monetised) differs substantially and depends on the business model of a firm.
Firms might have difficulty monetising their data, and instead seek to use external vendors to help them gain insights from it. A fund, for example, might want to use its trade data to understand its transaction costs. One choice is to build transaction cost analysis (TCA) analytics in house (or preferably utilise Cuemacro’s tcapy Python library to do this!) and run the analytics on its trade data on its own servers.
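To give a flavour of what the very simplest in-house TCA calculation might look like, here is a minimal Python sketch which computes slippage against a benchmark price. It does not use tcapy’s API. The Trade class, its field names, the choice of the mid price at trade time as the benchmark and the sample trades are all hypothetical and purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    side: int             # +1 for a buy, -1 for a sell
    notional: float       # trade size
    exec_price: float     # price at which the trade was filled
    benchmark_mid: float  # assumed benchmark: mid price at trade time


def slippage_bp(trade: Trade) -> float:
    """Signed slippage versus the benchmark in basis points.

    Negative values are a cost: buying above the mid or selling below it.
    """
    diff = trade.exec_price - trade.benchmark_mid
    return -trade.side * diff / trade.benchmark_mid * 1e4


def notional_weighted_slippage_bp(trades: list[Trade]) -> float:
    """Average slippage in basis points, weighted by trade notional."""
    total_notional = sum(t.notional for t in trades)
    weighted = sum(slippage_bp(t) * t.notional for t in trades)
    return weighted / total_notional


# Made-up trades, purely for illustration
trades = [
    Trade(side=1, notional=1_000_000, exec_price=1.1002, benchmark_mid=1.1000),
    Trade(side=-1, notional=2_000_000, exec_price=1.0999, benchmark_mid=1.1000),
]

for t in trades:
    print(f"side={t.side:+d} slippage={slippage_bp(t):.2f}bp")

print(f"weighted average={notional_weighted_slippage_bp(trades):.2f}bp")
```

A real TCA setup would go a lot further, with other benchmarks, market impact and breakdowns by venue or broker, but even a metric as basic as this starts to show what a fund’s trade history can reveal.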
The other choice is to send all their trade history externally and use a service from a vendor. In return, they get back analysis of their trade data. Obviously, in this transaction, regardless of whether they have written a cheque to an external service or not, they are always “paying” somehow. A fund’s trading data history is in itself valuable, and by allowing an external vendor access to it they are “paying” the vendor. In particular, a vendor can use this trade data to calculate results to give to other clients (provided of course it is suitably aggregated, and the original client has consented).
Even if it can’t necessarily monetise every aspect of its data itself, a firm really needs to understand the value of that data. Furthermore, if a firm’s data is simply seen as “free”, a firm is unlikely to reap the value of it. If data is defined as being “free” and without value, it is unlikely to be taken seriously, as no one wants to spend money on the storage and upkeep of internal datasets, or on analysing the data.
Firms cannot do everything with their data. We can’t all be Google. However, they can do something! Firms at least need to quantify what their data is worth. This is an important step in taking advantage of the opportunities and benefits that their own internal datasets may hold.