Value from your internal data

Going on holiday is fun! As the cliché goes, there are so many places to see. In practice, however, we all end up going to relatively similar places, probably because they are easier to get to. If you’re visiting a place your friends have been to, my guess is you’ll ask them for tips about good burger joints there (ok, maybe that’s just me). You might also check TripAdvisor, although it isn’t always exhaustive and sometimes throws up odd recommendations. And when you visit, you might simply stumble upon a good restaurant which doesn’t show up on TripAdvisor, because it is largely frequented by locals.


The key point is that you are likely to aggregate information from a number of sources, some online and others offline. Relying upon just one won’t give you a complete picture. Yes, the internet does have a lot of information about travel, but a personal recommendation from someone you trust often carries more weight than countless TripAdvisor posts. Furthermore, when a number of different information sources point to the same view, it gives you more confidence.


This idea of combining internal data with external data is particularly important when making investment decisions in financial markets. External data often takes the form of market data and, increasingly, alternative datasets which might give you an edge. However, what is often lost is that your internal data can have considerable value, provided you take the time to look at it. This internal data might consist of internal research reports and e-mails. It can also consist of data which has been curated from external sources, but isn’t being used effectively. For many buy side firms, this includes the multitude of sell side research reports they subscribe to. In practice, it is unlikely anyone has time to read them all, hence a systematic approach might help to extract more alpha from them.


I recently attended a talk by Peter Hafez, from RavenPack, where he discussed a research project undertaken for a buy side firm. RavenPack essentially used natural language processing to structure a large amount of the firm’s internal text. They applied network analysis to try to map the most influential researchers within the firm, and tagged articles with metrics like sentiment. They later created equities trading strategies based upon this structured dataset.
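
To make that kind of pipeline a little more concrete, here is a toy Python sketch of the two steps mentioned above: tagging documents with a sentiment score and ranking researchers by influence. To be clear, this is not RavenPack’s method, which wasn’t described in that level of detail. The word lists, the notes, the author names and the "who references whom" pairs are all made up for illustration. I’ve also swapped in very simple stand-ins: a word-count lexicon for sentiment, and in-degree (how many distinct colleagues reference you) as a crude proxy for influence.

```python
from collections import Counter

# Hypothetical word lists; a real system would use a proper NLP model.
POSITIVE = {"upgrade", "beat", "strong", "growth"}
NEGATIVE = {"downgrade", "miss", "weak", "decline"}


def sentiment_score(text):
    """Return (positive - negative) / matched words, in [-1, 1].

    Returns 0.0 when no lexicon word is found.
    """
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def influence_ranking(references):
    """Rank authors by the number of distinct colleagues who reference them.

    `references` is an iterable of (source_author, referenced_author) pairs.
    Duplicate pairs and self-references are ignored.
    """
    edges = {(src, dst) for src, dst in references if src != dst}
    in_degree = Counter(dst for _, dst in edges)
    # Sort by in-degree (descending), then by name for a stable order.
    return sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up internal notes and reference pairs, purely for illustration.
notes = [
    {"author": "alice", "ticker": "XYZ",
     "text": "Strong growth, we expect an upgrade."},
    {"author": "bob", "ticker": "XYZ",
     "text": "Weak quarter; earnings miss and decline in margins."},
]
references = [("bob", "alice"), ("carol", "alice"),
              ("alice", "bob"), ("bob", "alice")]

# Attach a sentiment tag to each note, giving a small structured dataset.
tagged = [{**note, "sentiment": sentiment_score(note["text"])} for note in notes]
ranking = influence_ranking(references)

for row in tagged:
    print(row["author"], row["ticker"], row["sentiment"])
print(ranking)
```

In a real setting you would probably weight each note’s sentiment by its author’s influence before building a signal, but how best to do that is an open design choice, not something taken from the talk.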


Hafez found that, typically, the information content of internally curated data at this fund decayed much more slowly than that of external news content, where the alpha fell away much more quickly. This seems intuitive, given that externally available news is likely to be mined by a large number of market participants. Internal data, by contrast, is not widely disseminated.


In practice, I doubt that many buy side firms are systematically trying to monetise the text data they generate as an “exhaust” in this way. However, what Hafez’s research shows is that they might be missing out if they don’t! If you are interested in understanding how to catalogue and monetise your internal datasets, let me know. Often half the battle is simply knowing what data you actually collect, and it’s easy to forget about data which is simply lying there unused. It’s time we started to monetise our internal datasets, to help reinforce insights gained from external data.