Your data first, then external

It’s always tempting to look further afield when travelling. I’m not suggesting that we shouldn’t travel further afield; it’s just that we often neglect what is sitting right on our doorstep. Over the past year, many of us have spent far more time at home, or in the area where we live, than we ordinarily would. I’ve discovered all sorts of parks and walks nearby which, I must admit, I hadn’t really bothered exploring until recently.


It’s the same in financial markets. It’s always tempting to search for something new to improve your investing, whether that’s new technology or new data. Everyone loves a buzzword, and a nice buzzword is easier to sell, whether you are a vendor or trying to sell an idea internally. I’m not saying that new technologies or ideas are bad. Alternative data is one example of a new idea that has come to financial markets, and I do think that many investors who can successfully grapple with it can see an improvement in their process.


At the same time, with all the focus on shiny new things, it is easy to overlook what you already have. For large sell side firms and asset managers, one thing they already have in abundance is internal data, or at least the ability to collect data internally, which they can combine with external data from vendors. Augmenting external data with unique internal data could result in better signals. For example, a buy side firm may have access to streaming prices from its liquidity providers. Does it collect this data? In many cases, probably not, because of a lack of technology infrastructure. However, if it did collect this data, it could better understand its liquidity providers and the situations where some offer better liquidity than others. As a rule of thumb, always collect more data than you think you will need! Of course, there are costs associated with collecting data, but these are coming down all the time.
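To make this a bit more concrete, here is a minimal Python sketch of what a firm might do once it has collected streaming quotes. It assumes a hypothetical record schema (timestamp, provider, bid, ask) and made-up sample quotes, and it measures liquidity purely by bid/ask spread; a fuller analysis would also look at things like quoted size and how often quotes are rejected.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Quote:
    # One streamed quote from a liquidity provider (hypothetical schema)
    timestamp: str
    provider: str
    bid: float
    ask: float

    @property
    def spread_bp(self) -> float:
        # Bid/ask spread in basis points of the mid price
        mid = (self.bid + self.ask) / 2
        return (self.ask - self.bid) / mid * 1e4


def average_spread_by_provider(quotes):
    """Mean spread (bp) quoted by each provider."""
    spreads = defaultdict(list)
    for q in quotes:
        spreads[q.provider].append(q.spread_bp)
    return {provider: mean(values) for provider, values in spreads.items()}


def best_quote_counts(quotes):
    """How often each provider had the tightest spread at a given timestamp."""
    by_time = defaultdict(list)
    for q in quotes:
        by_time[q.timestamp].append(q)
    counts = defaultdict(int)
    for snapshot in by_time.values():
        # min() keeps the first provider in the event of a tie
        best = min(snapshot, key=lambda q: q.spread_bp)
        counts[best.provider] += 1
    return dict(counts)


# Made-up EURUSD quotes purely for illustration
sample_quotes = [
    Quote("09:00:00", "BrokerA", 1.1000, 1.1002),
    Quote("09:00:00", "BrokerB", 1.1000, 1.1001),
    Quote("09:00:01", "BrokerA", 1.1001, 1.1002),
    Quote("09:00:01", "BrokerB", 1.1000, 1.1003),
]

if __name__ == "__main__":
    print(average_spread_by_provider(sample_quotes))
    print(best_quote_counts(sample_quotes))
```

Grouping the same statistics by hour of day, or by volatility regime, would be one way to pick out the situations where particular providers tend to quote tighter than others.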


Ok, so there’s data that firms could theoretically collect, but often don’t. Then there is data that they do collect, but which is not used much. For a buy side firm, this could include broker reports. Is every single broker report read by someone? Probably not, given the huge volume of broker reports produced daily. Instead, this dataset could be structured with NLP: we could compute a sentiment score for each report and tag it with its associated tickers. We could then use this sentiment data as an input into our investing decisions.
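As a rough illustration of that NLP step, the Python sketch below tags each report with tickers by matching company aliases and scores sentiment with a tiny word list. The ticker aliases, word lists and sample reports are all hypothetical; in practice you would more likely use a finance-specific lexicon such as Loughran-McDonald or a trained language model, together with proper entity recognition rather than exact string matching.

```python
import re
from collections import defaultdict

# Hypothetical mapping of tickers to the names that might appear in a report
TICKER_ALIASES = {
    "AAPL": ["Apple", "AAPL"],
    "MSFT": ["Microsoft", "MSFT"],
}

# Toy sentiment word lists, for illustration only
POSITIVE = {"upgrade", "beat", "strong", "outperform", "growth"}
NEGATIVE = {"downgrade", "miss", "weak", "underperform", "decline"}


def tag_tickers(text):
    """Return tickers whose aliases appear as whole words (case-sensitive match)."""
    tags = set()
    for ticker, aliases in TICKER_ALIASES.items():
        if any(re.search(rf"\b{re.escape(alias)}\b", text) for alias in aliases):
            tags.add(ticker)
    return tags


def sentiment_score(text):
    """Net sentiment in [-1, 1]: (positive - negative) / (positive + negative) word counts."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def ticker_sentiment(reports):
    """Average report sentiment for each tagged ticker."""
    scores = defaultdict(list)
    for text in reports:
        score = sentiment_score(text)
        for ticker in tag_tickers(text):
            scores[ticker].append(score)
    return {ticker: sum(s) / len(s) for ticker, s in scores.items()}


# Invented report snippets
sample_reports = [
    "We upgrade Apple to outperform after a strong quarter.",
    "Microsoft results miss estimates; we see weak cloud growth.",
    "Apple and Microsoft both face a decline in hardware demand.",
]

if __name__ == "__main__":
    print(ticker_sentiment(sample_reports))
```

With these sample reports, AAPL nets out at 0.0 (one positive report, one negative) and MSFT at roughly -0.67, giving a per-ticker series that could be stored alongside price data.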


For sell side firms, the amount of data they hold is likely to be considerably larger. Of course, there are many legal restrictions on how data can be used (clearly, they cannot disseminate granular client trade data). However, in many cases the barrier is likely to be technical. Different databases will likely be managed by different teams, and it will often be difficult to know what data is stored in which database across the firm. Data can end up being seen as a resource purely for a specific team, rather than something that can be shared.


If data is more accessible across a firm (for example, through a shared data lake), it becomes easier to join datasets together and to do analysis that draws on multiple datasets. Of course, this is easier said than done! As a first step though, it is important to audit what data your firm actually has, and to share the results of that audit across the firm.
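A data audit can start very simply, as an inventory recording who owns each dataset, where it lives and which fields it contains. The Python sketch below assumes a hypothetical catalog (the dataset names, teams, locations and columns are all invented) and uses it to flag datasets owned by different teams that share a join key, which are natural candidates for combined analysis.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class DatasetEntry:
    # One row of a hypothetical firm-wide data inventory
    name: str
    owner_team: str
    location: str
    columns: frozenset


# Invented example entries; a real audit would gather these from each team
CATALOG = [
    DatasetEntry("fx_rfq_history", "FX eTrading", "db_fx.rfq",
                 frozenset({"date", "ccy_pair", "client_tier", "hit"})),
    DatasetEntry("fx_spot_ticks", "Market Data", "lake/fx/ticks",
                 frozenset({"date", "time", "ccy_pair", "bid", "ask"})),
    DatasetEntry("broker_report_sentiment", "Research", "lake/nlp/sentiment",
                 frozenset({"date", "ticker", "score"})),
    DatasetEntry("equity_client_flow", "Cash Equities", "db_eq.flow",
                 frozenset({"date", "ticker", "side", "notional"})),
]

# Identifier columns treated as join keys (date is assumed common to all)
JOIN_KEYS = frozenset({"ccy_pair", "ticker"})


def cross_team_joins(catalog, join_keys=JOIN_KEYS):
    """Return (dataset_a, dataset_b, shared_keys) for pairs owned by different teams."""
    pairs = []
    for a, b in combinations(catalog, 2):
        shared = a.columns & b.columns & join_keys
        if shared and a.owner_team != b.owner_team:
            pairs.append((a.name, b.name, tuple(sorted(shared))))
    return pairs


if __name__ == "__main__":
    for a, b, keys in cross_team_joins(CATALOG):
        print(f"{a} <-> {b} on {', '.join(keys)}")
```

Even a spreadsheet with these columns would serve the same purpose; the point is that once the inventory is shared, overlaps between teams' data become visible.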


External data is usually the first port of call when doing financial analysis, and in many cases, it is the most important resource you can have. However, your firm is likely to have internal datasets that could also be useful and can augment the external data you have. Internal data can be just as alternative as external data.