Know what you don’t know with data

20170528 Know

Let’s say you had to identify a city you’d never heard of on a map. Take a city chosen at random, Stockton, and suppose you wanted to identify which country it is in. You have with you a list of every single town in the world, alongside its country. A simple brute force way to find the country is literally to go through every town on the list till you get to Stockton. You can obviously use some tips and tricks to speed this up, for example sorting the towns alphabetically, which should make subsequent searches easier. However, if we think about it, the name kind of sounds like it’s an English name. That gives us a simple heuristic: we could speed up our search by going through English-speaking countries first. (Stockton is in California by the way!)
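To make the three approaches concrete, here is a minimal sketch in Python. The tiny gazetteer, the set of English-speaking countries and all the function names are made up for illustration; a real list would have millions of entries and the heuristic would be far less tidy.

```python
from bisect import bisect_left

# Hypothetical toy gazetteer of (town, country) pairs, for illustration only.
TOWNS = [
    ("Lyon", "France"),
    ("Osaka", "Japan"),
    ("Stockton", "United States"),
    ("Leeds", "United Kingdom"),
    ("Toledo", "Spain"),
    ("Perth", "Australia"),
]
# Assumed set of "likely" countries for an English-sounding name.
ENGLISH_SPEAKING = {"United States", "United Kingdom", "Australia"}


def brute_force(towns, name):
    """Scan entries in order; return (country, number of entries examined)."""
    for steps, (town, country) in enumerate(towns, start=1):
        if town == name:
            return country, steps
    return None, len(towns)


def sorted_lookup(sorted_towns, sorted_names, name):
    """Binary search on a list that has already been sorted alphabetically."""
    i = bisect_left(sorted_names, name)
    if i < len(sorted_names) and sorted_names[i] == name:
        return sorted_towns[i][1]
    return None


def heuristic_first(towns, name, likely_countries):
    """Scan towns in the likely countries first, then everything else."""
    # False sorts before True, so likely countries move to the front.
    ordered = sorted(towns, key=lambda tc: tc[1] not in likely_countries)
    return brute_force(ordered, name)


# Sort once up front; every later lookup can reuse these two lists.
SORTED_TOWNS = sorted(TOWNS)
SORTED_NAMES = [town for town, _ in SORTED_TOWNS]

print(brute_force(TOWNS, "Stockton"))
print(sorted_lookup(SORTED_TOWNS, SORTED_NAMES, "Stockton"))
print(heuristic_first(TOWNS, "Stockton", ENGLISH_SPEAKING))
```

The heuristic does not change the answer, only the order in which we look. If the guess about the language is wrong, we still find the town, just later than we hoped.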


When it comes to analysing markets and developing trading strategies, there are many different approaches we could use. One way would be to throw in lots and lots of variables as features and see if we can make predictions about the price using machine learning techniques, while trying as much as possible not to use previous knowledge about the markets. By not enforcing any sort of structure between the variables and our prediction, it’s as though we are using the “brute force” approach from above. Essentially, we are not trying to benefit from structure in the dataset that we might know about beforehand. Admittedly, there is some choice in which variables we select, but by and large we are trying to get our algorithm to choose the variables, rather than deciding ourselves which ones are important. One advantage of this is that we might find some unusual market relationship that we would not ordinarily have thought of. The difficulty is trying to understand the validity of such a solution.
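One way to see why validity is hard to judge is a small experiment on pure noise. The sketch below is my own illustration, not a real research setup: it generates many random “features” that have no relationship with a random “target”, picks the feature with the strongest in-sample correlation, and then checks that same feature on held-out data. The function names, parameters and sample sizes are all assumptions.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def best_feature_in_sample(n_features=200, n_obs=120, split=60, seed=0):
    """Pick the noise feature that looks best in-sample, then test it out-of-sample.

    Returns (index of chosen feature, in-sample correlation, out-of-sample correlation).
    """
    rng = random.Random(seed)
    # Target and features are independent noise: there is nothing to find.
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    features = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_features)]

    # Search over every feature using only the first `split` observations.
    in_corrs = [correlation(f[:split], target[:split]) for f in features]
    best = max(range(n_features), key=lambda i: abs(in_corrs[i]))

    # Evaluate the winner on the observations it was not selected on.
    out_corr = correlation(features[best][split:], target[split:])
    return best, in_corrs[best], out_corr


best, in_corr, out_corr = best_feature_in_sample()
print(f"feature {best}: in-sample {in_corr:+.2f}, out-of-sample {out_corr:+.2f}")
```

Because we searched over many candidates, the winner will tend to show a noticeable in-sample correlation even though every feature is noise, and there is no reason to expect it to persist out-of-sample. The more variables we throw in without a hypothesis, the easier it is to find this kind of spurious relationship.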


However, we can also approach creating a trading strategy by using some prior knowledge of the market, and in particular a hypothesis, to help direct our search (like our English language heuristic earlier). I recently went to a talk by Robert Carver at QuantCon, where he called this “tacit” knowledge about the market. Obviously, not every hypothesis is going to yield something, but we do prune our search space somewhat by doing this. We focus on what we do know about markets, rather than what we don’t know! At the same time, we can also reduce the number of spurious solutions to our problem, which might “work” in-sample but have trouble out-of-sample. We can also use knowledge about the market to understand whether our solution is practical. It might well be possible to predict price action but totally fail to monetise this because of liquidity considerations. I have lost count of the number of times I’ve found a very exciting trading strategy, only to find that the addition of transaction costs renders it loss making, particularly for higher frequency strategies. Yes, I’m sure data can help us understand liquidity considerations (as can simply talking to a trader in the market), but at some point we need to think to check this.
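The transaction cost point is easy to check with a few lines of arithmetic. The sketch below assumes a deliberately simple cost model, a flat charge in basis points for every unit of position change, and uses made-up positions and returns. The function name and the numbers are illustrative only.

```python
def strategy_pnl(positions, returns_bps, cost_bps):
    """Return (gross, costs, net) P&L in basis points.

    positions[t]   : position held over period t, e.g. -1, 0 or +1
    returns_bps[t] : asset return over period t, in basis points
    cost_bps       : assumed flat cost per unit of position change
    """
    if len(positions) != len(returns_bps):
        raise ValueError("positions and returns must have the same length")

    gross = sum(p * r for p, r in zip(positions, returns_bps))

    # Turnover is the total absolute change in position, starting from flat.
    turnover = 0
    previous = 0
    for p in positions:
        turnover += abs(p - previous)
        previous = p

    costs = turnover * cost_bps
    return gross, costs, gross - costs


# Made-up returns, in basis points.
returns_bps = [3, -2, 4, -1, 2, -3]

# A strategy that flips position every period and "predicts" every move.
fast = strategy_pnl([1, -1, 1, -1, 1, -1], returns_bps, cost_bps=2)
# A strategy that simply holds a long position throughout.
slow = strategy_pnl([1, 1, 1, 1, 1, 1], returns_bps, cost_bps=2)

print(fast)  # (15, 22, -7)
print(slow)  # (3, 2, 1)
```

In this toy example the fast strategy gets every direction right and still loses money once costs are included, because it pays to flip its position every period. The slow strategy earns far less gross, but trades once and keeps what it makes.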


There is no “correct” way to build a systematic trading strategy, although I do think there are “better” and “worse” ways to create them. I do believe that having a hypothesis is an important place to start the process, rather than trying to solve the problem without any views at all, which results in a very large potential search space for ideas. Furthermore, we need a good grounding in how markets work to understand factors like liquidity. Data can tell you a lot about the market, but if anything, knowing what we don’t know is sometimes just as important. As ever, uppermost in our minds needs to be the thought that the market is very good at making things look nice in-sample, only to teach you a very costly lesson with real cash.