San Francisco is nearly 2000 miles further away from London than New York is. Despite that, San Francisco sometimes feels closer to home. Whilst there are some high rise buildings in San Francisco, these are restricted to a relatively small area around the financial district. Much of the city consists of low rise housing, just like London. It also feels relatively relaxed. When you are in one of the many green spaces in and around it, it can be easy to forget that San Francisco is a large metropolis. Of course, these days San Francisco and nearby Silicon Valley are at the hub of the tech industry. Google, Facebook, Apple and a plethora of other tech companies are headquartered in the area. As such, it exerts a major gravitational pull on data scientists. Last week, I presented at the Open Data Science Conference in San Francisco, speaking about using Python in finance. Most of the folks I met were, unsurprisingly, working in the field of data science. However, only a relatively small number of the people attending worked in finance. There is a large community of people in the area with the skills for number crunching on datasets, who could potentially put these skills into action in markets. How can they do this whilst holding down a full time job in another domain?
One answer is by crowdsourcing their “alpha”. Over the past months there has been a large amount of buzz in the finance industry about companies crowdsourcing alpha, like Numerai and Quantopian. More recently, Robin Wigglesworth wrote about the performance of Quantopian’s crowdsourced fund, as did Paul Clarke at eFinancialCareers. Since June, returns are down 3%. Admittedly, this is a relatively short period over which to judge a fund (as pointed out on Twitter by @LadyFOHF). However, with all this media coverage, it is worth thinking about precisely what crowdsourcing alpha is.
Before we delve into the concept of crowdsourcing strategies, let’s first think about the model of a traditional quant fund. Generally speaking, such a fund will have a lot of quants working there. They spend their time coming up with new trading strategies or “alphas”. As with any sort of research, not every researched strategy will end up being used. In many cases, a strategy could just end up gathering dust. There can be a whole host of reasons for this. Perhaps the theoretical grounding for the strategy wasn’t quite as robust as thought. Maybe the capacity of the strategy is too small for a large hedge fund. It could be highly correlated to existing strategies run by the fund. In any case, knowing NOT to run a strategy is valuable information. After all, a successful fund is as much about avoiding losses as it is about making profitable trades!
Those trading strategies which pass through all the hoops end up in the firm’s overall portfolio, with varying amounts of risk. How you allocate risk to each strategy within a portfolio is of course a big question in itself. An even more difficult question is how you decide to cut a strategy from the portfolio. I was recently on Bloomberg TV discussing this topic. Typically, one approach is to monitor the distribution of returns, as a way of determining if a strategy is behaving *normally* (please excuse the pun…) or has begun to take on unusual characteristics. It could be argued that there can be an element of group think.
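As a rough illustration of monitoring the distribution of returns, the sketch below compares the mean of a recent window of returns with what the strategy's own history would suggest, using a simple z-score. It is only one of many possible checks, and not a description of how any particular fund does it. The function name `returns_look_unusual`, the threshold of 3 and the toy return series are all assumptions made up for the example.

```python
import numpy as np


def returns_look_unusual(history, recent, z_threshold=3.0):
    """Return (flag, z_score) for a recent window of strategy returns.

    Hypothetical helper: flags the strategy when the mean of the recent
    returns is more than z_threshold standard errors away from the
    historical mean.
    """
    history = np.asarray(history, dtype=float)
    recent = np.asarray(recent, dtype=float)

    hist_mean = history.mean()
    hist_std = history.std(ddof=1)  # sample standard deviation

    # Standard error of a mean taken over a window the size of `recent`
    std_err = hist_std / np.sqrt(len(recent))
    z_score = (recent.mean() - hist_mean) / std_err

    return bool(abs(z_score) > z_threshold), float(z_score)


# Toy daily returns (made up): history alternates +1% and -1%
history = np.tile([0.01, -0.01], 500)
recent_as_before = np.tile([0.01, -0.01], 30)  # same pattern as history
recent_losing = np.full(60, -0.01)             # 60 losing days in a row

print(returns_look_unusual(history, recent_as_before))
print(returns_look_unusual(history, recent_losing))
```

In practice you would probably want to look at more than the mean (volatility, drawdowns and the tails of the distribution, for example), and a formal distribution test is another option.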
In a sense, we could think of quant funds as crowdsourcing alpha, but purely within a relatively small team. Within a fully crowdsourced approach, the lower layers of this problem, namely the creation of trading strategies, are done by a large external crowd. This crowd could be made up of data scientists from around the world. It doesn’t need to be purely folks sitting in traditional financial centres such as New York or London (although you could argue that many large quant funds also have geographical variety in their workforce). The decisions about which strategies to allocate to are still taken by the fund itself. Is this approach to creating alphas likely to be successful? You could argue that having a very large pool of alphas might make your life much easier. Perhaps someone will think of a strategy that a small group of quants might not think of. However, on the flipside, maybe these “crowdsourced” strategies could just end up being correlated to each other. It might also be the case that a large number of them are poorly designed, and you would want to avoid these. Having some training in the area, whether on the job or otherwise, in how to develop trading strategies can help avoid many elementary mistakes. Just speaking from a personal perspective, many of the trading strategies I developed when I first started in the industry were too heavily prone to data mining. It was only through advice from my colleagues that I eventually began to understand the general approach to developing trading strategies. An interview process can also be used to help weed out poorly designed trading strategies.
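On the worry that crowdsourced strategies might end up correlated to each other, one simple check is to compute the pairwise correlations of their returns and list the pairs above some cut-off. The sketch below does this with NumPy. The helper name `highly_correlated_pairs`, the 0.8 threshold and the three toy return series are assumptions for illustration only.

```python
import numpy as np


def highly_correlated_pairs(returns, threshold=0.8):
    """List (i, j, correlation) for strategy pairs above the threshold.

    `returns` has one row per period and one column per strategy.
    """
    returns = np.asarray(returns, dtype=float)
    corr = np.corrcoef(returns, rowvar=False)  # columns are the variables

    pairs = []
    n_strategies = corr.shape[0]
    for i in range(n_strategies):
        for j in range(i + 1, n_strategies):
            if abs(corr[i, j]) > threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs


# Three made-up strategies over six periods
strategy_a = 0.01 * np.array([1.0, -2.0, 1.5, -0.5, 2.0, -1.0])
strategy_c = 0.01 * np.array([1.0, 1.0, -1.0, -1.0, 1.0, -1.0])
strategy_b = strategy_a + 0.1 * strategy_c  # nearly a copy of strategy_a

returns = np.column_stack([strategy_a, strategy_b, strategy_c])
print(highly_correlated_pairs(returns))
```

Here only the first two strategies are flagged, since the second is almost a copy of the first. With a large pool of submissions, a screen like this would only be a first pass before looking at how each strategy actually makes its money.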
Paul Clarke’s article does mention that there is indeed an interview process for the acceptance of strategies into Quantopian’s fund, which suggests their approach does have some traditional elements. Indeed, only a very small number of strategies end up being selected to be run with real cash. In a sense, we can see a parallel with the recruitment process that goes on in quant hedge funds. Numerai’s approach is very data centric. Indeed, on their website they note that “Because Numerai abstracts its financial data, data scientists do not know what the data represents and human biases and overfitting are overcome.”
Finance as a domain differs from other areas in that the problem changes (i.e. it is non-stationary). As a result, it is difficult to come up with robust ideas. By contrast, when we are doing image recognition (or solving many other data science problems), the underlying problem does not change. I would suggest that, in practice, having no idea what your data represents makes it very difficult to develop a robust trading strategy and possibly encourages overfitting. The whole point of having a hypothesis is that we can prune away, from the infinite search space of strategies, those ideas that are unlikely to be robust out-of-sample. It is also key to have a good understanding of the underlying market you are trading in. For higher frequency strategies, you need a very good understanding of your particular market in order to think about liquidity, even if you do adopt a data driven approach. It could be argued that at very high frequencies, where there are masses of data generated by the market, including order book data, there is perhaps a better case for a data driven approach (perhaps using techniques like machine learning), but this should be within the context of understanding the market microstructure. Often transaction costs can really eat into the returns of such strategies. A strategy for EUR/USD at high frequencies is not going to work for trading small cap stocks. Let’s make an analogy, where you develop a machine learning model to estimate how much water a potato plant needs according to the season and numerous other factors. If we use that model to water a cactus, we will probably end up overwatering it! You need to know your plants! Abstraction is for theory, not for practice.
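To put some rough numbers on the point about transaction costs, here is a minimal sketch of the arithmetic. It takes the same gross edge per trade and applies two different cost levels, standing in for a liquid and an illiquid market. All of the figures are purely illustrative assumptions, not real spreads for EUR/USD or small caps, and the function `net_annual_return_bps` is a made-up helper that simply sums returns without compounding.

```python
def net_annual_return_bps(gross_bps_per_trade, cost_bps_per_trade, trades_per_year):
    """Simple (non-compounded) net return per year, in basis points."""
    net_per_trade = gross_bps_per_trade - cost_bps_per_trade
    return net_per_trade * trades_per_year


# Purely illustrative numbers, not real market spreads
gross_edge_bps = 2.0
trades_per_year = 1000

liquid_market = net_annual_return_bps(gross_edge_bps, 0.5, trades_per_year)
illiquid_market = net_annual_return_bps(gross_edge_bps, 5.0, trades_per_year)

print(liquid_market)    # 1500.0
print(illiquid_market)  # -3000.0
```

The signal is identical in both cases. Only the cost of trading differs, and that alone is enough to turn a profitable strategy into a losing one.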
I also think there is a middle ground between developing strategies internally in a quant fund and fully crowdsourcing alpha. This middle ground can involve hiring experienced external consultants to develop trading strategies to complement those developed internally, providing the best of both worlds (admittedly, I am talking my own book here!). Indeed, my company Cuemacro spends a lot of time in this area, doing trading orientated projects for funds that are interested in getting new ideas which they might not have thought of internally. This can alleviate the group think issue. At the same time, it ensures the general approach to developing trading strategies is still robust, given we’ve developed many strategies in the past which have gone on to be run with real cash.
Other areas of the middle ground can include using crowdsourced data from sources such as Estimize, which crowdsources analysts’ estimates for equity earnings and macroeconomic data. We can also use crowdsourcing through “alpha capture”, that is, by collating broker recommendations (which many hedge funds do). We can also derive other sentiment from more public sources on social media (such as StockTwits) and through newswires (such as Bloomberg News). Just because we don’t crowdsource actual trading strategies doesn’t mean we can’t use the “wisdom of the crowd” as a data input! We also have more control over how these inputs are expressed as trades (and in particular, we can combine them with our knowledge of the market more easily).
I think it is far too early to say whether fully crowdsourced strategies will succeed or fail. However, I suspect that the advantages of quant hedge funds will probably stick around. They have a lot of resources, which are difficult to match. They might also have access to data which is difficult to source elsewhere. They are also working full time on solving these problems. Yes, quantity is good, but quality is more important in my opinion. I would also conjecture that crowdsourcing doesn’t remove the need to have a deep understanding of the markets. Maybe it’ll be the middle ground that succeeds, using experienced external consultants and crowdsourced datasets to augment the strengths of quant hedge funds?