Themes from Newsweek’s AI data conference

4 March 2017 | AI

The first time I dabbled in artificial intelligence was nearly 15 years ago. I remember seeing that artificial intelligence was the title of a module offered by the Department of Computing at Imperial College. The term seemed pretty glamorous, so I took the plunge and did the course. We used various techniques to solve puzzles like sliding tiles and games like rock-paper-scissors. For many years, I didn't think much about other applications of artificial intelligence. In recent years, however, artificial intelligence and machine learning have soared on the wings of companies like Google, which even open-sourced TensorFlow, its machine learning library. Machine learning has been a regular topic at finance conferences over recent months, so perhaps it wasn't surprising that Newsweek decided to organise a finance conference around machine learning, alternative data and related fields.


I attended Newsweek's conference this week and also had the pleasure of speaking there, presenting my Python finance libraries chartpy, findatapy and finmarketpy. I was also lucky enough to be on a panel discussing the merits of open source software, ably chaired by Paul Bilokon, my fellow Thalesian, which we shall come to later in this piece. Overall, the general message from the conference seemed to be that machine learning has promise, but at the same time should not be treated as a panacea for trading markets. As Tamer Kamel from Quandl noted, machine learning cannot be used to "draw water from a stone", and there is a danger in ignoring statistical rigour. The benefit of machine learning is of course that it can detect patterns in the market that we, as humans, might miss. The key is to identify whether such a pattern is real and holds up to checks such as out-of-sample testing, rather than being simply an artefact of data mining. Traditionally, a hypothesis is the starting point of a trading strategy, which we can then verify (or not) through research, and I still believe this is one of the best ways to build a robust strategy. This notion of starting from a hypothesis was emphasised by Leigh Drogen of Estimize during his talk.

One point noted during the conference was that finance is somewhat different from other areas which have been "cracked" by machine learning. Take image recognition of people, an example suggested by Michael Beal of DCM. Our faces remain relatively static over time, so identifying us is not a continually changing problem. With financial markets, change is effectively baked in; indeed, time series datasets are central to trading. I do believe that machine learning will play a bigger role in investing, but the key will be to adopt it in such a way that it isn't a total black box, which is admittedly a challenge.
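To make the out-of-sample point concrete, here is a minimal sketch of the kind of check that helps separate a real pattern from a data-mined one. The moving-average rule and the randomly generated prices are purely illustrative assumptions of mine, not anything presented at the conference.

```python
import numpy as np
import pandas as pd

# Randomly generated prices, purely for illustration
rng = np.random.RandomState(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 2000))),
                   index=pd.bdate_range("2010-01-01", periods=2000))

returns = prices.pct_change()

# A simple (hypothetical) trend-following rule: long when the 20-day
# moving average is above the 100-day average, short when below
signal = np.sign(prices.rolling(20).mean() - prices.rolling(100).mean())
strategy = signal.shift(1) * returns  # trade on yesterday's signal

# Judge the rule on the first 70% of history, then check whether it
# still works on data it has never "seen"
split = int(len(strategy) * 0.7)
in_sample, out_of_sample = strategy.iloc[:split], strategy.iloc[split:]

def sharpe(r):
    return np.sqrt(252) * r.mean() / r.std()

# A high in-sample Sharpe that collapses out-of-sample suggests the
# pattern is an artefact of data mining rather than a real effect
print("In-sample Sharpe:     %.2f" % sharpe(in_sample))
print("Out-of-sample Sharpe: %.2f" % sharpe(out_of_sample))
```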


One particularly important area for machine learning in finance is parsing text data, whether gauging sentiment or classifying text, to give a couple of examples. There was an informative talk from Gideon Mann of Bloomberg on this subject. He noted that machine learning techniques were useful not simply for classifying the topic of an article, but also in the process of scraping the data itself. Take, for example, trying to parse tables from a PDF: they come in all different formats and sizes, and hence a single simple algorithm is unlikely to catch every case. An important point I'd add is that in these uses of machine learning, even if the route to the answer is somewhat opaque, we can always verify the output. We can read a PDF ourselves and check where the tables are; we can read a news article and classify it ourselves. To some extent these problems are like the "image recognition" style problems discussed earlier, as opposed to the problem of identifying price patterns with machine learning. The subject of identifying the validity of news was touched upon by Mann, and also on a panel on text-based analysis. Some practical ways of filtering out fake tweets were suggested, such as looking at users' geolocation tags to see how close they were to a breaking news event.
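As a flavour of what topic classification involves, below is a minimal sketch using scikit-learn. The tiny labelled dataset and the topic labels are invented purely for illustration; Bloomberg's actual pipeline was not described at this level of detail.

```python
# A minimal sketch of classifying news headlines by topic with
# scikit-learn; a real system would train on thousands of tagged articles
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

headlines = [
    "Fed raises interest rates by 25 basis points",
    "ECB keeps monetary policy unchanged",
    "Apple reports record quarterly earnings",
    "Tech firm beats revenue estimates",
]
topics = ["macro", "macro", "equities", "equities"]

# TF-IDF turns each headline into a weighted bag-of-words vector,
# which a simple linear classifier can then separate by topic
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(headlines, topics)

print(model.predict(["Bank of England holds rates steady"]))
```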


On the use of alternative data more broadly, there seemed to be a consensus that it is no longer the preserve of quant funds alone. Drogen noted that the first wave of quants adopted these techniques several years ago, the rest of the quant community followed nearly two years ago, and discretionary fund managers only began waking up to alternative data around six months ago. Indeed, this is something I can attest to. Years ago, when I started looking at alternative datasets, such as search data or news data more broadly, it was rarely a topic of conversation with macro discretionary fund managers. Over the past few months I've noticed a shift, with discretionary traders reaching out to understand whether techniques borrowed from the quant world, or alternative data, can give them a better handle on what's happening in the economy. Indeed, if you're a macro manager seeking to see how quant techniques and alternative data can inform your trading decisions, let me know! The days when alternative data meant mainly news data are fading. News data has now been joined by social media, satellite imagery, marine tracking and payments data, to mention just a few of the examples which came up during the conference.


As mentioned earlier, there was a lengthy discussion on open source software. Open source packages such as pandas have made it far easier for people to work with time series. In machine learning, open source libraries like scikit-learn, TensorFlow and Theano have made it much easier to get started. Wes McKinney, the founder of the pandas project, noted that there can be a significant time burden on contributors to open source projects, and often the best way for companies to help is to give engineering time, rather than purely cash. Gary Collier, also on the panel, discussed how AHL both uses open source software and contributes to it. Indeed, AHL's Arctic is pretty popular these days, and I use it on a regular basis. He noted that very often the best developers are keen to work on open source projects.
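For those who haven't tried it, here is a minimal sketch of storing and reading back a pandas time series with Arctic. It assumes a MongoDB instance running on localhost, and the library name and price data are invented purely for illustration.

```python
# A minimal sketch of storing and retrieving a pandas time series with
# AHL's open source Arctic library (backed by MongoDB)
import numpy as np
import pandas as pd
from arctic import Arctic

# Invented price data, purely for illustration
prices = pd.DataFrame(
    {"close": 100 + np.cumsum(np.random.RandomState(0).normal(0, 1, 250))},
    index=pd.bdate_range("2016-01-01", periods=250))

store = Arctic("localhost")        # connect to a local MongoDB instance
store.initialize_library("demo")   # create a versioned library
library = store["demo"]

library.write("EURUSD", prices)    # store the DataFrame, versioned
item = library.read("EURUSD")      # read the latest version back
print(item.data.tail())
```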


The world of alternative data, and of the analytical techniques to crunch that data, is growing rapidly. If you aren't already looking at this area, I suspect you soon will be.