Alt data quants aren’t left out by WallStreetBets

Over the past few weeks, retail traders have been in the headlines, in particular because of the massive move in GameStop stock, which has squeezed shorts, and because of the posts on r/WallStreetBets, a subreddit which is described as “Like 4chan found a Bloomberg Terminal”. r/WallStreetBets has featured a lot of posts on GameStop recently. I recently read a great article by Brent Donnelly, which seeks to describe the “investment information ecosystem”, including r/WallStreetBets.

 

Brent breaks down how financial news is disseminated into several groups, ranging from the more traditional Wall Street sources to Off-Off Wall Street, which is dominated more by retail traders.

 

  • Wall Street – Bloomberg, Reuters, FT, sell side research etc.
  • Off Wall Street – Yahoo! Finance, Fintwit etc.
  • Off-Off Wall Street – r/WallStreetBets, TikTok Finance etc.

 

A lot of my work revolves around finding and researching alternative datasets, working with data firms and funds, so I thought it would be worthwhile writing about the various alternative datasets that quants can use to understand Wall Street, Off Wall Street and Off-Off Wall Street!

 

Are quants left out by all this information frenzy? Wall Street sources

Well, they need not be! After all, many of the information sources above can be read electronically. If we start with the traditional Wall Street sources, such as Bloomberg News, Reuters and the FT, they are usually accessible via machine readable APIs. I’ve done projects for several data firms, including Bloomberg and RavenPack (Dow Jones News), and have looked at several news datasets, primarily for macro based trading. In The Book of Alternative Data, we’ve got a section on news based datasets and a study on using Bloomberg News for trading FX and forecasting FX vol. There are many different angles you can take with machine readable news, including looking at the sentiment of articles, their novelty and their volume. News volume and readership data can be useful for understanding market volatility.
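To make the sentiment and volume angles a bit more concrete, here is a minimal Python sketch which aggregates tagged news records into a daily article count and an average sentiment for one ticker. The records, the tickers and the sentiment scale are made up purely for illustration; they do not reflect the schema of any particular vendor's news feed, and `daily_news_stats` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical tagged news records: (publication date, ticker, sentiment score).
# A real machine readable news feed will have its own schema and scale.
articles = [
    (date(2021, 1, 25), "EURUSD", 0.2),
    (date(2021, 1, 25), "EURUSD", -0.4),
    (date(2021, 1, 26), "EURUSD", 0.5),
    (date(2021, 1, 26), "USDJPY", 0.1),
    (date(2021, 1, 26), "EURUSD", 0.3),
]


def daily_news_stats(articles, ticker):
    """Return {day: (article count, mean sentiment)} for a single ticker."""
    by_day = defaultdict(list)
    for day, tkr, sentiment in articles:
        if tkr == ticker:
            by_day[day].append(sentiment)
    # Sort by date so the result reads as a small time series
    return {day: (len(scores), mean(scores)) for day, scores in sorted(by_day.items())}


if __name__ == "__main__":
    for day, (volume, avg_sentiment) in daily_news_stats(articles, "EURUSD").items():
        print(day, volume, round(avg_sentiment, 2))
```

The daily count is a simple proxy for news volume, which could then be compared with a volatility measure for the same asset. Novelty would need extra logic (for example, checking whether a similar story has already appeared), which is left out here.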

 

There’s obviously an additional cost associated with these datasets. However, in most cases, the result is a nicely structured dataset, which has already been tagged with tradable tickers, sentiment and so on. Sell side research is, in many cases, also being ingested in machine readable form via APIs. The question I usually get asked is: which news dataset should I buy? In practice, it depends on your budget, not just for the news data itself, but also for the cost of researching the dataset and integrating it into your tech stack. Whilst there might be news “common” to many sources, there will also be scoops/exclusives available from only one particular news organisation. These are precisely the news articles that might move the market more. The more news pipes you subscribe to, the more of these scoops you’ll capture.

 

What about Off Wall Street?

What about fintwit? Sources like Twitter can be accessed for free via their API, using an API key from Twitter. However, the free API is a bit limited, whereas the paid API allows you to download a lot more history and is less restrictive on the number of requests you can make. In The Book of Alternative Data, we had a section on using Twitter data to help forecast NFP. Suffice to say, the free API wouldn’t have been sufficient for this use case. Here’s a comparison between the free, premium and enterprise Twitter APIs. It’s also worth noting there are many data firms which have created data products on top of Twitter, such as Bloomberg, which has a news stream collected from finance based Twitter accounts (i.e. fintwit) in their event driven news product.

 

You can also use web scraping. There are lots of great tools for this in Python. However, this approach faces several problems. Notably, many websites will not allow it under their terms of use, and there are all sorts of legal questions, which we discuss in The Book of Alternative Data. Furthermore, even if it is allowed by the terms of use, there will usually be at least some manual element associated with it. It might actually end up being more cost effective to find a data firm which already collects the data and structures it for you in a more usable form.

 

And Off-Off Wall Street?

As Brent Donnelly notes in his article, stocks trending on r/WallStreetBets can rally a lot, and he used GameStop as an example. If you’re a quant, you might want to research this, to see whether this behaviour is common (or just confined to the odd stock mentioned there). To do this, you need to get r/WallStreetBets into some machine readable form, as well as obviously equity price data (although that is fairly easy to get). To read r/WallStreetBets in a machine readable form, there is an API for Reddit which is accessible in Python, and you can sign up for an API key from Reddit. There are a number of terms and conditions associated with getting an API key, which can be seen here on Reddit’s official page. You’ll be able to read posts, and also other metadata associated with them, such as upvotes and downvotes.
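Once the posts are in machine readable form, a first step might be to count how often each ticker is mentioned. Below is a minimal Python sketch of that idea. The `posts` list is hypothetical sample data standing in for whatever the Reddit API returns (the actual download is left out), and the `TICKERS` universe and the `count_mentions` helper are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical sample posts; in practice these would be downloaded via the Reddit API.
posts = [
    {"title": "GME to the moon", "score": 1200},
    {"title": "Why I am holding $GME and AMC", "score": 800},
    {"title": "YOLO on AMC calls", "score": 300},
    {"title": "I think the market is overbought", "score": 50},
]

# Assumed universe of tickers we want to track
TICKERS = {"GME", "AMC", "BB"}

# Runs of 1 to 5 capital letters, which also picks up cashtags such as $GME
TOKEN = re.compile(r"\b[A-Z]{1,5}\b")


def count_mentions(posts, tickers):
    """Count the posts mentioning each ticker, unweighted and weighted by score."""
    mentions = Counter()
    weighted = Counter()
    for post in posts:
        # Use a set so a ticker repeated within one title is only counted once
        found = {tok for tok in TOKEN.findall(post["title"]) if tok in tickers}
        for ticker in found:
            mentions[ticker] += 1
            weighted[ticker] += post["score"]
    return mentions, weighted


if __name__ == "__main__":
    mentions, weighted = count_mentions(posts, TICKERS)
    for ticker, n in mentions.most_common():
        print(ticker, n, weighted[ticker])
```

Matching on capital letters is crude, since some tickers are also ordinary words or abbreviations, so in practice you would want some extra filtering. The resulting mention counts could then be lined up against equity price data to check whether a spike in mentions tends to coincide with a rally.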

 

You can also buy Reddit data from SocialGist in a more structured form, which is likely to make it easier to parse. There’s also SimilarWeb, which sells web traffic data that might be useful for understanding the readership of a site like Reddit. After all, posting activity is only one angle for understanding the impact of a post. Knowing how many folks are reading it is just as important.

 

Obviously, it is difficult to tell whether r/WallStreetBets will remain important many years into the future, in particular for smaller cap stocks. However, at least at the present time, it does seem to be an interesting driver, and is at least worth investigating if you’re a quant.

 

Conclusion

In the past, quants were mostly confined to using price data. These days, we have much more choice about which alternative datasets we might want to use. Thankfully, when it comes to news, both the traditional sources like Bloomberg and Reuters, and newer varieties like Twitter and r/WallStreetBets, are accessible in machine readable form for quants to number crunch. Quants can then incorporate this data into their models, at least for shorter term trading.