If you do any sort of market analysis, you need market data and related datasets. Without data, you are kind of stuck! There are of course many market datasets available to purchase from data vendors. However, what datasets are available for free (aside from crypto markets, where there is a massive amount of free data)? A surprising number of free data sources are available for more traditional markets and for economics more broadly! Admittedly, the quality of free datasets is not always quite as high as that of paid datasets. Paid datasets often take what is “free” data and put it into a much more usable and common format. There are also some datasets which are simply not available for free, no matter how hard you might try.
Whilst some free data sources like Google Finance or Yahoo Finance are very well known (especially for equities), there are many other free data sources out there. Below, I’ve summarised a few of them, covering economic data and more traditional markets. Do make sure to check the licence terms for free data sources. Just because data can be downloaded for free doesn’t always mean you can use it totally freely (eg. what are the redistribution rights?). If you’ve found any particularly good free data sources, let me know too!
Quandl – https://www.quandl.com/
In a couple of years, Quandl has become very well known, and most recently it was taken over by NASDAQ. It has many free datasets collected from a large number of sources, including official institutions. Given that many datasets on Quandl are free, it can be a great place to start. They also offer premium paid-for datasets, which include both more traditional market data and more unusual alternative datasets. I’ve used both the free and premium datasets. There is an easy-to-use API for downloading from Quandl. I have implemented a wrapper in findatapy to make it easier for Python users to download Quandl data.
FRED/ALFRED – https://fred.stlouisfed.org/ and https://alfred.stlouisfed.org/
FRED is run by the St. Louis Fed, and has many free datasets covering both markets and economic data more broadly. ALFRED is also run by the St. Louis Fed, but includes the different vintages of each time series, as opposed to only the final revised version. Having access to these vintages is particularly useful for backtesting: a backtest should only use the figures that were actually known at each point in time, not later revisions.
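To make the idea of vintages a bit more concrete, here is a minimal sketch of a point-in-time lookup. It assumes you have already downloaded the vintages and stored them as simple tuples. The record layout, the function name `value_as_of` and the numbers themselves are all made up for illustration; they are not real ALFRED data or part of any St. Louis Fed API.

```python
from datetime import date

# Hypothetical vintage records: (observation_date, release_date, value).
# The values are invented for illustration and are NOT real ALFRED data.
VINTAGES = [
    (date(2020, 1, 1), date(2020, 4, 29), 1.0),  # first release
    (date(2020, 1, 1), date(2020, 5, 28), 1.2),  # first revision
    (date(2020, 1, 1), date(2020, 6, 25), 1.1),  # second revision
    (date(2020, 4, 1), date(2020, 7, 30), 2.0),  # next period, first release
]


def value_as_of(vintages, observation_date, as_of):
    """Return the most recently released value for observation_date
    that was already published on or before as_of, or None if nothing
    had been released yet."""
    known = [
        (released, value)
        for obs, released, value in vintages
        if obs == observation_date and released <= as_of
    ]
    if not known:
        return None
    # Pick the record with the latest release date.
    return max(known, key=lambda record: record[0])[1]


# What a backtest running on 1 May 2020 would have seen for Q1:
print(value_as_of(VINTAGES, date(2020, 1, 1), date(2020, 5, 1)))
```

A backtest that only ever reads the final revised figure would, in this toy example, be using a number that was not available until the end of June.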
FX retail brokers
FX data isn’t always as easy to get, particularly if you are looking for high frequency datasets. Getting very high frequency, high quality FX data is not going to be free. However, we can download historical intraday data from FX retail brokers (eg. FXCM and DukasCopy), which is at least a starting point, particularly when it is downsampled into minute bars. In findatapy, I’ve written a wrapper for downloading FX tick data from both FXCM and DukasCopy, and you can download the Python script here. Obviously, it won’t have the granularity of paid sources, but then again it is also free.
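As a rough illustration of what downsampling ticks into minute bars involves, here is a small sketch using only the Python standard library. It assumes the ticks have already been downloaded and reduced to (timestamp, price) pairs sorted by time, for example a mid price computed from bid and ask. The function name `ticks_to_minute_bars` and the sample prices are hypothetical, and this is not how findatapy does it internally. In practice you would probably reach for pandas and its `resample` method instead.

```python
from datetime import datetime


def ticks_to_minute_bars(ticks):
    """Aggregate (timestamp, price) ticks into per-minute OHLC bars.

    Ticks are assumed to be sorted by time. Returns a dict mapping the
    start of each minute to a dict with open, high, low and close.
    """
    bars = {}
    for timestamp, price in ticks:
        # Truncate the timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        bar = bars.get(minute)
        if bar is None:
            # First tick in this minute sets all four fields.
            bars[minute] = {"open": price, "high": price,
                            "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last tick seen so far in this minute
    return bars


# Made-up ticks purely for illustration.
sample_ticks = [
    (datetime(2020, 1, 1, 9, 0, 1), 1.10),
    (datetime(2020, 1, 1, 9, 0, 30), 1.12),
    (datetime(2020, 1, 1, 9, 0, 59), 1.09),
    (datetime(2020, 1, 1, 9, 1, 5), 1.11),
]

for minute, bar in ticks_to_minute_bars(sample_ticks).items():
    print(minute, bar)
```

Minutes with no ticks simply do not appear in the output, so you would need to decide separately whether to forward fill them or leave the gaps.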
We noted that Quandl sources a lot of its data from official institutions. These can be (not an exhaustive list) central banks, the IMF, the World Bank, national statistics organisations etc. Obviously, how easy the data is to download depends on the source, and this can vary significantly between organisations. Some might have easy-to-use APIs, whilst with others it might be CSV/Excel files (or just PDFs, which will take more time to parse). In many cases there might be wrappers which have been written by third parties, so I would always advocate looking around on GitHub for these first, before attempting to write your own.
What about collecting data yourself?
More broadly, there is a massive amount of data available on the web. A large proportion of it is text based data, which obviously contains a very large amount of information. It is also important to note that you need to adhere to the terms of usage for websites if you are doing web scraping/crawling. However, we should note that it can be very time consuming to source large amounts of data this way and structure it into a usable format. Often, it can be much quicker to use a data vendor who might have already structured the web data you are interested in using.
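One small, practical check before crawling a site is to see what its robots.txt allows. The sketch below uses Python's standard `urllib.robotparser` on an example robots.txt written inline, so it runs without touching the network. The helper name `build_robot_rules`, the bot name and the example.com URLs are placeholders I have made up. Note that robots.txt is not a substitute for reading the site's actual terms of usage, which you still need to do.

```python
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# A made-up robots.txt for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = build_robot_rules(EXAMPLE_ROBOTS_TXT)

# Hypothetical crawler name and URLs.
print(rules.can_fetch("my-research-bot", "https://example.com/data/prices.html"))
print(rules.can_fetch("my-research-bot", "https://example.com/private/report.html"))
```

For a real site, you would fetch its robots.txt first (for instance with `set_url` and `read` on the same parser class) rather than hard-coding the rules.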
Data is important when it comes to analysing markets, and that’s probably still an understatement! A robust and clean dataset is an important part of the process of analysing markets. Obviously, many datasets are not free, and paying for good quality data is a worthwhile investment. However, we shouldn’t ignore what’s available out there for free.