How to collect market tick data

A lunch break is probably more of a necessity from a break perspective than anything else. You could make the “break” aspect as short as possible by getting a ready-made takeaway. However, that rather negates the whole point of a break. I end up going through various cycles of what I have for lunch at work, largely due to the plethora of food choices available near my office in London. My current favourite is bringing a bit of food from home, and then buying some vegetables from the supermarket during my break. I then cut them up and microwave it all. Ok, it probably isn’t Gordon Ramsay standard, but the act of preparing it for a few minutes gives me a break from the rest of the day. Also, being able to select the ingredients gives me far more flexibility than having something ready-made.

 

We often have the choice between getting something pre-made or trying to make something ourselves. Of course, the caveat to making something ourselves is the need for good quality ingredients. If we use poor quality ingredients, the results are never going to be good. When it comes to doing transaction cost analysis (TCA) to understand the cost of your trades, the two major ingredients are your own trade data and high frequency market data to use as a benchmark. Typically, your trade data resides in your OMS/EMS. You can send this trade data externally for TCA. However, if you want to keep your trade data private and you want flexibility and customisability, then the only choice is to run TCA internally. Larger organisations are typically more likely to want to keep their trade data private, given the larger market impact of their flow. These organisations could include:

 

  • larger asset managers
  • central banks managing their reserves
  • sell side banks

 

It can also include smaller firms who want to take control of the TCA process, and spend a bit more time on it doing their own analysis, to save money on their trading costs. Cuemacro’s tcapy software allows you to run FX TCA on your own servers internally to keep your trade data private. Furthermore, you get all the source code, so it’s possible to fully customise the analysis (and you get full transparency) to get the benefits of an internal solution, but without having to spend years developing it all. As well as saving on development costs, you also save on the maintenance costs associated with a fully internal solution.

 

To do TCA using tcapy on your own servers, you need to create a high frequency market data store. The first choice to make is a database which is suited to high frequency time series data. There are several options, ranging from KDB, which is a proprietary (and very fast) time series database, to things like InfluxDB and MongoDB (with Arctic), which are open source. I wrote an article specifically about time series databases here. tcapy works out of the box with KDB, InfluxDB and Arctic/MongoDB. It’s also possible for users to add their own wrapper to work with other time series databases.

 

Ok, but once we’ve created our database, what do we do? We need to populate it with market data, and for this we need a market data source. My tcapy library already has a software wrapper which connects to a data vendor, New Change FX (NCFX), and downloads historical data from there out of the box. It can be set to run on a regular basis (for example daily or hourly) to download market data from NCFX and then append it to the database. We’re also working on additional features for tcapy to query the stored market data and check its quality (e.g. are there gaps in the database?).
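To give a flavour of what a gap check on stored data might look like, here is a minimal sketch in plain Python. It is not tcapy's implementation (that feature is still being worked on); the function name find_gaps and the five minute threshold are assumptions I've made purely for illustration. It simply flags any pair of consecutive ticks that are further apart than a chosen threshold.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap=timedelta(minutes=5)):
    """Return (start, end) pairs where consecutive ticks are more than max_gap apart.

    find_gaps and the default max_gap are hypothetical choices for this sketch.
    """
    ordered = sorted(timestamps)
    gaps = []
    # Compare every tick with the one that follows it
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > max_gap:
            gaps.append((prev, curr))
    return gaps


# Made-up tick times: four ticks one second apart, then nothing for ~15 minutes
base = datetime(2020, 1, 6, 9, 0, 0)
ticks = [base + timedelta(seconds=s) for s in (0, 1, 2, 3, 900, 901)]

gaps = find_gaps(ticks)
for start, end in gaps:
    print(f"Gap from {start} to {end}")
```

In practice, for FX you would also want to exclude expected quiet periods such as weekends, otherwise every weekend will be flagged as a gap.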

 

Other users might choose to capture market data in a continuous manner, as opposed to the regular downloads discussed above. In effect, this involves collecting streaming market data in a database. We need to connect to our streaming market data sources. For some markets, like crypto, this will involve connecting using websockets, whereas for more traditional markets it will typically be via FIX. In FX, you may end up collecting data from a number of different sources, such as ECNs, brokers etc. The number of sources you collect from, and the granularity and frequency of the data, will impact the computational resources you require.

 

Eduard Silantyev has written a great article explaining how to capture streaming crypto market data using the cryptofeed Python library, which connects to a number of crypto exchanges. He then shows how it’s possible to record all the market data messages in KDB. He also explains that it can sometimes be useful to use a tool like Apache Kafka to collect the data, so that it is consumed by the database (in particular Arctic/MongoDB) in batches. This is because there can be an overhead when dumping very small messages into databases like Arctic/MongoDB. Redis Streams can also be used to achieve similar functionality. I am currently in the process of writing additional functionality for tcapy to integrate the streaming of market data to a database using Redis Streams.
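The batching idea itself is simple, and can be sketched without Kafka or Redis at all. The example below is only an in-process stand-in to illustrate the principle: ticks are buffered in memory and handed to the database writer in chunks, rather than one message at a time. The class name TickBatcher, the sink callable and the batch size are all hypothetical, and the sink here is just a Python list rather than a real database.

```python
class TickBatcher:
    """Buffer incoming ticks and pass them to a sink in batches.

    This is an illustrative stand-in for a queue such as Kafka or Redis Streams,
    not a replacement for one (it offers no durability if the process dies).
    """

    def __init__(self, sink, batch_size=1000):
        self.sink = sink              # callable taking a list of ticks (assumed)
        self.batch_size = batch_size
        self._buffer = []

    def on_tick(self, tick):
        # Called once per market data message
        self._buffer.append(tick)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Write whatever is buffered as a single batch, then start a new buffer
        if self._buffer:
            self.sink(self._buffer)
            self._buffer = []


# Stand-in for a database write: each "write" is recorded as one list of ticks
written = []
batcher = TickBatcher(sink=written.append, batch_size=3)

for i in range(7):
    batcher.on_tick({"ticker": "EURUSD", "mid": 1.10 + i * 0.0001})

batcher.flush()  # push out the final partial batch

batch_sizes = [len(batch) for batch in written]
print(batch_sizes)
```

A real setup would usually also flush on a timer, so that quiet markets don't leave ticks sitting in the buffer for too long.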

 

Collecting tick data from the market, whether you choose to do so using periodic downloads or a more real-time solution using streaming market data, will give you many insights into the market. Furthermore, if you have a smaller budget, you can choose to collect less data. Not everyone can collect every single market event, but you might not need that for many use cases.

 

It will enable you not only to run your TCA internally with maximum flexibility (while keeping your trade data private), but it will also be useful for backtesting trading strategies and analysing the market more broadly. Managing market tick data needn’t be difficult, and the process can be scaled to your needs. Just as a simple example, I’ve managed to create a market tick database using Arctic/MongoDB on my laptop, storing a couple of years of FX tick data, and it’s fast enough to act as a source for tcapy (along with a few tips and tricks!) to do a TCA computation in a few seconds.
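To make concrete how the tick data gets used as a benchmark, here is a rough sketch of one of the simplest TCA calculations: slippage of an executed price versus the prevailing mid. This is not how tcapy does it internally; the function names, the sign convention (negative means a cost) and the prices are all assumptions for illustration.

```python
from bisect import bisect_right


def benchmark_mid(tick_times, tick_mids, trade_time):
    """Latest mid at or before trade_time, or None if no tick precedes the trade.

    tick_times must be sorted in ascending order.
    """
    idx = bisect_right(tick_times, trade_time) - 1
    return tick_mids[idx] if idx >= 0 else None


def slippage_bp(side, executed_price, mid):
    """Signed slippage in basis points; side is +1 for a buy, -1 for a sell.

    Under this (assumed) convention a negative number is a cost versus mid.
    """
    return side * (mid - executed_price) / mid * 10000.0


# Made-up market data: times in seconds and EURUSD mid prices
tick_times = [0.0, 1.0, 2.0]
tick_mids = [1.1000, 1.1002, 1.1001]

# Made-up trade: buy executed at 1.1003, 1.5 seconds in
mid = benchmark_mid(tick_times, tick_mids, trade_time=1.5)
slip = slippage_bp(side=+1, executed_price=1.1003, mid=mid)
print(f"Benchmark mid {mid}, slippage {slip:.2f}bp")
```

The as-of lookup is the part that leans on the tick database: the denser and cleaner the market data, the closer the benchmark is to the price that actually prevailed when you traded.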

 

If you’re interested in bringing your TCA in-house, to get control over the process and your trade data, let me know at saeed@cuemacro.com! It’s time to get access to the raw ingredients! I’ll be able to offer you advice on how to kick-start the process with our tcapy software!