Creating benchmarks for FX

20180929 Baking

Everything is relative. Is this burger any good? Pretty much the only way to judge is to compare it with other burgers. Is it better than a Big Mac (most likely)? How about a different burger, maybe a fancy one? We make these implicit comparisons in our mind until we have eventually constructed a ranking. Let’s imagine for a moment that we have never ever had a burger (for someone like me, this is perhaps beyond the realm of my imagination). Suddenly it becomes very difficult to assess whether a burger is “good”, because we have no other burger to judge it against. We are thus forced to compare it to other similar meals we have had, perhaps a steak, maybe a kebab? We essentially have no direct benchmark to judge it against.


I’ve written a lot in the past about the difficulty of doing best execution in FX, which typically involves doing transaction cost analysis (TCA) on your trades. At its simplest level, this involves calculating the slippage, that is, the difference between where we executed and a benchmark. The FX market is very fragmented, hence getting an idea of what the benchmark should be can be tricky. We can aggregate quotes from many different platforms to create a mid benchmark. This is what, for example, New Change FX does, saving us the job of doing this data aggregation ourselves. Pairs such as EURUSD are very liquid, so quotes should be readily available. We might also argue that we would want to create benchmarks from actual trade data, not purely from quotes. Many external TCA providers allow you to compare your execution metrics (such as slippage) with those of your peers, which is in effect creating a benchmark from trades.
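To make the slippage calculation concrete, here is a minimal sketch that aggregates quotes from several venues into a mid benchmark and measures a fill against it in basis points. It assumes the quotes are already time-aligned to the trade, and it uses the median of venue mids purely as an illustrative, outlier-robust aggregation choice; it is not how New Change FX or any other provider necessarily builds its benchmark. The `Quote` and `Trade` types and the function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Quote:
    venue: str
    bid: float
    ask: float


@dataclass
class Trade:
    side: str  # "buy" or "sell"
    price: float


def aggregated_mid(quotes: list[Quote]) -> float:
    """Median of the per-venue mids (an assumed aggregation rule, robust to one stale venue)."""
    return median((q.bid + q.ask) / 2.0 for q in quotes)


def slippage_bp(trade: Trade, benchmark: float) -> float:
    """Slippage versus the benchmark in basis points.

    Positive = cost (bought above / sold below the benchmark),
    negative = executed better than the benchmark.
    """
    sign = 1.0 if trade.side == "buy" else -1.0
    return sign * (trade.price - benchmark) / benchmark * 1e4


# Usage with made-up EURUSD quotes from three hypothetical venues
quotes = [
    Quote("venue_a", 1.15995, 1.16005),
    Quote("venue_b", 1.15990, 1.16010),
    Quote("venue_c", 1.16000, 1.16010),
]
mid = aggregated_mid(quotes)
print(round(slippage_bp(Trade("buy", 1.16012), mid), 3))
```

Comparing ourselves with peers is the same calculation run over a population of trades, for example by ranking our average slippage against the distribution of our peers' averages.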


What about less liquid pairs, like NDFs? This is a question I get asked a lot whenever I present any work about TCA. Here there are fewer venues where they trade, and typically there is a lot less liquidity. Creating a benchmark is likely to be trickier in this instance, given we simply have fewer points of comparison and can be less sure of the accuracy of the timestamps on quotes. One simple approach is to create time buckets of high/low prices (let’s say over the 30 minutes before and after each trade). We can then compare our sell fills against the low and our buy fills against the high. Are we always getting filled around the high when we buy and the low when we sell (and how are our fills distributed between the high and the low)? We can also do our transaction cost analysis with different price streams collected from the various brokers we are using, and see how that impacts our slippage calculations.
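The high/low approach above can be sketched as follows. It assumes we have a list of `(timestamp, price)` ticks from whichever broker stream we trust, and uses a ±30 minute window as in the text. The function names are hypothetical, and the "range position" metric (0 = filled at the best price in the window, 1 = filled at the worst) is one illustrative way of summarising where fills sit between the high and the low.

```python
from datetime import datetime, timedelta
from typing import Optional


def window_high_low(
    ticks: list[tuple[datetime, float]],
    trade_time: datetime,
    half_window: timedelta = timedelta(minutes=30),
) -> tuple[float, float]:
    """High and low of all ticks within +/- half_window of the trade time."""
    prices = [p for t, p in ticks if trade_time - half_window <= t <= trade_time + half_window]
    if not prices:
        raise ValueError("no quotes in the window around this trade")
    return max(prices), min(prices)


def range_position(side: str, fill: float, high: float, low: float) -> Optional[float]:
    """Where the fill sits in the window's range: 0.0 = best possible, 1.0 = worst.

    Buys are measured against the high (worst case) and sells against the low.
    Returns None when the range is zero, since the position is then undefined.
    """
    if high == low:
        return None
    if side == "buy":
        return (fill - low) / (high - low)
    return (high - fill) / (high - low)


# Usage with made-up USDKRW NDF-style ticks; the 09:50 and 11:30 ticks fall outside the window
ticks = [
    (datetime(2018, 9, 28, 9, 50), 1115.0),
    (datetime(2018, 9, 28, 10, 0), 1120.0),
    (datetime(2018, 9, 28, 10, 20), 1121.0),
    (datetime(2018, 9, 28, 10, 40), 1119.0),
    (datetime(2018, 9, 28, 11, 30), 1125.0),
]
high, low = window_high_low(ticks, datetime(2018, 9, 28, 10, 30))
print(high, low, range_position("buy", 1120.5, high, low))
```

Collecting `range_position` over many trades gives the distribution between high and low mentioned above; rerunning it on ticks from a different broker stream shows how sensitive the result is to the choice of stream.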


However, it is not only when assessing best execution that we face the challenge of understanding where the “market” was. When backtesting an FX trading strategy, we face a similar challenge of knowing precisely where the market actually was. We also have the added complexity that different participants see different FX markets: not all quotes are available to all market participants. There are several ways we can tackle this. As with judging best execution, we can use a benchmark mid, but if we do, we then need to add appropriate transaction costs to our trades. If we have TCA tools, these can be useful in giving us approximate indications of the type of slippage we have faced historically. This slippage data can then be fed back into the backtest. How complex we make our model for assessing historic transaction costs is up to us. For example, the model could take into account the sizes we are executing, as well as market conditions around an event (such as volatility, time of day etc.). If our transaction cost analysis tool is very customisable, it can output the historic slippage conditioned on many of these parameters. Again, we might wish to use different price streams for our backtest, to see how sensitive it is to them. If we see vastly different results, then it could raise questions about how easily we can execute this trading strategy in practice.
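As a rough illustration of feeding historic slippage back into a backtest, the sketch below averages our own past TCA slippage by size bucket and time-of-day session, then uses that table to turn a benchmark mid into an estimated fill price. Everything here is an assumption for illustration: the bucket edges, the session boundaries, the fallback cost and the function names are hypothetical, and a real model might condition on volatility or event windows as well.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical bucket edges (base-currency notional) and session boundaries (UTC hours)
SIZE_EDGES = (10e6, 50e6)
LIQUID_HOURS = range(7, 16)


def size_bucket(notional: float) -> str:
    if notional < SIZE_EDGES[0]:
        return "small"
    if notional < SIZE_EDGES[1]:
        return "medium"
    return "large"


def session(hour_utc: int) -> str:
    return "london_ny" if hour_utc in LIQUID_HOURS else "off_hours"


def fit_cost_model(tca_records):
    """Average historic slippage (bp) per (size bucket, session).

    tca_records: iterable of (notional, hour_utc, slippage_bp) from our own TCA history.
    """
    buckets = defaultdict(list)
    for notional, hour, slip_bp in tca_records:
        buckets[(size_bucket(notional), session(hour))].append(slip_bp)
    return {key: mean(slips) for key, slips in buckets.items()}


def backtest_fill_price(side, mid, notional, hour_utc, model, default_bp=1.0):
    """Benchmark mid adjusted by the modelled cost; default_bp is a hypothetical
    fallback for buckets with no historic trades."""
    slip_bp = model.get((size_bucket(notional), session(hour_utc)), default_bp)
    sign = 1.0 if side == "buy" else -1.0
    return mid * (1.0 + sign * slip_bp / 1e4)


# Usage with synthetic TCA records (not real data)
model = fit_cost_model([(5e6, 9, 0.4), (5e6, 10, 0.6), (60e6, 22, 3.0)])
print(backtest_fill_price("buy", 1.2, 5e6, 9, model))
```

Swapping the mid from one price stream for another and rerunning the backtest is then a cheap way to check the sensitivity discussed above.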


In effect, much of our discussion boils down to the ability to customise our TCA, from the types of metrics we might want to calculate, to how we bucket our results, to which benchmarks we want to use (including some built from data unique to us, such as our own broker streams). If we want to do this, we could create our own internal TCA tool. However, this is going to be very costly and time consuming. An external TCA tool is much easier to use, but we don’t have access to the source code to customise it. This is one of the reasons that I’m working on a Python based FX TCA library, which I’m hoping to open source if I can get sponsorship for it. If you’re interested in hearing more about my Python based FX TCA library, let me know!