Cutting compute costs and increasing speed

There’s a cost to everything. There’s no such thing as a free lunch (whether or not a burger is involved)! If you do want to get something for “free” (or cheaper), there might be costs involved in any case, like queuing, having to travel further, etc. Sometimes, it’s just better to pay more to get a better service (or a better burger). This isn’t exactly an earth-shattering observation!

When we are trying to research financial markets, we constantly need to make compromises. Any firm’s research budget has constraints. From within your budget you need to be able to pay for staff and the resources they need to do their job. One specific aspect of resources is compute power. Over the past few months, Alexander Denev and I have co-founded Turnleaf Analytics to create inflation forecasts, using machine learning and alternative data in our process (and as a plug, if you’re interested in working with Alexander and me, have a look at the jobs we have on offer). Perhaps unsurprisingly, to do our research, we need a decent amount of compute power. Hence, more broadly, it’s got me thinking about how financial firms use compute, how to speed it up, how to access resources (local vs. cloud) and how to balance all that with the relative costs. How you approach the problem depends on your requirements. Some compute might need to be done very fast (e.g. providing option prices to clients), whilst other tasks can be done more slowly (e.g. overnight risk reports).


One of the “easiest” ways to make your computations run faster is to spin up more cores (and potentially multiple machines to create a cluster). To be able to take advantage of this, we need our code to be rewritten in such a way as to make it parallelisable. For certain computations, such as Monte Carlo, which are “embarrassingly parallel”, this is easier, because there isn’t any interaction between the various threads. If we are repeatedly writing to a shared memory space from multiple threads, we need to make sure that the resource is properly managed, so that two threads don’t write to it at exactly the same time.
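To make the “embarrassingly parallel” point concrete, here is a minimal sketch of a Monte Carlo pricer for a European call, split into independent chunks. The function names, the parameter values and the choice of Python’s standard library `ProcessPoolExecutor` are all assumptions made for illustration, not a description of any particular production setup. Each chunk gets its own seeded random number generator and returns its own subtotal, so nothing is shared between workers and no locking is needed.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_chunk(args):
    """Return the sum of discounted call payoffs for one independent chunk."""
    seed, n_paths, spot, strike, rate, vol, maturity = args
    rng = random.Random(seed)  # private generator per chunk: no shared state
    drift = (rate - 0.5 * vol ** 2) * maturity
    diffusion = vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    total = 0.0
    for _ in range(n_paths):
        # Terminal price under geometric Brownian motion
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += discount * max(terminal - strike, 0.0)
    return total


def build_tasks(n_chunks, paths_per_chunk, spot=100.0, strike=100.0,
                rate=0.02, vol=0.2, maturity=1.0):
    """One task per chunk; the chunk index doubles as the seed."""
    return [(seed, paths_per_chunk, spot, strike, rate, vol, maturity)
            for seed in range(n_chunks)]


def price_call(n_chunks=8, paths_per_chunk=50_000, workers=4):
    """Farm the chunks out to worker processes and average the payoffs."""
    tasks = build_tasks(n_chunks, paths_per_chunk)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        totals = list(pool.map(simulate_chunk, tasks))
    return sum(totals) / (n_chunks * paths_per_chunk)
```

When running this as a script, `price_call()` should be called from inside an `if __name__ == "__main__":` block, since worker processes may re-import the module. The only coordination is the final sum of the subtotals, which is why this style of computation scales so easily across cores.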


It should be noted, though, that improving the speed of the code itself, before we make it parallel, can be very valuable in reducing our compute costs (although this incurs another cost, namely development time). We can use code profilers to identify those parts of the code which are causing bottlenecks, which we may want to optimise. If we are using Python, for example, we might even choose to rewrite the whole stack in C++, albeit at a significant development cost.
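As a small illustration of profiling, the sketch below uses Python’s built-in `cProfile` and `pstats` modules. The two toy functions and the `profile_report` helper are hypothetical, written only to show how a slow loop stands out against a closed-form alternative in the profiler’s output.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop: 0^2 + 1^2 + ... + (n-1)^2."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n):
    """Same result via the closed-form formula, with no loop."""
    return (n - 1) * n * (2 * n - 1) // 6


def workload():
    """Toy workload mixing a slow and a fast implementation."""
    slow_sum_of_squares(200_000)
    fast_sum_of_squares(200_000)


def profile_report(func, top=5):
    """Run func under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()
```

Printing `profile_report(workload)` lists the functions sorted by cumulative time, and the naive loop should account for nearly all of it. That is the sort of hotspot worth optimising before paying for more cores.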


When it comes to spinning up multiple cores, we have two main choices: use the cloud or use local resources. Using the cloud has many positives, notably that we can spin cores up and down as they’re needed, so we are not continuously paying for them. If we are using compute intensively, we might choose to buy local servers. The flip side is that we need to manage local resources (e.g. replace broken hardware). Cloud resources do also need management, but the skills required will likely be different. You won’t be replacing hardware on the cloud, but you will be trying to understand other things, like the services on offer, how to configure them and so on.
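The trade-off between renting and buying can be framed as a simple break-even calculation. The sketch below is a deliberately crude model: the function name is hypothetical, the numbers are placeholders rather than real quotes, and it ignores things like staff time, depreciation and discounts.

```python
def break_even_hours(server_price, local_hourly_cost, cloud_hourly_rate):
    """Hours of use after which buying a server beats renting in the cloud.

    Returns None when the cloud is no more expensive per hour than running
    the local machine, in which case buying never pays back.
    """
    hourly_saving = cloud_hourly_rate - local_hourly_cost
    if hourly_saving <= 0:
        return None
    return server_price / hourly_saving


# Placeholder inputs, purely illustrative (not real prices)
hours = break_even_hours(
    server_price=10_000.0,    # upfront cost of the local server
    local_hourly_cost=0.25,   # power, hosting etc. per hour of use
    cloud_hourly_rate=1.25,   # equivalent cloud machine per hour
)
```

With these made-up inputs the server pays for itself after 10,000 hours of use, which is a little over a year of running around the clock, but far longer if the machine would sit idle most of the day.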


There are many ways we can access compute power in the cloud. The “lowest level” is to spin up machines directly (e.g. AWS EC2, GCP Compute Engine) and install everything on those machines. This basically replicates a local setup, but with our cloud provider managing the systems. If our task doesn’t need to be done immediately, we might reduce compute costs by bidding for spot instances, which are cheaper than on-demand instances. We can also use serverless computing services (e.g. AWS Lambda), which are essentially pay as you go and can be more cost effective, depending on how often we compute. We can also use higher level services that make it easier to access compute, such as managed databases and query services like GCP BigQuery, AWS Athena etc.


The costs of these cloud services will vary. If we use only higher level services, it’ll be more convenient, but it can cost more. There’s also the issue of vendor lock-in, which grows the higher up the stack we go. Switching between cloud providers is likely to be easier if we’re purely accessing “lower level” services with open source tools than if we use higher level services, which have vendor-specific APIs.


Another choice we have for speeding up code is using GPUs, which can be better value than using more CPUs, both in cost and in power usage. Many problems in finance are amenable to being sped up using GPUs, such as machine learning, which involves things like large datasets and large scale matrix computations. There are many software libraries these days that make it easier to access GPU compute, without having to write low-level CUDA code.


There are many ways of speeding up our code, including spinning up more cores, optimising bottlenecks, switching to GPUs etc. In practice, you are likely to use a bit of all the suggested ideas together to accomplish the goal. Which path you take will impact your budget, so it’s all a matter of balancing the various ideas discussed. If we had an infinite budget, we could apply all the above suggestions (and even more)... but in practice, we need to make compromises.