Ok, the past few months haven’t been the most exciting for most folks. Slowly, at least in the UK, things are getting back to “normal”. This week, there was also a very good sign (in my view): I found that Badiani, which serves Italian gelato, had opened up in Canary Wharf. (Although, I must admit, I didn’t take a photo, so the one above of a sundae from elsewhere will have to do for now.) It tasted exactly as I had expected, very creamy and sweet. It would have been somewhat disconcerting if it had tasted of something entirely different.
If you make something with milk, sugar and cream and then freeze it, it’ll taste like, well, ice cream. It’s a natural consequence. If you wanted to make something else, you’d use different ingredients and apply a different process. We can draw a parallel when it comes to picking which technologies and languages to use for a project: the choice should follow from the particular outcome and use case we’re aiming for.
When it comes to the language we use, there are many considerations, and there isn’t one language which is “best” for every situation. In finance, for example, if we are considering low latency applications such as high frequency trading, this would likely guide us towards C++ (or possibly Java), and also towards something like kdb+/q to help store and query tick data. If our task is heavily compute intensive, is it worth writing code that can be run on a GPU or FPGA?
If we’re dealing with large datasets, but in a less latency sensitive environment (for example researching trading strategies), Python might be a good choice, in particular because it is easier to write and lets us take advantage of a lot of existing data science libraries. In practice, we might end up using multiple languages. Developers also have their own preferences and experiences when it comes to languages. We might also have constraints in terms of the existing libraries and resources used in a firm, and what the firm’s future plans are. Even as someone who is enthusiastic about using Python, I’ve coded in many other languages in the past, such as Java. I think that, in practice, having experience of several languages helps more broadly with coding, because it exposes you to different ways of doing things.
For any data project, we’ll need to choose how we store data: whether to use simple flat files (in a format like Parquet), or to store it in a database (and then we need to consider which database too). Do we want to use a computation engine such as Spark instead of a more traditional SQL database?
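To make the flat file versus database choice a little more concrete, here’s a minimal sketch in Python using only the standard library. It is purely illustrative: the tick data is made up, the function names are my own, and it uses CSV text and an in-memory SQLite database as stand-ins (reading Parquet would need a third-party library, and a real project might well pick a different database). The same question, the average price for one ticker, is answered both ways.

```python
import csv
import io
import sqlite3

# Hypothetical tick data, invented purely for illustration.
TICKS = [
    ("2020-07-01T09:00:00", "EURUSD", 1.1234),
    ("2020-07-01T09:00:01", "EURUSD", 1.1236),
    ("2020-07-01T09:00:01", "GBPUSD", 1.2401),
    ("2020-07-01T09:00:02", "EURUSD", 1.1235),
]


def build_csv(rows):
    """Write the rows as CSV text, standing in for a flat file on disk."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["timestamp", "ticker", "price"])
    writer.writerows(rows)
    return buf.getvalue()


def build_database(rows):
    """Load the same rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticks (timestamp TEXT, ticker TEXT, price REAL)")
    conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", rows)
    return conn


def average_from_flat_file(csv_text, ticker):
    """Flat file route: read every row, then filter and aggregate in our own code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    prices = [float(row["price"]) for row in reader if row["ticker"] == ticker]
    return sum(prices) / len(prices) if prices else None


def average_from_database(conn, ticker):
    """Database route: let the SQL engine do the filtering and aggregation."""
    cur = conn.execute("SELECT AVG(price) FROM ticks WHERE ticker = ?", (ticker,))
    return cur.fetchone()[0]


csv_text = build_csv(TICKS)
conn = build_database(TICKS)
print("flat file:", average_from_flat_file(csv_text, "EURUSD"))
print("database: ", average_from_database(conn, "EURUSD"))
```

Both routes should agree on the answer. The difference is where the work happens: with the flat file, our own code has to read and filter everything, whereas the database does that for us, at the cost of having something extra to set up and maintain.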
From a hardware perspective, we need to consider whether we’ll use local servers or the cloud (and then which provider and so on), and we need to understand the various pros and cons of each. The cloud allows us to scale easily, in a way that is more challenging when using local hardware. At the same time, we should only use the cloud resources we really need, to avoid running up bills (e.g. don’t waste money running servers on the cloud when they’re idle). Different cloud providers offer different technologies, which won’t necessarily be compatible with each other. Do we use serverless technologies, and if so, which? If we have written code to target GPUs, we’ll obviously need to get the appropriate hardware too.
Obviously, this is just a short article, so it only skims the surface in terms of the questions you might ask (and their associated answers). The main point, though, is that the answers will often vary depending upon the project, and no single technology or language will be appropriate for every situation and firm.