ChatGPT and forecasting macro

First off, and possibly this might disappoint some, this post is written by a human, who has a certain penchant for burgers, as opposed to ChatGPT, which prefers to digest CPU cycles and TCP packets. Twitter and LinkedIn are awash with examples of questions put to ChatGPT and its answers. I have to admit some of the responses from ChatGPT are remarkably detailed, and in many cases I think most people would struggle to tell them apart from a human's. Perhaps the most human-like element is the absolute confidence ChatGPT has in many of its responses, without much in the way of references, or with references that are simply made up!


But jokes aside, there are many use cases. One particular use case is asking programming questions. Often the workflow for coding involves some sort of Google search, which invariably results in a Stack Overflow page that gives you hints. ChatGPT appears to supercharge this whole process, creating very specific answers to programming questions you might usually simply Google. That isn't to say it's perfect. You still need to interpret the code it gives you, and weed out any bugs. However, I suspect ChatGPT could be a very useful tool to speed up development, provided you can break down your problem into sufficiently small parts (ie. divide and conquer, which is an integral part of computer science).


What about forecasting economic variables like inflation? That’s the main focus at Turnleaf Analytics, which Alexander Denev and I cofounded. If you ask ChatGPT questions about the drivers for inflation, it can indeed give you a reasonable response, which is fashioned around its rather large training set (several hundred GB in size). However, it isn’t going to directly forecast inflation for you.


The problem of forecasting inflation requires data, just like training ChatGPT needs data, although admittedly different sorts of data. However, there are many constraints we face. Our time series is relatively short (and low frequency), and we have many fewer observations than we have variables. Throwing irrelevant data into the mix isn't necessarily the way to go. If our model is too complex for the problem at hand, we could end up overfitting it.
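The overfitting risk is easy to demonstrate. As a minimal sketch (entirely synthetic numbers, not our actual data or model), with more candidate predictors than observations an unregularized linear regression can fit the sample perfectly even when the target is pure noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# e.g. 2 years of monthly observations, but 60 candidate predictor series
n_obs, n_vars = 24, 60
X = rng.normal(size=(n_obs, n_vars))
y = rng.normal(size=n_obs)  # pure noise: there is genuinely nothing to learn

# Least-squares fit: with n_vars > n_obs the columns of X span the sample,
# so the model can "explain" the noise exactly in-sample
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
in_sample_error = np.max(np.abs(X @ beta - y))

print(in_sample_error)  # essentially zero, despite y being random
```

The in-sample fit looks flawless, yet the model has learned nothing that would generalize out-of-sample, which is why simply piling in more variables is not the answer.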


Curation of data is crucial: it's not about any data, it's about the right data! What data is likely to be a driver of inflation? Where can I get this data, and if some of this data is unstructured, how can I structure it into a nice time series to use in the model? How can I clean this data? What other preprocessing do I need to do? Of the data we collect, what does it represent, who collects it, and what does it measure? When is this data collected (and what is the importance of point-in-time data)?
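To make a couple of those curation steps concrete, here is a hedged sketch (the records and values below are made up for illustration) of turning messy raw records, with duplicates and a missing observation, into a clean time series using pandas:

```python
import pandas as pd

# Hypothetical raw records: duplicated rows, a missing value,
# values stored as strings rather than numbers
raw = [
    {"date": "2023-01-31", "value": "101.2"},
    {"date": "2023-03-31", "value": "103.9"},
    {"date": "2023-01-31", "value": "101.2"},  # duplicate record
    {"date": "2023-02-28", "value": None},     # missing observation
]

df = pd.DataFrame(raw)
df["date"] = pd.to_datetime(df["date"])                     # parse dates
df["value"] = pd.to_numeric(df["value"], errors="coerce")   # strings -> floats

series = (
    df.drop_duplicates("date")   # remove repeated records
      .set_index("date")
      .sort_index()["value"]
      .ffill()                   # naive gap-filling; real pipelines need more care
)
print(series)
```

This is deliberately simplistic: a production pipeline also has to deal with revisions (the point-in-time question), vintages, outliers, and alignment across sources, but the shape of the problem is the same.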


Only once we have gone through all these steps will we have a nice clean dataset we can feed into a machine learning model (that elusive SELECT * FROM A_TABLE_OF_CLEAN_DATA!). We need both data engineering skills to manage the pipeline and data science skills when creating the model. Creating that "nice clean dataset" is a painful part of the problem, and once that is done we can think about the model step.
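Once the clean dataset exists, the model step can start from something quite simple. As a sketch (again with synthetic data, not our actual model), ridge regression adds a penalty that shrinks coefficients, which helps when observations are scarce relative to variables:

```python
import numpy as np

rng = np.random.default_rng(1)

n_obs, n_vars = 36, 10
X = rng.normal(size=(n_obs, n_vars))

# Synthetic ground truth: only the first two variables actually matter
true_beta = np.zeros(n_vars)
true_beta[:2] = [0.8, -0.5]
y = X @ true_beta + 0.1 * rng.normal(size=n_obs)

# Ridge regression closed form: (X'X + lam*I)^-1 X'y
lam = 1.0  # regularization strength, tunable via cross-validation
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(n_vars), X.T @ y)

print(np.round(beta_hat, 2))
```

The penalty biases the estimates slightly towards zero, but in exchange it keeps the irrelevant coefficients small, which is exactly the trade-off you want with short, low-frequency macro series.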


ChatGPT is a very exciting tool. It can help solve parts of many problems, if we can articulate very specific questions about certain elements of those problems. However, it cannot automatically solve every general problem we might throw at it, such as forecasting inflation posed at a high level.