Like most bad literature, this blog post was born out of flattery. On the afternoon after my November 12th Argentina presentation, a young investor emailed asking whether I had considered machine learning algorithms among my alternative debt sustainability models. The question was the opposite of being offered a seat on the bus: the young investor either assumed I was myself young, or was open-minded enough not to presume that, because of my age, I could not code.

My answer was that I had indeed considered it, and even performed the analysis, but decided to set it aside because of the fatal disadvantage that dimensionality reduction would entail.

Let me explain.

Like the inquisitive investor, I had recognized that the DSA had the hallmarks of a supervised machine learning problem: I had a historical data set which could be cleaned up without major difficulty to obtain a tidy set, and we had a complete set of training output values. Furthermore, we even had the flexibility of choosing to define the problem as a prediction one (i.e., the output function returns debt/GDP), or a classification one (i.e., a sigmoid output function whose result is thresholded into either “sustainable” or “unsustainable”).
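To make the two framings concrete, here is a minimal sketch on synthetic stand-in data. The feature names, coefficients, and the median-based sustainability cutoff are all illustrative assumptions, not the DSA's actual variables:

```python
# Hypothetical sketch: the same DSA-style data framed two ways.
# All numbers and thresholds are illustrative, not the IMF's.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n_years = 25
# Toy annual observations: e.g. primary balance, real growth, real rate
X = rng.normal(size=(n_years, 3))
debt_gdp = 60 + 10 * X[:, 2] - 8 * X[:, 0] - 5 * X[:, 1] + rng.normal(0, 2, n_years)

# Prediction framing: the output is debt/GDP itself
reg = LinearRegression().fit(X, debt_gdp)

# Classification framing: the output is a sustainability label
# (a median split stands in for the actual sustainability judgment)
y = (debt_gdp < np.median(debt_gdp)).astype(int)  # 1 = "sustainable"
clf = LogisticRegression().fit(X, y)

print(reg.predict(X[:1]).round(1), clf.predict(X[:1]))
```

The same feature matrix serves both framings; only the output function changes.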

Therefore, the problem lent itself well to the architecture of a simple feed-forward artificial neural network (ANN) with one input layer, one output layer and either one or two hidden layers, whose activation functions we don’t even need to be able to interpret.

In fact, I did build the ANN, fed it the training data and obtained a laconic “sustainable” as the output.
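A network of the shape described can be sketched in a few lines. This is a toy stand-in, not the actual model or data: the 25 observations, six indicators, and hidden-layer width are all arbitrary assumptions for illustration:

```python
# Minimal sketch of a feed-forward net with one hidden layer,
# trained on synthetic stand-in data (not the real DSA set).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(25, 6))                   # 25 "years" of 6 toy indicators
y = (X @ rng.normal(size=6) > 0).astype(int)   # 1 = "sustainable"

net = MLPClassifier(hidden_layer_sizes=(8,), activation="logistic",
                    max_iter=5000, random_state=1).fit(X, y)
label = "sustainable" if net.predict(X[-1:])[0] == 1 else "unsustainable"
print(label)  # the network's laconic one-word verdict
```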

What made me decide against including it in the mix is that, in the matter of Argentina’s debt sustainability, I view myself as an advocate against the proposition that the debt is “unsustainable with high probability” and, as an advocate, I needed to concern myself with the persuasiveness of my arguments in front of my jury: the IMF.

In fact, at the presentation I put forward that, under the 2015 Exceptional Access Framework, investors do not bear the burden of proving that Argentina’s debt is sustainable in order to avoid the haircut axe: they can reach their goal by merely casting enough doubt on the probability assigned to a finding of unsustainability. However, shattering a perception which was arrived at through hard work is a nontrivial exercise in advocacy.

The arguments are not required to be true, merely persuasive, but I personally am a much better advocate when I have truth on my side (I would probably not make a very good criminal defense attorney). Within the self-imposed constraint of using only true arguments (the ANN result being one example), I had to carefully curate those arguments for persuasiveness, which led me to narrow the field to six models: four of them based on the intertemporal budget constraint which is at the core of the IMF’s DSF, and two stochastic ones.

The choice of the intertemporal budget constraint models was self-evident: they are all refinements empirically shown to outperform the IMF’s DSF by, for example, making projections probabilistic instead of deterministic, explicitly recognizing the recursive nature of the debt accumulation/decumulation process, and so on.
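The recursive process these models share can be sketched with the standard debt-dynamics identity, d(t+1) = d(t)·(1+r)/(1+g) − pb, in percent of GDP. The parameter values and shock distributions below are illustrative assumptions, not Argentina's actual figures:

```python
# Sketch of the recursive debt-dynamics identity behind these models:
# d_{t+1} = d_t * (1 + r) / (1 + g) - pb   (all in % of GDP).
# Parameter values are illustrative, not Argentina's.
import numpy as np

def debt_path(d0, r, g, pb, years=10):
    """Iterate the debt accumulation/decumulation recursion."""
    path = [d0]
    for _ in range(years):
        path.append(path[-1] * (1 + r) / (1 + g) - pb)
    return np.array(path)

# Deterministic projection, in the DSF's traditional style
print(debt_path(d0=90.0, r=0.05, g=0.03, pb=2.0).round(1))

# Probabilistic projection: draw r and g from distributions and report
# the share of simulated paths ending below the starting level.
rng = np.random.default_rng(2)
finals = np.array([debt_path(90.0, rng.normal(0.05, 0.02),
                             rng.normal(0.03, 0.02), 2.0)[-1]
                   for _ in range(1000)])
print(f"P(debt falls over 10y) ~ {(finals < 90).mean():.2f}")
```

The second half illustrates the probabilistic-versus-deterministic distinction: instead of one path, a distribution of outcomes.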

Further, I chose to introduce the pure-stochastic models in the mix because they had elicited the IMF’s interest when we used them on the Lebanese case to such an extent that we felt that it was safe to assume that, if properly documented, they would carry some weight in the IMF’s final analysis.

On the other hand, I excluded the ANN (and any other ML algorithm) not because the output is incorrect but because I found it both unpersuasive and non-actionable.

I was not particularly bothered by the “black boxy” perception of ANN algorithms. In fact, the cost function can always be expressed in closed mathematical form as:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

And, since the goal is to minimize the error term, we can calculate the partial derivative simply as:

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}$$

Granted, for computational efficiency, in neural networks we tend to identify minima through gradient descent rather than by setting the partial derivatives to zero, but the cost surface being convex for an exponential debt buildup, in this case the numerical shortcut (which identifies local, not global, minima) carries no risk of identifying a false minimum.
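A minimal numerical check of that last point: when the squared-error cost is convex in the parameters, gradient descent lands on the same minimum as the closed-form solution. The single linear unit and synthetic data here are toy assumptions, simpler than a full network, chosen so the closed-form answer exists for comparison:

```python
# Squared-error cost and its gradient for a single linear unit, with
# gradient descent converging to the closed-form least-squares minimum
# (the cost is convex in the weights, so the minimum found is global).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 2))
y = X @ np.array([1.5, -0.7]) + rng.normal(0, 0.1, 25)

def cost(w):            # J(w) = (1/2m) * sum((Xw - y)^2)
    return ((X @ w - y) ** 2).mean() / 2

def grad(w):            # dJ/dw = (1/m) * X^T (Xw - y)
    return X.T @ (X @ w - y) / len(y)

w = np.zeros(2)
for _ in range(500):    # plain gradient descent
    w -= 0.1 * grad(w)

w_closed = np.linalg.lstsq(X, y, rcond=None)[0]  # closed-form solution
print(np.allclose(w, w_closed, atol=1e-4))  # → True
```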

So, having partially explained away the black boxy nature of the algorithm, we might persuade an economist that the math underlying the ANN models the proper econometric process (that is, unless the economist inquires about what each activation function in the hidden layer actually does, to which we would have no non-stammering answer).

But even assuming the ANN output to be persuasive, i.e., the “jury” buys into it, we still have a problem with actionability. The DSF contains, depending on how you count, between 32 and 50 features (variables), versus a total of 25 years of data. Our data set shrinks further if we adopt the good practice of reserving a portion of the data as a validation set, and further still if we aim to minimize overfitting by reserving another subset for cross-validation.

It is obvious that we have either too many features or too few data points. Since we cannot produce additional data out of thin air, we need to reduce the number of variables for the model to work. That process, called dimensionality reduction, is really black boxy and hard to explain without arcane statistics. Merely explaining how we went about reducing the number of variables would make the recipient of the information less than comfortable, further reducing our conclusion’s persuasiveness.

But the dispositive argument against the use of a machine learning model requiring dimensionality reduction is that nobody knows (not even us) what each variable within the resulting set actually means. By way of example, imagine that five variables (e.g., primary surplus, growth rate, interest rate, real exchange rate appreciation, and % of debt denominated in foreign currency) get reduced to two variables A and B. Nobody knows what either A or B actually means, which causes the model to lose actionability: if, for example, the IMF sent us home with the task of re-running the model with a lower value for B, we wouldn’t understand what that means in terms of actual policy mix, and vice versa.
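The problem is easy to see with principal component analysis, one common dimensionality-reduction technique (used here purely for illustration; synthetic data, and the five variable names are the examples above):

```python
# Sketch of the interpretability problem: reduce five named macro
# variables to two principal components and inspect what "A" and "B"
# are made of. Synthetic data; every loading mixes all five inputs.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
names = ["primary_surplus", "growth", "interest_rate",
         "rer_appreciation", "fx_debt_share"]
X = rng.normal(size=(25, 5))

pca = PCA(n_components=2).fit(X)
for comp, label in zip(pca.components_, ("A", "B")):
    mix = " + ".join(f"{w:+.2f}*{n}" for w, n in zip(comp, names))
    print(f"{label} = {mix}")
# Each component is a weighted blend of all five inputs, so
# "lower B" has no direct reading as a policy instruction.
```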

This was probably a very long way of explaining why we decided against the use of an artificial neural network as part of our “rebuttal arsenal,” but the inquisitive investor who presumed me young deserved a thorough explanation. Thank you, Elena, I hope I reciprocated in kind.