Machine Learning in Finance: The Present - DecisionBoundaries

Machine learning (ML) is an exciting, rapidly evolving field of study. To some, especially those in my generation, mastering ML may feel daunting. It’s not. ML is nothing more than the scientific study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying on patterns and inference instead. I learned the basics of ML in my late 50s and now apply it at work all the time, sometimes as the sole device for solving a problem with no closed-form solution, other times as a coadjutant to more traditional problem-solving approaches. By naming my firm after an ML concept, I intended to convey the same approachability that ML itself offers. Indeed, we have all (even the English majors among us) “done” ML at some point in our lives: those linear regressions we all worked on in high school are one (admittedly basic) ML algorithm.
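To make the high-school example concrete, here is a minimal sketch of a one-variable linear regression fitted by ordinary least squares. The function name `fit_line` and the data points are made up for illustration; the point is only that "learning" here means estimating a slope and an intercept from data and then using them to predict.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data only: four points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# "Inference": use the fitted pattern to predict an unseen point.
prediction = slope * 5.0 + intercept
```

The model was never told the rule that generated the data; it recovered the pattern from the examples, which is the essence of the definition above.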

ML is the hottest thing (too hot in my view) in my field of finance at the moment. Earlier this year, Marcos López de Prado, Adjunct Professor of Financial Machine Learning at Cornell, presented numerous inspiring use cases, including price prediction, portfolio construction, outlier detection, bet sizing, feature importance, and others. But, while machine learning has achieved impressive results in some fields (such as detecting nudity in Facebook posts), it still has a way to go in others. One example is writing. In last week’s blog I wrote about modern monetary theory. While not my most engaging post to date, it is clear that the human-written version was more informative and on point than its ML-written counterpart (posted this week, just for kicks).

In fact, ML, as we know it today, is not actually that smart [1]. In another recent post, I argue (from a behavioral finance standpoint) that this may be a good thing, at least for the current cohort of traders and investors.

Moving from the current paradigm of “representation learning” (or learning by “imitation” from data) to a paradigm of “machine consciousness”, where the agent will be able to make decisions based on independent “thinking” by using an approach that is more akin to human behavior, is still a distant milestone.

Actually, ML was in use in finance long before it became hyped. Clustering and regression techniques have been used in the classical finance and time series prediction literature for a long time, and chances are that any quantitative long-short equity firm is already using these techniques in its factor models.

But the industry still has a long way to go in adopting more advanced ML, such as large deep learning models. This may be because of a lack of expertise and a reluctance to try new things when the current models seem to work fine. But reluctance is not the biggest obstacle to adoption; in fact, every firm I talk to wants either to avail itself of the promise of the “ML edge” or not to be left behind by it. Instead, high expectations, and ultimately disappointment, could be the biggest challenge during the adoption process. Indeed, it is hard work to get an ML model to function well. In my experience, the first step of an ML project takes the most time and effort and can lead to the greatest frustrations. Consider that a quant involved in an ML project typically spends 50% (but often as much as 80%) of his or her time and effort on data cleaning and pre-processing (I’ve been there, it’s dreadful) and only 20 to 50% on the actual training and testing of ML algorithms (the fun part).

In fact, ML models tend not to be parsimonious: they carry a lot of configuration overhead and are sensitive to data provenance. The former means that those involved in the project need to keep careful versioning of their model configurations and parameters, so that their experiments are reproducible and cleanly organized. The latter means that they need to track the history of their data carefully as it goes through iterations of cleaning, scrubbing, pre-processing, and so on. This is time-consuming for any firm, even one with hundreds or thousands of employees, because the work does not get proportionally faster as you add people. It requires careful design, planning, foresight and some good luck in making the right architectural choices.
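As a rough illustration of what this bookkeeping can look like, here is a minimal sketch that fingerprints a model configuration and records a hash of the data after each cleaning step. Everything in it (the function names, the `lineage` list, the toy records and the config values) is hypothetical and chosen for illustration; real projects usually rely on dedicated experiment-tracking and data-versioning tools rather than hand-rolled code like this.

```python
import hashlib
import json


def fingerprint_config(config):
    """Hash a config dict so that identical settings give identical IDs."""
    # sort_keys makes the hash independent of the order keys were written in.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fingerprint_data(rows):
    """Hash a list of records to detect any change in the data."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def record_step(lineage, step_name, rows):
    """Append one entry to the data lineage: step name, row count, data hash."""
    lineage.append(
        {"step": step_name, "rows": len(rows), "data_hash": fingerprint_data(rows)}
    )
    return lineage


# Toy data, invented for this example.
raw = [{"ticker": "AAA", "price": 10.0}, {"ticker": "BBB", "price": None}]
cleaned = [row for row in raw if row["price"] is not None]

lineage = []
record_step(lineage, "raw", raw)
record_step(lineage, "drop_missing_prices", cleaned)

# Hypothetical model settings.
config = {"model": "ridge", "alpha": 0.1, "features": ["momentum", "value"]}

# One record per experiment ties the exact settings to the exact data history.
run_record = {"config_hash": fingerprint_config(config), "lineage": lineage}
```

Storing a record like `run_record` alongside each result makes it possible to tell later which configuration and which version of the data produced it.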

But while the finance industry faces no talent shortage today, the commoditization of data science skillsets (and supporting software libraries) means that finance will face some competition for that talent. Typically, the most talented ML students seek to work at tech firms, partly because of higher pay, but mostly because the challenges and the ability to have direct influence are seen as much better in the tech industry.

It should come as no surprise how hard (and expensive) it is to keep a highly talented recent graduate motivated and excited about data cleaning for more than two years. Yet that is exactly what finance needs. The most important skills are actually the least glamorous: attention to detail, a methodical and organized workflow, the ability to communicate your ideas well, and a willingness to spend a disproportionate amount of time cleaning data and debugging, compared to the more interesting modelling work.

So, while I don’t disagree with Marcos’ vision of ML’s promise in finance, my behavioral finance view remains unchanged: for the reasons above, the current cohort of traders and investors will not be replaced by machines any time soon.

[1] This month, for the first time, an AI system (Aristo) managed to pass an 8th grade science test.


 

 
