

The CogX Blog

Thought leadership on the most pressing issues of our time


Drawing from a diverse array of mathematical theories and computer science principles, machine learning algorithms are not just changing entire industries and fields; they’re revolutionising our approach to problem-solving itself. In this OpEd, Anil Ananthaswamy, an award-winning science writer and acclaimed author of “Why Machines Learn: The Elegant Math Behind Modern AI”, challenges the oversimplified narrative that reduces ML to "glorified statistics". Instead, he sheds light on the intricate mathematical foundations that elevate ML far beyond traditional statistical methods.





Machine learning is not “just” statistics



Guest Author: Anil Ananthaswamy

August 14, 2024


“Once you encounter all these algorithms, methods and models, you’ll be hard-pressed to dismiss machine learning as just glorified statistics.”



You might have heard arguments that machine learning is nothing but glorified statistics. I beg to differ. My perspective comes from having been a software engineer, and from having researched and written my book, Why Machines Learn: The Elegant Math Behind Modern AI. For the book, I had to relearn coding after a 20-year hiatus: two decades ago, I was a distributed-systems software engineer, in the pre-ML/AI days. As I learned Python and ML, I was intrigued by the change in thinking that machine learning demands when it comes to solving problems: ML-based techniques are distinctly different from non-ML methods.


From a software engineering perspective, you have to flip your instincts about how to solve problems: from thinking algorithmically to learning how to pose questions of the data you have in hand, and using machine learning to, well, learn a model that represents the data, which can then be used for inference, prediction or generation. You have to learn how to see data differently: as a repository of answers to your questions.


Machine learning-based techniques are not just statistics. Yes, statisticians build sophisticated models of the patterns that exist in data and use these models to infer/predict. But ML is so much more than simply writing code to automate what statisticians do.


My hope is that Why Machines Learn will help those of us who know some high-school or first-year undergraduate math, and maybe even did some old-style software engineering (and, of course, it should also be of use to those coming to ML entirely untainted by old ways), to appreciate the technological changes happening under our feet. I took what I felt was a representative sample of ML algorithms to illustrate the math that undergirds this new way of thinking, while also providing a somewhat curated historical account.


Let’s say you wanted to write a piece of software to recognize images of cats and dogs. In the non-ML way of thinking about this problem, you’d first need to identify the kinds of features you think are characteristic of cats and dogs. For example, features might focus on the shape and size of ears, the length and width of bodies, the shape of tails, and so on. You’d then write software that recognizes such features in images and, depending on the features found in any given image, tags the image as being that of a cat or a dog. One can well imagine how intractable this becomes: you can never come up with an exhaustive list of features, nor can you anticipate the ways in which such features will be visible or occluded in any given image.
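To make that brittleness concrete, here is a deliberately crude sketch in Python of what such hand-written rules might look like. Everything in it, the feature names, the thresholds, the rules themselves, is invented for illustration, which is exactly the point: someone has to invent them, and the list never ends.

# A toy caricature of the rule-based approach: hand-picked features,
# hand-tuned thresholds. (All names and numbers here are made up.)
def classify(image_features):
    # image_features is assumed to come from some earlier feature-extraction
    # code, e.g. {"ear_pointiness": 0.9, "snout_length": 0.2}
    if image_features.get("ear_pointiness", 0) > 0.8 and image_features.get("snout_length", 1) < 0.3:
        return "cat"
    if image_features.get("body_length", 0) > 0.6 or image_features.get("tail_bushiness", 0) > 0.7:
        return "dog"
    return "unknown"  # ...and on and on, one fragile rule at a time

print(classify({"ear_pointiness": 0.9, "snout_length": 0.2}))  # -> "cat"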


But what if you had a large dataset of images of cats and dogs that had been annotated as such by humans? Well, then you feed the images as inputs to a machine learning algorithm (such as an artificial neural network) and ask it to categorise each one as either a dog or a cat. Because the images are already annotated, the algorithm knows the correct answer. So, if the ML model makes a mistake, the algorithm modifies the model’s parameters such that the error it makes when given the same image again is reduced a little. And you keep doing this until the ML model makes minimal errors across all the images in the training dataset.


Now, if you give the model a previously unseen image, it can in all likelihood tag it correctly as that of a dog or a cat. Internally, the model has, one hopes, figured out the relevant features that distinguish cats from dogs. Crucially, we didn’t have to identify such features a priori.
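For readers who like to see the loop in code, here is a minimal sketch in Python of that train-by-reducing-error cycle. It uses a single logistic “neuron” on synthetic feature vectors as a stand-in for annotated images; the data is randomly generated purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))              # 200 "images", each reduced to 16 features
true_w = rng.normal(size=16)
y = (X @ true_w > 0).astype(float)          # the human annotations: 1 = dog, 0 = cat

w = np.zeros(16)                            # the model's parameters start out knowing nothing
lr = 0.1
for epoch in range(100):
    p = 1 / (1 + np.exp(-(X @ w)))          # the model's current guesses
    grad = X.T @ (p - y) / len(y)           # how wrong it is, and in which direction
    w -= lr * grad                          # nudge the parameters to reduce the error a little

print(f"training accuracy: {((p > 0.5) == y).mean():.2f}")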


It’s true that the ML model is learning about the statistics of such patterns in the data; but it’s also true that to build the ML system, one has to go beyond simply thinking about the statistics.

In Why Machines Learn, we develop an intuition for ML-thinking, starting with Frank Rosenblatt’s perceptron algorithm (the first artificial neuron that learned, developed in 1958) and the Widrow-Hoff least mean squares (LMS) algorithm (1959), which can lay claim to being the true precursor of the “backpropagation” algorithm used today to train deep neural networks. There’s also Bayes’ theorem, and the Optimal and Naïve Bayes classifiers (you can’t do ML without really appreciating the role probability and statistics play in it).
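To give a taste of how simple the earliest of these learning rules is, here is a rough sketch of Rosenblatt’s perceptron update on a tiny, linearly separable toy dataset (the points and labels below are invented for illustration):

import numpy as np

X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])                # labels are +1 or -1

w, b = np.zeros(2), 0.0
for _ in range(10):                         # sweep over the data a few times
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:          # misclassified (or sitting on the boundary)
            w += yi * xi                    # nudge the weights toward the correct side
            b += yi

print("learned weights:", w, "bias:", b)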


Then there’s the seminal k-Nearest Neighbor algorithm. It’s a wonderful way to develop a sense for how data is represented in vector spaces, and the notion of similarity in Euclidean space, and how all this falls apart when you move to high-dimensional spaces.
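Here is a small sketch of that idea in two dimensions: classify a new point by a majority vote among its k nearest training points, with plain Euclidean distance doing the work (the points and labels are toy values, invented for illustration):

import numpy as np

train_X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]])
train_y = np.array(["cat", "cat", "dog", "dog"])

def knn_predict(x, k=3):
    dists = np.linalg.norm(train_X - x, axis=1)   # Euclidean distance to every training point
    nearest = train_y[np.argsort(dists)[:k]]      # labels of the k closest points
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]              # majority vote

print(knn_predict(np.array([1.1, 0.9])))          # -> cat

In high dimensions the same distance calculation still runs, but the notion of “nearest” starts to lose its meaning, which is where the trouble begins.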


One of the coolest tools one can borrow from statisticians is principal component analysis. It’s a great way to appreciate the power of matrices, and how they can help bring high-dimensional data down to lower dimensions for computational efficiency and easier visualisation, among other things. Once high-dimensional data is projected down to lower dimensions, one can use standard ML algorithms to learn about inherent patterns, say, for classification.
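Here is a brief sketch of that projection step, using the singular value decomposition on mean-centred data; the dataset below is just random numbers, standing in for real high-dimensional data:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))              # 300 samples, 10 features (synthetic)
Xc = X - X.mean(axis=0)                     # PCA works on mean-centred data

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_2d = Xc @ Vt[:2].T                        # project onto the top two principal components

explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(f"projected shape: {X_2d.shape}, variance retained: {explained:.2%}")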


Sometimes you have to move your data into higher dimensions to optimally classify it into different categories (à la support vector machines), but the algorithm has to stay grounded in lower dimensions for computational efficiency. This can be done using kernel methods, which are so much more than just statistics.
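To see the kernel trick at work without deriving any of the math by hand, here is a short sketch using scikit-learn’s support vector classifier with an RBF kernel, on a toy “two circles” dataset that no straight line in two dimensions can separate (the dataset and the gamma value are arbitrary choices for illustration):

from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly maps the points into a much higher-dimensional
# space, yet every computation stays in the original 2-D space, via kernel
# evaluations between pairs of points.
clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")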


And then, of course, there’s the revolution that’s happened in artificial neural networks (ANNs) and deep learning. Some key ideas include the universal approximation theorem, the backpropagation algorithm, and specific architectures of ANNs, such as convolutional neural networks for image classification. And let’s not forget Hopfield networks, which give us insights into dynamical networks that may one day rule the roost.
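As a last sketch, here is backpropagation in miniature: a one-hidden-layer network, written with nothing but numpy, learning the XOR function that no single linear model can represent. It’s a standard toy example, not something taken from the book.

import numpy as np

rng = np.random.default_rng(2)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer of 8 units
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # single output unit
lr = 0.1

for step in range(20000):
    h = np.tanh(X @ W1 + b1)                    # forward pass
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))
    # backward pass: push the error back through each layer in turn
    d_out = out - y                             # gradient at the output (sigmoid + cross-entropy)
    d_h = (d_out @ W2.T) * (1 - h ** 2)         # gradient at the hidden layer (tanh derivative)
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))                 # should end up close to [0, 1, 1, 0]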


Once you encounter all these algorithms, methods and models, you’ll be hard-pressed to dismiss machine learning as just glorified statistics.
