Understanding Kolmogorov-Arnold Networks (KAN)

A new research paper titled "KAN: Kolmogorov–Arnold Networks" has stirred excitement in the Machine Learning community. It presents a fresh perspective on Neural Networks and suggests a possible alternative to Multi-Layer Perceptrons (MLPs), a cornerstone of current Machine Learning.

Inspired by the Kolmogorov-Arnold representation theorem, KANs diverge from traditional Multi-Layer Perceptrons (MLPs) by replacing fixed activation functions with learnable functions, effectively eliminating the need for linear weight matrices.

I strongly recommend reading through this paper if you're interested in the finer details and experiments. However, if you prefer a concise introduction, I've prepared this article to explain the essentials of KANs.

  • Note: The source of the images/figures used in this article is the "KAN: Kolmogorov–Arnold Networks" paper unless stated otherwise.

The Theory

The theoretical pillar of these new networks is a representation theorem developed by two Soviet mathematicians, Andrey Kolmogorov and Vladimir Arnold.

While a student of Andrey Kolmogorov at Moscow State University and still a teenager, Arnold showed in 1957 that any continuous function of several variables can be constructed with a finite number of two-variable functions, thereby solving Hilbert's thirteenth problem. (source: Wikipedia)

The result they developed concerns multivariate continuous functions. According to the Kolmogorov–Arnold representation theorem, any continuous function f of several variables can be written as a finite composition of continuous functions of a single variable and the binary operation of addition.

The mathematical formula of the Kolmogorov–Arnold representation theorem. (source: Wikipedia)
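
Written out, the theorem states that any continuous function f of n variables on a bounded domain can be expressed as

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

where the inner functions φ_{q,p} and the outer functions Φ_q are continuous functions of a single variable.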

How Does This Theorem Fit into Machine Learning?

In Machine Learning, the ability to efficiently and accurately approximate complex functions is an important subject, especially as the dimensionality of data increases. Current mainstream models such as Multi-Layer Perceptrons (MLPs) often struggle with high-dimensional data – a phenomenon known as the curse of dimensionality.

The Kolmogorov-Arnold theorem, however, provides a theoretical foundation for building networks (like KANs) that can overcome this challenge.

An overview comparison of MLP and KAN.

How can KAN avoid the curse of dimensionality?

This theorem allows complex high-dimensional functions to be decomposed into compositions of simpler one-dimensional functions. By optimizing these one-dimensional functions rather than the entire multivariate space, KANs can reduce the complexity and the number of parameters needed for accurate modeling. Furthermore, because they work with simple one-dimensional functions, KANs can be more interpretable models.
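
As a concrete toy illustration, take f(x, y) = exp(sin(πx) + y²), a function highlighted in the paper: it is exactly one one-dimensional function (exp) applied to the sum of two one-dimensional functions (sin(πx) and y²). The short NumPy sketch below simply checks this decomposition; a KAN's job is to learn such one-dimensional pieces from data rather than being handed them.

```python
import numpy as np

# f(x, y) = exp(sin(pi * x) + y^2) decomposes exactly into 1-D pieces:
# phi_1(x) = sin(pi * x), phi_2(y) = y^2, and an outer Phi(z) = exp(z).
def f(x, y):
    return np.exp(np.sin(np.pi * x) + y ** 2)

def phi_1(x): return np.sin(np.pi * x)   # inner 1-D function of x
def phi_2(y): return y ** 2              # inner 1-D function of y
def Phi(z):   return np.exp(z)           # outer 1-D function of the sum

x, y = np.random.rand(1000), np.random.rand(1000)
assert np.allclose(f(x, y), Phi(phi_1(x) + phi_2(y)))  # identical by construction
```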

What Are Kolmogorov–Arnold Networks (KAN)?

Kolmogorov-Arnold Networks, also known as KANs, are a type of neural network architecture inspired by the Kolmogorov-Arnold representation theorem. Unlike traditional neural networks that use fixed activation functions, KANs employ learnable activation functions on the edges of the network. Every weight parameter in a KAN is replaced by a learnable univariate function, typically parameterized as a spline, making KANs highly flexible and capable of modeling complex functions with potentially fewer parameters and enhanced interpretability.

KAN leverages the structure of MLP while benefiting from splines.
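
In the paper, each of these learnable edge activations is parameterized not as a raw spline but as the sum of a fixed basis function b(x) = silu(x) and a trainable spline, each scaled by a trainable weight (the basis term plays a role similar to a residual connection):

```latex
\phi(x) = w_b \, b(x) + w_s \, \mathrm{spline}(x), \qquad b(x) = \mathrm{silu}(x) = \frac{x}{1 + e^{-x}}
```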

KAN architecture

The architecture of Kolmogorov-Arnold Networks (KANs) revolves around a novel concept: traditional weight parameters are replaced by learnable univariate functions on the edges of the network. Each node in a KAN simply sums the outputs of these functions without applying any further nonlinear transformation, in contrast with MLPs, where each node applies a linear transformation followed by a fixed nonlinear activation function.
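
To make this contrast concrete, here is a minimal, illustrative sketch of a KAN-style layer in PyTorch. For brevity, each per-edge univariate function is a small learnable polynomial rather than the B-spline parameterization used in the paper, and the class name KANLayer is just a placeholder:

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Illustrative KAN-style layer: one learnable univariate function per edge.

    For simplicity each per-edge function is a small learnable polynomial here;
    the paper instead parameterizes it as a B-spline plus a SiLU basis term.
    """
    def __init__(self, in_features: int, out_features: int, degree: int = 3):
        super().__init__()
        # coeffs[o, i, d] = coefficient of x^d for the edge from input i to output o
        self.coeffs = nn.Parameter(0.1 * torch.randn(out_features, in_features, degree + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) -> powers: (batch, in_features, degree + 1)
        powers = torch.stack([x ** d for d in range(self.coeffs.shape[-1])], dim=-1)
        # Evaluate every edge's univariate function, then sum over inputs at each node.
        edge_outputs = torch.einsum('bid,oid->boi', powers, self.coeffs)
        return edge_outputs.sum(dim=-1)  # nodes only sum; no extra nonlinearity

# Usage: stack layers just like MLP layers, but without separate activation functions.
model = nn.Sequential(KANLayer(2, 5), KANLayer(5, 1))
y = model(torch.rand(16, 2))  # output shape: (16, 1)
```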

KAN vs MLP formula.
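
The paper summarizes the contrast in formula form: a (three-layer) MLP alternates learnable linear maps W with a fixed nonlinearity σ, while a (three-layer) KAN is a pure composition of layers Φ made of learnable univariate functions:

```latex
\mathrm{MLP}(\mathbf{x}) = (W_3 \circ \sigma \circ W_2 \circ \sigma \circ W_1)(\mathbf{x}),
\qquad
\mathrm{KAN}(\mathbf{x}) = (\Phi_3 \circ \Phi_2 \circ \Phi_1)(\mathbf{x})
```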

B-Splines: The Core of KAN

One of the most important figures in the paper is easy to miss: the description of splines. Splines are the backbone of KAN's learning mechanism, replacing the traditional weight parameters found in Neural Networks.

A detailed view of a spline structure.

The flexibility of splines allows them to adaptively model complex relationships in the data by adjusting their shape to minimize approximation error, thereby enhancing the network's ability to learn subtle patterns from high-dimensional datasets.

The general formula for a spline in the context of KANs can be expressed using B-splines as follows:
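
```latex
\mathrm{spline}(x) = \sum_{i} c_i \, B_i(x)
```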

Here, the B_i(x) are fixed B-spline basis functions defined over a grid of knots, and the c_i are trainable coefficients. Adjusting these coefficients during training is how a KAN shapes each of its activation functions, and refining the grid (adding more basis functions) lets the network capture finer detail.
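
As a quick numerical illustration of such a spline, here is a small sketch using SciPy's BSpline. The knot vector and the coefficients c_i below are arbitrary placeholder values; in a KAN the coefficients would be learned by gradient descent:

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3                                                      # cubic B-spline (the paper's default order)
t = np.array([0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1])    # knot vector on [0, 1]
c = np.array([0.0, 0.5, -1.0, 2.0, 1.5, 0.3, -0.7])        # coefficients c_i (placeholders; learned in a KAN)

phi = BSpline(t, c, k)            # phi(x) = sum_i c_i * B_i(x)
x = np.linspace(0, 1, 5)
print(phi(x))                     # evaluate this "learnable activation" at a few points
```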
