Differentiate Noisy Time Series Data with Symbolic Regression
Time series profiles are all around us in our everyday life, and many specialized research works deal with them.
In simple terms, a time series profile is a collection of successive data points y(0), y(1), …, y(t), where the point at time t depends on the previous point at time t-1 (or on points even further back in time).
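To make the "depends on the previous point" idea concrete, here is a minimal sketch of one of the simplest such processes, a linear first-order autoregressive model y(t) = phi · y(t-1) + e(t). The function name `simulate_ar1`, the coefficient phi = 0.8 and the Gaussian noise level are illustrative assumptions, not something taken from the projects discussed here.

```python
import numpy as np

def simulate_ar1(n_steps: int, phi: float = 0.8, noise_std: float = 0.1,
                 y0: float = 1.0, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: simulate y(t) = phi * y(t-1) + e(t), e(t) ~ N(0, noise_std^2)."""
    rng = np.random.default_rng(seed)
    y = np.empty(n_steps)
    y[0] = y0
    for t in range(1, n_steps):
        # Each point is a scaled copy of the previous one plus a random shock
        y[t] = phi * y[t - 1] + rng.normal(0.0, noise_std)
    return y

y = simulate_ar1(50)
print(y[:5])
```

With `noise_std=0.0` the series collapses to the deterministic decay y(t) = 0.8^t, which is a quick way to check that each point really is built from its predecessor.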
In many applications, one is interested in predicting how the profile will behave, given some previous points. To do that, there is a wide variety of modeling approaches. At their core, these models take some information about the past (or the present) and estimate how the profile will look in the future. Many works deal with such time series predictions, for example describing the weather using neural networks (Bi et al., 2023), stock price behavior via deep learning (Xiao and Su, 2022), or the demand evolution of pharmaceutical products (Rathipriya et al., 2023). Of course, these are just the works I found after a quick search, so there is plenty more out there.
However, as you know, to train a model that can predict the time series profile, we need data. Usually, the better the data, the better we can describe the process under study.
One could train a model to predict the next state y(t+1) given the previous time points y(0), …, y(t). In some applications, however, we might want a model that takes the current observations y(0), …, y(t) and predicts how fast the system changes at the current or next time point. In other words, we want the system's derivative dy (i.e., dy/dt) instead of the observable state y. To train such a model to output a derivative dy, we first need to gather those derivatives. Usually, they are calculated directly from the observed data y, since measuring derivatives directly might be hard or even impossible, depending on the situation.
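The two training setups described above differ only in the target. The sketch below builds both from the same sliding window of past observations: the next state y(t+1) and a derivative estimated from the observed y. The helper name `make_training_pairs`, the window length of 3 and the use of NumPy's `np.gradient` (central finite differences) are assumptions chosen for illustration; the clean sine wave stands in for real measurements.

```python
import numpy as np

def make_training_pairs(t: np.ndarray, y: np.ndarray, window: int = 3):
    """Hypothetical helper: sliding-window inputs plus two candidate targets."""
    # Derivative estimated from the observed data: central differences in the
    # interior, one-sided differences at the two ends
    dy = np.gradient(y, t)
    X, y_next, dy_now = [], [], []
    for i in range(window - 1, len(y) - 1):
        X.append(y[i - window + 1 : i + 1])  # y(t-window+1), ..., y(t)
        y_next.append(y[i + 1])              # target 1: next state y(t+1)
        dy_now.append(dy[i])                 # target 2: derivative at time t
    return np.array(X), np.array(y_next), np.array(dy_now)

t = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(t)
X, y_next, dy_now = make_training_pairs(t, y)
print(X.shape, y_next.shape, dy_now.shape)  # (47, 3) (47,) (47,)
```

On noise-free data like this, `dy_now` closely matches the exact derivative cos(t); the trouble starts once the observations are noisy.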
And this is the point where one little thing makes our life so much more difficult: Noise.
Anyone who has worked with noisy time series data knows how painful it can be to calculate derivatives from it.
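Why is it so painful? A finite-difference derivative divides differences of neighboring points by the time step h, so any noise on y is divided by h as well. For central differences, noise with standard deviation sigma ends up with a standard deviation of roughly sigma / (h·√2) in the derivative. The short experiment below shows this with a sine wave and 1 % Gaussian noise; the signal, noise level and grid are illustrative assumptions, and `np.gradient` is simply one common finite-difference choice, not the method this article proposes.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0.0, 2.0 * np.pi, 100)  # time step h ≈ 0.063
y_true = np.sin(t)
noise_std = 0.01                        # only 1 % of the signal amplitude
y_noisy = y_true + rng.normal(0.0, noise_std, size=t.size)

# Finite-difference derivatives of the clean and the noisy signal
dy_clean = np.gradient(y_true, t)
dy_noisy = np.gradient(y_noisy, t)
dy_exact = np.cos(t)

def rms(a: np.ndarray) -> float:
    """Root-mean-square of an error vector."""
    return float(np.sqrt(np.mean(a ** 2)))

err_y = rms(y_noisy - y_true)      # error in the data itself (about noise_std)
err_dy = rms(dy_noisy - dy_exact)  # error in the estimated derivative
print(f"RMS error in y:               {err_y:.4f}")
print(f"RMS error in dy/dt (clean y): {rms(dy_clean - dy_exact):.4f}")
print(f"RMS error in dy/dt (noisy y): {err_dy:.4f}")
print(f"amplification:                {err_dy / err_y:.1f}x")
```

The derivative of the clean signal is almost exact, while the same formula applied to the noisy signal turns a barely visible 1 % perturbation into an error roughly an order of magnitude larger. Sampling more densely (smaller h) makes this worse, not better.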
There are many ways to deal with noise in time series data. In this article, I would like to show you one approach that worked quite well in one of our research projects, where only a few data points were available in each time series profile.
So, get yourself a cup of coffee ☕ and fire up Python!