Feature Extraction for Time Series, from Theory to Practice, with Python

Author: Murphy  |  Time: 2025-03-23 11:43:14

Time series are a special animal.

I started my Machine Learning career because I loved Physics (a weird reason to get into Machine Learning), and through Physics I discovered that I also loved coding and Data Science. I didn't really care about the type of data. All I wanted was to be in front of a computer writing 10k lines of code per day.

The truth is that even when you don't care (I still really don't), your career will drift you toward some kinds of data rather than others.

If you work at SpaceX, you probably won't do a lot of NLP but you will do a lot of signal processing. If you work at Netflix, you might end up working with a lot of NLP and recommendation systems. If you work at Tesla you will most definitely be a Computer Vision expert and work with images.

When I started as a Physicist, and then kept going with my PhD in Engineering, I was immediately thrown into the world of signals. This is simply the natural world of engineering: whenever you have an experimental setup and extract information from it, at the end of the day you are treating a signal. Don't get me wrong, engineering is not the only world where signals are the stars of the show. Another very famous example is finance, where stock prices form time series: signals of price versus time. But if for whatever reason you are dealing with signals, you should remember the first sentence of this blog post:

Time series are a special animal.

This means that many of the transformation/operation/processing techniques you would apply to tabular data or images take on a different meaning (if they have a meaning at all) for time series. Let's take feature extraction, for example.

The idea of "feature extraction" is to "work" on the data we have and make sure we extract all the meaningful features we can, so that the next step (typically the Machine Learning application) can benefit from them. In other words, it is a way of "helping" the machine learning step by feeding it the important features and filtering out the less important ones.

This is the full feature extraction process:

Image made by author

Now, when we consider feature extractors for, let's say, tabular data versus signals, we are playing two completely different sports.

For example, the concepts of peaks and valleys, the Fourier Transform or Wavelet Transform, and Independent Component Analysis (ICA) only really come into their own when dealing with signals. I'm doing all this talking just to convince you that there is a set of feature extraction techniques that belongs to signals alone.

Now, there are two macro classes of methods for doing feature extraction:

  • Data-driven methods: these aim to extract features by looking only at the signals. We ignore the Machine Learning step and its goal (e.g. classification, forecasting, or regression); we only look at the signal, work on it, and extract information from it.
  • Model-based methods: these look at the whole pipeline and aim to find the features for the specific problem to be solved.

The pros of data-driven methods are that they are usually computationally cheap and don't require the corresponding target output. Their con is that the features are not specific to your problem. For example, using the Fourier Transform of a signal as a feature might be suboptimal compared to specific features learned in an end-to-end model.

For the sake of this blog post, we'll talk about data-driven methods only. In particular, we'll talk about domain-specific methods, frequency-based methods, time-based methods, and statistical methods. Let's get started!

1. Domain Specific Feature Extraction

The first one I'm going to describe is intentionally a little vague. The reality is that the best way to extract features is to consider the specific problem you are facing. For example, say you are dealing with a signal from an engineering experiment and you really care about the amplitude after t = 6 s. These are cases where the feature extraction doesn't really make sense in general (for a random case, t = 6 s might be no more special than t = 10 s) but is extremely relevant for your case. That is what we mean by domain-specific feature extraction. I know this is not a lot of math and coding, but that is how it is meant to be, as it is extremely dependent on your specific situation.

2. Frequency based Feature Extraction

2.1 Explanation

This method is related to the spectral analysis of our time series/signal. What do we mean by that? Every signal has a natural domain: the time domain, which is the simplest way to look at a signal, where we consider the signal as a value (or vector) at a given time.

For example, let's consider this signal, in its natural domain:
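The symbolic expression appears only as an image in this copy of the post; assuming it matches the three components used later in the coding section, it would read:

```latex
y(t) = \sin(2\pi t) + 0.4\,\sin(2\pi \cdot 2\, t) + 2\,\sin(2\pi \cdot 3.2\, t)
```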

If we plot it we get this:

Image made by author

This is the natural (time) domain, and it is the simplest domain for our dataset. We can convert it into the frequency domain. As we saw in the symbolic expression, our signal has three periodic components. The idea of the frequency domain is to decompose the signal into the frequencies, amplitudes, and phases of its periodic components.

The Fourier Transform Y(k) of the signal y(t) is the following:

Image made by author
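The formula shown in the image is the discrete Fourier transform; in one standard convention, for a signal of N samples it reads:

```latex
Y(k) = \sum_{n=0}^{N-1} y(n)\, e^{-2\pi i\, k n / N}, \qquad k = 0, 1, \dots, N-1
```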

This describes the amplitude and phase of the component with frequency k. To extract meaningful features, we can take the amplitudes, phases, and frequency values of the 10 main components (the ones with the highest amplitudes). These will be 10 × 3 = 30 features (amplitude, frequency, and phase for each of the 10 components) that describe your time series based on its spectral information.

Now, this method can be expanded. For example, we can decompose our signal not in terms of sine/cosine functions but in terms of wavelets, which are another family of wave-like functions, localized in time. That kind of decomposition is called the **Wavelet Decomposition**.

I understand this is a lot to digest, so let's start with the coding part to show you what I mean…

2.2 Code

Now, let's build it in real life. Let's start with the very simple Fourier Transform.

First we need to invite some friends to the party:
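The import cell is missing from this version of the post; a minimal set of "friends" for what follows would presumably be NumPy and Matplotlib:

```python
import numpy as np                # numerical arrays and FFT routines
import matplotlib.pyplot as plt  # plotting
```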

Now let's take this signal as an example:
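The code for the signal is also missing here; based on the components described just below, it would look something like this (the sampling rate and duration are my own assumptions):

```python
import numpy as np

fs = 100.0                  # assumed sampling rate (Hz)
t = np.arange(0, 10, 1/fs)  # 10 seconds of signal
y = (1.0 * np.sin(2*np.pi*1.0*t)     # amplitude 1,   frequency 1
     + 0.4 * np.sin(2*np.pi*2.0*t)   # amplitude 0.4, frequency 2
     + 2.0 * np.sin(2*np.pi*3.2*t))  # amplitude 2,   frequency 3.2
```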

This signal has three major components: one with amplitude = 1 and frequency = 1, one with amplitude = 0.4 and frequency = 2, and one with amplitude = 2 and frequency = 3.2. We can recover them by running the Fourier Transform:
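The transform code is not included in this copy; a sketch with NumPy's real FFT (the signal is redefined inside the snippet, with an assumed sampling rate, so it is self-contained):

```python
import numpy as np

fs = 100.0                  # assumed sampling rate (Hz)
t = np.arange(0, 10, 1/fs)
y = (np.sin(2*np.pi*t)
     + 0.4*np.sin(2*np.pi*2*t)
     + 2.0*np.sin(2*np.pi*3.2*t))

# One-sided spectrum: frequencies and normalized amplitudes
freqs = np.fft.rfftfreq(len(y), d=1/fs)
amps = 2 * np.abs(np.fft.rfft(y)) / len(y)

# The three strongest components should sit near f = 1, 2 and 3.2
top = np.argsort(amps)[-3:]
print(np.sort(freqs[top]))
```

Plotting `amps` against `freqs` (e.g. with `plt.stem`) reproduces the three-peak picture described next.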

We can clearly see three peaks with the corresponding amplitudes and frequencies.

Now, we don't really need any fancy plotting (that was just to show that the method works); we can do everything with a very simple function, which would be this one:
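The function itself is missing from this version of the post; here is a sketch of what it plausibly looks like, built on NumPy's real FFT (the name, signature, and defaults are my guesses, matching the parameter list below):

```python
import numpy as np

def fourier_features(y, x=None, num_peaks=10, max_freq=None):
    """Return (frequencies, amplitudes, phases) of the num_peaks
    strongest spectral components of y. Illustrative sketch."""
    y = np.asarray(y, dtype=float)
    if x is None:
        x = np.arange(len(y))            # assume a unit sampling step
    dt = x[1] - x[0]
    spectrum = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(len(y), d=dt)
    amps = 2 * np.abs(spectrum) / len(y)
    phases = np.angle(spectrum)
    if max_freq is not None:             # drop frequencies above the cap
        keep = freqs <= max_freq
        freqs, amps, phases = freqs[keep], amps[keep], phases[keep]
    top = np.argsort(amps)[::-1][:num_peaks]  # strongest peaks first
    return freqs[top], amps[top], phases[top]
```

Each call then returns the 3 × num_peaks numbers described in the previous section.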

So you give me the signal y and (optionally):

  • the x or time array
  • the number of features (or peaks) to consider
  • the largest frequency that you are willing to explore

This is the output:

If we want to extract features using **wavelets (not sines/cosines)**, we can do the wavelet transform. We would need to install this guy:

pip install PyWavelets

And then run this:
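The wavelet snippet is missing as well; a minimal sketch with PyWavelets (the `db4` mother wavelet, the number of levels, and the energy features are my choices, not necessarily the author's):

```python
import numpy as np
import pywt

fs = 100.0                  # same assumed signal as in the Fourier example
t = np.arange(0, 10, 1/fs)
y = (np.sin(2*np.pi*t)
     + 0.4*np.sin(2*np.pi*2*t)
     + 2.0*np.sin(2*np.pi*3.2*t))

# Multi-level discrete wavelet decomposition:
# returns [approximation, detail_4, detail_3, detail_2, detail_1]
coeffs = pywt.wavedec(y, "db4", level=4)

# One simple feature per band: the energy of its coefficients
features = [float(np.sum(c**2)) for c in coeffs]
print(len(coeffs), [round(f_, 1) for f_ in features])
```

The per-band energies can then feed the Machine Learning step, just like the Fourier features.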

I talk about wavelets in detail in this article here. Give it a look to learn more about those majestic beasts.
