Pandas for Time Series

Author: Murphy  |  Time: 2025-03-23 17:58:01

Data Processing in Python

Photo by Aron Visuals on Unsplash

Since I joined the workforce as a data scientist, most of the data I have dealt with has been time series. There are many definitions of a time series; generally, it is defined as a set of data points collected over a period of time. In Pythonic terms, it is a dataset with a datetime index and at least one column of numerical values.
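To make the Pythonic definition concrete, here is a minimal sketch of a time series in pandas: a DataFrame whose index is a `DatetimeIndex`, with a single numeric column. The column name `value` and the three readings are made up purely for illustration.

```python
import pandas as pd

# Three hypothetical readings taken one day apart (illustrative values only)
index = pd.date_range("2020-07-01", periods=3, freq="D")
ts = pd.DataFrame({"value": [5.1, 5.8, 6.2]}, index=index)

# A DatetimeIndex is what enables pandas' time-based tools (resample, shift, etc.)
print(type(ts.index).__name__)  # DatetimeIndex
print(ts)
```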

It could be the price of a stock over the past few months, the sales of a hypermarket over the past few weeks, or even a patient's blood sugar readings collected over several months.

In this article, I will show how to apply Pandas to a time series dataset, using generated blood sugar level records as an example.

The article is structured as follows:

  1. DateTime Format Manipulation: changing the datetime series into the desired format
  2. Converting DateTime to a Particular Period: converting each data point to a specific time period
  3. Filtering DateTime Series Based on Condition: filtering data points based on a selected time period
  4. Time Shift: shifting data points down by a specific number of periods
  5. Resampling Time Series: grouping data points by a specified time period
  6. Line Chart
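As a rough preview of the first five operations listed above, the sketch below applies each one to a small, made-up hourly series using standard pandas calls (`strftime`, `to_period`, boolean filtering, `shift`, `resample`). The data, the column name `value`, the format string, the time window and the periods are illustrative assumptions, not necessarily the choices made later in the article.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series: 48 readings starting at midnight on 2020-07-01
idx = pd.date_range("2020-07-01", periods=48, freq="h")
demo = pd.DataFrame({"value": np.arange(48, dtype=float)}, index=idx)

# 1. Format manipulation: render the timestamps as strings in a chosen format
formatted = demo.index.strftime("%d/%m/%Y %H:%M")

# 2. Convert each timestamp to the calendar day (period) it falls in
days = demo.index.to_period("D")

# 3. Filter rows by a time condition: keep readings from 08:00 to 11:59 only
morning = demo[(demo.index.hour >= 8) & (demo.index.hour < 12)]

# 4. Shift values down by 2 periods; the first 2 rows become NaN
shifted = demo["value"].shift(2)

# 5. Resample into daily buckets and average each day
daily_mean = demo["value"].resample("D").mean()

print(formatted[0], days[0], len(morning), shifted.iloc[2], daily_mean.tolist())
```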

Let's get started!


As usual, the first step in any analysis with Python is importing the necessary libraries.

Import Libraries

import pandas as pd
import random
import numpy as np
from datetime import datetime 

Create data

Next, let's generate a dataset of blood sugar level records for this demo.

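def create_demo_data():
    # seed both random number generators so the demo data is reproducible
    random.seed(365)
    np.random.seed(365)
    number_of_data_rows = 2160

    # generate a timestamp every 4 hours, starting at midnight on July 1st, 2020
    # (date_range fits here; bdate_range is meant for business-day frequencies,
    # and the uppercase 'H' alias is deprecated in recent pandas versions)
    dates = pd.date_range(datetime(2020, 7, 1), freq='4h', periods=number_of_data_rows)

    # create a dictionary with the generated dates and normally distributed
    # blood sugar levels (mean 5.5, standard deviation 1)
    data = {'date': dates,
            'blood_sugar_level': np.random.normal(5.5, 1, size=number_of_data_rows)}
    # create the dataframe, sort it by date and use the date as the index
    df = pd.DataFrame(data)
    df = df.sort_values(by=["date"])
    df = df.set_index(keys="date")
    return df

df = create_demo_data()
print(df.shape)
df.head(10)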

The script above generates 2,160 data points spaced 4 hours apart, covering roughly one year. The first record is at midnight on July 1st, 2020 and the last at 20:00 on June 25th, 2021.
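A quick way to check these numbers is to inspect the index directly. The sketch below rebuilds an index equivalent to the one in `create_demo_data()` (2,160 timestamps at a 4-hour frequency from July 1st, 2020) and prints the row count, the first and last timestamps, the spacing pandas infers, and the span in whole days. The variable name `index` is just for illustration.

```python
import pandas as pd

# Rebuild the demo index: 2160 timestamps, 4 hours apart, from 2020-07-01 00:00
index = pd.date_range("2020-07-01", freq="4h", periods=2160)

# 2160 readings x 4 hours = 8640 hours, i.e. 360 days' worth of 4-hour slots
print(len(index))                  # number of readings
print(index[0], index[-1])         # first and last timestamps
print(pd.infer_freq(index))        # spacing inferred from the timestamps themselves
print((index[-1] - index[0]).days) # whole days between the first and last reading
```

Because the last reading falls 20 hours into its day, the gap between the first and last timestamps is 359 days and 20 hours, which is why the data ends on June 25th rather than June 26th.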

The first 10 data points. Image by the author.
The last 10 data points. Image by the author.

Now that the data is ready, let's begin!
