Pandas for Time Series

Author:Murphy | View: 27397 | Time: 2025-03-23 17:58:01

Data Processing in Python

Since I started working as a data scientist, most of the data I deal with has been time series. There are many definitions of a time series, but it is generally defined as a set of data points collected over a period of time. In Pythonic terms, it refers to a dataset with a datetime index and at least one column of numerical values.

It could be the price of a stock over the past few months, the sales of a hypermarket over the past few weeks, or even a patient's blood sugar level records collected over several months.

In this article, I will show how to apply Pandas to a time series dataset, with an example of generated blood sugar level records.

This article is structured as follows:

DateTime Format Manipulation – changing a datetime series into the desired format
Converting DateTime to a Particular Period – converting each data point to a specific time period
Filtering DateTime Series based on Condition – filtering data points based on a selected time period
Time Shift – shifting data points down by a specific number of periods
Resampling Time Series – grouping data points based on a specified time period
Line Chart
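As a quick preview, the first five topics above can be sketched on a tiny toy series. This is only a minimal illustration, not the walkthrough itself: the `toy` DataFrame, its values, and the variable names are hypothetical and chosen to keep the results easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: 12 readings, one every 4 hours, starting 2020-07-01.
idx = pd.date_range("2020-07-01", periods=12, freq="4h")
toy = pd.DataFrame({"blood_sugar_level": np.arange(12, dtype=float)}, index=idx)

# 1. Format manipulation: render the timestamps as strings in a chosen format.
formatted = toy.index.strftime("%Y-%m-%d %H:%M")

# 2. Convert each timestamp to a period (here, the calendar day it falls in).
as_period = toy.index.to_period("D")

# 3. Filter on a condition: keep only the rows recorded on 2 July 2020.
july_2 = toy.loc["2020-07-02"]

# 4. Time shift: move the values down by one row (the first row becomes NaN).
shifted = toy["blood_sugar_level"].shift(1)

# 5. Resample: group the 4-hourly readings by day and average them.
daily_mean = toy["blood_sugar_level"].resample("D").mean()

print(daily_mean)
```

Each step uses a standard pandas method (`strftime`, `to_period`, `.loc` with a partial date string, `shift`, `resample`), so the same calls should carry over to a larger dataset with a datetime index.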

Let's get started!

As usual, the first step in any analysis with Python is importing the necessary libraries.

Import Libraries

import pandas as pd
import random
import numpy as np
from datetime import datetime

Create data

Then, let's generate a dataset of blood sugar level records for this demo.

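def create_demo_data():
    # fix the seeds so the generated values are reproducible
    random.seed(365)
    np.random.seed(365)
    number_of_data_rows = 2160

    # generate one timestamp every 4 hours, starting from July 1st, 2020
    # (date_range is used because the frequency is hourly, not business days;
    # the lowercase 'h' alias avoids the deprecation warning for 'H' in newer pandas)
    dates = pd.date_range(datetime(2020, 7, 1), freq='4h', periods=number_of_data_rows)

    # create a dictionary with the dates generated and the blood sugar levels,
    # drawn from a normal distribution with mean 5.5 and standard deviation 1
    data = {'date': dates,
            'blood_sugar_level': np.random.normal(5.5, 1, size=number_of_data_rows)}

    # create the dataframe, sorted by date, with the date as its index
    df = pd.DataFrame(data)
    df = df.sort_values(by=["date"])
    df = df.set_index(keys="date")
    return df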

df = create_demo_data()
print(df.shape)
df.head(10)

The script above generates a dataset of 2160 data points recorded at 4-hour intervals, spanning roughly one year (360 days). The data points start on July 1st, 2020 and end on June 25th, 2021.
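The date range stated above can be checked with a short sketch. It rebuilds only the timestamps, assuming the same start date, number of rows, and 4-hour frequency as the generated dataset; the variable names are illustrative.

```python
import pandas as pd

# Rebuild only the timestamps: 2160 points, 4 hours apart, from July 1st, 2020.
index = pd.date_range("2020-07-01", periods=2160, freq="4h")

first, last = index.min(), index.max()
span = last - first  # time elapsed between the first and the last reading
readings_per_day = pd.Timedelta("1D") // pd.Timedelta("4h")

print(first)             # 2020-07-01 00:00:00
print(last)              # 2021-06-25 20:00:00
print(span)              # 359 days 20:00:00
print(readings_per_day)  # 6
```

With 6 readings per day, 2160 rows cover 360 days, which is why the last timestamp falls on June 25th, 2021 rather than a full calendar year after the start.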