Pandas for Time Series
Data Processing in Python

Since I joined the workforce as a data scientist, most of the data I have dealt with has been time series. There are many definitions of a time series, but it is generally defined as a set of data points collected over a period of time. Or, speaking in a Pythonic way, it is a dataset with a datetime index and at least one column of numerical values.
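To make that definition concrete, here is a minimal sketch that builds a tiny, made-up DataFrame and checks both properties. The helper `looks_like_time_series` is hypothetical and written only for illustration; it is not part of Pandas.

```python
import pandas as pd


def looks_like_time_series(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if df has a DatetimeIndex and at least one numeric column."""
    has_datetime_index = isinstance(df.index, pd.DatetimeIndex)
    has_numeric_column = any(pd.api.types.is_numeric_dtype(dtype) for dtype in df.dtypes)
    return has_datetime_index and has_numeric_column


# Three made-up readings, 4 hours apart, indexed by timestamp
readings = pd.DataFrame(
    {"value": [5.1, 6.3, 4.8]},
    index=pd.to_datetime(["2020-07-01 00:00", "2020-07-01 04:00", "2020-07-01 08:00"]),
)

print(looks_like_time_series(readings))                # True
print(looks_like_time_series(readings.reset_index()))  # False: timestamps are now a column, not the index
```

The second call shows why the index matters. After `reset_index()` the timestamps are still present, but only as an ordinary column, so the frame no longer fits the definition.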
It could be the price of a stock over the past few months, the sales of a hypermarket over the past few weeks, or even a patient's blood sugar level records collected over several months.
In this article, I will show how to apply Pandas to time series data, using a generated dataset of blood sugar level records as the example.
The article is structured as follows:
- DateTime Format Manipulation – changing the datetime series into the desired format
- Converting DateTime to a Particular Period – converting each data point to a specific time period
- Filtering DateTime Series Based on Condition – filtering data points based on a selected time period
- Time Shift – shifting data points down by a specified number of periods
- Resampling Time Series – grouping data points by a specified time period
- Line Chart – visualizing the time series
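As a rough preview of the operations listed above, the sketch below applies each one (except the chart) to a small, made-up frame. The data and the specific choices are assumptions for illustration. These include day-level periods, readings before noon, a one-row shift and a daily mean. They are not necessarily the exact approach taken in the sections that follow. The chart is left out because `DataFrame.plot()` needs Matplotlib installed.

```python
import numpy as np
import pandas as pd

# Made-up data: 12 readings, 4 hours apart, starting 2020-07-01 00:00 (two full days)
idx = pd.date_range("2020-07-01", periods=12, freq="4h")
demo = pd.DataFrame({"blood_sugar_level": np.arange(12, dtype=float)}, index=idx)

# 1. DateTime format manipulation: render the timestamps as day/month/year strings
formatted = demo.index.strftime("%d/%m/%Y %H:%M")

# 2. Converting to a period: map each timestamp to the calendar day it falls in
days = demo.index.to_period("D")

# 3. Filtering on a datetime condition: keep only readings taken before noon
morning = demo[demo.index.hour < 12]

# 4. Time shift: move values down by one row (the first row becomes NaN)
shifted = demo["blood_sugar_level"].shift(1)

# 5. Resampling: group readings by day and take the mean of each day
daily_mean = demo.resample("D").mean()

print(formatted[0])                             # 01/07/2020 00:00
print(daily_mean["blood_sugar_level"].tolist())  # [2.5, 8.5]
```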
Let's get started!
As usual, the first step in any analysis with Python is importing the necessary libraries.
Import Libraries
import pandas as pd
import random
import numpy as np
from datetime import datetime
Create Data
Next, let's generate a dataset of blood sugar level records for this demo.
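def create_demo_data():
    # fix the random seeds so the generated data is reproducible
    random.seed(365)
    np.random.seed(365)
    number_of_data_rows = 2160
    # generate timestamps 4 hours apart, starting on July 1st, 2020
    # (with an hourly frequency every calendar day is included, weekends too)
    dates = pd.date_range(datetime(2020, 7, 1), freq='4h', periods=number_of_data_rows)
    # create a dictionary with the generated dates and normally distributed
    # blood sugar levels (mean 5.5, standard deviation 1)
    data = {'date': dates,
            'blood_sugar_level': np.random.normal(5.5, 1, size=number_of_data_rows)}
    # create the dataframe, sort it by date and use the date as the index
    df = pd.DataFrame(data)
    df = df.sort_values(by=["date"])
    df = df.set_index(keys="date")
    return df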
df = create_demo_data()
print(df.shape)
df.head(10)
The script above generates a dataset of 2160 data points recorded at 4-hour intervals over roughly one year. The data points start on July 1st, 2020 and end on June 25th, 2021.
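A quick way to double-check these numbers is to rebuild the same timestamps and inspect them. This sketch assumes the same start date, 4-hour frequency and row count as `create_demo_data()` and uses `pd.date_range`. With a 4-hour frequency no weekend filtering happens, which is consistent with the June 25th end date.

```python
from datetime import datetime

import pandas as pd

# Same start, frequency and row count as in create_demo_data()
dates = pd.date_range(datetime(2020, 7, 1), freq="4h", periods=2160)

print(len(dates))            # 2160
print(dates[0], dates[-1])   # 2020-07-01 00:00:00 2021-06-25 20:00:00
print(dates[-1] - dates[0])  # 359 days 20:00:00

# Every consecutive gap should be exactly 4 hours
gaps = dates.to_series().diff().dropna()
print((gaps == pd.Timedelta(hours=4)).all())  # True

# Saturdays and Sundays (dayofweek 5 and 6) are part of the range
print((dates.dayofweek >= 5).any())  # True
```

Since 2160 points × 4 hours = 8640 hours = 360 days, the range falls a few days short of a full calendar year, hence the June 25th end date.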


Now that the data is ready, let's get started!