How to Run Jupyter Notebooks and Generate HTML Reports with Python Scripts

Author: Murphy  |  Views: 29257  |  Time: 2025-03-22 19:06:35
Photo by Christopher Gower on Unsplash

Jupyter Notebooks are a widely used solution for quick analysis. As an alternative to writing code in plain scripts, they let you structure your code step by step and visualize the output of each code block. However, they are also powerful and sometimes underestimated tools for creating reports. Jupyter Notebooks allow you to combine code with rich text and interactive visualizations, and they can be easily exported in a variety of formats, including HTML. In this way, a non-technical audience, who may not have tools such as integrated development environments installed on their computers and who have no interest in the code behind the analysis, can easily access the results from a browser.

In addition, many projects combine Jupyter Notebooks with Python scripts and pipelines. These notebooks are generally used to create interactive reports that support the analysis executed in the scripts. For this reason, it is useful for the notebooks to run alongside the pipeline, so that when we update, for example, several datasets, the interactive reports are updated as well, ensuring that they always show the latest available data.

In this article, we will create a synthetic dataset that simulates the annual purchases we make at the supermarket. In addition, we will also create an interactive report where the purchases will be analyzed. Subsequently, we will simulate the update of this dataset with new purchases. At the same time that we update the dataset, we will also update the interactive report from a Python script. All this will be achieved without the need to run the notebook and export it manually as an HTML file.


Synthetic Data Creation

The interactive report created in this article uses synthetic data generated with LLMs. The data created simulates purchases in a supermarket and consists of the following columns:

  • Date: the date the product was purchased in the supermarket. The format is YYYY-MM-DD (e.g., 2024-01-01).
  • Product: product name (e.g. apples, yogurts, or salmon).
  • Quantity: quantity in units of the product purchased (for example, 2).
  • Price per unit: the price of one unit of the product (e.g. 0.5).
  • Total Cost: the total price of the products purchased, i.e., the quantity multiplied by the product's unit price (e.g. 2 * 0.5 = 1).
  • Category: category or type of product (e.g. meat or fish).
  • Payment Method: method of payment with which the purchase was made. Three possible payment methods exist: card, cash, or digital wallet.
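The schema above can be illustrated with a small hand-written frame. This is a hypothetical sample (the real dataset is generated with an LLM later in the article); it also sanity-checks the invariant that Total Cost equals Quantity times Price per unit.

```python
import pandas as pd

# Hypothetical sample rows matching the column schema described above.
purchases = pd.DataFrame(
    {
        "Date": ["2024-01-01", "2024-01-01", "2024-01-02"],
        "Product": ["apples", "yogurts", "salmon"],
        "Quantity": [2, 4, 1],
        "Price per unit": [0.5, 0.8, 6.5],
        "Total Cost": [1.0, 3.2, 6.5],
        "Category": ["fruit", "dairy", "fish"],
        "Payment Method": ["card", "cash", "digital wallet"],
    }
)

# Sanity check: Total Cost should equal Quantity * Price per unit.
assert (purchases["Quantity"] * purchases["Price per unit"]).round(2).equals(
    purchases["Total Cost"].round(2)
)
print(purchases.head())
```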

The model used belongs to Mistral, a French startup that develops large language models and other artificial-intelligence technologies. Large language models can be used to generate synthetic data that realistically simulates certain structures, in this case, purchases in a supermarket. Such data can be used, for example, to test applications, optimize algorithms, or train artificial intelligence models. This article uses the data to test the automation of generating HTML reports from a Jupyter Notebook when a dataset is updated.

To generate the synthetic data, we need to provide details in a prompt about what kind of output data we want. This prompt will be sent to the large language model, which will respond with the data set.

The prompt is divided into three parts:

  • Dataset Specification: this part briefly summarizes the contents of the data set you want to generate. The columns of the data set and their contents are specified.
  • Instructions for data generation: this section details the instructions the large language model must follow when generating its response. For example, in this case, the model must return only JSON output, so that it can later be easily converted into a DataFrame. It is also requested that the generated outputs be sufficiently varied.
  • Output examples: some output examples are provided, so that it is easier for the large language model to understand the structure of the data to be generated.
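The three parts can be sketched as a single prompt string. This is only an illustrative draft following the structure above; the exact wording used in the article lives in synthetic_data_utils.py.

```python
# Illustrative prompt following the three-part structure described above:
# dataset specification, generation instructions, and output examples.
PROMPT = """
Generate synthetic supermarket purchase data with the columns:
Date (YYYY-MM-DD), Product, Quantity, Price per unit, Total Cost,
Category, and Payment Method (card, cash, or digital wallet).

Instructions:
- Return ONLY a valid JSON array of objects, one object per purchase.
- Make the products, categories, and dates sufficiently varied.
- Total Cost must equal Quantity multiplied by Price per unit.

Example output:
[{"Date": "2024-01-01", "Product": "apples", "Quantity": 2,
  "Price per unit": 0.5, "Total Cost": 1.0, "Category": "fruit",
  "Payment Method": "card"}]
"""
```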

The process of designing a prompt is not linear. It is necessary to generate an initial prompt and test the response multiple times. The prompt will be adjusted depending on the quality of the response generated by the model.

Once the prompt is designed, we must also create a function to interact with the Mistral Large Language Model. The function generate_synthetic_data is a generic function that can be used with different Mistral prompts and models.

Finally, the function convert_api_response_to_dataframe is created, responsible for converting the JSON output into a DataFrame. All the functions described above are defined in the synthetic_data_utils.py file.

<script src="https://gist.github.com/amandaiglesiasmoreno/d5f12cbe370845b68cc0dcaff2fd04fd.js"></script>
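The two helpers might look roughly like the sketch below. The chat call follows the v1 Mistral Python client (`client.chat.complete`); the client is passed in as an argument here so the parsing logic stays independent of the API, and this is an assumed shape rather than the article's exact code.

```python
import json

import pandas as pd


def generate_synthetic_data(client, model: str, prompt: str) -> str:
    """Send the prompt to a Mistral chat model and return the raw text reply.

    `client` is assumed to be a `mistralai.Mistral` instance; the call
    signature sketched here follows the v1 Python client.
    """
    response = client.chat.complete(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def convert_api_response_to_dataframe(raw_json: str) -> pd.DataFrame:
    """Parse the model's JSON array of purchases into a DataFrame."""
    records = json.loads(raw_json)
    return pd.DataFrame(records)
```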

The functions defined above are used to generate the initial synthetic data. The initial data simulate purchases during the first four weeks of January 2024. Subsequently, we will generate synthetic data for new weeks using these functions. The goal is that, when new synthetic data is generated, not only the dataset containing all annual purchases but also the report created in the Jupyter Notebook and the HTML generated from this Notebook will be updated.

The function generate_initial_data generates the purchase data for the first four weeks of 2024. The file run_generate_initial_data.py is responsible for the execution of this function. This file defines the large language model used, in this case, mistral-large-2407, and stores the output data in the file supermarket_purchases_data.csv. This file contains all annual purchases and will be the one that will be subsequently updated with new data.

<script src="https://gist.github.com/amandaiglesiasmoreno/ee1713a36d3c9da2717764bcdd846ed7.js"></script>
<script src="https://gist.github.com/amandaiglesiasmoreno/1f05ce527dea0e6eab9ec5d0dca00b05.js"></script>
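A run script of this shape might look as follows. The model name and output file come from the article; the hard-coded rows inside generate_initial_data are a stand-in for the LLM call, so this sketch stays runnable offline.

```python
import pandas as pd

# Names taken from the article; the data below is a placeholder for the
# rows that synthetic_data_utils.py would obtain from the LLM.
MODEL = "mistral-large-2407"
OUTPUT_FILE = "supermarket_purchases_data.csv"


def generate_initial_data() -> pd.DataFrame:
    """Stand-in for the LLM-backed generation of the first January weeks."""
    return pd.DataFrame(
        {
            "Date": ["2024-01-01", "2024-01-08"],
            "Product": ["apples", "salmon"],
            "Quantity": [2, 1],
            "Price per unit": [0.5, 6.5],
            "Total Cost": [1.0, 6.5],
            "Category": ["fruit", "fish"],
            "Payment Method": ["card", "cash"],
        }
    )


if __name__ == "__main__":
    # Persist the initial purchases; later updates append to this file.
    generate_initial_data().to_csv(OUTPUT_FILE, index=False)
```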

After running the file run_generate_initial_data.py, we can check that the initial data has been generated correctly. The following image shows the structure of the data, which aligns with the instructions provided in the prompt.

Structure of the Generated Data Based on the Provided Instructions in the Prompt (Image by the Author)

Report Generation in a Jupyter Notebook

The annual purchase data will be used to create an interactive report in a Jupyter Notebook that allows us to track purchases. The purchase data will be updated weekly, and we want the report to be updated at the same time as the data, without needing to open the Jupyter Notebook, execute all its cells, save the result, and then generate the corresponding HTML file. We are going to automate this whole process from Python.

The next section explains in detail the automation process for running the Jupyter Notebook and creating the HTML file when the data in the supermarket_purchases_data.csv file is updated. For now, we will focus on understanding how the interactive report was generated.

Jupyter Notebooks are an interesting alternative for creating interactive reports, even for a non-technical audience. They allow you to create reports in which the code used is hidden and only text, explanations, and graphics are displayed. In addition, they can be exported as HTML files, allowing a user who does not have an integrated development environment installed to easily open the results in a browser.

The report created in this article will be simple. It is a basic example to show the automation process. The report is composed of four sections:

  • Dataset structure: this section presents the structure of the data set in table format. This allows the user to understand how the data is stored from which subsequent analysis is performed.
  • Daily Expenditures and Cumulative Annual Expenditures: for each product purchased, the date of purchase is specified. This section performs an analysis of the daily expenses throughout the year together with the cumulative annual expenses, i.e. how much money has been spent to date. To present these results, a line diagram is employed using the Plotly library. One of the lines represents the daily expenses and the other represents the cumulative annual expenses. The visualization has two distinct y-axes: the y-axis on the left shows the daily expenses and the y-axis on the right shows the cumulative annual expenses.
  • Expenditure Breakdown by Category and Product: in the purchase data set, each product belongs to a category. For example, salmon and tuna belong to the fish category. In this section, an analysis of spending by category and product is performed using a Sunburst chart. In this type of chart, each level of the hierarchy is presented as a concentric ring. In the first ring, the categories are shown, and in the second ring, the products. In this way, you can easily visualize the proportions in the data set in two different hierarchies: categories and products.
  • Expenditure Breakdown by Payment Method: finally, an analysis is made of the payment methods used. There are a total of three payment methods used by the user: digital wallet, cash, and card. To visualize the total expenditure with each of these payment methods, a pie chart is used.
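The aggregations behind those three charts can be sketched with pandas. The report itself renders them with Plotly (dual-axis line chart, Sunburst, pie chart); the plotting code is omitted here, and the small frame below is illustrative data, not the article's dataset.

```python
import pandas as pd

# Illustrative purchases; the real report reads supermarket_purchases_data.csv.
purchases = pd.DataFrame(
    {
        "Date": ["2024-01-01", "2024-01-01", "2024-01-02"],
        "Category": ["fruit", "fish", "fruit"],
        "Product": ["apples", "salmon", "apples"],
        "Total Cost": [1.0, 6.5, 2.0],
        "Payment Method": ["card", "cash", "card"],
    }
)

# Daily expenses and the cumulative annual total (the two y-axes of the line chart).
daily = purchases.groupby("Date")["Total Cost"].sum()
cumulative = daily.cumsum()

# Category -> product hierarchy (the two rings of the Sunburst chart).
by_category_product = purchases.groupby(["Category", "Product"])["Total Cost"].sum()

# Totals per payment method (the pie chart).
by_payment = purchases.groupby("Payment Method")["Total Cost"].sum()
```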

The following link shows the interactive report. In this Notebook, you can consult all the code used to generate the three analyses explained above. This report will be updated as new data is added to the supermarket_purchases_data.csv file.

Notebook on nbviewer

Section of the Report Generated in the Jupyter Notebook (Image by the Author)

Jupyter Notebook Execution and HTML Report Generation with Python

The report created in Jupyter Notebook analyzes the purchases made in the supermarket to date, which are stored in the dataset supermarket_purchases_data.csv.

The objective is to run this report and create an updated HTML file each time the dataset is updated. To do this, the following two modules will be created:

  • execute_notebook.py: this module is in charge of executing the Jupyter Notebook provided as an input argument. It uses subprocess.run to execute the notebook with jupyter nbconvert, so that the original notebook is overwritten with the execution results.
  • convert_notebook_to_html.py: this module is in charge of converting a Jupyter Notebook to an HTML file, omitting the cells with code in the generated file. The generated HTML report is stored in the reports folder, located at the same level as the notebooks folder.

<script src="https://gist.github.com/amandaiglesiasmoreno/1cdee0f6ed04970951f0612dec6666b3.js"></script>
<script src="https://gist.github.com/amandaiglesiasmoreno/9dc6cb10b1ddfb415d8b5277f518f30e.js"></script>
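The two modules can be sketched as below, assuming the standard `jupyter nbconvert` CLI: `--execute --inplace` reruns a notebook and overwrites it, while `--to html --no-input` exports it with the code cells hidden. Separating command construction from the subprocess call is a design choice here that keeps the helpers inspectable without a Jupyter installation; the actual gists may be structured differently.

```python
import subprocess


def build_execute_command(notebook_path: str) -> list:
    """nbconvert command that runs a notebook and overwrites it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook_path,
    ]


def build_convert_command(notebook_path: str, output_dir: str = "reports") -> list:
    """nbconvert command that exports to HTML, hiding code cells via --no-input."""
    return [
        "jupyter", "nbconvert",
        "--to", "html",
        "--no-input",
        "--output-dir", output_dir,
        notebook_path,
    ]


def execute_notebook(notebook_path: str) -> None:
    subprocess.run(build_execute_command(notebook_path), check=True)


def convert_notebook_to_html(notebook_path: str, output_dir: str = "reports") -> None:
    subprocess.run(build_convert_command(notebook_path, output_dir), check=True)
```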

These are precisely the functions that will be executed whenever the supermarket_purchases_data.csv file is updated with new data. The following module simulates a data update with the purchases of the first week of February.

<script src="https://gist.github.com/amandaiglesiasmoreno/04dd07607e76ec1862262334a6d099ed.js"></script>
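The core of such an update step might look like the following sketch, where `new_rows` stands in for the freshly LLM-generated purchases; the function names and I/O details are assumptions, not the article's exact code.

```python
import pandas as pd


def append_new_purchases(existing: pd.DataFrame, new_rows: pd.DataFrame) -> pd.DataFrame:
    """Return the purchase history extended with the newly generated rows."""
    return pd.concat([existing, new_rows], ignore_index=True)


def update_data(csv_path: str, new_rows: pd.DataFrame) -> pd.DataFrame:
    """Read the accumulated CSV, append the new purchases, and write it back."""
    updated = append_new_purchases(pd.read_csv(csv_path), new_rows)
    updated.to_csv(csv_path, index=False)
    return updated
```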

This module is used together with the previous two modules to ensure that, when the data is updated, the Jupyter Notebook is also run and the HTML report is updated.

<script src="https://gist.github.com/amandaiglesiasmoreno/251ef93c8b96f2faae59e9df67cb2aa7.js"></script>

In this way, with just two functions, one that executes a Jupyter Notebook and another that converts a Jupyter Notebook to HTML format, we can ensure that all the notebooks in which we perform analyses in our project are updated whenever the datasets they rely on are updated.
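The control flow of such a pipeline module is simply the three steps run in sequence. In the sketch below the three helpers are stubs standing in for the real modules (update_data.py, execute_notebook.py, convert_notebook_to_html.py), so only the ordering is shown.

```python
# Stubs standing in for the real modules, so the control flow runs on its own.

def update_data() -> str:
    return "data updated"          # real version appends the new week to the CSV


def execute_notebook() -> str:
    return "notebook executed"     # real version runs jupyter nbconvert --execute


def convert_notebook_to_html() -> str:
    return "html generated"        # real version exports HTML with --no-input


def process_pipeline() -> list:
    """Run the three steps in order, mirroring process_pipeline.py."""
    return [update_data(), execute_notebook(), convert_notebook_to_html()]
```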

Below is the entire folder structure required for the execution of all the above scripts. A link to the GitHub repository is also provided.

File and Folder Structure of the Pipeline

Throughout the article, the code of the pipeline files has been shown. The files are structured in four folders:

  • data: this folder contains the CSV files created. In this case, only one file will be created, supermarket_purchases_data.csv. This file has been synthetically created with an LLM and shows the purchases of food products made in a supermarket.
  • notebooks: this folder contains the Jupyter Notebooks of the project. In this case, we have only one called analysis_purchases.ipynb. This Jupyter Notebook includes an analysis of supermarket shopping data.
  • reports: this folder contains the interactive reports created from the Jupyter Notebooks in HTML format. In this case, there is only one interactive report called analysis_purchases.html. This report contains the same information as the notebook with the same name; however, the code used to generate the different visualizations is not shown in the report.
  • scripts: this folder contains all pipeline scripts. The following files are available:
  1. synthetic_data_utils.py: this module contains all the necessary functions to generate synthetic data to simulate shopping in a supermarket. These functions will be used both to generate the initial dataset and to create the assumed updates to that dataset.
  2. generate_initial_data.py: this module is responsible for creating a synthetic dataset that simulates the purchases made in a supermarket during the first four weeks of January 2024.
  3. run_generate_initial_data.py: this module executes the code necessary to create the initial synthetic data and save the results in a CSV file.
  4. execute_notebook.py: this module executes a Jupyter Notebook programmatically.
  5. convert_notebook_to_html.py: this module converts a Jupyter Notebook into an HTML report programmatically.
  6. update_data.py: this module simulates the updating of data with new purchases corresponding to the first week of February 2024.
  7. process_pipeline.py: this module simulates the updating of data together with the execution of Jupyter Notebook and its conversion to HTML format.

All these files can be downloaded from the following GitHub repository.

GitHub – amandaiglesiasmoreno/automated-notebook-reports: This repository demonstrates how to use…

The GitHub repository already contains the file supermarket_purchases_data.csv with the purchases for the first four weeks of January; that is, the script run_generate_initial_data.py has already been executed. Now, we simply need to run the process_pipeline.py file. This file simulates a data update and executes the files needed to run the Jupyter Notebook and convert the notebook into an HTML file.


Jupyter Notebooks are an easy-to-run solution for displaying analysis results to a non-technical audience. They allow you to combine code with rich text and interactive visualizations and export the results in formats that simply require a browser installed on your computer, such as HTML.

Analyses in Jupyter Notebooks are often combined with code executed in scripts. For this reason, it is necessary to find solutions that allow these notebooks to be run from Python scripts as well, so that the execution of the pipeline and of the analyses in the notebooks does not become decoupled.

In this article, we have generated a synthetic dataset that simulates shopping in a supermarket. From this dataset, an interactive report in HTML format has been created using a Jupyter Notebook, in which the purchases made so far are analyzed. A pipeline has been implemented so that, every time the file containing all the supermarket purchases is updated, the Jupyter Notebook is executed and the interactive HTML report is also updated.

In this way, we ensure that the interactive report created from the data always shows the latest data available and is generated from an updated data set. This is a simple example, but the same concept can be applied to larger projects with more data sets and interactive reports.

Thanks for reading.

Amanda Iglesias

Tags: automation Data Science Hands On Tutorials Jupyter Notebook Python
