Declarative vs. Imperative Plotting

Author:Murphy | View: 25858 | Time: 2025-03-22 23:22:43

Quick Success Data Science

Declarative vs. Imperative imagined as kings by Leonardo absolute_reality_v16

If you're learning Python, expect to make your first plots with Matplotlib. Besides being immensely popular, Matplotlib is an imperative plotting library. This means it generates graphics using a step-by-step approach, which is easy for beginners to grasp.

Python also supports declarative plotting libraries, such as seaborn, Altair, and HoloViews, that let you focus on what the plot should show, rather than how to draw it. To quote the Altair docs, "The key idea is that you are declaring links between data columns and visual encoding channels, such as the x-axis, y-axis, and color. The rest of the plot details are handled automatically. Building on this declarative system, a surprising range of plots, from simple to sophisticated, can be created using a concise grammar."

Scientists and engineers should find the declarative approach enticing, as it leads to more time doing their real jobs and less time coding. As I like to say, "Science first, coding second!"

In this Quick Success Data Science article, we'll look at both imperative and declarative plotting. The main focus, however, will be on the declarative style, using examples from seaborn, Plotly Express, and hvplot.

Declarative versus Imperative: The Big Picture

Before we get into the details, here are the salient points, strengths, and weaknesses of the two plotting approaches.

Imperative Plotting

Imperative plotting in Python involves a step-by-step approach where users explicitly specify the details of the plot at a fairly low level. Users have direct control over each element of the plot, allowing for highly granular customization. As a result, imperative plotting requires more manual intervention compared to the declarative approach but can produce more complex visualizations.

Among the strong points of imperative plotting are:

Full control over the details of a plot.
The step-by-step methodology is easy to grasp.

Among the weaknesses of imperative plotting are:

Can require many lines of code, making programs complex and hard to read and maintain.
Users need to be familiar with API details, resulting in a steep learning curve.
Experimenting with different plot configurations and styles can be cumbersome, as each requires manual customization.

Matplotlib is the most popular imperative library and includes some declarative components. In addition, the seaborn library wraps Matplotlib to permit greater use of declarative plotting. So, to me, Matplotlib is only "quasi-imperative."

Likewise, other libraries – while considered declarative – often have a base implementation that permits lots of customization (quasi-imperative), plus a higher-level, more user-friendly interface with limited customization (quasi-declarative). For example, Plotly has Plotly Express, and HoloViews has hvplot.

Declarative Plotting

Declarative plotting in Python revolves around a high-level, expressive syntax that allows users to describe the visualizations they want using concise grammar. Most of the details of constructing the plot are abstracted away. Users declare the elements they want, and the plotting library takes care of the underlying implementation.

Declarative plotting libraries tend to be dataset-oriented. Most are designed to work well with pandas DataFrames, Python's answer to Excel spreadsheets.

Declarative plotting boasts several notable strengths, including:

A concise, expressive, and intuitive syntax that facilitates ease of use.
Consistency and reproducibility across plot types, thanks to the uniformity of syntax and logic, which also reduces coding errors.
Interactivity and scalability, where users leverage the power of underlying libraries and frameworks that handle the rendering, interactivity, and performance of the plots.
Excellent suitability for swift exploratory data analysis plots that don't require extensive refinement for publication.
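
To make the contrast concrete without committing to any particular library, here is a minimal, library-free sketch. Everything in it is hypothetical: the records sample, the encode() helper, and the channel names are invented for illustration and are not part of any plotting API. The imperative version walks through the data step by step, while the declarative version only states which column maps to which channel and leaves the looping to the helper.

```python
# Toy illustration only: no plotting library is involved.
# A few made-up rows standing in for a dataset (hypothetical values).
records = [
    {"total_bill": 10.0, "tip": 1.5, "smoker": "No"},
    {"total_bill": 20.0, "tip": 3.0, "smoker": "Yes"},
    {"total_bill": 30.0, "tip": 4.5, "smoker": "No"},
]

# Imperative style: spell out each step of building the point list.
imperative_points = []
for row in records:
    x_value = row["total_bill"]
    y_value = row["tip"]
    imperative_points.append((x_value, y_value))


def encode(data, **channels):
    """Hypothetical helper: map column names to visual channels."""
    return [{ch: row[col] for ch, col in channels.items()} for row in data]


# Declarative style: say *what* goes where; the helper decides *how*.
declarative_points = encode(records, x="total_bill", y="tip", color="smoker")

print(imperative_points)
print(declarative_points[0])
```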

Code Examples

To evaluate the various libraries consistently, we'll use the same dataset and make the same scatterplot with each one. We'll then fit a regression line to the data and add a title and legend. All examples were built in JupyterLab.

Installing Libraries

We'll use the following open-source libraries: pandas, NumPy, Matplotlib, seaborn, Plotly, hvplot, and statsmodels. Installation instructions are available in each library's documentation. I recommend installing these in a virtual environment or, if you're an Anaconda user, in a conda environment.

When using Plotly Express in either Jupyter Notebook or JupyterLab, you may be asked to install nbformat. You can do this in the command line interface with either:

pip install nbformat

conda install -c conda-forge nbformat

Loading the Data

For data, we'll use the tips dataset that comes with seaborn. This dataset records restaurant data such as the total bill, the tip amount, the day of the week, the size of the party, and so on. Here's how to load it:

import seaborn as sns

# Load the tips dataset:
tips = sns.load_dataset('tips')

The data loads as a pandas DataFrame. Here are the top three lines:

tips.head(3)

The head of the tips DataFrame (by the author)

Matplotlib – Imperative Example

Let's start with the imperative library, Matplotlib. Note how we need to build a figure (fig) and axes (ax) object, create the scatterplot, build and add the regression line, and manually set the labels, title, and legend.

# Matplotlib imperative example:
import numpy as np
import matplotlib.pyplot as plt

# Create a figure and axes:
fig, ax = plt.subplots()

# Plot the data:
ax.scatter(x=tips['total_bill'], y=tips['tip'])

# Fit and plot a linear regression line:
m, b = np.polyfit(x=tips['total_bill'], y=tips['tip'], deg=1)
ax.plot(tips['total_bill'], m*tips['total_bill'] + b, color='red')

# Set the labels, title, and legend:
ax.set_xlabel('Total bill')
ax.set_ylabel('Tip')
ax.set_title('Imperative Example - Matplotlib (fig, ax)')
ax.legend(['Data', 'Linear fit']);

Here's the output:

The scatterplot built with Matplotlib (by the author)
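
For readers curious about what np.polyfit() does with deg=1, the sketch below compares its result against the textbook least-squares formulas for slope and intercept. The x and y arrays are small made-up values used only for illustration; they are not rows from the tips dataset.

```python
import numpy as np

# Made-up illustrative values (not taken from the tips dataset).
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y = np.array([1.8, 2.1, 3.2, 3.6, 4.9])

# Degree-1 polynomial fit: returns the slope first, then the intercept.
m, b = np.polyfit(x, y, deg=1)

# Closed-form ordinary least squares for a straight line.
x_mean, y_mean = x.mean(), y.mean()
slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
intercept = y_mean - slope * x_mean

# Both comparisons should agree to within floating-point tolerance.
print(np.isclose(m, slope), np.isclose(b, intercept))
```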

One thing to note here is that we used the scatter() method and passed DataFrame columns for the x and y axes:

ax.scatter(x=tips['total_bill'], y=tips['tip'])

This method is declarative, but it produces such a simple plot that further embellishments are required for almost any purpose, including quick-and-dirty exploratory data analysis:

Output of the Matplotlib ax.scatter() method (by the author)

Notice how the absence of x and y labels renders the plot essentially useless. Thus, rather than being a standalone solution, this declarative plotting method is just a single step in a larger (imperative) journey.

Some of the more useful Matplotlib methods for making plots are listed below. While these plots are "sparse" right out of the box, Matplotlib provides full control over the details, making them suitable for scenarios where customization is a priority.

Useful Matplotlib methods for making plots (from Python Tools for Scientists)

Matplotlib also comes with a slightly simplified way to make the same plots using its pyplot module (see Demystifying Matplotlib). Here's the listing and syntax:

Useful Matplotlib pyplot methods for making plots (from Python Tools for Scientists)

These pyplot methods are primarily designed for making single plots.
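
As a rough sketch of the pyplot style, the snippet below redraws the scatterplot and fitted line without creating fig and ax objects explicitly. To stay self-contained, it uses short made-up arrays in place of the tips columns; with the real data you would pass tips['total_bill'] and tips['tip'] instead.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-ins for tips['total_bill'] and tips['tip'].
total_bill = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
tip = np.array([1.8, 2.1, 3.2, 3.6, 4.9])

# pyplot draws on an implicit "current" figure and axes.
plt.scatter(total_bill, tip, label='Data')

# Fit and plot a linear regression line:
m, b = np.polyfit(total_bill, tip, deg=1)
plt.plot(total_bill, m * total_bill + b, color='red', label='Linear fit')

# Labels, title, and legend use plt.* functions instead of ax.set_* methods:
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.title('Imperative Example - Matplotlib (pyplot)')
plt.legend();
```

The steps are the same as in the fig, ax version; only the object you call them on changes.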

Seaborn – Declarative Example

As mentioned previously, seaborn is a declarative library designed to make easier, more attractive plots than native Matplotlib. Here's the code for making a scatterplot with a regression line:

# seaborn declarative example:
import matplotlib.pyplot as plt
import seaborn as sns

sns.regplot(data=tips, x='total_bill', y='tip');

The scatterplot built with seaborn (by the author)

We built this plot with the last line of code. There was no need to specify labels for the x and y axes or to build the regression. As a bonus, the regression line came with the 95% confidence interval shaded, and the markers are semi-transparent, highlighting changes in marker density.

A 95% confidence interval for a regression line is a statistical measure that provides a range within which the true regression line is reasonably expected to lie.
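
The seaborn documentation describes this band as a bootstrap estimate. The sketch below shows the general idea with plain NumPy. It is a simplified illustration under that assumption, not seaborn's actual implementation, and the x and y values are made up. The n_boot value and the random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Made-up illustrative values (not taken from the tips dataset).
x = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 24.0, 27.0, 30.0])
y = np.array([1.6, 2.2, 2.1, 3.0, 3.2, 3.4, 4.4, 4.6])

grid = np.linspace(x.min(), x.max(), 50)  # x values where the line is evaluated
n_boot = 1000                             # number of bootstrap resamples

boot_lines = np.empty((n_boot, grid.size))
for i in range(n_boot):
    # Resample (x, y) pairs with replacement and refit the line.
    idx = rng.integers(0, x.size, size=x.size)
    m, b = np.polyfit(x[idx], y[idx], deg=1)
    boot_lines[i] = m * grid + b

# The central 95% of the refitted lines at each grid point forms the band.
lower, upper = np.percentile(boot_lines, [2.5, 97.5], axis=0)
```

Plotting lower and upper against grid with Matplotlib's fill_between() would give a shaded band similar in spirit to the one seaborn draws.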

It would be nice if the plot came with a title and legend. I assume these are left off so that they don't "get in the way" of quick exploratory data analyses. Still, it's a little disappointing.

But not to worry, we can add these with some extra code, so that our seaborn plot matches the one produced with Matplotlib.

# Recolor regression line and add a title and legend:
sns.regplot(data=tips, 
            x='total_bill', y='tip', 
            label='Data',
            line_kws={'label': 'Linear Fit', 
                      'color': 'red'})
plt.title('Declarative Example - Seaborn')
plt.legend();

The updated scatterplot built with seaborn (by the author)

The regplot() method accepts additional parameters beyond those shown here. Other seaborn plot types are listed below. Details on these and other methods can be found in the seaborn docs.

Some useful seaborn plotting methods (from Python Tools for Scientists)

Plotly Express – Declarative Example

Plotly Express is a built-in part of the Plotly graphing library. As a simpler, higher-level version of Plotly, it's the recommended starting point for creating common figures.

Plotly Express contains more than 30 functions for creating entire figures at once, and the API for these functions was carefully designed to be as consistent and easy to learn as possible. This makes it easy to switch between figure types during a data exploration session.

Here's the Plotly Express code to generate a scatterplot with regression line:

# Plotly Express declarative example:
import plotly.express as px

fig = px.scatter(tips, x='total_bill', y='tip', 
                 trendline='ols', 
                 trendline_color_override='red',
                 title='Declarative Example - Plotly Express', 
                 width=700, height=500)
fig

The scatterplot built with Plotly Express (by the author)

Notice how we were able to add the title and regression line using arguments in the [px.scatter()](https://plotly.com/python-api-reference/generated/plotly.express.scatter) method ("ols" stands for "ordinary least squares"). The regression line was enabled through the [statsmodels](https://www.statsmodels.org/stable/index.html) library.

The px.scatter() method also permits the use of a legend through the color parameter, which references a DataFrame column, such as:

color='smoker'

Here, however, we want to add the regression line to the legend, which requires extra code:

# Add a red regression line: 
fig = px.scatter(tips, 
                 x='total_bill', y='tip', 
                 trendline='ols', 
                 title='Declarative Example - Plotly Express',  
                 trendline_color_override='red', 
                 width=700, height=500)

# Add a legend entry for the regression line
fig.data[0].name = 'Data'
fig.data[0].showlegend = True
fig.data[1].name = 'Linear Fit'
fig.data[1].showlegend = True

fig

The updated scatterplot built with Plotly Express (by the author)

Plotly Express supports interactive plots with panning, zooming, hovering, and clickable/selectable legends.

Below is a list of all the Plotly Express plotting methods. Details are available in the docs.

Plotly Express Plot Types (by the author from the official docs)

hvplot – Declarative Example

The hvplot library is a high-level declarative interface for the HoloViews plotting library. It's designed to simplify the process of creating complex visualizations with minimal code. It works seamlessly with pandas DataFrames, allowing users to visualize data directly from their data structures.

As part of the HoloViz ecosystem, it integrates seamlessly with tools like Panel, Bokeh, and GeoViews to produce dashboards and other interactive visualizations. Its Bokeh-based API supports panning, zooming, hovering, and clickable/selectable legends.

Here's the hvplot code to generate a scatterplot:

# hvplot declarative example:
import numpy as np
import pandas as pd
import hvplot.pandas

tips.hvplot.scatter(x='total_bill', y='tip',  
                    title='Declarative Example - hvplot', 
                    width=700, 
                    height=500)

The scatterplot built with hvplot (by the author)

Now, let's add a regression line and legend:

# Add a regression line and legend:
scatter = tips.hvplot.scatter(x='total_bill', y='tip', 
                              title='Declarative Example - hvplot',
                              label='Data', 
                              width=800, height=500)

# Calculate linear regression coefficients
m, b = np.polyfit(tips['total_bill'], tips['tip'], deg=1)

# Create a line using the regression coefficients
line = pd.DataFrame({'total_bill': [tips['total_bill'].min(), 
                                    tips['total_bill'].max()], 
                     'tip': [m * tips['total_bill'].min() + b, 
                             m * tips['total_bill'].max() + b]})

# Line plot using hvplot
regression_line = line.hvplot.line(x='total_bill', y='tip', 
                                   color='red', 
                                   label='Linear Fit')

# Overlay the scatter plot and regression line
scatter_with_regression = scatter * regression_line

scatter_with_regression

The updated scatterplot built with hvplot (by the author)

This code is much more verbose than in the previous examples. One interesting thing, however, is that hvplot uses special operators to make it easy to overlay plots or display them side-by-side.

The asterisk (*) overlays plots, such as the regression line and scatterplot shown above. The plus sign (+) places them side-by-side. Just change the following line in the previous code as shown below:

# Display the scatter plot and regression line side by side
scatter_with_regression = scatter + regression_line

Now, the scatterplot and regression line are displayed in separate plots:

The scatterplot and regression line side by side (by the author)
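
These operators are ordinary Python operator overloading. The toy classes below are purely hypothetical and are not how HoloViews is implemented. They only sketch how * and + can be made to return either an overlay or a side-by-side layout.

```python
# Toy model of plot composition; all class names are hypothetical.
class Plot:
    def __init__(self, name):
        self.name = name

    def __mul__(self, other):
        # plot_a * plot_b -> draw both on the same axes
        return Composite('overlay', [self, other])

    def __add__(self, other):
        # plot_a + plot_b -> draw in adjacent panels
        return Composite('layout', [self, other])


class Composite:
    def __init__(self, kind, plots):
        self.kind = kind
        self.plots = plots

    def describe(self):
        names = ', '.join(p.name for p in self.plots)
        return f'{self.kind}({names})'


scatter = Plot('scatter')
regression_line = Plot('regression_line')

print((scatter * regression_line).describe())  # overlay(scatter, regression_line)
print((scatter + regression_line).describe())  # layout(scatter, regression_line)
```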

While adding the regression line to the legend took a bit of work, normally, legends in hvplot are easy. You just need to specify a category, such as a DataFrame column, for the marker colors. Here's an example using the "smoker" column as the by argument:

# Add the legend using the scatter() method:
import numpy as np
import pandas as pd
import hvplot.pandas

tips.hvplot.scatter(x='total_bill', y='tip', 
                    by='smoker',
                    title='Declarative Example - hvplot',
                    label='Data',
                    legend='top_right',
                    width=700, 
                    height=500)

The scatterplot with legend (by the author)

The plot types available with hvplot are listed below. You can find more details in the user guide.

Some useful hvplot methods (by the author from the official docs)

Customizing Declarative Methods

I find it frustrating – and a bit baffling – that the declarative methods we've reviewed in this article don't produce complete figures by default. By complete I mean containing the minimum elements needed for publication, such as:

A title
A legend
Labeled and annotated x and y axes
A background grid

Fortunately, knowing Python means that you don't have to settle. You can generate your very own declarative function for making plots. Let's do this now by leveraging the seaborn regplot() method we used earlier:

# Function for a "complete" seaborn regression plot:

def my_regplot(data, x, y, title, line_color):
    """Draw a seaborn regplot with title and legend."""
    sns.set_style("whitegrid")
    sns.regplot(data=data, 
                x=x, y=y, 
                label='Data', 
                line_kws={'label': 'Linear Fit (95% CI)', 
                          'color': line_color})
    plt.title(title)
    plt.legend(loc='best')
    #plt.savefig(f'{title}.png', dpi=400)  # optional save

my_regplot(tips, 'total_bill', 'tip', 'My RegPlot Example', 'red')

This function takes some of the regplot() method arguments along with one for a title. The seaborn set_style() function adds the background grid, and Matplotlib's pyplot module adds the title and legend.

When you call the function, you pass it the DataFrame name, the names of the columns to use for the x and y axes, a title, and a color for the regression line. Here's the result:

A seaborn RegPlot built with a custom function (by the author)

You can use this customization approach with the other plotting libraries.
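
As one illustration, the same idea applied to Matplotlib might look like the sketch below. The function name my_scatter_fit() and its parameters are hypothetical, and the small dictionary holds made-up values standing in for the tips DataFrame.

```python
import matplotlib.pyplot as plt
import numpy as np


def my_scatter_fit(data, x, y, title, line_color='red'):
    """Hypothetical helper: scatterplot with linear fit, title, legend, grid."""
    xs = np.asarray(data[x], dtype=float)
    ys = np.asarray(data[y], dtype=float)
    m, b = np.polyfit(xs, ys, deg=1)

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, label='Data')
    # Two endpoints are enough to draw a straight line.
    x_ends = np.array([xs.min(), xs.max()])
    ax.plot(x_ends, m * x_ends + b, color=line_color, label='Linear Fit')
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(title)
    ax.grid(True)
    ax.legend(loc='best')
    return fig, ax


# Made-up values; a dict works here because data[x] mimics a DataFrame lookup.
sample = {'total_bill': [10.0, 15.0, 20.0, 25.0, 30.0],
          'tip': [1.8, 2.1, 3.2, 3.6, 4.9]}
fig, ax = my_scatter_fit(sample, 'total_bill', 'tip', 'My Matplotlib Example')
```

With the real dataset, the call would be my_scatter_fit(tips, 'total_bill', 'tip', 'My Matplotlib Example'). Wrapping the imperative steps this way gives you a declarative-style call while keeping full control over the details inside the function.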

Summary

Imperative plotting in Python involves a step-by-step approach where users explicitly specify the details of the plot at a fairly low level. Imperative libraries, such as Matplotlib, are useful when building complex figures requiring a high degree of customization.

Declarative plotting in Python revolves around high-level methods that allow users to create basic visualizations with only a few lines of code. Customization is still permitted to various degrees.

The major plotting libraries all come with high-level declarative components. Matplotlib has seaborn, Plotly has Plotly Express, and HoloViews has hvplot.

As Anaconda's James Bednar points out, the ultimate goal of declarative products like HoloViews is to support the entire life cycle of scientific research, from initial exploration to publication to reproduction of the work and new extensions. The declarative style is thus the path to a unified, consistent, and forward-looking plotting solution for Python.