Data Visualization Cheat Sheet for Basic Machine Learning Algorithms

Cheat sheets can function as a guideline to give us initial ideas. Personally, I use cheat sheets from time to time and find them quite helpful, especially when I was first learning machine-learning algorithms.
Besides understanding and applying an algorithm, checking the outcome is an important step that helps us see what happens to the data. Here, data visualization is a good choice since it can show us the algorithm's results visually.
Even though many chart types are available, selecting the proper ones helps us display the outcome effectively. Thus, I think it is a good idea to make a cheat sheet for selecting charts quickly. The result is the basic machine learning data visualization cheat sheet shown below.
Ta-da!!

Before continuing, please take into account that the data visualizations recommended in the cheat sheet are just quick initial ideas. There may be cases where these charts are unsuitable. Next, I will walk you through examples of how to plot the charts using Python.
Let's get started.
Getting data
Start with importing libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
This article is going to use the iris dataset, which can be loaded directly from the Sklearn library. The dataset is also available as the UCI ML iris dataset and is licensed under a CC BY 4.0 license.
This article will mainly focus on the 'sepal length' and 'sepal width' attributes. If you want to try other datasets, feel free to modify the code below or skip this part.
from sklearn.datasets import load_iris
data = load_iris()
df = pd.DataFrame(data = data.data, columns = data.feature_names)
df_t = pd.DataFrame(data = data.target, columns = ['label'])
df_iris = pd.concat([df,df_t], axis=1)
df_iris.head()

Before applying any machine learning technique, it is always a good idea to perform exploratory data analysis (EDA) to understand the data and discover patterns. Data visualization plays an important part in EDA, helping us see the data visually.
Next, we will use a scatter plot to display the selected variables.
sns.set_style('darkgrid')
fig, ax = plt.subplots()
sns.scatterplot(data=df_iris, x='sepal length (cm)',
                y='sepal width (cm)',
                hue='label',
                palette=['red', 'blue', 'orange'],
                ax=ax)
ax.set_xlim(4, 8)
ax.set_ylim(1.9, 4.5)
plt.legend([], [], frameon=False)  # hide the legend
plt.show()

Now that everything has been prepared, let's continue to the machine learning part.
As shown at the beginning of this article, the sheet is composed of four major sections:
- Classification
- Clustering
- Regression
- Dimensionality reduction
Let's get started…

Classification
Classification is a supervised machine learning method that creates a model from a training dataset in order to predict labels for a testing dataset. Simply put, if you want to label the test data with discrete class labels, classification algorithms are the techniques you are looking for.
There are various methods for classifying data. In this article, we are going to work with K-Nearest Neighbors (KNN), which uses closeness, or similarity, to classify data. Since KNN is a supervised learning method, we need to go through the process of training and testing on the dataset.
The following code shows how to use train_test_split from Sklearn to divide the dataset into training and testing data. Then, the KNeighborsClassifier class is applied to create a classification model from the training data. Lastly, the obtained model will be used to label the testing data.
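A minimal sketch of that workflow, assuming we classify with the two sepal features; test_size, random_state, and n_neighbors=5 are arbitrary choices here:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Input features and target labels
X = df_iris[['sepal length (cm)', 'sepal width (cm)']]
y = df_iris['label']

# Split into 70% training and 30% testing data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Build a classification model from the training data
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Label the testing data with the obtained model
y_pred = knn.predict(X_test)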

For plotting, we will use the Kernel Density Estimation (KDE) plot to show the data points' continuous probability density. This plot can help us visualize data density after the classification.
A scatter plot is also used to show the data points' locations. We will color the testing data in lighter shades of their predicted class's color, such as pink for the red class, to distinguish each class's training and testing data.
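A sketch of such a plot, assuming seaborn 0.11+ and the lighter shades 'pink', 'lightblue', and 'gold' for the predicted testing points:
fig, ax = plt.subplots()

# KDE plot: continuous probability density of the training data, per class
sns.kdeplot(x=X_train['sepal length (cm)'], y=X_train['sepal width (cm)'],
            hue=y_train.astype(str), palette=['red', 'blue', 'orange'],
            fill=True, alpha=0.3, legend=False, ax=ax)

# Training points in the base class colors
sns.scatterplot(x=X_train['sepal length (cm)'], y=X_train['sepal width (cm)'],
                hue=y_train.astype(str), palette=['red', 'blue', 'orange'],
                legend=False, ax=ax)

# Predicted testing points in lighter shades of the same colors
sns.scatterplot(x=X_test['sepal length (cm)'], y=X_test['sepal width (cm)'],
                hue=y_pred.astype(str), palette=['pink', 'lightblue', 'gold'],
                legend=False, ax=ax)
plt.show()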

As we can see from the result, the K-Nearest Neighbors method classifies data based on closeness or similarity.
The scatter plot shows the testing data points, in pink, light blue, and yellow, assigned to the same classes as the training data points they are located close to. The KDE plot shows the data density of each class after classification.
Please consider that data visualization only displays the result. To see how the classification model performs, metrics such as accuracy or the F1 score must be calculated.
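For example, a quick check with Sklearn's metrics module (the macro average here is an arbitrary choice):
from sklearn.metrics import accuracy_score, f1_score

# Compare predicted labels against the true testing labels
print('accuracy:', accuracy_score(y_test, y_pred))
print('F1 (macro):', f1_score(y_test, y_pred, average='macro'))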

Clustering
Compared with classification, clustering is an unsupervised machine learning method that tries to identify similar groups within a dataset. If your data has no labels and you want to group it into discrete classes, clustering algorithms are the techniques you are looking for.
In fact, there are various algorithms for performing the clustering task. Among them, K-means clustering is a common method that is easy to use. Theoretically, K-means tries to partition data into K clusters by assigning each data point to the cluster with the nearest mean (centroid).
Next, we can apply the KMeans class from Sklearn to cluster the dataset.
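A minimal sketch, clustering the same two sepal features with K=3 (matching the three species); random_state is arbitrary:
from sklearn.cluster import KMeans

# Partition the two sepal features into K=3 clusters
X = df_iris[['sepal length (cm)', 'sepal width (cm)']]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df_iris['cluster'] = kmeans.fit_predict(X)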

Similar to the result from classification, we can utilize the concept of the KDE plot to add more information by displaying the cluster density of the obtained result.
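A sketch of that plot, reusing the 'cluster' column created above:
fig, ax = plt.subplots()

# KDE plot: data density of each obtained cluster
sns.kdeplot(data=df_iris, x='sepal length (cm)', y='sepal width (cm)',
            hue=df_iris['cluster'].astype(str),
            palette=['red', 'blue', 'orange'],
            fill=True, alpha=0.3, legend=False, ax=ax)

# Scatter plot: data points colored by their cluster
sns.scatterplot(data=df_iris, x='sepal length (cm)', y='sepal width (cm)',
                hue=df_iris['cluster'].astype(str),
                palette=['red', 'blue', 'orange'],
                legend=False, ax=ax)
plt.show()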

It can be seen that the clusters obtained from K-means clustering are well separated compared with the classification result. Notice that the KDE plot not only helps highlight areas with high data density, but also makes it easier to locate data points far away from their cluster.
One more idea that can be added to the plot: if you work with a centroid-based clustering algorithm, labeling the centroids will provide more information, as in the snippet below.
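With K-means, the centroid coordinates are available in cluster_centers_; a small addition placed before plt.show() in the code above:
# Mark and label each cluster centroid
for i, (cx, cy) in enumerate(kmeans.cluster_centers_):
    ax.scatter(cx, cy, marker='X', s=120, color='black')
    ax.annotate(f'centroid {i}', (cx, cy),
                xytext=(5, 5), textcoords='offset points')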

Regression
Regression is a supervised machine learning technique that can show the relationship between dependent and independent variables. It can also be used to predict continuous values. Thus, if you want to predict continuous values, regression algorithms are the techniques you are looking for.
In this article, we are going to apply linear regression, which is a regression method that applies least squares to calculate the straight line that best fits the data.
With our iris dataset, let's see the relationship between the sepal length and sepal width attributes. We can perform the linear regression using the LinearRegression() class from Sklearn, as shown in the code below.
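A minimal sketch that fits one line per species, since the regression lines discussed below are per species; using .values keeps the later predictions on plain arrays:
from sklearn.linear_model import LinearRegression

# Fit sepal width as a function of sepal length, one model per species
models = {}
for label in sorted(df_iris['label'].unique()):
    subset = df_iris[df_iris['label'] == label]
    X = subset[['sepal length (cm)']].values
    y = subset['sepal width (cm)'].values
    models[label] = LinearRegression().fit(X, y)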

Unlike the previous two machine learning methods, this one does not need to show the data density. The primary goal is to obtain the regression line. Thus, we can simply add a line plot to the scatter plot.
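A sketch that reuses the fitted models and draws each line over its species' sepal-length range:
fig, ax = plt.subplots()
colors = ['red', 'blue', 'orange']

# Base scatter plot of the data points
sns.scatterplot(data=df_iris, x='sepal length (cm)', y='sepal width (cm)',
                hue='label', palette=colors, legend=False, ax=ax)

# Line plot: the fitted regression line for each species
for label, color in zip(sorted(df_iris['label'].unique()), colors):
    subset = df_iris[df_iris['label'] == label]
    x_line = np.linspace(subset['sepal length (cm)'].min(),
                         subset['sepal length (cm)'].max(), 50)
    ax.plot(x_line, models[label].predict(x_line.reshape(-1, 1)), color=color)
plt.show()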

It can be interpreted from the regression lines that, for each iris flower species, there is a positive correlation between the sepal length and sepal width.
Please note that the chart only displays the result obtained from the linear regression algorithm. To confirm that two variables actually have a linear relationship, we need to calculate a statistic such as the Pearson correlation coefficient.
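For example, with scipy (an extra dependency not imported earlier):
from scipy import stats

# Pearson correlation between sepal length and width, per species
for label in sorted(df_iris['label'].unique()):
    subset = df_iris[df_iris['label'] == label]
    r, p = stats.pearsonr(subset['sepal length (cm)'],
                          subset['sepal width (cm)'])
    print(f'label {label}: r = {r:.2f}, p-value = {p:.4f}')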

Dimensionality Reduction
This machine learning method is used to cope with a dataset containing many continuous variables or features. It has the benefits of reducing complexity, improving algorithm performance, and making data easier to plot. Thus, if you want to decrease the number of features, dimensionality reduction algorithms are the techniques you are looking for.
Even though the iris dataset that we have contains only four continuous variables, it is hard to visualize these four attributes (features) at the same time. Thus, we need to reduce the number of features for plotting.
Principal Component Analysis (PCA) is the dimensionality reduction method that we are going to work with. Simply put, this method starts with standardizing the input data, followed by computing the covariance matrix. After that, the eigenvectors and eigenvalues are calculated and sorted before the principal components are selected.
We can easily use the PCA class from Sklearn to perform dimensionality reduction. The following code shows how to reduce the four features of the iris dataset into two features.
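A sketch of that reduction; the standardization step mirrors the PCA description above:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the four features, then project them onto two components
X_scaled = StandardScaler().fit_transform(df_iris[data.feature_names])
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

df_pca = pd.DataFrame(components, columns=['PC1', 'PC2'])
df_pca['label'] = df_iris['label']

# Share of the variance each component explains
print(pca.explained_variance_ratio_)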

For data visualization, a scatter plot alone is enough to display the data points' locations in the new dimensional space.
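A sketch, keeping the same class colors as the earlier plots:
fig, ax = plt.subplots()
sns.scatterplot(data=df_pca, x='PC1', y='PC2',
                hue='label', palette=['red', 'blue', 'orange'],
                legend=False, ax=ax)
plt.show()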

The scatter plot shows the data points after being processed with the PCA technique. Even though it may look similar to the previous scatter plots, the data points' locations differ. It can be noticed that the iris species labeled in blue and orange are well separated.
Key Takeaways
First of all, as previously mentioned, cheat sheets can function as a guideline where you can get some suggestions. The sheet presented in this article aims to provide you with some initial ideas.
If we take a close look, one thing that these recommended four charts have in common is the scatter plot, which is used to show the base information. Then, other plots can be added to express additional information: the data density is displayed with the KDE plot, and the regression line is shown using the line plot.
Please be aware that there may be conditions where these charts are unsuitable for use, including scenarios where other charts can display the result better.
If you have any recommendations, please feel free to leave a comment. I would be happy to read them.
Thanks for reading.
These are some of my data visualization articles that you may find interesting:
- 9 Visualizations with Python that catch more attention than a Bar Chart (link)
- 8 Visualizations with Python to handle Multiple Time-Series data (link)
- 7 Visualizations with Python to handle Multivariate Categorical data (link)
References
- Sharma, G. (2023, November 22). 5 classification algorithms you should know – introductory guide. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2021/05/5-classification-algorithms-you-should-know-introductory-guide/
- Kaushik, S. (2024b, February 6). Clustering: Different methods, and applications (updated 2024). Analytics Vidhya. https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/
- About linear regression. IBM. (n.d.). https://www.ibm.com/topics/linear-regression
- GeeksforGeeks. (2023, May 6). Introduction to dimensionality reduction. GeeksforGeeks. https://www.geeksforgeeks.org/dimensionality-reduction/
- Wikimedia Foundation. (2024, January 11). Kernel Density Estimation. Wikipedia. https://en.wikipedia.org/wiki/Kernel_density_estimation
- The Iris dataset. scikit-learn. (n.d.). https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html
- Fisher, R. A. (1988). Iris. UCI Machine Learning Repository. https://doi.org/10.24432/C56C76