Benchmarking Machine Learning Models with Cross-Validation and Matplotlib in Python

In this article, we will look at how to use Python to compare and evaluate the performance of machine learning models.
We will use cross-validation with Sklearn to test the models and Matplotlib to display the results.
The main motivation for doing this is to have a clear and accurate understanding of model performance and thus improve the model selection process.
Cross-validation is a robust method for testing models on data other than the training data. It evaluates the model on held-out folds, i.e. data that was not used to fit the model itself, which gives us a more accurate estimate of how the model will perform on real data.
For a detailed explanation of cross-validation, check out this article.
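To make the idea concrete, here is a minimal, self-contained sketch of cross-validation with scikit-learn's cross_val_score; the dataset and model are placeholders chosen only to illustrate the API.

from sklearn import datasets, linear_model, model_selection

# Toy data and a simple classifier, used only to illustrate the API
X, y = datasets.make_classification(n_samples=100, n_features=10, random_state=0)
model = linear_model.LogisticRegression()

# 5-fold cross-validation: the model is fit on 4 folds and scored on the held-out fold
scores = model_selection.cross_val_score(model, X, y, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # average score across folds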
We will use an object-oriented approach so that we can reuse it for other machine learning projects easily, making this method highly replicable.
The Benchmark class
To start, we will create a class called Benchmark, which will be responsible for testing the models. The class will accept a dictionary of models, where the key is the model name and the value is the model object itself. The class will also generate toy test data with scikit-learn's make_classification function whenever no data is provided.
import numpy as np
from sklearn import model_selection
from sklearn import datasets
import matplotlib.pyplot as plt


class Benchmark:
    """
    Compare and evaluate the performance of machine learning
    models using cross-validation.

    Parameters
    ----------
    models : dict
        Dictionary of models, where the key is the name of the
        model and the value is the model object.
    """

    def __init__(self, models):
        self.models = models

    def test_models(self, X=None, y=None, cv=5):
        """
        Test the models using the provided data and cross-validation.

        Parameters
        ----------
        X : array-like or DataFrame, shape (n_samples, n_features)
            Features for the test data.
        y : array-like or Series, shape (n_samples,)
            Target for the test data.
        cv : int, cross-validation generator or an iterable, optional
            Number of folds for the cross-validation.

        Returns
        -------
        message : str
            Name of the best model and its mean cross-validation score.
        """
        # Fall back to a generated toy dataset if no data is provided
        if X is None or y is None:
            X, y = datasets.make_classification(
                n_samples=100,
                n_features=10,
                n_classes=2,
                n_clusters_per_class=1,
                random_state=0
            )
        self.results = {}
        for name, model in self.models.items():
            scores = model_selection.cross_val_score(model, X, y, cv=cv)
            self.results[name] = scores.mean()
        self.best_model = max(self.results, key=self.results.get)
        return f"The best model is: {self.best_model} with a score of {self.results[self.best_model]:.3f}"
The main function of the class is test_models, which accepts test data and uses cross-validation to test the models. The function stores the results in an instance attribute and returns the name of the model with the highest mean score across the cross-validation folds.
To display the results, we will add a function called plot_cv_results to the class. This function uses Matplotlib to create a bar chart showing the average cross-validation score for each model.
def plot_cv_results(self):
    """
    Create a bar chart to visualize the cross-validation results.

    Returns
    -------
    None
    """
    plt.figure(figsize=(15, 5))
    x = np.arange(len(self.results))
    plt.bar(x, list(self.results.values()), align='center', color='g')
    plt.xticks(x, list(self.results.keys()))
    plt.ylim([0, 1])
    plt.ylabel('Cross-Validation Score')
    plt.xlabel('Models')
    plt.title('Model Comparison')
    for index, value in enumerate(self.results.values()):
        plt.text(index, value, str(round(value, 2)))
    plt.show()
Finally, to use the class, we instantiate the Benchmark object by passing in the dictionary of models, call the test_models function with the test data, and then use the plot_cv_results function to display the results.
from sklearn import linear_model, ensemble

models = {
    'logistic': linear_model.LogisticRegression(),
    'randomforest': ensemble.RandomForestClassifier(),
    'extratrees': ensemble.ExtraTreesClassifier(),
    'gbm': ensemble.GradientBoostingClassifier()
}

benchmark = Benchmark(models)
print(benchmark.test_models())
benchmark.plot_cv_results()
The result is a bar chart titled "Model Comparison", with one bar per model annotated with its mean cross-validation score.
This way, we can easily compare and evaluate the performance of models and then choose the model that performs best for our specific problem.
In this example we used the make_classification function to generate the toy data, but of course you can use any dataset you like.
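For instance, here is a minimal sketch of running the same benchmark on scikit-learn's built-in breast cancer dataset instead of the generated toy data.

from sklearn import datasets

# Load a real dataset and pass it explicitly to test_models
# (LogisticRegression may warn about convergence on unscaled data)
X, y = datasets.load_breast_cancer(return_X_y=True)
print(benchmark.test_models(X, y, cv=5))
benchmark.plot_cv_results()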
Additionally, the Benchmark class can be extended to include other features, such as the ability to save results to a file or to test models across multiple datasets.
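As an example of such an extension, here is a rough sketch of a hypothetical subclass with a save_results method (not part of the class above) that dumps the results dictionary to a JSON file.

import json

class BenchmarkWithExport(Benchmark):
    """Hypothetical extension that can persist the cross-validation results."""

    def save_results(self, path='cv_results.json'):
        # Assumes test_models has already been called, so self.results exists
        with open(path, 'w') as f:
            json.dump({name: float(score) for name, score in self.results.items()}, f, indent=2)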
What are the next steps?
Following the usual machine learning pipeline, the next step would be to tune the hyperparameters of the best model (in this case ExtraTreesClassifier), assuming our features are considered definitive.
If they are not, an intermediate step would be feature selection / engineering, repeating the benchmarking step every time the features change.
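As a rough sketch of that tuning step, a small grid search over a few ExtraTreesClassifier hyperparameters could look like the following; the grid values are illustrative, not a recommendation.

from sklearn import datasets, ensemble, model_selection

# The same toy data used by the Benchmark class; swap in your real features/target
X, y = datasets.make_classification(
    n_samples=100, n_features=10, n_classes=2,
    n_clusters_per_class=1, random_state=0
)

# Illustrative hyperparameter grid
param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [None, 10],
    'min_samples_split': [2, 5]
}

grid = model_selection.GridSearchCV(
    ensemble.ExtraTreesClassifier(random_state=0),
    param_grid,
    cv=5
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)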
Conclusion
The Benchmark class we've created is just one example of how you can implement this technique in a project, but it can easily be adapted and customized to meet your project's specific needs.
The main benefit of using this approach is that it automates the process of comparing and evaluating models, which can save time and reduce human errors.
Recommended Reads
For the interested, here is a list of books that I recommend for each ML-related topic. These are ESSENTIAL books in my opinion and have greatly impacted my professional career.
- Intro to ML: Confident Data Skills: Master the Fundamentals of Working with Data and Supercharge Your Career by Kirill Eremenko
- Sklearn / TensorFlow: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurelien Géron
- NLP: Text as Data: A New Framework for Machine Learning and the Social Sciences by Justin Grimmer
- Sklearn / PyTorch: Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python by Sebastian Raschka
- Data Viz: Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Knaflic
Useful Links (written by me)
- Learn how to perform a top-tier Exploratory Data Analysis in Python: Exploratory Data Analysis in Python – A Step-by-Step Process
- Learn the basics of TensorFlow: Get started with TensorFlow 2.0 – Introduction to deep learning
- Perform text clustering with TF-IDF in Python: Text Clustering with TF-IDF in Python
If you want to support my content creation activity, feel free to follow my referral link below and join Medium's membership program. I will receive a portion of your investment and you'll be able to access Medium's plethora of articles on Data Science and more in a seamless way.
Code template
Here is the entire codebase:
import numpy as np
from sklearn import model_selection
from sklearn import datasets
import matplotlib.pyplot as plt


class Benchmark:
    def __init__(self, models):
        self.models = models

    def test_models(self, X=None, y=None, cv=5):
        # Fall back to a generated toy dataset if no data is provided
        if X is None or y is None:
            X, y = datasets.make_classification(
                n_samples=100,
                n_features=10,
                n_classes=2,
                n_clusters_per_class=1,
                random_state=0
            )
        self.results = {}
        for name, model in self.models.items():
            scores = model_selection.cross_val_score(model, X, y, cv=cv)
            self.results[name] = scores.mean()
        self.best_model = max(self.results, key=self.results.get)
        return f"The best model is: {self.best_model} with a score of {self.results[self.best_model]:.3f}"

    def plot_cv_results(self):
        plt.figure(figsize=(15, 5))
        x = np.arange(len(self.results))
        plt.bar(x, list(self.results.values()), align='center', color='g')
        plt.xticks(x, list(self.results.keys()))
        plt.ylim([0, 1])
        plt.ylabel('Cross-Validation Score')
        plt.xlabel('Models')
        plt.title('Model Comparison')
        for index, value in enumerate(self.results.values()):
            plt.text(index, value, str(round(value, 2)))
        plt.show()


from sklearn import linear_model, ensemble

models = {
    'logistic': linear_model.LogisticRegression(),
    'randomforest': ensemble.RandomForestClassifier(),
    'extratrees': ensemble.ExtraTreesClassifier(),
    'gbm': ensemble.GradientBoostingClassifier()
}

benchmark = Benchmark(models)
print(benchmark.test_models())
benchmark.plot_cv_results()