Write Cleaner, Consistent Code: Can ChatGPT and Autoformatters Help?

Author: Murphy | View: 24373 | Time: 2025-03-22 23:19:45

AUTOMATION IN DATA SCIENCE

There are only two things that can happen when you revisit old code: either you cringe with a facepalm-moment-filled-with-regret or you give yourself a silent nod of appreciation and pat on the back for the neat, readable, and well-maintained code.

In this article, we'll strive for that second scenario.

The autoformatting tools and tips in this article will save you the time and energy otherwise spent on manual formatting and on deciphering poorly structured code later.

We will discuss:

A prompt that refines your Python code so it adheres to best coding practices
Python code formatting libraries like black and autopep8
The use of ChatGPT as a resource for improving coding practices, not just as an automation tool

The Prompt: Code Refactoring for Best Practices

The prompt below lets me quickly clean up hastily written code and highlights opportunities for improvement related to best practices. I will walk through specific segments of the prompt, explain why they matter, and suggest changes you can make to tailor it to your preferences.

Please refactor the following Python code to adhere to best coding practices,
including proper naming conventions, indentation, comments, and comprehensive
type hinting. Ensure that the code is clean and maintainable.

INSERT CODE


Before proceeding with the refactoring, please feel free to ask any necessary
follow-up questions to ensure that you have a clear understanding of the 
code's purpose and functionality. Don't give suggestions before I respond 
to any questions.

Once you've reviewed the code and obtained any needed clarifications, please 
provide suggestions and improvements based on the code itself. Tailor the 
coding practices and recommendations to the code's specific context, and 
share any additional recommendations for writing high-quality Python code. 
Format any docstring comment to include 'Input' and 'Output'.
Present the final refactored code based on your suggestions and explanations.

"Please feel free to ask any follow-up questions."

This encourages a deeper understanding of the code's purpose. Sometimes, ChatGPT may describe the 'how' but not the 'why.' By inviting it to ask questions, you ensure that the refactoring process considers the underlying reasons.

If you wish to save time, you may omit this step. However, I believe that including the 'why' in comments is what distinguishes a future facepalm from a pat on the back. Often, when we look back at code comments, the nuanced context is what saves us.

"Provide suggestions based on the code itself."

This emphasizes that recommendations should be tailored to the code's specific needs. To save time, the suggestions address areas of weakness rather than providing a comprehensive analysis. If a comprehensive analysis is your goal, you can modify the prompt to use a standardized list of best coding practices and to provide a rating for each category.

"Format docstring comments to include ‘Input' and ‘Output'."

This is just a personal preference: I use Input and Output in the docstring, while some developers prefer Parameters and Returns.
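To make the two docstring conventions concrete, here is a minimal sketch of the same function documented both ways. The function names are hypothetical and exist only for illustration; the second layout follows the NumPy docstring style, which is where "Parameters" and "Returns" are commonly seen.

```python
from typing import List, Sequence


def scale_values(values: Sequence[float], factor: float) -> List[float]:
    """
    Multiply every value by a constant factor.

    Input:
        values: The numbers to scale.
        factor: The multiplier applied to each value.

    Output:
        A new list containing the scaled values.
    """
    return [value * factor for value in values]


def scale_values_numpy_style(values: Sequence[float], factor: float) -> List[float]:
    """
    Multiply every value by a constant factor.

    Parameters
    ----------
    values : Sequence[float]
        The numbers to scale.
    factor : float
        The multiplier applied to each value.

    Returns
    -------
    List[float]
        A new list containing the scaled values.
    """
    return [value * factor for value in values]
```

Whichever convention you pick, naming it explicitly in the prompt keeps the generated docstrings consistent across a codebase.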

"Include Type Hinting"

Type hinting improves code readability and helps prevent type-related errors. By specifying data types using type hints, developers and static analysis tools can catch potential issues early in development.
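As a small illustration, consider the sketch below. The function name is hypothetical. Note that Python does not enforce hints at runtime, so the early detection comes from a static type checker such as mypy rather than from the interpreter.

```python
from typing import Sequence


def mean_importance(importances: Sequence[float]) -> float:
    """Return the average of a sequence of importance scores."""
    return sum(importances) / len(importances)


average = mean_importance([0.2, 0.1, 0.3])

# Hints are not enforced at runtime: the call below would only fail inside
# sum(), with a TypeError that does not mention the real mistake. A static
# type checker would flag the mismatched argument before the code runs.
# mean_importance("0.2, 0.1, 0.3")
```

The hint also documents intent: a reader can tell at a glance that the function expects a sequence of numbers and returns a single number.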

A Feature Importance Example

We'll insert the code example below into the refactoring prompt.

def permutation_importance(model, X, y, scoring=accuracy_score):
    base_score = scoring(model, X, y) 
    importances = np.zeros(X.shape[1])

    for i in range(X.shape[1]):
        shuffled_X = X.copy()
        shuffled_X[:, i] = np.random.permutation(shuffled_X[:, i])  
        score = scoring(model, shuffled_X, y)
        importances[i] = base_score - score  # Decrease in performance

    return importances

It's worth noting that GPT-3.5 may not consistently wait for follow-up questions to be answered before proceeding. GPT-4, however, consistently waits for responses and will often go the extra mile by proactively suggesting enhancements to the code's functionality. For instance, here it proposed adding error handling.

I happily agreed to add the extra layer of error handling.

The provided refactored code:

from typing import Any, Callable

import numpy as np


def calculate_permutation_importance(
    model: Any, 
    features: np.ndarray, 
    target: np.ndarray, 
    scoring: Callable[[Any, np.ndarray, np.ndarray], float]
    ) -> np.ndarray:
    """
    This function computes the permutation importance of each feature 
    in the dataset by measuring how the scoring metric changes when the
    values of the feature are randomly shuffled.

    Input:
        model: A Machine Learning model for which the importance is calculated.
        features: A numpy array representing the feature data.
        target: A numpy array representing the target data.
        scoring: A function that calculates the scoring metric.

    Output:
        A numpy array of importances for each feature.

    Raises:
        ValueError: If the shapes of features and target do not align.
    """

    # Ensure that the number of samples in features and target are the same
    if features.shape[0] != target.shape[0]:
        raise ValueError("Features and target must have the same number of samples.")

    # Calculate the base score of the model using the original dataset
    base_score = scoring(model, features, target)

    # Initialize an array to store the importances of each feature
    importances = np.zeros(features.shape[1])

    # Iterate over each feature
    for i in range(features.shape[1]):
        # Copy the feature dataset to avoid modifying the original data
        shuffled_features = features.copy()

        # Shuffle the values of the current feature
        shuffled_features[:, i] = np.random.permutation(shuffled_features[:, i])

        # Calculate the score after shuffling the current feature
        shuffled_score = scoring(model, shuffled_features, target)

        # The importance is the reduction in the model's score
        importances[i] = base_score - shuffled_score

    return importances

As we can see, our original code snippet may be more concise and sufficient if the audience is familiar with conventions in machine learning. However, for long-term code maintenance and for larger teams, this cleaned version is preferable since it follows best practices such as type hinting, error handling, and descriptive variable names. Consequently, it is more robust, readable, and maintainable, making it the preferred choice for most collaborative software development scenarios.
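One detail to watch when calling the function: `scoring` is invoked as `scoring(model, features, target)`, whereas metrics such as scikit-learn's `accuracy_score` take `(y_true, y_pred)`. A minimal sketch of an adapter follows, assuming the model exposes a scikit-learn style `predict` method. The names `make_model_scorer`, `simple_accuracy`, and `AlwaysOneModel` are hypothetical stand-ins used only so the example runs without extra dependencies.

```python
from typing import Any, Callable, List, Sequence


def make_model_scorer(
    metric: Callable[[Sequence[Any], Sequence[Any]], float],
) -> Callable[[Any, Any, Any], float]:
    """Wrap metric(y_true, y_pred) as scorer(model, features, target)."""

    def scorer(model: Any, features: Any, target: Any) -> float:
        # Assumption: the model provides a predict() method.
        return metric(target, model.predict(features))

    return scorer


def simple_accuracy(y_true: Sequence[Any], y_pred: Sequence[Any]) -> float:
    """Hypothetical stand-in for a metric such as accuracy_score."""
    matches = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return matches / len(y_true)


class AlwaysOneModel:
    """Hypothetical stub model that predicts 1 for every row."""

    def predict(self, features: Sequence[Any]) -> List[int]:
        return [1 for _ in features]


accuracy_scorer = make_model_scorer(simple_accuracy)
demo_score = accuracy_scorer(
    AlwaysOneModel(),
    [[0.1], [0.2], [0.3], [0.4]],
    [1, 1, 0, 1],
)
```

A scorer built this way matches the `Callable[[Any, np.ndarray, np.ndarray], float]` hint in the refactored function, so it can be passed directly as the `scoring` argument.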

Automate Formatting with `black` and `autopep8`

There are also some great tools to automate code formatting and ensure consistent style across your codebase. Here's a brief tutorial on using two popular Python libraries, black and autopep8, for automated code formatting.

`black`

black is an opinionated code formatter that automatically rewrites your Python code in a consistent, PEP 8 compliant style (with a default line length of 88 characters rather than PEP 8's 79). To use black, follow these steps:

Install black using pip:

pip install black

Next, navigate to your Python project directory.

To format a single file, run black your_file.py in the terminal
To format a folder of files, run black folder_name/
To check which files in the current folder would be reformatted, without changing them: black --check .
To preview the changes for a single file: black --check --diff file_name.py
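Because `--check` does not modify files, it can also act as a gate in a script or CI job. The sketch below assumes black is installed and on the PATH; the helper names are hypothetical. The exit codes it interprets (0 when nothing would change, 1 when files would be reformatted, 123 for an internal error) are the ones black documents for `--check`.

```python
import shutil
import subprocess
from typing import List, Optional


def build_black_check_command(path: str, show_diff: bool = False) -> List[str]:
    """Build a black command that reports changes without writing them."""
    command = ["black", "--check"]
    if show_diff:
        command.append("--diff")
    command.append(path)
    return command


def describe_black_exit_code(exit_code: int) -> str:
    """Translate the exit code of `black --check` into a short message."""
    messages = {
        0: "already formatted",
        1: "would be reformatted",
        123: "internal error",
    }
    return messages.get(exit_code, "unexpected exit code")


def check_formatting(path: str) -> Optional[str]:
    """Run the check if black is available; return None otherwise."""
    if shutil.which("black") is None:
        return None
    result = subprocess.run(
        build_black_check_command(path),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is an expected outcome here
    )
    return describe_black_exit_code(result.returncode)
```

Calling `check_formatting("permutation_importance.py")` would then return a short status string, or `None` on a machine where black is not installed.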

Example

First I'll check what will be modified using black --check --diff permutation_importance.py

Next, I'll modify the file using black permutation_importance.py

Below is the code before and after formatting with black.

Photo by Author. Code before and after formatting using black

`autopep8`

autopep8 is another tool that automatically formats Python code according to the PEP 8 style guide.

Install autopep8 using pip:

pip install autopep8

Next, navigate to your Python project directory.

To format a single file, run autopep8 your_file.py --in-place
Check what can change for a file: autopep8 --diff file_name.py

I will run autopep8 --diff permutation_importance.py to see if it catches any other formatting issues.

Here, we can see that even after formatting with black, autopep8 caught some nuanced formatting issues in the code, including:

Fixing the indentation of the raise ValueError statement
Adding line breaks and proper indentation for comments
Properly wrapping and indenting long lines for readability

Differences Between `black` and `autopep8`

black is often described as opinionated: it applies a rigid, highly structured style, which can lead to notable changes in code layout. autopep8 instead fixes the PEP 8 violations it detects and otherwise preserves the original formatting. black also offers few configuration options and prioritizes consistency, whereas autopep8 is more adaptable and customizable.

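My suggestion is to try both tools. As shown, autopep8 can still flag formatting discrepancies after black has run. Keep in mind that the two tools use different default line lengths, so they can disagree about wrapping; if you run autopep8 after black, consider aligning them (for example, black --line-length 79 or autopep8 --max-line-length 88) so that the result stays consistent with both black's style and broader PEP 8 norms.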
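One likely reason autopep8 still reports changes after black is line length: black's default limit is 88 characters, while autopep8 follows PEP 8's 79 by default. The sketch below, with a hypothetical helper name, lists the lines that fall in that gap, meaning black leaves them alone while autopep8 may try to wrap them.

```python
from typing import List, Tuple

BLACK_DEFAULT_LIMIT = 88     # black's default --line-length
AUTOPEP8_DEFAULT_LIMIT = 79  # autopep8's default --max-line-length (PEP 8)


def lines_in_gap(source: str) -> List[Tuple[int, int]]:
    """Return (line number, length) for lines longer than 79 but at most 88."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        if AUTOPEP8_DEFAULT_LIMIT < len(line) <= BLACK_DEFAULT_LIMIT:
            flagged.append((number, len(line)))
    return flagged


# The error-handling line from the refactored function is a typical example.
sample = (
    "if features.shape[0] != target.shape[0]:\n"
    '    raise ValueError("Features and target must have the same number of samples.")\n'
)
flagged_lines = lines_in_gap(sample)
```

Here only the `raise ValueError(...)` line is flagged, which is consistent with autopep8 re-wrapping that statement even after black has formatted the file.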

For more customization options, developers can explore alternatives like yapf, a highly configurable formatter, or isort, which sorts and groups imports.

Conclusion

Streamlining your code formatting practice with ChatGPT is an effective way to develop your coding skills and enhance long-term code maintenance. When you combine this strategy with the automation libraries, you establish an efficient workflow that improves the sustained health of your codebase. Furthermore, you can adapt the prompt here to be more of a learning tool for understanding best practices; continuous learning is at the core of software development and we have this new and unique opportunity to retrieve instant feedback as we refine our code.

So, next time you revisit old code, let's hope it'll be with an air of appreciation and that future you will thank you for your commitment to clean code.
