Convolutional vs Feedforward Autoencoders for Image Denoising


APPLICATIONS OF AUTOENCODERS

Photo by Pierre Bamin on Unsplash (Slightly modified by the author)

Do you want to find out how convolutional autoencoders outperform feedforward autoencoders in image denoising?

If 'Yes', just keep reading this article.


Different types of autoencoders

There are many practical applications of autoencoders. Image Denoising is one of them.

Image denoising refers to removing noise from corrupted images to get clean images.

Autoencoders used for image denoising are known as denoising autoencoders.

We have already covered the fundamentals of autoencoders in the following articles.

Prerequisites
-------------
01. An Introduction to Autoencoders in Deep Learning
02. Autoencoder Latent Representation and Hidden Layers
03. Autoencoder Latent Representation and Latent Vector Dimension
04. Dimensionality Reduction with Autoencoders
05. Creating Shallow and Deep Autoencoders in Keras

Optional
--------
06. Convolutional Neural Network (CNN) Architecture
07. Coding a Convolutional Neural Network (CNN)
08. Acquire, Understand and Prepare the MNIST Dataset
09. Keras Sequential API and Functional API
10. Plotting the Learning Curve
11. Image representation in deep learning

As a quick recap, an autoencoder is a type of neural network architecture that consists of three key elements: Encoder, Decoder and Latent vector.

The relationship between these key elements is shown in the following diagram.

The structure of an autoencoder (Image by author)

The encoder (function f) takes the input x and transforms it into a latent vector denoted by z. The decoder (function g) takes z as its input and recovers the input x from the latent vector. The recovered input is approximately the same as x and is therefore denoted x̂.

The latent vector can have the same or a higher dimension than the input (in the case of an overcomplete autoencoder) or a much lower dimension than the input (in the case of an undercomplete autoencoder).

However, for image-denoising applications, the latent vector should be lower-dimensional than the input.

Complex autoencoders have many hidden layers. They are known as deep (multilayer) autoencoders. In contrast, an autoencoder with a single hidden layer is called a shallow (vanilla) autoencoder.

In practical applications, including image denoising, we always use deep autoencoders, because shallow autoencoders are not capable of capturing the important relationships in the data.

An autoencoder can be created using dense and convolutional layers.

When both the encoder and decoder of an autoencoder consist of only dense (fully-connected or MLP) layers, such an autoencoder is known as a feedforward autoencoder.

When the encoder of an autoencoder consists of convolutional layers (downsampling layers) and the decoder consists of transposed convolutional layers (upsampling layers), such an autoencoder is known as a convolutional autoencoder.

Here, we will compare the outputs of both types.
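As a minimal sketch (my own addition, not code from this article), here is how a Conv2D + MaxPooling2D pair halves the spatial dimensions while a stride-2 Conv2DTranspose doubles them. This is exactly why the two pair up naturally in the encoder and decoder of a convolutional autoencoder.

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Conv2DTranspose

# A dummy batch containing one 28x28 grayscale image
x = tf.zeros((1, 28, 28, 1))

# Encoder side: convolution keeps the size, pooling halves it
down = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
down = MaxPooling2D((2, 2), padding='same')(down)
print(down.shape)  # (1, 14, 14, 64)

# Decoder side: a stride-2 transposed convolution doubles the size back
up = Conv2DTranspose(64, (3, 3), strides=2, activation='relu', padding='same')(down)
print(up.shape)    # (1, 28, 28, 64)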

How autoencoders remove noise from corrupted images

In image generation applications using autoencoders, we usually train the autoencoder model as:

autoencoder.fit(train_images, train_images)

This is like telling the model, "learn the relationships in the training data so that you can generate new images that are almost identical to the original ones". Then, we can call autoencoder.predict(test_images) to generate new images with the model.

In image-denoising applications using autoencoders, we usually train the autoencoder model as:

autoencoder.fit(train_images_noisy, train_images)

This is like telling the model, "learn the relationships in the noisy (corrupted) training data so that you can remove the noise from images of the same domain". Then, we can call autoencoder.predict(test_images_noisy) to denoise new corrupted images of the same domain.

The noisy (corrupted) data is created by adding stochastic (random) noise to the original data. Here, the noise follows a Gaussian (normal) distribution with mean 0 and a standard deviation of 0.7.

import numpy as np

noise = np.random.normal(loc=0.0, scale=0.7, size=train_images.shape)
train_images_noisy = train_images + noise
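
Note that np.random.normal draws fresh noise on every run. If you want the noisy dataset to be reproducible, you can seed NumPy's generator first (my own addition; the original code does not set a seed):

np.random.seed(42)  # any fixed seed makes the Gaussian noise reproducible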

When we train a denoising autoencoder, the encoder keeps the most important information, discards the unnecessary noise, and produces a lower-dimensional latent (compressed) representation of the data. The decoder then recovers clean images from that compressed representation. The trained model can be used to clean new corrupted images of the same domain.

Building the models: Convolutional and feedforward autoencoders

The dataset we use

We will use the MNIST dataset (see Citation at the end) to build the two autoencoder models here. This dataset comes preloaded with tf.keras.

To learn how to acquire and prepare the MNIST dataset for deep learning applications, read my article, Acquire, Understand and Prepare the MNIST Dataset.

Approach

We will build a convolutional and a feedforward autoencoder on the MNIST data and compare the outputs of the two models. For a fair comparison, both models are trained with the same loss function, optimizer, batch size and number of epochs.

However, the two architectures differ significantly as feedforward autoencoders have dense layers while convolutional autoencoders have convolutional and transposed convolutional layers.

Training the models and making predictions

After running the following code, you will get the trained autoencoder models with their training history and also the predictions made on new data.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras import Model, Input
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Conv2DTranspose

# Load the MNIST dataset
(train_images, _), (test_images, _) = mnist.load_data()

###########################
# Feedforward Autoencoder #
###########################

# Reshape data for the dense layer input
train_images = np.reshape(train_images, (-1, 784))
test_images = np.reshape(test_images, (-1, 784))

# Scale the data
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255

# Add Gaussian noise to the data
noise = np.random.normal(loc=0.0, scale=0.7, size=train_images.shape)
train_images_noisy = train_images + noise

noise = np.random.normal(loc=0.0, scale=0.7, size=test_images.shape)
test_images_noisy = test_images + noise

# Clip the noisy data by 0 and 1:
# This is because adding noise may push the normalized pixel values 
# into invalid values of less than 0 or greater than 1
# So, we need to clip pixel values greater than 1 to 1.0 and
# less than 0 to 0.0
train_images_noisy = np.clip(train_images_noisy, 0., 1.)
test_images_noisy = np.clip(test_images_noisy, 0., 1.)

latent_vec_dim = 16
input_dim = 784

# Define the input layer
input_layer = Input(shape=(input_dim,))

# Define the feedforward (ff) autoencoder architecture
# First build the encoder with dense layers
x = Dense(500, activation='sigmoid')(input_layer)
x = Dense(300, activation='sigmoid')(x)
x = Dense(100, activation='sigmoid')(x)
encoder = Dense(latent_vec_dim, activation='tanh')(x)

# Then build the decoder with dense layers
x = Dense(100, activation='sigmoid')(encoder)
x = Dense(300, activation='sigmoid')(x)
x = Dense(500, activation='sigmoid')(x)
decoder = Dense(input_dim, activation='sigmoid')(x)

# Connect both encoder and decoder
ff_autoencoder = Model(input_layer, decoder, name="ff_autoencoder")

# Compile the model
ff_autoencoder.compile(loss='binary_crossentropy', optimizer='adam')

# Train the model
history_ff = ff_autoencoder.fit(train_images_noisy, train_images, epochs=50, 
                                batch_size=128, shuffle=True,
                                validation_data=(test_images_noisy, test_images))

# Denoise new images (prediction)
ff_autoencoder_denoised_images = ff_autoencoder.predict(test_images_noisy)

# Clear the history of the previous model
# Otherwise, you'll run out of memory or 
# will get abnormal results due to clutter from old models and layers.
tf.keras.backend.clear_session()

# Load the MNIST dataset again
(train_images, _), (test_images, _) = mnist.load_data()

#############################
# Convolutional Autoencoder #
#############################

# Reshape data for the convolutional layer input
train_images = np.reshape(train_images, (len(train_images), 28, 28, 1))
test_images = np.reshape(test_images, (len(test_images), 28, 28, 1))

# Scale the data
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255

# Add Gaussian noise to the data
noise = np.random.normal(loc=0.0, scale=0.7, size=train_images.shape)
train_images_noisy = train_images + noise

noise = np.random.normal(loc=0.0, scale=0.7, size=test_images.shape)
test_images_noisy = test_images + noise

# Clip the noisy data by 0 and 1 as before
train_images_noisy = np.clip(train_images_noisy, 0., 1.)
test_images_noisy = np.clip(test_images_noisy, 0., 1.)

# Define the input layer
input_layer = Input(shape=(28, 28, 1))

# Define the convolutional (conv) autoencoder architecture
# First build the encoder with convolutional layers
x = Conv2D(64, (3, 3), activation="relu", padding="same")(input_layer)
x = MaxPooling2D((2, 2), padding='same')(x)

x = Conv2D(64, (3, 3), activation="relu", padding="same")(x)
encoder = MaxPooling2D((2, 2), padding='same')(x)

# Then build the decoder with transposed convolutional layers
x = Conv2DTranspose(64, (3, 3), strides=2, activation="relu", padding="same")(encoder)
x = Conv2DTranspose(64, (3, 3), strides=2, activation="relu", padding="same")(x)
decoder = Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)

# Connect both encoder and decoder
conv_autoencoder = Model(input_layer, decoder, name="conv_autoencoder")

# Compile the model
conv_autoencoder.compile(loss='binary_crossentropy', optimizer='adam')

# Train the model
history_conv = conv_autoencoder.fit(train_images_noisy, train_images,
                                    epochs=50, batch_size=128, shuffle=True, 
                                    validation_data=(test_images_noisy, test_images))

# Denoise new images (prediction)
conv_autoencoder_denoised_images = conv_autoencoder.predict(test_images_noisy)

This code will take a long time to run on a CPU alone. To speed things up, you can use the free GPU on Colab or a GPU on your own machine, if you have one.
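If you are unsure whether TensorFlow actually sees a GPU, a quick check (my own addition) is:

import tensorflow as tf

# An empty list means training will run on the CPU only
print(tf.config.list_physical_devices('GPU'))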

I'm not going to explain the code line by line as I have already discussed all these things in my previous articles.
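That said, a quick way to compare the two architectures yourself is to call summary() on each model right after compiling it, which prints the layers and parameter counts (a small sketch; note that ff_autoencoder should be inspected before tf.keras.backend.clear_session() is called):

ff_autoencoder.summary()    # layer shapes and parameter counts of the feedforward model
conv_autoencoder.summary()  # layer shapes and parameter counts of the convolutional model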

Plotting the learning curves

Now, we can plot the learning curves for both models to analyze the training performance of both networks. For this, we can use the history objects of both models.

# Plot training and validation loss scores
# against the number of epochs
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(history_ff.history['loss'], label='Train')
ax1.plot(history_ff.history['val_loss'], label='Validation')
ax1.set_title('FF_AE Loss', pad=12)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.legend(loc='upper right')

ax2.plot(history_conv.history['loss'], label='Train')
ax2.plot(history_conv.history['val_loss'], label='Validation')
ax2.set_title('Conv_AE Loss', pad=12)
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Loss')
ax2.legend(loc='upper right')

plt.savefig("Learning_Curves.png")

The learning curves (Image by author)
  • Left plot: Shows the training performance of the feedforward autoencoder. The model seems to overfit slightly after the 35th epoch (one way to address this is sketched after this list).
  • Right plot: Shows the training performance of the convolutional autoencoder. The model is in the just-right zone, neither underfitting nor overfitting.
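
If the slight overfitting of the feedforward model bothers you, one common remedy (not used in the runs above) is Keras's EarlyStopping callback, which halts training once the validation loss stops improving and restores the best weights:

from tensorflow.keras.callbacks import EarlyStopping

# Stop if val_loss has not improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)

history_ff = ff_autoencoder.fit(train_images_noisy, train_images, epochs=50,
                                batch_size=128, shuffle=True,
                                validation_data=(test_images_noisy, test_images),
                                callbacks=[early_stop])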

Visualizing the outputs

Now, we compare the outputs of both models. For comparison purposes, I will also add the original images and their corrupted (noisy) counterparts.

Let's see the output.

# Visualize image outputs
n = 10
plt.figure(figsize=(10, 5))

for i in range(n):
    # Display noisy images
    ax = plt.subplot(4, n, i + 1)
    plt.imshow(test_images_noisy[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display denoised images by feedforward autoencoder
    ax = plt.subplot(4, n, i + 1 + n)
    plt.imshow(ff_autoencoder_denoised_images[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display denoised images by convolutional autoencoder
    ax = plt.subplot(4, n, i + 1 + 2*n)
    plt.imshow(conv_autoencoder_denoised_images[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display original images for comparison purposes
    ax = plt.subplot(4, n, i + 1 + 3*n)
    plt.imshow(test_images[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

plt.savefig("Comparisons.png")

(Image by author)
  • 1st row: Represents the corrupted images after adding Gaussian noise centered (mean) at 0 with a standard deviation of 0.7. We feed these images to autoencoders to get clean images.
  • 2nd row: Represents the images cleaned (denoised) by the feedforward autoencoder. The contents of the images can be identified and are reasonably close to the originals.
  • 3rd row: Represents the images cleaned (denoised) by the convolutional autoencoder. The contents are easily identified and almost identical to the originals, because convolutional layers preserve the spatial structure between nearby pixels, something dense layers cannot do (the quick check after this list puts rough numbers on this).
  • 4th row: Represents the original images added for comparison purposes.
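
The comparison above is purely visual. As a quick quantitative sanity check (my own addition, not part of the original analysis), you can compute the mean squared error between each model's denoised output and the clean test images; a lower value means a closer reconstruction:

# MSE against the clean test images (lower is better).
# test_images is the (10000, 28, 28, 1) array from the convolutional run,
# so the feedforward output is reshaped to match.
mse_ff = np.mean((ff_autoencoder_denoised_images.reshape(-1, 28, 28, 1)
                  - test_images) ** 2)
mse_conv = np.mean((conv_autoencoder_denoised_images - test_images) ** 2)
print(f"Feedforward AE MSE:   {mse_ff:.4f}")
print(f"Convolutional AE MSE: {mse_conv:.4f}")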

Conclusions

Convolutional autoencoders outperform feedforward autoencoders in image denoising because their convolutional layers preserve the spatial structure between nearby pixels of the images.


This is the end of today's article.

Please let me know if you've any questions or feedback.


Support me as a writer

I hope you enjoyed reading this article. If you'd like to support me as a writer, kindly consider signing up for a membership to get unlimited access to Medium. It only costs $5 per month and I will receive a portion of your membership fee.

Join Medium with my referral link – Rukshan Pramoditha

Join my private list of emails

Never miss a great story from me again. By subscribing to my email list, you will directly receive my stories as soon as I publish them.

Thank you so much for your continuous support! See you in the next article. Happy learning to everyone!


MNIST dataset info

  • Citation: Deng, L., 2012. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6), pp. 141–142.
  • Source: http://yann.lecun.com/exdb/mnist/
  • License: Yann LeCun (Courant Institute, NYU) and Corinna Cortes (Google Labs, New York) hold the copyright of the MNIST dataset, which is available under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA).

Rukshan Pramoditha 2023–01–24

Tags: Artificial Intelligence Autoencoder Data Science Image Denoising Neural Networks
