Hands-on Generative AI with GANs using Python: DCGAN

Photo by Vinicius "amnx" Amano on Unsplash

Introduction

In my previous article, we saw how to use GANs to generate MNIST-style images. We achieved good results and succeeded in our intent. However, the two networks G (generator) and D (discriminator) were composed mostly of dense layers. By now you probably know that when working with images we usually turn to CNNs (convolutional neural networks), which are built from convolutional layers. So let us see how to improve our GANs by using these types of layers. GANs that use convolutional layers are called DCGANs (deep convolutional GANs).

What is transposed convolution?

Typically, when we work with CNNs we deal with convolutional layers. Here, though, we also need the "inverse" operation: the transposed convolution, sometimes also called deconvolution.

This operation allows us to upsample the feature space. For example, if we have an image represented by a 5×5 grid we can "enlarge" this grid to make it 28×28.

What you do is in principle quite simple: you insert zeros between the elements of the initial feature map to enlarge it, and then apply a normal convolution with a certain kernel size, stride and padding.

For example, suppose we want to transform a 5×5 feature map into an 8×8 one. First, by inserting zeros, we create a 9×9 feature map; then, by applying a 2×2 filter, we shrink it back to 8×8. Let's look at a graphical example.

Transposed Convolution (Image By Author)
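To make this concrete, here is a minimal PyTorch sketch (my own toy example, not code from the original post) that reproduces the 5×5 → 8×8 case: with stride=2 the layer implicitly inserts the zeros that inflate the grid to 9×9, and the 2×2 kernel then slides over it.

import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)  # (batch, channels, height, width)
upsample = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                              kernel_size=2, stride=2, padding=1)
print(upsample(x).shape)  # torch.Size([1, 1, 8, 8])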

Also in this network, we are going to use batch-normalization layers, which help with the internal covariate shift problem. In a nutshell, they normalize each batch before a layer so that the distribution of the data does not change during training.
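As a quick illustration (again my own toy example, using the imports above): nn.BatchNorm2d keeps one mean/variance pair per channel and standardizes each batch accordingly.

bn = nn.BatchNorm2d(num_features=64)    # one mean/variance pair per channel
y = bn(torch.randn(32, 64, 14, 14))     # standardizes each of the 64 channels over the batch
print(y.mean().item(), y.std().item())  # roughly 0 and 1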

Generator Architecture

The generator is then formed by a sequence of transposed convolutional layers, which bring the initial random vector z up to the size of the image we want to produce, in this case 28×28. The depth of the feature maps, on the other hand, gets smaller and smaller, unlike in the convolutional layers of a classifier.

Generator Architecture (Image By Author)

Discriminator Architecture

The discriminator, on the other hand, is a classical CNN that has to classify images. So we will have a sequence of convolutional layers until we reach a single number: the probability that the input is real rather than fake.

Discriminator Architecture (Image By Author)

Let's code!

I am going to work on Deepnote, but you can work on Google Colab if you prefer.

First, check if you have a GPU available on your hardware.
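The original snippet is not shown here, but a minimal check looks like this:

import torch

# Fall back to the CPU if no GPU is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)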

If you are working on Google Colab, you will need to mount your drive. Let's also import the necessary libraries.

Python">from google.colab import drive
drive.mount('/content/drive/')
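The import list isn't reproduced in the original post; the snippets that follow assume something like:

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader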

Now we define the function that creates the generator network G, as described earlier.
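Since the original code block is not shown, here is a plausible sketch; the name make_generator_network, the n_filters parameter and the exact kernel/stride choices are my assumptions, picked so the spatial size grows 1 → 4 → 7 → 14 → 28 while the feature-map depth shrinks.

def make_generator_network(z_size, n_filters):
    # Input: z of shape (batch, z_size, 1, 1); output: image of shape (batch, 1, 28, 28).
    model = nn.Sequential(
        nn.ConvTranspose2d(z_size, n_filters * 4, kernel_size=4, stride=1, padding=0, bias=False),  # 1x1 -> 4x4
        nn.BatchNorm2d(n_filters * 4),
        nn.LeakyReLU(0.2),
        nn.ConvTranspose2d(n_filters * 4, n_filters * 2, kernel_size=3, stride=2, padding=1, bias=False),  # 4x4 -> 7x7
        nn.BatchNorm2d(n_filters * 2),
        nn.LeakyReLU(0.2),
        nn.ConvTranspose2d(n_filters * 2, n_filters, kernel_size=4, stride=2, padding=1, bias=False),  # 7x7 -> 14x14
        nn.BatchNorm2d(n_filters),
        nn.LeakyReLU(0.2),
        nn.ConvTranspose2d(n_filters, 1, kernel_size=4, stride=2, padding=1, bias=False),  # 14x14 -> 28x28
        nn.Tanh())  # outputs in [-1, 1]
    return model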

To define the discriminator D, instead, we use a Python class, since we need control over the output of the forward method.
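Again a sketch under the same assumptions, mirroring the generator in reverse (28 → 14 → 7 → 4 → 1):

class Discriminator(nn.Module):
    def __init__(self, n_filters):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(1, n_filters, kernel_size=4, stride=2, padding=1, bias=False),  # 28x28 -> 14x14
            nn.LeakyReLU(0.2),
            nn.Conv2d(n_filters, n_filters * 2, kernel_size=4, stride=2, padding=1, bias=False),  # 14x14 -> 7x7
            nn.BatchNorm2d(n_filters * 2),
            nn.LeakyReLU(0.2),
            nn.Conv2d(n_filters * 2, n_filters * 4, kernel_size=3, stride=2, padding=1, bias=False),  # 7x7 -> 4x4
            nn.BatchNorm2d(n_filters * 4),
            nn.LeakyReLU(0.2),
            nn.Conv2d(n_filters * 4, 1, kernel_size=4, stride=1, padding=0, bias=False),  # 4x4 -> 1x1
            nn.Sigmoid())  # probability that the input is real

    def forward(self, x):
        return self.network(x).view(-1, 1)  # shape (batch, 1)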

Now we can finally instantiate our G and D networks. Let's also print the model to see the summary of the layers.
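With the hypothetical definitions above (z_size, n_filters and mode_z are my choices):

z_size = 100        # length of the random vector z; a common choice
n_filters = 32
image_size = (28, 28)
mode_z = 'uniform'  # distribution for z, see below

gen_model = make_generator_network(z_size, n_filters).to(device)
disc_model = Discriminator(n_filters).to(device)
print(gen_model)
print(disc_model)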

As usual, we need to define the cost function and the optimizers if we want to train the networks.
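A standard choice for a GAN is binary cross-entropy with one Adam optimizer per network (the learning rate below is a typical DCGAN value, not necessarily the one used in the original post):

loss_fn = nn.BCELoss()
g_optimizer = torch.optim.Adam(gen_model.parameters(), lr=0.0002)
d_optimizer = torch.optim.Adam(disc_model.parameters(), lr=0.0002)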

The input vector z is a random vector drawn from some distribution, which in our case can be either uniform or normal.
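A sketch of such a helper (the name create_noise is mine); note the trailing 1×1 spatial dimensions, which the first transposed convolution expects:

def create_noise(batch_size, z_size, mode_z='uniform'):
    if mode_z == 'uniform':
        return torch.rand(batch_size, z_size, 1, 1) * 2 - 1  # uniform in [-1, 1]
    return torch.randn(batch_size, z_size, 1, 1)             # standard normal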

Now let's define the train function for the discriminator D. As we also did in the previous article, D must be trained on both real and fake images. The real images are taken directly from the MNIST dataset, while for the fake ones we create an input z on the fly, pass it to the generator G and take G's output. The labels we can create ourselves, knowing that they will be all ones for the real images and all zeros for the fake ones. The final loss is the sum of the loss on the real images and the loss on the fake ones.
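Putting that into code (a sketch; d_train and the names it closes over come from the snippets above):

def d_train(x):
    disc_model.zero_grad()
    batch_size = x.size(0)
    x = x.to(device)
    # Real images, labeled 1.
    d_labels_real = torch.ones(batch_size, 1, device=device)
    d_loss_real = loss_fn(disc_model(x), d_labels_real)
    # Fake images generated on the fly, labeled 0; detach() keeps G's weights out of this step.
    input_z = create_noise(batch_size, z_size, mode_z).to(device)
    g_output = gen_model(input_z)
    d_labels_fake = torch.zeros(batch_size, 1, device=device)
    d_loss_fake = loss_fn(disc_model(g_output.detach()), d_labels_fake)
    # Total loss: real part plus fake part.
    d_loss = d_loss_real + d_loss_fake
    d_loss.backward()
    d_optimizer.step()
    return d_loss.item()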

The generator's loss is based on the output of the discriminator: G has to see whether D figured out that its image was fake, and it computes its loss accordingly.
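In code, G's loss rewards fooling D, i.e. it uses "real" labels on fake images (again a sketch):

def g_train(x):
    gen_model.zero_grad()
    batch_size = x.size(0)
    input_z = create_noise(batch_size, z_size, mode_z).to(device)
    # G wants D to output 1 ("real") on its fake images.
    g_labels_real = torch.ones(batch_size, 1, device=device)
    g_loss = loss_fn(disc_model(gen_model(input_z)), g_labels_real)
    g_loss.backward()
    g_optimizer.step()
    return g_loss.item()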

We are ready to import the dataset that will allow us to do network training. With PyTorch, importing the MNIST dataset is very easy, since torchvision already implements methods to do this.
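For example (the Normalize step maps pixels to [-1, 1] to match the generator's Tanh output; the root path is an assumption):

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5,), std=(0.5,))])
mnist_dataset = torchvision.datasets.MNIST(root='./data', train=True,
                                           transform=transform, download=True)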

Now that we have the dataset we can instantiate the dataloader.
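Something like (the batch size is my choice):

batch_size = 64
mnist_dl = DataLoader(mnist_dataset, batch_size=batch_size,
                      shuffle=True, drop_last=True)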

Since at the end of training we would like to see how image generation improves over time, we create a function that generates and saves these images at each epoch.
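A sketch of such a helper (create_samples is my name for it); it maps the generator's [-1, 1] output back to [0, 1] for display:

def create_samples(g_model, input_z):
    g_output = g_model(input_z)
    images = torch.reshape(g_output, (input_z.size(0), *image_size))
    return (images + 1) / 2.0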

Finally, we are ready to start the training. Choose the number of epochs; for good results it should be around 100. I only ran 10, so my output will be "uglier".
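A possible loop, reusing the helpers sketched above and a fixed noise batch so the saved samples are comparable across epochs:

num_epochs = 100
fixed_z = create_noise(16, z_size, mode_z).to(device)
epoch_samples = []
for epoch in range(1, num_epochs + 1):
    d_losses, g_losses = [], []
    for x, _ in mnist_dl:
        d_losses.append(d_train(x))
        g_losses.append(g_train(x))
    print(f'Epoch {epoch:03d} | D loss {sum(d_losses) / len(d_losses):.4f} '
          f'| G loss {sum(g_losses) / len(g_losses):.4f}')
    with torch.no_grad():  # snapshot samples for later plotting
        epoch_samples.append(create_samples(gen_model, fixed_z).cpu().numpy())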

The training with 100 epochs should take about an hour, though of course it depends a lot on the hardware you have available. Let's plot the results to see whether the network has learned to generate these synthetic images.
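For instance, a 4×4 grid of the last epoch's samples (a sketch; the original plotting code is not shown):

fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for img, ax in zip(epoch_samples[-1], axes.flatten()):
    ax.imshow(img, cmap='gray_r')
    ax.axis('off')
plt.show()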

Synthetic Images (Image By Author)

Final Thoughts

In this article we went beyond the simple GAN network by including convolution operations, which are very effective when working with images, thus creating what is called a DCGAN. To create these synthetic images we built two networks, a generator G and a discriminator D, that play an adversarial game. If this article was helpful to you, follow me for my upcoming articles on generative networks!

Tags: Artificial Intelligence Data Science Deep Learning Machine Learning Python
