Why Convolve?: Understanding Convolution and Feature Extraction in Deep Networks

Author: Murphy
Figure 1: 1D/2D Convolutions and an Interactive Visualization Tool (Source: Author)

It is common practice nowadays to construct deep neural networks with a set of convolution layers. However, it was not always like this: earlier neural networks and other machine learning frameworks did not employ convolutions, and feature extraction and learning were two separate fields of study until recently. This is why it is important to understand how convolution works and why it took such an important place in deep learning architectures. In this article we explore convolution thoroughly, and an interactive tool will help you understand the concept more deeply.

Introduction

Convolution is the process of changing the shape of one function by applying another function. Like any other arithmetic operation it takes two operands, but it operates on functions rather than on individual function values, and it therefore produces a third function as a result. More formally, a convolution C = (f ∗ g)(t) is an operation between a function f and a function g, written with the convolution operator ∗. In terms of mathematical formulation, convolution is the integral of the product of the function f with a copy of the function g that is mirrored on the x-axis and shifted by a time step t:

C(t) = (f ∗ g)(t) = ∫ f(x) g(t − x) dx

Note that the output function is a function of t while the input function f is a function of x. A number of time steps are performed, usually equal to the number of samples of the function f. At each time step, the convolution computes the area under the curve of the function f, taking the function values of g as weights. The sign of t determines the direction of movement: for a positive value of t the shifted kernel moves towards +∞, and for a negative value it moves towards −∞. Intuitively, a convolution provides a moving weighted average of the function values of f.
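To make this concrete, here is a minimal NumPy sketch (the helper convolve_1d is ours, written for clarity rather than speed) that computes a discrete convolution directly from the flip-and-shift definition and checks the result against NumPy's built-in np.convolve:

```python
import numpy as np

def convolve_1d(f, g):
    """Discrete convolution by the definition: for every shift t,
    mirror g, slide it by t, and sum the products with f."""
    n, m = len(f), len(g)
    out = np.zeros(n + m - 1)
    for t in range(n + m - 1):
        for k in range(n):
            j = t - k                 # index into the mirrored, shifted g
            if 0 <= j < m:
                out[t] += f[k] * g[j]
    return out

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 0.5])              # a two-point moving-average kernel

print(convolve_1d(f, g))              # [0.5 1.5 2.5 3.5 2. ]
print(np.convolve(f, g))              # matches the built-in result
```

Each output value is a weighted average of neighboring samples of f, with the weights supplied by g.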

The convolution operator can also be thought of as a form of an integral transform. An integral transform in calculus is a type of function transform T that is applied to an input function f by means of a kernel function K:

(Tf)(u) = ∫ K(u, x) f(x) dx

Convolution is the special case in which the kernel depends only on the difference of its two arguments, K(u, x) = g(u − x).

Convolution on Data Samples

The design of the kernel function determines the output of the transform. Most commonly, a symmetric kernel function is used, which makes the product in the equation commutative. The size of the kernel function determines the spread of the convolution: a large kernel includes more neighboring points and gives an output that is affected by a larger number of neighbors.

Figure 2: An example effect of a convolution operator on Random points representing Statistical Data (Source: Author)

If we take an example set of data as in figure 2 and apply a convolution kernel with a constant value of 0.25, we obtain a normally distributed curve, as can be seen in the figure. This shows that a convolution kernel with a constant output produces a normalizing curve over the data. If we extend this concept and create a kernel function of size 10 populated only with the value 0.1, and apply it to the same data, we see the result in figure 3. The result is a smoothed curve. This is because we have created a moving-average kernel with a window of 10 neighboring points: each value in the kernel provides a probability weight for computing the weighted average at a given point.

Figure 3: Convolution of Data points with a constant Kernel (Source: Author)
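As an illustrative sketch, with synthetic noisy samples standing in for the data points in the figures, the size-10 moving-average kernel takes only a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 200)
data = np.sin(x) + rng.normal(scale=0.4, size=x.size)   # noisy samples

kernel = np.full(10, 0.1)                 # size-10 kernel, every weight 0.1
smoothed = np.convolve(data, kernel, mode='same')       # moving average
```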

The weights in a kernel function need not all be the same; in fact, most of the time they are not constant values. We can, for instance, construct a kernel whose values gradually increase from 0 to 0.5 up to the middle of the kernel and then gradually decrease back to zero. Such a kernel produces the result shown in figure 4, and it does a better job of smoothing the data curve. This is because we have approximated the most common smoothing kernel, the 'Gaussian kernel'.

Figure 4: Effect of a Gaussian Convolution Kernel on Data points (Source: Author)
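A rough, self-contained sketch of such a kernel, assuming a simple normalized Gaussian (the helper gaussian_kernel is ours):

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Weights rise from near zero to a peak at the center, then fall back."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()                    # normalize so the weights sum to 1

rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 200)
data = np.sin(x) + rng.normal(scale=0.4, size=x.size)   # noisy samples

smoothed = np.convolve(data, gaussian_kernel(10, 2.0), mode='same')
```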

So far you have seen the effect of convolution as a smoothing operator; however, as you shall see in the later sections, the application of a convolution operator is not limited to smoothing. Convolution is a general operator that can be adapted to perform many different computations on the function values. The effect one can create with a convolution operation depends entirely on the construction of the kernel function.

1D Convolution

In the previous section, we discussed convolution and its effects on data, using 1D examples. In this section, we extend this and show how convolution behaves for various 1D signals. Applying convolution to 1D signals is the most obvious application, and it is in fact where the concept originated. A 1D signal is a function of an independent variable (e.g., time) which is sampled to obtain a set of discrete values of a dependent variable as output. A discrete sampling range is often used to observe the function; it gives a viewing window showing the shape of the function within that range. In a convolution between two signals, the second signal changes the shape of the first. Figure 5 shows how one curve affects another and produces a third curve that combines the two (roughly speaking, a kind of average).

Figure 5: Depiction of a Convolution Operation (Source:Author)

In this case the convolution happened between two similar functions: the convolving function differed only in amplitude and in a phase shift. In general, however, this need not be the case. The convolution function can take various forms and produce different results accordingly. Some examples of such convolution kernels and their effects on different functions are shown in figure 6. You can see the 'averaging effect' on a square wave when it is convolved with either a sine wave or a DC kernel. In the case of two opposite phases, the waves do not cancel each other out, as they would in the destructive interference of two waves; instead, an average of the two phases is obtained. When applied to a complex signal made up of multiple simple sine and cosine waves plus noise components, the convolved signal is smooth, which helps reduce noise. Smoothing is the most common effect of a convolution operation; however, a kernel function can also be used to highlight, differentiate, or suppress various segments of a function. We shall see how this can be useful in a later section.

Figure 6: Results of 1D Convolutions, First Row: Effects on a Square Wave Signal, Second Row: Effect on interaction of various forms of Sinusoidal waves (Source: Author)
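The effects in figure 6 can be approximated with a short sketch; the frequencies and kernel width below are illustrative choices, and the square wave comes from SciPy's signal.square:

```python
import numpy as np
from scipy import signal

t = np.linspace(0, 1, 500, endpoint=False)
square = signal.square(2 * np.pi * 5 * t)               # 5 Hz square wave

dc_kernel = np.full(25, 1.0 / 25)                       # constant (DC) kernel
averaged = np.convolve(square, dc_kernel, mode='same')  # corners rounded off

# A composite signal of sine/cosine components plus noise, then smoothed.
noisy = (np.sin(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 12 * t)
         + np.random.default_rng(1).normal(scale=0.3, size=t.size))
denoised = np.convolve(noisy, dc_kernel, mode='same')
```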

2D Convolution

A 2D convolution is an extension of a 1D convolution with an added dimension: the operation is performed not along one axis but in both directions. An image is an example of a 2D function, and its values (i.e., R, G, B) represent a set of discrete samples of the function obtained as a result of quantization. These samples are arranged in a set of 2D matrices, where each matrix represents a color channel. The kernel for such a data structure is also a 2D matrix, and a set of kernel matrices is called a filter. The convolution is performed iteratively by moving a sliding window over the image across all color channels and applying the respective kernel.

Figure 7: Depiction of a 2D Convolution process (Source: Author)

In figure 7 you can see the process of a 2D convolution applied to an image. For each pixel, a neighborhood window equal to the size of the kernel is extracted. Zero-padding pixels are inserted when the window extends beyond the boundary of the image. A dot product of the neighborhood window and the kernel matrix gives the pixel value at the respective position in the new convolved image. It can be seen from the figure that the convolved image has higher values along the diagonal of the viewing window; that is what a kernel of this kind does to an image. It is an example of an edge detector that finds diagonal edges. The same process is repeated for each pixel by sliding the kernel across the whole image, producing a new convolved image of the same size. Notice that we haven't flipped the kernel matrix; that is because this kernel is symmetric. For a non-symmetric kernel, flipping/mirroring is needed before taking the dot product.
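A naive single-channel version of this process might look like the sketch below (the helper convolve_2d is ours; the diagonal kernel values are illustrative, not necessarily the exact coefficients used in figure 7):

```python
import numpy as np

def convolve_2d(image, kernel):
    """Naive 2D convolution with zero padding; assumes an odd-sized
    kernel and returns a 'same'-size output."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))        # zero-pad borders
    flipped = np.flip(kernel)       # mirror the kernel (no-op if symmetric)
    out = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kh, j:j + kw]         # neighborhood window
            out[i, j] = np.sum(window * flipped)        # multiply and sum
    return out

# An illustrative diagonal edge-detection kernel.
diag_kernel = np.array([[ 2, -1, -1],
                        [-1,  2, -1],
                        [-1, -1,  2]])

gray = np.random.default_rng(0).random((8, 8))          # stand-in image
print(convolve_2d(gray, diag_kernel))
```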

Figure 8: An example application of Convolution Operation, First Row: Image Blurring, Second Row: Edge Detection (Source:Author)

As mentioned earlier, the convolution process is not limited to one particular operation; it can be used to perform many different kinds of operations. Figure 8 shows two example applications. When an image is convolved with a Gaussian kernel, it produces a blurring effect. When an edge-detection kernel such as the one shown earlier is applied, the output image highlights only the edges. The kernel coefficients determine the output image, and these coefficients can be hand-crafted or learned through a learning framework.
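As a sketch, with a random array standing in for a grayscale image, both effects can be reproduced with SciPy's convolve2d; the 5x5 Gaussian and 3x3 edge kernel below are common illustrative choices:

```python
import numpy as np
from scipy import signal

# A 5x5 Gaussian blur kernel built from the outer product of 1D Gaussians.
g1d = np.exp(-(np.arange(5) - 2) ** 2 / 2.0)
g1d /= g1d.sum()
blur = np.outer(g1d, g1d)

edges = np.array([[-1, -1, -1],
                  [-1,  8, -1],
                  [-1, -1, -1]])           # a common edge-detection kernel

gray = np.random.default_rng(0).random((64, 64))   # stand-in grayscale image
blurred  = signal.convolve2d(gray, blur,  mode='same', boundary='fill')
outlined = signal.convolve2d(gray, edges, mode='same', boundary='fill')
```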

Feature Learning with 2D Convolution

In the previous section, we saw how 2D convolution works for images. In this section we explore how it can be used to extract features. A feature in image processing is an invariant pattern of pixels that can describe an object; a set of features extracted from an image of an object can be used to uniquely identify that object. Convolution kernels provide a robust way to construct features because they operate on whole neighborhoods of pixels at once: a range of operations that would otherwise be required to extract features (e.g., a set of edges) reduces to a single element-wise multiplication and sum. This lowers the computation cost and provides a general framework for extracting features by constructing and applying a large variety of feature kernels. Figure 9 shows four edge-detection kernels, each specialized in detecting a certain edge orientation.

Figure 9: An example of a Feature Extraction Process on a 2D color image (Source: Author)

The sliding window shown determines the size of a feature. The number of features extracted from the image depends on the stride of the sliding window: a small window with a unit-pixel stride provides a set of local features, while a small number of larger windows constructs a more global feature space. A sketch of the extraction process follows below.
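Here is that sketch, with four illustrative orientation kernels (the exact coefficients in figure 9 may differ), producing one feature map per kernel:

```python
import numpy as np
from scipy import signal

# Four hand-crafted 3x3 kernels, one per edge orientation.
feature_kernels = {
    'horizontal': np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]]),
    'vertical':   np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]]),
    'diag_up':    np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]]),
    'diag_down':  np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]]),
}

gray = np.random.default_rng(0).random((64, 64))   # stand-in grayscale image
feature_maps = {name: signal.convolve2d(gray, k, mode='same')
                for name, k in feature_kernels.items()}
```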

In a deep neural network (e.g., a CNN), this is controlled by specifying the number of features learned at each layer. A deep network also learns the parameters of the kernels as part of the overall parameter learning, so there is no need to hand-construct the features; conceptually, however, the learned features are similar to what has been described earlier. For example, instead of a crude hand-crafted kernel for vertical edges, which are not very frequent in real-world images, the network can learn a non-linear kernel that finds the curvature of an object. The features learned in the convolution layers provide the basis for the set of fully connected classification layers in the later part of a deep neural network.
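A minimal sketch, assuming PyTorch, of a single convolution layer whose kernel coefficients are ordinary learnable parameters rather than hand-crafted values:

```python
import torch
import torch.nn as nn

# 16 learnable 3x3 kernels applied across 3 input color channels.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                 stride=1, padding=1)

image_batch = torch.randn(1, 3, 64, 64)    # one 64x64 RGB image, random data
feature_maps = conv(image_batch)           # shape: (1, 16, 64, 64)

# The kernel weights are updated by backpropagation along with the rest
# of the network; no hand-crafted coefficients are needed.
print(conv.weight.shape)                   # torch.Size([16, 3, 3, 3])
```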

Final Remarks

In this article, we have explored the concept of convolution, its various types, and how it can be used to extract features from images. Convolution is a general-purpose computation mechanism. It can leverage the power of GPU processing because its computations map naturally onto the GPU architecture: they can be split into batches and performed independently. Feature learning using convolution provides robust and automatic extraction of features from images, which deep neural networks exploit. In fact, feature learning is perhaps the most crucial part of a deep convolutional neural network for object classification: learned kernels capture complex patterns better, and the learned features are more diverse.

You can improve your understanding of the concept and practice further with the visualization tool available at the link below.

Figure 10: Visualization Tool for 2D Convolution (Source: Author)

Code:

https://www.github.com/azad-academy/DL-Convolutions

Become a Patreon Supporter:

https://www.patreon.com/azadacademy

Find me on Substack:

https://azadwolf.substack.com

Follow Twitter for Updates:

https://twitter.com/azaditech

Get the Book:

https://www.amazon.com/dp/B0BT4YBZQC

Tags: Artificial Intelligence Computer Vision Data Science Deep Learning Machine Learning
