In our previous post we explored What are Layers? and got a high-level understanding of the different types of Layers. But then again, who is satisfied with just superficial knowledge?
So in today's article we will dive deeper into the first and most widely used category of layer - the Convolution Layer.
What is Convolution?
Convolution is the process of combining two inputs - typically a signal and a filter - to produce a single, transformed Output.
Or simply put, it is the mathematical operation responsible for transforming the Input and extracting features from it.
Now let's talk in terms of Neural Networks. The reason Convolutional Layers are so widely used in AI is their phenomenal capacity to extract features from the Input Matrices fed to the network.
Based on the type of Kernel Matrix you choose, the type of extracted Features also varies. At the end of the day it is just an element-wise multiply-and-sum between the Kernel and each patch of the Input.
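To make that concrete, here is a minimal sketch of this multiply-and-sum in NumPy. The image, the kernels and the helper function are all made up for illustration; notice how an edge-detecting kernel and a blurring kernel produce very different feature maps from the same input.

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 'valid' convolution: slide the kernel over the image,
    multiplying element-wise and summing at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)
    return out

# A toy 5x5 "image" with a sharp vertical edge between dark (0) and bright (10).
image = np.array([
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
], dtype=float)

# A Sobel-like kernel responds strongly at the 0-to-10 boundary,
# while an averaging kernel merely smooths the values.
edge_kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
blur_kernel = np.full((3, 3), 1 / 9)

print(convolve2d(image, edge_kernel))  # large values along the edge
print(convolve2d(image, blur_kernel))  # smoothed intermediate values
```

Same input, two kernels, two very different features - that is the whole trick.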
Now, we are all aware that most AI problems dealing with Cognitive Senses and Interaction with the Environment involve data types such as Images, Sound, Video etc.
A thing to note about all these data types is that each of them is just a different representation of a Matrix, perceived in a different manner.
Now if we were to process these Matrices using Convolution, you would be astonished by the variations in the transformed data, depending on the type of KERNEL MATRIX that we choose.
I know this is an overload of information, but worry not. As we progress, we will deal with each of these terminologies one by one.
Before we dive into Convolutional Layers, below is an illustration showcasing the capability and working of Convolutional Neural Networks.
What are Convolutional Layers?
Convolution Layers are the groups of Neurons in the architecture responsible for performing Convolution - extracting relevant information and transforming the Input Matrix into the format expected by the subsequent layer.
Now before we explore How do they work?, let's try to understand what they need before they can get to work. Some terminologies that are essential to understand before going forward are -
- Kernel Matrix
- Stride
- Padding
Kernel Matrix - The secondary Matrix that is slid iteratively over the whole stretch of the Input data to transform it and extract information. Ideally the Kernel Matrix has dimensions smaller than the Input Data (usually Images).
Stride - As mentioned above, the Kernel Matrix performs a recurring mathematical operation over the whole stretch of the Input feed. Stride is the parameter that decides how far the Kernel shifts before being imposed over a new segment of the Input.
Padding - Convolution generally downsamples the Input, reducing its spatial dimensions. It is therefore often necessary to pad the Input with certain pre-defined values (usually zeros) in order to retain the original dimensionality of the Input.
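These three parameters together determine the output size. The standard relationship is: output = floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, p the padding and s the stride. A small sketch (the function name is our own, for illustration):

```python
def conv_output_size(n, k, p=0, s=1):
    """Spatial size of a convolution output, given input size n,
    kernel size k, padding p and stride s."""
    return (n + 2 * p - k) // s + 1

# A 5x5 input with a 3x3 kernel shrinks to 3x3 without padding...
print(conv_output_size(5, 3))            # 3
# ...but padding of 1 keeps the original 5x5 size ("same" padding).
print(conv_output_size(5, 3, p=1))       # 5
# A stride of 2 roughly halves the output.
print(conv_output_size(5, 3, p=1, s=2))  # 3
```

This one-liner is worth memorizing - it explains at a glance why unpadded convolutions keep shrinking your feature maps.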
Now that we have a superficial understanding of the various parameters involved in Convolutional Layers, let's go ahead and see How do they work?
How do they work?
Now that we have had a glimpse into what goes into building a Convolutional Layer, let's take a look at how they function, to help us understand the concepts better.
First, let's understand what we mean when we say that a Kernel Matrix goes iteratively over an Image to generate a new Matrix.
As you can clearly see -
GREEN - Image
YELLOW - Kernel Matrix
PINK - Extracted Feature
The above illustration shows how a Kernel Matrix performs an element-wise multiply-and-sum over each patch of the Input to extract and transform features. But the next obvious question is: What makes it move?
That is what we call Strides. Let's take a look at an Illustration to understand how would a Kernel Matrix perform with a Stride = 1.
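In code form, the stride simply scales how far the window jumps at each step. Below is a self-contained sketch (names are ours, for illustration) showing how a larger stride shrinks the output:

```python
import numpy as np

def convolve2d_stride(image, kernel, stride=1):
    """'Valid' convolution where the kernel jumps `stride` pixels per step."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            patch = image[y * stride:y * stride + kh,
                          x * stride:x * stride + kw]
            out[y, x] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))  # a simple summing kernel

# Stride 1 visits every position; stride 2 skips every other one.
print(convolve2d_stride(image, kernel, stride=1).shape)  # (3, 3)
print(convolve2d_stride(image, kernel, stride=2).shape)  # (2, 2)
```

With stride 2 the windows no longer overlap, which is why strided convolutions are sometimes used in place of pooling to downsample.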
Till now we have seen how Convolution performs on a Single Channel Matrix. But let me ask you a question.
Are Images Single Channel?
The answer is No
Images usually comprise 3 Channels - Red, Green and Blue, also known as RGB.
That brings us to our next question -
How to Implement Convolution on Real Images?
Ideally speaking, the overall concept of Matrices and Kernels remains the same throughout the implementations; what changes is the output.
So based on one's requirements in the architecture, Convolution can be performed to either merge all the features into a single map or, more importantly, create a separate Channel for each extracted Feature.
Let's take a look at the illustration below to better understand the former case.
As you can see, the process of convolution results in an aggregation of all the extracted features, which are then carried forward into the subsequent layers.
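Here is a rough sketch of that merge case in NumPy (our own illustrative code): the kernel carries one 2x2 slice per input channel, and the per-channel results are summed into a single feature map.

```python
import numpy as np

def convolve_multichannel(image, kernel):
    """Convolve each channel with its matching kernel slice, then sum
    the per-channel results into one feature map (the 'merge' case)."""
    c, ih, iw = image.shape
    _, kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[:, y:y + kh, x:x + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((3, 4, 4))    # an "RGB image": (channels, height, width)
kernel = rng.random((3, 2, 2))   # one 2x2 slice per input channel

feature_map = convolve_multichannel(image, kernel)
print(feature_map.shape)  # (3, 3) - three channels merged into one map
```

A Convolutional Layer with many output channels simply repeats this with a separate multi-channel kernel per output channel.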
Thanks to years of research, we don't have to perform these computationally exhaustive (and honestly pretty redundant and boring) operations by hand. Open-source frameworks such as TensorFlow and PyTorch help us in this exercise and speed up the process significantly.
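For instance, in PyTorch everything we have discussed - channels, kernel size, stride and padding - collapses into one layer definition:

```python
import torch
import torch.nn as nn

# A convolutional layer: 3 input channels (RGB), 16 output feature maps,
# 3x3 kernels, stride 1, and padding 1 to preserve the spatial size.
conv = nn.Conv2d(in_channels=3, out_channels=16,
                 kernel_size=3, stride=1, padding=1)

batch = torch.randn(8, 3, 32, 32)  # 8 random RGB "images" of 32x32 pixels
features = conv(batch)
print(features.shape)              # torch.Size([8, 16, 32, 32])
```

One line of configuration replaces all of our hand-rolled loops, and it runs on GPU for free.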
If you have followed through this article, then by now you should have a good understanding of the inner workings of Convolutional Layers in Neural Networks.
Note - This understanding of Layers proves extremely crucial while implementing custom architectures for complex problems.
Most of the novel solutions you readers build are custom-made for particular use cases. In such circumstances pre-trained designs may not be the best fit, and that's when such understanding becomes vital.
I hope this article finds you well. STAY TUNED for more. 😁