Hello there! We are here with yet another chapter on Neural Networks. If you have been following our series, we have already covered an intuitive understanding of ANNs, the fundamental concepts, and the different layers involved in detail. We also drilled down on ConvNets and LSTM layers.
So now that we have enough context around the WHAT, HOW, and WHY of neural networks, let's take a step forward and try to build the whole Neural Network from scratch.
Before we build anything, the most important task is designing the Neural Architecture. The term Neural Architecture refers to the overall structure of the network. Designing it involves finding answers to 3 simple questions:
- WHICH layer should be used?
- HOW many units are required in the network?
- WHAT are the various ways to optimize the result?
In neural network architectures, we define the arrangement of different layers in a chain structure, with each layer being a function of the layer that preceded it.
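The chain structure described above can be sketched as simple function composition, where each layer is a function applied to the previous layer's output. Here is a minimal sketch using NumPy; the layer sizes, weights, and activation choice are illustrative assumptions, not taken from any real model:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # One fully connected layer followed by a ReLU activation.
    return np.maximum(0.0, x @ w + b)

x = rng.normal(size=(4, 8))             # batch of 4 inputs, 8 features each
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

h = dense(x, w1, b1)                    # first layer:  f1(x)
y = dense(h, w2, b2)                    # second layer: f2(f1(x))
print(y.shape)                          # (4, 3)
```

Each layer here is literally "a function of the layer that preceded it": the output `y` is `f2(f1(x))`, and a deeper network just extends the chain.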
Let's try to answer each of these questions and understand the intuition behind them. We will take examples wherever necessary for better understanding.
Design Step 1: Identify the layers to be used
Most neural networks are organized into groups of units called layers, each specialized for a specific task. We have seen different types of layers in our previous article, and we have also discussed Convolutional and LSTM layers in detail.
When we stack different layers together (along with activation functions), we get a network that takes an input vector and spits out a predicted value based on its internal mathematical computations.
Design Step 2: How many layers are required
Once you have selected the layers that will be used in the network, the next question kicks in:
How many of these layers are needed and how many units are required per layer?
The main architectural considerations are to choose the DEPTH of the network and the WIDTH of each layer. As we have discussed in our previous articles, a network with even one hidden layer is sufficient to fit the training set, but deeper networks can often use far fewer units per layer, require far fewer parameters, and generalize better to the test set.
Deeper networks are often harder to optimize.
The general understanding that deeper networks are always better is NOT TRUE. In fact, the deeper the network is, the less explainable the architecture becomes, which raises questions about what it has actually learned. We need to find the right combination of layers and activation functions that generalizes to our dataset most efficiently.
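A quick back-of-the-envelope comparison shows why depth can be parameter-efficient. The layer sizes below are hypothetical, chosen only to illustrate a wide shallow network versus a deep narrow one mapping 100 inputs to 10 outputs:

```python
def dense_params(sizes):
    # Weights plus biases for each consecutive pair of layer sizes.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

shallow = dense_params([100, 1000, 10])        # one very wide hidden layer
deep    = dense_params([100, 64, 64, 64, 10])  # three narrow hidden layers

print(shallow, deep)                           # 111010 15434
```

The deep narrow network uses far fewer parameters here, though, as noted above, it may also be harder to optimize and to interpret.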
Design Step 3: Optimize the results
This is a fairly experimental step. To find the right match there is a lot of trial and error required. Here are some of the modifications that can be made to the architecture to achieve the desired results:
Add New Layers
An initial improvement is to add more layers to the network. The additional neurons give the model more parameters, potentially allowing it to learn more complex patterns from the training data.
This has a negative impact as well. Adding neurons and layers increases the complexity of the model, which means more time is required to train it. It also carries the risk of Overfitting, where the model performs well on training data but poorly on test data.
These problems can be mitigated by using Regularization in the network. But how does Regularization work? In simple terms, Regularization penalizes large weight values the model has learned. This encourages the weights to be distributed more evenly across the neurons, rather than a few neurons carrying very large weights.
There are three different types of Regularization used:
- L1 Regularization: The complexity of the model is expressed as the sum of the absolute values of the weights
- L2 Regularization: The complexity of the model is expressed as the sum of squares of the weights
- Elastic Net Regularization: The complexity of the model is expressed as a weighted combination of the L1 and L2 penalties
Introducing Dropout Layer
Dropout layers are a special type of layer, designed to randomly drop a few neurons from the network during training. This random dropping forces the network to learn redundant representations of the data, which helps it generalize better.
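A minimal sketch of how (inverted) dropout works at training time, assuming a hypothetical drop probability of 0.5 and illustrative activation sizes: each unit is zeroed with probability `p`, and the survivors are rescaled so the expected activation stays the same.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                 # drop probability (assumed)

h = rng.normal(size=(4, 16))            # activations from some layer
mask = rng.random(h.shape) >= p         # keep each unit with prob 1 - p
h_dropped = h * mask / (1 - p)          # rescale the surviving units

print(mask.mean())                      # roughly half the units survive
```

At test time no units are dropped; the rescaling during training is what keeps the two regimes consistent.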
More on this in our future articles.
The ideal network architecture for a task must be found via experimentation, guided by monitoring the validation set error. There are no pre-defined steps that guarantee improved results; it varies with the problem statement and the data.
So we suggest you keep on experimenting and exploring different architectures of Neural Network.
Hope we were able to explain the various nuances involved in finding the right Neural Architecture. Stay Tuned for more fun and informative articles in the future.