# Neural Networks: Chapter 6 - Neural Architectures

Hello there! We are here with yet another chapter on **Neural Networks**. If you have been following our series on Neural Networks, we have already covered some basic concepts: an **intuitive understanding of ANNs**, **fundamental concepts**, and the different **layers** involved, in detail. We also drilled down into **ConvNets** and **LSTM** layers.

So now that we have enough context around the **WHAT**, **HOW**, and **WHY** of neural networks, let's take a step forward and try to build a whole Neural Network from scratch.

Before we build anything, the most important task is designing the **Neural Architecture**. The term **Neural Architecture** refers to the overall structure of the network. Designing it involves finding answers to **3 simple questions**:

- **WHICH** layers should be used?
- **HOW** many units are required in the network?
- **WHAT** are the various ways to optimize the result?

In neural network architectures, we define the arrangement of different layers in a chain structure, with each layer being a function of the layer that preceded it.
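This chain structure can be sketched in a few lines of plain Python. The weights and layer sizes below are made-up placeholders for illustration, not a trained model; the point is only that each layer is a function applied to the previous layer's output.

```python
def relu(x):
    # Activation function: keep positive values, zero out the rest.
    return [max(0.0, v) for v in x]

def dense(inputs, weights, biases):
    # One fully connected layer: output[j] = sum_i inputs[i] * weights[i][j] + biases[j]
    return [
        sum(i_val * w_row[j] for i_val, w_row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]

# A two-layer chain: h = layer1(x), y = layer2(h) -- each layer consumes
# the output of the one before it.
x = [1.0, 2.0]
W1 = [[0.5, -0.2], [0.1, 0.4]]   # 2 inputs -> 2 hidden units (illustrative values)
b1 = [0.0, 0.0]
W2 = [[1.0], [-1.0]]             # 2 hidden units -> 1 output
b2 = [0.1]

h = relu(dense(x, W1, b1))
y = dense(h, W2, b2)
```

In a real framework the layers would be learned objects rather than hand-written functions, but the composition pattern is the same.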

Let's try to answer each of these questions and understand the intuition behind them. We will take examples wherever necessary for better understanding.

**Design Step 1:** Identify the layers to be used

Most neural networks are organized into groups of units called **layers**. These layers are specialized in performing a specific task. We have seen different types of layers in our previous article. We have also discussed Convolutional Layers and LSTM layers in detail.

Every layer is responsible for, and specialized in, a specific task. When we put different layers together (*along with activation functions*), we get a complete network that takes the input vector and spits out a predicted value based on its internal mathematical computations.

**Design Step 2:** How many layers are required

Once you have selected the layers that will be used in the network, the next question kicks in:

*How many of these layers are needed and how many units are required per layer?*

The main architectural considerations are to choose the **DEPTH** of the network and the **WIDTH** of each layer. As we have discussed in our previous articles, a network with even one hidden layer is sufficient to fit the training set, but deeper networks can often use far fewer units per layer and far fewer parameters, and often generalize better to the test set.

Deeper networks are often harder to optimize.

The general understanding that deeper networks are always better is NOT TRUE. In fact, the deeper the network, the less explainable the architecture becomes, which raises questions about what it has actually learned. We need to find the right combination of layers and activation functions, one that generalizes from our dataset most efficiently.
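To make the depth-versus-width trade-off concrete, we can count the parameters of two fully connected networks with the same input and output sizes. The layer sizes below are arbitrary illustrative choices; the point is that a deep, narrow network can have far fewer parameters than a shallow, wide one.

```python
def param_count(layer_sizes):
    # Each dense layer from n_in units to n_out units has
    # n_in * n_out weights plus n_out biases.
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

shallow_wide = [100, 1024, 10]        # one wide hidden layer: 113,674 parameters
deep_narrow = [100, 64, 64, 64, 10]   # three narrow hidden layers: 15,434 parameters
```

Fewer parameters alone do not guarantee better generalization, but this is the kind of accounting the depth/width decision involves.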

**Design Step 3:** Optimize the results

This is a fairly experimental step: finding the right match requires a lot of trial and error. Here are some of the modifications that can be made to the architecture to achieve the desired results:

### Add New Layers

An initial improvement is to add more layers to the network. The additional neurons might help the network learn more complex patterns from the training data, since more layers mean more parameters and therefore more capacity. This extra capacity has a negative side as well, as we will see next.

### Regularization

Adding new neurons and layers also increases the complexity of the model, which means more time is required to train it. It also carries the risk of **Overfitting**, where the model performs well on training data but poorly on test data.

These problems can be mitigated by using **Regularization** in the network. But **how does Regularization work?** In simple terms, Regularization **penalizes** large weight values, pushing the learned weights towards smaller values and discouraging the network from relying too heavily on a few neurons with very large weights.

There are three different types of Regularization used:

- **L1 Regularization:** the complexity of the model is expressed as the sum of the absolute values of the weights
- **L2 Regularization:** the complexity of the model is expressed as the sum of the squares of the weights
- **Elastic Regularization:** the complexity of the model is expressed as a combination of the L1 and L2 penalties
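The three penalty terms above can be sketched directly on a toy weight vector. The weights and the coefficients `lam1` and `lam2` here are made-up illustrative values; in practice the penalty is added to the loss function during training.

```python
weights = [0.5, -1.5, 2.0, -0.25]   # toy weight vector
lam1, lam2 = 0.01, 0.01             # illustrative regularization strengths

l1_penalty = lam1 * sum(abs(w) for w in weights)   # L1: sum of |w|
l2_penalty = lam2 * sum(w * w for w in weights)    # L2: sum of w^2
elastic_penalty = l1_penalty + l2_penalty          # Elastic: both combined
```

Because L1 penalizes all weights at the same rate regardless of size, it tends to drive some weights exactly to zero, while L2 shrinks large weights more aggressively but rarely zeroes them out.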

### Introducing Dropout Layer

**Dropout** layers are a special type of layer, designed to randomly drop a few neurons from the network during training. This random dropping of neurons forces the network to learn redundant representations of the data, which helps it **generalize better**.
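A minimal sketch of how (inverted) dropout works at training time: each activation is zeroed with probability `p`, and the survivors are scaled up by `1/(1-p)` so the expected activation stays the same. The activations and the seed below are just for reproducibility of the example.

```python
import random

def dropout(activations, p, rng):
    # Inverted dropout: keep each unit with probability (1 - p),
    # scaling kept units by 1 / (1 - p).
    keep = 1.0 - p
    return [
        a / keep if rng.random() < keep else 0.0
        for a in activations
    ]

rng = random.Random(0)              # fixed seed, illustration only
h = [0.7, 0.6, 1.2, 0.3]            # toy layer activations
h_dropped = dropout(h, p=0.5, rng=rng)
```

At inference time dropout is switched off entirely; the inverted scaling during training is what makes that possible without adjusting the weights afterwards.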

### Hyperparameter Tuning

This includes increasing the number of **Epochs**, trying out different **Optimizers**, modifying the **Learning Rate** of Optimizer, modifying the **Batch Size**, trying out different **Loss Functions**, etc.
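One common way to organize this trial and error is a simple grid search over hyperparameter combinations, keeping whichever scores best on validation data. The `validation_error` function below is a stand-in for "train the network and measure its validation error" — its formula is entirely made up for illustration, as are the candidate values.

```python
from itertools import product

learning_rates = [0.1, 0.01, 0.001]
batch_sizes = [32, 64]

def validation_error(lr, batch_size):
    # Placeholder: in practice, train the model with these settings
    # and return the error measured on a held-out validation set.
    return abs(lr - 0.01) + batch_size / 1000.0

# Try every combination and keep the one with the lowest validation error.
best = min(
    product(learning_rates, batch_sizes),
    key=lambda cfg: validation_error(*cfg),
)
```

Grid search is the simplest option; random search and more sophisticated schemes tend to scale better as the number of hyperparameters grows.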

More on this in our future articles.

## Conclusion

The ideal network architecture for a task must be found via **experimentation** guided by monitoring the validation set error. There aren't specific steps pre-defined that can improve the results. It varies as per the problem statement and data.

So we suggest you keep **experimenting and exploring different Neural Network architectures.**

Hope we were able to explain the various nuances involved in finding the right Neural Architecture. **Stay Tuned** for more fun and informative articles in the future.