Deep learning is inspired by the functionality and interconnections of the brain, modelled as artificial neural networks. Deep models are large neural networks containing many discrete layers, where each layer is trained to extract feature representations from the data (feature learning) at a higher level of abstraction than the layer before it. These large models perform better when trained on large volumes of data, and the depth required is related to the complexity of the problem.

CNNs, like ANNs, are biologically inspired networks; the architecture is known for its accuracy in computer vision and is at present widely used in image recognition and classification problems. CNNs are feed-forward networks in which the spatial relationship between pixels is preserved. Information is routed from the input (x), through the intermediate layers, to the output (y), and the network approximates a target function f*.

The mapping function is given as y = f(x; θ), and the network learns the values of the parameters θ that result in the best approximation of f* [1].

Thus the input to the network flows through several convolution layers that learn the image features and build up increasingly abstract feature representations, as shown in Fig. 1.

Fig. 1 Convolutional Neural Networks
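As a rough, purely illustrative sketch of this layered mapping (not the architecture of Fig. 1), the snippet below composes a few toy layer functions, each a hypothetical random weight matrix followed by a non-linearity, so that y = f3(f2(f1(x))); the sizes and weights W1-W3 are assumptions made for the example.

    import numpy as np

    # Purely illustrative: a deep network approximates y = f(x; theta) by
    # composing layer functions. Each toy "layer" below is a hypothetical
    # random weight matrix followed by a ReLU non-linearity.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16)                      # toy input vector
    W1, W2, W3 = (rng.standard_normal((16, 16)) for _ in range(3))

    def layer(W, v):
        return np.maximum(0.0, W @ v)                # linear map + non-linearity

    y = layer(W3, layer(W2, layer(W1, x)))           # y = f3(f2(f1(x)))
    print(y.shape)                                   # (16,)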

Convolution layer: These layers serve as feature extractors, and each neuron in the layer is mapped to the previous layer by means of trainable weights. A filter in a CNN is a matrix of weights; when this filter is slid over the input image, the dot product is computed at each position, and the resulting matrix is known as a feature map. Different filters applied to the same input image result in different feature maps, and the more filters used, the richer the set of patterns the layer can detect. Thus, for an input image x, the output Y of the kth layer is given as

                                                 Y_k = f(W_k ∗ x)                                          (1)

where W_k is the weight matrix of the kth layer and f is a non-linear activation function.
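As an illustration of Eq. (1), the following numpy sketch slides a small filter over a toy input image and applies a ReLU non-linearity to produce a feature map; the 5×5 input, the 3×3 filter values and the choice of ReLU for f are assumptions made for the example, not values from the text.

    import numpy as np

    def conv2d_valid(image, kernel):
        # Slide `kernel` over `image` (no padding, stride 1) and take the
        # dot product at each position; the result is one feature map
        # (Eq. 1 before the non-linearity f is applied).
        H, W = image.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + kh, j:j + kw]
                out[i, j] = np.sum(patch * kernel)   # element-wise product, then sum
        return out

    # Hypothetical 5x5 input and a 3x3 horizontal-gradient filter.
    x = np.arange(25, dtype=float).reshape(5, 5)
    w = np.array([[-1., -1., -1.],
                  [ 0.,  0.,  0.],
                  [ 1.,  1.,  1.]])
    feature_map = np.maximum(0.0, conv2d_valid(x, w))  # Y_k = f(W_k * x) with f = ReLU
    print(feature_map)                                  # 3x3 feature map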

The output so obtained is then subjected to high-level reasoning, thereby enhancing the accuracy. Since a CNN learns the filters itself, little pre-processing is required. Activation functions are used to introduce non-linearities into the network, allowing it to model complex architectures and large volumes of non-linear data. Without an activation function, the system behaves like a linear regression model, whose performance is limited. Widely used activation functions include the sigmoid, the hyperbolic tangent and, more recently, the Rectified Linear Unit (ReLU). ReLU is applied only to the hidden layers, performs better than the traditional functions, and is given as

                                                  R(x) = max (0, x)                                        (2)

where R(x) = 0 when x < 0, and R(x) = x when x >= 0.
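The activation functions named above can be sketched directly; the following minimal numpy snippet (an illustration, not any particular library's API) evaluates the sigmoid, hyperbolic tangent and ReLU on a few sample values.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))      # squashes values into (0, 1)

    def tanh(x):
        return np.tanh(x)                    # squashes values into (-1, 1)

    def relu(x):
        return np.maximum(0.0, x)            # Eq. (2): 0 for x < 0, x for x >= 0

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid(z))
    print(tanh(z))
    print(relu(z))                           # [0.  0.  0.  0.5 2. ]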

Pooling Layer: This layer controls overfitting [2]. The number of parameters used in the analysis is reduced by sub-sampling, known as spatial pooling. In spatial pooling, the dimensionality of the feature map is reduced using one of three mechanisms: 1. Max Pooling, 2. Average Pooling and 3. Sum Pooling, which take the largest value, the average of all the elements, and the sum of all the elements in each window, respectively. The significant information in the image is nevertheless preserved.
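A minimal sketch of the three pooling mechanisms over non-overlapping 2×2 windows follows; the window size and the example feature-map values are assumptions made for illustration.

    import numpy as np

    def pool2d(feature_map, size=2, mode="max"):
        # Non-overlapping spatial pooling over size x size windows
        # ('max', 'average' or 'sum'); assumes dimensions divisible by size.
        H, W = feature_map.shape
        out = np.zeros((H // size, W // size))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                window = feature_map[i*size:(i+1)*size, j*size:(j+1)*size]
                if mode == "max":
                    out[i, j] = window.max()
                elif mode == "average":
                    out[i, j] = window.mean()
                else:
                    out[i, j] = window.sum()
        return out

    fm = np.array([[1., 3., 2., 0.],
                   [4., 6., 1., 5.],
                   [7., 2., 8., 3.],
                   [0., 1., 4., 9.]])
    print(pool2d(fm, mode="max"))       # [[6. 5.] [7. 9.]]
    print(pool2d(fm, mode="average"))   # [[3.5 2. ] [2.5 6. ]]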

Fully Connected (FC) Layer: The final layers are fully connected, where each neuron in the previous layer is connected to every neuron in the succeeding layer. Hence all possible paths from the input to the output are analysed; the layer interprets the feature representation and subjects it to high-level reasoning. Structuring the weight matrices appropriately reduces the computation time, and a circular (circulant) structure is found to provide better performance than an unstructured linear one. In pattern recognition, the FC stage converts the 2-D feature maps into a 1-D output vector, which speeds up recognition. Therefore, for a fully connected layer with n inputs and n outputs, circular projections reduce the computation time from O(n^2) to O(n log n) [3].
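The circular-projection idea can be sketched as follows: if the n×n weight matrix of the FC layer is constrained to be circulant (fully defined by one length-n vector), the product with the flattened input reduces to a circular convolution, computable with FFTs in O(n log n); the sizes and random values below are illustrative assumptions, not the cited implementation.

    import numpy as np

    # If the n x n weight matrix of an FC layer is constrained to be
    # circulant (fully defined by one length-n vector w), the product with
    # the flattened input x becomes a circular convolution and can be
    # computed with FFTs in O(n log n) instead of O(n^2).
    rng = np.random.default_rng(0)
    n = 8
    w = rng.standard_normal(n)          # first column of the circulant matrix
    x = rng.standard_normal(n)          # flattened 1-D input

    # Dense O(n^2) computation with the explicit circulant matrix.
    W = np.column_stack([np.roll(w, k) for k in range(n)])
    y_dense = W @ x

    # Equivalent O(n log n) computation via the FFT.
    y_fft = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))

    print(np.allclose(y_dense, y_fft))  # True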

References:

1. Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

2. Ashwin Bhandare et al., "Applications of Convolutional Neural Networks", International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 7 (5), 2016, pp. 2206-2215, ISSN: 0975-9646.

3. Kumar, S., Smith, S. R., Fowler, G., Velis, C., Rena, Kumar, R., and Cheeseman, C., "Challenges and opportunities associated with waste management in India", R. Soc. Open Sci. 4 (2017): 160764.