This section first covers the components of a neural network and then the common architectures that determine how neurons are arranged. The subsequent discussion of neural network chips and model optimization shows how a network can be implemented for better accuracy and speed.
Neural Network Components
A neural network is built from three kinds of layers (input, hidden, and output) together with the perceptron, activation functions, weights, and biases.
Input, Hidden, and Output Layers
The single input layer accepts the external independent variables that help predict the desired outcome; all of the model's independent variables are part of the input layer. One or more interconnected hidden layers are configured according to the purpose the neural network serves, such as object detection and classification through visual recognition, or NLP. Each unit in the first hidden layer is a function of the weighted sum of the inputs (predictors). When the network contains multiple hidden layers, each hidden unit is a function of the weighted sum of the units of the previous hidden layer. The output layer, as a function of the hidden layers, contains the target (dependent) variables. For image classification, for example, the output layer typically has one node per class, so each input is assigned to one of the classes the model is designed to distinguish.
Figure 1: Input, hidden, and output layers
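The flow from inputs through a hidden layer to an output can be sketched in a few lines of plain Python. This is only an illustration: the layer sizes, the weights, and the helper names `dense` and `forward` are assumptions made for the example, not part of any particular framework.

```python
def dense(inputs, weights, biases, activation):
    """One fully connected layer: each unit applies an activation to a weighted sum."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

# Hypothetical weights, chosen only for illustration.
HIDDEN_W = [[0.5, -1.0], [1.0, 1.0]]   # 2 hidden units, each reading 2 inputs
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, 2.0]]                # 1 output unit reading 2 hidden units
OUTPUT_B = [0.5]

def forward(x):
    """Input layer -> hidden layer -> output layer."""
    hidden = dense(x, HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, identity)

print(forward([2.0, 1.0]))  # [4.5]
```

Each hidden unit sees only the inputs, and the output unit sees only the hidden units, which mirrors the layer-by-layer dependency described above.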
Frank Rosenblatt, an experimental psychologist from Cornell, was intrigued by the ability of neurons to learn and created a simple perceptron with multiple inputs, a single processor, and a single output. The perceptron is thus the basic building block of a neural network and, on its own, forms a single-layer network.
Figure 2: Perceptron
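As a minimal sketch of a perceptron with multiple inputs, one processing unit, and one output, the example below trains on the logical AND function using the classic perceptron learning rule. The function names, the learning rate, and the number of epochs are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """Output 1 if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=10, lr=1.0):
    """Perceptron learning rule: nudge weights and bias by the prediction error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Truth table of logical AND: (inputs, expected output).
AND_GATE = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train_perceptron(AND_GATE)
predictions = [predict(weights, bias, x) for x, _ in AND_GATE]
print(predictions)  # [0, 0, 0, 1]
```

AND is linearly separable, so a single perceptron can learn it; a function such as XOR is not, which is one reason hidden layers are needed.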
An activation function, also known as a transfer function, controls the amplitude of a neuron's output. In a deep neural network with multiple hidden layers, the activation function links the weighted sums of units in a layer to the values of units in the succeeding layer. Typically, the same activation function is used for all the hidden layers. Activation functions can be either linear or non-linear, and the most commonly used ones are summarized in the table below:
|Function|Description|Formula|
|---|---|---|
|Rectified Linear Activation (ReLU)|A piecewise linear function that outputs the input directly if it is positive and 0 otherwise|f(x) = max(0, x)|
|Sigmoid|An "S" curve that generates an output between 0 and 1, which can be interpreted as a probability|f(x) = 1/(1 + exp(-x))|
|Hyperbolic Tangent (TanH)|Like a Sigmoid function but symmetric around zero, which often helps training; takes real-valued arguments and transforms them to the range (–1, 1)|f(x) = (exp(x) - exp(-x))/(exp(x) + exp(-x))|
|Linear|Takes real-valued arguments and returns them unchanged|f(x) = x|
|Softmax|Commonly used for a classification problem with multiple classes and returns the probability of each class|softmax(z_i) = exp(z_i)/(Σ_j exp(z_j))|
If you are unsure which activation function best suits your use case, ReLU is a sensible default for hidden layers because it helps mitigate the vanishing gradient problem and generally makes models easier to train.
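The activation functions discussed in this section can be written directly with Python's standard `math` module. This is a sketch for scalar inputs (and a list for softmax), not an optimized implementation; subtracting the maximum inside `softmax` is a common numerical-stability step added here as an assumption.

```python
import math

def relu(x):
    """Pass positive inputs through unchanged; clamp negatives to 0."""
    return max(0.0, x)

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squash any real number into the range (-1, 1), centered on 0."""
    return math.tanh(x)

def linear(x):
    """Identity: f(x) = x."""
    return x

def softmax(z):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(z)
    exps = [math.exp(v - largest) for v in z]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-3.0), relu(2.0))      # 0.0 2.0
print(sigmoid(0.0), tanh(0.0))    # 0.5 0.0
print(softmax([1.0, 2.0, 3.0]))
```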
Weights and Biases
Weights signify the importance of the corresponding feature (input variable) in predicting the output. They also explain the relationship between that feature and the target output. The figure below illustrates that the output is the sum of the input x times the connection weight w0 and the bias b times the connection weight w1.
Figure 3: Weights and biases
Biases are like the constant in the linear function y = mx + c, where m corresponds to the weight and c to the bias. Without a bias, the line is forced to pass through the origin, which rarely reflects real data. The bias therefore shifts (translates) the line and makes the model more flexible.
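A tiny sketch makes the role of the bias concrete. The function name `neuron` and the numeric values are assumptions used only for illustration.

```python
def neuron(x, weight, bias=0.0):
    """Linear unit y = weight * x + bias, matching y = mx + c."""
    return weight * x + bias

# Without a bias the line is pinned to the origin: x = 0 always gives y = 0.
print(neuron(0.0, weight=3.0))            # 0.0
# With a bias the same line is shifted by that constant.
print(neuron(0.0, weight=3.0, bias=2.0))  # 2.0
```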
Common Neural Architectures
The neural network architecture is composed of neurons, and the way these neurons are arranged creates the structure that defines how the algorithm is going to learn. The arrangements have the input and the output layers with hidden layers in between that increase the model's computational and processing power. The key architectures are discussed below.
Radial Basis Function
The radial basis function (RBF) network has a single non-linear hidden layer called a "feature vector," where the number of neurons in the hidden layer should be greater than the number of neurons in the input layer so that the data is cast into a higher-dimensional space. Thus, RBF increases the dimension of the feature vector to make the classes more separable in high-dimensional space. The figure below illustrates how the inputs (x) are transformed to the output (y) through a single hidden layer (i.e., the feature vector), which connects to x and y through the weights.
Figure 4: Radial basis function
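To illustrate how casting inputs into a higher-dimensional space can make classes separable, the sketch below maps 2-D XOR inputs to a 4-D Gaussian feature vector and then applies a simple linear output. The centers, the width `GAMMA`, and the output weights are hand-picked assumptions, not learned values.

```python
import math

CENTERS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # hypothetical: one RBF unit per XOR corner
OUTPUT_WEIGHTS = [-1.0, 1.0, 1.0, -1.0]     # hand-picked, not learned
GAMMA = 2.0                                  # assumed width parameter of each Gaussian

def rbf_features(x):
    """Map a 2-D input to a 4-D feature vector of Gaussian responses."""
    return [
        math.exp(-GAMMA * sum((xi - ci) ** 2 for xi, ci in zip(x, center)))
        for center in CENTERS
    ]

def classify(x):
    """Linear output layer on top of the RBF feature vector."""
    score = sum(w * f for w, f in zip(OUTPUT_WEIGHTS, rbf_features(x)))
    return 1 if score > 0 else 0

print([classify(p) for p in CENTERS])  # [0, 1, 1, 0]
```

XOR cannot be separated by a straight line in the original two dimensions, but a single linear output suffices once the inputs are expressed as four RBF responses.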
Restricted Boltzmann Machines
Restricted Boltzmann machines (RBMs) are unsupervised learning algorithms built as two-layer neural networks comprising a visible (input) layer and a hidden layer without any intra-layer connections; i.e., no two nodes within the same layer are connected, which is the restriction that gives the model its name. RBMs are used for movie recommendation engines, pattern recognition (e.g., understanding handwritten text), and real-time intra-pulse radar target recognition.
Figure 5: Restricted Boltzmann machines
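Because there are no connections within a layer, each hidden unit depends only on the visible units, and vice versa. The sketch below computes those conditional activation probabilities for a binary RBM with sigmoid units; the weights, biases, and function names are hypothetical, and training (e.g., contrastive divergence) is deliberately left out.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_given_visible(visible, weights, hidden_bias):
    """P(h_j = 1 | v) for every hidden unit j; weights[i][j] links v_i to h_j."""
    return [
        sigmoid(hidden_bias[j] + sum(v * weights[i][j] for i, v in enumerate(visible)))
        for j in range(len(hidden_bias))
    ]

def visible_given_hidden(hidden, weights, visible_bias):
    """P(v_i = 1 | h) for every visible unit i, reusing the same weights."""
    return [
        sigmoid(visible_bias[i] + sum(h * weights[i][j] for j, h in enumerate(hidden)))
        for i in range(len(visible_bias))
    ]

# Hypothetical parameters: 3 visible units, 2 hidden units.
WEIGHTS = [[2.0, -1.0], [0.0, 1.5], [-2.0, 0.5]]
VISIBLE_BIAS = [0.0, 0.0, 0.0]
HIDDEN_BIAS = [0.0, 0.0]

p_hidden = hidden_given_visible([1, 0, 1], WEIGHTS, HIDDEN_BIAS)
p_visible = visible_given_hidden([1, 0], WEIGHTS, VISIBLE_BIAS)
print(p_hidden)
print(p_visible)
```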
Recurrent Neural Networks
Recurrent neural networks (RNNs) treat their input as a sequence (such as a time series) and generate a sequence as output, using at least one cyclic connection that feeds earlier state back into the network. RNNs are universal approximators: They can approximate virtually any dynamical system. RNNs are used for sequence tasks such as stock prediction, sales forecasting, natural language processing and translation, chatbots, image captioning, and music synthesis.
Figure 6: Recurrent neural networks
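The connection cycle can be sketched as a hidden state that is fed back in at every time step. The example below uses a scalar state and hypothetical weights (`w_x`, `w_h`, `b`) purely for illustration.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """h_t = tanh(w_x * x_t + w_h * h_prev + b), scalar version."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence one step at a time, carrying the hidden state forward."""
    h = 0.0
    outputs = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        outputs.append(h)
    return outputs

# The second input is 0, yet the output is non-zero because the
# hidden state still remembers the first input.
print(run_rnn([1.0, 0.0]))
```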
Long short-term memory (LSTM), which is composed of forget, input, and output gates, has several applications including time series prediction and natural language understanding and generation. LSTM is primarily used to capture long-term dependencies. The forget gate decides whether to retain the information from the previous timestamp or "forget" it. A less complex variant with fewer gates forms the gated recurrent unit (GRU).
The GRU is a simplified variant of LSTM where the forget and input gates are combined into a single update gate, and the cell state and hidden state are also combined. As a result, a GRU has fewer parameters, uses less memory, and is typically faster than an LSTM.
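A scalar sketch of one GRU step shows how the single update gate decides both how much old state to keep and how much new information to admit. The parameter names and values are hypothetical, and note that references differ on whether the update gate multiplies the old state or the candidate; this sketch uses one common convention.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, p):
    """One scalar GRU step; `p` is a dict of hypothetical parameters."""
    update = sigmoid(p["w_z"] * x_t + p["u_z"] * h_prev + p["b_z"])
    reset = sigmoid(p["w_r"] * x_t + p["u_r"] * h_prev + p["b_r"])
    candidate = math.tanh(p["w_h"] * x_t + p["u_h"] * (reset * h_prev) + p["b_h"])
    # The update gate blends the previous state with the candidate state.
    return (1.0 - update) * h_prev + update * candidate

PARAMS = {"w_z": 0.5, "u_z": 0.5, "b_z": 0.0,
          "w_r": 0.5, "u_r": 0.5, "b_r": 0.0,
          "w_h": 1.0, "u_h": 1.0, "b_h": 0.0}

h = 0.0
for x_t in [1.0, 0.5, -0.5]:
    h = gru_step(x_t, h, PARAMS)
    print(round(h, 4))
```

There is a single state `h` rather than separate cell and hidden states, which is where the memory savings relative to an LSTM come from.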
Convolutional Neural Networks
Convolutional neural networks (CNNs) are widely popular for image classification. A CNN assigns learnable weights and biases to features in the image for classification purposes. An image comprising a matrix of pixel values is processed through the convolutional layer, pooling layer, and fully connected (FC) layer. The pooling layer reduces the spatial size of the convolved feature. The final output layer generates a confidence score to determine how likely it is that an image belongs to a defined class. CNNs are widely used by Facebook and other social media platforms to monitor content.
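The convolution and pooling stages can be sketched on a small matrix of pixel values. The image, the 2x2 edge-detecting kernel, and the function names below are assumptions for illustration; real CNNs learn their kernels and stack many such layers.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; shrinks each spatial dimension by `size`."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# A 5x5 "image" with a bright vertical stripe.
IMAGE = [[0, 0, 1, 1, 0] for _ in range(5)]
VERTICAL_EDGE = [[-1, 1], [-1, 1]]  # responds to dark-to-bright transitions

features = conv2d(IMAGE, VERTICAL_EDGE)  # 4x4 convolved feature
pooled = max_pool(features)              # 2x2 after pooling
print(features[0])  # [0, 2, 0, -2]
print(pooled)       # [[2, 0], [2, 0]]
```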
Deep Reinforcement Learning
Deep RL, short for deep reinforcement learning, combines the power of deep learning with reinforcement learning. Learning through reinforcement means rewarding the algorithm for a right decision and penalizing it for a wrong one. Applications of deep RL include load balancing, robotics, industrial operations, traffic control, and recommendation systems.
Generative Adversarial Networks
Generative adversarial networks (GANs) use two neural networks: a generator and a discriminator. While the generator produces image, voice, or video content, the discriminator classifies each sample as either real (from the domain) or generated. The two models are trained together in a zero-sum game until the generator produces plausible results.
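The zero-sum game is commonly written as a minimax objective. The formulation below is the standard one from the GAN literature rather than anything specific to this text; the symbols are assumptions introduced here: D(x) is the discriminator's estimated probability that x is real, G(z) is the generator's output for noise z, and p_data and p_z are the data and noise distributions.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator tries to maximize V by scoring real samples high and generated samples low, while the generator tries to minimize it by producing samples the discriminator scores as real.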
Neural Network Chips
Neural network chips provide the computing infrastructure (processing speed, storage, and networking) needed to run neural network algorithms quickly on vast amounts of data. These chips break tasks into multiple sub-tasks, which can run on multiple cores concurrently to increase the processing speed.
Types of AI Accelerators
Specialized AI accelerators vary significantly in supported model size, supported frameworks, programmability, learning curve, target throughput, latency, and cost. Such hardware includes the graphics processing unit (GPU), vision processing unit (VPU), field-programmable gate array (FPGA), central processing unit (CPU), and Tensor Processing Unit (TPU). GPUs are well suited to computer graphics and image processing, FPGAs require field programming using hardware description languages (HDLs) like VHDL and Verilog, and Google's TPUs are specialized for neural network machine learning. Let's look at each of them separately below to understand their capabilities.
GPUs were originally developed for graphics processing and are now widely used for deep learning (DL). Their benefit is parallel processing. Processor architectures are commonly classified into the following five categories, of which GPUs rely mainly on SIMD and SIMT:
- Single instruction, single data (SISD)
- Single instruction, multiple data (SIMD)
- Multiple instructions, single data (MISD)
- Multiple instructions, multiple data (MIMD)
- Single instruction, multiple threads (SIMT)
For highly parallel workloads, a GPU computes faster than a CPU because it devotes more transistors to data processing, which helps maximize memory bandwidth for large datasets with medium-to-large models and larger effective batch sizes.
VPUs are optimized DL processors aimed at enabling computer vision tasks with ultra-low power requirements without compromising performance. Thus, VPUs are optimized for deep learning inference by leveraging the power of pre-trained CNN models.
An FPGA has thousands of memory units that run parallel architectures at low power consumption. It consists of reprogrammable logic gates to create custom circuits. FPGAs are used for autonomous driving and automated spoken language recognition and search.
CPUs with MIMD architecture are well suited to task optimization and to applications with limited parallelism, such as sparse DNNs, RNNs whose steps depend on one another, and small models with small effective batch sizes.
A TPU is Google's custom-developed application-specific integrated circuit (ASIC) that is used to accelerate DL workloads. TPUs provide high throughput for large batch sizes and are suitable for models that train for weeks, dominated by matrix computations.
AI Accelerators for Deep Learning Inference
AI accelerators are used for DL inference because their parallel computational capabilities speed up computation. They have high-bandwidth memory that can allocate four to five times more bandwidth between processors than traditional chips. Two leading options for DL inference are AWS Inferentia, a custom-designed ASIC, and Open Visual Inference and Neural Network Optimization (OpenVINO), an open-source toolkit for optimizing and deploying AI inference.
They both boost deep learning performance for performing tasks like computer vision, speech recognition, NLP, and NLG. OpenVINO uses models trained in frameworks including TensorFlow, PyTorch, Caffe, and Keras, and optimizes model performance with acceleration from CPU, GPU, VPU, and iGPU.
Neural Network Model Optimization
Deep learning model optimizations are used for various scenarios, including video analytics as well as computer vision. Since most of these computation-intensive analyses are done in real time, the following objectives are critical:
- Faster performance
- Reduced computational requirements
- Optimized space usage
For example, OpenVINO provides seamless optimization of neural networks with the help of the following tools:
- Model Optimizer – Converts models from multiple frameworks to Intermediate Representation (IR); these can then be inferred with OpenVINO Runtime. OpenVINO Runtime plugins are software components that contain the full implementation for inference on hardware such as CPUs, GPUs, and VPUs.
- Post-Training Optimization Toolkit (POT) – Accelerates the inference speed of IR models by applying post-training automated model quantization through the DefaultQuantization and AccuracyAwareQuantization algorithms.
- Neural Network Compression Framework (NNCF) – Integrates with PyTorch and TensorFlow to compress the model through quantization and pruning. The commonly used compression algorithms are 8-bit quantization, filter pruning, sparsity, mixed-precision quantization, and binarization.
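To give a feel for what 8-bit quantization does, the sketch below maps floating-point weights to small integers and back. It is a generic, symmetric per-tensor scheme written in plain Python; it does not use or reproduce the POT or NNCF APIs, and the function names and sample weights are assumptions for illustration.

```python
def quantize_int8(values):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers and scale."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.05, 0.0]       # hypothetical layer weights
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

print(q_weights)  # [82, -127, 5, 0]
print(max(abs(w - r) for w, r in zip(weights, restored)))  # worst-case rounding error
```

Each weight now fits in one byte instead of four, at the cost of a rounding error bounded by half the scale; this trade of precision for space and speed is the idea behind the quantization tools listed above.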