# Artificial Neural Networks: Some Misconceptions (Part 1)

### In Part 1 of this 5-part series, learn about two misconceptions about ANNs: that they are models of the human brain and that they are a weak form of statistics.

Neural networks are one of the most popular and powerful classes of machine learning algorithms. In quantitative finance, neural networks are often used for time series forecasting, constructing proprietary indicators, algorithmic trading, securities classification, and credit risk modeling. They have also been used to construct stochastic process models and price derivatives. Despite their usefulness, neural networks tend to have a bad reputation because their performance is “temperamental.” In my opinion, this can be attributed to poor network design owing to misconceptions regarding how neural networks work. This series discusses some of those misconceptions.

## Neural Networks Are Not Models of the Human Brain

The human brain is one of the great mysteries of our time, and scientists have not reached a consensus on exactly how it works. Two theories of the brain exist: namely, the grandmother cell theory and the distributed representation theory. The first theory asserts that individual neurons have high information capacity and are capable of representing complex concepts such as your grandmother or even Jennifer Aniston. The second theory asserts that neurons are much more simple and representations of complex objects are distributed across many neurons. Artificial neural networks are loosely inspired by the second theory.

One reason why I believe that the current generation of neural networks is not capable of sentience (a different concept than intelligence) is that biological neurons are much more complex than artificial neurons.

> A single neuron in the brain is an incredibly complex machine that even today we don’t understand. A single “neuron” in a neural network is an incredibly simple mathematical function that captures a minuscule fraction of the complexity of a biological neuron. So to say neural networks mimic the brain, that is true at the level of loose inspiration, but really artificial neural networks are nothing like what the biological brain does. — Andrew Ng

Another big difference between the brain and neural networks is size and organization. Human brains contain many more neurons and synapses than neural networks, and they are far more self-organizing and adaptive. Neural networks, by comparison, are organized according to a predefined architecture. They are not “self-organizing” in the same sense as the brain, which much more closely resembles a graph than an ordered network.

*Some very interesting views of the brain, as created by state-of-the-art brain imaging techniques.*

So, what does that mean? Think of it this way: a neural network is inspired by the brain in the same way that the Olympic stadium in Beijing is inspired by a bird’s nest. That does not mean that the Olympic stadium *is* a bird nest — it means that some elements of bird nests are present in the design of the stadium. In other words, elements of the brain are present in the design of neural networks, but they are a lot less similar than you might think.

In fact, neural networks are more closely related to statistical methods such as curve fitting and regression analysis than to the human brain. In the context of quantitative finance, I think it is important to remember this, because while it may sound cool to say that something is "inspired by the brain," the statement may result in unrealistic expectations or fear. For more information, see this article.

*An example of curve fitting, also known as function approximation. Neural networks are quite often used to approximate complex mathematical functions.*

## Neural Networks Aren’t a Weak Form of Statistics

Neural networks consist of layers of interconnected nodes. Individual nodes are called perceptrons and resemble a multiple linear regression. The difference between a multiple linear regression and a perceptron is that a perceptron feeds the signal generated by the multiple linear regression into an activation function, which may or may not be non-linear. In a multilayer perceptron (MLP), perceptrons are arranged into layers, and layers are connected with one another. The MLP has three types of layers: the input layer, hidden layer(s), and output layer. The input layer receives input patterns, and the output layer could contain a list of classifications or output signals to which those input patterns may map. Hidden layers adjust the weightings on those inputs until the error of the neural network is minimized. One interpretation of this is that the hidden layers extract salient features in the input data, which have predictive power with respect to the outputs.

### Mapping Inputs to Outputs

A perceptron receives a vector of inputs, $\mathbf{z} = (z_1, z_2, \dots, z_n)$, consisting of $n$ attributes. This vector of inputs is called an input pattern. These inputs are weighted according to the weight vector belonging to that perceptron, $\mathbf{v} = (v_1, v_2, \dots, v_n)$. In the context of multiple linear regression, these can be thought of as regression coefficients or betas. The net input signal, $net$, of the perceptron is usually the sum product of the input pattern and the weights. Neurons that use the sum product for $net$ are called summation units.

$$net = \sum_{i=1}^{n} z_i v_i$$

The net input signal minus a bias $\theta$ is then fed into some activation function, $f(\cdot)$. Activation functions are usually monotonically increasing functions bounded between either $(0,1)$ or $(-1,1)$ (this is discussed further on in this series). Activation functions can be linear or non-linear.
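The computation just described can be sketched in a few lines. This is a minimal illustration, not production code; the function name, input values, and the choice of a sigmoid activation are all illustrative assumptions:

```python
import math

def perceptron(z, v, theta):
    """Summation-unit perceptron: the net input signal is the sum
    product of the input pattern z and the weight vector v; the bias
    theta is subtracted before applying the activation function."""
    net = sum(zi * vi for zi, vi in zip(z, v))
    # Sigmoid activation: monotonically increasing, bounded in (0, 1)
    return 1.0 / (1.0 + math.exp(-(net - theta)))

# An input pattern with n = 3 attributes and its weight vector
output = perceptron(z=[0.5, -1.0, 2.0], v=[0.1, 0.4, 0.2], theta=0.0)
```

Here `net` is $0.5 \cdot 0.1 - 1.0 \cdot 0.4 + 2.0 \cdot 0.2 = 0.05$, and the sigmoid squashes it into $(0,1)$.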

Some popular activation functions used in neural networks are shown below:
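Since the figure is not reproduced here, a few of the standard activation functions can instead be written out directly (the function names are my own labels; the definitions are the textbook ones):

```python
import math

def linear(x):
    """Linear (identity) activation: unbounded, f(x) = x."""
    return x

def sigmoid(x):
    """Logistic sigmoid: bounded between (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: bounded between (-1, 1)."""
    return math.tanh(x)

def step(x):
    """Bipolar step function: discontinuous at 0, maps to {-1, 1}."""
    return 1.0 if x >= 0 else -1.0
```

Note that the step function is discontinuous, which matters for the gradient-based learning rules discussed later in this article.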

The simplest neural network is one that has just one neuron that maps inputs to an output. Given a pattern, $p$, the objective of this network would be to minimize the error of the output signal, $o_p$, relative to some known target value, $t_p$, for that training pattern. For example, if the neuron was supposed to map $p$ to $-1$ but mapped it to $1$, then the error of the neuron, as measured by sum-squared distance, would be $(-1-1)^2 = 4$.
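The worked example above is trivial to verify in code (the function name is illustrative):

```python
def sum_squared_error(t_p, o_p):
    """Error of the output signal o_p relative to the target t_p
    for a single training pattern, measured as squared distance."""
    return (t_p - o_p) ** 2

# The example from the text: target -1, actual output 1
error = sum_squared_error(t_p=-1, o_p=1)  # (-1 - 1)**2 = 4
```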

### Layering

As shown in the image above, perceptrons are organized into layers. The first layer of perceptrons, called the input layer, receives the patterns, $p$, in the training set, $P_T$. The last layer maps to the expected outputs for those patterns. For example, the patterns may be a list of quantities for different technical indicators regarding a security, and the potential outputs may be the categories {*BUY*, *HOLD*, *SELL*}.

A hidden layer is one that receives as inputs the outputs from another layer, and for which the outputs form the inputs into yet another layer. So, what do these hidden layers do? One interpretation is that they extract salient features in the input data, which have predictive power with respect to the outputs. This is called feature extraction and in a way, it performs a similar function to statistical techniques such as principal component analysis.
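The layering idea can be sketched minimally: the outputs of one layer form the inputs of the next. The layer sizes, weight values, and input pattern below are arbitrary illustrations, not anything prescribed by the article:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    """Forward pass through one layer. Each row of `weights` is the
    weight vector of one perceptron in the layer."""
    return [sigmoid(sum(z * v for z, v in zip(inputs, row)))
            for row in weights]

pattern = [0.2, 0.7, -0.4]                     # input layer receives the pattern
hidden = layer(pattern, [[0.5, -0.3, 0.8],
                         [0.1, 0.9, -0.2]])    # hidden layer: extracted "features"
output = layer(hidden, [[1.2, -0.7]])          # output layer: e.g. one trading signal
```

The hidden layer's activations are the intermediate representation; in the feature-extraction interpretation, training shapes these weights so the hidden activations carry predictive information about the outputs.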

Deep neural networks have a large number of hidden layers and are able to extract much deeper features from the data. Recently, deep neural networks have performed particularly well for image recognition problems. An illustration of feature extraction in the context of image recognition is shown below:

I think that one of the problems facing the use of deep neural networks for trading (in addition to the obvious risk of overfitting) is that the inputs into the neural network are almost always heavily pre-processed, meaning that there may be few features to actually extract because the inputs are already to some extent features.

### Learning Rules

As mentioned previously, the objective of the neural network is to minimize some measure of error, $\epsilon$. The most common measure of error is sum-squared error, although this metric is sensitive to outliers and may be less appropriate than tracking error in the context of financial markets.

Sum-squared error (SSE): $\epsilon = \sum_{p=1}^{P_T} (t_p - o_p)^2$

Given that the objective of the network is to minimize *ϵ*, we can use an optimization algorithm to adjust the weights in the neural network. The most common learning algorithm for neural networks is the gradient descent algorithm, although other and potentially better optimization algorithms can be used. Gradient descent works by calculating the partial derivative of the error with respect to the weights for each layer in the neural network and then moving in the opposite direction to the gradient (because we want to *minimize* the error of the neural network). By minimizing the error, we maximize the performance of the neural network *in-sample*.

Expressed mathematically, the update rule for the weights in the neural network ($\mathbf{v}$) is given by:

$$v_i(t) = v_i(t-1) + \Delta v_i(t)$$

Where:

$$\Delta v_i(t) = \eta \left( -\frac{\partial \epsilon}{\partial v_i} \right)$$

Where:

$$\frac{\partial \epsilon}{\partial v_i} = -2 (t_p - o_p) \frac{\partial f}{\partial net_p} z_{i,p}$$

Where $\eta$ is the learning rate, which controls how quickly or slowly the neural network converges. It is worth noting that the calculation of the partial derivative of $f$ with respect to the net input signal for a pattern $p$ represents a problem for any discontinuous activation functions, which is one reason why alternative optimization algorithms may be used. The choice of learning rate has a large impact on the performance of the neural network. Small values of $\eta$ may result in very slow convergence, whereas large values could result in a lot of variance in the training.
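Putting the update rule together for the single-neuron network described earlier: below is a minimal sketch of gradient descent with a linear activation, so that $\partial f / \partial net = 1$ and the rule reduces to the classic delta rule. The function name, training patterns, and hyperparameter values are illustrative assumptions:

```python
def train(patterns, targets, eta=0.1, epochs=50):
    """Delta-rule gradient descent for one summation-unit neuron
    with a linear activation: o_p = sum_i z_i * v_i."""
    n = len(patterns[0])
    v = [0.0] * n                     # weight vector, initialized to zero
    for _ in range(epochs):
        for z, t in zip(patterns, targets):
            o = sum(zi * vi for zi, vi in zip(z, v))
            # d(eps)/d(v_i) = -2 (t_p - o_p) z_i  (since df/dnet = 1),
            # so we step in the opposite direction of the gradient
            for i in range(n):
                v[i] += eta * 2 * (t - o) * z[i]
    return v

# Learn the mapping o = 2*z1 - z2 from four training patterns
v = train([[1, 0], [0, 1], [1, 1], [2, 1]], [2, -1, 1, 3])
```

With $\eta = 0.1$ the weights converge toward $(2, -1)$; making $\eta$ much larger causes the updates to oscillate, illustrating the variance the text mentions.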

## Summary

Despite what some of the statisticians I have met in my time believe, neural networks are not just a “weak form of statistics for lazy analysts” (I have actually been told this before and it was quite funny). Neural networks represent an abstraction of solid statistical techniques that date back hundreds of years. For a fantastic explanation of the statistics behind neural networks, I recommend reading this chapter. That said, I do agree that some practitioners like to treat neural networks as a “black box” that can be thrown at any problem without first taking the time to understand the nature of the problem and whether or not neural networks are an appropriate choice. An example of this is the use of neural networks for trading; markets are dynamic, yet neural networks assume the distribution of input patterns remains stationary over time. This is discussed in more detail here.

Published at DZone with permission of Jayesh Bapu Ahire, DZone MVB. See the original article here.
