The Artificial Neural Networks Handbook: Part 3
In part three of the Artificial Neural Networks Handbook series, we explore the biological background of ANNs and compare them with conventional computational techniques.
Chronicle of Artificial Neural Networks Development
According to Nelson and Illingworth, the earliest attempts to understand the human brain go back centuries. They cite Fischler and Firschein, who refer to the work of Hippocrates and to the less familiar Edwin Smith Papyrus, a treatise written around 3000 BC that described the location of certain sensory and motor control areas in the brain. For most of history, since the days of ancient Greek philosophers such as Plato and Aristotle, the study of the brain was limited to the philosophical question of whether the mind and the body are one. As Rich and Knight state at the beginning of their book, Artificial Intelligence, “Philosophy has always been the study of those branches of knowledge that were so poorly understood that they had not yet become separate disciplines in their own right.” This was certainly true of modern brain theory and the eventual development of Artificial Neural Networks (ANNs). The technology needed to study the workings of the brain was not available until the late nineteenth century. Since then, ANNs have had a very rocky climb to fame. Eberhart and Dobbins classified their development into four distinct periods:
1890–1969: The Age of Camelot
1969–1982: The Dark Age (Depression Age)
1982–1986: The Renaissance
1986–Current: The Age of Neoconnectionism
The first period began in the late nineteenth century with the advent of modern science and the pursuit of a better understanding of the workings of the brain. As technology improved, psychologists and biologists were able to start hypothesizing on how, rather than why, the human brain functions. Most ANN literature places the beginning of the ANN and modern brain theory era at the publication of a text by William James entitled “Psychology (Briefer Course)” [James 1890]. The text contained many insights into brain activity and was the precursor of many current theories.
The next major breakthrough came some fifty years later, in 1943, when McCulloch and Pitts presented their first model of a biological neuron [McCulloch and Pitts 1943]. They developed theorems for models of neuronal systems based on what was known of the biological structure at the time. Their models could compute any finite logical expression and, since James, they were the first authors to propose a massively parallel neural model. However, their models could not “learn,” as they used only fixed weights. Donald Hebb, an eminent psychologist, added to this knowledge with his hypothesis of how neurons communicate and store knowledge in the brain structure. This hypothesis became known as the Hebbian Learning Rule and enabled the eventual development of learning rules for the McCulloch-Pitts neural models.
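Hebb's hypothesis is often paraphrased as “neurons that fire together, wire together”: the weight between two co-active units grows in proportion to their joint activity. A minimal illustrative sketch (not code from any of the works cited here; the function name, learning rate, and layout are our own choices):

```python
# Hebbian weight update: strengthen the connection between two
# units in proportion to their joint activity (delta_w = lr * x_i * y_j).
# Illustrative sketch only, with hand-picked values.

def hebbian_update(weights, x, y, lr=0.1):
    """weights[i][j] connects input unit i to output unit j;
    return the weights after one Hebbian step."""
    return [[w_ij + lr * x_i * y_j for w_ij, y_j in zip(row, y)]
            for x_i, row in zip(x, weights)]

w = [[0.0, 0.0], [0.0, 0.0]]
w = hebbian_update(w, x=[1, 0], y=[1, 1])
print(w)  # -> [[0.1, 0.1], [0.0, 0.0]]
```

Only the connections leaving the active input unit are strengthened; connections from the silent unit are left untouched, which is the essence of the rule.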
This period peaked in 1958, when Frank Rosenblatt published his landmark paper [Rosenblatt 1958] defining a neural network structure called the perceptron. Rosenblatt was inspired by the way the eye functions and built his perceptron model on it, incorporating learning based on the Hebbian Learning Rule into the McCulloch-Pitts neural model. He used the perceptron to solve simple pattern recognition problems, such as differentiating between sets of geometric patterns and letters of the alphabet. The Artificial Intelligence community was excited by the perceptron's initial success, and expectations ran very high, with the perceptron widely seen as a panacea for the known computing problems of the time. Bernard Widrow and Marcian Hoff contributed to this optimism when they published a paper [Widrow and Hoff 1960] on ANNs from an engineering perspective and introduced a single-neuron model called ADALINE, which became the first ANN to be used in a commercial application; it has served ever since as an adaptive filter for canceling echoes on telephone lines. The ADALINE used a learning algorithm that became known as the delta rule, which applies an error-reduction method known as gradient descent, or steepest descent.
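As a hedged illustration of the delta rule, the sketch below trains a single ADALINE-style linear unit by gradient descent on the squared error. The function name, learning rate, epoch count, and the choice of a logical-AND task are our own assumptions for the example, not details from Widrow and Hoff's paper:

```python
# ADALINE-style delta rule (least-mean-squares): each weight moves
# down the gradient of the squared error between the linear output
# and the target. Illustrative sketch with hand-picked settings.

def train_adaline(samples, lr=0.1, epochs=100):
    """samples: list of (inputs, target); returns (weights, bias)."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear output
            err = target - y                              # the "delta"
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learn a simple linearly separable mapping: logical AND with +/-1 targets
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b = train_adaline(data)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
         for x, _ in data]
print(preds)  # -> [-1, -1, -1, 1]
```

Unlike the perceptron rule, the delta rule updates on the continuous output rather than the thresholded decision, which is what makes it a gradient descent on a smooth error surface.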
However, in 1969, Marvin Minsky and Seymour Papert, two renowned researchers in the Artificial Intelligence field, published a book entitled ‘Perceptrons’ [Minsky and Papert 1969], criticizing the perceptron model and concluding that it (and ANNs as a whole) could not solve any real problems of interest. They proved that the perceptron model, being a simple linear model with no hidden layers, could only solve the class of problems known as linearly separable problems. One non-linearly separable problem that they proved the perceptron model incapable of solving is the now infamous exclusive-or function and its generalization, the parity detection problem. Rosenblatt did consider multilayer perceptron models, but at that time no learning algorithm to train such models was available.
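The linear-separability limitation is easy to demonstrate empirically: a Rosenblatt-style perceptron trained on AND (linearly separable) converges, while on exclusive-or no straight line can split the two classes, so no weight vector can ever classify all four points. The sketch below is our own illustration, not code from the book:

```python
# Rosenblatt-style perceptron training on two truth tables.
# On AND it converges; on XOR no linear decision boundary exists,
# so training can never fit all four points. Illustrative sketch.

def perceptron_fits(samples, lr=1.0, epochs=100):
    """Train a two-input threshold unit; return True if the trained
    unit classifies every sample correctly."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
            b += lr * (t - y)
    return all((1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) == t
               for x, t in samples)

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
print(perceptron_fits(AND))  # True  - linearly separable
print(perceptron_fits(XOR))  # False - not linearly separable
```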
This critique, coupled with Rosenblatt's death in a boating accident in 1971 [Masters 1993], cast doubt in the minds of research sponsors and researchers alike about the viability of developing practical applications from Artificial Neural Networks. Funds for ANN research dried up, and many researchers moved on to more conventional Artificial Intelligence technology. In the prologue to the reprint of ‘Perceptrons’, Minsky and Papert [1988, pp. vii-xv] justified their criticism of the perceptron model and their pessimism about the ANN field at that time by claiming that the redirection of research was “no arbitrary diversion but a necessary interlude”. They felt that more time was needed to develop adequate ideas about the representation of knowledge before the field could progress further. They further claimed that this diversion of resources brought about many new and powerful ideas in symbolic AI, such as relational databases, frames, and production systems, which in turn benefited many other research areas in psychology, brain science, and applied expert systems. They hailed the 1970s as a golden age of a new field of research into the representation of knowledge. Ironically, this signaled the end of the first period of ANN development and the beginning of the Dark Age of ANN research.
However, pockets of researchers such as David Rumelhart at UC San Diego (later at Stanford University), Stephen Grossberg at Boston University, Teuvo Kohonen in Finland, and Kunihiko Fukushima in Japan persisted with their research into Artificial Neural Networks. Their work came to fruition in the early 1980s, an era many deem the Renaissance period of ANNs. John Hopfield of the California Institute of Technology, a prominent scientist, presented a paper [Hopfield 1984] to the National Academy of Sciences on applying ANNs to the well-known ‘traveling salesman problem’. It was his ability to describe his work from the standpoint of a scientist, coupled with his credibility, that heralded the gradual re-acceptance of ANNs. Interest grew among researchers from a multitude of fields, from biologists to bankers and engineers to psychologists. The era culminated in the publication of the first of three volumes of the now-famous reference text on ANNs, ‘Parallel Distributed Processing’, by Rumelhart et al. [1986b]. The authors had proposed the ‘back-propagation’ learning algorithm in an earlier publication [1986a], and the text popularized it. The back-propagation algorithm overcame some of the pitfalls of the perceptron model pointed out by Minsky and Papert by allowing multi-layer perceptron models to learn. According to Ripley, the back-propagation algorithm was originally discovered by Bryson and Ho and by Werbos, but it did not gain prominence until it was rediscovered and popularized by Rumelhart et al. According to Eberhart and Dobbins, it is hard to overstate the effect the Parallel Distributed Processing (PDP) books had on neural network research and development.
They attribute the books' success in one sentence: “The books presented everything practical there was to know about neural networks in 1986 in an understandable, usable and interesting way; in fact, 1986 seemed to mark the point at which a ‘critical mass’ of neural network information became available”.
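As an illustrative sketch of back-propagation (not the original authors' code), the following trains a small multi-layer perceptron on the exclusive-or problem that defeated the single-layer perceptron. The network size, learning rate, epoch count, and hand-picked starting weights are all our own assumptions; the starting weights are chosen to break symmetry and land in a workable region, since XOR training from some random starts can stall:

```python
import math

# Back-propagation on a 2-input, 2-hidden-unit, 1-output sigmoid
# network learning XOR. Illustrative sketch with hand-picked settings;
# the third weight in each row is the bias.

sig = lambda z: 1.0 / (1.0 + math.exp(-z))

w_h = [[2.0, -2.0, -1.0], [-2.0, 2.0, -1.0]]  # hidden-layer weights
w_o = [2.0, 2.0, -1.0]                        # output-layer weights

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
lr = 0.5
for _ in range(5000):
    for x, t in data:
        # forward pass
        h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
        y = sig(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
        # backward pass: delta at the output, then at each hidden unit
        d_y = (t - y) * y * (1 - y)
        d_h = [d_y * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
        # gradient-descent weight updates
        w_o = [w_o[0] + lr * d_y * h[0],
               w_o[1] + lr * d_y * h[1],
               w_o[2] + lr * d_y]
        for j in range(2):
            w_h[j] = [w_h[j][0] + lr * d_h[j] * x[0],
                      w_h[j][1] + lr * d_h[j] * x[1],
                      w_h[j][2] + lr * d_h[j]]

def predict(x):
    h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    return round(sig(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2]))

preds = [predict(x) for x, _ in data]
print(preds)  # -> [0, 1, 1, 0]
```

The key step the perceptron lacked is the hidden-layer deltas: errors at the output are propagated backward through the output weights so that the hidden units, too, receive a training signal.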
The current era begins where the PDP books left off and has been called the Age of Neoconnectionism by Cowan and Sharp. In this era, there has been a growing number of commercial ANN applications, as well as continued prolific research interest in ANNs from a wide range of disciplines, as evidenced by the number of publications and conferences on ANNs. Sejnowski and Rosenberg's success with NETtalk, an ANN-based speech generation program that teaches itself to read aloud, and subsequent work by Martin on ANN-based handwriting recognition for reading ZIP codes for the US Post Office, spurred the prominence of ANNs as a potential tool for handling difficult tasks. Significant improvements in computer technology, together with the rapid reduction in the cost of high-powered computers, have made the development of ANN applications a universally attractive and affordable option.
ANNs were inspired by the biological sciences, particularly the neurological sciences, as discussed in the chronicle of their development above. However, their resemblance to their biological counterparts is limited to a few concepts borrowed from biological networks, mainly for their architecture. ANNs are still far from resembling the workings of even the simplest biological networks, owing to the enormous complexity of those networks.
The cells found in the human brain and nervous system are known as neurons. A neuron transmits information or signals outward, unidirectionally, through connections known as axons, and receives information through its dendrites. Neurons communicate with each other through synapses, which are gaps or junctions between the connections; the transmitting side of a synapse releases neurotransmitters that pair with the neuroreceptors on the receiving side. The human brain consists of around 100 billion neurons and over 10^14 synapses. Learning is usually done by adjusting existing synapses, though some learning and memory functions are carried out by creating new synapses. In the human brain, neurons are organized in clusters, and only several thousand to several hundred thousand participate in any given task. The figure shows a sample neurobiological structure of a neuron and its connections.
The axon is the output path of a neuron; it branches out through axon collaterals, which in turn connect to the dendrites, or input paths, of other neurons through junctions or gaps known as synapses. It is through these synapses that most learning is carried out, by either exciting or inhibiting the associated neuron's activity, although not all neurons are adaptive or plastic. Synapses release neurotransmitters according to the incoming signals, and these neurotransmitters excite or inhibit the associated neuron's activity. A biological neuron adds up all the activating signals and subtracts all the inhibiting signals arriving from its synapses; it sends a signal out along its axon only if the difference exceeds its threshold of activation.
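The summation-and-threshold behavior described above can be sketched as a simple McCulloch-Pitts-style unit, with excitatory synapses modeled as positive weights and inhibitory ones as negative. The specific weights and threshold here are illustrative only:

```python
# A McCulloch-Pitts-style threshold unit: sum weighted inputs
# (excitation positive, inhibition negative) and fire only if the
# net input exceeds the activation threshold. Illustrative sketch.

def neuron_fires(inputs, weights, threshold):
    """Return 1 (fire) if net excitation minus inhibition exceeds
    the threshold, else 0."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net > threshold else 0

# Two excitatory synapses (+1.0) and one inhibitory synapse (-1.5)
print(neuron_fires([1, 1, 0], [1.0, 1.0, -1.5], 1.5))  # -> 1 (fires)
print(neuron_fires([1, 1, 1], [1.0, 1.0, -1.5], 1.5))  # -> 0 (inhibited)
```

Note how a single active inhibitory synapse is enough to keep this unit below threshold even when both excitatory synapses are active, mirroring the subtractive behavior described above.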
The processing in the biological brain is highly parallel and also very fault tolerant. The fault tolerance comes from the neural pathways being highly redundant and from information being spread across synapses throughout the brain. This wide distribution of information also allows the neural pathways to deal well with noisy data.
A biological neuron is so complex that current supercomputers cannot fully model even a single one. Researchers have therefore simplified neuron models when designing ANNs.
Comparison of Conventional Computational Techniques
ANNs differ from conventional computational techniques in that the builder of an ANN is not required to write programs, and hence need not know a priori the rules or models required to perform the desired task. Instead, the builder trains the ANN to ‘learn’ from previous samples of data, much as a teacher would teach a child to recognize shapes, colors, letters, and so on. The ANN builds an internal representation of the data and, in doing so, ‘creates’ an internal model that can be applied to new data it has not seen before.
Existing computers process information serially, while ANNs process information in parallel. This is why, even though a neuron in the human brain transfers information in the millisecond (10^–3 s) range while current computer logic gates operate in the nanosecond (10^–9 s) range, about a million times faster, a human brain can still process a pattern recognition task much faster and more efficiently than the fastest currently available computer. The brain has approximately 10^11 neurons, and each of these neurons acts as a simple processor that processes data concurrently, i.e. in parallel. Tasks such as walking and cycling seem easy to humans once they have learned them, and certainly not much thought is needed to perform them thereafter. However, writing a conventional computer program to allow a robot to perform these tasks is very complex, owing to the enormous quantity of data that must be processed to cope with a constantly changing environment. These changes require frequent computation and dynamic real-time processing. A human child learns these tasks by trial and error. In learning to walk, for example, a child gets up, staggers, falls, and repeats these actions over and over until he or she has learned to walk. The child effectively ‘models’ the walking task in the brain through constant adjustment of synaptic strengths, or weights, until a stable model is achieved.
Humans (and neural networks) are very good at pattern recognition tasks. This explains why one can usually guess a tune after hearing just a few bars of it, or how a letter carrier can read a wide variety of handwritten addresses without much difficulty. In fact, people tend to associate their senses with their experiences. In the ‘Wheel of Fortune’ game show, for example, the contestants and viewers are usually able to guess a phrase correctly from only a few visible letters. The eyes take in the whole phrase, leaving the brain to fill in the missing letters and associate the result with a known phrase. If we were to process this information sequentially like a serial computer, i.e. look at one visible character at a time and try to work out the phrase, it would be very difficult. This suggests that pattern recognition tasks are easier to perform by looking at a whole pattern (which is akin to a neural network's parallel processing) than sequentially (as in a conventional computer's serial processing).
In contrast, tasks that involve many numerical computations are still done faster by computers, because most numerical computations can be reduced to binary representations that allow fast serial processing. Most of today's ANN programs are simulated on serial computers, which is why speed, specifically training time, is still a major issue for ANNs. A growing number of ANN hardware products are available on the market today, including personal computer-based ones such as Intel's Ni1000 and Electronically Trainable Artificial Neural Network (ETANN) chips, IBM's ZISC/ISA Accelerator for PC, and the BrainMaker Professional CNAPS™ Accelerator System. These devices process information in parallel, but their costs and the learning curves required to use them are still quite prohibitive. Most researchers are of the view that in the near future a special ANN chip will sit next to the more familiar CPU chip in personal computers, performing pattern recognition tasks such as voice and optical character recognition.
In this part, we have covered the biological background of ANNs and their analogy with biological networks. In the next part, we will dive deep into the construction and description of ANNs and analyze their strengths and weaknesses.
Published at DZone with permission of Jayesh Bapu Ahire, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.