4 Keys to Designing a Neural Network Model
From the layout of layers to optimization rules, pay attention to these elements.
Artificial neural networks (ANNs) are a commonly used tool in deep learning. In my earlier tutorial, you can learn what they are and their basic structure, and code a simple neural network with only one neuron. When you design your own neural networks, there are a number of considerations to take into account. This article describes a few.
Layout of Network Layers
Neural networks are rarely, if ever, as simple as one neuron; most have at least several layers. You will need hidden layers, that is, layers of artificial neurons between the input and output layers, if the data are to be separated in a non-linear fashion.
You might want to think about each hidden neuron as a linear classifier. The number of neurons in the first hidden layer should equal the number of lines you would need to draw to classify your data. The later hidden layers and the output layer connect the various linear classifiers.
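As a toy sketch of this idea (my own illustration, not from the original article), consider XOR: its two classes cannot be split with one line, but they can with two. A network whose hidden layer has exactly two neurons, each acting as one linear classifier, solves it. The weights below are hand-picked for clarity, and the step activation stands in for a trainable one:

```python
def step(z):
    # Heaviside step: 1 if the weighted sum is positive, else 0
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hidden layer: each neuron is one linear classifier (one line in 2-D).
    h1 = step(x1 + x2 - 0.5)   # line 1: fires when at least one input is on
    h2 = step(x1 + x2 - 1.5)   # line 2: fires only when both inputs are on
    # Output layer combines the two linear classifiers.
    return step(h1 - h2 - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))   # reproduces XOR: 0, 1, 1, 0
```

Two lines were needed to separate the XOR classes, so two hidden neurons suffice; the output neuron simply combines their verdicts.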
Activation Functions
In neural networks, an activation function determines an artificial neuron's output, given specific inputs. In my earlier tutorial, we used a sigmoid function, which has the advantage of forcing outputs into a specific range. Another advantage is that a sigmoid function is monotonic: the value order of the inputs is preserved in the outputs. A disadvantage of sigmoid functions is that learning can be slow, especially where the sigmoid curve is relatively flat.
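The two properties above can be checked directly. This sketch (my own, under the standard definition of the sigmoid) shows that outputs stay in (0, 1) and remain in input order, and that the gradient collapses in the flat tails, which is what slows learning there:

```python
import math

def sigmoid(z):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(z)
    return s * (1.0 - s)

# Monotonic: larger inputs always give larger outputs.
print(sigmoid(-1), sigmoid(0), sigmoid(2))

# The gradient peaks at z = 0 (0.25) and vanishes in the flat tails,
# so weight updates there are tiny and learning is slow.
print(sigmoid_grad(0))    # 0.25
print(sigmoid_grad(10))   # ~4.5e-5
```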
Another popular activation function is the rectified linear unit (ReLU). Its value is simply 0 if x is less than 0; otherwise, it is x. ReLUs enable faster learning, even though they create an arbitrary distinction between negative and positive values of x. Hyperbolic tangent (tanh) activation functions tend to be a happy medium between sigmoid and ReLU in their advantages and disadvantages.
Loss Functions
Loss is merely the prediction error of the neural network, determined in each pass-through. Ideally, it should be minimized. The loss function, which measures the discrepancy between the network's predicted values and the actual values, is used to update the weights of the neural network for the next pass-through. The calculation of the new weights is based in some way on the gradient, which represents the slope of the loss function at each point. Different types of loss functions should be used for different types of regression or classification tasks, as described in more detail here.
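A single gradient step makes this concrete. The sketch below (my own, using squared-error loss on one linear neuron) computes the loss, takes its gradient with respect to the weight, and moves the weight downhill; the loss after the update is smaller:

```python
def mse_loss(y_pred, y_true):
    # Squared-error loss for a single example
    return (y_pred - y_true) ** 2

w, b, lr = 0.5, 0.0, 0.1      # weight, bias, learning rate
x, y_true = 2.0, 3.0          # one training example

y_pred = w * x + b            # forward pass: 1.0
loss = mse_loss(y_pred, y_true)          # (1 - 3)^2 = 4.0

# Gradient of the loss w.r.t. w (slope of the loss at this point):
# dL/dw = 2 * (y_pred - y_true) * x = -8.0
grad_w = 2 * (y_pred - y_true) * x

w -= lr * grad_w              # step against the gradient: w becomes 1.3
new_loss = mse_loss(w * x + b, y_true)   # 0.16, down from 4.0
print(loss, new_loss)
```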
Optimizers
An optimizer is an algorithm or other method that updates the attributes of the neural network in order to minimize the loss. For example, it can account for the history of gradient updates, rather than only the gradient from a single set, or batch, of data samples. It may incorporate momentum: the newest update is a weighted average of all previous updates, with the weights on older updates decaying exponentially.
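The momentum idea can be sketched in a few lines (my own illustration of the classical momentum update, with hypothetical hyperparameter values). The velocity term is an exponentially decayed sum of past gradients, so repeated gradients in the same direction produce accelerating steps:

```python
def momentum_step(w, velocity, grad, lr=0.1, beta=0.9):
    # velocity accumulates gradient history; beta controls how fast
    # the contribution of older gradients decays
    velocity = beta * velocity + grad
    w = w - lr * velocity
    return w, velocity

w, v = 1.0, 0.0
for g in [1.0, 1.0, 1.0]:     # same gradient three batches in a row
    w, v = momentum_step(w, v, g)
    print(w, v)
# velocity grows 1.0 -> 1.9 -> 2.71: consistent gradients compound,
# so steps in a steady direction get larger instead of staying fixed
```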
These are some of the many considerations to keep in mind when developing your neural network model. Fortunately, many of the most common neural network features are supported by the major machine learning libraries, such as TensorFlow and PyTorch. So, as long as you are not doing anything too unusual, implementability should not be a concern.
Published at DZone with permission of Rebecca Sealfon. See the original article here.