
Deciphering Handwritten Numbers With a Neural Network


In this article, I will focus on comparing the predictions of my neural networks with different parameters to decipher handwritten numbers.


I finally got around to building a neural network (NN) using multilayer perceptrons (MLPs) to recognize handwritten numerals. This is a classic beginner's NN problem. In this article, I will focus on comparing the predictions of the NN under different parameters.

The Problem/Data

The MNIST database was used for the problem. MNIST has 70,000 different handwritten digits. The data is well arranged: each image is grayscale, fits in a 28x28 matrix, and is centered in the matrix — making this a "relatively" easy problem to tackle.
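To make the input format concrete, here is a minimal numpy sketch (my own illustration, using a random stand-in for a real MNIST image) of the preprocessing an MLP needs: flattening the 28x28 grid into one 784-element row and scaling the 0-255 pixel values down to 0-1.

```python
import numpy as np

# A stand-in for one MNIST digit: a 28x28 grid of 0-255 grayscale values.
# (Real data would come from a loader such as keras.datasets.mnist.)
image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten to a single 784-element row, since an MLP takes a 1-D input vector.
flat = image.reshape(-1)

# Scale pixel intensities from 0-255 down to 0-1 for training stability.
scaled = flat.astype(np.float32) / 255.0

print(flat.shape)  # (784,)
```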

As you can see from the serif on the 1, the digits vary slightly, and classifying them correctly is the gist of the problem. The NN views each image as a grayscale bitmap with pixel values from 0 to 255.

The NN Architecture

I have built a two-layer NN that takes 784 (28x28) input nodes. The 28x28 image is flattened into a single 784-element row because MLPs cannot handle multidimensional inputs. The NN has ten output nodes (0-9). The final activation function is a softmax, meaning it gives us the probability of a number being 0, 1, ... 9. The architecture looks like the following:
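To show what the softmax output layer actually does, here is a minimal numpy sketch (my own illustration, with made-up scores): it turns ten raw output scores into probabilities that sum to 1, and the largest probability is the predicted digit.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    # Subtract the max for numerical stability; this does not change the result.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Ten hypothetical raw output scores, one per digit class (0-9).
logits = np.array([1.2, 0.3, -0.5, 2.1, 0.0, 0.7, -1.3, 0.9, 0.1, 1.5])
probs = softmax(logits)

print(probs.sum())     # 1.0 (up to floating-point error)
print(probs.argmax())  # 3 -- the digit this NN would predict
```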

The Impact of the Hyper-Parameters

I ran about ten experiments to see the impact of the hyper-parameters on the prediction capabilities of the NN.

The boxes in green show the parameters that were tuned.

The accuracy w/o training is the accuracy before training the network. As you can see, without training, most NNs performed close to 10% — that is, the NN might as well be guessing, since a random guess among ten digits (0-9) is right with 1/10 probability.

The prediction on test data is how the NN performed on unseen data after training. This is the number we are after.

The validation error is how close the NN is to the actual answer on every run (or epoch). This is a key metric to observe if you want to make sure that the model is not overfitting. I think of overfitting as a student who gets the paper before the exam: extremely well prepared for that exam, but in trouble out in the real world.
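A simple way to spot overfitting from those per-epoch numbers, sketched below with made-up loss values (not results from the article): if training loss keeps falling while validation loss turns upward, the model has started memorizing the training set.

```python
# Hypothetical per-epoch losses (illustrative numbers only).
train_loss = [0.90, 0.55, 0.38, 0.27, 0.19, 0.13, 0.09]
val_loss   = [0.95, 0.62, 0.45, 0.41, 0.43, 0.48, 0.55]

def overfit_epoch(val_losses):
    """Return the first epoch where validation loss worsens past its best."""
    best = val_losses[0]
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
        elif loss > best:
            return epoch  # validation loss turned upward here
    return None  # no sign of overfitting

print(overfit_epoch(val_loss))  # 4: training keeps improving, validation does not
```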

Interesting Observations

  1. Increasing the batch size — the number of rows of data the NN digests at once — increases the speed of learning.
  2. Changing the activation function has a sizable impact on the accuracy of the NN. Sigmoids have fallen out of favor, and we can see that the NN lost about 1% accuracy with them.
  3. My hypothesis that I could increase the accuracy of the NN by increasing the number of nodes or the depth of the NN was disproven. I am not quite sure why.
  4. Changing the optimizer has the biggest impact on the performance of an NN. This isn't surprising at all. Optimizers are the functions that perform the gradient descent that converges to the solution, and gradient descent is what makes an NN work. Choosing an inefficient optimizer is going to nuke your results.
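To illustrate the last point, here is a toy sketch (my own illustration, not from the article) of plain gradient descent minimizing f(x) = x², whose minimum is at 0. An inefficient choice of step size — the simplest knob an optimizer tunes — either crawls or blows up entirely:

```python
def minimize(lr, steps=50, x=5.0):
    """Plain gradient descent on f(x) = x^2; the gradient is 2x."""
    for _ in range(steps):
        x -= lr * 2.0 * x
    return x

good = minimize(lr=0.1)    # converges quickly toward 0
tiny = minimize(lr=0.001)  # barely moves in 50 steps
huge = minimize(lr=1.1)    # diverges: every step overshoots further
```

Modern optimizers like the ones Keras ships (SGD with momentum, RMSprop, Adam) exist largely to avoid having to hand-tune this trade-off for every problem.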


Finding the right hyper-parameters for an NN is key to its performance, and finding them is a matter of experimentation rather than theory.

I am starting to enjoy building NNs with libraries such as Keras. Building even a simpler NN from scratch in plain Python was like getting a root canal! And I don't think I could have tackled the MNIST data that way.

I am beginning to be dazzled by NNs. If you had told me a week ago that I could sit down and write a program that recognizes handwritten numbers and predicts what they are, I wouldn't have believed you. And here, I started off by saying this problem was "relatively" easy to solve. Amazing!

Disclaimer: Most of the source code was provided by Udacity — I filled in the key NN architecture. Here is the complete source from Udacity.



Published at DZone with permission of
