
Machine Learning With Deeplearning4j and Eclipse Scout

Look at a simple system to recognize monetary amounts on Swiss payment slips and see how to build, train, and run the deep neural net using Deeplearning4j.

Machine learning, and deep learning in particular, is developing at amazing speed. Today, machine learning can be used to solve ever more complex tasks that were considered impractical just a few years ago. Examples include autonomous cars, AlphaGo’s win against the world’s Go champion, the photo-realistic transformation of pictures, and neural machine translation systems.

In this blog post, we describe a simple system to recognize monetary amounts on Swiss payment slips. The user interface is implemented using Eclipse Scout, and we build, train, and run the deep neural network using Deeplearning4j.

Recognizing Handwritten Amounts on Payment Slips

Anagnostes is an Eclipse Scout application that uses a convolutional neural network to recognize handwritten amounts on Swiss payment slips.

[Screenshot: the Anagnostes form with the scanned payment slip and the recognized amount]

The screenshot above shows an image of the scanned payment slip in its upper part; the lower part of the form shows the output of the neural network. The form is implemented in the class HcrForm.

Although all handwritten numerals are correctly recognized, the network assigns a low confidence score to the numeral six. This is indicated by the orange background, which prompts the operator to manually check the result and, if necessary, correct the output of the neural network in the user interface.

The Eclipse Scout Framework

The open-source framework Eclipse Scout has been built specifically for enterprise applications with the following goals in mind:

  • Enterprise users deserve simple and powerful user interfaces.
  • Implementing and maintaining business applications must be efficient.
  • Business applications should be independent of specific technologies.
  • Learning the framework should be painless.

Scout may be used for any type of business application, such as ERP, CRM, or medical data storage systems. As the demo application described in this blog post shows, innovative technologies such as machine learning are straightforward to integrate with Scout applications.

The framework has been proven in production for over a decade and is currently based on Java and HTML5. Since 2010, the Scout Open Source project has been hosted by the Eclipse Foundation.

The latest Scout release is shipped as part of the Eclipse Oxygen release train as of June 28, 2017.

Deeplearning4j: Machine Learning for Java

Deeplearning4j is a toolkit for building, training, and deploying neural networks. As of today, it is probably the most complete and mature deep learning library in the Java domain. The library also comes with good documentation and can easily be integrated with Java applications. For the example application described in this blog post, it is enough to add the following Maven dependencies.

<dependency>
	<groupId>org.deeplearning4j</groupId>
	<artifactId>deeplearning4j-core</artifactId>
	<version>${org.deeplearning4j.version}</version>
</dependency>
<dependency>
	<groupId>org.nd4j</groupId>
	<artifactId>nd4j-native-platform</artifactId>
	<version>${org.deeplearning4j.version}</version>
</dependency>
<dependency>
	<groupId>org.datavec</groupId>
	<artifactId>datavec-api</artifactId>
	<version>${org.deeplearning4j.version}</version>
</dependency>

As machine learning is always about models that need to be trained on some data and then applied to other data, we illustrate these steps using the Deeplearning4j library. Let’s start by constructing a new multi-layer network, as done in the class NeuralNetwork of the Anagnostes demo application.

/**
 * Build a network with initial (random) weights/parameters.
 */
public NeuralNetwork() {
 MultiLayerConfiguration configuration = LeNet.networkConfiguration();
 m_network = new MultiLayerNetwork(configuration);
 m_network.init();
}

For now, we skip the description of the network configuration object, as this is covered in more detail in the Network Architecture section below. We can then train this neural network model as follows.

/** 
 * Train the network for the specified number of epochs. 
 */
public void train(DataSetIterator trainData, DataSetIterator validationData, int epochs) {
 for (int epoch = 1; epoch <= epochs; epoch++) {

  // train the network using training data
  log.info("Starting epoch {}, samples: {}", epoch, trainData.numExamples());
  trainData.reset();
  m_network.fit(trainData);

  // evaluate performance using validation data
  validationData.reset();
  evaluate(validationData);
 }
}

The above method trains the neural network over several epochs (an epoch corresponds to one complete pass through the training data). In each epoch, the network's parameters are updated to improve its performance on the training data with the line m_network.fit(trainData). To verify the performance on data not seen during training, the network is evaluated after each epoch using separate validation data.
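The evaluate step itself is not shown above. Deeplearning4j provides an Evaluation class that makes it short to implement; the following is a sketch of what such a method might look like (the actual evaluate method of the NeuralNetwork class may differ in its details):

```java
/**
 * Sketch of the evaluation step: computes accuracy, precision, recall,
 * and a confusion matrix on the provided validation data.
 */
private void evaluate(DataSetIterator validationData) {
 Evaluation evaluation = m_network.evaluate(validationData);
 log.info(evaluation.stats());
}
```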

The trained model can then be used to classify new data. In our demo application, we want to recognize handwritten numerals. The code below takes an image as input and transforms the normalized image into an input vector for the network using Nd4j.create(normalizedImage). The network then classifies this input with the statement m_network.output(input), assigning a confidence value to each of the numeral classes ‘0’ to ‘9’. The confidence value for class 4, for example, can then be accessed with output.getDouble(4).

/** 
 * Recognize the numeral on the provided (normalized) image. The returned
 * array holds a confidence value for each of the numeral classes '0' to '9'.
 * Reconstructed sketch: the exact code in the Anagnostes sources may differ.
 */
public INDArray recognize(double[][] normalizedImage) {
 INDArray input = Nd4j.create(normalizedImage);
 return m_network.output(input);
}
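To turn these confidence values into a recognition result, the class with the highest confidence can be selected. The following fragment sketches this step; the variable names and the review threshold of 0.9 are our own illustrative choices, not taken from the Anagnostes sources.

```java
INDArray output = m_network.output(input);        // 1-by-10 row vector of confidences
int bestClass = Nd4j.argMax(output, 1).getInt(0); // index of the most likely numeral
double confidence = output.getDouble(bestClass);
// results below the threshold are flagged for manual review in the UI,
// as with the numeral six in the screenshot above
boolean needsReview = confidence < 0.9;
```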

Getting the Data: Handwritten Digits

Good data is always of central importance whenever we apply machine learning to a specific domain. For the sake of simplicity and comparability, we decided to go for the best-known task in the domain of machine learning: the classification of handwritten numerals. By far the most frequently used data collection is called the MNIST database. It contains roughly 60,000 images of numerals to train systems and 10,000 numerals to test systems.

[Image: sample numerals from the MNIST database]

The individual numerals in the MNIST database are normalized to 28 by 28 pixel gray-level images. The picture above shows some examples.
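As a side note, Deeplearning4j ships with a ready-made iterator for the MNIST data, so no manual download or parsing is needed. A minimal fragment; the batch size of 64 and the seed are arbitrary choices:

```java
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// training set (60,000 images) and test set (10,000 images)
DataSetIterator mnistTrain = new MnistDataSetIterator(64, true, 123);
DataSetIterator mnistTest = new MnistDataSetIterator(64, false, 123);
```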

For our demo application, we also wanted to experiment with our own data in addition to publicly available MNIST data. For the data collection, we asked people to fill in a simple form with their everyday writing style. See below for a picture of such a collection form.

[Image: the numeral collection form]

In a simple, semi-manual process, the scanned form is then converted into individual image files, each holding a single isolated numeral. In contrast to the MNIST data, the images of our numbers database are normalized for training and testing at runtime. For our experiments, we now have 10,000 digit images written by 20 individuals. As in the case of MNIST, our data is publicly available. Unlike the MNIST data, our scanned images are available in their original format (color or grayscale, whatever we received as contributions).

Side note: please consider contributing to this collection! Our next goal is to reach 20,000 images. We gladly accept pull requests containing at least the scan of your filled-in form (using the template).

Image Processing and Data Preparation

Before we can use the images of our handwritten numerals for training and/or recognition, we perform an image normalization step that converts the scanned numeral into the 28 by 28 gray-level pixel format used by the MNIST database. This normalization step is illustrated below.

[Image: normalization of a scanned numeral to the 28 by 28 MNIST format]

This normalization has the advantage that we can work with existing network architectures that have been extensively tested by the machine learning community. At the same time, it allows us to use the existing MNIST data to amend our own data collection.

To match the MNIST image format, the normalization process consists of the following steps:

  1. Binarize the color or gray-level image. This results in a black and white image.
  2. Crop the image to the bounding box of the numeral.
  3. Resize the cropped numeral to a 20 by 20 pixel box while preserving the aspect ratio.
  4. Calculate the center of gravity of the resized image.
  5. Center the image in a 28 by 28 pixel frame using the center of gravity calculated above.

Implementation details for this normalization can be found in the class ImageUtility. This utility class is subsequently used in NumbersDataFetcher and NumbersDatasetIterator for training and evaluating the neural network models of this demo application.
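To make the normalization steps concrete, the following self-contained sketch implements them with plain java.awt image operations. It is an illustration of the process only; the actual implementation in ImageUtility may differ (the class name ImageNormalizer and the luminance threshold of 128 are our own choices).

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/**
 * Illustrative sketch of the normalization steps described above.
 * The demo application's actual implementation lives in ImageUtility.
 */
public class ImageNormalizer {

 /** Normalizes an isolated numeral image to the 28 by 28 MNIST format. */
 public static BufferedImage normalize(BufferedImage src) {
  int w = src.getWidth();
  int h = src.getHeight();

  // binarize with a fixed luminance threshold to locate the numeral's bounding box
  int minX = w, minY = h, maxX = -1, maxY = -1;
  for (int y = 0; y < h; y++) {
   for (int x = 0; x < w; x++) {
    int rgb = src.getRGB(x, y);
    int lum = (((rgb >> 16) & 0xff) + ((rgb >> 8) & 0xff) + (rgb & 0xff)) / 3;
    if (lum < 128) {
     minX = Math.min(minX, x);
     maxX = Math.max(maxX, x);
     minY = Math.min(minY, y);
     maxY = Math.max(maxY, y);
    }
   }
  }
  if (maxX < 0) {
   // blank input: return an empty 28 by 28 frame
   return new BufferedImage(28, 28, BufferedImage.TYPE_BYTE_GRAY);
  }

  // crop to the bounding box and resize into a 20 by 20 box, preserving the aspect ratio
  int boxWidth = maxX - minX + 1;
  int boxHeight = maxY - minY + 1;
  double scale = 20.0 / Math.max(boxWidth, boxHeight);
  int scaledWidth = Math.max(1, (int) Math.round(boxWidth * scale));
  int scaledHeight = Math.max(1, (int) Math.round(boxHeight * scale));
  BufferedImage resized = new BufferedImage(scaledWidth, scaledHeight, BufferedImage.TYPE_BYTE_GRAY);
  Graphics2D g = resized.createGraphics();
  g.drawImage(src.getSubimage(minX, minY, boxWidth, boxHeight), 0, 0, scaledWidth, scaledHeight, null);
  g.dispose();

  // compute the center of gravity, weighting each pixel by its darkness
  double cx = 0, cy = 0, mass = 0;
  for (int y = 0; y < scaledHeight; y++) {
   for (int x = 0; x < scaledWidth; x++) {
    double darkness = 255 - (resized.getRGB(x, y) & 0xff);
    cx += darkness * x;
    cy += darkness * y;
    mass += darkness;
   }
  }
  cx = mass > 0 ? cx / mass : scaledWidth / 2.0;
  cy = mass > 0 ? cy / mass : scaledHeight / 2.0;

  // center the numeral in a 28 by 28 frame using the center of gravity
  BufferedImage result = new BufferedImage(28, 28, BufferedImage.TYPE_BYTE_GRAY);
  Graphics2D g2 = result.createGraphics();
  g2.drawImage(resized, (int) Math.round(14 - cx), (int) Math.round(14 - cy), null);
  g2.dispose();
  return result;
 }
}
```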

Neural Network Architecture

For the neural network architecture, we use a convolutional neural network similar to the one proposed by Yann Le Cun et al. in 1998. The architecture is illustrated in the diagram from Le Cun’s publication.

[Diagram: the LeNet architecture from Le Cun’s 1998 publication]

The architecture can be divided into a feature extraction stage (convolutional and subsampling layers) and a classification stage (the fully connected layers at the right end). The planes in the convolutional layers implement different filters that are applied to the input image. By applying subsampling and adding more convolutional layers, the network is capable of learning a set of filter combinations that prove to be highly effective for image classification. To learn more about convolutional network architectures, check out Denny Britz’s blog post.

The classification stage corresponds to the classical neural network architecture that has been around for over 30 years. Any neural network tutorial covering multilayer perceptrons will do to learn more.

The diagram of the network architecture in our demo application below should make clear that this implementation is very close to the LeNet architecture proposed in 1998.

[Diagram: the network architecture of the demo application]

This architecture is defined in the LeNet class of our demo application. The Deeplearning4j configuration for this architecture looks as follows.

/**
 * Provides the configuration for a convolutional neural network with 6 layers.
 * The network's architecture closely matches with the LeNet by Yann Le Cun.
 */
public static MultiLayerConfiguration networkConfiguration() {
 return new NeuralNetConfiguration.Builder()
  .seed(SEED).weightInit(WeightInit.XAVIER)
  .iterations(NUM_ITERATIONS)
  .regularization(true).l2(0.0005).learningRate(.01)
  .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
  .updater(Updater.NESTEROVS).momentum(0.9)
  .list()
  .layer(0, new ConvolutionLayer.Builder(5, 5)
   .stride(1, 1)
   .nIn(NUM_CHANNELS)
   .nOut(20)
   .activation(Activation.IDENTITY)
   .build())
  .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
   .kernelSize(2, 2)
   .stride(2, 2)
   .build())
  .layer(2, new ConvolutionLayer.Builder(5, 5).stride(1, 1)
   .nOut(50)
   .activation(Activation.IDENTITY)
   .build())
  .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
   .kernelSize(2, 2)
   .stride(2, 2)
   .build())
  .layer(4, new DenseLayer.Builder()
   .activation(Activation.RELU)
   .nOut(500)
   .build())
  .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
   .activation(Activation.SOFTMAX)
   .nOut(NUM_OUTPUTS)
   .build())
  .setInputType(InputType.convolutionalFlat(28, 28, 1))
  .backprop(true)
  .pretrain(false).build();
}

This might seem somewhat intimidating at first sight. But then again, this corresponds to the result of years of research. Luckily, the Deeplearning4j library comes with an extensive set of examples that provide valuable starting points for many different machine learning use cases.

Summary

This blog post describes a simple demo application to recognize numeral amounts on payment slips. The application has a user interface part implemented with the Eclipse Scout framework and a machine learning part implemented using the Deeplearning4j library.

Dealing with a task for which six layers are sufficient roughly corresponds to a deep learning “Hello World” exercise. At the same time, the described use case covers many of the recurring topics of machine learning problems. For more complex problems, it is not unusual to work with dozens or even over a hundred layers, as in the case of the ImageNet challenges.

In our experience, integrating Deeplearning4j with Eclipse Scout applications proved to be straightforward. If you’d like to play around with the demo application, clone the Anagnostes repository and import the project as an existing Maven project in your Eclipse IDE (please use the Scout package as described on the Scout homepage).

Published at DZone with permission of Matthias Zimmermann. See the original article here.
