Improving Neuroph Performance


After a recently published benchmark of the three leading Java neural network frameworks (Neuroph, Encog, and JOONE), we at Neuroph realized that we need to optimize Neuroph, a NetBeans Platform application, to provide better performance (faster training) and to support new technologies (multi-core CPUs, GPUs, and cluster computing). There is a discussion about all this on our forums.


What We Did

So we did some basic optimizations at the Java level and managed to improve Neuroph's performance significantly. We did the following:

  • Converted all Vectors to ArrayLists and arrays
  • Removed boxing for all double variables
  • Added an optimized implementation of the WeightedSum input function, since it was one of the bottlenecks we discovered using the NetBeans profiler
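To illustrate why the first two changes matter, here is a minimal sketch that contrasts a boxed, Vector based weighted sum with a primitive array version. The class and method names are made up for this example and are not the actual Neuroph code.

```java
import java.util.Vector;

/** Illustrative only: hypothetical names, not the real Neuroph WeightedSum class. */
class WeightedSumSketch {

    // Old style: synchronized Vector holding boxed Double values.
    static double weightedSumBoxed(Vector<Double> inputs, Vector<Double> weights) {
        Double sum = 0.0; // boxed accumulator: every += unboxes and re-boxes
        for (int i = 0; i < inputs.size(); i++) {
            sum += inputs.get(i) * weights.get(i);
        }
        return sum;
    }

    // Optimized style: primitive arrays, no boxing, no synchronization.
    static double weightedSum(double[] inputs, double[] weights) {
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] in = {1.0, 2.0, 3.0};
        double[] w = {0.5, 0.25, 2.0};
        Vector<Double> vin = new Vector<>();
        Vector<Double> vw = new Vector<>();
        for (int i = 0; i < in.length; i++) {
            vin.add(in[i]);
            vw.add(w[i]);
        }
        System.out.println(weightedSumBoxed(vin, vw)); // prints 7.0
        System.out.println(weightedSum(in, w));        // prints 7.0
    }
}
```

Both methods compute the same value. The second one avoids allocating a Double for every intermediate result and avoids the locking that each Vector call performs.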

In the benchmark below, this alone made Neuroph about 1.6 times faster (compare Neuroph 2.4 with the Neuroph 2.5 object implementation). Not bad, but we need more.

Next, we tried a matrix based implementation of the neural network (see package org.neuroph.contrib.matrixmlp). In this approach each layer consists of a few arrays which contain the data from all neurons and weights in array/matrix form. Bruce Wooton was the first to suggest this kind of implementation, based on UJMP (Universal Java Matrix Package). We also created an implementation based on plain Java arrays, and it turned out to be pretty fast. The good thing is that we were able to reuse the existing class hierarchies and architecture, and just provide a different implementation at the layer and learning rule level.
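As a rough idea of what a layer built on plain Java arrays can look like, here is a small sketch. It is not the code from org.neuroph.contrib.matrixmlp; the class name, the field layout, and the choice of a sigmoid transfer function are assumptions made for illustration.

```java
/** Hypothetical array based layer: all neuron data lives in a few arrays. */
class MatrixLayerSketch {
    final double[][] weights; // weights[j][i] connects input i to neuron j
    final double[] netInput;  // weighted sum for each neuron
    final double[] outputs;   // activation for each neuron

    MatrixLayerSketch(double[][] weights) {
        this.weights = weights;
        this.netInput = new double[weights.length];
        this.outputs = new double[weights.length];
    }

    /** Forward pass for the whole layer: one matrix-vector product plus sigmoid. */
    double[] forward(double[] inputs) {
        for (int j = 0; j < weights.length; j++) {
            double[] row = weights[j];
            double sum = 0.0;
            for (int i = 0; i < inputs.length; i++) {
                sum += row[i] * inputs[i];
            }
            netInput[j] = sum;
            outputs[j] = 1.0 / (1.0 + Math.exp(-sum));
        }
        return outputs;
    }

    public static void main(String[] args) {
        MatrixLayerSketch layer = new MatrixLayerSketch(new double[][] {
            {0.0, 0.0},
            {1.0, -1.0}
        });
        double[] out = layer.forward(new double[] {2.0, 2.0});
        System.out.println(out[0] + " " + out[1]); // prints 0.5 0.5
    }
}
```

The point of this layout is that a forward pass touches a handful of contiguous arrays instead of walking a graph of Neuron, Connection, and Weight objects.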

The Benchmark

We used the same benchmark as here. The benchmark tests how fast a Multi Layer Perceptron with Momentum Backpropagation can push random data forward and backward. The same kind of network and learning rule was used with Encog and JOONE. The benchmark assumes that all networks use the same training algorithm (so they need the same number of iterations to train a network), and it aims to determine which framework is fastest at learning/processing data.

Benchmark settings:

Input neurons 10
Hidden neurons 20
Output neurons 10
Training set size 10000
Number of iterations 50
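A timing loop for settings like these could be sketched as follows. This is not the benchmark code we used. The class is hypothetical, the epoch is only a placeholder where a real benchmark would run the forward and backward pass, and reporting in seconds is an assumption of the sketch.

```java
import java.util.Random;

/** Hypothetical timing harness; the epoch body is a stand-in for real training. */
class BenchmarkHarnessSketch {
    static final int INPUT_NEURONS = 10;       // from the settings above
    static final int TRAINING_SET_SIZE = 10000;
    static final int ITERATIONS = 50;

    /** Random rows in [0, 1), seeded so that every run sees the same data. */
    static double[][] randomRows(int rows, int cols, long seed) {
        Random rnd = new Random(seed);
        double[][] data = new double[rows][cols];
        for (double[] row : data) {
            for (int c = 0; c < cols; c++) {
                row[c] = rnd.nextDouble();
            }
        }
        return data;
    }

    /** Runs the epoch the given number of times and returns elapsed seconds. */
    static double timeSeconds(int iterations, Runnable epoch) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            epoch.run();
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        double[][] inputs = randomRows(TRAINING_SET_SIZE, INPUT_NEURONS, 42L);
        double[] sink = new double[1]; // keeps the JIT from discarding the loop
        // Placeholder epoch: a real benchmark would train on every row here.
        Runnable epoch = () -> {
            for (double[] row : inputs) {
                for (double v : row) {
                    sink[0] += v;
                }
            }
        };
        double seconds = timeSeconds(ITERATIONS, epoch);
        System.out.println(ITERATIONS + " iterations took " + seconds + " s (sink=" + sink[0] + ")");
    }
}
```

Generating the data before the timer starts keeps the measurement focused on the data flow through the network, which is what this benchmark is about.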

Hardware used:

CPU AMD Phenom II x4 965 3.4 GHz
Memory 4 GB
OS Win7 32-bit


The benchmark results are shown in the table and picture below.

Encog 2.4    Neuroph 2.5    Neuroph 2.5    JOONE 2RC1    Neuroph 2.4
             (matrix)       (object)
3.432        8.908          26.848         38.07         43.627
3.416        6.302          26.864         38.197        43.6
3.385        7.707          26.723         37.615        43.681
3.432        7.581          26.863         37.465        43.583
3.395        7.307          26.924         39.334        43.539

From the benchmark results above we see that:

  1. Neuroph 2.5 is faster than JOONE in both the matrix and object based implementations. The object based implementation takes about 30% less time, while the matrix based one is about 5 times faster.

  2. Encog is still the fastest, but the Neuroph matrix implementation is getting close to it. Encog is about 2 times faster than the matrix based Neuroph implementation, and about 8 times faster than the object based implementation.
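The ratios behind these statements can be recomputed from the table. The small program below is only a convenience sketch: the numbers are copied from the five runs in the table, and the class and method names are made up for this example.

```java
/** Recomputes per-framework means and speed ratios from the benchmark table. */
class BenchmarkRatiosSketch {
    static final double[] ENCOG_24 = {3.432, 3.416, 3.385, 3.432, 3.395};
    static final double[] NEUROPH_25_MATRIX = {8.908, 6.302, 7.707, 7.581, 7.307};
    static final double[] NEUROPH_25_OBJECT = {26.848, 26.864, 26.723, 26.863, 26.924};
    static final double[] JOONE_2RC1 = {38.07, 38.197, 37.615, 37.465, 39.334};
    static final double[] NEUROPH_24 = {43.627, 43.6, 43.681, 43.583, 43.539};

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** How many times faster the second framework is, based on mean run time. */
    static double ratio(double[] slower, double[] faster) {
        return mean(slower) / mean(faster);
    }

    public static void main(String[] args) {
        System.out.println("JOONE / Neuroph 2.5 object:   " + ratio(JOONE_2RC1, NEUROPH_25_OBJECT));
        System.out.println("JOONE / Neuroph 2.5 matrix:   " + ratio(JOONE_2RC1, NEUROPH_25_MATRIX));
        System.out.println("Neuroph 2.5 matrix / Encog:   " + ratio(NEUROPH_25_MATRIX, ENCOG_24));
        System.out.println("Neuroph 2.5 object / Encog:   " + ratio(NEUROPH_25_OBJECT, ENCOG_24));
        System.out.println("Neuroph 2.4 / Neuroph 2.5 object: " + ratio(NEUROPH_24, NEUROPH_25_OBJECT));
    }
}
```

Using the mean of the five runs smooths out the variation between runs, which is most visible in the matrix implementation column.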

Conclusion: Neuroph 2.5 brings a significant performance improvement over version 2.4, but it is still slower than Encog 2.4. It is also important to note that this benchmark did not use Encog's multi-core support, which makes Encog even faster (see this for more details). Neuroph still does not support multiple cores.

Neuroph Design and Performance Analysis

The Neuroph philosophy of being intuitive, easy to use, and built on a strong object model that corresponds to the domain model has been successful so far, but it is obvious that a price is paid in terms of performance. So now we need a way to keep the current architecture and, at the same time, improve performance.
We are going to achieve this by adding a new layer to the current architecture. This layer should provide high performance calculation while keeping the current API for end users. The current object model will be transformed into a corresponding high performance implementation under the hood. That way Neuroph should offer both a friendly, intuitive API and high performance. We are still discussing this in order to find the best solution, and the current matrix based implementation of MultiLayerPerceptron and MomentumBackpropagation (MatrixMultiLayerPerceptron and MatrixMomentumBackpropagation) is an example of this approach.

Figure: High performance architecture for Neuroph neural networks. Top: Neural Network Model Layer (rich OO API to create and manipulate neural networks). Bottom: high performance (matrix based?) calculation layer.
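One way to picture the two layers is the sketch below. It is an assumption about how such a design could look, not the planned Neuroph implementation, and all class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-layer design: an object model on top, plain arrays underneath. */
class LayeredArchitectureSketch {

    /** Model layer: one object per neuron, convenient to build and inspect. */
    static final class NeuronModel {
        final List<Double> inputWeights = new ArrayList<>();

        NeuronModel weight(double w) {
            inputWeights.add(w);
            return this;
        }
    }

    /** Calculation layer: the same weights packed into a matrix for fast loops. */
    static final class ArrayLayer {
        final double[][] weights;

        ArrayLayer(double[][] weights) {
            this.weights = weights;
        }

        double[] weightedSums(double[] inputs) {
            double[] sums = new double[weights.length];
            for (int j = 0; j < weights.length; j++) {
                for (int i = 0; i < inputs.length; i++) {
                    sums[j] += weights[j][i] * inputs[i];
                }
            }
            return sums;
        }
    }

    /** The "under the hood" step: translate the object model into array form once. */
    static ArrayLayer compile(List<NeuronModel> neurons) {
        double[][] w = new double[neurons.size()][];
        for (int j = 0; j < w.length; j++) {
            List<Double> src = neurons.get(j).inputWeights;
            w[j] = new double[src.size()];
            for (int i = 0; i < w[j].length; i++) {
                w[j][i] = src.get(i);
            }
        }
        return new ArrayLayer(w);
    }

    public static void main(String[] args) {
        List<NeuronModel> layer = new ArrayList<>();
        layer.add(new NeuronModel().weight(1.0).weight(2.0));
        layer.add(new NeuronModel().weight(-1.0).weight(0.5));
        double[] sums = compile(layer).weightedSums(new double[] {3.0, 4.0});
        System.out.println(sums[0] + " " + sums[1]); // prints 11.0 -1.0
    }
}
```

The conversion cost is paid once, when the network is built or changed, while the inner training loops only ever see primitive arrays.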

Why is Encog Fast?

The Encog architecture is already optimized for speed. It is layer based, which means that its basic building components are Layers, whereas in Neuroph the basic building components are Neurons (along with Connections and Weights). Encog mostly works with simple arrays, and it goes even further with something called FlatNetwork, which is a Multi Layer Perceptron converted into a few one-dimensional arrays. This makes all operations very fast compared to the basic layered Encog architecture, and even faster compared to the Neuroph object model. There is also an interesting solution in CalculateGradient, which is used to calculate gradients during learning and which supports multi-threading. Check out the source of these classes to get an idea of what is going on inside: BasicNetwork, BasicLayer, WeightedSynapse, BasicTraining, Propagation, Backpropagation, CalculateGradient, FlatNetwork, TrainFlatNetworkBackPropagation.
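The idea of flattening a Multi Layer Perceptron into one-dimensional arrays can be sketched like this. It is a simplified illustration and not Encog's FlatNetwork: the class name and array layout are assumptions, bias weights are left out, and a sigmoid activation is assumed.

```java
/** Hypothetical flat MLP: all weights of all layers in a single array. */
class FlatNetworkSketch {
    final int[] layerSizes;    // e.g. {10, 20, 10}
    final int[] weightOffsets; // where each layer's weights start in the flat array
    final double[] weights;    // layer after layer, one row per target neuron

    FlatNetworkSketch(int[] layerSizes, double[] weights) {
        this.layerSizes = layerSizes;
        this.weights = weights;
        this.weightOffsets = new int[layerSizes.length - 1];
        int offset = 0;
        for (int l = 0; l < weightOffsets.length; l++) {
            weightOffsets[l] = offset;
            offset += layerSizes[l] * layerSizes[l + 1];
        }
        if (offset != weights.length) {
            throw new IllegalArgumentException("expected " + offset + " weights");
        }
    }

    /** Forward pass that walks the flat weight array sequentially. */
    double[] compute(double[] input) {
        double[] current = input;
        for (int l = 0; l < weightOffsets.length; l++) {
            int in = layerSizes[l];
            int out = layerSizes[l + 1];
            double[] next = new double[out];
            int w = weightOffsets[l];
            for (int j = 0; j < out; j++) {
                double sum = 0.0;
                for (int i = 0; i < in; i++) {
                    sum += weights[w++] * current[i];
                }
                next[j] = 1.0 / (1.0 + Math.exp(-sum));
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        // 2 inputs, 2 hidden neurons, 1 output: 2*2 + 2*1 = 6 weights.
        FlatNetworkSketch net = new FlatNetworkSketch(
            new int[] {2, 2, 1},
            new double[] {0.0, 0.0, 0.0, 0.0, 1.0, -1.0});
        System.out.println(net.compute(new double[] {3.0, 4.0})[0]); // prints 0.5
    }
}
```

Because the weights are read strictly in order, the inner loop has a predictable memory access pattern, which is a large part of why this kind of layout is fast.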

What's Next To Do?

  1. We will continue to optimize existing code, both matrix and object based implementations, until we reach the best possible performance, while preserving the current architecture.

  2. We need to add multi core, GPU and clustering support.

  3. We have to develop new improved algorithms such as ResilientBackpropagation, QuickPropagation, batch mode Backpropagation, Delta bar delta, etc.

  4. We need to improve the existing benchmarking code and do more benchmarking for specific learning rules and data sets. These benchmarks will help us make a real performance comparison, since the existing benchmark just measures data flow speed. Also, we have some preliminary results which show that the recently published benchmark results and comparison (http://www.codeproject.com/KB/recipes/benchmark-neuroph-encog.aspx and http://www.codeproject.com/KB/recipes/xor-encog-neuroph-joone.aspx) can vary for different benchmark parameters, but we need to investigate this in more detail.


Download the full NetBeans projects for the development version of Neuroph 2.5 alpha and the benchmarking code below. If you want to play with version 2.5a and experiment with the benchmark, make sure the benchmark project references the appropriate project/jar.


