Learning Deep Learning: A Tutorial on KNIME Deeplearning4J Integration
Are you ready to take your first steps with deep learning? In this in-depth tutorial, learn how to create a simple deep learning network for image recognition.
The aim of this blog post is to highlight some of the key features of the KNIME Deeplearning4J (DL4J) integration and help newcomers to either deep learning or KNIME to be able to take their first steps with deep learning in KNIME Analytics Platform.
If you’re new to KNIME, here is a link to get familiar with the KNIME Analytics Platform. If you’re new to Deep Learning, there are plenty of resources on the web, but this link and this link worked well for me. If you are new to the KNIME nodes for deep learning, you can read more in the relevant section of the Node Guide.
With a little bit of patience, you can run the example provided in this blog post on your laptop since it uses a small dataset and only a few neural net layers.
We will use the MNIST dataset. The MNIST dataset consists of handwritten digits from 0 to 9 as 28x28 pixel grayscale images. There is a training set of 60,000 and a test set of 10,000 images. The data are available here.
Our workflow downloads the datasets, decompresses them, and converts them to two CSV files: one for the training set and one for the test set. We then read in the CSV files, convert the image content to KNIME image cells, and then use the KNIME DL4J nodes to build a variety of classifiers to predict which number is present in each image.
We aim for an accuracy above 95%, in line with the error rates listed in the original article by LeCun et al.
The workflows built throughout this blog post are available on the KNIME EXAMPLES Server under 04_Analytics/14_Deep_Learning/14_MNIST-DL4J-Intro. This workflow group consists of:
- The workflow named DL4J-MNIST-LeNet-Digit-Classifier.
- The “data” workflow group to contain the downloaded files.
- The “metanodes” workflow group, which contains the three metanode templates used in the example.
The workflow named DL4J-MNIST-LeNet-Digit-Classifier (Figure 1) actually consists of two workflows: the top one uses a simpler net architecture, while the lower one uses a five-layer net architecture.
We ran the whole experiment using a KNIME Cloud Analytics Platform running on an Azure NC6 instance. Of course, you can equally run it on an Amazon Cloud p2.xlarge instance or your local machine.
Beware: This workflow could take around an hour to run depending on whether you have a fast GPU and how powerful your machine is!
Here are the tools, extensions, and more that you'll need.
- KNIME Analytics Platform 3.3.1 (or greater) on your machine, KNIME Cloud Analytics Platform on Azure Cloud, or KNIME Cloud Analytics Platform on AWS Cloud
- Python 2.7.x configured for use with KNIME Analytics Platform
- KNIME Deeplearning4J extension from KNIME Labs Extensions/KNIME Deeplearning4J Integration (64-bit only)
- KNIME Image Processing extension from the KNIME Community Contributions: Image Processing and Analysis
- KNIME Image Processing: Deep Learning 4J Integration
- Vernalis KNIME Nodes from KNIME Community Contributions: Cheminformatics
- KNIME File Handling Nodes and KNIME Python Integration from KNIME and Extensions
If you are running KNIME Analytics Platform on your machine:
- KNIME Image Processing: Deeplearning4J Integration from the Stable Community Contributions update site (note that this update site must be manually enabled)
Optionally, if you have GPUs:
Importing the Image Data
Often when working with images, it is possible to read them directly in KNIME Analytics Platform from standard formats like PNG, JPG, or TIFF. Unfortunately for us, the MNIST dataset is only available in a non-standard binary format. Luckily, it is straightforward to download the dataset and convert the files to a CSV format that can be easily read into KNIME.
The data import is handled by the Download dataset and convert to CSV metanode. Here, the data files are downloaded from the LeCun website and written to the folder named data, which is contained in the workflow group 14_MNIST-DL4J-Intro that you have downloaded from the EXAMPLES server.
To extract the pixel values for each image, we use a Python Source node to read the binary files and write two CSV files, one for the training set (mnist_train.csv) and one for the test set. We have implemented an IF statement that only downloads and converts the files if the CSV files do not already exist; there’s no sense doing that download twice!
Figure 2: Content of the metanode named "Download dataset and convert to CSV," which downloads the data files and writes them to a local "data" folder, then through a Python Source node converts the binary pixel content into a CSV file.
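The binary-to-CSV conversion described above can be sketched in a few lines of Python. This is a minimal sketch rather than the script inside the Python Source node: it assumes the publicly documented IDX layout of the MNIST files (a big-endian header with magic number 2051 for images and 2049 for labels, followed by one unsigned byte per value), it is written for Python 3, and the function names are hypothetical. The demo feeds it two tiny synthetic 2x2 "images" instead of the real download.

```python
import csv
import io
import struct


def read_idx_images(stream):
    """Parse an IDX image file: 16-byte big-endian header, then one byte per pixel."""
    magic, count, rows, cols = struct.unpack(">IIII", stream.read(16))
    if magic != 2051:
        raise ValueError("not an IDX image file (magic number %d)" % magic)
    size = rows * cols
    # Each image becomes a flat list of pixel intensities (0-255).
    return [list(stream.read(size)) for _ in range(count)]


def read_idx_labels(stream):
    """Parse an IDX label file: 8-byte big-endian header, then one byte per label."""
    magic, count = struct.unpack(">II", stream.read(8))
    if magic != 2049:
        raise ValueError("not an IDX label file (magic number %d)" % magic)
    return list(stream.read(count))


def write_csv(images, labels, out):
    """Write one row per image: the label first, then the flattened pixel values."""
    writer = csv.writer(out)
    for label, pixels in zip(labels, images):
        writer.writerow([label] + pixels)


# Synthetic stand-in for the real files: two 2x2 images with labels 7 and 3.
image_bytes = struct.pack(">IIII", 2051, 2, 2, 2) + bytes([0, 255, 10, 20, 1, 2, 3, 4])
label_bytes = struct.pack(">II", 2049, 2) + bytes([7, 3])

images = read_idx_images(io.BytesIO(image_bytes))
labels = read_idx_labels(io.BytesIO(label_bytes))

buffer = io.StringIO()
write_csv(images, labels, buffer)
csv_text = buffer.getvalue()
```

With the real files, the streams would come from the decompressed downloads, and each CSV row would hold one label followed by 784 pixel values.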
Now, we have numerical columns representing images. In the wrapped metanode named Normalize images (train) (Figure 3), a File Reader reads the numerical columns and normalizes them with a Normalizer node.
The conversion back into binary images is obtained via the Data Row to Image node from the KNIME Image Processing extension. The Deeplearning4J (DL4J) integration in KNIME can handle numbers, strings, collections, or images (when the KNIME Image Processing: Deep Learning 4J extension is installed from the stable community update site) as input features.
It is important to randomize the order of the input rows in order to not bias the model training with the input sequence structure. For that, we used the Shuffle node.
Figure 3: The content of the "Normalize images (train)" wrapped metanode. Notice the execution in streaming mode and the transformation output port for the Normalization model.
The normalization model produced by the Normalizer node is exported from the Wrapped Metanode. We do this so that we can re-apply the same normalization to the test dataset in the wrapped metanode named Normalize images (test).
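The idea of fitting the normalization on the training set and re-applying the same model to the test set can be illustrated as follows. This is only a sketch: it assumes min-max scaling to the range 0 to 1 (the post does not say which Normalizer mode was used), the function names are hypothetical, and the toy rows stand in for the pixel columns. It also shuffles the training rows, as the Shuffle node does.

```python
import random


def fit_min_max(rows):
    """Learn a per-column (min, max) pair from the training rows only."""
    columns = list(zip(*rows))
    return [(min(column), max(column)) for column in columns]


def apply_min_max(rows, model):
    """Scale each column with a previously fitted model; constant columns map to 0."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - low) / (high - low) if high > low else 0.0
            for value, (low, high) in zip(row, model)
        ])
    return scaled


# Toy data standing in for pixel columns.
train = [[0, 10], [255, 30], [128, 20]]
test = [[64, 40]]

model = fit_min_max(train)               # fitted on the training set only
train_norm = apply_min_max(train, model)
test_norm = apply_min_max(test, model)   # same model re-applied to the test set

# Randomize the row order so training does not depend on the input sequence.
random.Random(42).shuffle(train_norm)
```

Because the model comes from the training data alone, test values can fall outside the 0 to 1 range (here the second test column scales to 1.5), which is expected behavior rather than a bug.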
First Try: A Simple Network
In addition to the typical KNIME Learner/Predictor schema, the DL4J Learner node requires a network architecture as input for the learning process (Figure 4 vs. Figure 5).
Figure 4: Classic Learner/Predictor schema in KNIME Analytics Platform. First the Learner, then the Predictor. That is all you need.
Figure 5: Deep Learning Learner/Predictor schema. The Learner node also requires a neural network architecture as input.
There are two ways to define a network architecture:
- Select one from some well-known pre-built network architectures available under KNIME Labs/Deep Learning/Networks in the Node Repository.
- Build your own neural architecture from scratch.
Figure 6: Deep Learning/Networks sub-category contains a number of pre-built commonly used deep learning architectures.
Since we are experimenting, we will build our own network. We’ll start with a toy network.
We start with the DL4J Model Initializer node. We don’t need to set any options for this node. Next, we introduce the Dense Layer. This node does have options to set, but let’s stick with the defaults for now. These create the output layer with only one output unit to represent the numbers from 0 to 9, the ReLU activation function, random weight initialization according to the XAVIER strategy, and a learning rate of 0.1. We have created the simplest possible (and not very deep!) neural network.
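To make the terms concrete, here is a rough sketch of what a dense layer with ReLU activation and Xavier-style initialization computes. It is not the DL4J implementation: the Gaussian variant of Xavier initialization (variance 2 / (inputs + outputs)) is an assumption, the function names are hypothetical, and the input is random noise standing in for one flattened 28x28 image.

```python
import math
import random


def xavier_weights(n_in, n_out, rng):
    """Draw weights from a zero-mean Gaussian with variance 2 / (n_in + n_out)."""
    std = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def relu(value):
    """Rectified linear unit: negative inputs become zero."""
    return max(0.0, value)


def dense_forward(inputs, weights, biases):
    """Each output unit is a weighted sum of all inputs plus a bias, passed through ReLU."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]


rng = random.Random(0)
n_in = 28 * 28   # one input per pixel
n_out = 1        # a single unit, mirroring the default described above

weights = xavier_weights(n_in, n_out, rng)
biases = [0.0] * n_out
pixels = [rng.random() for _ in range(n_in)]   # stand-in for a normalized image

activations = dense_forward(pixels, weights, biases)
```

A single unit has to squeeze all ten digit classes through one number, which gives some intuition for why this toy network struggles.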
Now, we can link our training set and the simple neural network architecture to the DL4J Feedforward Learner (Classification) node. This learner node needs configuration.
The configuration window of the DL4J Feedforward Learner (Classification) node is somewhat complex since it requires settings in five configuration tabs: Learning Parameter, Global Parameter, Data Parameter, Output Layer Parameter, and Column Selection. There are many options available to set, and the Deep Learning 4J website has some nice hints to help people get started.
The first two tabs, Learning Parameter and Global Parameter, define the learning parameters used to train our network. Here, since we are just getting started at this stage, I accept the default options. The defaults are to use Stochastic Gradient Descent for optimizing the network. Nesterovs is the updater, with a momentum of 0.9. We don’t set any global parameters, which would override those parameters set in the individual network layer node configuration dialogs. We’ll work on tuning the learning parameters in the LeNet workflow that we’ll get to shortly.
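For readers who have not met the Nesterovs updater before, the following sketch shows one common formulation of gradient descent with Nesterov momentum on a toy one-dimensional problem. It is an illustration under assumptions: DL4J's exact update rule may be written differently, the learning rate of 0.1 is picked only for the demo, and the gradient here is exact, whereas in real training it is estimated from a mini-batch.

```python
def gradient(x):
    """Gradient of the toy loss f(x) = (x - 3)^2, whose minimum is at x = 3."""
    return 2.0 * (x - 3.0)


def nesterov_sgd(start, learning_rate=0.1, momentum=0.9, steps=200):
    """Gradient descent with Nesterov momentum.

    The gradient is evaluated at a look-ahead point (the current position plus
    the momentum step) rather than at the current position itself.
    """
    x = start
    velocity = 0.0
    for _ in range(steps):
        lookahead = x + momentum * velocity
        velocity = momentum * velocity - learning_rate * gradient(lookahead)
        x += velocity
    return x


minimum = nesterov_sgd(start=0.0)
```

The momentum of 0.9 means each update keeps 90% of the previous step's direction, which smooths out noisy mini-batch gradients.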
The third tab, Data Parameter, defines how data is used to train the model. Here, I set Batch Size to 128, Epochs to 15, and Image Size to 28,28,1. Batch size defines the number of images that are passed through the network and used to calculate the error before updating the network. Larger batch sizes mean a longer wait between updates, but also give the possibility of learning more information with each iteration. Epochs describe the number of full passes made over the dataset; the choice here can help to guard against under- or over-fitting of the data. The image size is the size of the image in pixels (x,y,z).
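The relationship between batch size, epochs, and the number of network updates is simple arithmetic, sketched below. The helper name is hypothetical, and the sketch assumes that a final, smaller batch is still used for an update.

```python
import math


def updates_per_epoch(n_examples, batch_size):
    """Number of parameter updates in one full pass, counting a final partial batch."""
    return math.ceil(n_examples / batch_size)


n_train = 60000     # size of the MNIST training set
batch_size = 128
epochs = 15

per_epoch = updates_per_epoch(n_train, batch_size)       # 469 updates per epoch
last_batch = n_train - (per_epoch - 1) * batch_size      # 96 images in the final batch
total_updates = per_epoch * epochs                       # 7035 updates over 15 epochs
```

Halving the batch size would roughly double the number of updates per epoch, at the cost of noisier error estimates for each update.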
Figure 7: The "Data Parameter" tab in the configuration window of the DL4J Feedforward Learner (Classification) node. Batch Size defines the number of images used for each network parameter update and is set to 128 as a trade off between accuracy and speed. Epochs are set to 3 since, in this case, we want the example to run for only a short time. Image Size defines the size of the image on the x, y, z axis in number of pixels.
The Column Selection tab contains all information about input columns and target column. The target column is set to column Target, which contains the number represented in the image. The image column named AggregatedValues is used as the input feature.
Figure 8: "Column Selection" tab in the configuration window of the DL4J Feedforward Learner (Classification) node. This tab sets the target column and the input columns.
Finally, connecting up the corresponding Predictor and the Scorer nodes, we can test the model quality (see upper workflow in Figure 1).
To train our first not-so-deep learning model, we need to execute the DL4J Feedforward Learner (Classification) node. The execution of this node can take some time (probably more than 10 minutes). However, it is possible to monitor the learning progress, and even to terminate it early if a suitable model has already been reached. Right-clicking DL4J Feedforward Learner (Classification) and selecting View: Learning Status from the context menu displays a window showing the current training epoch and the corresponding Loss (=Error) calculated on the whole training set (Figure 9). If the loss is low enough for our purpose, or if we have become impatient, we can hit the Stop Learning button to stop the training process.
Once the calculation is complete, you can execute the Scorer node to evaluate the model accuracy (Figure 10).
Figure 9: Learning Status window for the DL4J Feedforward Learner (Classification) node. This window is opened by right-clicking the node and selecting the option "View: Learning Status." Here, you can monitor the learning progress of your deep learning architecture. You can also stop it at any moment by hitting the “Stop Learning” button.
Figure 10: Confusion Matrix and Accuracy of the single-layer neural network trained on the number data set to recognize numbers in images. Notice the disappointing ~35% Accuracy. A single-layer network is not enough?
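The confusion matrix and accuracy reported by the Scorer node can be computed as in the sketch below. The function names and the tiny three-class example are hypothetical; rows are the actual classes and columns the predicted ones.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count how often each actual class (row) was predicted as each class (column)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def accuracy(matrix):
    """Share of predictions on the diagonal, that is, predicted class equals actual class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy example with three classes instead of ten digits.
actual = [0, 1, 2, 2, 1, 0]
predicted = [0, 2, 2, 2, 1, 1]

matrix = confusion_matrix(actual, predicted, n_classes=3)
score = accuracy(matrix)   # 4 of 6 predictions are correct
```

The off-diagonal cells show which digits get mistaken for which, which is often more informative than the single accuracy figure.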
Did you notice the accuracy just a little above 35%? That was a little disappointing! But not entirely unexpected. We didn’t spend any time optimizing the input parameters since we’re not aiming to evaluate what the optimal network architecture is; rather, we're aiming to see how easy it is to reproduce one of the better-known complex architectures. Deep learning networks often require several layers and careful optimization of input parameters. So in order to go a bit deeper, in the next section, we’re going to take the LeNet network that has been pre-packaged in the Node Repository and use that.
Something Closer to What Is Described by LeCun et al.
We can quickly import a well-known architecture that has been shown to work well for this problem by dragging and dropping the LeNet metanode from the Node Repository into the workspace. Double-clicking LeNet lets us take a look at the network topology. We see that there are now five layers defining the network (Figure 11).
The process of building the network architecture again starts with a DL4J Model Initializer node, which requires no settings. We then add a Convolution Layer (which slides a filter of a defined size across the image and computes a convolution at each position), a Pooling Layer (which reduces the spatial size of the representation, in this case halving the resolution at each application), then again a Convolution Layer, a Pooling Layer, and a Dense Layer (in which each neuron is fully connected to all outputs of the previous layer). The result is a five-layer neural network with mixed types of layers.
Figure 11: LeNet neural network architecture as built in the "LeNet" metanode.
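To see how the convolution and pooling layers shrink the image, the sketch below traces the spatial size of a 28x28 input through two convolution/pooling pairs. The kernel size (5x5, stride 1, no padding) and the pooling window (2x2, stride 2) are assumptions borrowed from the classic LeNet design; the settings inside the KNIME LeNet metanode may differ, and the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Spatial size after pooling along one axis; a 2x2 window with stride 2 halves it."""
    return (size - window) // stride + 1


size = 28            # MNIST images are 28x28 pixels
trace = [size]
for layer in ["conv", "pool", "conv", "pool"]:
    if layer == "conv":
        size = conv_output_size(size, kernel=5)   # assumed 5x5 filter
    else:
        size = pool_output_size(size)
    trace.append(size)

# trace is now [28, 24, 12, 8, 4]
```

Under these assumptions, the dense layer at the end receives a stack of 4x4 feature maps rather than the raw 784 pixels.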
Finally, we make a few more changes in order to closely match the parameters originally described in the article by LeCun et al. That means setting the learning rate to 0.001 in the DL4J Feedforward Learner (Classification) node. The Output Layer parameters are 0.1 learning rate, XAVIER weight initialization, and Negative Log Likelihood loss function.
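The Negative Log Likelihood loss mentioned above can be illustrated with a short sketch. It assumes the output layer produces class probabilities through a softmax, which is a common pairing but is not stated in this post, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (shifted for numerical stability)."""
    shift = max(scores)
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def negative_log_likelihood(probabilities, target):
    """Loss for one example: minus the log of the probability given to the true class."""
    return -math.log(probabilities[target])


# An undecided network spreads probability evenly over the ten digits.
uniform_loss = negative_log_likelihood(softmax([0.0] * 10), target=3)

# A network that strongly favors the correct digit gets a much smaller loss.
confident_scores = [0.0] * 10
confident_scores[3] = 8.0
confident_loss = negative_log_likelihood(softmax(confident_scores), target=3)
```

For ten equally likely classes the loss equals ln(10), about 2.30, so a loss well below that value in the Learning Status view suggests the network has learned something beyond guessing.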
Evaluating the results, we can clearly see that adding the layers and tweaking the parameters has made a huge difference in the results. We can now predict the digits with 98.71% accuracy!
Figure 12: Confusion Matrix and Accuracy of a neural network shaped according to the LeNet architecture, that is, with five layers of mixed types in the network architecture. The network is trained again on the number data set to recognize numbers in images. Now, we get almost 99% accuracy. This is much closer to the performance obtained by LeCun et al.
Deep learning is a very hot topic in machine learning at the moment, and there are many, many possible use cases. However, you’ll need to spend some time to find the right network topology for your use case and the right parameters for your model. Luckily, the KNIME Analytics Platform interface for DL4J makes setting those models up straightforward.
What’s more, the integration with KNIME Image Processing allows you to apply Deep Learning to image analysis, and using the power of GPUs in the cloud, it might not take as long as you think to get started.
Perhaps more importantly, it is also easy to deploy those models using the WebPortal functionality of the KNIME Server, but that discussion is for another blog post…
- The workflow used for this blog post is available on the KNIME EXAMPLES Server at 04_Analytics/14_Deep_Learning/14_MNIST-DL4J-Intro
Published at DZone with permission of Jon Fuller, DZone MVB. See the original article here.