Predicting Cancer Type With KNIME Deep Learning and Keras
In my previous blog post, Learning Deep Learning, I showed how to use the KNIME Deep Learning/DL4J Integration to recognize handwritten digits in images from the MNIST dataset. That's a neat trick, but it's a problem that has been pretty well-solved for a while. What about trying something a bit more difficult? In this post, I'll take a dataset of images from three different subtypes of lymphoma and classify each image into the (hopefully) correct subtype.
KNIME Deep Learning/Keras Integration brings new deep learning capabilities to KNIME Analytics Platform. You can now use the Keras Python library to take advantage of a variety of different deep learning backends. The new KNIME nodes provide a convenient GUI for training and deploying deep learning models while still allowing model creation/editing directly in Python for maximum flexibility.
The workflows mentioned in this post require a fairly heavy amount of computation (and waiting), so if you're just looking to check out the new integration, see the simple workflow here that recapitulates the results of the previous blog post using the new Keras integration. There are quite a few more example workflows for both DL4J and Keras, which can be found in the relevant section of the Node Guide.
Right, back to the challenge. Malignant lymphoma affects many people, and among malignant lymphomas, CLL (chronic lymphocytic leukemia), FL (follicular lymphoma), and MCL (mantle cell lymphoma) are difficult for even experienced pathologists to classify accurately. A typical task for a pathologist in a hospital would be to look at those images and make a decision about what type of lymphoma is present. In many cases, follow-up tests to confirm the diagnosis are required. An assistive technology that can guide the pathologist and speed up their job would be of great value. Freeing up the pathologist to spend their time on the tasks that computers can't do so well has obvious benefits for the hospital, the pathologist, and the patients.
Figure 1: The modeling process adopted to classify lymphoma images. At each stage, the required components are listed.
Getting the Dataset
Since I only have my laptop, and Keras runs much faster on a modern GPU, I'll be logging into KNIME Analytics Platform hosted on an Azure N-series GPU instance. You could, of course, do the same using an AWS equivalent or a fast GPU in your workstation. Full details on how to configure the KNIME Deep Learning/Keras integration are available here.
The full dataset is available as a single tar.gz file containing 374 images. I created a workflow that downloads the file and extracts the images into three sub-directories — one for each lymphoma type. Finally, I created a KNIME table file that stores the path to the image files and labels the image according to the class of lymphoma.
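Outside of KNIME, the last step (a table of image paths and class labels) could be sketched in plain Python roughly as follows. This is only an illustration, not the workflow itself: the folder names CLL, FL, and MCL, the accepted file extensions, and the function name build_image_table are all assumptions of mine.

```python
from pathlib import Path

# Assumed folder names, one per lymphoma subtype mentioned in the post.
CLASS_DIRS = ("CLL", "FL", "MCL")


def build_image_table(root, extensions=(".tif", ".png", ".jpg")):
    """Return a list of (image_path, label) rows, one per image file found."""
    rows = []
    for label in CLASS_DIRS:
        class_dir = Path(root) / label
        if not class_dir.is_dir():
            continue  # tolerate a missing class folder rather than crash
        for path in sorted(class_dir.iterdir()):
            # Keep only files that look like images (extension list is assumed).
            if path.suffix.lower() in extensions:
                rows.append((str(path), label))
    return rows
```

Each row plays the role of one line in the KNIME table: where the image lives, and which subtype it belongs to.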
Preparing the Images
Each image measures 1388 by 1040px, and the information required to determine the classification is a property of the whole image (i.e., it doesn't rely only on individual cells, as it can in some image classification problems). In the next step, we'll use the VGG16 CNN to train a classifier. In this workflow the network takes 64 by 64px inputs, so we'll need to chop each whole image into multiple patches that we'll then use for learning.
Figure 2: Workflow to load and preprocess the image files. The results are small image patches of 64x64px.
The general workflow just splits the input KNIME table into two datasets (train and test). Inside the Load and preprocess images (Local Files) wrapped metanode, we use the KNIME Image Processing extension to read the image file, normalize the full image, and then crop and split the image into 64 by 64px patches. Once again, we write out a KNIME table file, this time containing the image patches and the class information.
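The important detail is that the split happens on whole images, before any patches are cut, so patches from one image never end up in both sets. A minimal sketch of such a split, assuming rows of (image_path, label) and a hypothetical test_fraction of 0.25 (the actual ratio used in the workflow isn't stated here):

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.25, seed=0):
    """Split (image_path, label) rows into train and test lists, class by class."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Splitting per class keeps the proportion of each subtype roughly the same in both sets.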
Figure 3: The steps to cut a full image into multiple 64x64px image patches, contained in wrapped metanodes named Load and preprocess images (Local Files).
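To get a feel for the patching arithmetic, here is a small sketch that computes the crop boxes for non-overlapping 64 by 64px patches. It assumes the leftover border is trimmed evenly from each side (a centered crop); the KNIME workflow may crop differently, and patch_boxes is a hypothetical helper name.

```python
def patch_boxes(width, height, patch=64):
    """Return (left, top, right, bottom) boxes tiling the centered crop of an image."""
    cols, rows = width // patch, height // patch
    # Center the grid so the discarded border is split evenly between sides.
    x0 = (width - cols * patch) // 2
    y0 = (height - rows * patch) // 2
    return [
        (x0 + c * patch, y0 + r * patch, x0 + (c + 1) * patch, y0 + (r + 1) * patch)
        for r in range(rows)
        for c in range(cols)
    ]


boxes = patch_boxes(1388, 1040)
print(len(boxes))  # 21 columns x 16 rows = 336 patches
```

Each box uses the same (left, upper, right, lower) convention as Pillow's Image.crop, so the boxes could be applied directly to a loaded image.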
Training the Model
Since developing a completely novel CNN is both difficult and time-consuming, we're going to first try using an existing CNN that has been pre-trained for solving image classification problems. The CNN that we'll use is VGG16, which was originally trained to predict the class of members of the ImageNet dataset containing 1,000 different class labels.
Figure 4: Fine-tune the VGG16 architecture read in the DL Keras Network Reader node by replacing the top layers in the DL Python Network Editor node and training them in the DL Python Network Learner node with the lymphoma image patches created in the previous steps.
None of these class labels were the type of object that we were looking for, but we can easily replace the top layers of our model using the DL Python Network Editor node, and then fine-tune the resulting network for our problem using the DL Python Network Learner node and the ~75,000 patches created from the training set images.
Figure 5: The Python code (in the DL Python Network Editor node) to edit the VGG16 network to replace the top layers.
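For readers without the workflow open, replacing the top layers looks roughly like the sketch below. It is not the code from the figure: it assumes the tensorflow.keras API, the function name build_finetune_model is hypothetical, and the new layer sizes (a 256-unit dense layer with 0.5 dropout) and the decision to freeze the convolutional base are illustrative choices of mine. Inside the KNIME node the network is handed to the script rather than built from scratch like this.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Model


def build_finetune_model(num_classes=3, patch_size=64, weights=None):
    """Build VGG16 without its ImageNet classifier and add a new 3-class head.

    Pass weights="imagenet" to start from the pre-trained weights;
    the default of None avoids a download and gives random weights.
    """
    base = VGG16(
        weights=weights,
        include_top=False,  # drop the original 1,000-class top layers
        input_shape=(patch_size, patch_size, 3),
    )
    # Assumed choice: freeze the convolutional base, train only the new head.
    for layer in base.layers:
        layer.trainable = False

    x = Flatten()(base.output)
    x = Dense(256, activation="relu")(x)  # illustrative size
    x = Dropout(0.5)(x)                   # illustrative rate
    outputs = Dense(num_classes, activation="softmax")(x)
    return Model(inputs=base.input, outputs=outputs)
```

The resulting model outputs one probability per lymphoma subtype for each 64 by 64px patch.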
Once the model is fine-tuned, we evaluate its performance on the test set images. To predict the class of an image, we generate predictions for each of the 64 by 64px patches we split it into, and then combine those predictions using a simple majority voting scheme. Using this approach, we see that our classifier has achieved 96% accuracy (fine-tuning for a few more epochs can push the accuracy to 98%).
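The majority vote itself is simple enough to sketch in a few lines. This is a minimal version assuming each patch prediction has already been reduced to a class label; how ties are broken in the actual workflow isn't specified, so here they go to the label that appears first.

```python
from collections import Counter


def majority_vote(patch_predictions):
    """Return the most frequent patch label; ties go to the label seen first."""
    if not patch_predictions:
        raise ValueError("need at least one patch prediction")
    return Counter(patch_predictions).most_common(1)[0][0]


print(majority_vote(["FL", "CLL", "FL", "MCL", "FL"]))  # FL
```

A few misclassified patches are simply outvoted, which is why the per-image accuracy can be higher than the per-patch accuracy.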
Figure 6: Classification performance for the fine-tuned VGG16 network model.
Note that executing this workflow could take around 12 hours. By editing the number of epochs used in the fine-tuning, you can decrease the amount of time required significantly, at the expense of classification performance.
Deploying the Predictions to the Pathologist
We put together a simple KNIME Server WebPortal workflow to allow a pathologist to access those predictions via a web browser.
Figure 7: The WebPortal workflow to score new images from a web browser. The wrapped metanode named View Results presents the final results in a summary web page. The workflow is available on our EXAMPLES Server under 50_Applications/31_Histopathology_Blog_Post.
The workflow allows upload of an image file, which is then classified using the deployed model. Finally, the result of the classification is displayed on a page that summarizes the key results. That's one simple way to deploy the results, but it would be equally possible to deploy the model as a REST API which would allow for simple integration into existing tools such as slide viewers.
The workflows presented here give you some idea of how you can tackle image classification problems using KNIME Image Processing and the KNIME Deep Learning/Keras Integration. There were some great talks at the KNIME Fall Summit 2017 in Austin that showed just how far you can go with image analysis in KNIME Analytics Platform. See, for example, the talks by Prabhakar R. Gudla (National Cancer Institute, National Institutes of Health) and Andries Zijlstra (Vanderbilt University Medical Center).
Here, we also showed just how easy it is to take those models and deploy the results to multiple end users using the KNIME Server WebPortal.
Published at DZone with permission of Jon Fuller, DZone MVB. See the original article here.