Machine Learning in a Box (Week 8): SAP HANA EML and TensorFlow Integration

Take a look at a tutorial that explains how to get started with the SAP HANA External Machine Learning and TensorFlow integration.

By Abdel Dadouche · Sep. 30, 18 · Tutorial

A Quick Recap

Last time, we looked at how to use Jupyter notebooks to run kernels in Python, R, and even SQL. I'll be honest, Jupyter is now my new "go-to" tool.

I even contributed to the SQLAlchemy for SAP HANA GitHub repository recently to add support for the HDB User Store, which allows you to connect without providing your user credentials in one of the cells.

I hope you all managed to try this out, and that some of you have already decided to switch from the "good old" Eclipse IDE and its SAP HANA Tools plugin to run your SQL.

I know that I have promised over and over to dive into the TensorFlow integration.

Everything is now available and ready! I have fixed the little technical difficulties on my NUC and my virtual machines.

I also upgraded to HXE SPS03, which I highly recommend you do as well.

For example, SPS03 brings INT64 support to SAP HANA EML (otherwise, you will need to adjust your model signatures and cast your tensors to float).

So, let's get started!

You have probably all heard about Google TensorFlow and how it can help you solve many Machine Learning problems, especially those that require neural networks or Deep Learning.

It is true that there are many applications where Deep Learning and neural networks have reached a level of accuracy that now surpasses human capabilities. Also, thanks to hardware evolution and cloud resources, you can now complete these tasks in a fraction of the time that was required a few years back.

I won't pretend that I can explain all the details about the benefits of Deep Learning, neural networks, or even TensorFlow, as there is plenty of content out there from people with much more experience and credentials than I have.

So today, my goal will be to help you get started with the SAP HANA External Machine Learning and TensorFlow integration.

Using TensorFlow implies some understanding of TensorFlow programming concepts as well as some Python coding skills, and I really encourage you to have a look at the TensorFlow Get Started and Tutorials pages.

About TensorFlow and TensorFlow Serving (a.k.a. ModelServer)

TensorFlow™ is an open-source software library for high-performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.

Originally developed by researchers and engineers from the Google Brain team within Google's AI organization, it comes with strong support for Machine Learning and Deep Learning, and its flexible numerical computation core is used across many other scientific domains.

For more details, you can check the TensorFlow web site.

TensorFlow™ Serving is a flexible, high-performance serving system for Machine Learning models, designed for production environments. It makes it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs. It also provides out-of-the-box integration with TensorFlow models but can be easily extended to serve other types of models and data.

For more details, you can check the TensorFlow Serving web site.

About SAP HANA External Machine Learning (EML)

The integration of Google TensorFlow with SAP HANA is based on the SAP HANA Application Function Library (AFL), meaning that you now have the ability to interact with your TensorFlow models from SQLScript executed in SAP HANA.

Using Google's gRPC remote procedure call framework, SAP HANA accesses models exported in the SavedModel format and served by the TensorFlow Serving system.

Here is a quick diagram explaining the interactions:

For more details, you can check the SAP HANA External Machine Learning Library (EML) documentation.

Installing SAP HANA EML, TensorFlow, and TensorFlow Serving

The SAP HANA External Machine Learning (EML) library is part of the SAP HANA, express edition downloadable package, so there is no trick here around its installation. You can simply follow the installation guide.

Regarding TensorFlow and TensorFlow Serving, you can install them directly on your SAP HANA, express edition server or on any other machine.

But here is the "trick."

If you decide to install TensorFlow Serving on a SUSE Linux Enterprise or Red Hat Enterprise Linux system (which are the officially supported platforms for SAP HANA, express edition), you will need to compile it from source.

If you decide to install it on the SAP HANA, express edition downloadable virtual machines, you are also using SUSE Linux Enterprise, so you will need to compile it there, too.

And if you want to install it on a Debian/Ubuntu distribution, the installation is pretty straightforward; it is as simple as:

apt-get install tensorflow-model-server

For detailed step-by-step setup instructions, I produced the following tutorial that guides you through the process:

The tutorial covers Debian/Ubuntu distributions, SUSE Linux Enterprise, and Red Hat Enterprise Linux.

The TensorFlow SavedModel Format Explained

When exposing your TensorFlow models in TensorFlow Serving for SAP HANA consumption, you need to save them using the SavedModel format as documented in the SAP HANA EML documentation.

You need to pay particular attention to the model signature definition, especially the shapes used for the input and output elements.
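
If you want to double-check what an exported model exposes before serving it, you can load the SavedModel and print its signature definitions. Here is a minimal sketch using the TensorFlow 1.x Python API (the export directory path is just a placeholder); TensorFlow also ships a saved_model_cli command-line tool for the same purpose:

import tensorflow as tf

export_dir = '/path/to/your/savedmodel'  # placeholder: your exported model directory

with tf.Session(graph=tf.Graph()) as sess:
    # Load the SavedModel tagged for serving and print its signature definitions
    meta_graph = tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    for name, signature_def in meta_graph.signature_def.items():
        print(name)
        print(signature_def.inputs)
        print(signature_def.outputs)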

The following two examples will highlight some of the common situations you will need to address when using content available on the TensorFlow site.

Image Retraining

For the Image Retraining scenario, here is the default signature definition that will be generated when using the provided retrain script:

The given SavedModel SignatureDef contains the following input(s):
  inputs['image'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 299, 299, 3)
      name: Placeholder:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['prediction'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 5)
      name: final_result:0
Method name is: tensorflow/serving/predict

You can see that there is an input tensor (image) and an output tensor (prediction).

The input tensor has the shape (-1, 299, 299, 3), which is a rank-4 shape, i.e. a 4-dimensional input.

In other words, you can represent the input as a vector where each vector entry is a 299 by 299 table (representing the 299 by 299 pixels of the input image), and where each table cell is a vector of 3 float elements (representing the 3 RGB channels).
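
As a quick illustration, a batch of such inputs would look like this (a tiny sketch just to make the shape concrete):

import numpy as np

# A batch of 2 images, each 299x299 pixels with 3 RGB channels
batch = np.zeros((2, 299, 299, 3), dtype=np.float32)
print(batch.shape)  # (2, 299, 299, 3) -- the leading -1 in the signature means "any batch size"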

You can check the following link for more details about Tensor dtype and shape.

However, SAP HANA EML only supports input tensors of rank two at most, which matches a table or matrix form.

Therefore, I've added some steps in my tutorial that process a raw image blob represented as a string and transform it into the expected shape.
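
Conceptually, this preprocessing decodes the raw JPEG bytes, converts them to floats, and resizes the result to 299 by 299 before feeding the retrained network. Here is a minimal sketch of that idea using the TensorFlow 1.x image ops (the tutorial's exact steps may differ; the placeholder name RawJPGInput mirrors the signature shown below):

import tensorflow as tf

def decode_and_resize(raw_jpeg):
    # Decode the raw JPEG bytes, convert to float values in [0, 1], and resize to 299x299x3
    image = tf.image.decode_jpeg(raw_jpeg, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    return tf.image.resize_images(image, [299, 299])

# Rank-1 string input (one raw image blob per row), mapped to the rank-4 float tensor
raw_images = tf.placeholder(tf.string, shape=[None], name='RawJPGInput')
decoded_images = tf.map_fn(decode_and_resize, raw_images, dtype=tf.float32)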

The following signature will then be generated:

The given SavedModel SignatureDef contains the following input(s):
  inputs['inputs'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: RawJPGInput:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['classes'] tensor_info:
      dtype: DT_STRING
      shape: (-1, 5)
      name: index_to_string_Lookup:0
  outputs['scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 5)
      name: TopKV2:0
Method name is: tensorflow/serving/predict

With this signature, to process a SAP HANA EML call for this model, you will provide:

  • one input table/view with one column that represents the raw image blob
  • one output table with ten columns (5 string columns for the classes and 5 float columns for the scores)

Iris Classification Problem

For the Iris classification problem, the original script doesn't actually include a save function.

In most content available online, you will be advised to save your models using the "parse example" API and define the serving input receiver like this:

import tensorflow as tf

def serving_input_receiver_fn():
    # feature_columns comes from the Iris estimator definition in the original script
    feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
    return tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)()

In the end, the input signature will look like this:

  inputs['inputs'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_example_tensor:0

The Iris model actually uses four input floats to represent the Petal and Sepal width and length.

So where did they go if the model now expects only one string?

Here is the saved model graph for the Iris model with the "parse example" API:

The "parse example" API assumes that you will be using tf.train.Example Protobuf objects as input.

A tf.train.Example Protobuf object contains a Features object, which contains a map of Feature entries, where each Feature is a list of floats (FloatList), bytes (BytesList), or integers (Int64List).
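
To make this concrete, here is roughly what a client would have to build and serialize to feed the "parse example" signature with the four Iris features (illustration only; the feature names simply mirror the Iris columns):

import tensorflow as tf

# One observation packed into a tf.train.Example; the serialized string would be
# the single DT_STRING input expected by the "parse example" signature
example = tf.train.Example(features=tf.train.Features(feature={
    'PetalLength': tf.train.Feature(float_list=tf.train.FloatList(value=[1.4])),
    'PetalWidth' : tf.train.Feature(float_list=tf.train.FloatList(value=[0.2])),
    'SepalLength': tf.train.Feature(float_list=tf.train.FloatList(value=[5.1])),
    'SepalWidth' : tf.train.Feature(float_list=tf.train.FloatList(value=[3.5])),
}))
serialized_example = example.SerializeToString()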

However, tf.train.Example Protobuf objects are not supported by SAP HANA EML.

Instead, you will need to save the model using the raw tensors and produce a model graph that looks like this:

You can achieve that with a piece of code like this:

# Define the input receiver spec
feature_spec = {
  'PetalLength': tf.placeholder(dtype=tf.float32, shape=[None,1], name='PetalLength'),
  'PetalWidth' : tf.placeholder(dtype=tf.float32, shape=[None,1], name='PetalWidth'),
  'SepalLength': tf.placeholder(dtype=tf.float32, shape=[None,1], name='SepalLength'),
  'SepalWidth' : tf.placeholder(dtype=tf.float32, shape=[None,1], name='SepalWidth'),
}
# Define the input receiver for the raw tensors
def serving_input_receiver_fn():
  return tf.estimator.export.build_raw_serving_input_receiver_fn(feature_spec)()
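
You can then export the trained estimator with this receiver function. A minimal sketch, assuming the DNNClassifier from the original Iris script is held in a variable named classifier and that the export path is a placeholder:

# Export the trained estimator using the raw-tensor serving input receiver defined above
classifier.export_savedmodel('/path/to/export/iris', serving_input_receiver_fn)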

This will end up generating the following signature:

The given SavedModel SignatureDef contains the following input(s):
  inputs['PetalLength'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: PetalLength:0
  inputs['PetalWidth'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: PetalWidth:0
  inputs['SepalLength'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: SepalLength:0
  inputs['SepalWidth'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: SepalWidth:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['predicted_class_id'] tensor_info:
      dtype: DT_INT64
      shape: (-1, 1)
      name: dnn/head/predictions/ExpandDims:0
  outputs['probabilities'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 3)
      name: dnn/head/predictions/probabilities:0
Method name is: tensorflow/serving/predict

With this signature, to process a SAP HANA EML call for this model, you will provide:

  • four distinct input tables/views with one column each, representing the Petal and Sepal width and length (the inputs are bound in the tensors' alphabetical order, as in the signature)
  • one output table with four columns: one integer column for the predicted class id and three float columns for the class probabilities

As you can see, each input uses a distinct table/view, whereas all the outputs are stored in a single table.

Note: the DT_INT64 type is not supported with SPS02 or prior; you will need to cast it to DT_FLOAT.

Serve a TensorFlow Model in SAP HANA, Express Edition

Now, let's deploy a TensorFlow model and consume it in SAP HANA, express edition!

Actually, it's not going to be one, but two models that I have content available for: one that uses the well-known Iris flowers dataset and one that processes flower images.

The idea with the second model was to demonstrate how a blob representing an image stored in SAP HANA could be processed through the EML library.

And here are the tutorial links:

You can, of course, use a different set of images to retrain your ImageNet model.
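
Before executing the EML call, you can sanity-check that the model server responds by sending a Predict request over gRPC, which is the same protocol SAP HANA EML uses under the hood. Here is a minimal client sketch (the host, port, model name, and the tensorflow-serving-api client package are assumptions based on a standard TensorFlow Serving setup):

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to the TensorFlow Serving gRPC endpoint (host and port are placeholders)
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build a Predict request for the served "iris" model with one observation
request = predict_pb2.PredictRequest()
request.model_spec.name = 'iris'
request.model_spec.signature_name = 'serving_default'
for name, value in [('PetalLength', 1.4), ('PetalWidth', 0.2),
                    ('SepalLength', 5.1), ('SepalWidth', 3.5)]:
    request.inputs[name].CopyFrom(
        tf.contrib.util.make_tensor_proto([[value]], dtype=tf.float32))

response = stub.Predict(request, 10.0)  # 10-second deadline
print(response.outputs['predicted_class_id'])
print(response.outputs['probabilities'])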

Executing an EML call is no different from any other AFL call. Here is the example for Iris:

SET SCHEMA EML_DATA;

-- Define table types for iris
CREATE TYPE TT_IRIS_PARAMS                AS TABLE ("Parameter" VARCHAR(100), "Value" VARCHAR(100));
CREATE TYPE TT_IRIS_FEATURES_SEPALLENGTH  AS TABLE (SEPALLENGTH   FLOAT);
CREATE TYPE TT_IRIS_FEATURES_SEPALWIDTH   AS TABLE (SEPALWIDTH    FLOAT);
CREATE TYPE TT_IRIS_FEATURES_PETALLENGTH  AS TABLE (PETALLENGTH   FLOAT);
CREATE TYPE TT_IRIS_FEATURES_PETALWIDTH   AS TABLE (PETALWIDTH    FLOAT);
CREATE TYPE TT_IRIS_RESULTS   AS TABLE (
    -- when SPS02 or prior, make PREDICTED_CLASS_ID type FLOAT instead of INTEGER
    PREDICTED_CLASS_ID INTEGER,
    PROBABILITIES0 FLOAT,  PROBABILITIES1 FLOAT,  PROBABILITIES2 FLOAT
);
-- Create description table for procedure creation
CREATE COLUMN TABLE IRIS_PROC_PARAM_TABLE (
    POSITION        INTEGER,
    SCHEMA_NAME     NVARCHAR(256),
    TYPE_NAME       NVARCHAR(256),
    PARAMETER_TYPE  VARCHAR(7)
);
-- Create the result table
CREATE TABLE IRIS_RESULTS LIKE TT_IRIS_RESULTS;

-- Drop the wrapper procedure
CALL SYS.AFLLANG_WRAPPER_PROCEDURE_DROP(CURRENT_SCHEMA, 'IRIS');

-- Populate the wrapper procedure parameter table
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (1, CURRENT_SCHEMA, 'TT_IRIS_PARAMS'               , 'in');
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (2, CURRENT_SCHEMA, 'TT_IRIS_FEATURES_PETALLENGTH' , 'in');
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (3, CURRENT_SCHEMA, 'TT_IRIS_FEATURES_PETALWIDTH'  , 'in');
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (4, CURRENT_SCHEMA, 'TT_IRIS_FEATURES_SEPALLENGTH' , 'in');
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (5, CURRENT_SCHEMA, 'TT_IRIS_FEATURES_SEPALWIDTH'  , 'in');
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (6, CURRENT_SCHEMA, 'TT_IRIS_RESULTS'              , 'out');

-- Create the wrapper procedure
CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('EML', 'PREDICT', CURRENT_SCHEMA, 'IRIS', IRIS_PROC_PARAM_TABLE);

-- Create and populate the parameter table
CREATE TABLE IRIS_PARAMS  LIKE TT_IRIS_PARAMS;
INSERT INTO IRIS_PARAMS   VALUES ('Model', 'iris');
INSERT INTO IRIS_PARAMS   VALUES ('RemoteSource', 'TensorFlow');
INSERT INTO IRIS_PARAMS   VALUES ('Deadline', '10000');

-- Create the input views
CREATE VIEW IRIS_FEATURES_SEPALLENGTH  AS SELECT SEPALLENGTH  FROM TF_DATA.IRIS_DATA ORDER BY ID;
CREATE VIEW IRIS_FEATURES_SEPALWIDTH   AS SELECT SEPALWIDTH   FROM TF_DATA.IRIS_DATA ORDER BY ID;
CREATE VIEW IRIS_FEATURES_PETALLENGTH  AS SELECT PETALLENGTH  FROM TF_DATA.IRIS_DATA ORDER BY ID;
CREATE VIEW IRIS_FEATURES_PETALWIDTH   AS SELECT PETALWIDTH   FROM TF_DATA.IRIS_DATA ORDER BY ID;

-- Call the TensorFlow model
CALL IRIS (IRIS_PARAMS, IRIS_FEATURES_PETALLENGTH, IRIS_FEATURES_PETALWIDTH, IRIS_FEATURES_SEPALLENGTH, IRIS_FEATURES_SEPALWIDTH, IRIS_RESULTS) WITH OVERVIEW;

Conclusion

Adding TensorFlow to our stack opens up a huge set of capabilities, including the ability to process images, videos, and other types of "unstructured" content like text.

You can also use it for, let's say, more classic models like the one built on the Iris dataset.

Again (and sorry for repeating myself), what you really need to pay attention to is the signature's input and output elements, where SAP HANA EML enforces some restrictions on the shape dimensions of both the inputs and the outputs.

A huge thank you to Burkhard Neidecker-Lutz, Former Member, and Christoph Morgen from the SAP HANA engineering team for their support in producing this content.
