Using InsightEdge With BigDL for Scalable Deep Learning Innovation

Together, these technologies fill a critical gap by creating an intelligent insight platform that makes innovating on real-time advanced analytics apps easy.


Last week, we announced we've joined forces with Intel to simplify artificial intelligence (AI) through an integration between GigaSpaces' InsightEdge platform and Intel's BigDL open-source deep learning library for Apache Spark. The combined solution forms an enhanced insight platform based on Apache Spark, offering a distributed deep learning framework that empowers insight-driven organizations.

The adoption of AI innovations like deep learning is growing rapidly across industries such as financial services, healthcare, transportation, and retail. Over the past year, GigaSpaces has expanded its analytics portfolio to incorporate full-stack analytics (SQL, streaming, machine learning) through Apache Spark, while Intel's BigDL provides an infrastructure-optimized solution for deep learning workloads on Intel® Xeon® Scalable processors. Together, the technologies fill a critical market gap by creating an intelligent insight platform that makes it easy to innovate on real-time advanced analytics applications with low risk and TCO.

Key benefits of the integration include:

    • Cost savings: BigDL eliminates the need for dense, specialized hardware for deep learning, enabling low-cost compute infrastructure built on Intel Xeon Scalable processors that can train and run large-scale deep learning workloads without relying on GPUs.
    • Simplicity: Deep learning scenarios are complex and require advanced training workflows. InsightEdge's simplified analytics stack, leveraging BigDL and the widely adopted open-source Apache Spark, eliminates cluster and component sprawl, radically reducing the number of moving parts while capitalizing on existing Spark competency.
    • Scalability: The integration allows organizations to innovate on text mining, image recognition, and advanced predictive analytics workflows from a handful of machines to thousands of nodes in the cloud or on-premises, using the same application assets and deployment lifecycle.
"Harnessing the business value of artificial intelligence is often challenged by the lack of mature compute infrastructure and technology complexity, leading to inefficiency and slower time-to-analytics," says Ali Hodroj, Vice President of Products and Strategy at GigaSpaces. "Our integration with BigDL helps enterprises deploy, manage, and optimize a simplified and comprehensive AI technology stack for automated intelligence without the need for expensive, specialized hardware or complex big data solutions."

The solution will be demonstrated at Intel's booth at the Strata Data Conference, September 26-28, in New York, NY, and at the Intel booth at Microsoft Ignite, September 25-29, 2017, in Orlando, Florida. The demo will feature:

  • AI-driven customer experience analytics through natural language processing.
  • Unified NLP, deep learning, and search in one simplified Spark distribution.

During the demo, to illustrate an enhanced customer experience, customers speak in their own words to a company's interactive voice response (IVR) system. The IVR quickly understands what the customer needs and resolves the problem through machine learning.

"BigDL's efficient large-scale distributed deep learning framework, built on Apache Spark*, expands the accessibility of deep learning to a broader range of big data users and data scientists," says Michael Greene, Vice President, Software and Services Group, General Manager, System Technologies and Optimization, Intel Corporation. "The integration with GigaSpaces' in-memory insight platform, InsightEdge, unifies fast-data analytics, artificial intelligence, and real-time applications in one simplified, affordable, and efficient analytics stack."

In the demo below, we will show you how to combine real-time speech recognition with real-time speech classification based on Intel's BigDL library and InsightEdge.

What Is BigDL?

BigDL is a distributed deep learning library for Apache Spark. You can learn more about deep learning and neural networks on Coursera.

With BigDL, it's possible to write deep learning applications as standard Spark programs, allowing you to leverage Spark during model training, prediction, and tuning. High performance and throughput are achieved with the Intel Math Kernel Library. Read more about BigDL here.
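To build intuition for what such a network computes, here is a minimal, dependency-free Scala sketch of a single dense layer with a ReLU activation. This is illustrative only; BigDL wraps operations like this in distributed, MKL-accelerated layers rather than exposing them this way.

```scala
// Illustrative only: the core computation of one dense neural-network
// layer, y = relu(W*x + b). Frameworks like BigDL run many such layers
// distributed across a Spark cluster, accelerated with Intel MKL.
object DenseLayer {
  def relu(x: Double): Double = math.max(0.0, x)

  // weights: one row of input weights per output unit
  def forward(weights: Array[Array[Double]],
              bias: Array[Double],
              input: Array[Double]): Array[Double] =
    weights.zip(bias).map { case (row, b) =>
      relu(row.zip(input).map { case (w, x) => w * x }.sum + b)
    }
}
```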

Motivation

For example, let's consider a big company with a huge client base that has to run call centers. To serve clients correctly, it's vital to direct each call to the right specialist. This demo takes advantage of cutting-edge technologies to resolve such tasks effectively, in under 100 ms. Here is the general workflow:
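Once the model has classified a call transcript, the routing step itself is straightforward. The sketch below uses hypothetical category and queue names (they are not part of the demo code) to show the shape of that final decision:

```scala
// Hypothetical example: map a predicted call category to a specialist
// queue; unrecognized categories fall back to a general queue.
object CallRouter {
  private val queues = Map(
    "billing"   -> "billing-specialists",
    "technical" -> "tech-support",
    "sales"     -> "sales-team"
  )

  def route(predictedCategory: String): String =
    queues.getOrElse(predictedCategory, "general-queue")
}
```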

Architecture

Let's take a helicopter view of the application components.

How to Run It

Software used:

  • scala v2.10.4
  • java 1.8.x
  • kafka v0.8.2.2
  • insightedge v1.0.0
  • BigDL v0.2.0
  • sbt
  • maven v3.x

Prerequisites:

  • Download and extract data(first three steps) as described here
  • Set INSIGHTEDGE_HOME and KAFKA_HOME env variables
  • Make sure you have Scala installed: scala -version
  • Change variables according to your needs in runModelTrainingJob.sh, runTextPredictionJob.sh, and runKafkaProducer.sh
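Before launching the scripts, it helps to confirm the environment variables are actually set. Here is a hypothetical pre-flight helper (not part of the demo's scripts) that reports which required variables are missing:

```scala
// Hypothetical pre-flight check: report which required environment
// variables are still unset before launching the demo scripts.
object Preflight {
  val required = Seq("INSIGHTEDGE_HOME", "KAFKA_HOME")

  def missingVars(env: Map[String, String]): Seq[String] =
    required.filterNot(env.contains)
}

// Usage: Preflight.missingVars(sys.env) returns the names still unset.
```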

The running demo is divided into three parts:

  1. Build project and start components
    • Clone this repo
    • Go to insightedge directory: cd BigDL/insightedge
    • Build the project: sh build.sh
    • Start ZooKeeper and Kafka server: sh kafka-start.sh
    • Create Kafka topic: sh kafka-create-topic.sh; to verify that topic was created, run sh kafka-topics.sh
    • Start InsightEdge in demo mode: sh ie-demo.sh
    • Deploy processor-0.2.0-jar-with-dependencies.jar via the GigaSpaces UI
  2. Train BigDL model
  3. Run Spark streaming job with trained BigDL classification model
    • In separate terminal tab, start Spark streaming for predictions: sh runTextClassificationJob.sh
    • Start web server: cd BigDL/web and sh runWeb.sh

Now go to https://localhost:9443:

  1. Click on a microphone button and start talking. Click the microphone button one more time to stop recording and send the speech to Kafka.
  2. Shortly, you will see a new record in the "In-process calls" table, which means the call is currently being processed.
  3. After a while, rows from the "In-process calls" table will be moved to the "Call sessions" table. The "Category" column shows which category the BigDL model assigned to the speech, and the "Time" column shows how many milliseconds the classification took.
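The latency reported in the "Time" column can be captured with a simple wrapper like the one below. This is an illustrative sketch, not the demo's actual instrumentation:

```scala
// Illustrative sketch: run a block of code (e.g. a classification
// call) and return both its result and the elapsed milliseconds.
object Timing {
  def timed[A](f: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = f
    (result, (System.nanoTime() - start) / 1000000L)
  }
}
```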

Shutting Down

  • Stop Kafka: sh kafka-stop.sh
  • Stop InsightEdge: sh ie-shutdown.sh



Published at DZone with permission of Rajiv Shah, DZone MVB. See the original article here.

