Scoring Machine Learning Models at Scale [Video]

This video shows a demo of MemSQL and Apache Spark performing entity resolution and fraud detection across a dataset of one hundred thousand employees and fifty million customers.

by Mason Hooten · May 6, 2017 · Big Data Zone · Presentation

At Strata+Hadoop World, MemSQL Software Engineer John Bowler shared two ways of building production data pipelines in MemSQL:

  1. Using Apache Spark for general-purpose computation.

  2. Using a transform defined in a MemSQL Pipeline for general-purpose computation.
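To make the second approach more concrete, here is a minimal sketch of what a pipeline transform might look like. It assumes, as we understand MemSQL Pipelines, that a transform is an executable that receives each extracted batch on standard input and writes the transformed rows to standard output. The tab-separated layout, the columns `id`, `name`, and `zip`, and the helper names `transform_line` and `run` are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, Optional


def transform_line(line: str) -> Optional[str]:
    """Normalize one tab-separated record with hypothetical columns id, name, zip."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None  # drop malformed rows rather than failing the whole batch
    record_id, name, zip_code = fields
    # Collapse runs of whitespace and lowercase the name to help later matching.
    clean_name = " ".join(name.split()).lower()
    # Keep only the 5-digit prefix of a US ZIP / ZIP+4 code (assumed input format).
    clean_zip = zip_code.strip()[:5]
    return "\t".join([record_id.strip(), clean_name, clean_zip])


def run(lines: Iterable[str]) -> Iterator[str]:
    """Yield newline-terminated output rows for every well-formed input line."""
    for line in lines:
        out = transform_line(line)
        if out is not None:
            yield out + "\n"
```

Deployed as a transform, the script would end with `import sys` and `sys.stdout.writelines(run(sys.stdin))` so that each batch streams through it. Keeping that wiring outside the functions lets them be reused and checked on plain lists of strings.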

In the video below, John runs a live demonstration of MemSQL and Apache Spark for entity resolution and fraud detection across a dataset of one hundred thousand employees and fifty million customers. John uses MemSQL and writes a Spark job that relies on Duke, an open source entity resolution library, to sort through and score combinations of customer and employee records.
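Duke itself is a Java library, and the sketch below does not use or reproduce its API. It only illustrates the general idea of scoring a customer-employee pair: compare each field, turn the similarity into a match probability between a per-field `low` and `high` value, and combine the per-field probabilities with Bayes' theorem, starting from a neutral prior of 0.5. Duke's documentation describes per-property low/high probabilities combined in a Bayesian way; the linear similarity mapping, the field names, and the probability values here are our own simplifying assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple

# Hypothetical per-field (low, high) probabilities: how likely two records are
# the same person when the field fully disagrees vs. fully agrees.
FIELD_PROBS: Dict[str, Tuple[float, float]] = {
    "name": (0.1, 0.9),
    "address": (0.2, 0.8),
    "phone": (0.3, 0.95),
}


def field_probability(a: str, b: str, low: float, high: float) -> float:
    """Map string similarity in [0, 1] linearly onto a probability in [low, high]."""
    sim = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return low + (high - low) * sim


def bayes_combine(p1: float, p2: float) -> float:
    """Combine two independent match probabilities with Bayes' theorem."""
    return (p1 * p2) / (p1 * p2 + (1 - p1) * (1 - p2))


def match_score(customer: Dict[str, str], employee: Dict[str, str]) -> float:
    """Score a customer/employee pair; fields missing from either record are ignored."""
    score = 0.5  # neutral prior: bayes_combine(0.5, p) == p
    for field, (low, high) in FIELD_PROBS.items():
        if field in customer and field in employee:
            p = field_probability(customer[field], employee[field], low, high)
            score = bayes_combine(score, p)
    return score
```

Fields that agree push the score toward 1 and fields that disagree push it toward 0, so a pair that matches on several fields at once stands out; in a fraud-detection setting those high-scoring pairs are the ones worth flagging.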

MemSQL makes this possible by reducing network overhead through the MemSQL Spark Connector and by providing native geospatial capabilities. John finds the top 10 million flagged customer-employee pairs out of 5 trillion possible combinations (100,000 employees × 50 million customers) in only three minutes. Finally, John uses MemSQL Pipelines and TensorFlow to write a Python machine learning script that accurately identifies thousands of handwritten numbers after training the model in only seconds.
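The size of the search space follows directly from the dataset: 100,000 × 50,000,000 = 5 × 10^12 pairs. Scoring every pair is impractical, so a common technique is blocking: skip pairs that cannot plausibly match, then keep only the best `k` scores in a bounded min-heap. One plausible reading of the role of geospatial features here is that distance limits which pairs get compared. The pure-Python haversine function below is a stand-in for that idea, not MemSQL's geospatial API, and the function names, radius, and coordinates are hypothetical.

```python
import heapq
import math
from typing import Callable, Iterable, List, Tuple

EMPLOYEES = 100_000
CUSTOMERS = 50_000_000
TOTAL_PAIRS = EMPLOYEES * CUSTOMERS  # 5_000_000_000_000, i.e. 5 trillion

EARTH_RADIUS_KM = 6371.0

Point = Tuple[str, float, float]  # (record id, latitude, longitude)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def top_k_nearby_pairs(
    customers: Iterable[Point],
    employees: List[Point],
    radius_km: float,
    k: int,
    score: Callable[[str, str], float],
) -> List[Tuple[float, str, str]]:
    """Return the k highest-scoring pairs within radius_km, best first."""
    heap: List[Tuple[float, str, str]] = []  # min-heap holding the current top k
    for cid, clat, clon in customers:
        for eid, elat, elon in employees:
            if haversine_km(clat, clon, elat, elon) > radius_km:
                continue  # geographic blocking: never score distant pairs
            s = score(cid, eid)
            if len(heap) < k:
                heapq.heappush(heap, (s, cid, eid))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, cid, eid))  # evict the current minimum
    return sorted(heap, reverse=True)
```

This toy version still visits every pair and only skips the expensive scoring step. A system operating at the scale in the talk would instead use a spatial index, so that distant pairs are never enumerated at all, and would spread the remaining work across a cluster.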

About the speaker: John Bowler is a Software Engineer at MemSQL with a background in machine learning, algorithms, and distributed data warehouses. He is a graduate of MIT and previously interned at SpaceX, where he helped write control algorithms for the SuperDraco rocket engine.


Published at DZone with permission of Mason Hooten, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
