Introducing 'Project: Machine Learning in a Box'
The goal of this series is to help you open the black box that a lot of people think is machine learning and help you find out what is inside.
In 2017, the very first machine learning-oriented content based on the SAP Predictive Service was rolled out on the SAP Developer Center, with a dedicated series of tutorials and a brand new CodeJam topic that was delivered in more than 12 locations.
I wasn’t expecting the Machine Learning AppSpace track at the TechEd Developer Garage to be such a success.
And the feedback I got was consistent and simple: We want more!
Welcome to the First in a Series of Posts About “Project: Machine Learning in a Box”
The goal will be to let you open the “black box” that a lot of people think is machine learning and help you find out what is inside. Let’s see if we can transform that “black box” into a transparent and versatile box for you to use in the future.
The other intent is to help you build your own machine learning “box” to run your experiments and projects.
How Do You Start?
Before we can get you started on this journey with installing products and tools, downloading data, and doing some coding, we need to set the scene and define what machine learning is.
Over the next few weeks, we will discuss the following.
Project Methodologies
Running a machine learning project requires a methodology just like any other project. And just like with any other project, the coding/modeling represents only a small portion of the overall project duration and effort.
Different Types and Families of Algorithms
We'll help you understand the difference between supervised and unsupervised machine learning and the associated families of algorithms (association, classification, clustering, regression, and time series).
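To make the supervised/unsupervised distinction concrete before the series covers it properly, here is a minimal sketch in plain Python. It is not taken from the series and does not use any SAP HANA library. The function names `fit_line` and `two_means` and the sample numbers are assumptions chosen for illustration. The first function learns from labeled examples (a simple regression), while the second receives no labels and has to find groups by itself (a basic clustering).

```python
from statistics import mean


# Supervised learning: every training example comes with a known answer (y).
# Here we fit a one-variable least-squares line, a simple form of regression.
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


# Unsupervised learning: there are no answers, only the values themselves.
# Here we split one-dimensional values into two clusters with k-means (k = 2).
def two_means(values, iterations=10):
    """Return the two cluster centers found by a basic k-means loop."""
    low, high = min(values), max(values)  # naive starting centers
    if low == high:
        return low, high  # all values identical, nothing to separate
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(near_low), mean(near_high)
    return low, high


# Hypothetical data: hours studied (input) and exam score (label).
slope, intercept = fit_line([1, 2, 3, 4], [52, 54, 56, 58])
predicted_score = slope * 5 + intercept  # prediction for an unseen input

# Hypothetical data: unlabeled measurements that happen to form two groups.
centers = two_means([1, 2, 3, 10, 11, 12])
```

The regression needs the scores to learn from, whereas the clustering only ever sees the raw values. That is the practical difference between the two learning styles.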
Lingo and Terminology
One of the biggest challenges when getting on board with machine learning is understanding the lingo. Very intelligent folks with PhDs tend to use an obscure language that only they understand, so we will try to clarify some of the common terms and concepts.
Platform and Environment
We'll provide details about the platform required to run this series and the associated examples. This environment will evolve as needs arise. For example, at some point we will use R scripts or TensorFlow, which will require additional software to be installed. And if there are requests, we might even look at SAP Predictive Analytics at some point.
Spoiler alert: We will be using SAP HANA, Express Edition. I know that I’m lucky to have a machine with 64 GB of RAM, and that’s not the case for everyone. But I’ll always keep this constraint in mind when choosing examples and datasets. It will show you that you don’t need that much to run SAP HANA, Express Edition, and that you can still do and learn some cool stuff.
After that, we will dive into some cool datasets and look at some of the algorithms available out of the box within the SAP HANA libraries to address the use cases:
- SAP HANA Automated Predictive Library (APL) provides a single powerful algorithm for each family of algorithms, leveraging the SAP HANA platform resources.
- SAP HANA Predictive Analytics Library (PAL) provides over 90 industry-standard algorithms implemented and optimized for the SAP HANA platform.
We won’t limit ourselves to the SAP HANA libraries, as we will next look at the open source R integration along with External Machine Learning (EML) with TensorFlow.
Is That Going to Be All?
Of course not — I can be really creative! No kidding!
We will then look at strategies for going live with a model and build a few apps or extensions to demonstrate how to leverage our model's results or capabilities.
But of course, based on your feedback, this can change.
Goals: Get Hands-On Experience and Share Feedback and Knowledge!
This series won’t be just about getting you familiar with the terminology. It will help you understand some of the concepts and theories of machine learning and how, why, when, and where to use machine learning.
I mean, it will be great if you can make sense of all the lingo depicted below, but your goal will be to get practical experience with examples and use cases for you to try. It will also be about sharing!
During this series, I will be promoting existing and new tutorials, and also referencing some existing SAP content, including openSAP courses (don’t be scared: not the full courses, just particular units), as well as interesting external content.
What Skills Do I Need?
I don’t expect everyone to be proficient in everything. I’ll just assume that you have some basic knowledge of statistics and mathematics, in addition to some basic SQL or programming skills.
And if you feel you are missing some of these skills or want to go deeper, no worries. Just ask. There is plenty of valuable content out there.
When Does It Start?
It has already started, and here is the first piece:
There is still a lot of confusion about the differences between data science, machine learning, deep learning, AI, etc. This article tries to describe the main differences:
Data science produces insights.
Machine learning produces predictions.
Artificial intelligence produces actions.
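As a rough illustration of these three outputs (not taken from the referenced article), the sketch below runs one tiny, made-up sales series through each stage. The numbers, the function names `predict_next` and `choose_action`, and the reorder rule are all assumptions, and a naive trend extrapolation stands in for a trained model.

```python
from statistics import mean

# Hypothetical daily unit sales for one product (illustrative numbers only).
sales = [10, 12, 14, 16, 18]

# Data science -> insight: describe what has already happened.
average_sales = mean(sales)
average_daily_change = mean(b - a for a, b in zip(sales, sales[1:]))


# Machine learning -> prediction: estimate a value we have not observed yet.
# A naive trend extrapolation stands in for a trained model here.
def predict_next(history):
    """Extend the series by its average day-over-day change."""
    step = mean(b - a for a, b in zip(history, history[1:]))
    return history[-1] + step


# Artificial intelligence -> action: decide what to do with the prediction.
def choose_action(predicted_demand, stock_on_hand):
    """Return a simple restocking decision based on predicted demand."""
    return "reorder" if predicted_demand > stock_on_hand else "wait"


forecast = predict_next(sales)
decision = choose_action(forecast, stock_on_hand=15)
```

The insight is a summary of the past, the prediction is a number about the future, and the action is a decision that consumes that number.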
Here's an informative article you may want to check out:
It's another great article that aims to describe the types of data scientists.
How Often Do You Plan to Publish?
My target will be to publish a new piece of content on a weekly basis. The best way to track the series would be to either follow me on:
After publishing a new piece, I will also update the previous publication with the link to the next one. Now, I expect you to share your thoughts, contribute, engage with others, and share with your friends and colleagues!
Remember: Sharing and giving feedback is caring!
Update: Here are the links to all the Machine Learning in a Box weekly blogs:
- Introducing 'Project: Machine Learning in a Box'
- Machine Learning in a Box (Week 2): Project Methodologies
- Recap Machine Learning in a Box (Week 2): Project Methodologies
- Machine Learning in a Box (Week 3): Algorithms Learning Styles
- Machine Learning in a Box (Week 4): Get Your Environment Up and Running
- Machine Learning in a Box (Week 5): Upload Machine Learning Datasets
- Machine Learning in a Box (Week 6): SAP HANA R Integration
- Machine Learning in a Box (Week 7): Jupyter Notebook