Unlocking Machine Learning’s Hidden Challenge: Integration
A recent study found that 48 percent of businesses using machine learning named increased profitability as the top benefit they have experienced.
A recent study by SAP and the Economist Intelligence Unit (EIU) found that 48 percent of businesses using machine learning named increased profitability as the top benefit they have experienced. Many companies investing in machine learning technology are at the very beginning of the process, just starting their first project or proof of concept. At this stage, it usually has not yet become apparent that data integration is a major challenge to overcome. Businesses quickly discover that aligning data management and security concepts across machine learning communication channels takes considerable effort. An organization that needs to orchestrate and communicate across more than one channel faces additional complexity that must be resolved.
When developing a machine learning platform, achieving mature lifecycle management and version management is often challenging. Part of the complexity that comes with machine learning is not just adding more software to the enterprise, but aligning different developments with data science and lifecycle management. A machine learning platform must support the end-to-end lifecycle model, which includes functions such as data discovery, feature engineering, iterative model development, model training, and model scoring. In addition to managing the data that is constantly used to retrain the machine learning models, organizations need to handle versioning, identity management, access control, and security measures.
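To make the versioning requirement concrete, here is a minimal sketch of a model registry that records which data snapshot each model version was trained on. The names `ModelVersion`, `ModelRegistry`, and the snapshot identifiers are hypothetical and chosen only for illustration; a real platform would persist this information and tie it to access control.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    training_data_id: str  # hypothetical identifier of the training data snapshot


@dataclass
class ModelRegistry:
    """Illustrative in-memory registry; not a real product API."""
    _versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, training_data_id: str) -> ModelVersion:
        # Each retraining run creates the next version number for that model.
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(history) + 1, training_data_id)
        history.append(entry)
        return entry

    def latest(self, name: str) -> ModelVersion:
        return self._versions[name][-1]


registry = ModelRegistry()
registry.register("image-classifier", "snapshot-a")
registry.register("image-classifier", "snapshot-b")  # retrained on newer data
latest = registry.latest("image-classifier")
```

Keeping the data snapshot next to the version number means a retrained model can always be traced back to the data that produced it.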
Reasons Why Organizations Find Themselves Dealing With Data Silos
Consider a brand that analyzes images for customer insights using machine learning. If the organization processes 40,000 images, it is not just one machine learning model that needs to be trained, but many. The machine learning platform is conducting object detection, image classification, and more. Organizations doing this type of analysis require custom implementation and must ensure data from different sources is available. The data required for training the various models can come from different sources: some comes from public sources like repositories of product imagery, while the rest can come from internal systems, apps, web, or social channels.
The challenge organizations face with a wide range of data sources is how to define and orchestrate each layer that is required. While it is necessary to create a central repository to align all of the data, it is difficult to achieve. To accomplish this, organizations must find a common abstraction layer where all data can be referenced. For machine learning to be effective, it is critical to ensure data is connected across all channels.
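One way to picture such a common abstraction layer is a small catalog through which every data source is registered and read by name. The sketch below is an assumption-laden illustration: `DataSource`, `DataCatalog`, the source names, and the sample records are all hypothetical, and the loaders stand in for real connectors to public repositories or internal systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class DataSource:
    name: str
    origin: str  # e.g. "public" or "internal"
    load: Callable[[], Iterable[dict]]  # stand-in for a real connector


class DataCatalog:
    """Illustrative single point of reference for all registered data."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, source: DataSource) -> None:
        if source.name in self._sources:
            raise ValueError(f"source already registered: {source.name}")
        self._sources[source.name] = source

    def read(self, name: str) -> List[dict]:
        # Callers reference data by name and never touch the underlying system.
        return list(self._sources[name].load())

    def names(self) -> List[str]:
        return sorted(self._sources)


catalog = DataCatalog()
catalog.register(DataSource("product_images", "public",
                            lambda: [{"id": 1, "label": "shoe"}]))
catalog.register(DataSource("social_posts", "internal",
                            lambda: [{"id": 7, "label": "bag"}]))
all_records = [rec for name in catalog.names() for rec in catalog.read(name)]
```

Training code that reads only through the catalog does not need to change when a new channel is added.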
How to Consolidate and Tame Data
With the massive amounts of data made available for machine learning, organizations need to ensure that information is secured and easily accessible. Depending on where in the world a company is located, it may encounter legal limitations on how certain data can be used. For example, a specific data set may legally be used to make predictions, yet it may be illegal to use it for training machine learning models. If data is misused, organizations can face a fine or other legal consequences.
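The distinction between data that may be used for prediction and data that may be used for training can be enforced in code before a job runs. The following is a minimal sketch, assuming each data set carries a set of permitted purposes; `Purpose`, `Dataset`, `require_permission`, and the data set name are hypothetical, and real rules would come from legal review rather than a hard-coded set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Purpose(Enum):
    TRAINING = "training"
    PREDICTION = "prediction"


@dataclass(frozen=True)
class Dataset:
    name: str
    allowed_purposes: FrozenSet[Purpose]  # assumed to be set by a compliance process


def require_permission(dataset: Dataset, purpose: Purpose) -> None:
    """Raise before any work starts if the purpose is not permitted."""
    if purpose not in dataset.allowed_purposes:
        raise PermissionError(
            f"{dataset.name} may not be used for {purpose.value}")


# Hypothetical data set that is cleared for prediction only.
customer_data = Dataset("customer_profiles", frozenset({Purpose.PREDICTION}))

require_permission(customer_data, Purpose.PREDICTION)  # permitted, no error

try:
    require_permission(customer_data, Purpose.TRAINING)
    training_allowed = True
except PermissionError:
    training_allowed = False
```

Failing early keeps restricted data from reaching a training pipeline by accident.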
Restrictions don’t stop at legal regulations. The data has technical limitations as well. For example, if the data carries bias, whether it originates internally or externally, the resulting outputs are unreliable. Deployment adds another constraint: some machine learning algorithms need substantial compute resources and run in different cloud infrastructures, while others can be deployed on-prem. Integrating cloud and on-prem deployments is a complex challenge. However, consumption of machine learning functionality should be seamless, and this complexity should never be visible to the end user. Users should not notice that they are using a separate system or that the functionality is a cloud service. The integration required here is user management that is harmonized across on-prem and cloud deployments.
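A simple way to hide the deployment location from users is a single entry point that checks the user once and then routes the request to whichever backend serves the task. This is only a sketch under stated assumptions: `MLService`, the task names, the user name, and the two stub backends are hypothetical, and the stubs return placeholder values instead of calling real cloud or on-prem systems.

```python
from typing import Callable, Dict, Iterable

Backend = Callable[[dict], dict]


def cloud_image_classifier(payload: dict) -> dict:
    # Stub standing in for a model served from a cloud deployment.
    return {"label": "placeholder", "served_from": "cloud"}


def onprem_scorer(payload: dict) -> dict:
    # Stub standing in for a model served from an on-prem deployment.
    return {"score": 0.0, "served_from": "on-prem"}


class MLService:
    """Illustrative facade: one user check, one call, any deployment."""

    def __init__(self, authorized_users: Iterable[str]) -> None:
        self._authorized_users = set(authorized_users)
        self._backends: Dict[str, Backend] = {}

    def add_backend(self, task: str, backend: Backend) -> None:
        self._backends[task] = backend

    def predict(self, user: str, task: str, payload: dict) -> dict:
        # User management is applied once, regardless of where the model runs.
        if user not in self._authorized_users:
            raise PermissionError(f"user not authorized: {user}")
        return self._backends[task](payload)


service = MLService(authorized_users=["alice"])
service.add_backend("classify_image", cloud_image_classifier)
service.add_backend("score_record", onprem_scorer)

cloud_result = service.predict("alice", "classify_image", {})
onprem_result = service.predict("alice", "score_record", {})
```

The caller uses the same `predict` call in both cases, so moving a model between deployments does not change the user-facing interface.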
As organizations continue to explore machine learning, it is important to consider how to solve integration challenges. Even if these challenges are not present during the first few projects, they will become an obstacle as more datasets are analyzed. Despite these challenges, applying machine learning is clearly worth the effort. To truly succeed in machine learning, organizations must realize that it is not a single-step process. Their willingness to embark on the entire machine learning journey will prove most valuable and generate meaningful business results.