Machine Learning in Software Development — Techniques and Tools
The ability to version-control ML models, automate testing, and provide better feedback.
To learn about the current and future state of machine learning (ML) in software development, we gathered insights from IT professionals from 16 solution providers. We asked, "What machine learning techniques and tools are most effective for the SDLC?" Here's what we learned:
- MLflow, Bugspots, Helium, and Appvance are some pretty powerful tools. I particularly like MLflow for its ease of use and its ability to version-control ML models.
- We adopted MLflow for our data platform, an ML data platform management system (a real-time, transactional operational database for in-database ML), to track the workflow of our data scientists. If you adopt a culture of experimentation and create 50 experiments a day, each running and producing a different result, you need to keep track of each one. You need the ability to tag runs with parameters and metrics so you can go back and see why one model performed better than another.
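As a concrete illustration of that tracking idea, here is a minimal pure-Python sketch; the `ExperimentTracker` class and its methods are invented for illustration and are not MLflow's actual API:

```python
import uuid

class ExperimentTracker:
    """Toy stand-in for an MLflow-style tracker: every run records
    its parameters and metrics so results stay comparable later."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics, tags=None):
        run = {"id": uuid.uuid4().hex, "params": params,
               "metrics": metrics, "tags": tags or {}}
        self.runs.append(run)
        return run["id"]

    def best_run(self, metric):
        # Look back across all runs to see which configuration won and why.
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = ExperimentTracker()
tracker.log_run({"lr": 0.1, "depth": 3}, {"auc": 0.81})
tracker.log_run({"lr": 0.01, "depth": 5}, {"auc": 0.86})
best = tracker.best_run("auc")
print(best["params"])  # the hyperparameters behind the best score
```

The point is not the storage mechanism but the discipline: with 50 runs a day, only runs tagged with their parameters and metrics remain comparable after the fact.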
- Tools simplify infrastructure and data engineering for developers. With ML, an explosion of things needs to happen, including easy integration into the application. Debugging is more difficult because ML models are living entities: drift occurs as the data and the learning change. The biggest challenge is the debuggability of the code and the application. Make sure you have traceability of your model decisions and evaluate model performance over time.
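One way to get that traceability is to log every model decision alongside its inputs and model version, then join the log with ground truth later to evaluate performance over time. A minimal sketch under those assumptions (all names here are illustrative):

```python
import time

decision_log = []

def predict_with_trace(model_version, features, predict_fn):
    """Wrap a prediction so every model decision is traceable:
    which model version, on what input, and when."""
    pred = predict_fn(features)
    decision_log.append({"ts": time.time(), "model": model_version,
                         "features": features, "prediction": pred})
    return pred

def accuracy_over_time(outcomes):
    """Join logged predictions with later ground-truth outcomes
    to evaluate model performance after the fact."""
    hits = [entry["prediction"] == outcomes[i]
            for i, entry in enumerate(decision_log)]
    return sum(hits) / len(hits)

threshold_model = lambda f: int(f["score"] > 0.5)
predict_with_trace("v1.2", {"score": 0.7}, threshold_model)
predict_with_trace("v1.2", {"score": 0.2}, threshold_model)
print(accuracy_over_time([1, 1]))  # second true label disagrees -> 0.5
```

A falling value of that accuracy over successive windows is the simplest drift signal the quote is warning about.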
- The most effective technique is to define the task at hand as clearly as possible and immediately come up with an automatic evaluation method. Following this step, you ought to collect and label a small dataset for your problem, overfit to that dataset with any method, and try to close the whole production loop: dataset collection, training, evaluation, deployment. A majority of the time, you'll realize that your evaluation method is not actually what you had intended for your product, and you'll have to go through these stages again.
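That loop can be exercised end to end even with a toy model. A sketch in plain Python, where the memorizing "model" deliberately overfits the small labeled dataset (the dataset and function names are invented for illustration):

```python
def evaluate(model, dataset):
    """The automatic evaluation method, defined before any modeling."""
    correct = sum(model(x) == y for x, y in dataset)
    return correct / len(dataset)

def train_memorizer(dataset):
    """Deliberately overfit: memorize the small dataset verbatim.
    The goal is exercising the collect -> train -> evaluate -> deploy
    loop, not model quality."""
    table = {x: y for x, y in dataset}
    return lambda x: table.get(x, 0)

# A small hand-labeled dataset (1 = positive sentiment).
small_dataset = [("good service", 1), ("slow and buggy", 0), ("love it", 1)]
model = train_memorizer(small_dataset)
print(evaluate(model, small_dataset))  # 1.0 on the data it memorized
```

If even this trivial pipeline surfaces a mismatch between the metric and the product goal, you have learned that cheaply, before any real modeling effort.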
- Python is the default language for scripting the frameworks. There are a lot of models that can be used, or you can build your own. Reinforcement learning (deep adversarial, Q-learning), semi-supervised learning, and closed-loop ML techniques have proven beneficial in different phases of the SDLC. When organizations build models, the underlying premise is that a model's accuracy and efficiency are based on certain assumptions and depend on the training dataset it is privy to. If there is a change in data patterns or an unanticipated scenario, the model's accuracy and efficiency may diminish over time. For example, in a manufacturing plant, a model can be deployed to detect defects on parts being manufactured and assembled on the assembly line. Over time, the model's ability to accurately identify the errors may diminish. This results in severe challenges if the software uses traditional analytics exclusively. However, when equipped with closed-loop functionality, smart agents can auto-detect the degradation and trigger a re-learning and re-training process to improve the accuracy and performance of the models automatically, leading to increased productivity, efficiency, and cost savings. The closed-loop ML technique for the SDLC can use reinforcement or unsupervised algorithms to train, test, and validate ML models to improve accuracy. After the initial deployment, as needed, the model can self-learn, self-adjust, and detect variations in its own accuracy and performance. In short, it will tune itself so that the output is optimal.
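A minimal sketch of that closed-loop idea, using a one-dimensional defect detector whose decision boundary drifts out of date; the threshold, window size, and retraining rule here are illustrative choices, not prescribed values:

```python
class ClosedLoopModel:
    """Monitors its own rolling accuracy against ground truth and
    retrains automatically when accuracy falls below a floor."""

    def __init__(self, cutoff=0.5, threshold=0.7, window=20):
        self.cutoff = cutoff        # current decision boundary
        self.threshold = threshold  # accuracy floor before retraining
        self.window = window
        self.recent = []            # (feature, prediction, truth)
        self.retrain_count = 0

    def predict(self, x):
        return int(x > self.cutoff)

    def observe(self, x, truth):
        """Feed back ground truth; retrain if the rolling window's
        accuracy drops below the floor (the closed loop)."""
        pred = self.predict(x)
        self.recent.append((x, pred, truth))
        self.recent = self.recent[-self.window:]
        acc = sum(p == t for _, p, t in self.recent) / len(self.recent)
        if len(self.recent) == self.window and acc < self.threshold:
            self._retrain()
        return pred

    def _retrain(self):
        # Re-fit the boundary as the midpoint between recent classes.
        pos = [x for x, _, t in self.recent if t == 1]
        neg = [x for x, _, t in self.recent if t == 0]
        if pos and neg:
            self.cutoff = (min(pos) + max(neg)) / 2
        self.retrain_count += 1
        self.recent = []

# Simulate drift: the true defect boundary is 0.85, not 0.5.
xs = [round(0.05 + 0.1 * i, 2) for i in range(10)]
model = ClosedLoopModel()
for step in range(40):
    x = xs[step % 10]
    model.observe(x, int(x > 0.85))
print(model.retrain_count, round(model.cutoff, 2))  # retrained once; boundary near 0.9
```

After the accuracy floor is breached, the agent moves its own boundary toward the new regime without any manual redeployment, which is the closed-loop behavior described above in miniature.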
- Techniques I am seeing include learning techniques such as concept learning, decision trees, neural networks (and convolutional neural networks), if/then rules, reinforcement learning, inductive logic programming, and the like.
- Here are the main elements:
- 1) Ensuring business requirements and expectations are set from the beginning. This helps define the ROI for the project and what you're looking to solve for (e.g., better customer engagement or reduced churn).
- 2) Converting the business problem into a technical problem. This lets you define what data is needed, the approach, and where to start, so you can set the scope of the solution. You take the business problem of improving customer satisfaction or gaining market share and turn it into a data science problem (prediction for customer conversion/churn, user segmentation, product recommendation, etc.), which is something you can solve with data and a model.
- 3) Establishing what data is actually available to solve the problem. This can be one of the biggest limiting factors in applying ML to the SDLC. There needs to be sufficient, relevant data to solve the problem, and there needs to be a base level of normalization. Given the technical problem, you need to identify which entities can be relevant features to plug into the model.
- 4) Designing the iteration process. Given your toolkit, start with the simplest approach possible and see how it performs. Based on those results, you have a sense of direction for where to go and how to add complexity.
- 5) Experimentation and quality. Design experiments so you can test performance, make modifications, re-evaluate, then rinse and repeat. Make sure you pick the right metrics, so you measure what really matters.
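Steps 4 and 5 in practice: pick the metric first, start with the simplest possible approach, then add one unit of complexity and compare. A toy churn example in Python (the data and the single-feature rule are invented for illustration):

```python
from collections import Counter

def accuracy(preds, labels):
    # The agreed-upon metric, chosen before experimenting.
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical churn labels: 1 = churned.
train_labels = [0, 0, 0, 1, 0, 1, 0, 0]
test_features = [{"logins": 9}, {"logins": 1}, {"logins": 7}, {"logins": 0}]
test_labels = [0, 1, 0, 1]

# Step 1: the simplest approach possible -- always predict the majority class.
majority = Counter(train_labels).most_common(1)[0][0]
baseline_preds = [majority] * len(test_labels)

# Step 2: add one unit of complexity -- a single-feature rule
# (few logins suggests churn).
rule_preds = [int(f["logins"] < 3) for f in test_features]

# Step 3: compare on the chosen metric and keep whichever wins.
results = {"baseline": accuracy(baseline_preds, test_labels),
           "rule": accuracy(rule_preds, test_labels)}
print(results)
```

The baseline's score is the bar any added complexity has to clear; if the richer model does not beat it on the metric that matters, the complexity is not paying for itself.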
Here’s who we heard from:
- Dipti Borkar, V.P. Products, Alluxio
- Adam Carmi, Co-founder & CTO, Applitools
- Dr. Oleg Sinyavskiy, Head of Research and Development, Brain Corp
- Eli Finkelshteyn, CEO & Co-founder, Constructor.io
- Senthil Kumar, VP of Software Engineering, FogHorn
- Ivaylo Bahtchevanov, Head of Data Science, ForgeRock
- John Seaton, Director of Data Science, Functionize
- Irina Farooq, Chief Product Officer, Kinetica
- Elif Tutuk, AVP Research, Qlik
- Shivani Govil, EVP Emerging Tech and Ecosystem, Sage
- Patrick Hubbard, Head Geek, SolarWinds
- Monte Zweben, CEO, Splice Machine
- Zach Bannor, Associate Consultant, SPR
- David Andrzejewski, Director of Engineering, Sumo Logic
- Oren Rubin, Founder & CEO, Testim.io
- Dan Rope, Director, Data Science and Michael O’Connell, Chief Analytics Officer, TIBCO
Opinions expressed by DZone contributors are their own.