What Developers Need to Know About Machine Learning in the SDLC
Execute the fundamentals of software development, understand how ML differs from code and application development, and always know the business problem to be solved.
To learn about the current and future state of machine learning (ML) in software development, we gathered insights from IT professionals from 16 solution providers. We asked, "What do developers need to keep in mind when using machine learning in the SDLC?" Here's what we learned:
Fundamentals
The biggest issue for ML is viewing it as an omnipotent savior of the SDLC, thereby negating the need to adhere to traditional SDLC design and protocol. ML can greatly improve efficiency and allow developers to better allocate their time to actions that require human input. It cannot, however, completely take the place of conscientious, diligent and thoughtful software planning, design, development, and version control.
Machine Learning Is Different From Code
Use the learning resources provided by the public cloud providers: they are self-paced, painless, and you can get certified. ML is the opposite of writing code. Models are constantly evolving as they are retrained on new data, so think more like a skeptical scientist who is always trying to disprove and improve the model.
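In that skeptical-scientist spirit, one simple pattern is a champion/challenger check that only promotes a retrained model when it measurably beats the current one on data neither has seen. The sketch below is a minimal illustration; the model objects and the evaluate function are hypothetical stand-ins.

```python
# Minimal sketch of a "skeptical" promotion check for a retrained model.
# `champion`, `challenger`, and `evaluate` are hypothetical stand-ins:
# evaluate() returns an error metric (lower is better) computed on data
# that neither model was trained on.
from typing import Any, Callable

def promote_if_better(champion: Any,
                      challenger: Any,
                      evaluate: Callable[[Any], float],
                      min_improvement: float = 0.01) -> Any:
    """Keep the current model unless the retrained one is measurably better."""
    champion_error = evaluate(champion)
    challenger_error = evaluate(challenger)
    if challenger_error <= champion_error - min_improvement:
        return challenger
    return champion
```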
The ML portion of the SDLC is less deterministic than conventional development, so it relies heavily on the ML engineer's skill in structuring data and algorithms the right way. The introduction of bias is something developers have to be mindful of: machines only know what they are taught and learn from the data we feed them. If the data is biased, the model will be biased; there are countless examples of systems that failed to recognize faces, accents, or genders they had not been exposed to in the data they learned from.
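As a minimal illustration of checking for that kind of data-driven bias, the sketch below compares a model's error rate across groups on held-out data; the column names and the tolerance are hypothetical, and a real fairness audit would go much further.

```python
# Minimal sketch: compare a model's error rate across groups in held-out data.
# Assumes a pandas DataFrame `holdout` with hypothetical columns "group"
# (e.g. demographic segment) and "label" (ground truth), plus predictions
# produced elsewhere by the model under test.
import pandas as pd

def error_rate_by_group(holdout: pd.DataFrame, predictions) -> pd.Series:
    holdout = holdout.assign(error=(predictions != holdout["label"]))
    return holdout.groupby("group")["error"].mean()

def underserved_groups(rates: pd.Series, tolerance: float = 0.05) -> list:
    # Flag groups whose error rate sits well above the average across groups.
    overall = rates.mean()
    return rates[rates > overall + tolerance].index.tolist()
```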
Another risk lies in building a less-than-optimal model that keeps propagating itself through your code and SDLC. This is much more prevalent than in a traditional coding environment, because incorrect outputs can manifest themselves quickly and often fly under the radar. The important thing is to figure out the guard rails: how you ensure you are getting to the right outcomes, and how you ensure that machine knowledge is transferable and gets better over time rather than worse.
Building inclusivity into the desired outcomes is key, as is not losing focus on optimization from build all the way through post-production. Feedback will form a large part of this process: collecting, interpreting, and incorporating it back into the data science is where we will find real success.
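One minimal form of such a guard rail, assuming labeled feedback keeps arriving after deployment, is a rolling accuracy check that flags the model when its recent performance drops below a floor; the window size and floor below are arbitrary illustrative values.

```python
# Minimal sketch of a post-production guard rail: track accuracy on recent
# labeled feedback and report when the model starts degrading.
from collections import deque

class AccuracyGuardRail:
    def __init__(self, window: int = 500, floor: float = 0.90):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = incorrect
        self.floor = floor

    def record(self, prediction, actual) -> None:
        self.outcomes.append(1 if prediction == actual else 0)

    def healthy(self) -> bool:
        if len(self.outcomes) < self.outcomes.maxlen:
            return True  # not enough feedback yet to judge
        return sum(self.outcomes) / len(self.outcomes) >= self.floor
```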
Know the Business Problem You Are Solving
Look at the big picture. What challenges are you encountering, and where can AI help you solve them? Answering that will help your career and your company. Understand what your customers are doing and ask how AI can help them. There's a lot of value in end-to-end testing to improve the overall quality of what you are producing. Code is moving to the front.
When monitoring models, keep in mind accuracy, effectiveness, and computational performance metrics, and weigh them against the business problem you are working on. For example, the cost of an infection in a hospital is $10,000 versus $100 for an intervention, so the cost of a false positive is much less than that of a false negative. AutoML gives engineers the opportunity to try this on their own and to collaborate with data science teams, making both more productive.
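To make the asymmetry concrete, the sketch below picks a classification threshold by minimizing expected cost rather than maximizing accuracy, using the hospital figures quoted above; the candidate thresholds and data layout are assumptions for illustration.

```python
# Minimal sketch: choose a decision threshold by expected cost, not accuracy.
# Costs follow the hospital example above: a missed infection (false negative)
# costs ~$10,000, while an unnecessary intervention (false positive) costs ~$100.
import numpy as np

COST_FALSE_NEGATIVE = 10_000
COST_FALSE_POSITIVE = 100

def expected_cost(y_true: np.ndarray, scores: np.ndarray, threshold: float) -> float:
    preds = scores >= threshold
    false_negatives = np.sum((preds == 0) & (y_true == 1))
    false_positives = np.sum((preds == 1) & (y_true == 0))
    return (false_negatives * COST_FALSE_NEGATIVE +
            false_positives * COST_FALSE_POSITIVE) / len(y_true)

def best_threshold(y_true: np.ndarray, scores: np.ndarray) -> float:
    # Evaluate a small grid of candidate thresholds and keep the cheapest.
    candidates = np.linspace(0.05, 0.95, 19)
    return min(candidates, key=lambda t: expected_cost(y_true, scores, t))
```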
Other
Classical algorithms and ML models will often prove most successful when used in complement to one another.
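One common way to combine the two, sketched below with purely hypothetical function names, is to let a deterministic rule decide the cases it covers with certainty and fall back to a learned model only for the ambiguous remainder.

```python
# Minimal sketch of combining a classical rule with an ML model.
# `rule` and `model_score` are hypothetical stand-ins: the rule returns
# True/False when it is certain and None when it is not, and model_score
# is any callable returning a probability-like score for the input.
from typing import Callable, Optional

def classify(message: str,
             rule: Callable[[str], Optional[bool]],
             model_score: Callable[[str], float],
             threshold: float = 0.5) -> bool:
    verdict = rule(message)  # fast, explainable, deterministic path
    if verdict is not None:
        return verdict
    return model_score(message) >= threshold  # learned fallback for the rest
```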
Don’t use ML when it’s not needed; it has a lot of disadvantages. If you are able to write and maintain the code yourself, do that, and turn to ML only when you cannot. ML is statistical, it requires a lot of data, and it’s not completely accurate. Don’t try to solve the hard problems in-house; buy a tool instead. Be aware that it’s not 100% accurate and know the cost of compensating for that.
There’s a lot of hype and noise. Tools and techniques are important for data scientists, but data scientists also need to understand the business problem being solved. Over time, data scientists become developers, more focused on scripting, and the roles are merging. Statisticians have picked up Python, scripting, and frameworks; titles blend as tools blend. The increasing ease of data engineering will lead to an even more blended role. The people with both the business context and the ability to implement will make the biggest impact; you no longer need to know how to manage a Hadoop cluster.
Developers should ensure they select the right tooling and techniques and be mindful of project complexity. Picking the right ML algorithms, and the right places to employ them, plays a vital role. Knowing the underpinnings of their development ecosystem, choosing appropriate algorithms, and supervising ML efficiency and tooling are areas where developers need to pay closer attention when building a smart development ecosystem. The SDLC can become a complex assembly of many moving parts, numerous ecosystem components, and programming components.
In my current product, for example, some modules are built in C++, some in Java, some in Python, and some in JavaScript, alongside a number of third-party libraries. Different modules and different functionalities also require different testing techniques, and different ML algorithms and training sets need to be designed into the SDLC process. It is essential to be aware of this complexity and select the tools and techniques that work for your specific ecosystem.
Understand the ML model and its lack of predictability. Develop a partnership with a data scientist on your team and become comfortable with data so you can understand what your apps are doing.
There are tradeoffs in modeling between bias and variance, between underfitting and overfitting: a model that underfits captures only one piece of the data, while one that overfits memorizes it. These are two opposing forces to understand and balance in the model.
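A small sketch of that tradeoff on a synthetic, hypothetical dataset: a low-degree polynomial underfits (high bias), while a very high-degree one overfits (high variance), which shows up as a gap between training and held-out error.

```python
# Minimal sketch of the bias/variance (underfit vs. overfit) tradeoff.
# The noisy data here is synthetic and used purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)  # noisy target

# Split into interleaved train and held-out halves.
train_idx, test_idx = np.arange(0, 40, 2), np.arange(1, 40, 2)

for degree in (1, 4, 12):  # underfit, reasonable fit, overfit
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    train_err = np.mean((np.polyval(coeffs, x[train_idx]) - y[train_idx]) ** 2)
    test_err = np.mean((np.polyval(coeffs, x[test_idx]) - y[test_idx]) ** 2)
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```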
Here’s who we heard from:
- Dipti Borkar, V.P. Products, Alluxio
- Adam Carmi, Co-founder & CTO, Applitools
- Dr. Oleg Sinyavskiy, Head of Research and Development, Brain Corp
- Eli Finkelshteyn, CEO & Co-founder, Constructor.io
- Senthil Kumar, VP of Software Engineering, FogHorn
- Ivaylo Bahtchevanov, Head of Data Science, ForgeRock
- John Seaton, Director of Data Science, Functionize
- Irina Farooq, Chief Product Officer, Kinetica
- Elif Tutuk, AVP Research, Qlik
- Shivani Govil, EVP Emerging Tech and Ecosystem, Sage
- Patrick Hubbard, Head Geek, SolarWinds
- Monte Zweben, CEO, Splice Machine
- Zach Bannor, Associate Consultant, SPR
- David Andrzejewski, Director of Engineering, Sumo Logic
- Oren Rubin, Founder & CEO, Testim.io
- Dan Rope, Director, Data Science and Michael O’Connell, Chief Analytics Officer, TIBCO