Using Machine Learning to Streamline Drug Production

A machine learning algorithm was trained using thousands of previous experimental reactions to learn and then predict what a reaction’s main products will be.

I’ve written a number of times recently about the prospects for automation and AI to enhance the research and development of new medicines, whether it’s in supporting more effective research, testing for side effects or in the production of drugs themselves.

A recent paper examines this in more detail, with a specific look at how chemical engineers mass-produce the chemical compound itself. There can often be hundreds of different sequences of reactions that produce the same end result, but some will use cheaper reagents than others. Equally, some are easier to run continuously.

Efficient Drug Production

The team from MIT used a machine learning algorithm that was trained using thousands of previous experimental reactions to learn and then predict what a reaction’s main products will be. The system correctly predicted the major product of a reaction 72% of the time.

“There’s clearly a lot understood about reactions today,” the authors say, “but it’s a highly evolved, acquired skill to look at a molecule and decide how you’re going to synthesize it from starting materials.”

As with many machine learning applications in healthcare today, the aim is to speed up the process by which that understanding of reactions is obtained. Suffice it to say that the algorithm will need to be refined to improve upon its current 72% success rate, but even now, the team believes it can help chemical engineers converge on the best sequence of reactions faster than they do today.

Traditionally, chemists have used computer models to characterize reactions, but even these usually require scientists to research the exceptions themselves, and a single model can require more than a dozen of them.

Circumventing the Process

If nothing else, the team hopes to circumvent this process. The system was trained on 15,000 observed reactions recorded in patent filings. Because it was important for the system to learn what didn’t occur as well as what did, more training material was needed, so the team generated a number of additional possible products based on the reaction sites. They then fed descriptions of these reactions into the algorithm, which ranked the possible products in order of likelihood.
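The ranking idea can be illustrated with a minimal Python sketch. This is not the MIT team's model: the scoring function here is a hypothetical stand-in for a trained model, and the product labels and scores are made-up placeholders. The sketch only shows how a list of candidate products (the recorded one plus generated alternatives that did not occur) might be ranked by likelihood, and how a figure like "predicts the major product X% of the time" could be computed as top-1 accuracy.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_candidates(candidates, score_fn):
    """Return (product, probability) pairs, most likely first."""
    probs = softmax([score_fn(c) for c in candidates])
    return sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)


def top1_accuracy(reactions, score_fn):
    """Fraction of reactions whose recorded product is ranked first."""
    hits = 0
    for candidates, recorded in reactions:
        best, _ = rank_candidates(candidates, score_fn)[0]
        if best == recorded:
            hits += 1
    return hits / len(reactions)


# Hypothetical scores standing in for a trained model's output.
TOY_SCORES = {
    "product_a": 2.0, "product_b": 0.5, "product_c": -1.0,
    "product_d": 1.0, "product_e": 1.5,
}


def toy_score(product):
    return TOY_SCORES[product]


# Each entry: (all candidate products, the product actually recorded).
toy_reactions = [
    (["product_a", "product_b", "product_c"], "product_a"),
    (["product_d", "product_e"], "product_d"),
]

for product, prob in rank_candidates(toy_reactions[0][0], toy_score):
    print(f"{product}: {prob:.2f}")
print("top-1 accuracy:", top1_accuracy(toy_reactions, toy_score))
```

In this toy data the second reaction is deliberately mis-ranked, so the accuracy comes out at 0.5. In a real system, the scores would come from a model trained on both the observed products and the generated alternatives.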

Ranking candidates this way allowed the system to form a hierarchy of reactions without requiring any human input. Overall, it provides an interesting approach to targeted synthesis, and a number of drug companies have already expressed an interest in it.

“Currently we rely heavily on our own retrosynthetic training, which is aligned with our own personal experiences and augmented with reaction-database search engines,” Novartis says. “This serves us well but often still results in a significant failure rate. Even highly experienced chemists are often surprised. If you were to add up all the cumulative synthesis failures as an industry, this would likely relate to a significant time and cost investment. What if we could improve our success rate?”

ai, machine learning, healthcare, algorithm

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
