How AI and machine learning are augmenting our approach to drug discovery

ORIGINALLY PUBLISHED:
19 December 2022


Written by:

Jon Paul Janet

Associate Principal Scientist, Molecular AI, AstraZeneca

Steven Kiddle

Director Health Data Science, AstraZeneca

Dino Oglić

Head of Research, Centre for AI, AstraZeneca

Machine learning is improving how we generate, screen and evaluate molecules as potential candidate medicines.


The fundamental challenge of drug discovery

Possible drug-like chemicals greatly outnumber the stars in the universe. Developing new medicines therefore requires us to consider very large numbers of potential molecules (the ‘molecular space’) to identify suitable candidates that can then be investigated in more depth for their utility as potential therapeutics.

To help us narrow our search for lead candidate molecules, we combine approaches such as high-throughput screening and computational chemistry. These processes start with a very large number of molecules and progressively ‘funnel’ the focus towards an ever-smaller number of molecules with the desired medicinal effects and safety profiles.

Studying potential molecules at the beginning of this funnel easily generates a lot of data, but provides limited insights. Conversely, as the analysis progresses along the funnel and focuses on fewer molecules more deeply, it becomes more expensive but provides increasingly meaningful data.

To overcome this challenge, we are continually improving and applying machine learning models, such as graph neural networks and transformer models, to better understand the potential molecular space we want to investigate and predict the likely chemical properties of candidate drugs.

Using neural networks to predict molecule properties and identify candidate drugs

Given the large numbers of potential molecules, assessing them all for varied properties such as absorption, distribution in the body, metabolism, elimination, efficacy, and safety is almost impossible. Machine learning tools such as graph neural networks are already in common use to help predict the properties of large numbers of molecules.
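To make the idea of a graph neural network more concrete, here is a minimal sketch of a single message-passing step on a toy molecular graph, followed by a sum readout and a linear prediction head. It is illustrative only: the three-atom graph, the one-hot atom features, the function names and the untrained weights are all assumptions made for this example, not the models used in our research.

```python
# Toy molecular graph for ethanol's heavy atoms: C0 - C1 - O2 (assumed example).
# Node features are one-hot element indicators [is_carbon, is_oxygen].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]


def neighbours(n_atoms, bonds):
    """Build an adjacency list from an undirected bond list."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def message_pass(features, adj):
    """One round: each atom adds the sum of its neighbours' features to its own."""
    updated = []
    for atom, own in enumerate(features):
        agg = list(own)
        for nb in adj[atom]:
            agg = [x + y for x, y in zip(agg, features[nb])]
        updated.append(agg)
    return updated


def readout(features):
    """Sum the atom vectors into one fixed-size vector for the whole molecule."""
    return [sum(column) for column in zip(*features)]


def predict(mol_vector, weights, bias):
    """Linear head mapping the molecule vector to a single property value."""
    return sum(w * x for w, x in zip(weights, mol_vector)) + bias


adj = neighbours(len(features), bonds)
hidden = message_pass(features, adj)
mol_vector = readout(hidden)
# Hypothetical, untrained weights chosen only for illustration.
score = predict(mol_vector, weights=[0.5, -1.0], bias=0.1)
```

In a real model the update and readout steps would have learned parameters and several rounds of message passing would be stacked. The point of the sketch is only that each atom's representation is built from its bonded neighbours before the molecule is summarised as a whole.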

Our scientists have now paired graph neural networks with ‘transfer learning’, a machine learning strategy that stores knowledge gained from one task and then ‘transfers’ it to a different, related problem [1]. We have used transfer learning to take the knowledge held in the large, easily generated datasets from the early stages of the drug discovery funnel (which, on their own, provide limited insights) and use it to improve predictive performance at the later stages of the funnel, where datasets are more expensive to generate but can provide deeper insights.
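The pretrain-then-fine-tune pattern behind transfer learning can be sketched in a few lines. The example below deliberately swaps the graph neural network for a one-variable linear model so that it stays self-contained, and it uses synthetic data: the low-fidelity and high-fidelity datasets, the offset between them, the noise level and every function name are assumptions for illustration, not the method or data from the published study.

```python
import random


def fit(xs, ys, w, b, steps, lr):
    """Gradient descent on mean squared error for the model y = w * x + b."""
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the linear model on a dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)

# Low fidelity (assumed): many cheap, noisy measurements with a systematic offset.
low_x = [i / 199 for i in range(200)]
low_y = [2.0 * x + 0.5 + rng.gauss(0.0, 0.3) for x in low_x]

# High fidelity (assumed): a handful of expensive, accurate measurements.
high_x = [0.1, 0.3, 0.5, 0.7, 0.9]
high_y = [2.0 * x + 1.0 for x in high_x]

# Held-out high-fidelity points used only for evaluation.
test_x = [i / 20 for i in range(21)]
test_y = [2.0 * x + 1.0 for x in test_x]

# Step 1: pretrain on the large low-fidelity dataset.
pre_w, pre_b = fit(low_x, low_y, 0.0, 0.0, steps=500, lr=0.1)

# Step 2: fine-tune briefly on the small high-fidelity dataset,
# starting from the pretrained parameters.
ft_w, ft_b = fit(high_x, high_y, pre_w, pre_b, steps=20, lr=0.05)

# Baseline: the same short training budget, starting from scratch.
sc_w, sc_b = fit(high_x, high_y, 0.0, 0.0, steps=20, lr=0.05)

pretrained_mse = mse(test_x, test_y, pre_w, pre_b)
fine_tuned_mse = mse(test_x, test_y, ft_w, ft_b)
scratch_mse = mse(test_x, test_y, sc_w, sc_b)
```

Because the low-fidelity data already capture the overall trend, the fine-tuned model in this toy setting only has to correct the offset, so it ends closer to the high-fidelity relationship than the model trained from scratch with the same budget. Comparing `fine_tuned_mse` with `scratch_mse` shows the gap.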



For the first time, to our knowledge, we have demonstrated how transfer learning with graph neural networks can use data from the full funnel to improve molecular property prediction, enabling scientists to make smarter decisions about which molecules to progress, particularly when there is not a lot of high-quality data available initially. Moreover, our study highlights limitations of standard graph neural networks and proposes solutions that enable transfer learning.

David Buterez, PhD student at the University of Cambridge, UK. David undertook this research with funding from Data Science and AI at AstraZeneca.

Embedding data science and AI across our R&D

Data science and AI are helping us to analyse and interpret large quantities of data at all stages of drug discovery and development. By combining traditional chemistry high-throughput screening with AI and machine learning approaches, we are able to study increasingly diverse and promising molecules – helping us better predict their molecular function and suitability as a medicine. By embedding data-driven approaches across our R&D, we are accelerating how we design, develop and make the next generation of therapeutics for patients.


Reference:

1. Buterez, D., Janet, J.P., Kiddle, S.J. et al. Transfer learning with graph neural networks for improved molecular property prediction in the multi-fidelity setting. Nat Commun 15, 1517 (2024). http://doi.org/10.1038/s41467-024-45566-8

 

Veeva ID: Z4-61400
Date of preparation: February 2024