Seeing What Can’t Be Seen

Credit: Maxger / iStock / Getty Images Plus

An important question to ask within the context of today’s drug discovery and development environment is: At what point does the sheer volume and complexity of available data outstrip the human ability to efficiently gather and interpret them? The reality is that the turning point likely happened some time ago, but there is a silver lining. The same warp-speed advances in computational power that have outpaced Moore’s Law have enabled the emergence of a scientist’s worthy aide for drug discovery and development: artificial intelligence (AI).

Today, the use of AI is ubiquitous. From large language models (LLMs) such as those driving popular tools like ChatGPT to the behind-the-scenes machine learning algorithms that can readily discern cancerous tumors in digital pathology, AI’s impact in the life sciences and healthcare is already being felt.

When it comes to fostering a better understanding of the complexities of human biology, applying AI to drug discovery and development may be the perfect fit. It also comes with the promise of shortening discovery and development timelines while significantly reducing the multi-billion-dollar cost of bringing a new drug to market.

AI leverages multiple technologies to make its impact: LLMs, predictive modeling, simulation, and in silico analysis are driving improvements from hypothesis generation, target identification, and drug design to clinical trial optimization and drug repurposing. Further, AI’s ability to combine and interpret traditional clinical data with real-world patient data and unstructured data is unlocking critical insights, revealing patterns and relationships that would be virtually impossible for human researchers to uncover on their own.

“The days of 12 to 14 years and $2 billion to $3 billion (development) costs for mega blockbuster drugs—I think those days are behind us,” said Mohan Uttarwar, CEO of 1Cell.ai, a precision oncology company leveraging multi-omic data to improve cancer care and surveillance via the application of AI. The question is by leveraging AI, “can we now bring that down to three to five years and $300 million to $500 million?”

Mohan Uttarwar
CEO, 1Cell.ai

As it continues to evolve, the question is no longer whether AI will play a role in drug discovery and development, but how deeply and how quickly it will reshape it.

Use of real-world and multi-omic data

Leveraging AI in drug discovery and development has opened new doors of analysis using both the growing corpus of multi-omic data generated in life sciences research and real-world data like electronic health records (EHRs). But these data come with challenges.

According to Jagdeep Podichetty, PhD, senior director of predictive analytics at the Critical Path Institute (C-Path), a non-profit organization dedicated to improving and streamlining the process of drug development, “This is a secondary use of data, in the sense that it was for a different purpose. In our case, maybe a clinical trial simulation tool or identification of a biomarker,” he said. “That applies to EHR (data) as well, which is mainly for billing purposes for hospitals.”

But although these data might not have fit inside a traditional clinical trial, their value, Podichetty noted, lies in the window they provide into the real-world setting, offering information on the variability of human diseases. This expansion to the use of secondary data is particularly important for more complex conditions like neurological diseases. “Just looking at the controlled setting of a clinical trial for complex diseases—it’s not enough,” he said.

Jagdeep Podichetty, PhD
Senior Director
Critical Path Institute (C-Path)

But EHR data must first be standardized before they can be applied in this way. “EHR by itself is very messy,” Podichetty explained. “A single word can mean many different things. So you want to be able to map it to something more standardized.” This standardization pipeline is a prerequisite for effective modeling, allowing large-scale queries across a range of different datasets.
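To make the mapping idea concrete, here is a minimal sketch of the kind of normalization step such a pipeline performs. The term list, the standardized concepts, and the `normalize` helper are invented for illustration; real pipelines map to established clinical vocabularies and handle far messier input.

```python
# Hypothetical sketch: normalizing free-text EHR terms to a standard
# concept before modeling. Terms and concepts below are invented.

STANDARD_TERMS = {
    "mi": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "htn": "hypertension",
    "high blood pressure": "hypertension",
}

def normalize(raw: str):
    """Map a messy EHR entry to a standardized concept, or None if unknown."""
    key = raw.strip().lower()
    return STANDARD_TERMS.get(key)

records = ["Heart attack", "HTN", "MI ", "fatigue"]
# unknown terms surface as None so a curator can review them
mapped = [normalize(r) for r in records]
```

The point of funneling every entry through one mapping is that downstream queries can then ask for “myocardial infarction” once, rather than enumerating every way a clinician might have written it.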

Likewise, the use of multi-omic data for drug discovery is greatly aided by AI. At 1Cell.ai, the focus is on analyzing live circulating tumor cells (CTCs) to develop insights both for the development of diagnostics and target identification for cancer treatments. As Uttarwar noted, “Cancer, as much as it is a disease, it’s a data problem, and multi-omics is exponentially higher in complexity.”

The role of AI in this context is not simply to parse the data but to take disparate types of molecular data, such as whole-genome, transcriptome, and proteome data, and harmonize them into a single view that can accurately depict tumor behavior, tumor origin, and tumor heterogeneity.

Credit: Just_Super / Getty Images Plus

Leveraging AI to dynamically curate and analyze these data can accelerate translational research. In 1Cell.ai’s case, this is accomplished via the collection of longitudinal data. “We are no longer relying on cell lines. With live CTCs, target identification and even target validation are huge opportunities,” Uttarwar said. Additionally, live CTCs open the door to longitudinal tracking of disease development across multiple data metrics, which can then be mined by AI to identify treatment resistance, emerging mutations, and new therapeutic targets.

Leveraging LLMs

As multi-omics data are increasingly integrated with clinical and real-world data, the utility of AI to interpret them continues to grow. For model-informed drug discovery, LLMs have moved to the fore due to their dynamic abilities to query data. A powerful example lies in “prompt engineering”: the strategic design of queries that guide model behavior to surface patterns in research data that can help guide drug development. C-Path is working to optimize LLMs to assist with data curation tasks, employing prompt engineering to replace the cumbersome coding traditionally required for data management.

“C-Path has explored the utility of LLMs in data curation pipelines,” noted a perspective published last year in Clinical Pharmacology & Therapeutics with Podichetty as the senior author. “LLMs enable zero- or few-shot question answering with configurable tasks through templated prompts without the need for task-specific tuning.”

In this way, LLMs can, when prompted effectively, deliver meaningful information from clinical notes or semi-structured data that in the past would have required manual annotation and model tuning.

Further, prompt engineering for LLMs is an important method for data discovery, allowing them to easily interact with structured data systems and reduce the need for specialized, highly technical database queries that can significantly slow drug development. “Large language models excel in translating natural language inquiries into queries that conform to schemas defined in a few-shot setting to retrieve structured information from graphs or tabular databases,” wrote the C-Path researchers in their perspective.
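As a rough illustration of the templated, few-shot prompting the researchers describe, the sketch below assembles a prompt that asks a model to translate a natural-language question into SQL against a known schema. The schema, the example pairs, and the `build_prompt` helper are all invented; no particular LLM API is assumed, and a real pipeline would send the resulting string to a model.

```python
# Hypothetical sketch of a templated few-shot prompt for natural-language-
# to-SQL translation. Schema and examples are invented for illustration.

SCHEMA = "patients(id, age, diagnosis, enrolled_date)"

FEW_SHOT = [
    ("How many patients are over 65?",
     "SELECT COUNT(*) FROM patients WHERE age > 65;"),
    ("List diagnoses recorded after 2020.",
     "SELECT DISTINCT diagnosis FROM patients WHERE enrolled_date > '2020-12-31';"),
]

def build_prompt(question: str) -> str:
    """Assemble a templated few-shot prompt the LLM would complete."""
    lines = [f"Schema: {SCHEMA}", "Translate each question to SQL.", ""]
    for q, sql in FEW_SHOT:
        lines += [f"Q: {q}", f"SQL: {sql}", ""]
    lines += [f"Q: {question}", "SQL:"]
    return "\n".join(lines)

prompt = build_prompt("How many patients have hypertension?")
```

The few worked examples stand in for task-specific fine-tuning: the model infers the translation pattern from the template alone, which is what lets researchers pose new questions without writing database code.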

But Podichetty told Inside Precision Medicine that there is an art to prompt engineering and that the ease of querying LLMs should not preclude planning how to best query them.

“Throwing stuff on the wall might not be a good strategy,” he cautioned. “You might get information that’s useless—or worse, incorrect.” Instead, pharma researchers should consider a staged approach: using broader, more generalized LLMs for early exploratory work, then grounding the results with curated datasets in a vector database or retrieval-augmented generation (RAG) pipeline to help ensure the reliability of the information generated. This approach is an important safeguard against AI’s Achilles’ heel: hallucinations, the unfortunate tendency to conjure what isn’t there.
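The grounding step can be sketched in miniature: before answering, retrieve the curated passages whose embeddings sit closest to the query embedding, and hand only those to the model. The document names and tiny hand-made vectors below are invented; a production pipeline would use a learned embedding model and a proper vector database.

```python
# Toy sketch of retrieval for a RAG pipeline: rank curated documents by
# cosine similarity to a query embedding. Vectors are hand-made toys.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

corpus = {
    "IPF biomarker summary": [0.9, 0.1, 0.0],
    "Oncology CTC protocol": [0.1, 0.8, 0.2],
    "Trial site handbook":   [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k curated documents closest to the query embedding."""
    ranked = sorted(corpus, key=lambda d: cosine(corpus[d], query_vec),
                    reverse=True)
    return ranked[:k]

# a query whose embedding leans toward the IPF document
top = retrieve([0.85, 0.15, 0.05])
```

Because the model is asked to answer only from the retrieved passages, an unsupported claim has nowhere to hide, which is the safeguard against hallucination that Podichetty describes.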

As multi-omics data—genomics, proteomics, transcriptomics, and more—are increasingly integrated with clinical and real-world data, the use of AI promises to bring multiple layers of biological and patient-level data into coherent, AI-readable formats, with an implied promise to bring human biology and disease treatment into sharper focus. “Think of it as a pie,” Podichetty said. “You are only able to capture so much from a particular data type. [Each type of data] provides us with a bigger chunk of that pie.” The challenge now, Podichetty noted, is not in accessing the data, but interpreting it responsibly and rigorously.

Informing clinical trials

Many clinical trials fail because they do not have the correct patient population for testing a drug. The ability of AI to detect subtle patterns from data can help drug sponsors focus recruitment efforts on the patient populations best suited for a particular trial, or to spot aspects of the trial design or study location that might be generating skewed data.

Podichetty said that data outliers that could jeopardize a trial might be something as subtle as the time of [the] day patients are being assessed. “It can be a very minute and simple thing that can be the difference from a trial failing and not failing,” he noted. AI can catch these small factors and allow trial managers to correct them before the factors jeopardize the entire program.
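A simplified sketch of catching that kind of operational outlier: flag any trial site whose average assessment hour drifts far from the other sites. The site names, hours, and z-score threshold are invented for illustration; real monitoring would weigh many more variables.

```python
# Hypothetical sketch: flag sites whose mean assessment hour is an
# outlier relative to the other sites. All numbers are invented.
from statistics import mean, stdev

site_hours = {
    "Site A": [9, 10, 9, 11, 10],
    "Site B": [10, 9, 10, 11, 10],
    "Site C": [19, 20, 18, 21, 20],  # evening assessments
}

def flag_outlier_sites(data, z_cut=2.0):
    """Return sites whose mean hour sits > z_cut SDs from the other sites."""
    means = {s: mean(h) for s, h in data.items()}
    flagged = []
    for site, m in means.items():
        others = [v for s, v in means.items() if s != site]
        mu, sd = mean(others), stdev(others)
        if sd > 0 and abs(m - mu) / sd > z_cut:
            flagged.append(site)
    return flagged
```

Surfacing Site C’s evening assessments early lets trial managers correct the protocol deviation before it skews the endpoint data for the whole program.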

A standout example of leveraging AI in a trial setting came from the Type 1 Diabetes Consortium, where models ran different trial scenarios on patient data to enrich the study population. This use of AI identified patients who were likely to develop type 1 diabetes in the next two or three years.

Using this method to analyze the patient population “you can find an intervention,” Podichetty said. “So, the intervening drug can gain maximum benefit by either prolonging the time to getting type 1 diabetes or [preventing] someone from getting it.”

In addition, data from consenting patients allow for the creation of synthetic populations that let AI test a range of different trial scenarios in silico, with the aim of determining the best trial design. Similarly, synthetic populations can be used as the control arm of clinical trials, alleviating one of the more burdensome aspects of clinical development: patient recruitment. With synthetic populations, there are no physical patients in the control arm and drug companies only need to test real patients in the treatment arm of the trial.
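The in-silico scenario testing described above can be caricatured in a few lines: draw a synthetic “control arm” from a distribution fitted to historical patients, then compare candidate trial designs before recruiting anyone. Every number below (outcome scale, decline rate, drug effect, arm sizes) is invented; real synthetic populations are built from consented patient data with far richer models.

```python
# Highly simplified sketch of in-silico trial scenario testing with a
# synthetic control arm. All parameters are invented for illustration.
import random

random.seed(0)  # deterministic for the example

def synthetic_control(n, mean_decline=-2.0, sd=1.0):
    """Simulate n control patients' one-year change in an outcome score."""
    return [random.gauss(mean_decline, sd) for _ in range(n)]

def simulate_trial(n_per_arm, drug_effect):
    """Estimate the treatment-vs-control difference for one trial design."""
    control = synthetic_control(n_per_arm)
    treated = [x + drug_effect for x in synthetic_control(n_per_arm)]
    return sum(treated) / n_per_arm - sum(control) / n_per_arm

# explore arm sizes in silico before enrolling a single patient
effects = {n: simulate_trial(n, drug_effect=1.5) for n in (50, 200, 800)}
```

Running the same assumed drug effect across different arm sizes shows how estimate noise shrinks as the design grows, which is precisely the trade-off sponsors want settled before recruitment begins.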

Insilico Medicine: The poster child for AI-guided drug discovery and development

While the use of AI in various aspects of drug development is common today, with many companies developing AI and offering tools to aid with specific tasks, Insilico Medicine represents a company focused on informing the entire development cycle with AI, from target identification and therapy design through clinical trials.

It also boasts the most advanced potential therapy in clinical development, rentosertib, for the treatment of idiopathic pulmonary fibrosis (IPF). Using their generative AI engine, the team at Insilico Medicine identified the novel target Traf2- and NCK-interacting kinase (TNIK), a serine/threonine kinase whose activation plays a crucial role in the cellular signaling processes that drive fibrosis.

In June, the company announced positive results from a Phase IIa clinical trial of rentosertib for the treatment of IPF. Based on these data, the company plans to launch a Phase III trial of the drug in China and a Phase IIb trial in the U.S. later this year.

What is remarkable about the drug is that it is the most advanced candidate whose target and drug design were conceived entirely in silico using generative AI.

Alex Zhavoronkov, PhD
CEO and Co-founder
Insilico Medicine

“Novel target, novel molecule is an infinitesimally lower probability success task,” said Alex Zhavoronkov, PhD, CEO and co-founder of Insilico Medicine at a company celebration in 2021 for the first human dosing of rentosertib to treat IPF. What made the discovery of the target and the synthesis remarkable, he noted, was that the company didn’t have a wet lab or traditional biotech drug discovery arm. “Even without having the main expertise ourselves, we trained AI to outperform humans in all of those areas and create a general-purpose engine that allows you to do all of that,” Zhavoronkov said.

Founded in 2014, Insilico’s journey was initially met with a healthy dose of skepticism. The original concept was to employ deep neural networks to test cell-drug interactions without the use of animal models. It was a 2019 paper published in Nature Biotechnology, which detailed the company’s use of its generative tensorial reinforcement learning approach for de novo drug design, that established it could discover novel drug candidates—and do it quickly. In this case, it took merely 21 days to create roughly 30,000 different molecule designs with the potential to target a protein linked to fibrosis. From there, it took another 25 days to narrow down to a single molecule that showed drug-like qualities.

Insilico’s Hong Kong office

Now a clinical-stage company, Insilico also offers its AI drug discovery services to pharma and biotech customers via its Pharma.ai division. But as Zhavoronkov has noted, the company first needed to demonstrate that it could achieve novel target discovery and create novel drug designs itself.

Currently, the company has more than a dozen drug candidates in the pipeline at the lead optimization or more advanced stages for the treatment of diseases like cancer, inflammatory bowel disease, and obesity.

But what really sets the AI approach to drug discovery and drug design apart is the potential to slash drug development costs and timelines. Rentosertib took only four years to pass through Phase II trials and land on the cusp of Phase III. “Insilico’s cost per program is only $3–5 million to reach developmental candidate, compared to the industry averages that can reach over hundreds of millions,” Zhavoronkov told Inside Precision Medicine’s sister publication GEN in June.

Read more:

  1. Clinical Pharmacology & Therapeutics
  2. Nature Biotechnology

 

Chris Anderson, a Maine native, has been a B2B editor for more than 25 years. He was the founding editor of Security Systems News and Drug Discovery News, and led the print launch and expanded coverage as editor in chief of Clinical OMICs, now named Inside Precision Medicine.
