Multimodal Language Models for Accelerated Molecular Screening and Optimization

Traditional molecular screening methods often face high computational costs and long design cycles. They also rely heavily on high-quality 3D protein structures, which are not always available or reliable. In many drug discovery scenarios, reliable crystal structures are difficult to obtain, which severely limits the scope of structure-based drug design. To address these limitations, Gu et al. (2026) developed CoDrug, a multimodal fusion framework that marks a shift toward text-driven molecular discovery.

CoDrug integrates textual information, protein sequences, and compound structural representations to achieve high-accuracy virtual screening and multi-property optimization without requiring 3D structural data.

Harnessing Multimodal Fusion for Drug Discovery

The core of the CoDrug framework lies in its ability to simultaneously understand and correlate biomedical text, amino acid sequences and molecular chemical structures. This multidimensional approach provides a more holistic view of drug-target interactions.

Complementary Fusion Strategies

The framework employs two distinct strategies to capture complex biochemical relationships.

  • Text-protein sequence fusion

SciBERT encodes functional descriptions while the ESM model extracts sequence-level features. This allows the model to connect a protein's functional definition to its underlying sequence patterns.

  • Text-compound structure fusion

ChemFormer encodes molecular SMILES strings while SciBERT processes compound-related textual descriptions. This builds a semantic bridge between chemical structure and biological activity.

Contrastive Learning and Cross-Attention

CoDrug uses contrastive learning to align features from different modalities within a unified latent space. A cross-attention mechanism then lets the model dynamically focus on the most relevant segments of both the protein and the ligand, which improves the accuracy of binding affinity predictions.
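The two mechanisms above can be illustrated with a minimal NumPy sketch. It is not CoDrug's implementation: the symmetric InfoNCE loss, the temperature of 0.07, the embedding sizes, and the function names `info_nce_loss` and `cross_attention` are all assumptions chosen for illustration, and random vectors stand in for real encoder outputs.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce_loss(text_emb, other_emb, temperature=0.07):
    """Symmetric InfoNCE loss (assumed form, not taken from the paper).

    Row i of text_emb is treated as the positive match for row i of other_emb;
    every other row in the batch acts as a negative.
    """
    t = l2_normalize(text_emb)
    o = l2_normalize(other_emb)
    logits = t @ o.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(t))
    loss_t2o = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_o2t = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_t2o + loss_o2t)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token attends over all key tokens."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = np.exp(log_softmax(scores, axis=1))  # each row sums to 1
    return weights @ values, weights

rng = np.random.default_rng(0)

# Hypothetical batch: 4 text embeddings and their paired protein/compound embeddings.
text = rng.normal(size=(4, 8))
aligned = text + 0.01 * rng.normal(size=(4, 8))   # well-aligned pairs
shuffled = aligned[::-1]                          # deliberately mismatched pairs
loss_aligned = info_nce_loss(text, aligned)
loss_shuffled = info_nce_loss(text, shuffled)

# Hypothetical token features: 5 ligand tokens attend over 12 protein residues.
ligand_tokens = rng.normal(size=(5, 8))
protein_tokens = rng.normal(size=(12, 8))
attended, attn = cross_attention(ligand_tokens, protein_tokens, protein_tokens)
```

Correctly paired embeddings give a lower loss than mismatched ones, which is the signal that pulls the modalities together in the shared space. The attention weights in `attn` show which protein positions each ligand token draws on.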

Multi-objective Property Optimization

Beyond virtual screening, CoDrug serves as a platform for lead optimization. Through a multi-task learning architecture, it simultaneously predicts and optimizes multiple critical properties, such as the Quantitative Estimate of Drug-likeness (QED), the partition coefficient (LogP), and molecular weight. Optimizing these properties in parallel rather than one at a time can shorten the path from a lead compound to a clinical candidate.
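As a rough illustration of the multi-task idea, the sketch below attaches one linear head per property to a shared embedding and combines the per-property errors into a single weighted loss. The linear heads, the mean squared error, the task weights, and the assumption that targets are standardized are all illustrative choices, not details confirmed by the source.

```python
import numpy as np

PROPERTIES = ("qed", "logp", "mol_weight")

def init_heads(embed_dim, rng):
    """One hypothetical linear head (weights, bias) per property."""
    return {name: (rng.normal(scale=0.1, size=embed_dim), 0.0) for name in PROPERTIES}

def predict_properties(embedding, heads):
    """Map a shared (batch, dim) embedding to one (batch,) prediction per property."""
    return {name: embedding @ w + b for name, (w, b) in heads.items()}

def multitask_loss(predictions, targets, task_weights):
    """Weighted sum of per-property mean squared errors.

    Targets are assumed to be standardized, so that molecular weight
    (hundreds of daltons) does not dominate QED (between 0 and 1).
    """
    per_task = {
        name: float(np.mean((predictions[name] - targets[name]) ** 2))
        for name in predictions
    }
    total = sum(task_weights[name] * per_task[name] for name in per_task)
    return total, per_task

rng = np.random.default_rng(0)
embedding = rng.normal(size=(6, 16))                           # stand-in for fused molecule features
targets = {name: rng.normal(size=6) for name in PROPERTIES}    # standardized placeholder labels
weights = {"qed": 1.0, "logp": 1.0, "mol_weight": 0.5}         # illustrative task weights

heads = init_heads(16, rng)
predictions = predict_properties(embedding, heads)
total, per_task = multitask_loss(predictions, targets, weights)
```

Because every head reads the same embedding, a gradient step on `total` would update the shared representation with respect to all three properties at once.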

Fig. 1. Workflow of SMILES-text data set construction. (Gu R.; et al. 2026)

Strategic Performance Breakthroughs

In various benchmarks and practical scenarios, CoDrug has demonstrated superior performance compared to traditional single-modality or simple dual-modality models.

  • Superior screening precision

CoDrug outperforms existing cross-modal models in both classification and regression tasks and accurately identifies active molecules in large chemical libraries.

  • Zero-shot screening capability

Even for novel targets with no known ligand data, CoDrug can perform effective candidate retrieval by leveraging its deep understanding of protein functional text and sequences.

  • Efficient multi-property prediction

The model achieves high correlation and low error in predicting QED and LogP, which helps steer generated molecules toward drug-like property ranges.

  • Optimized molecule generation

When coupled with generative algorithms, CoDrug produces novel chemical structures that possess both high binding affinity and favorable pharmacokinetic profiles.
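The zero-shot retrieval described above can be pictured as a nearest-neighbor search in the shared embedding space: the new target is embedded from its functional text and sequence, and library compounds are ranked by similarity to it. The sketch below assumes cosine similarity as the ranking score and uses random vectors in place of real embeddings; the function name `top_k_candidates` is hypothetical.

```python
import numpy as np

def top_k_candidates(target_emb, library_emb, k=3):
    """Rank library compounds by cosine similarity to a target embedding.

    Returns a list of (library_index, similarity) pairs, best match first.
    """
    target = target_emb / np.linalg.norm(target_emb)
    library = library_emb / np.linalg.norm(library_emb, axis=1, keepdims=True)
    scores = library @ target
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

rng = np.random.default_rng(1)
library = rng.normal(size=(1000, 32))                  # placeholder compound embeddings
target = library[42] + 0.05 * rng.normal(size=32)      # a target that lies close to compound 42
hits = top_k_candidates(target, library, k=5)
```

No ligand data for the target is needed at query time; the ranking depends only on how well the encoders place targets and compounds in the same space.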

Revolutionize Your Pipeline with Protheragen MedAI

At Protheragen MedAI, we are dedicated to integrating multimodal AI into modern drug discovery. Our AI services empower your pipeline in several key areas.

  • AI-powered drug discovery and design

We provide high-accuracy virtual screening that does not rely on 3D structures, allowing you to target proteins based on sequence and functional data alone.

  • Multi-property lead optimization

Our platform supports simultaneous optimization of multiple objectives, so that potency, solubility, and safety are refined in a single, efficient stage.

  • Text-driven target analysis

We extract deep insights into target-drug associations from vast repositories of literature and patent data.

  • High-throughput screening

We apply multimodal predictive models to billion-scale molecule libraries, providing a continuous stream of high-quality leads for your development pipeline.

Whether you are exploring a novel biological target or fine-tuning the properties of an existing lead, Protheragen MedAI provides the precision and technological edge required for success. Contact us today to accelerate your intelligent drug discovery journey.

Original Article:

  1. Gu R.; et al. CoDrug: A Text-Driven Molecular Virtual Screening and Multiproperty Optimization Framework via Multimodal Language Model. Journal of Chemical Information and Modeling. 2026.
