DrugCLIP: Deep Contrastive Learning Enables Ultrafast Genome-Wide Virtual Screening

A substantial portion of the human druggable genome remains untargeted by small-molecule therapeutics, largely because existing computational methods cannot support genome-wide screening in practice. Molecular docking is too computationally intensive for that purpose, and previous deep learning methods for virtual screening either generalize poorly or are trained on limited datasets.

Jia Y et al. introduce DrugCLIP, a contrastive learning framework for virtual screening that sets a new state of the art. It showcases how AI can enable fast and accurate genome-wide virtual screening in the post-AlphaFold era.

DrugCLIP: Redefining Virtual Screening as Dense Retrieval

DrugCLIP (Contrastive Learning for Interaction Prediction) turns virtual screening into a dense retrieval problem. It models protein pockets and small molecules using encoders that project them into the same vector space. Predicting binding probability is then as simple as computing vector similarity, significantly accelerating the screening process.
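The retrieval step can be illustrated with a minimal sketch. It assumes the pocket and molecule encoders have already produced fixed-length embeddings; the three-dimensional vectors, the function names, and the use of cosine similarity as the scoring rule are illustrative assumptions, not details taken from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_scores(pocket_emb, molecule_embs):
    """Score every molecule embedding against one pocket embedding."""
    p = l2_normalize(pocket_emb)
    return [sum(a * b for a, b in zip(p, l2_normalize(m))) for m in molecule_embs]

def top_k(pocket_emb, molecule_embs, k):
    """Return (library index, score) for the k most similar molecules, best first."""
    scores = cosine_scores(pocket_emb, molecule_embs)
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy embeddings standing in for encoder outputs (made-up values).
pocket = [1.0, 0.0, 0.0]
library = [
    [0.9, 0.1, 0.0],   # points almost the same way as the pocket
    [0.0, 1.0, 0.0],   # orthogonal to the pocket
    [-1.0, 0.0, 0.0],  # opposite direction
]
hits = top_k(pocket, library, k=2)
print(hits)
```

In a dense retrieval setup like this, the library embeddings can be computed once and reused for every pocket, so the per-pair cost reduces to a single dot product.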

Innovative Training Strategy

DrugCLIP is first pre-trained on large amounts of synthetic data, then fine-tuned on the experimental data that is available.

  • Pre-training on synthetically generated protein-small molecule pairs with ProFSA

One of the biggest challenges in using deep learning for virtual screening is that labeled bioactivity data is limited. To overcome this problem, the authors created the Protein Fragment-Surrounding Alignment (ProFSA) framework. This method generates massive amounts of synthetic data by extracting protein fragments that mimic ligand-pocket interactions, allowing the model to learn interaction-aware representations without relying solely on limited experimental data.

  • Fine-tuning with experimentally solved protein-ligand complexes

DrugCLIP is further fine-tuned on experimentally solved protein-ligand complex structures. A contrastive loss function maximizes the similarity between true binders and their targets while minimizing it for non-binders, ensuring high discriminatory power.
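A contrastive objective of this kind can be sketched as follows. The sketch assumes a CLIP-style symmetric InfoNCE loss over a batch in which pocket i truly binds molecule i and every other molecule in the batch serves as a non-binder. The article does not specify the exact loss, so this formulation, the function names, and the temperature of 0.07 are all assumptions made for illustration.

```python
import math

def info_nce(sim, temperature=0.07):
    """Mean cross-entropy over the rows of a square similarity matrix.

    sim[i][j] is the similarity of pocket i to molecule j; the true
    pair for row i is assumed to sit on the diagonal (j == i).
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # negative log-softmax of the true pair
    return total / len(sim)

def symmetric_loss(sim, temperature=0.07):
    """Average the pocket-to-molecule and molecule-to-pocket directions."""
    transposed = [list(col) for col in zip(*sim)]
    return 0.5 * (info_nce(sim, temperature) + info_nce(transposed, temperature))

# Toy similarity matrices (made-up values).
aligned = [[1.0, 0.0], [0.0, 1.0]]   # true pairs score highest
confused = [[0.0, 1.0], [1.0, 0.0]]  # non-binders score highest

print(symmetric_loss(aligned), symmetric_loss(confused))
```

The loss is close to zero when each true pair is the most similar entry in its row and column, and grows when non-binders outrank the true binder, which is the behavior the paragraph above describes.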

Unmatched Speed and Robustness

DrugCLIP is 10 million times faster than conventional docking programs. It can virtually screen trillions of protein-ligand pairs in less than one day on just 8 GPUs. The model was also found to be robust to structural errors and to work well with structures predicted by AlphaFold2.

Fig. 1. The framework of DrugCLIP. (Jia Y.; et al. 2026)

Experimental Validation and Wet-Lab Success

Researchers validated DrugCLIP both in silico on various benchmarks and in wet-lab tests.

Benchmark performance

DrugCLIP outperforms popular docking tools and current deep learning baselines on enrichment factors.
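For readers unfamiliar with the metric, the enrichment factor compares the fraction of actives found in the top-ranked slice of a library with the fraction expected from random selection. The sketch below uses this commonly used definition; the function name and the toy scores and labels are made up for illustration and are not benchmark data from the paper.

```python
def enrichment_factor(scores, labels, fraction):
    """Enrichment factor at the given top fraction of a ranked library.

    scores:   predicted score per compound (higher means more likely to bind)
    labels:   1 for a known active, 0 for a decoy (at least one active required)
    fraction: size of the top slice, e.g. 0.01 for EF at 1%
    """
    n_total = len(scores)
    n_actives = sum(labels)
    n_top = max(1, int(round(n_total * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    # Active rate in the top slice divided by the active rate in the whole library.
    return (actives_in_top / n_top) / (n_actives / n_total)

# Toy library: 10 compounds, 2 actives, ranked 1st and 6th by score.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
ef_20 = enrichment_factor(scores, labels, fraction=0.2)
print(ef_20)  # 2.5
```

An enrichment factor of 1 corresponds to random ranking, so higher values mean that more true binders are concentrated at the top of the list.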

Identifying novel binders

  • 5HT2A receptor: The model identified potent agonists with nanomolar affinity for this key psychiatric target.
  • Norepinephrine Transporter (NET): DrugCLIP identified hits with new chemotypes. Hit binding modes were validated by Cryo-EM, demonstrating the utility of the model.

Binding the Undruggable

Most notably, the team found inhibitors for TRIP12, an E3 ubiquitin ligase with no previously reported small-molecule binders or solved holo structures. By combining DrugCLIP with GenPack, a generative pocket refinement module designed for AlphaFold structures, they achieved a 17.5% hit rate, opening new doors for targeting proteins that lack experimental structural data.

GenomeScreenDB: A Resource for Global Discovery

Leveraging the speed of DrugCLIP, the researchers conducted a genome-wide virtual screening campaign across approximately 10,000 human proteins against a library of 500 million compounds. The resulting database, GenomeScreenDB, contains over 2 million potential hit molecules and covers nearly half of the human genome, significantly surpassing the coverage of existing databases such as ChEMBL.
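The scale of this campaign can be checked with some quick arithmetic. The sketch below assumes exactly one pocket per protein and takes the rounded figures quoted above at face value. It also borrows the "less than one day on 8 GPUs" budget quoted earlier as a reference point, although the article does not say the campaign itself ran within that budget.

```python
# Rounded figures from the article; one pocket per protein is an assumption.
proteins = 10_000
compounds = 500_000_000
pairs = proteins * compounds  # total protein-compound pairs to score

# Reference budget quoted earlier: under one day on 8 GPUs (assumed to apply here).
seconds_per_day = 24 * 60 * 60
gpus = 8
pairs_per_gpu_second = pairs / (seconds_per_day * gpus)

print(f"{pairs:.1e} pairs in total")
print(f"{pairs_per_gpu_second:,.0f} pairs per GPU per second")
```

Under these assumptions the campaign covers about 5 trillion pairs, which works out to roughly 7 million pairs per GPU per second as the average rate needed to finish within a day.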

How Protheragen MedAI Can Help

DrugCLIP demonstrates how AI models can help overcome the limitations of traditional approaches in drug discovery. At Protheragen MedAI, we harness the power of similar state-of-the-art deep learning models to speed up your drug discovery efforts.

Our AI drug discovery services let you search vast chemical space to quickly identify your next hits.

  • AI-powered drug discovery and design

We utilize advanced generative models and dense retrieval techniques to perform high-throughput virtual screening, efficiently identifying hits for both novel and established targets.

  • Target identification

Leveraging predicted structures and AI-based pocket detection, we help you unlock undruggable targets that lack experimental crystal structures.

  • Structure-based drug design

We use molecular dynamics and AI scoring functions to optimize leads for high binding affinity and desirable chemical properties.

Whether you are targeting one protein or the whole genome, Protheragen MedAI has the computational resources and expertise to help you move predicted targets toward the clinic. Contact us today to learn how we can help you move your projects forward.

Original Article:

  1. Jia Y.; et al. Deep contrastive learning enables genome-wide virtual screening. Science. 2026, 391(6781): eads9530.
