A substantial portion of the human druggable genome remains untargeted by small-molecule therapeutics, largely because existing computational methods cannot support genome-wide screening in practice. Molecular docking is too computationally intensive for that purpose, and previous deep learning methods for virtual screening either generalize poorly or are trained on limited datasets.
Jia Y et al. introduce DrugCLIP, a contrastive learning framework for virtual screening that sets a new state of the art. It showcases how AI can enable fast and accurate genome-wide virtual screening in the post-AlphaFold era.
DrugCLIP (Contrastive Learning for Interaction Prediction) turns virtual screening into a dense retrieval problem. It encodes protein pockets and small molecules with separate encoders that project them into the same vector space. Estimating how likely a molecule is to bind a pocket then reduces to computing the similarity between two vectors, which significantly accelerates the screening process.
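The retrieval step can be illustrated with a minimal sketch. It assumes the encoders have already produced embeddings and that cosine similarity is the scoring function; the function names (`l2_normalize`, `cosine_scores`, `top_k`) and the toy 3-dimensional vectors are hypothetical and are not taken from the DrugCLIP code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_scores(pocket_vec, molecule_vecs):
    """Cosine similarity between one pocket embedding and each molecule embedding."""
    p = l2_normalize(pocket_vec)
    scores = []
    for mol in molecule_vecs:
        m = l2_normalize(mol)
        scores.append(sum(a * b for a, b in zip(p, m)))
    return scores


def top_k(pocket_vec, molecule_vecs, k):
    """Return (library index, score) for the k highest-scoring molecules."""
    scores = cosine_scores(pocket_vec, molecule_vecs)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


# Toy embeddings (hypothetical values standing in for encoder outputs).
pocket = [1.0, 0.0, 0.0]
library = [
    [0.0, 1.0, 0.0],  # orthogonal to the pocket embedding
    [2.0, 0.0, 0.0],  # same direction as the pocket embedding
    [1.0, 1.0, 0.0],  # 45 degrees away from the pocket embedding
]
hits = top_k(pocket, library, k=2)
print(hits)
```

Because molecule embeddings do not depend on the target, they can be computed once and reused for every pocket, which is where a retrieval formulation saves time relative to scoring each pair from scratch.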
DrugCLIP is trained in two stages: pre-training on large amounts of synthetic data, then fine-tuning on the experimental data that is available.
One of the biggest challenges in applying deep learning to virtual screening is that labeled bioactivity data is limited. To overcome this problem, the authors created the Protein Fragment-Surrounding Alignment (ProFSA) framework. This method generates massive amounts of synthetic data by extracting protein fragments that mimic ligand-pocket interactions, allowing the model to learn interaction-aware representations without relying solely on limited experimental data.
DrugCLIP is further fine-tuned on experimentally solved protein-ligand complex structures. A contrastive loss function maximizes the similarity between true binders and their targets while minimizing it for non-binders, ensuring high discriminatory power.
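As a rough illustration of such an objective, the sketch below computes an InfoNCE-style loss over a batch of pocket-molecule similarities. The article only states that a contrastive loss is used, so the exact form, the one-directional (pocket-to-molecule) formulation, and the `temperature` value are assumptions; `info_nce` is a hypothetical name.

```python
import math


def info_nce(sim_matrix, temperature=0.07):
    """Mean cross-entropy where sim_matrix[i][i] is the true pocket-ligand pair.

    Row i holds the similarity of pocket i to every molecule in the batch;
    the off-diagonal entries act as non-binders (in-batch negatives).
    """
    total = 0.0
    for i, row in enumerate(sim_matrix):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(sim_matrix)


# Toy 2x2 batches (hypothetical similarity values).
aligned = [[1.0, 0.0], [0.0, 1.0]]   # true pairs score highest
confused = [[0.0, 1.0], [1.0, 0.0]]  # non-binders score highest

print(info_nce(aligned), info_nce(confused))
```

Minimizing this quantity pushes each diagonal similarity up relative to the rest of its row, which is the "maximize for true binders, minimize for non-binders" behavior described above.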
DrugCLIP is 10 million times faster than conventional docking programs. It can virtually screen trillions of protein-ligand pairs in less than one day on just 8 GPUs. The model was also found to be robust to structural errors and to work well with structures predicted by AlphaFold2.
Fig. 1. The framework of DrugCLIP. (Jia Y.; et al. 2026)
Researchers validated DrugCLIP both in silico on various benchmarks and in wet-lab tests, yielding impressive results.
Benchmark performance
DrugCLIP outperforms popular docking tools and current deep learning baselines on enrichment factors.
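For readers unfamiliar with the metric, the enrichment factor at a given fraction compares the rate of actives in the top-ranked slice of a library with the rate of actives in the library as a whole. The sketch below uses this standard definition; the function name and toy data are hypothetical, and the benchmarks and cutoffs used in the paper are not specified here.

```python
def enrichment_factor(scores, labels, fraction):
    """EF = (actives in top slice / slice size) / (total actives / library size).

    scores: higher means predicted more likely to bind.
    labels: 1 for a known active, 0 for a decoy/inactive.
    fraction: size of the top slice, e.g. 0.01 for EF1%.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    n_top = max(1, int(round(n_total * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    return (actives_in_top / n_top) / (n_actives / n_total)


# Toy library: 10 compounds, 2 actives (hypothetical values).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
both_on_top = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
one_on_top = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(enrichment_factor(scores, both_on_top, 0.2))
print(enrichment_factor(scores, one_on_top, 0.2))
```

An enrichment factor of 1 corresponds to random ranking; larger values mean the model concentrates actives near the top of the list.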
Identifying novel binders
Binding the undruggable
Most notably, inhibitors were found for TRIP12, an E3 ubiquitin ligase that had no previously reported small-molecule binders or solved holo structures. By combining DrugCLIP with GenPack, a generative pocket refinement module designed for AlphaFold structures, the team achieved a 17.5% hit rate, opening new doors for targeting proteins that lack experimental structural data.
Leveraging the speed of DrugCLIP, the researchers conducted a genome-wide virtual screening campaign across approximately 10,000 human proteins against a library of 500 million compounds. The resulting database, GenomeScreenDB, contains over 2 million potential hit molecules and covers nearly half of the human genome, significantly surpassing the coverage of existing databases like ChEMBL.
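A back-of-the-envelope calculation puts the scale of this campaign in perspective. The sketch assumes one pocket per protein, that every pocket is scored against every compound, and that the "less than one day on 8 GPUs" figure quoted earlier applies to this campaign; none of these assumptions is confirmed by the article, so the throughput numbers are illustrative lower bounds only.

```python
N_PROTEINS = 10_000          # approximate number of human proteins screened
N_COMPOUNDS = 500_000_000    # library size
N_GPUS = 8                   # assumed to match the quoted hardware budget
SECONDS_PER_DAY = 24 * 60 * 60

# Every protein against every compound (assumed full cross product).
total_pairs = N_PROTEINS * N_COMPOUNDS

# Finishing within one day implies at least this many pairs per second.
pairs_per_second = total_pairs / SECONDS_PER_DAY
pairs_per_gpu_second = pairs_per_second / N_GPUS

# In a dual-encoder setup each item is embedded once, so encoder passes
# grow with the sum of the two sets while scored pairs grow with the product.
encoder_passes = N_PROTEINS + N_COMPOUNDS

print(f"pairs scored:         {total_pairs:.2e}")
print(f"pairs per second:     {pairs_per_second:.2e}")
print(f"pairs per GPU-second: {pairs_per_gpu_second:.2e}")
print(f"encoder passes:       {encoder_passes:.2e}")
```

Under these assumptions the campaign covers 5 trillion pairs, or roughly 58 million pairs per second in total, while needing only about 500 million encoder passes.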
DrugCLIP demonstrates how AI models can help overcome the limitations of traditional approaches in drug discovery. At Protheragen MedAI, we harness the power of similar state-of-the-art deep learning models to speed up your drug discovery efforts.
Drug discovery AI services enable you to search through vast chemical space to quickly identify your next hits.
We utilize advanced generative models and dense retrieval techniques to perform high-throughput virtual screening, efficiently identifying hits for both novel and established targets.
Leveraging predicted structures and AI-based pocket detection, we help you unlock undruggable targets that lack experimental crystal structures.
Molecular dynamics and AI scoring functions are used to optimize leads for high binding affinity and desirable chemical properties.
Whether you are targeting one protein or the whole genome, Protheragen MedAI has the computational resources and expertise to help you move predicted targets into the clinic. Contact us today to learn how we can help you move your projects forward.