ESMFold2 Designs Protein Binders in Days Instead of Years
Biohub released a trio of open models today: ESMC (a language model trained on about 2.8 billion protein sequences), ESMFold2 (structure prediction and design), and ESM Atlas (a searchable map of 6.8 billion sequences and 1.1 billion predicted structures). The key claim: ESMFold2 can design functional protein binders against cancer targets in days, with lab-validated hit rates of 36–88% for minibinders and 15–29% for antibody-derived formats.
For context, a single preclinical binder candidate typically takes three to four years to develop. The researchers targeted five proteins central to cancer and immunology: EGFR, PDGFRβ, PD-L1, CTLA-4, and CD45. For PD-L1, the designed binders restored T cell signaling in lab tests, blocking the same pathway as approved checkpoint therapies.
ESMC: A Language Model Trained on All Life
ESMC is a Transformer-based protein language model trained on approximately 2.8 billion sequences from across the tree of life. The training objective is simple: predict the amino acids that evolution selects. Because evolution preserves functional proteins, the model internalizes the physical rules governing folding, interaction, and function. Biohub calls this a "world model of protein biology."
ESMC's representations can be used for downstream tasks without fine-tuning. Biohub reports state-of-the-art results on standard protein benchmarks, but the more notable application is design: the researchers fed ESMC's embeddings into ESMFold2 for structure prediction and binder design.
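A minimal sketch of what "without fine-tuning" can look like in practice: the embeddings stay fixed and only a lightweight classifier is fit on top of them. The vectors, labels, and function names below are made-up stand-ins for illustration, not real ESMC output or a Biohub API.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fit_centroids(embeddings, labels):
    """Group frozen embeddings by label and average each group."""
    groups = {}
    for vec, label in zip(embeddings, labels):
        groups.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in groups.items()}

def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

# Toy 3-dimensional vectors standing in for per-protein embeddings (hypothetical).
train_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.7],
    [0.0, 0.8, 0.9],
]
train_labels = ["enzyme", "enzyme", "binder", "binder"]

# Only the centroids are "trained"; the embeddings themselves are never updated.
centroids = fit_centroids(train_embeddings, train_labels)
query_label = predict(centroids, [0.85, 0.15, 0.05])
```

The nearest-centroid classifier is a deliberately simple choice here; any small model trained on frozen embeddings would illustrate the same idea.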
ESMFold2 Beats AlphaFold 3 on Antibody-Antigen Prediction
ESMFold2 predicts structures of protein complexes at atomic resolution. According to Biohub's preprint, ESMFold2 outperforms AlphaFold 3 on antibody-antigen interaction prediction when using only ESMC representations, with no multiple sequence alignment (MSA). When given the same MSA as AlphaFold 3, ESMFold2 is the strongest predictor on both general protein-protein interaction and antibody-antigen benchmarks.
Key benchmark results from the source:
- ESMFold2 leads on standard structure prediction benchmarks for protein-protein and antibody-antigen interactions.
- Its results improve consistently with more compute when multiple predictions are sampled and ranked by the model's confidence.
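The second point describes a sample-and-rank pattern: spend more compute on additional predictions, then keep the one the model is most confident in. A rough sketch of that selection step follows; the `Prediction` class, its fields, and the confidence values are hypothetical and do not reflect ESMFold2's actual output format.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    """One sampled structure with a model-reported confidence (hypothetical fields)."""
    sample_id: int
    confidence: float  # higher means the model is more confident

def select_best(predictions):
    """Return the prediction with the highest confidence score."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return max(predictions, key=lambda p: p.confidence)

def best_confidence_by_budget(predictions):
    """Best confidence seen after the first 1, 2, ..., N samples."""
    best = []
    running = float("-inf")
    for p in predictions:
        running = max(running, p.confidence)
        best.append(running)
    return best

# Made-up confidence values for five samples of the same complex.
samples = [Prediction(i, c) for i, c in enumerate([0.61, 0.74, 0.58, 0.82, 0.79])]
chosen = select_best(samples)
curve = best_confidence_by_budget(samples)
```

The best confidence seen can only stay flat or rise as samples are added. Whether accuracy rises with it depends on how well confidence tracks correctness, which is the property the preprint reports.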
ESM Atlas: Navigate 6.8 Billion Sequences
ESM Atlas organizes 6.8 billion protein sequences and 1.1 billion predicted structures by the relationships ESMC learned. It surfaces connections not captured by existing databases, including evolutionary links between gene-editing enzymes across distant branches of life. Much of that biology has never been annotated. For researchers studying poorly understood diseases, ESM Atlas makes uncharacterized biology searchable.
Open and Freely Available
All three components are available at the Biohub Platform (biohub.org). The models are open, and the team encourages researchers to use them for therapeutic design and fundamental biology. The press release emphasizes open science: "Making these tools freely available means researchers everywhere can move faster toward personalized cures."
Technical Details for Developers
- ESMC: Transformer language model, ~2.8B sequences, no MSA required for predictions.
- ESMFold2: Structure prediction model, uses ESMC embeddings, outputs atomic 3D coordinates.
- ESM Atlas: Search interface over 6.8B sequences with 1.1B precomputed structures.
- Training data: sequences from across the tree of life, including bacteria and extremophiles, plus human proteins (20,000+ types).
- Design validation: 5 targets, 36–88% minibinder hit rates, 15–29% antibody-derived hit rates.
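To put the hit rates above in practical terms, here is a back-of-the-envelope sketch of how many designs one might need to test to see at least one hit. It assumes each design is an independent trial with the same success probability, which is a simplification introduced here, not a claim from the source; the function name and the 95% target are likewise illustrative.

```python
import math

def designs_needed(hit_rate, target_probability=0.95):
    """Smallest n with P(at least one hit) >= target_probability,
    assuming independent designs that all share the same hit rate."""
    if not 0.0 < hit_rate < 1.0:
        raise ValueError("hit_rate must be strictly between 0 and 1")
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be strictly between 0 and 1")
    # P(no hits in n trials) = (1 - hit_rate) ** n; solve for n and round up.
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - hit_rate))

# Reported ranges from the article: minibinders 36-88%, antibody-derived 15-29%.
for label, rate in [
    ("minibinder, low end", 0.36),
    ("minibinder, high end", 0.88),
    ("antibody-derived, low end", 0.15),
    ("antibody-derived, high end", 0.29),
]:
    print(f"{label}: {designs_needed(rate)} designs")
```

Under these assumptions, a 36% hit rate implies about 7 designs for a 95% chance of at least one hit, and a 15% hit rate implies about 19. Real campaigns vary by target, so treat these as order-of-magnitude estimates.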
If you want to try it, head to biohub.org and access the models via API or download. For structure prediction, you can use ESMFold2 via the platform or run it locally if you have the compute. The preprint with full benchmark details is available on Biohub's site.
Why It Matters
For developers building in biotech or AI, this is a concrete open alternative to AlphaFold 3 that goes beyond prediction to design. If you work on drug discovery pipelines, ESMFold2 could be integrated to generate binder candidates in days, where a single preclinical candidate typically takes three to four years. The models are free to use, which lowers the barrier to entry, though running them locally still requires sufficient compute.



