PatentBind Benchmark

Evaluating molecular binding prediction through patent-derived SAR

Quick Snapshot

Targets: 2 (PKMYT1, CDK1)

Ligands: 600+ (canonical + synthetic)

Task types: 8 (pointwise, pairwise, listwise)

Models: 4 (baselines + docking + cofolding)

Why this benchmark?

Most binding affinity benchmarks measure how well a model fits the measurements in a dataset, which does not reflect real-world use. Drug discovery proceeds through local design decisions: selecting the next analogue, prioritising within a series, and deciding whether a modification improves potency.

Patent SAR over curated datasets

Dense structure–activity relationships from real medicinal chemistry programs, not curated academic subsets.

Local ranking over global regression

Emphasises ranking within analogue series — the decisions that actually drive lead optimisation.

Models must beat trivial heuristics

Property-based baselines (molecular weight, lipophilicity) ensure models demonstrate real predictive power.

Practical medicinal chemistry decisions

Evaluation mirrors real questions: 'Which analogue should we synthesise next?'

Key Findings

Ranking

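Models perform best on ranking tasks (~50–63% pairwise accuracy), suggesting ranking is more tractable than absolute affinity prediction. The lower end of that range is still no better than the 50% expected from guessing at random on a two-way comparison.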
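To make the pairwise accuracy figure concrete, here is a minimal sketch of how such a metric could be computed. It assumes higher values mean higher potency (for example pIC50), skips pairs whose measured values are tied, and counts tied predictions as wrong. The function name and the toy numbers are illustrative and are not taken from the benchmark.

```python
from itertools import combinations


def pairwise_accuracy(true_values, predicted_scores):
    """Fraction of compound pairs whose predicted order matches the measured order.

    Assumption: pairs with tied measured values are skipped, and a tied
    prediction counts as incorrect.
    """
    correct = 0
    total = 0
    for i, j in combinations(range(len(true_values)), 2):
        true_diff = true_values[i] - true_values[j]
        if true_diff == 0:
            continue  # no ground-truth ordering for this pair
        pred_diff = predicted_scores[i] - predicted_scores[j]
        total += 1
        if true_diff * pred_diff > 0:  # same sign means same ordering
            correct += 1
    return correct / total if total else float("nan")


# Hypothetical pIC50 values for one analogue series (illustrative only)
measured = [6.1, 7.4, 5.8, 8.0]
predicted = [6.5, 6.9, 6.0, 6.7]
print(pairwise_accuracy(measured, predicted))
```

In this toy series the predictions order five of the six pairs correctly. A model that guessed each pair at random would be expected to score about 0.5.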

Regression

Continuous affinity prediction is very poor: negative R² values mean models perform worse than always predicting the mean affinity.
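The meaning of a negative R² can be checked with a short sketch. It uses the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The values below are made up for illustration and are not benchmark data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) if all values in y_true are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured affinities (illustrative only)
measured = [6.0, 7.0, 8.0]

# Always predicting the mean gives R² = 0 by construction
mean_only = [7.0, 7.0, 7.0]

# Predictions that invert the trend have larger errors than the mean predictor
inverted = [8.0, 7.0, 6.0]

print(r_squared(measured, mean_only))
print(r_squared(measured, inverted))
```

Any set of predictions with a larger squared error than the constant mean predictor yields a value below zero, which is the situation this finding describes.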

Baselines

Property-based heuristics (molecular weight, cLogP) are surprisingly competitive, suggesting that current models rely heavily on simple molecular properties.

SAR

SAR winner identification (~12–21% accuracy) remains extremely challenging — far below what is useful for real medicinal chemistry.
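One plausible reading of winner identification is top-1 accuracy per analogue series: the model scores every compound in a series, and a hit is counted when its top-scored compound is also the most potent measured one. The sketch below follows that reading, which is an assumption rather than the benchmark's documented definition. The function name and toy data are hypothetical, and higher values are taken to mean higher potency.

```python
def winner_accuracy(series):
    """Share of series in which the top-scored compound is the measured best.

    series: list of (measured, predicted) pairs, each a list of values for
    one analogue series. Assumption: higher value means more potent, and
    ties are resolved in favour of the first maximum.
    """
    hits = 0
    for measured, predicted in series:
        true_winner = max(range(len(measured)), key=measured.__getitem__)
        predicted_winner = max(range(len(predicted)), key=predicted.__getitem__)
        hits += true_winner == predicted_winner
    return hits / len(series)


# Two hypothetical analogue series (illustrative values only)
toy_series = [
    ([6.1, 7.4, 5.8], [6.0, 6.9, 6.2]),            # model picks the true winner
    ([5.5, 5.9, 7.2, 6.0], [6.8, 6.1, 6.5, 6.0]),  # model picks the wrong compound
]
print(winner_accuracy(toy_series))
```

Under this reading, the random-guess rate depends on series size: picking uniformly from a series of n compounds succeeds with probability 1/n, so reported accuracies are best interpreted alongside the typical series size.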

Next Steps