An honest assessment of the benchmark's constraints, biases, and areas for improvement.
Every benchmark has limitations. Acknowledging them helps researchers interpret results correctly and identify areas for future improvement.
Single-patent scope (high)
The current benchmark is derived from a single patent (WO2024112853A1) covering PKMYT1 inhibitors. This limits target diversity, and results may not generalise to other protein families, binding mechanisms, or chemical scaffolds.
With ~600 ligands and 11 test examples for some tasks, metrics are inherently noisy. Confidence intervals on metric values are wide, making it difficult to distinguish small performance differences between models.
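To make the width of these intervals concrete, the following is a minimal sketch of a percentile bootstrap on an 11-example test set. The per-example outcomes are hypothetical, and `bootstrap_ci` is an illustrative helper, not part of the benchmark's tooling.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical per-example correctness (1 = correct) for an 11-example test set.
outcomes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0]
low, high = bootstrap_ci(outcomes)
print(f"accuracy = {statistics.mean(outcomes):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With so few examples the interval spans a large share of the possible range, so two models whose scores fall inside each other's intervals should not be ranked on that task alone.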
Only two targets (PKMYT1, CDK1) are included, both kinases. Performance patterns may differ substantially for other target classes (GPCRs, nuclear receptors, ion channels).
Synthetic negatives are computationally generated, not experimentally validated. Some may actually bind the target, while others may be trivially distinguishable from true actives by simple molecular descriptors.
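One way to probe the second concern is to measure how well a single descriptor separates actives from synthetic negatives. The sketch below computes a ROC AUC in its Mann-Whitney form; the molecular weights are hypothetical, and `descriptor_auc` is an illustrative helper rather than a benchmark function.

```python
def descriptor_auc(active_values, negative_values):
    """ROC AUC of one descriptor used as a score (Mann-Whitney formulation).

    Returns the probability that a randomly chosen active has a higher
    descriptor value than a randomly chosen negative, counting ties as 0.5.
    """
    wins = 0.0
    for a in active_values:
        for n in negative_values:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(active_values) * len(negative_values))


# Hypothetical molecular weights (Da); not taken from the benchmark.
actives_mw = [412.5, 430.1, 398.7, 445.3, 421.0]
negatives_mw = [310.2, 295.8, 402.4, 288.9, 330.6]

auc = descriptor_auc(actives_mw, negatives_mw)
print(f"AUC of molecular weight alone: {auc:.2f}")
```

An AUC near 0.5 suggests the descriptor alone does not separate the two sets, while values near 0 or 1 suggest the negatives are trivially distinguishable on that property.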
The benchmark currently covers only reversible orthosteric inhibitors. Covalent inhibitors, allosteric modulators, and other binding mechanisms are not represented.
The benchmark does not enforce a temporal train/test split. In production, models must make predictions for novel compounds not seen during training, which is a harder problem.
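For reference, a temporal split can be sketched as below. This assumes each record carries a date (for example a disclosure or measurement date); the `date` field, the compound identifiers, and the cutoff are all hypothetical, and the benchmark data may not include such a field.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Split records so that everything dated before `cutoff` is training data.

    `records` is a list of dicts with a `date` key (an assumed field).
    """
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Hypothetical records; identifiers and dates are illustrative only.
records = [
    {"id": "cmpd-1", "date": date(2021, 3, 1)},
    {"id": "cmpd-2", "date": date(2022, 7, 15)},
    {"id": "cmpd-3", "date": date(2023, 1, 9)},
    {"id": "cmpd-4", "date": date(2023, 11, 30)},
]

train, test = temporal_split(records, cutoff=date(2023, 1, 1))
print([r["id"] for r in train], [r["id"] for r in test])
```

Because every test record postdates every training record, the evaluation mimics predicting compounds that did not yet exist when the model was trained.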
All measurements within an assay group are assumed comparable, but batch-to-batch variability, different protein constructs, or subtle condition changes may introduce noise.
The strong correlation between molecular properties (size, lipophilicity) and activity in the dataset means that property-based baselines perform reasonably well. This may partly be an artefact of the specific chemical series.
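A quick way to quantify this effect is a rank correlation between one property and activity. The sketch below implements Spearman's rho with the standard library only; the logP and pIC50 values are hypothetical, and the helper names are illustrative.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Note: undefined (division by zero) if either input is constant.
    return cov / (var_x * var_y) ** 0.5


# Hypothetical lipophilicity (logP) and potency (pIC50) values.
logp = [1.8, 2.4, 2.9, 3.1, 3.6, 4.0]
pic50 = [5.2, 5.9, 6.1, 6.0, 7.0, 7.3]
print(f"Spearman rho (logP vs pIC50): {spearman(logp, pic50):.2f}")
```

If a single property already ranks compounds almost as well as a trained model, the model's score on this dataset says little about what it has learned beyond that property.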