Limitations

An honest assessment of the benchmark's constraints, biases, and areas for improvement.

Known Limitations

Every benchmark has limitations. Acknowledging them helps researchers interpret results correctly and identify areas for future improvement.

High Priority

Single-patent scope

high

The current benchmark is derived from a single patent (WO2024112853A1) covering PKMYT1 inhibitors. This limits target diversity, and results may not generalise to other protein families, binding mechanisms, or chemical scaffolds.

Small dataset size

high

With ~600 ligands and only 11 test examples for some tasks, metric estimates are inherently noisy. Confidence intervals are wide, making it difficult to distinguish small performance differences between models.
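To make the width of these intervals concrete, here is a minimal sketch that computes a Wilson score interval for a proportion-type metric such as accuracy. The counts (8 correct out of 11, and the same rate on a test set 100 times larger) are hypothetical and are not benchmark results; the function name is likewise illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% coverage.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: 8 of 11 test examples classified correctly.
low_small, high_small = wilson_interval(8, 11)
# Same observed accuracy on a hypothetical test set 100 times larger.
low_large, high_large = wilson_interval(800, 1100)

print(f"n=11:   [{low_small:.2f}, {high_small:.2f}]")
print(f"n=1100: [{low_large:.2f}, {high_large:.2f}]")
```

With 11 examples the interval spans a large part of the possible range, so two models whose scores differ by a few points cannot be separated on such a task. For ranking metrics that are not simple proportions, a bootstrap over test examples would be the analogous check.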

Limited target diversity

high

Only two targets (PKMYT1, CDK1) are included, both kinases. Performance patterns may differ substantially for other target classes (GPCRs, nuclear receptors, ion channels).

Medium Priority

Synthetic negative quality

medium

Synthetic negatives are computationally generated, not experimentally validated. Some may actually bind the target, while others may be trivially distinguishable from true actives by simple molecular descriptors.
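One way to check whether negatives are trivially separable is to ask how well a single descriptor ranks actives above negatives. The sketch below computes that as an AUC using the pairwise (Mann-Whitney) definition. The molecular weights are made-up illustrative values and the function name is hypothetical; neither comes from the benchmark.

```python
def single_descriptor_auc(active_values, negative_values):
    """AUC obtained by ranking compounds on one descriptor.

    Equals the probability that a randomly chosen active has a higher
    descriptor value than a randomly chosen negative (ties count 0.5).
    """
    if not active_values or not negative_values:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in active_values:
        for n in negative_values:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(active_values) * len(negative_values))


# Illustrative molecular weights only (not benchmark data).
active_mw = [412.5, 430.1, 455.3, 398.7]
negative_mw = [250.2, 310.4, 405.0, 280.9]

auc = single_descriptor_auc(active_mw, negative_mw)
# Separation is symmetric: an AUC near 0 is as informative as one near 1.
separation = max(auc, 1.0 - auc)
print(f"AUC = {auc:.4f}, separation = {separation:.4f}")
```

A separation close to 1.0 for a descriptor as simple as molecular weight would suggest that a model can score well without learning anything about binding. A value near 0.5 means that descriptor alone does not give the negatives away.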

Reversible orthosteric binding only

medium

The benchmark currently covers only reversible orthosteric inhibitors. Covalent inhibitors, allosteric modulators, and other binding mechanisms are not represented.

No temporal split

medium

The benchmark does not enforce a temporal train/test split. In prospective use, a model must predict activity for novel compounds that did not exist when it was trained, which is a harder problem than predicting held-out compounds from the same pool.
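A temporal split could be sketched as follows, assuming each compound record carries a date. The `first_disclosed` field, the record layout, and the cutoff are all hypothetical; the benchmark is not known to provide per-compound dates, and compounds from a single patent may well share one date.

```python
from datetime import date


def temporal_split(records, cutoff, date_key="first_disclosed"):
    """Train on records dated strictly before the cutoff, test on the rest."""
    train = [r for r in records if r[date_key] < cutoff]
    test = [r for r in records if r[date_key] >= cutoff]
    return train, test


# Hypothetical records; identifiers and dates are placeholders.
records = [
    {"id": "cpd-1", "first_disclosed": date(2021, 3, 1)},
    {"id": "cpd-2", "first_disclosed": date(2022, 7, 15)},
    {"id": "cpd-3", "first_disclosed": date(2023, 1, 10)},
    {"id": "cpd-4", "first_disclosed": date(2023, 9, 30)},
]

train, test = temporal_split(records, cutoff=date(2023, 1, 1))
print([r["id"] for r in train], [r["id"] for r in test])
```

The strict inequality guarantees that every training compound predates every test compound, which is the property a random split cannot offer.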

Lower Priority

Assay comparability assumptions

low

All measurements within an assay group are assumed comparable, but batch-to-batch variability, different protein constructs, or subtle condition changes may introduce noise.

Property-baseline correlation

low

Strong correlation between molecular properties (size, lipophilicity) and activity in the dataset means property-based baselines perform reasonably well. This may partially be an artefact of the specific chemical series.
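The strength of such a property-activity relationship can be quantified with a rank correlation. Below is a minimal pure-Python Spearman coefficient with average ranks for ties. The lipophilicity and potency values are invented for illustration and do not come from the dataset.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented values for illustration: a lipophilicity estimate and a potency.
logp = [1.2, 2.5, 3.1, 3.8, 4.4]
pic50 = [5.0, 5.9, 6.4, 6.2, 7.1]

rho = spearman(logp, pic50)
print(f"Spearman rho = {rho:.2f}")
```

Reporting this coefficient alongside model scores gives readers a reference point: a model is only informative to the extent that it beats what a single property already explains.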

Future Directions

1. Expand to multiple patents covering diverse targets and target classes.
2. Include allosteric and covalent inhibitors.
3. Add experimentally validated negatives alongside synthetic ones.
4. Implement temporal and scaffold-based train/test splits.
5. Expand to larger datasets that support more statistically robust metric estimates.
6. Include structure-based features (docked poses, binding site analysis) as optional inputs.
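As an illustration of the scaffold-based split mentioned in the list above, the sketch below keeps every compound sharing a scaffold on the same side of the split. It assumes a precomputed `scaffold` key per record (for example a Murcko scaffold SMILES produced by a cheminformatics toolkit); that field, the record layout, and the greedy largest-group-first assignment are assumptions made for illustration and do not describe the benchmark's own procedure.

```python
from collections import defaultdict


def scaffold_group_split(records, train_fraction=0.8, scaffold_key="scaffold"):
    """Assign whole scaffold groups to train or test, never splitting a group.

    Groups are visited largest first (ties broken by scaffold key) and go to
    the training set while it stays within train_fraction of all records.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[scaffold_key]].append(record)

    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    target = train_fraction * len(records)

    train, test = [], []
    for _, members in ordered:
        if len(train) + len(members) <= target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Hypothetical records: scaffold labels "A", "B", "C" stand in for real keys.
records = (
    [{"id": f"a{i}", "scaffold": "A"} for i in range(5)]
    + [{"id": f"b{i}", "scaffold": "B"} for i in range(3)]
    + [{"id": f"c{i}", "scaffold": "C"} for i in range(2)]
)

train, test = scaffold_group_split(records, train_fraction=0.8)
train_scaffolds = {r["scaffold"] for r in train}
test_scaffolds = {r["scaffold"] for r in test}
print(sorted(train_scaffolds), sorted(test_scaffolds))
```

Because the test set then contains only scaffolds absent from training, performance on it measures extrapolation to new chemotypes and not interpolation within a known series. The achieved train fraction can fall below the requested value when groups are large.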