A property-based baseline that scores molecules solely by molecular weight. Larger molecules tend to form more intermolecular contacts, so this baseline tests whether models beat the trivial heuristic of ranking by size.
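As a rough illustration of this kind of baseline, the sketch below ranks molecules by a molecular weight computed from a simple molecular formula. It is a minimal sketch and not the benchmark's actual implementation: the names `ATOMIC_WEIGHTS`, `molecular_weight`, and `rank_by_weight` are hypothetical, the formula parser handles only flat formulas such as `C9H8O4` (no parentheses or charges), and a real pipeline would more likely obtain weights from a cheminformatics toolkit.

```python
import re

# Standard atomic weights (g/mol) for a few common elements.
# Assumption: this small table is enough for the illustration; extend as needed.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45,
}


def molecular_weight(formula):
    """Sum atomic weights over a flat molecular formula such as 'C9H8O4'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


def rank_by_weight(formulas):
    """Return molecule names ordered from heaviest to lightest.

    `formulas` maps a molecule name to its molecular formula. The score used
    for ranking is the molecular weight alone, with no learned component.
    """
    scores = {name: molecular_weight(f) for name, f in formulas.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_by_weight({"aspirin": "C9H8O4", "caffeine": "C8H10N4O2", "ethanol": "C2H6O"})` places caffeine first and ethanol last, which is the "bigger is better" ordering any useful model should beat.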
Design Rationale
If a model can't outperform 'bigger is better', it may be learning size artifacts rather than genuine binding signals.
Evaluation Scores
Area under the precision-recall curve. More informative when class balance is skewed.
Area under the ROC curve. Measures discrimination ability across all thresholds.
Enrichment factor at 1%. Ratio of the active rate in the top-ranked 1% to the active rate expected at random.
Enrichment factor at 5%. Ratio of the active rate in the top-ranked 5% to the active rate expected at random.
Fraction of predictions within ±1 rank of the true rank. Reflects triage decisions.
Harrell's C statistic: the probability that a randomly chosen comparable pair is correctly ordered.
Rank correlation computed within each assay group, then averaged across groups.
Fraction of predictions matching the exact ordinal rank.
Fraction of pairs correctly ordered by ordinal rank.
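Two of the scores above, the enrichment factor and the pairwise ordering statistic, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the benchmark's own scoring code: the function names are hypothetical, labels are assumed binary (1 = active), higher scores are assumed to mean "more likely active", the top-k cutoff is rounded to the nearest integer with a minimum of one molecule, and the concordance function assumes no censoring and gives half credit to tied predictions.

```python
from itertools import combinations


def enrichment_factor(scores, labels, fraction):
    """Active rate in the top `fraction` of the ranking divided by the overall active rate."""
    n = len(scores)
    n_top = max(1, round(fraction * n))  # assumption: round, keep at least one molecule
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(labels[i] for i in order[:n_top])
    total_actives = sum(labels)
    if total_actives == 0:
        return float("nan")  # undefined when there are no actives at all
    # Algebraically equal to (hits_top / n_top) / (total_actives / n).
    return hits_top * n / (n_top * total_actives)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        if t1 == t2:
            continue  # pairs with equal true values are not comparable
        comparable += 1
        if p1 == p2:
            concordant += 0.5  # assumption: tied predictions earn half credit
        elif (t1 < t2) == (p1 < p2):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")
```

An enrichment factor of 1 corresponds to random ranking, and a concordance of 0.5 corresponds to random ordering, so both give a direct read on how far a model sits above the molecular-weight baseline.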