A property-based baseline that scores molecules by computed lipophilicity (cLogP via RDKit). Greasier compounds often bind more strongly through the hydrophobic effect, but high lipophilicity is itself undesirable in drug design.
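A minimal sketch of such a scorer, assuming the baseline uses RDKit's Crippen estimate (`Crippen.MolLogP`) as its cLogP; the source only says "cLogP via RDKit", so that choice is an assumption. The function names `rdkit_clogp` and `clogp_baseline_scores` and the pluggable `logp_fn` parameter are hypothetical, introduced here for illustration.

```python
from typing import Callable, Iterable, List, Optional


def rdkit_clogp(smiles: str) -> float:
    """Crippen cLogP for one SMILES string, computed with RDKit."""
    # Imported lazily so the rest of the module loads without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Crippen

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Crippen.MolLogP(mol)


def clogp_baseline_scores(
    smiles_list: Iterable[str],
    logp_fn: Optional[Callable[[str], float]] = None,
) -> List[float]:
    """Score each molecule by its cLogP; a higher score means 'predicted more active'.

    logp_fn is a hypothetical hook for swapping in another lipophilicity
    estimator; by default the RDKit Crippen value is used.
    """
    fn = rdkit_clogp if logp_fn is None else logp_fn
    return [fn(s) for s in smiles_list]
```

With RDKit installed, `clogp_baseline_scores(["CCO", "CCCCCCCC"])` would score octane above ethanol, since the longer alkane is more lipophilic.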
Design Rationale
This baseline should specifically fail on LLE (lipophilic ligand efficiency) tasks, where lipophilicity is penalised. A model that cannot beat this baseline on LLE tasks is not learning specificity.
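To see why the baseline is expected to fail here, the sketch below assumes the common definition of lipophilic ligand efficiency, LLE = pIC50 - cLogP; the source does not spell out the exact target, so treat that formula as an assumption. The three compounds and their values are made up purely for illustration.

```python
def lle(pic50: float, clogp: float) -> float:
    """Lipophilic ligand efficiency, assuming the definition pIC50 - cLogP."""
    return pic50 - clogp


# Hypothetical compounds as (name, pIC50, cLogP); equal potency, varying greasiness.
compounds = [
    ("lean", 7.0, 1.0),
    ("medium", 7.0, 3.0),
    ("greasy", 7.0, 5.0),
]

# Ranking produced by the cLogP baseline: highest cLogP first.
by_clogp = [c[0] for c in sorted(compounds, key=lambda c: c[2], reverse=True)]

# Ranking by the LLE target: highest LLE first.
by_lle = [c[0] for c in sorted(compounds, key=lambda c: lle(c[1], c[2]), reverse=True)]

print(by_clogp)  # ['greasy', 'medium', 'lean']
print(by_lle)    # ['lean', 'medium', 'greasy']
```

At equal potency the two rankings are exact reverses of each other, which is the behaviour that makes the baseline a useful negative control on LLE tasks.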
Evaluation Scores
Area under the precision-recall curve. More informative when class balance is skewed.
Area under the ROC curve. Measures discrimination ability across all thresholds.
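Enrichment factor at 1%. Ratio of the active rate in the top-ranked 1% of compounds to the active rate expected at random.
Enrichment factor at 5%. Ratio of the active rate in the top-ranked 5% of compounds to the active rate expected at random.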
Fraction of predictions within ±1 rank of the true rank. Reflects triage decisions.
Harrell's C statistic: the probability that a randomly chosen pair is correctly ordered.
Rank correlation computed within each assay group, then averaged across groups.
Fraction of predictions matching the exact ordinal rank.
Fraction of pairs correctly ordered by ordinal rank.
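Several of the scores above reduce to a few lines of code. The sketch below is not the evaluation code used here; it is one plausible reading of three of the descriptions, with hypothetical function names. In particular, `pairwise_concordance` covers only the uncensored case (Harrell's C also handles censored observations) and counts prediction ties as half, and `enrichment_factor` rounds the top fraction up to at least one compound. Both conventions are assumptions.

```python
import math
from itertools import combinations
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float) -> float:
    """Active rate in the top `fraction` of the ranking over the overall active rate."""
    n = len(scores)
    n_top = max(1, math.ceil(fraction * n))  # assumed convention: round up
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(bool(is_active[i]) for i in order[:n_top])
    total_hits = sum(bool(a) for a in is_active)
    if total_hits == 0:
        return float("nan")  # undefined when there are no actives
    return (hits_top / n_top) / (total_hits / n)


def pairwise_concordance(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of pairs with different true values that are ordered correctly."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue  # pairs tied in the truth carry no ordering information
        comparable += 1
        d_true = y_true[i] - y_true[j]
        d_pred = y_pred[i] - y_pred[j]
        if d_pred == 0:
            concordant += 0.5  # assumed convention: tied predictions count half
        elif (d_true > 0) == (d_pred > 0):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")


def within_one_accuracy(true_ranks: Sequence[int], pred_ranks: Sequence[int]) -> float:
    """Fraction of predictions whose ordinal rank is within 1 of the true rank."""
    hits = sum(abs(t - p) <= 1 for t, p in zip(true_ranks, pred_ranks))
    return hits / len(true_ranks)
```

For example, with ten compounds, two of them active, and one active ranked in the top two, `enrichment_factor(scores, is_active, 0.2)` gives (1/2) / (2/10) = 2.5, meaning the top 20% holds 2.5 times as many actives as a random pick would.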