Evaluation tools#
Evaluating docking poses across a stratified test set#
The plinder.eval subpackage allows (1) assessing protein-ligand complex predictions against reference plinder systems, and (2) correlating the performance of these predictions against the level of similarity of each test system to the corresponding training set.
Running the scripts src/plinder/eval/docking/write_scores.py and src/plinder/eval/docking/stratify_test_set.py produces the same evaluation metrics as those on the public leaderboard.
The plinder-eval package allows assessing protein-ligand complex predictions against reference plinder systems, and correlating the performance of these predictions against the level of similarity of each test system to the corresponding training set.
The output files from running plinder-eval will be used to populate the MLSB leaderboard.
Input#
predictions.csv, with each row representing a protein-ligand pose and the following columns:
id: An identifier for the prediction (same across different ranked poses of the same prediction)
reference_system_id: plinder system ID to use as reference
receptor_file: Path to protein CIF file. Leave blank for rigid docking; the system's receptor file will be used.
rank: The rank of the pose (1-indexed)
confidence: Optional score associated with the pose
ligand_file: Path to the pose SDF file, or to a directory of SDF files for multi-ligand poses
split.parquet, with, at a minimum, system_id and split columns mapping PLINDER systems to train or test.
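As a minimal sketch of the predictions.csv layout described above, the following writes a two-pose file using only the standard library. The column names come from the list above; the prediction id, confidence values, and SDF paths are hypothetical placeholders, and the file is written to a temporary directory purely for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Column names as listed in the Input section.
COLUMNS = ["id", "reference_system_id", "receptor_file", "rank", "confidence", "ligand_file"]

# Two ranked poses of the same prediction, so they share one id.
# All values except the column names are placeholders.
rows = [
    {
        "id": "pred_0",  # hypothetical identifier
        "reference_system_id": "1a3b__1__1.B__1.D",
        "receptor_file": "",  # blank: rigid docking, the system's receptor file is used
        "rank": 1,  # ranks are 1-indexed
        "confidence": 0.9,  # optional score, hypothetical value
        "ligand_file": "poses/pred_0/rank1.sdf",  # hypothetical path
    },
    {
        "id": "pred_0",
        "reference_system_id": "1a3b__1__1.B__1.D",
        "receptor_file": "",
        "rank": 2,
        "confidence": 0.4,
        "ligand_file": "poses/pred_0/rank2.sdf",  # hypothetical path
    },
]


def write_predictions(path, rows):
    """Write the pose rows to a CSV file with the expected header."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


out_path = Path(tempfile.mkdtemp()) / "predictions.csv"
write_predictions(out_path, rows)

# Read the file back to check the header and row count.
with open(out_path, newline="") as handle:
    loaded = list(csv.DictReader(handle))
```

The resulting file can then be passed to the scoring command via its --prediction_file option.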
Commands#
Write scores#
plinder_eval --prediction_file tests/test_data/eval/predictions.csv --output_dir test_eval/ --num_processes 8
This calculates accuracy metrics for all predicted poses compared to the reference. A JSON file for each pose is stored in test_eval/scores, and the summary file across all poses is stored in test_eval/scores.parquet.
The predicted pose is compared to the reference system, and the following scores are calculated for each ligand:
lddt_pli: lDDT-PLI for the matched ligand
bisy_rmsd: binding-site superposed symmetry-corrected RMSD for the matched ligand
lddt_lp: lDDT score for the residues in the matched ligand pocket
best_rmsd_matched_reference_chain: chain tag of the reference ligand chain matched with the best bisy_rmsd (useful for multi-ligand systems)
best_pli_matched_reference_chain: chain tag of the reference ligand chain matched with the best lddt_pli (useful for multi-ligand systems)
and in aggregate per system:
fraction_reference_ligands_mapped: Fraction of reference ligand chains with corresponding model chains
fraction_model_ligands_mapped: Fraction of model ligand chains mapped to corresponding reference chains
lddt_pli_ave: average lDDT-PLI across mapped ligands
lddt_pli_wave: average lDDT-PLI across mapped ligands weighted by number of atoms
lddt_lp_ave: average lDDT-LP score for the residues in the matched ligand pocket
lddt_lp_wave: average lDDT-LP across mapped ligands weighted by number of atoms
bisy_rmsd_ave: average binding-site superposed symmetry-corrected RMSD across mapped ligands
bisy_rmsd_wave: average binding-site superposed symmetry-corrected RMSD across mapped ligands weighted by number of atoms
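To illustrate the difference between the _ave and _wave aggregates, here is a small sketch of a plain mean versus a mean weighted by atom count. It is not the plinder implementation: the helper names, the per-ligand scores, and the atom counts are all made up for the example.

```python
def plain_mean(values):
    """Unweighted average, as in the *_ave metrics."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Average weighted by per-ligand weights, as in the *_wave metrics."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical system with two mapped ligands.
lddt_pli = [0.9, 0.5]  # per-ligand lDDT-PLI (made-up values)
num_atoms = [30, 10]   # per-ligand atom counts (made-up values)

lddt_pli_ave = plain_mean(lddt_pli)                 # (0.9 + 0.5) / 2
lddt_pli_wave = weighted_mean(lddt_pli, num_atoms)  # (0.9 * 30 + 0.5 * 10) / 40
```

In this made-up case the weighted value is higher than the plain one because the larger ligand is also the better-predicted one. For a single-ligand system the two aggregates coincide.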
If the --score_receptor flag is used, the protein in receptor_file is compared to the reference system receptor file and the following scores are calculated:
fraction_reference_proteins_mapped: Fraction of reference protein chains with corresponding model chains
fraction_model_proteins_mapped: Fraction of model protein chains mapped to corresponding reference chains
lddt: all atom lDDT
bb_lddt: CA lDDT
per_chain_lddt_ave: average all atom lDDT across all mapped chains
per_chain_lddt_wave: average all atom lDDT across all mapped chains weighted by chain length
per_chain_bb_lddt_ave: average CA lDDT across all mapped chains
per_chain_bb_lddt_wave: average CA lDDT across all mapped chains weighted by chain length
For oligomeric complexes:
qs_global: Global QS-score
qs_best: Global QS-score, computed only on aligned residues
dockq_ave: Average of DockQ scores
dockq_wave: Same as dockq_ave, weighted by native contacts
If score_posebusters is True, all posebusters checks are saved.
You can inspect the results at test_eval/scores.parquet:
>>> import pandas as pd
>>> df = pd.read_parquet("test_eval/scores.parquet")
>>> df.T
                                                   0                      1
model                              1a3b__1__1.B__1.D  1ai5__1__1.A_1.B__1.D
reference                          1a3b__1__1.B__1.D  1ai5__1__1.A_1.B__1.D
num_reference_ligands                              1                      1
num_model_ligands                                  1                      1
num_reference_proteins                             1                      2
num_model_proteins                                 1                      2
fraction_reference_ligands_mapped                1.0                    1.0
fraction_model_ligands_mapped                    1.0                    1.0
lddt_pli_ave                                 0.85815               0.510695
lddt_pli_wave                                0.85815               0.510695
bisy_rmsd_ave                               1.617184               3.665143
bisy_rmsd_wave                              1.617184               3.665143
rank                                               1                      1
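Once the scores are loaded, a common next step is to keep the top-ranked pose per reference system and count how many fall under an RMSD cutoff. The sketch below does this on plain dictionaries that mimic a few columns of the output above. The cutoff value and the helper names are assumptions for illustration and are not defined by plinder.

```python
# Rows mimic a subset of the scores.parquet columns; values are from the output above.
scores = [
    {"reference": "1a3b__1__1.B__1.D", "rank": 1, "bisy_rmsd_ave": 1.617184},
    {"reference": "1ai5__1__1.A_1.B__1.D", "rank": 1, "bisy_rmsd_ave": 3.665143},
]

RMSD_CUTOFF = 2.0  # assumption: illustrative cutoff, not specified in this document


def top_ranked(rows):
    """Keep the pose with the lowest rank for each reference system."""
    best = {}
    for row in rows:
        key = row["reference"]
        if key not in best or row["rank"] < best[key]["rank"]:
            best[key] = row
    return list(best.values())


def success_rate(rows, column, cutoff):
    """Fraction of rows whose value in `column` is below `cutoff`."""
    hits = [row[column] < cutoff for row in rows]
    return sum(hits) / len(hits)


top1 = top_ranked(scores)
rate = success_rate(top1, "bisy_rmsd_ave", RMSD_CUTOFF)
```

With the two example systems, one of the two top-ranked poses is under the assumed cutoff. The same filtering can be expressed on the pandas DataFrame directly.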
Write test stratification data#
(Users will not need to run this command; the test_set.parquet and val_set.parquet files will be provided with the split release.)
plinder_stratify --split_file split.csv --output_dir test_data
This creates test_data/test_set.parquet, which:
Labels the maximum similarity of each test system to the training set across all the similarity metrics
Stratifies the test set based on training set similarity into novel_pocket_pli, novel_ligand_pli, novel_protein, novel_ligand, novel_all and not_novel
Labels test systems with high quality
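The stratification idea can be sketched as thresholding the maximum training-set similarity of each test system. The example below covers only a simplified subset of the labels; the cutoff values, the choice of metric per label, and the combination rule are all assumptions made for illustration and should not be read as the rule plinder actually applies.

```python
# Hypothetical cutoffs: a system counts as novel on an axis when its maximum
# similarity to the training set is below the cutoff for that axis.
PROTEIN_CUTOFF = 30.0  # assumption, applied to protein_seqsim_weighted_sum
LIGAND_CUTOFF = 30.0   # assumption, applied to tanimoto_similarity_max


def label_novelty(max_similarity):
    """Derive simplified novelty labels from per-metric maximum similarities."""
    novel_protein = max_similarity["protein_seqsim_weighted_sum"] < PROTEIN_CUTOFF
    novel_ligand = max_similarity["tanimoto_similarity_max"] < LIGAND_CUTOFF
    return {
        "novel_protein": novel_protein,
        "novel_ligand": novel_ligand,
        "novel_all": novel_protein and novel_ligand,
        "not_novel": not (novel_protein or novel_ligand),
    }


# A test system with no similarity to any training system.
unseen = label_novelty(
    {"protein_seqsim_weighted_sum": 0.0, "tanimoto_similarity_max": 0.0}
)

# A test system closely resembling training data on both axes.
seen = label_novelty(
    {"protein_seqsim_weighted_sum": 95.0, "tanimoto_similarity_max": 80.0}
)
```

Under these assumed cutoffs, the first system is labelled novel on every axis and the second is labelled not_novel.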
To inspect the result of the run, do:
>>> import pandas as pd
>>> df = pd.read_parquet("test_data/test_set.parquet")
>>> df.T
                                                  0                      1
system_id                         1a3b__1__1.B__1.D  1ai5__1__1.A_1.B__1.D
pli_qcov                                        0.0                    0.0
protein_seqsim_qcov_weighted_sum                0.0                    0.0
protein_seqsim_weighted_sum                     0.0                    0.0
protein_fident_qcov_weighted_sum                0.0                    0.0
protein_fident_weighted_sum                     0.0                    0.0
protein_lddt_qcov_weighted_sum                  0.0                    0.0
protein_lddt_weighted_sum                       0.0                    0.0
protein_qcov_weighted_sum                       0.0                    0.0
pocket_fident_qcov                              0.0                    0.0
pocket_fident                                   0.0                    0.0
pocket_lddt_qcov                                0.0                    0.0
pocket_lddt                                     0.0                    0.0
pocket_qcov                                     0.0                    0.0
tanimoto_similarity_max                         0.0                    0.0
passes_quality                                False                  False
novel_pocket_pli                               True                   True
novel_ligand                                   True                   True
novel_protein                                  True                   True
novel_all                                      True                   True
not_novel                                     False                  False
>>>
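To relate prediction accuracy to training-set similarity, the score table and the stratification table can be joined on the system identifier and averaged per label. The sketch below does this on plain dictionaries with values copied from the two outputs shown in this section. It assumes that the reference column of the scores holds the same identifiers as system_id in the test set; the helper name is hypothetical.

```python
from collections import defaultdict

# Subset of scores.parquet columns (values from the scores output in this section).
scores = [
    {"reference": "1a3b__1__1.B__1.D", "lddt_pli_ave": 0.85815},
    {"reference": "1ai5__1__1.A_1.B__1.D", "lddt_pli_ave": 0.510695},
]

# Subset of test_set.parquet columns (values from the stratification output).
test_set = [
    {"system_id": "1a3b__1__1.B__1.D", "novel_all": True},
    {"system_id": "1ai5__1__1.A_1.B__1.D", "novel_all": True},
]


def mean_by_label(scores, test_set, label, metric):
    """Average `metric` over scored systems, grouped by the value of `label`."""
    labels = {row["system_id"]: row[label] for row in test_set}
    grouped = defaultdict(list)
    for row in scores:
        system_id = row["reference"]  # assumed to match system_id
        if system_id in labels:
            grouped[labels[system_id]].append(row[metric])
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


summary = mean_by_label(scores, test_set, "novel_all", "lddt_pli_ave")
```

Here both example systems carry novel_all = True, so the summary has a single group. On a full test set the same grouping would give one mean per stratum.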