Evaluation tools#
Evaluating docking poses across a stratified test set#
The `plinder.eval`
subpackage allows (1) assessing protein-ligand complex predictions against reference `plinder`
systems, and
(2) correlating the performance of these predictions with the level of similarity of each test system to the corresponding training set.
The output files from running the scripts `src/plinder/eval/docking/write_scores.py`
and `src/plinder/eval/docking/stratify_test_set.py`
contain the same evaluation metrics as the ones on the public leaderboard.
The output file from running plinder-eval
can be used as a submission to the
leaderboard (coming soon).
Input#
`predictions.csv`
with each row representing a protein-ligand pose, and the following columns:

- `id`: An identifier for the prediction (same across different ranked poses of the same prediction)
- `reference_system_id`: `plinder` system ID to use as reference
- `receptor_file`: Path to the protein CIF file. Leave blank for rigid docking; the system's receptor file will be used.
- `rank`: The rank of the pose (1-indexed)
- `confidence`: Optional score associated with the pose
- `ligand_file`: Path to the SDF file of the pose
`split.parquet`
with, at a minimum, `system_id` and `split` columns mapping PLINDER systems to `train` or `test`.
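For concreteness, a minimal `predictions.csv` with these columns could be assembled with pandas; the prediction IDs and file paths below are illustrative only, not real data:

```python
import pandas as pd

# Two ranked poses of one hypothetical prediction; IDs and paths are examples.
rows = [
    {
        "id": "pred_0",
        "reference_system_id": "1a3b__1__1.B__1.D",
        "receptor_file": "",  # blank => rigid docking, reference receptor is used
        "rank": 1,
        "confidence": 0.87,
        "ligand_file": "poses/pred_0_rank1.sdf",
    },
    {
        "id": "pred_0",
        "reference_system_id": "1a3b__1__1.B__1.D",
        "receptor_file": "",
        "rank": 2,
        "confidence": 0.55,
        "ligand_file": "poses/pred_0_rank2.sdf",
    },
]
pd.DataFrame(rows).to_csv("predictions.csv", index=False)
```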
Commands#
Write scores#
plinder_eval --prediction_file tests/test_data/eval/predictions.csv --data_dir tests/test_data/eval --output_dir test_eval/ --num_processes 8
This calculates accuracy metrics for all predicted poses compared to the reference. A JSON file for each pose is stored in `test_eval/scores`,
and the summary file across all poses is stored in `test_eval/scores.parquet`.
The predicted pose is compared to the reference system and the following ligand scores are calculated:
- `fraction_reference_ligands_mapped`: Fraction of reference ligand chains with corresponding model chains
- `fraction_model_ligands_mapped`: Fraction of model ligand chains mapped to corresponding reference chains
- `lddt_pli_ave`: average lDDT-PLI across mapped ligands
- `lddt_pli_wave`: average lDDT-PLI across mapped ligands, weighted by number of atoms
- `lddt_pli_amd_ave`: average lDDT-PLI, penalising added model contacts, across mapped ligands
- `lddt_pli_amd_wave`: average lDDT-PLI, penalising added model contacts, across mapped ligands, weighted by number of atoms
- `scrmsd_ave`: average symmetry-corrected RMSD across mapped ligands
- `scrmsd_wave`: average symmetry-corrected RMSD across mapped ligands, weighted by number of atoms
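The `_ave` versus `_wave` variants above differ only in weighting; as a sketch of the distinction (a hypothetical helper, not the actual `plinder.eval` code):

```python
def average(values):
    """Plain mean across mapped ligands (the *_ave variants)."""
    return sum(values) / len(values)

def weighted_average(values, weights):
    """Mean weighted by per-ligand atom counts (the *_wave variants)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Two mapped ligands with different atom counts:
scrmsd = [1.0, 3.0]
atoms = [10, 30]
print(average(scrmsd))                  # 2.0
print(weighted_average(scrmsd, atoms))  # 2.5 — the larger ligand dominates
```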
If `score_protein`
is set to True, the protein in `receptor_file`
is compared to the system receptor file and the following scores are calculated:

- `fraction_reference_proteins_mapped`: Fraction of reference protein chains with corresponding model chains
- `fraction_model_proteins_mapped`: Fraction of model protein chains mapped to corresponding reference chains
- `lddt`: all-atom lDDT
- `bb_lddt`: CA lDDT
- `per_chain_lddt_ave`: average all-atom lDDT across all mapped chains
- `per_chain_lddt_wave`: average all-atom lDDT across all mapped chains, weighted by chain length
- `per_chain_bb_lddt_ave`: average CA lDDT across all mapped chains
- `per_chain_bb_lddt_wave`: average CA lDDT across all mapped chains, weighted by chain length
For oligomeric complexes:
- `qs_global`: Global QS score
- `qs_best`: QS score computed only on aligned residues
- `dockq_ave`: Average of DockQ scores
- `dockq_wave`: Same as `dockq_ave`, weighted by number of native contacts
If `score_posebusters`
is True, all PoseBusters checks are saved.
You can inspect the results at `test_eval/scores.parquet`:
>>> import pandas as pd
>>> df = pd.read_parquet("test_eval/scores.parquet")
>>> df.T
0 1
model 1a3b__1__1.B__1.D 1ai5__1__1.A_1.B__1.D
reference 1a3b__1__1.B__1.D 1ai5__1__1.A_1.B__1.D
num_reference_ligands 1 1
num_model_ligands 1 1
num_reference_proteins 1 2
num_model_proteins 1 2
fraction_reference_ligands_mapped 1.0 1.0
fraction_model_ligands_mapped 1.0 1.0
lddt_pli_ave 0.889506 0.557841
lddt_pli_wave 0.889506 0.557841
lddt_pli_amd_ave 0.85815 0.510695
lddt_pli_amd_wave 0.85815 0.510695
scrmsd_ave 1.617184 3.665143
scrmsd_wave 1.617184 3.665143
rank 1 1
Write test stratification data#
(This command does not need to be run by a user; the `test_set.parquet`
and `val_set.parquet`
files will be provided with the split release.)
plinder_stratify --split_file split.csv --data_dir PLINDER_DATA_DIR --output_dir test_data
Makes `test_data/test_set.parquet`, which:

- Labels the maximum similarity of each test system to the training set across all the similarity metrics
- Stratifies the test set by training-set similarity into `novel_pocket_pli`, `novel_ligand_pli`, `novel_protein`, `novel_ligand`, `novel_all` and `not_novel`
- Labels test systems that are of high quality
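As an illustrative sketch only (the threshold, column choice, and labeling logic here are assumptions, not the actual criteria in `stratify_test_set.py`), stratification amounts to thresholding maximum training-set similarity per metric:

```python
import pandas as pd

# Hypothetical similarity threshold; the real criteria live in stratify_test_set.py.
THRESHOLD = 0.5

df = pd.DataFrame(
    {
        "system_id": ["sysA", "sysB"],           # illustrative IDs
        "pocket_qcov": [0.0, 0.9],               # max pocket similarity to train
        "tanimoto_similarity_max": [0.0, 0.8],   # max ligand similarity to train
    }
)
df["novel_pocket"] = df["pocket_qcov"] < THRESHOLD
df["novel_ligand"] = df["tanimoto_similarity_max"] < THRESHOLD
df["novel_all"] = df["novel_pocket"] & df["novel_ligand"]
df["not_novel"] = ~(df["novel_pocket"] | df["novel_ligand"])
print(df[["system_id", "novel_all", "not_novel"]])
```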
To inspect the result of the run, do:
>>> import pandas as pd
>>> df = pd.read_parquet("test_data/test_set.parquet")
>>> df.T
0 1
system_id 1a3b__1__1.B__1.D 1ai5__1__1.A_1.B__1.D
pli_qcov 0.0 0.0
protein_seqsim_qcov_weighted_sum 0.0 0.0
protein_seqsim_weighted_sum 0.0 0.0
protein_fident_qcov_weighted_sum 0.0 0.0
protein_fident_weighted_sum 0.0 0.0
protein_lddt_qcov_weighted_sum 0.0 0.0
protein_lddt_weighted_sum 0.0 0.0
protein_qcov_weighted_sum 0.0 0.0
pocket_fident_qcov 0.0 0.0
pocket_fident 0.0 0.0
pocket_lddt_qcov 0.0 0.0
pocket_lddt 0.0 0.0
pocket_qcov 0.0 0.0
tanimoto_similarity_max 0.0 0.0
passes_quality False False
novel_pocket_pli True True
novel_pocket_ligand True True
novel_protein True True
novel_all True True
not_novel False False
>>>
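To correlate pose accuracy with training-set similarity (use case (2) at the top of this page), the two parquet outputs can be joined on the system ID. The sketch below uses in-memory stand-ins for the two files, with column names as in the tables above:

```python
import pandas as pd

# Illustrative stand-ins for test_eval/scores.parquet and test_data/test_set.parquet
scores = pd.DataFrame(
    {"reference": ["1a3b__1__1.B__1.D"], "scrmsd_ave": [1.617184]}
)
strat = pd.DataFrame(
    {"system_id": ["1a3b__1__1.B__1.D"], "novel_pocket_pli": [True]}
)

# scores identifies systems via `reference`; stratification uses `system_id`
merged = scores.merge(strat, left_on="reference", right_on="system_id")

# e.g. mean scRMSD within each novelty stratum
print(merged.groupby("novel_pocket_pli")["scrmsd_ave"].mean())
```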