Evaluation tools#

Evaluating docking poses across a stratified test set#

The plinder.eval subpackage allows (1) assessing protein-ligand complex predictions against reference plinder systems, and (2) correlating the performance of these predictions against the level of similarity of each test system to the corresponding training set.

The output file from running the scripts src/plinder/eval/docking/write_scores.py and src/plinder/eval/docking/stratify_test_set.py generates the same evaluation metrics as the ones we have on the public leaderboard.

The plinder-eval package allows

assessing protein-ligand complex predictions against reference plinder systems, and
correlating the performance of these predictions against the level of similarity of each test system to the corresponding training set.

The output files from running plinder-eval will be used to populate the MLSB leaderboard.

Input#

predictions.csv with each row representating a protein-ligand pose, and the following columns:

id: An identifier for the prediction (same across different ranked poses of the same prediction)
reference_system_id: plinder system ID to use as reference
receptor_file: Path to protein CIF file. Leave blank if rigid docking, the system’s receptor file will be used.
rank: The rank of the pose (1-indexed)
confidence: Optional score associated with the pose
ligand_file: Path to the pose SDF file or directory of SDF files for multi ligand poses

split.parquet with, at a minimum, system_id and split columns mapping PLINDER systems to train, or test.

Commands#

Write scores#

plinder_eval --prediction_file tests/test_data/eval/predictions.csv --output_dir test_eval/ --num_processes 8

This calculates accuracy metrics for all predicted poses compared to the reference. JSON files of each pose are stored in test_eval/scores and the summary file across all poses is stored in test_eval/scores.parquet.

The predicted pose is compared to the reference system and the following ligand scores are calculated per each ligand:

lddt_pli: lDDT-PLI for the matched ligand
bisy_rmsd: binding-site superposed symmetry-corrected RMSD for the matched ligand
lddt_lp: lDDT score for the residues in the matched ligand pocket
best_rmsd_matched_reference_chain: chain tag of the best bisy_rmsd matched ligand chain in reference (useful for mutli ligand systems)
best_pli_matched_reference_chain: chain tag of the best lddt_pli matched ligand chain in reference (useful for mutli ligand systems)

and in aggregate per system:

fraction_reference_ligands_mapped: Fraction of reference ligand chains with corresponding model chains
fraction_model_ligands_mapped: Fraction of model ligand chains mapped to corresponding reference chains
lddt_pli_ave: average lDDT-PLI across mapped ligands
lddt_pli_wave: average lDDT-PLI across mapped ligands weighted by number of atoms
lddt_lp_ave: average lDDT-LP score for the residues in the matched ligand pocket
lddt_lp_wave: average lDDT-LP across mapped ligands weighted by number of atoms
bisy_rmsd_ave: average binding-site superposed symmetry-corrected RMSD across mapped ligands
bisy_rmsd_wave: average binding-site superposed symmetry-corrected RMSD across mapped ligands weighted by number of atoms

If --score_receptor flag is used, then protein in receptor_file is compared to the reference system receptor file and the following scores are calculated:

fraction_reference_proteins_mapped: Fraction of reference protein chains with corresponding model chains
fraction_model_proteins_mapped: Fraction of model protein chains mapped to corresponding reference chains
lddt: all atom lDDT
bb_lddt: CA lDDT
per_chain_lddt_ave: average all atom lDDT across all mapped chains
per_chain_lddt_wave: average all atom lDDT across all mapped chains weighted by chain length
per_chain_bb_lddt_ave: average CA lDDT across all mapped chains
per_chain_bb_lddt_wave: average CA lDDT across all mapped chains weighted by chain length

For oligomeric complexes:

qs_global - Global QS score
qs_best - Global QS-score - only computed on aligned residues
dockq_ave - Average of DockQ scores
dockq_wave - Same as dockq_ave, weighted by native contacts

If score_posebusters is True, all posebusters checks are saved.

You can inspect the results at test_eval/scores.parquet

>>> import pandas as pd
>>> df = pd.read_parquet("test_eval/scores.parquet")
>>> df.T
                                                   0                      1
model                              1a3b__1__1.B__1.D  1ai5__1__1.A_1.B__1.D
reference                          1a3b__1__1.B__1.D  1ai5__1__1.A_1.B__1.D
num_reference_ligands                              1                      1
num_model_ligands                                  1                      1
num_reference_proteins                             1                      2
num_model_proteins                                 1                      2
fraction_reference_ligands_mapped                1.0                    1.0
fraction_model_ligands_mapped                    1.0                    1.0
lddt_pli_ave                                 0.85815               0.510695
lddt_pli_wave                                0.85815               0.510695
bisy_rmsd_ave                               1.617184               3.665143
bisy_rmsd_wave                              1.617184               3.665143
rank                                               1                      1

Write test stratification data#

(This command will not need to be run by a user, the test_set.parquet and val_set.parquet file will be provided with the split release)

plinder_stratify --split_file split.csv --output_dir test_data

Makes test_data/test_set.parquet which

Labels the maximum similarity of each test system to the training set across all the similarity metrics
Stratifies the test set based on training set similarity into novel_pocket_pli, novel_ligand_pli, novel_protein, novel_ligand, novel_all and not_novel
Labels test systems with high quality.

To inspect the result of the run, do:

>>> import pandas as pd
>>> df = pd.read_parquet("test_eval/test_set.parquet")
>>> df.T
                                                  0                      1
system_id                         1a3b__1__1.B__1.D  1ai5__1__1.A_1.B__1.D
pli_qcov                                        0.0                    0.0
protein_seqsim_qcov_weighted_sum                0.0                    0.0
protein_seqsim_weighted_sum                     0.0                    0.0
protein_fident_qcov_weighted_sum                0.0                    0.0
protein_fident_weighted_sum                     0.0                    0.0
protein_lddt_qcov_weighted_sum                  0.0                    0.0
protein_lddt_weighted_sum                       0.0                    0.0
protein_qcov_weighted_sum                       0.0                    0.0
pocket_fident_qcov                              0.0                    0.0
pocket_fident                                   0.0                    0.0
pocket_lddt_qcov                                0.0                    0.0
pocket_lddt                                     0.0                    0.0
pocket_qcov                                     0.0                    0.0
tanimoto_similarity_max                         0.0                    0.0
passes_quality                                False                  False
novel_pocket_pli                               True                   True
novel_ligand                                   True                   True
novel_protein                                  True                   True
novel_all                                      True                   True
not_novel                                     False                  False
>>>