Evaluation tools#

Evaluating docking poses across a stratified test set#

The plinder.eval subpackage allows

  1. assessing protein-ligand complex predictions against reference PLINDER systems, and

  2. correlating the performance of these predictions with the level of similarity of each test system to the corresponding training set.

Running the scripts src/plinder/eval/docking/write_scores.py and src/plinder/eval/docking/stratify_test_set.py produces the same evaluation metrics as those reported on the public leaderboard, and the resulting output file can be used as a submission to the leaderboard (coming soon).

Input#

predictions.csv with each row representing a protein-ligand pose and the following columns:

  • id: An identifier for the prediction (same across different ranked poses of the same prediction)

  • reference_system_id: plinder system ID to use as reference

  • receptor_file: Path to the protein CIF file. Leave blank for rigid docking; the system’s receptor file will be used.

  • rank: The rank of the pose (1-indexed)

  • confidence: Optional score associated with the pose

  • ligand_file: Path to the SDF file of the pose

split.parquet with, at a minimum, system_id and split columns mapping PLINDER systems to train or test.
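For concreteness, here is a minimal sketch of how both input files could be assembled with pandas. The system IDs are taken from the example output further below, while the prediction IDs, pose paths and confidence values are placeholders.

import pandas as pd

# Hypothetical two-pose prediction for one reference system; the file paths and
# confidence values are placeholders, not real data.
predictions = pd.DataFrame(
    {
        "id": ["pred_0", "pred_0"],
        "reference_system_id": ["1a3b__1__1.B__1.D"] * 2,
        "receptor_file": ["", ""],  # blank: rigid docking, the reference receptor is used
        "rank": [1, 2],
        "confidence": [0.9, 0.4],
        "ligand_file": ["poses/pred_0_rank1.sdf", "poses/pred_0_rank2.sdf"],
    }
)
predictions.to_csv("predictions.csv", index=False)

# Minimal split table mapping PLINDER systems to train or test.
split = pd.DataFrame(
    {
        "system_id": ["1a3b__1__1.B__1.D", "1ai5__1__1.A_1.B__1.D"],
        "split": ["test", "test"],
    }
)
split.to_parquet("split.parquet", index=False)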

Commands#

Write scores#

plinder_eval --prediction_file tests/test_data/eval/predictions.csv --data_dir tests/test_data/eval --output_dir test_eval/ --num_processes 8

This calculates accuracy metrics for all predicted poses compared to the reference. JSON files of each pose are stored in test_eval/scores and the summary file across all poses is stored in test_eval/scores.parquet.
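For a quick sanity check before loading the summary parquet, the per-pose JSON files can be read directly. A minimal sketch, assuming only that one JSON file is written per scored pose somewhere under test_eval/scores:

import glob
import json

# Iterate over every per-pose score file written by plinder_eval.
for path in sorted(glob.glob("test_eval/scores/**/*.json", recursive=True)):
    with open(path) as fh:
        pose_scores = json.load(fh)
    # Assumed: keys mirror the metric names listed below (e.g. lddt_pli_wave, scrmsd_wave).
    print(path, pose_scores.get("lddt_pli_wave"), pose_scores.get("scrmsd_wave"))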

The predicted pose is compared to the reference system and the following ligand scores are calculated:

  • fraction_reference_ligands_mapped: Fraction of reference ligand chains with corresponding model chains

  • fraction_model_ligands_mapped: Fraction of model ligand chains mapped to corresponding reference chains

  • lddt_pli_ave: average lDDT-PLI across mapped ligands

  • lddt_pli_wave: average lDDT-PLI across mapped ligands weighted by number of atoms

  • lddt_pli_amd_ave: average lDDT-PLI, with added model contacts penalised, across mapped ligands

  • lddt_pli_amd_wave: average lDDT-PLI, with added model contacts penalised, across mapped ligands weighted by number of atoms

  • scrmsd_ave: average symmetry-corrected RMSD across mapped ligands

  • scrmsd_wave: average symmetry-corrected RMSD across mapped ligands weighted by number of atoms
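The _ave and _wave variants differ only in how per-ligand values are averaged. A small illustration of that weighting with made-up numbers (not the actual implementation):

# Plain vs. atom-weighted average over mapped ligands (values are made up).
per_ligand = [(0.90, 45), (0.50, 15)]  # (score, number of ligand atoms)

ave = sum(score for score, _ in per_ligand) / len(per_ligand)
wave = sum(score * n for score, n in per_ligand) / sum(n for _, n in per_ligand)

print(f"ave={ave:.3f}  wave={wave:.3f}")  # ave=0.700  wave=0.800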

If score_protein is set to True, the protein in receptor_file is compared to the system receptor file and the following scores are calculated:

  • fraction_reference_proteins_mapped: Fraction of reference protein chains with corresponding model chains

  • fraction_model_proteins_mapped: Fraction of model protein chains mapped to corresponding reference chains

  • lddt: all atom lDDT

  • bb_lddt: CA lDDT

  • per_chain_lddt_ave: average all atom lDDT across all mapped chains

  • per_chain_lddt_wave: average all atom lDDT across all mapped chains weighted by chain length

  • per_chain_bb_lddt_ave: average CA lDDT across all mapped chains

  • per_chain_bb_lddt_wave: average CA lDDT across all mapped chains weighted by chain length

For oligomeric complexes:

  • qs_global: global QS score

  • qs_best: global QS score computed only on aligned residues

  • dockq_ave: average of DockQ scores

  • dockq_wave: same as dockq_ave, weighted by the number of native contacts

If score_posebusters is set to True, all PoseBusters checks are saved.
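The check results then appear as extra columns in scores.parquet alongside the metrics above. The column prefix used below is an assumption about the output schema, not a documented guarantee:

import pandas as pd

df = pd.read_parquet("test_eval/scores.parquet")

# Assumption: PoseBusters check columns share a common "posebusters_" prefix.
pb_cols = [c for c in df.columns if c.startswith("posebusters_")]
if pb_cols:
    # Fraction of poses that pass every reported check.
    print(df[pb_cols].all(axis=1).mean())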

You can inspect the results in test_eval/scores.parquet:

>>> import pandas as pd
>>> df = pd.read_parquet("test_eval/scores.parquet")
>>> df.T
                                                   0                      1
model                              1a3b__1__1.B__1.D  1ai5__1__1.A_1.B__1.D
reference                          1a3b__1__1.B__1.D  1ai5__1__1.A_1.B__1.D
num_reference_ligands                              1                      1
num_model_ligands                                  1                      1
num_reference_proteins                             1                      2
num_model_proteins                                 1                      2
fraction_reference_ligands_mapped                1.0                    1.0
fraction_model_ligands_mapped                    1.0                    1.0
lddt_pli_ave                                0.889506               0.557841
lddt_pli_wave                               0.889506               0.557841
lddt_pli_amd_ave                             0.85815               0.510695
lddt_pli_amd_wave                            0.85815               0.510695
scrmsd_ave                                  1.617184               3.665143
scrmsd_wave                                 1.617184               3.665143
rank                                               1                      1
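A common way to condense these per-pose metrics into a single number is a docking success rate over top-ranked poses. The sketch below uses the scRMSD ≤ 2 Å criterion, which is a community convention rather than a threshold imposed by plinder:

import pandas as pd

df = pd.read_parquet("test_eval/scores.parquet")

# Keep only the top-ranked pose of each prediction and apply the 2 Å criterion.
top = df[df["rank"] == 1]
success_rate = (top["scrmsd_wave"] <= 2.0).mean()
print(f"scRMSD <= 2 Å success rate: {success_rate:.1%}")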

Write test stratification data#

(This command does not need to be run by a user; the test_set.parquet and val_set.parquet files will be provided with the split release.)

plinder_stratify --split_file split.parquet --data_dir PLINDER_DATA_DIR --output_dir test_data

This creates test_data/test_set.parquet, which:

  • Records, for each similarity metric, the maximum similarity of each test system to the training set

  • Stratifies the test set based on training set similarity into novel_pocket_pli, novel_pocket_ligand, novel_protein, novel_all and not_novel

  • Labels test systems that pass quality criteria (passes_quality)
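The boolean label columns make it straightforward to count how many test systems fall into each stratum. A minimal sketch, using the column names shown in the inspection example below:

import pandas as pd

df = pd.read_parquet("test_data/test_set.parquet")

# Number of test systems per novelty stratum, and how many of those also pass quality checks.
strata = ["novel_pocket_pli", "novel_pocket_ligand", "novel_protein", "novel_all", "not_novel"]
print(df[strata].sum())
print(df.loc[df["passes_quality"], strata].sum())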

To inspect the result of the run, do:

>>> import pandas as pd
>>> df = pd.read_parquet("test_data/test_set.parquet")
>>> df.T
                                                  0                      1
system_id                         1a3b__1__1.B__1.D  1ai5__1__1.A_1.B__1.D
pli_qcov                                        0.0                    0.0
protein_seqsim_qcov_weighted_sum                0.0                    0.0
protein_seqsim_weighted_sum                     0.0                    0.0
protein_fident_qcov_weighted_sum                0.0                    0.0
protein_fident_weighted_sum                     0.0                    0.0
protein_lddt_qcov_weighted_sum                  0.0                    0.0
protein_lddt_weighted_sum                       0.0                    0.0
protein_qcov_weighted_sum                       0.0                    0.0
pocket_fident_qcov                              0.0                    0.0
pocket_fident                                   0.0                    0.0
pocket_lddt_qcov                                0.0                    0.0
pocket_lddt                                     0.0                    0.0
pocket_qcov                                     0.0                    0.0
tanimoto_similarity_max                         0.0                    0.0
passes_quality                                False                  False
novel_pocket_pli                               True                   True
novel_pocket_ligand                            True                   True
novel_protein                                  True                   True
novel_all                                      True                   True
not_novel                                     False                  False
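Finally, the second purpose of plinder.eval, correlating pose accuracy with training-set similarity, amounts to joining the two output tables on the system ID. A minimal sketch, assuming the column names shown in the two inspection examples above and reusing the illustrative 2 Å success criterion:

import pandas as pd

scores = pd.read_parquet("test_eval/scores.parquet")
test_set = pd.read_parquet("test_data/test_set.parquet")

# Top-ranked pose per prediction, joined with the novelty labels of its reference system.
top = scores[scores["rank"] == 1].merge(
    test_set, left_on="reference", right_on="system_id", how="inner"
)

# Success rate (scRMSD <= 2 Å) per novelty stratum.
for label in ["novel_pocket_pli", "novel_pocket_ligand", "novel_protein", "novel_all", "not_novel"]:
    subset = top[top[label]]
    if len(subset):
        rate = (subset["scrmsd_wave"] <= 2.0).mean()
        print(f"{label}: {rate:.1%} over {len(subset)} poses")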