Access system files with PlinderSystem

We provide files for all holo PLINDER systems with fewer than 6 protein chains and fewer than 6 ligand chains. These can be accessed through the PlinderSystem object, which also downloads and extracts only the relevant files if they are not already present locally. All system files are extracted into ~/.local/share/plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/systems. The current defaults are PLINDER_RELEASE=2024-06 and PLINDER_ITERATION=v2.

from plinder.core import PlinderSystem

plinder_system = PlinderSystem(system_id="4agi__1__1.C__1.W")
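To make the directory layout described above concrete, the following is a minimal sketch that builds the expected location of a system's files. It assumes that PLINDER_RELEASE and PLINDER_ITERATION are read as environment variables with the defaults quoted above; the helper expected_system_dir is hypothetical and not part of the plinder package.

```python
import os
from pathlib import Path


def expected_system_dir(system_id: str) -> Path:
    """Return the directory where a system's files are expected to be extracted.

    Hypothetical helper: it only mirrors the path template given in the text
    and does not download or verify anything.
    """
    # Assumption: the release and iteration come from environment variables,
    # falling back to the defaults stated in the text.
    release = os.environ.get("PLINDER_RELEASE", "2024-06")
    iteration = os.environ.get("PLINDER_ITERATION", "v2")
    return (
        Path.home()
        / ".local"
        / "share"
        / "plinder"
        / release
        / iteration
        / "systems"
        / system_id
    )


system_dir = expected_system_dir("4agi__1__1.C__1.W")
print(system_dir)
```

The attributes shown in the following sections (for example ligand_sdfs and receptor_cif) return paths inside this directory, so in practice there is rarely a need to build it by hand.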

Ligand

The ligands are provided in SDF format in ligand_sdfs.

plinder_system.ligand_sdfs
{'1.W': '/home/runner/.local/share/plinder/2024-06/v2/systems/4agi__1__1.C__1.W/ligand_files/1.W.sdf'}

And the corresponding SMILES strings in smiles.

plinder_system.smiles
{'1.W': 'C[Se][C@@H]1O[C@@H](C)[C@@H](O)[C@@H](O)[C@@H]1O'}

Receptor

The CIF/PDB files of the receptor are stored in receptor_cif and receptor_pdb and only contain the protein chains of the system.

plinder_system.receptor_pdb, plinder_system.receptor_cif
('/home/runner/.local/share/plinder/2024-06/v2/systems/4agi__1__1.C__1.W/receptor.pdb',
 '/home/runner/.local/share/plinder/2024-06/v2/systems/4agi__1__1.C__1.W/receptor.cif')

We recommend using the CIF file, as PDB is a legacy format. If you must use the PDB file, note that the chains are renamed to single letters. The chain_mapping attribute gives the mapping from the system's chain identifiers to these single-letter names.

plinder_system.chain_mapping
{'1.C': 'A'}
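When working with the PDB file, it is often necessary to go the other way, from a single-letter PDB chain name back to the system's chain identifier. A minimal sketch, using the mapping printed above as a literal so that it runs without plinder installed (the variable name pdb_to_system is only illustrative):

```python
# Mapping copied from the output above: system chain id -> PDB chain letter.
chain_mapping = {"1.C": "A"}

# Invert it to look up the system chain id from a PDB chain letter.
pdb_to_system = {pdb_chain: system_chain for system_chain, pdb_chain in chain_mapping.items()}

# Inverting is only safe if no two system chains share a PDB letter.
if len(pdb_to_system) != len(chain_mapping):
    raise ValueError("chain_mapping is not one-to-one")

print(pdb_to_system["A"])
```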

The FASTA file and sequences of the receptor are stored in sequences_fasta and sequences respectively. These are the canonical sequences of all protein chains in the system.

plinder_system.sequences_fasta, plinder_system.sequences
('/home/runner/.local/share/plinder/2024-06/v2/systems/4agi__1__1.C__1.W/sequences.fasta',
 {'1.C': 'MSTPGAQQVLFRTGIAAVNSTNHLRVYFQDVYGSIRESLYEGSWANGTEKNVIGNAKLGSPVAATSKELKHIRVYTLTEGNTLQEFAYDSGTGWYNGGLGGAKFQVAPYSXIAAVFLAGTDALQLRIYAQKPDNTIQEYMWNGDGWKEGTNLGGALPGTGIGATSFRYTDYNGPSIRIWFQTDDLKLVQRAYDPHKGWYPDLVTIFDRAPPRTAIAATSFGAGNSSIYMRIYFVNSDNTIWQVCWDHGKGYHDKGTITPVIQGSEVAIISWGSFANNGPDLRLYFQNGTYISAVSEWVWNRAHGSQLGRSALPPA'})
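If only the FASTA file is at hand, the same chain-to-sequence dictionary can be rebuilt with a few lines of standard-library Python. The sketch below is a generic FASTA reader, not plinder's own parser; it assumes that each header line starts with the chain identifier (for example ">1.C"), which has not been checked against the actual sequences.fasta. The example text is a shortened, hand-wrapped excerpt of the sequence shown above.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {record name: sequence} dictionary."""
    chunks: dict[str, list[str]] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-separated token of the header as the key.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            chunks[current] = []
        elif current is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


# Hypothetical file content: the header format is an assumption.
example_text = ">1.C\nMSTPGAQQVL\nFRTGIAAVNS\n"
sequences = parse_fasta(example_text)
print(sequences)
```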

Linked structures

Where possible, we’ve linked PLINDER systems to associated apo structures from the PDB and predicted structures from AFDB. This was done using the same kind of similarity searches used for system clustering, but with strict restrictions on the sequence identity and coverage of the linked structures.

The linked_structures attribute is a pandas DataFrame describing the links found for a system that were also scored for conformational difficulty. The scoring superposes the apo or predicted chain onto the receptor of the system using a global sequence-based alignment, transplants the ligand into the linked structure, and evaluates the resulting protein-ligand complex as though it were a predicted structure for the given system. The linked_structures DataFrame therefore contains both the similarity scores from the alignments and the metrics from this evaluation.

link_info = plinder_system.linked_structures
link_info[
    [
        "id",
        "pocket_fident",
        "lddt",
        "bb_lddt",
        "lddt_lp_ave",
        "lddt_pli_ave",
        "bisy_rmsd_ave",
        "sort_score",
        "kind",
    ]
]
         id  pocket_fident      lddt   bb_lddt  lddt_lp_ave  lddt_pli_ave  bisy_rmsd_ave  sort_score  kind
0    4uou_B          100.0  0.972682  0.994065     0.987813      0.989777       0.159702        2.40   apo
1    4uou_C          100.0  0.973562  0.994687     0.967287      0.951068       0.194233        2.40   apo
2    4uou_D          100.0  0.973604  0.994235     0.972579      0.973048       0.101252        2.40   apo
3    4uou_A          100.0  0.967257  0.994800     0.976908      0.963504       0.214243        2.40   apo
4  Q4WW81_A          100.0  0.982275  0.998587     0.999679      0.997273       0.126228       98.57  pred

For example, here we can see that “4uou_B”:

  • has 100% identical residues corresponding to the pocket of the system,

  • has very high lDDT and backbone lDDT scores, indicating that the structure is very similar to the receptor,

  • has a sort_score of 2.4. For an apo structure the sort_score is the resolution, while for a predicted structure it is the pLDDT score.

Indeed, the superposition and transplant results tell the same story:

  • a global superposition puts the ligand in the right place (seen by the bisy_rmsd_ave of the ligand pose),

  • the distances between the pocket atoms are similar (seen by the lddt_lp_ave metric),

  • and the distances between the ligand and protein atoms are similar (seen by the lddt_pli_ave metric).
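Because linked_structures is an ordinary pandas DataFrame, the usual filtering and sorting operations apply. The sketch below rebuilds a few columns of the table above as a literal DataFrame so that it runs without plinder, then picks one apo link. The selection rule (lowest sort_score, that is best resolution, with ties broken by the lowest bisy_rmsd_ave) is only an illustrative choice and not a ranking prescribed by PLINDER.

```python
import pandas as pd

# Values copied from the table above; in practice this would be
# plinder_system.linked_structures.
link_info = pd.DataFrame(
    {
        "id": ["4uou_B", "4uou_C", "4uou_D", "4uou_A", "Q4WW81_A"],
        "bisy_rmsd_ave": [0.159702, 0.194233, 0.101252, 0.214243, 0.126228],
        "sort_score": [2.40, 2.40, 2.40, 2.40, 98.57],
        "kind": ["apo", "apo", "apo", "apo", "pred"],
    }
)

# Keep only apo links. For these, sort_score is the resolution, so lower is better.
apo_links = link_info[link_info["kind"] == "apo"]

# Illustrative rule: best resolution first, then the lowest ligand RMSD.
best_apo = apo_links.sort_values(["sort_score", "bisy_rmsd_ave"]).iloc[0]
print(best_apo["id"])
```

For predicted structures the sort_score is a pLDDT value, where higher is better, so the sort direction for that column would need to be reversed.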

Given the kind and id of a link, get_linked_structure then returns the file path to the linked structure:

plinder_system.get_linked_structure("apo", "4uou_B")