Access system files with PlinderSystem#
We provide files for all holo PLINDER systems with <6 protein and <6 ligand chains. These can be accessed with the PlinderSystem object, which also downloads and extracts only the relevant files if they haven’t been downloaded yet. All system files are extracted into ~/.local/share/plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/systems. The current defaults are PLINDER_RELEASE=2024-06 and PLINDER_ITERATION=v2.
from plinder.core import PlinderSystem
plinder_system = PlinderSystem(system_id="4agi__1__1.C__1.W")
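To check where the files for a given release will land, the path template above can be expanded by hand. This is a minimal sketch and not part of the plinder API: expected_systems_dir is a hypothetical helper, and it assumes the two environment variables fall back to the defaults quoted above when they are unset.

```python
import os
from pathlib import Path


def expected_systems_dir(release=None, iteration=None, home=None):
    """Expand ~/.local/share/plinder/<release>/<iteration>/systems.

    Hypothetical helper for illustration: it only builds the path and
    does not download or verify anything.
    """
    # Explicit arguments win, then the environment, then the documented defaults.
    release = release or os.environ.get("PLINDER_RELEASE", "2024-06")
    iteration = iteration or os.environ.get("PLINDER_ITERATION", "v2")
    base = Path(home) if home is not None else Path.home()
    return base / ".local" / "share" / "plinder" / release / iteration / "systems"
```

Calling expected_systems_dir() / "4agi__1__1.C__1.W" then gives the directory in which the files for the system above would be expected.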
Ligand#
The ligands are provided in SDF format in ligand_sdfs.
plinder_system.ligand_sdfs
{'1.W': '/home/runner/.local/share/plinder/2024-06/v2/systems/4agi__1__1.C__1.W/ligand_files/1.W.sdf'}
The corresponding SMILES strings are in smiles.
plinder_system.smiles
{'1.W': 'C[Se][C@@H]1O[C@@H](C)[C@@H](O)[C@@H](O)[C@@H]1O'}
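A cheminformatics toolkit such as RDKit is the usual way to load these SDF files. For a quick check without extra dependencies, the atom and bond counts can be read straight from the counts line of a molblock. The sketch below assumes the record is in V2000 format (an assumption, not something the text above states); read_sdf_counts and the tiny one-atom example record are made up for illustration.

```python
def read_sdf_counts(sdf_text):
    """Return (n_atoms, n_bonds) from the first molblock of an SDF string.

    In a V2000 molblock the fourth line is the counts line: columns 1-3
    hold the atom count and columns 4-6 the bond count.
    """
    lines = sdf_text.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("this sketch only handles V2000 molblocks")
    counts = lines[3]
    return int(counts[0:3]), int(counts[3:6])


# Hypothetical single-atom record, used only to exercise the function.
EXAMPLE_SDF = "\n".join(
    [
        "example",  # title line
        "  sketch",  # program/timestamp line
        "",  # comment line
        "  1  0  0  0  0  0  0  0  0  0999 V2000",
        "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
        "M  END",
        "$$$$",
    ]
)

n_atoms, n_bonds = read_sdf_counts(EXAMPLE_SDF)
```

With a downloaded system, the same function could be applied to the text of plinder_system.ligand_sdfs["1.W"], for example via pathlib.Path(...).read_text().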
Receptor#
The CIF and PDB files of the receptor are stored in receptor_cif and receptor_pdb, and contain only the protein chains of the system.
plinder_system.receptor_pdb, plinder_system.receptor_cif
('/home/runner/.local/share/plinder/2024-06/v2/systems/4agi__1__1.C__1.W/receptor.pdb',
'/home/runner/.local/share/plinder/2024-06/v2/systems/4agi__1__1.C__1.W/receptor.cif')
We recommend using the CIF file, as PDB is an obsolete format. If you must use the PDB file, note that its chains are renamed to single letters. The mapping from system chain IDs to these letters is available in the chain_mapping attribute.
plinder_system.chain_mapping
{'1.C': 'A'}
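Since chain_mapping goes from system chain ID to PDB chain letter, anything computed on receptor.pdb has to be translated back before it can be compared with the other system files. A minimal sketch of that reverse lookup follows; to_system_chain is a hypothetical helper, not a plinder function.

```python
def to_system_chain(pdb_chain, chain_mapping):
    """Translate a single-letter PDB chain ID back to the system chain ID."""
    # chain_mapping is system chain -> PDB chain, so invert it first.
    inverse = {pdb: system for system, pdb in chain_mapping.items()}
    if len(inverse) != len(chain_mapping):
        raise ValueError("chain_mapping is not one-to-one")
    return inverse[pdb_chain]


chain_mapping = {"1.C": "A"}  # value shown above for 4agi__1__1.C__1.W
print(to_system_chain("A", chain_mapping))  # prints 1.C
```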
The FASTA file and the sequences of the receptor are stored in sequences_fasta and sequences, respectively. These are the canonical sequences of all protein chains in the system.
plinder_system.sequences_fasta, plinder_system.sequences
('/home/runner/.local/share/plinder/2024-06/v2/systems/4agi__1__1.C__1.W/sequences.fasta',
{'1.C': 'MSTPGAQQVLFRTGIAAVNSTNHLRVYFQDVYGSIRESLYEGSWANGTEKNVIGNAKLGSPVAATSKELKHIRVYTLTEGNTLQEFAYDSGTGWYNGGLGGAKFQVAPYSXIAAVFLAGTDALQLRIYAQKPDNTIQEYMWNGDGWKEGTNLGGALPGTGIGATSFRYTDYNGPSIRIWFQTDDLKLVQRAYDPHKGWYPDLVTIFDRAPPRTAIAATSFGAGNSSIYMRIYFVNSDNTIWQVCWDHGKGYHDKGTITPVIQGSEVAIISWGSFANNGPDLRLYFQNGTYISAVSEWVWNRAHGSQLGRSALPPA'})
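If you work from the FASTA file rather than the sequences attribute, a small parser is enough. The sketch below assumes each header line is just the chain ID (for example >1.C), which is not confirmed above; read_fasta and unknown_positions are hypothetical helpers. The second function locates residues written as X, such as the one in the sequence shown above.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def unknown_positions(sequence):
    """Return the 0-based indices of residues written as 'X'."""
    return [i for i, residue in enumerate(sequence) if residue == "X"]


# Toy input for illustration, not the real sequences.fasta content.
toy = read_fasta(">1.C\nMSTX\nGA\n")
print(toy["1.C"], unknown_positions(toy["1.C"]))  # prints MSTXGA [3]
```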
Linked structures#
Where possible, we’ve linked PLINDER systems to associated apo structures from the PDB and predicted structures from AFDB. This was done using the same kind of similarity searches used for system clustering, but with strict restrictions on the sequence identity and coverage of linked structures.
The linked_structures attribute is a pandas DataFrame with information on the links that were found for a system and additionally scored for conformational difficulty. This additional scoring consists of superposing the found apo or predicted chain onto the receptor of the system with global sequence-based alignment, transplanting the ligand to the found structure, and evaluating the resulting protein-ligand complex as though it were a predicted structure for the given system. The linked_structures DataFrame therefore contains the similarity scores from the alignments as well as the metrics from the evaluation.
link_info = plinder_system.linked_structures
link_info[
    [
        "id",
        "pocket_fident",
        "lddt",
        "bb_lddt",
        "lddt_lp_ave",
        "lddt_pli_ave",
        "bisy_rmsd_ave",
        "sort_score",
        "kind",
    ]
]
| | id | pocket_fident | lddt | bb_lddt | lddt_lp_ave | lddt_pli_ave | bisy_rmsd_ave | sort_score | kind |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 4uou_B | 100.0 | 0.972682 | 0.994065 | 0.987813 | 0.989777 | 0.159702 | 2.40 | apo |
| 1 | 4uou_C | 100.0 | 0.973562 | 0.994687 | 0.967287 | 0.951068 | 0.194233 | 2.40 | apo |
| 2 | 4uou_D | 100.0 | 0.973604 | 0.994235 | 0.972579 | 0.973048 | 0.101252 | 2.40 | apo |
| 3 | 4uou_A | 100.0 | 0.967257 | 0.994800 | 0.976908 | 0.963504 | 0.214243 | 2.40 | apo |
| 4 | Q4WW81_A | 100.0 | 0.982275 | 0.998587 | 0.999679 | 0.997273 | 0.126228 | 98.57 | pred |
For example, here we can see that “4uou_B”:
- has 100% identical residues at the positions corresponding to the pocket of the system (pocket_fident),
- has very high lDDT and backbone lDDT scores, indicating that the structure is very similar to the receptor,
- has a sort_score of 2.4. The sort_score is the resolution for an apo structure and the pLDDT score for a predicted structure.
The superposition and transplant results tell the same story:
- a global superposition puts the ligand in the right place (seen by the low bisy_rmsd_ave of the ligand pose),
- the distances between the pocket atoms are similar (seen by the lddt_lp_ave metric),
- the distances between the ligand and protein atoms are similar (seen by the lddt_pli_ave metric).
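These metrics can also be used to choose among several linked structures. The sketch below picks the link of a given kind with the lowest bisy_rmsd_ave. It works on plain dictionaries so that it runs without PLINDER data: best_link is a hypothetical helper, and the records are copied from the table above. With the real DataFrame, link_info.to_dict("records") produces a list of this shape.

```python
def best_link(records, kind, max_rmsd=None):
    """Return the linked structure of one kind with the lowest bisy_rmsd_ave.

    Returns None when no record of that kind passes the optional cutoff.
    """
    candidates = [r for r in records if r["kind"] == kind]
    if max_rmsd is not None:
        candidates = [r for r in candidates if r["bisy_rmsd_ave"] <= max_rmsd]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["bisy_rmsd_ave"])


# Values taken from the linked_structures table for 4agi__1__1.C__1.W.
records = [
    {"id": "4uou_B", "kind": "apo", "bisy_rmsd_ave": 0.159702},
    {"id": "4uou_C", "kind": "apo", "bisy_rmsd_ave": 0.194233},
    {"id": "4uou_D", "kind": "apo", "bisy_rmsd_ave": 0.101252},
    {"id": "4uou_A", "kind": "apo", "bisy_rmsd_ave": 0.214243},
    {"id": "Q4WW81_A", "kind": "pred", "bisy_rmsd_ave": 0.126228},
]

print(best_link(records, "apo")["id"])  # prints 4uou_D
```

Ranking by bisy_rmsd_ave alone is only one possible choice. Depending on the task, sort_score or the lDDT-based columns may be more appropriate.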
get_linked_structure then gives the file path to the found structure:
plinder_system.get_linked_structure("apo", "4uou_B")