{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Access system files with PlinderSystem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We provide files for all holo PLINDER systems with <6 protein and <6 ligand chains. These can be accessed with the `PlinderSystem` object which also does the work of downloading and extracting only the relevant files if they haven't been downloaded yet. All system files will be extracted into `~/.local/share/plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/systems`. The current default is `PLINDER_RELEASE=2024-06` and `PLINDER_ITERATION=v2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from plinder.core import PlinderSystem\n",
    "\n",
    "plinder_system = PlinderSystem(system_id=\"4agi__1__1.C__1.W\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ligand\n",
    "\n",
    "The ligands are provided in SDF format in `ligand_sdfs`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plinder_system.ligand_sdfs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the corresponding SMILES strings in `smiles`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plinder_system.smiles"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Receptor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The CIF/PDB files of the receptor are stored in `receptor_cif` and `receptor_pdb` and only contain the protein chains of the system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plinder_system.receptor_pdb, plinder_system.receptor_cif"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We recommend using the CIF file as PDB is an obsoleted format. However, if you must use the PDB file, an additional consideration is that the chains are renamed to single letters, which you can access with the `chain_mapping` attribute.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plinder_system.chain_mapping"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The FASTA file and sequences of the receptor are stored in `sequences_fasta` and `sequences` respectively. These are the canonical sequences of all protein chains in the system.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plinder_system.sequences_fasta, plinder_system.sequences"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linked structures\n",
    "\n",
    "Where possible, we've linked plinder systems to associated apo structures from the PDB and predicted structures from AFDB. This was done using the same kind of similarity searches used for system clustering except with strict restrictions on the sequence identity and coverage of linked structures.\n",
    "\n",
    "The `linked_structures` attribute is a pandas DataFrame with information on the links for a system which were both found and additionally scored for conformational difficulty. This additional scoring consists of superposing the found apo or predicted chain to the receptor of the system with global sequence-based alignment, transplanting the ligand to the found structure, and evaluating the resulting protein-ligand complex as though it were a predicted structure for the given system. So, the `linked_structures` DataFrame contains the similarity scores from the alignments as well as the metrics from the evaluation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "link_info = plinder_system.linked_structures"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "link_info[\n",
    "    [\n",
    "        \"id\",\n",
    "        \"pocket_fident\",\n",
    "        \"lddt\",\n",
    "        \"bb_lddt\",\n",
    "        \"lddt_lp_ave\",\n",
    "        \"lddt_pli_ave\",\n",
    "        \"bisy_rmsd_ave\",\n",
    "        \"sort_score\",\n",
    "        \"kind\",\n",
    "    ]\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, here we can see that \"4uou_B\"\n",
    "- has 100% identical residues corresponding to the pocket of the system\n",
    "- has a very high lDDT and backbone lDDT scores, indicating that the structure is very similar to the receptor.\n",
    "- has a `sort_score` of 2.4, which is the resolution for an apo structure and the plDDT score for a predicted structure.\n",
    "\n",
    "Indeed the superposition + transplant results show the same story\n",
    "- a global superposition puts the ligand in the right place (seen by the `bisy_rmsd` of the ligand pose),\n",
    "- the distances between the pocket atoms are similar (seen by the `lddt_lp_ave` metric),\n",
    "- and the distances between the ligand and protein atoms are similar (seen by the `lddt_pli_ave` metric).\n",
    "\n",
    "`get_linked_structure` then gives the file path to the found structure\n",
    "\n",
    "```python\n",
    "plinder_system.get_linked_structure(\"apo\", \"4uou_B\")\n",
    "```\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}