SILCS-Covalent -------------- Background ^^^^^^^^^^ The SILCS-Covalent workflow is a comprehensive approach to covalent drug design that combines molecular simulations, fragment-based binding site identification, and machine learning models. This workflow addresses the challenges of designing covalent drugs by providing insights into the binding preferences and reactivity of potential covalent ligands. It utilizes SILCS FragMaps to identify reactive cysteine residues and select optimal warhead groups. Other covalent linkages are also supported. Machine learning models trained on activity data further aid in predicting the efficacy and selectivity of warheads. The workflow provides the efficiency and reliability in identifying potential covalent ligands and optimizing their non-covalent interactions. Additional details on SILCS-Covalent are available in reference :cite:`Yu:2023`. .. thumbnail:: images/covalent_overview.png :width: 100% .. NOTE:: The SILCS-Covalent workflow is slightly different from the :ref:`silcsmc_covalent_docking_mode` workflow. The :ref:`silcsmc_covalent_docking_mode` workflow is designed to run the SILCS-MC Docking protocol with harmonic restraint on the reactive atom of the ligand. The SILCS-Covalent workflow is designed to run the SILCS-MC docking without any restraints on the reactive atom of the ligand first and then run the analysis to identify the ligand conformations with reactive atoms located within a certain distance from the reactive atoms of the residues in the binding site. The SILCS-Covalent workflow allows the reactivity of the target residues and the warheads or ligands to be assessed, whereas the :ref:`silcsmc_covalent_docking_mode` workflow only identifies energetically favored binding poses given that a covalent bond is formed between a target residue and the ligand. SILCS-Covalent Using the SilcsBio CLI ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1. **Launch SILCS-Covalent docking runs:** To set up and run SILCS-Covalent docking from the command line interface, create a directory containing all the ligands to be evaluated. Each ligand can be stored as a separate SDF or Mol2 file. Alternatively, all the ligands can be combined into a single SDF file. With this information, enter the following command to set up and launch SILCS-Covalent docking runs: :: ${SILCSBIODIR}/silcs-covalent/1_run_silcs_covalent prot= *Required parameters:* - Path and name of protein PDB file: :: prot= *Optional parameters:* - Mode of the SILCS-Covalent workflow: :: mode=< warhead or ligand > The mode of the SILCS-Covalent worflow can be set to either ``warhead`` or ``ligand``. The default mode is ``warhead``. * In ``ligand`` mode, the workflow will only run the SILCS-MC Docking protocol on the ligands provided to identify the most favorable binding poses of the ligands in the binding site around the linking residue(s). In the ``ligand`` mode, the user must provide the ligands in SD file or Mol2 format using either the ``ligdir`` or ``sdfile`` option. * In ``warhead`` mode, the workflow will run the SILCS-MC Docking protocol on the warheads provided to identify the most favorable binding poses of the warheads in the binding site around the linking residue. Additionally in ``warhead`` mode, the user may provide additional information on the experimental data of the warheads such as kGSH, the glutathione (GSH) reactivity rate constant. The experimental data can be used to train machine learning models to quantify the effectiveness of various warhead groups for covalent drug design for the given linking residue of your target protein. Currently, this feature is only available for the cysteine (CYS) residues as the linking residue and an ML model trained on the Petri's Warhead Library :cite:`Petri:2020` will be used to predict the activity class for the warheads specific to the linking cysteine residue. Please refer to reference :cite:`Yu:2023` for more details. - Path and name of directory containing ligand SD or Mol2 files: :: ligdir= If the ``ligdir`` option is used, only one molecule per file under ``ligdir`` will be processed for SILCS-MC docking. For an SD/SDF file containing multiple molecules, use ``sdfile=`` instead of the ``ligdir`` option. - Path and name of output directory for the SILCS-Covalent workflow: :: covaldir= - The residue name of the amino acid for covalent link with the ligand: :: linkresname= Currently, only cysteine (CYS), lysine (LYS), serine (SER), and threoine (THR) residues are supported. Asparagine (ASN) residues will be supported in the future. - The residue ID of the bound amino acid for covalent bonding with the ligand: :: linkresid= By default, SILCS-MC docking of each warhead or ligand will be performed in proximity to each residue with residue names specified by ``linkresname`` or each cysteine of the target protein if the ``linkresname`` option is not used. If the ``linkresid`` option is specified, then the SILCS-MC docking will only be performed in proximity to the residue with the residue ID specified by ``linkresid`` will be targeted. - The residue ID(s) of amino acids disqualified from covalent bonding with the ligand: :: skipresid= Certain residues may be ommitted from consideration in the SILCS-MC docking by specifying their residue IDs with the ``skipresid`` option. This option can be useful in omitting docking calculations near cysteine residues involved in disulfide bonds. - Path and name of directory containing FragMaps: :: mapsdir= The supplied ``mapsdir`` should contain the protein side chain probability maps. The probability maps for each type of residue are generated by default during the ``silcs/2b_gen_maps`` and ``silcs/2c_fragmap`` steps in the latest version of the SilcsBio software. Probability maps for a specific residue ID can be generated by re-running the ``silcs/2b_gen_maps`` and ``silcs/2c_fragmap`` steps with the ``residmap`` option. Please refer to :ref:`SILCS_simulation_using_cli` for more details. - Use GPU for the SILCS-MC Docking protocol: :: gpu= When ``gpu=false`` and if the ``bundle`` option is set to true, the workflow will run the SILCS-MC Docking protocol in parallel using multiple processors. The number of processors can be specified using the ``nproc`` option. The default value for ``nproc`` is the number of processors available on the system. When ``gpu=true``, the ``bundle=true`` option works differently. The ``bundle`` option will bundle ``nproc`` number of SILCS-MC Docking runs into a single job. The default value for ``nproc`` is the number of processors available on the system. :: bundle= nproc=<# of processor used when bundle=true> - Translation distance for defining sampling spheres in six directions around the reactive atom of the linking residue in Å: :: translation= - Path and name of the custom parameters file for the SILCS-MC Docking protocol: :: paramsfile= By default, the SILCS-MC Docking protocol in the SILCS-Covalent workflow uses the parameters file located in ``${SILCSBIODIR}/templates/silcs-covalent/params-.tmpl``. 2. **Evaluate docked poses:** After the SILCS-Covalent docking runs are completed, the docked poses are evaluated to identify the most favorable binding poses of the ligands in the binding site around the linking residue. To evaluate the docked poses, enter the following command: :: ${SILCSBIODIR}/silcs-covalent/2_calc_lgfe_probres prot= covaldir= The above command will evaluate the poses based on three criteria: - ``eq`` : Best pose when sorted by LGFE score - ``rx`` : Best pose when sorted by ProbRes score with LGFE < 0 kcal/mol - ``rxeq`` : Best pose when sorted by LGFE score with a ProbRes score greater than the ProbRes cutoff value. This pose will have both a favorable LGFE score and potential for covalent bond formation. *Required parameters:* - Path and name of protein PDB file: :: prot= - Path and name of output directory for the SILCS-Covalent workflow: :: covaldir= *Optional parameters:* - Path and name of the file containing the list of warhead/ligand name and reactive atom IDs: :: whidfile= The file specified by the ``whidfile`` option should contain two columns. In the first column, the warhead or ligand names are listed. In the second column, the reactive atom IDs of the warheads or ligands are listed: Each line should have two columns: `` `` By default, the ``whidfile`` for Petri's Warhead Library is used. The Petri's Warhead Library ``whidfile`` can be found in ``${SILCSBIODIR}/data/databases/petri_warhead.whid``. - Distance cutoff for covalent bond definition in Å: :: dcutoff= The reactive atoms of the ligand and linking residue for a given docked pose are considered to be sufficiently close to form a covalent bond if the distance between the two atoms is within the cutoff distance defined by the ``dcutoff`` option. - ProbRes cutoff for filtereing poses unlikely to form covalent bonds: :: prcutoff= The probability of the reactive atoms of the ligand and linking residue for a given docked pose to be within the cutoff distance defined by ``dcutoff`` is represented by ProbRes. Poses that do not have ProbRes larger than the cutoff defined by the ``prcutoff`` are filtered from further analysis. - Number of output poses: :: npose= By default, only three the best scoring, based on LGFE (``eq``), ProbRes (``rx``), and PRCutoff (``rxeq``), docked poses will be output. To output additional docked poses, use the ``npose`` option and specify the desired number of docked poses. E.g., if 5 poses are desired, using ``npose=5`` will result in the 5 best scoring poses being output for each warhead or ligand and each basis. - Option to output docked poses in PDB format: :: pdb= When ``pdb=true`` the docked poses will be output in PDB format in addition to the default SD file format. - Option to calculate 4DBA descriptors, and 4DBA and LGFE-4DBA scores: :: fourdba= When ``fourdba=true`` the four 4DBA descriptors of each input ligand will be calculated and output into ``lgfe-4dba.csv``. A high 4DBA score indicates that the ligand is bioavailable while a low 4DBA score indicates that the ligand is not bioavailable. For more information of 4DBA and LGFE-4DBA scores, please see :ref:`4dba_calculation`. The ``rdkit``, ``tqdm``, ``ipython`` and ``scipy`` packages must be installed for the 4DBA score calculation. Please refer to :ref:`python3_requirement` for more information. - Submit job to queuing system: :: batch= - Path to the Python executable: :: python= The default python path can be checked using the ``which python`` command. Assessment of SILCS-Covalent Results in CLI ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ``silcs-covalent/2_calc_lgfe_probres`` command will generate a number of output files that can be used to identify the residues most ammenable to covalent bonding, identify the optimal warheads or ligands for a target protein or specific residues within a target protein, or optimize covalent inhibitors. In both ``warhead`` and ``ligand`` modes, three poses in SD file format are output into ``//minconfpdb/`` for each warhead or ligand, where ```` is the linking residue in proximity to which the warhead or ligand was docked. The three poses correspond to the most favorable poses based on LGFE (``eq``), ProbRes (``rx``), and PRCutoff (``rxeq``): - ``.eq.sdf`` : Pose with the most favorable energy (lowest LGFE). - ``.rx.sdf`` : Pose with the highest probability of ligand and linking residue reactive atoms being in proximity (highest ProbRes) with favorable energy (LGFE < 0 kcal/mol). - ``.rxeq.sdf`` : Pose with the most favorable energy (lowest LGFE) with the probability of the ligand and linking residue reactive atoms being in proximity greater than a cutoff value (ProbRes > 0.025 by default). This pose will have both a favorable LGFE score and potential for forming a covalent bond with the reactive residue. In addtion, for each reactive residue in proximity to which the warheads or ligands were docked, a summary of the LGFE and ProbRes of each of the three warhead or ligand poses (``eq``, ``rx``, ``rxeq``) will be output in ``/..csv``. Below are the contents of each column of ``/..csv``: - In the first column, the warhead or ligand names are listed. - In the second, third, and fourth columns, the LGFEs of the most favorable poses of the corresponding warhead or ligand in terms of LGFE (``eq``), ProbRes (``rx``), and PRCutoff (``rxeq``), respectively, are listed. - In the fifth, sixth, and seventh columns, the ProbRes values of the most favorable poses of the corresponding warhead or ligand in terms of LGFE (``eq``), ProbRes (``rx``), and PRCutoff (``rxeq``), respectively, are listed. In ``warhead`` mode with cysteine residues specified as the linking residue and the Petri's Warhead Library :cite:`Petri:2020` specified as the warheads, in addition to the output results described above, predicted kGSH, the GSH reactivity rate constant, and activity classifications are output in ``.rf-ml.predicted.csv``, where ```` is the linking cysteine in proximity to which the warheads were docked. This machine learning based prediction of reactivity is not currently available for other residues beyond the default cysteine or other warhead libraries. If you wish to apply the machine learning algorithm to other residues and/or warhead libraries, please contact us at support@silcsbio.com. Below are the contents of each column of ``.rf-ml.predicted.csv``: - In the first column, the warhead names are listed. - In the second and third columns, the LGFEs of the most favorable poses of the corresponding warhead in terms of LGFE (``eq``) and ProbRes (``rx``), respectively, are listed. - In the fourth column the ProbRes value of the most favorable pose of the corresponding warhead in terms of ProbRes (``rx``) is listed. - In the fifth column, the predicted kGSH of the corresponding warhead for the linking cysteine is listed. - In the sixth column, the predicted activity class of the corresponding warhead is listed, where ``L`` indicates low activity and ``H`` indicates high activity.