SILCS-Covalent

Background

The SILCS-Covalent workflow is a comprehensive approach to covalent drug design that combines molecular simulations, fragment-based binding site identification, and machine learning models. This workflow addresses the challenges of designing covalent drugs by providing insights into the binding preferences and reactivity of potential covalent ligands. It utilizes SILCS FragMaps to identify reactive cysteine residues and select optimal warhead groups. Other covalent linkages are also supported. Machine learning models trained on activity data further aid in predicting the efficacy and selectivity of warheads. The workflow provides the efficiency and reliability in identifying potential covalent ligands and optimizing their non-covalent interactions. Additional details on SILCS-Covalent are available in reference [21].

Note

The SILCS-Covalent workflow is slightly different from the SILCS-MC Docking in Restrained Mode workflow. The SILCS-MC Docking in Restrained Mode workflow is designed to run the SILCS-MC Docking protocol with harmonic restraint on selected atoms of the ligand. The SILCS-Covalent workflow is designed to run the SILCS-MC docking without any restraints on the reactive atom of the ligand first and then run the analysis to identify the ligand conformations with reactive atoms located within a certain distance from the reactive atoms of the residues in the binding site. The SILCS-Covalent workflow allows the reactivity of the target residues and the warheads or ligands to be assessed, whereas the SILCS-MC Docking in Restrained Mode workflow only identifies energetically favored binding poses given that a covalent bond is formed between a target residue and the ligand.

SILCS-Covalent Using the SilcsBio CLI

  1. Launch SILCS-Covalent docking runs:

    To set up and run SILCS-Covalent docking from the command line interface, create a directory containing all the ligands to be evaluated. Each ligand can be stored as a separate SDF or Mol2 file. Alternatively, all the ligands can be combined into a single SDF file. With this information, enter the following command to set up and launch SILCS-Covalent docking runs:

${SILCSBIODIR}/silcs-covalent/1_run_silcs_covalent prot=<prot pdb>

Required parameters:

  • Path and name of protein PDB file:

    prot=<protein pdb file>
    

Optional parameters:

  • Mode of the SILCS-Covalent workflow:

    mode=< warhead or ligand >
    

    The mode of the SILCS-Covalent worflow can be set to either warhead or ligand. The default mode is warhead.

    • In ligand mode, the workflow will only run the SILCS-MC Docking protocol on the ligands provided to identify the most favorable binding poses of the ligands in the binding site around the linking residue(s). In the ligand mode, the user must provide the ligands in SD file or Mol2 format using either the ligdir or sdfile option.

    • In warhead mode, the workflow will run the SILCS-MC Docking protocol on the warheads provided to identify the most favorable binding poses of the warheads in the binding site around the linking residue.

      Additionally in warhead mode, the user may provide additional information on the experimental data of the warheads such as kGSH, the glutathione (GSH) reactivity rate constant. The experimental data can be used to train machine learning models to quantify the effectiveness of various warhead groups for covalent drug design for the given linking residue of your target protein. Currently, this feature is only available for the cysteine (CYS) residues as the linking residue and an ML model trained on the Petri’s Warhead Library [22] will be used to predict the activity class for the warheads specific to the linking cysteine residue.

    Please refer to reference [21] for more details.

  • Path and name of directory containing ligand SD or Mol2 files:

    ligdir=<ligand directory>
    

    If the ligdir option is used, only one molecule per file under ligdir will be processed for SILCS-MC docking. For an SD/SDF file containing multiple molecules, use sdfile=<path to sdfile> instead of the ligdir option.

  • Path and name of output directory for the SILCS-Covalent workflow:

    covaldir=<output directory default=4_covalent_warhead>
    
  • The residue name of the amino acid for covalent link with the ligand:

    linkresname=<residue name for covalent link; CYS/LYS/SER/THR; default=CYS>
    

    Currently, only cysteine (CYS), lysine (LYS), serine (SER), and threoine (THR) residues are supported. Asparagine (ASN) residues will be supported in the future.

  • The residue ID of the bound amino acid for covalent bonding with the ligand:

    linkresid=<residue ID for covalent link; default=NULL>
    

    By default, SILCS-MC docking of each warhead or ligand will be performed in proximity to each residue with residue names specified by linkresname or each cysteine of the target protein if the linkresname option is not used. If the linkresid option is specified, then the SILCS-MC docking will only be performed in proximity to the residue with the residue ID specified by linkresid will be targeted.

  • The residue ID(s) of amino acids disqualified from covalent bonding with the ligand:

    skipresid=<residue IDs to skip; e.g. "145,231"; default=NULL>
    

    Certain residues may be ommitted from consideration in the SILCS-MC docking by specifying their residue IDs with the skipresid option. This option can be useful in omitting docking calculations near cysteine residues involved in disulfide bonds.

  • Path and name of directory containing FragMaps:

    mapsdir=<location and name of directory containing FragMaps; default=maps>
    

    The supplied mapsdir should contain the protein side chain probability maps. The probability maps for each type of residue are generated by default during the silcs/2b_gen_maps and silcs/2c_fragmap steps in the latest version of the SilcsBio software. Probability maps for a specific residue ID can be generated by re-running the silcs/2b_gen_maps and silcs/2c_fragmap steps with the residmap option. Please refer to SILCS Simulations Using the CLI for more details.

  • Use GPU for the SILCS-MC Docking protocol:

    gpu=<true/false; default=true>
    

    When gpu=false and if the bundle option is set to true, the workflow will run the SILCS-MC Docking protocol in parallel using multiple processors. The number of processors can be specified using the nproc option. The default value for nproc is the number of processors available on the system.

    When gpu=true, the bundle=true option works differently. The bundle option will bundle nproc number of SILCS-MC Docking runs into a single job. The default value for nproc is the number of processors available on the system.

    bundle=<true/false; default=false>
    nproc=<# of processor used when bundle=true>
    
  • Translation distance for defining sampling spheres in six directions around the reactive atom of the linking residue in Å:

    translation=<translation distance for defining sampling spheres; default=3.5>
    
  • Path and name of the custom parameters file for the SILCS-MC Docking protocol:

    paramsfile=<custom params file>
    

    By default, the SILCS-MC Docking protocol in the SILCS-Covalent workflow uses the parameters file located in ${SILCSBIODIR}/templates/silcs-covalent/params-<warhead/ligand>.tmpl.

  1. Evaluate docked poses:

After the SILCS-Covalent docking runs are completed, the docked poses are evaluated to identify the most favorable binding poses of the ligands in the binding site around the linking residue.

To evaluate the docked poses, enter the following command:

${SILCSBIODIR}/silcs-covalent/2_calc_lgfe_probres prot=<prot pdb> covaldir=<name of output directory>

The above command will evaluate the poses based on three criteria:

  • eq : Best pose when sorted by LGFE score

  • rx : Best pose when sorted by ProbRes score with LGFE < 0 kcal/mol

  • rxeq : Best pose when sorted by LGFE score with a ProbRes score greater than the ProbRes cutoff value. This pose will have both a favorable LGFE score and potential for covalent bond formation.

Required parameters:

  • Path and name of protein PDB file:

    prot=<protein pdb file>
    
  • Path and name of output directory for the SILCS-Covalent workflow:

    covaldir=<output directory default=4_covalent_warhead>
    

Optional parameters:

  • Path and name of the file containing the list of warhead/ligand name and reactive atom IDs:

    whidfile=<file with id list for warhead/ligand reactive atom; default="Petri's Warhead Lib">
    

    The file specified by the whidfile option should contain two columns. In the first column, the warhead or ligand names are listed. In the second column, the reactive atom IDs of the warheads or ligands are listed:

    Each line should have two columns: <warhead_name/ligand_name> <reactive_atom ID>

    By default, the whidfile for Petri’s Warhead Library is used. The Petri’s Warhead Library whidfile can be found in ${SILCSBIODIR}/data/databases/petri_warhead.whid.

  • Distance cutoff for covalent bond definition in Å:

    dcutoff=<distance cutoff for covalent bond definition; default=2.0>
    

    The reactive atoms of the ligand and linking residue for a given docked pose are considered to be sufficiently close to form a covalent bond if the distance between the two atoms is within the cutoff distance defined by the dcutoff option.

  • ProbRes cutoff for filtereing poses unlikely to form covalent bonds:

    prcutoff=<cutoff value for ProbRes; default=0.025>
    

    The probability of the reactive atoms of the ligand and linking residue for a given docked pose to be within the cutoff distance defined by dcutoff is represented by ProbRes. Poses that do not have ProbRes larger than the cutoff defined by the prcutoff are filtered from further analysis.

  • Number of output poses:

    npose=<number of best scoring poses; default=1>
    

    By default, only three the best scoring, based on LGFE (eq), ProbRes (rx), and PRCutoff (rxeq), docked poses will be output. To output additional docked poses, use the npose option and specify the desired number of docked poses. E.g., if 5 poses are desired, using npose=5 will result in the 5 best scoring poses being output for each warhead or ligand and each basis.

  • Option to output docked poses in PDB format:

    pdb=<true/false; default=false>
    

    When pdb=true the docked poses will be output in PDB format in addition to the default SD file format.

  • Option to calculate 4DBA descriptors, and 4DBA and LGFE-4DBA scores:

    fourdba=<true/false; default=false>
    

    When fourdba=true the four 4DBA descriptors of each input ligand will be calculated and output into lgfe-4dba.csv. A high 4DBA score indicates that the ligand is bioavailable while a low 4DBA score indicates that the ligand is not bioavailable. For more information of 4DBA and LGFE-4DBA scores, please see 4D Bioavailability (4DBA) Calculation. The rdkit, tqdm, ipython and scipy packages must be installed for the 4DBA score calculation. Please refer to Python 3 Requirement for more information.

  • Submit job to queuing system:

    batch=<submit job to queuing system; true/false; default=false>
    
  • Path to the Python executable:

    python=<path to python executable; default=${python}>
    

    The default python path can be checked using the which python command.

Assessment of SILCS-Covalent Results in CLI

The silcs-covalent/2_calc_lgfe_probres command will generate a number of output files that can be used to identify the residues most ammenable to covalent bonding, identify the optimal warheads or ligands for a target protein or specific residues within a target protein, or optimize covalent inhibitors.

In both warhead and ligand modes, three poses in SD file format are output into <covaldir>/<reactive residue>/minconfpdb/ for each warhead or ligand, where <reactive residue> is the linking residue in proximity to which the warhead or ligand was docked. The three poses correspond to the most favorable poses based on LGFE (eq), ProbRes (rx), and PRCutoff (rxeq):

  • <ligand|warhead>.eq.sdf : Pose with the most favorable energy (lowest LGFE).

  • <ligand|warhead>.rx.sdf : Pose with the highest probability of ligand and linking residue reactive atoms being in proximity (highest ProbRes) with favorable energy (LGFE < 0 kcal/mol).

  • <ligand|warhead>.rxeq.sdf : Pose with the most favorable energy (lowest LGFE) with the probability of the ligand and linking residue reactive atoms being in proximity greater than a cutoff value (ProbRes > 0.025 by default). This pose will have both a favorable LGFE score and potential for forming a covalent bond with the reactive residue.

In addtion, for each reactive residue in proximity to which the warheads or ligands were docked, a summary of the LGFE and ProbRes of each of the three warhead or ligand poses (eq, rx, rxeq) will be output in <covaldir>/<reactive residue>.<ligand|warhead>.csv. Below are the contents of each column of <covaldir>/<reactive residue>.<ligand|warhead>.csv:

  • In the first column, the warhead or ligand names are listed.

  • In the second, third, and fourth columns, the LGFEs of the most favorable poses of the corresponding warhead or ligand in terms of LGFE (eq), ProbRes (rx), and PRCutoff (rxeq), respectively, are listed.

  • In the fifth, sixth, and seventh columns, the ProbRes values of the most favorable poses of the corresponding warhead or ligand in terms of LGFE (eq), ProbRes (rx), and PRCutoff (rxeq), respectively, are listed.

In warhead mode with cysteine residues specified as the linking residue and the Petri’s Warhead Library [22] specified as the warheads, in addition to the output results described above, predicted kGSH, the GSH reactivity rate constant, and activity classifications are output in <reactive cysteine>.rf-ml.predicted.csv, where <reactive cysteine> is the linking cysteine in proximity to which the warheads were docked. This machine learning based prediction of reactivity is not currently available for other residues beyond the default cysteine or other warhead libraries. If you wish to apply the machine learning algorithm to other residues and/or warhead libraries, please contact us at support@silcsbio.com. Below are the contents of each column of <reactive cysteine>.rf-ml.predicted.csv:

  • In the first column, the warhead names are listed.

  • In the second and third columns, the LGFEs of the most favorable poses of the corresponding warhead in terms of LGFE (eq) and ProbRes (rx), respectively, are listed.

  • In the fourth column the ProbRes value of the most favorable pose of the corresponding warhead in terms of ProbRes (rx) is listed.

  • In the fifth column, the predicted kGSH of the corresponding warhead for the linking cysteine is listed.

  • In the sixth column, the predicted activity class of the corresponding warhead is listed, where L indicates low activity and H indicates high activity.