SILCS: Site Identification by Ligand Competitive Saturation

Background

The design of small molecules that bind with optimal specificity and affinity to their biological targets, typically proteins, is based on the idea of complementarity between the functional groups in a small molecule and the binding site of the target. Traditional approaches to ligand identification and optimization often employ the one binding site/one ligand approach. While such an approach might be straightforward to implement, it is limited by the resource intensive nature of screening and evaluating the affinity of large numbers of diverse molecules.

Functional group mapping approaches have emerged as an alternative, in which a series of maps for different classes of functional groups encompass the target surface to define the binding requirements of the target. Using these maps, medicinal chemists can focus their efforts on designing small molecules that best match the maps.

Site Identification by Ligand Competitive Saturation (SILCS) offers rigorous free energy evaluation of functional group affinity pattern for the entire 3D space in and around a protein [3]. The SILCS method yields functional group free energy maps, or FragMaps, which are precomputed and rapidly used in a variety of ways to facilitate ligand design. FragMaps are generated by molecular dynamics (MD) simulations that include protein flexibility and explicit solvent/solute representation, thus providing an accurate, detailed, and comprehensive collection of information that can be used qualitatively in database screening, fragment-based drug design and lead optimization.

In addition, in the context of biological therapeutics, the comprehensive nature of the FragMaps is of utility for excipient design as all possible binding sites of all possible excipients and buffers can be identified and quantified.

../_images/intro_silcs_fragmap.png

Fig. 1 Illustration of fragment density maps.

Fig. 1 illustrates FragMaps generation. Two MD simulations of a probe molecule (e.g., benzene) are performed in the presence of a protein and in aqueous solution (without protein), resulting in two occupancy maps, \(O^\mathrm{target}\) and \(O^\mathrm{bulk}\), respectively. The GFE (grid free energy) FragMaps are then derived by the following formula

\[\mathrm{GFE}^T_{x,y,z} = -RT \log \frac{O^\mathrm{target}_{x,y,z}}{\left< O^\mathrm{bulk} \right>}\]

By generating FragMaps using various small solutes representing different functional groups including benzene, propane, methanol, imidazole, formamide, acetaldehyde, methylammonium, acetate, and water, it is possible to map the functional group affinity pattern of a protein. FragMaps encompass the entire protein such that all FragMap types are in all regions. Thus, information on the affinity of all the different types of functional groups is available in and around the full 3D space of the protein or other target molecule.

FragMaps may be used in a qualitative fashion to facilitate ligand design by allowing the medicinal chemist to readily visualize regions where the ligand can be modified or functional groups added to improve affinity and specificity. As the FragMaps include protein flexibility, they indicate regions of the target protein that can “open” thereby identifying regions under the protein surface accessible for ligand binding. In addition to FragMaps, an “exclusion map” is generated based on regions of the system that are not sampled by water or probe molecules during the MD simulations. The exclusion map represents a strictly inaccessible surface.

Many proteins contain binding sites that are partially or totally inaccessible to the surrounding solvent environment that may require partial unfolding of the protein for ligand binding to occur. SILCS sampling of such deep or inaccessible pockets is facilitated by the use of a Grand Canonical Monte Carlo (GCMC) sampling technique in conjunction with MD simulations [4]. Thus, the SILCS technology is especially well-suited for targeting the deep and inaccessible binding sites found in targets like GPCRs and nuclear receptors.

Once a set of ligand-independent SILCS FragMaps are produced they may be rapidly used for various purposes that quantitatively rank ligand binding in a highly computationally efficient manner for multiple ligands WITHOUT recalculating the FragMaps. Applications of SILCS methodology include:

  • Binding site identification

    • Binding pocket searching with known ligands

    • Binding pocket identification via pharmacophore generation

    • Binding pocket identification via fragment screening

  • Database screening

    • SILCS 3D pharmacophore models

    • Ligand posing using available techniques (Catalyst, etc)

    • Ligand ranking

  • Fragment-based ligand design

    • Identification of fragment binding sites

    • Estimation of ligand affinity following fragment linking

    • Expansion of fragment types

  • Ligand optimization

    • Qualitative ligand optimization by FragMaps visualization

    • Quantitative evaluation of ligand atom contributions to binding

    • Quantitative estimation of relative ligand affinities

    • Quantitative estimation of large numbers of ligand chemical transformations

SILCS generates 3D maps of interaction patterns of functional group with your target molecule, called FragMaps. This unique approach simultaneously uses multiple small solutes with various functional groups in explicit solvent MD simulations that include target flexibility to yield 3D FragMaps that represent the delicate balance of target-functional group interactions, target desolvation, functional group desolvation and target flexibility. The FragMaps may then be used to both qualitatively direct ligand design and quantitatively to rapidly evaluate the relative affinities of large numbers of ligands following the computationally demanding precomputation of the FragMaps.

This chapter goes over the workflow of generating and visualizing FragMaps.

Running SILCS simulations from the SilcsBio GUI

FragMap generation using SILCS begins with a well-curated protein structure file (without ligand) in PDB file format. We recommend keeping only those protein chains that are necessary for the simulation, removing all unnecessary ligands, renaming non-standard residues, filling in missing atomic positions, and, if desired, modeling in missing loops. Please inquire if assistance with protein preparation is required.

SILCS simulation setup

Select New SILCS project from the landing page. This will take you to the new project setup page.

../_images/silcs-gui-new-project_marked_border.png

Enter a project name, select a remote server, and select a project location on the remote server. The project location is where the SILCS simulation will be performed. A typical SILCS simulation can produce output totaling in excess of 100 GB, so please select a folder with appropriate free space.

Next, select a protein PDB file. We recommend clean up PDB before using in SILCS simulation, such as only keeping the chains that are necessary for the simulation, removing any unnecessary ligands, renaming non-standard residues, and filling in the missing atomic positions.

If the interface detects missing heavy atoms, non-standard residue name, or non-contiguous residue numbering, it will inform the user and provide a button labeled as “Fix?”. If this button is clicked, a new PDB file with “_fixed” suffix will be created and used in the SILCS simulation.

Once all the information is entered correctly, press the “Setup” button at the bottom of the page. The interface will contact the remote server and perform the SILCS GCMC/MD setup process.

../_images/silcs-gui-setup_marked_border.png

This step automatically builds the topology of the simulation system, creating metal-protein bonds if metal ions are present, rotates side chain orientations to enhance sampling, and places probe molecules around the protein. This process may take 5-10 minutes depending on system size.

Note

The default setup should provide optimal setting for SILCS FragMap simulation. But, if you wish to change the number of simulations to run or run simulations with different number of CPUs, you can adjust this in the advanced setting option.

After confirming the setup is finished, please make sure all information entered is correct and the remote project location has ample storage space. SILCS GCMC/MD simulation can now be started by clicking the “Run SILCS Simulation” button.

../_images/silcs-gui-simulation_marked_border.png

This will submit SILCS jobs to the queue system. The progress of the jobs will be displayed on the same page. The status of the job is also shown next to the progress bar; “Q” in the queue, “R” for running, and “E” for finished.

Warning

If a simulation stops prematurely, “F” will display next to the progress bar and the restart button will show. When the job is restarted, the simulation will continue from the last cycle, but the progress bar may reset. This is because job recovery iterates over existing cycles to check if output was correctly produced. The completed cycles will be automatically skipped and the progress will update shortly to where the job had stopped.

FragMap generation

Once the SILCS jobs are finished, as indicated by all progress bars achieving 100%, the GUI can be used to create and visualize FragMaps. Click the “Prepare FragMap” button to proceed.

../_images/silcs-gui-prepare-fragmap_marked_border.png

If you wish to compare FragMaps from two different protein structures, then pre-align the input structures and use the aligned structure as the Reference PDB File to create FragMaps having the desired orientation.

../_images/silcs-gui-generate-fragmap_marked_border.png

Processing the GCMC-MD trajectories will take 10-20 minutes, with the GUI providing updates on progress from the server during the process. Once completed, the GUI will automatically copy the files from the server to the local computer and load them for visualization.

../_images/silcs-gui-visualize_border.png

Running SILCS simulations from the command line interface

FragMap generation using SILCS begins with a well-curated protein structure file (without ligand) in PDB file format. We recommend keeping only those protein chains that are necessary for the simulation, removing all unnecessary ligands, renaming non-standard residues, filling in missing atomic positions, and, if desired, modeling in missing loops. Please inquire if assistance with protein preparation is required.

SILCS simulation setup

${SILCSBIODIR}/silcs/1_setup_silcs_boxes prot=<Protein PDB>

Warning

The setup program internally uses the GROMACS utility pdb2gmx, which may have problems processing the protein PDB file. The most common reasons for errors at this stage involve mismatches between the expected residue name/atom names in the input PDB and those defined in the CHARMM force-field.

To fix this problem: Run the pdb2gmx command manually from within the 1_setup directory for a detailed error message. The setup script already prints out the exact commands you need to run to see this message.

cd 1_setup pdb2gmx –f <PROT PDB NAME>_4pdb2gmx.pdb –ff charmm36 –water tip3p –o test.pdb –p test.top -ter -merge all

Once all the errors in the input PDB are corrected, rerun the setup script.

Following completion of the setup, run 10 GCMC/MD jobs using the following script:

${SILCSBIODIR}/silcs/2a_run_gcmd prot=<Protein PDB>

This script will submit 10 jobs to the predefined queue. Each job runs out 100 ns of GCMC/MD with the protein and the solute molecules.

Once the simulations are complete, the 2a_run_gcmd/[1-10] directories will contain *.prod.125.rec.xtc trajectory files. If these files are not generated, then your simulations are either still running or stopped due to a problem. Look in the log files within these directories to diagnose problems.

FragMap generation

Once your simulations are done, generate the FragMaps using the following script.

${SILCSBIODIR}/silcs/2b_gen_maps prot=<Protein PDB>

This script will submit 10 single-core jobs. These will build occupancy maps spanning the simulation box for selected solute molecule atoms representing different functional groups. For a ~50K atom simulation system, this step takes approximately 10-20 minutes to complete. The FragMaps have a default spacing of 1 Å.

The next step is to combine the occupancy FragMaps generated from individual simulations and convert them into GFE FragMaps. Each voxel in a GFE FragMap contains a Grid Free Energy (GFE) value for the probe type that was used to create the corresponding occupancy FragMap.

${SILCSBIODIR}/silcs/2c_fragmap prot=<Protein PDB>

GFE FragMaps will be placed in the silcs_fragmap_<protein PDB>/maps directory. PyMOL and VMD scripts to loads these file will be placed in the silcs_fragmap_<protein PDB> directory.

<prot name>.benc.gfe.map    Aromatic map
<prot name>.prpc.gfe.map    Aliphatic map
<prot name>.aceo.gfe.map    Charged acceptor map
<prot name>.mamn.gfe.map    Charged donor map
<prot name>.forn.gfe.map    Polar nitrogen donor map (Formamide)
<prot name>.foro.gfe.map    Polar oxygen acceptor map (Formamide)
<prot name>.imin.gfe.map    Polar nitrogen acceptor map (Imidazole)
<prot name>.iminh.gfe.map   Polar nitrogen donor map (Imidazole)
<prot name>.aalo.gfe.map    Polar oxygen acceptor map (Acetaldehyde)
<prot name>.meoo.gfe.map    Polar oxygen acceptor map (Methanol)
<prot name>.apolar.gfe.map  Generic Apolar maps
<prot name>.hbdon.gfe.map   Generic donor maps
<prot name>.hbacc.gfe.map   Generic acceptor maps
<prot name>.excl.map        Exclusion map
<prot name>.ncla.map        Zero map

Cleanup

Raw trajectory files are large. To reduce file sizes, hydrogen atom positions can be deleted with the following:

${SILCSBIODIR}/silcs/2d_cleanup prot=<Protein PDB>

SILCS simulation setup with GPCR targets

You may use either a bare protein structure or your own pre-built protein/bilayer system with SILCS.

If you are beginning with a bare protein, SilcsBio provides a command line utility to embed transmembrane proteins, such as GPCRs, in a bilayer of 9:1 POPC/Cholesterol for subsequent SILCS simulations:

${SILCSBIODIR}/silcs-memb/1a_fit_protein_in_bilayer prot=<Protein PDB>

This command will align the first principal axis of the protein with the bilayer normal (Z-axis) and translate the protein center of the mass to the center of the bilayer (Z=0).

If the protein is already oriented as desired, orient_principal_axis=false will suppress automatic principal axis-based alignment.

${SILCSBIODIR}/silcs-memb/1a_fit_protein_in_bilayer prot=<Protein PDB> orient_principal_axis=false

By default the system size is set to 120 Å along the X and Y dimensions. To change the system size, use the bilayer_x_size and bilayer_y_size options.

${SILCSBIODIR}/silcs-memb/1a_fit_protein_in_bilayer prot=<Protein PDB> bilayer_x_size=<X dimension> bilayer_y_size=<Y dimension>

The resulting protein/bilayer system will be output in a PDB file with suffix _popc_chol. It is important to use a molecular visualization software at this stage to confirm correct orientation of the protein and size of the bilayer before using this output as the input in the next step.

Alternatively, you may use a protein/bilayer system that has been built using other software tools.

The following command will prepare the SILCS simulation box by adding water and probe molecules to the protein/bilayer system:

${SILCSBIODIR}/silcs-memb/1b_setup_silcs_with_prot_bilayer prot=<Protein/bilayer PDB>

SILCS GCMC/MD simulation can now be launched with the following command:

${SILCSBIODIR}/silcs-memb/2a_run_gcmd prot=<Protein/bilayer PDB>

The SILCS GCMC/MD will begin with a 6-step pre-equilibration to slowly relax the bilayer followed by 10 ns of relaxation of the protein into the bilayer prior to the production simulation.

Tip

GPCR simulation systems are typically large and can require significant storage space (> 200 GB) and simulation time (> 14 days with GPUs).

FragMap generation from the GCMC/MD simulation data follows the same protocol as for non-bilayer systems. Refer to previous instructions for $SILCSBIODIR/silcs/2b_gen_maps and $SILCSBIODIR/silcs/2c_fragmap.

If you encounter a problem, please contact support@silcsbio.com.

SILCS simulation setup halogen probes

Halogen atoms are useful functional group in exploring structure-chemical relationship during lead discovery and optimization process. Typically these halogen atoms are treated as non-polar groups, but recent discovery with halogen bond suggests halogen atoms warrants a special treatment.

CGenFF 2.3.0 allows halogen atoms to be treated with lonepairs giving them directionality in polar interaction. Such force field improvements are included in SILCS simulations from version 2020.1 and probing halogen-protein interaction can be acheived using a set of halogenated probes.

To include halogenated probes in your simulation, add halogen=true keyword in your command.

${SILCSBIODIR}/silcs/1_setup_silcs_boxes prot=<Protein PDB> halogen=true
${SILCSBIODIR}/silcs/2a_run_gcmd prot=<Protein PDB> halogen=true
${SILCSBIODIR}/silcs/2b_gen_maps prot=<Protein PDB> halogen=true
${SILCSBIODIR}/silcs/2c_fragmaps prot=<Protein PDB> halogen=true

This will create folders ends with _x (e.g., 1_setup_x, 2_run_gcmd, so on), signifies SILCS-Halogen simulation systems, opposed to standard SILCS simulations.

If you encounter a problem, please contact support@silcsbio.com.

Resuming stopped SILCS jobs

Situations such as exceeding workload manager (job queue) walltime limits can cause SILCS jobs to stop before running to completion. Resuming such jobs from where they stopped is straightforward. On the server where the job was running, go to the directory <Project Location>/2a_run_gcmd/i, where i is an integer from 1 to 10 and corresponds to the stopped job among the ten SILCS GCMC/MD jobs that were being run for that particular target. In that directory, directly use the file job_mc_md.cmd to submit the job to the workload manager, for example qsub job_mc_md.cmd for Sun Grid Engine (SGE) and sbatch job_mc_md.cmd for Slurm.

To ensure a successful restart, make sure to issue any required extra setup commands (for example, as listed in the “Extra setup” field under Settings and remote server configuration from the SilcsBio GUI Home screen) such as export LD_LIBRARY_PATH=<path/to/cuda/libraries> or module load <name of cuda module> before submitting the job.

Visualizing FragMaps

GFE FragMaps are created in the directory 2b_gen_maps. This folder contains the specific and generic FragMaps listed previously. FragMaps are defined using on non-hydrogen atoms. For certain solute types there are multiple FragMaps (imidazole, formamide). FragMaps for water oxygens (tipo) are included to identify water molecules that are difficult to displace. Generic FragMaps for apolar (benzene, propane), H-bond donor (neutral H-bond donors) and H-bond acceptor (neutral H-bond acceptors) probe types are included to simplify visualization.

The easiest way to visualize FragMaps is with the SilcsBio GUI. SilcsBio also provide plugins for the visualization softwares VMD and PyMOL:

FragMaps in VMD

Please ensure the SilcsBio VMD plugin is installed before continuing (see VMD plugin installation).

First, load the protein structure used for FragMap generation. Once the protein structure is loaded, select Extensions ‣ SilcsBio ‣ FragMap Tools to activate the plugin window.

../_images/vmd-fragmap-setup.png

Fig. 2 VMD FragMap Plugin

“Map folder” is the name of the folder that contains FragMap files, and is typically “2b_gen_maps”. “Map prefix” is the protein PDB name used when setting up and performing the SILCS simulations. If a protein PDB file was loaded in to VMD prior to opening the plugin window, the name of the PDB file will be taken as a default for this parameter. Once these two parameters are confirmed, click the “Visualize FragMap” button at the bottom of the window.

At this point, FragMaps will have been loaded in to the VMD Main window and a new tab will have appeared in the plugin window. This new tab allows quick enabling/disabling of FragMap display using checkboxes on the left hand side of the window. By default, GFE FragMaps are displayed at a contour level of -1.2 kcal/mol. This level can be changed by using either the +/- buttons or the slider.

../_images/vmd-fragmap-level.png

FragMaps in PyMOL

Please ensure the SilcsBio PyMol plugin is installed before continuing (see PyMOL plugin installation).

First, load the protein structure used for FragMap generation. Once the protein structure is loaded, select Plug ‣ FragMap Tools to activate the plugin window.

../_images/pymol-fragmap-setup.png

Fig. 3 PyMOL FragMap Plugin

“Map folder” is the name of the folder that contains FragMap files, and is typically “2b_gen_maps”. “Map prefix” is the protein PDB name used when setting up and performing the SILCS simulations. If a protein PDB file was loaded in to PyMol prior to opening the plugin window, the name of the PDB file will be taken as a default for this parameter. Once these two parameters are confirmed, click the “Visualize FragMap” button at the bottom of the window.

At this point, FragMaps will have been loaded in to the Pymol main window and a new tab will have appeared in the plugin window. This new tab allows quick enabling/disabling of FragMap display using checkboxes on the left hand side of the window. By default, GFE FragMaps are displayed at a contour level of -1.2 kcal/mol (with the exception of the Exclusion Map). This level can be changed by using either the +/- buttons or the slider.

../_images/pymol-fragmap-level.png