back to Table of Contents

12. Ligands

12.1 Overview

Free-moving, static, or flexible ligands are supported by EGAD. An arbitrary number of ligands may be considered simultaneously. Each ligand that is being considered must have a LIGAND entry in the inputfile:
LIGAND ligand_parameter_file

The format of ligand parameter files, as well as an explanation of how to build these files, is described in detail in the Appendix: Ligand parameter files.

Unless otherwise specified, a ligand is considered to be a VARIABLE_POSITION. If LIGAND entries are listed, it will be assumed that the ligands are moving according to the ligamers (ligand-rotamers) in the files. ligands are treated as if they are in chain l (lowercase L 'l'). The first LIGAND listed is 1l, the second is 2l, etc. 
For example:
START
...
LIGAND myligand_A
   #   default to 1l
LIGAND myligand_B
   #   default to 2l
LIGAND myligand_C
   #   default to 3l
...
VARIABLE_POSITIONS
    1 R,Q,M.
    5 all
END

Internally, myligand_A is at position 1l, myligand_B is at position 2l, and myligand_C is at position 3l. By default, all three of these are free to move, following the defined ligamers in the respective parameter files.

Like sidechains, ligands can also be defined as being fixed to the coordinates in the template pdb file, or the base coordinates in the ligand parameter file. If a stationary ligand with the coordinates listed in the TEMPLATE file or the LIGAND file is desired, define the ligand as being fixed in FIXED_POSITIONS. For example:
START
...
LIGAND myligand_A   #   default to 1l
LIGAND myligand_B   #  
default to 2l
LIGAND myligand_C   #  
default to 3l
...
VARIABLE_POSITIONS
    1 R,Q,M.
    5 all
    6 L.
    9 all
    10 hydrophobic
END
FIXED_POSTIONS
    1l
    3l
END


Eventhough 2l is not explicitly defined as a VARIABLE_POSITIONmyligand_B is moving. In contrast, since 1l and 3l are listed as FIXED_POSITIONSmyligand_A and myligand_C are fixed.

If OTHER_RESIDUES none is defined, then all the ligands are kept fixed, unless explicitly listed as a VARIABLE_POSITION.

12.1.1    Multiple ligand choices

Ligands can also be listed as VARIABLE_POSITIONS explicitly. As with sidechains, multiple choices are possible. As demonstrated below, this may be useful for virtually screening combinatorial drug libraries.

If multiple ligand choices are defined for a given position, the positions corresponding to ligands based at other positions should be set to the
dummy residue U. For example:
START
...
LIGAND myligand_A # a   default to 1l
LIGAND myligand_B # b
   default to 2l
LIGAND myligand_C # q
  default to 3l; "c" belongs to CYD
...
VARIABLE_POSITIONS
    1l a,b,q.
    2l U.
    3l U.
...
END

In this example, either myligand_Amyligand_B, or myligand_C are permitted, but not all three at once.

Consider the following:
...
VARIABLE_POSITIONS
    1l a,q.
    3l U.
...
END

In this case, myligand_B, which corresponds to position 2l, is also included, since 2l is not set to U. However, only either myligand_A or myligand_C, but not both at once, are permitted.

If multiple ligands are permitted at a given position, the choice of reference state energy may be important, as it is for standard protein design. If the reference state energy is not defined in the ligand parameter file, the program will automatically calculate the energy for the isolated ligand base coordinates. For rigid ligands, this may be reasonable, but, as for protein sidechains, some form of conformational averaging might be required for flexible ligands. A possible approximation would be to use a minimized structure for the ligand base coordinates.

12.2 Ligand binding protein design examples

The design and characterization of ligand-binding proteins is a topic for several PhD. theses, and is actively being explored in our lab. See the Handel lab website. The examples described here are for demonstrating inputfile formats only, and have not been experimentally characterized. If you are seriously interested in a ligand-binding design project, you may wish to collaborate with the lab.

Many variables such as the positions allowed to change identity, positions allowed to change rotamers, the fine-ness of the ligand rotamer and sidechain rotamer space, the requirement of hydrogen-bonding satisfaction, and models of the unbound state need to be explored more fully. One important test that needs to be done more fully is predicting the relative affinities of mutant proteins for their ligands. This may lend insight into the limitations of the energy function for ligand binding design. However, the fact that the effect of mutations on protein-protein complex affinities are predicted well is promising. In addition, EGAD is able to predict the relative affinities of aspartate aminotransferase mutants for maleate inhibitor (Voorhies et al in preparation). A structural analysis of ligand-binding designs for several different types of ligands in their natural receptor contexts may also be useful for identifying problems, as it has for protein design (Pokala and Handel 2005).

Examples are shown in examples/ligands/ for maltose binding protein (MBP). The construction of the LIGAND files are discussed in the Appendix, as well as in examples/energy_function/ligands/ .

12.2.1 Redesign of maltose binding protein (MBP) for maltose

The first example is the redesign of MBP for its natural ligand, maltose. First, the sidechain neighbors of maltose are identified with JOBTYPE list_variable_positions (examples/ligands/find_maltose_neighbors/maltose_neighbors.detailed_floating_positions_list). These residues are allowed to change sequence. It may be necessary to permit additional residues to change identity.

The different inputfiles in examples/ligands/design_maltose_bp/ examine the effects of floating the rotamer conformations of non-ligand-contacting residues. The inclusion of OTHER_RESIDUES neighbors permits the neighbors of ligand-contacting residues to change conformation, but not sequence, while leaving other positions fixed (design_maltose_bp.input). Using OTHER_RESIDUES none keeps these neighbors fixed (design_maltose_bp_fix_neighbors.input). The sequences designed by these are similar, but distinct.

12.2.2 Redesign of maltose binding protein for an alternative ligand

The second example redesigns MBP so it binds a different small molecule, bar (8-hydroxy-2-oxa-bicyclo[3.3.1]non-6-ene-3,5-dicarboxylic acid. As suggested by Hellinga and colleagues, a simple area to place a ligand is in the ligand-binding site of a natural binding protein . The ligand of interest can be crudely docked into the binding site, using the ALIGN_STRUCTURES functionality, aligning over ligand atoms. In this example, the coordinates of bar bound to another protein are extracted and docked to the maltose in MBP (design_bar_bp/align_bar_to_maltose.input). Since both ligands have oxygen-containing six-member rings, the ligands were aligned to overlay these atoms. The choice of alignment is completely arbitrary. Another option may be to align the centroid of the new ligand with the bound ligand. The docked coordinates are used as a base for the rotations and translations, as discussed in the ligand parameter file section.

The sidechain neighbors of maltose defined as described above are redesigned to bind bar (design_bar_bp/design_bar_bp.input and design_bar_bp/design_bar_bp_fix_neighbors.input).

12.2.3   Drug design - design of an N-substituted peptoid for binding src SH3

Small-molecule combinatorial libraries have a scaffold (akin to protein backbone) and variable groups (akin to protein sidechains). The example described here outlines a scheme for using EGAD for selecting an optimal combination from such a library. This example is naive and should be considered to be only a demonstration; some of the caveats and thoughts on improvement are also discussed. The files for this are all in examples/ligands/drug_design/.

The first step is choosing a scaffold. In principle, any scaffold may be used. This example uses N-substituted peptoids (Figure 12.2.3a). As discussed by Lim and colleagues, these molecules have peptide-like linkages, but since they are N-substituted, they are resistant to proteases (Nguyen et al 1998).

The second step is the identification of a targeted binding site for the scaffold. A simple approach is to align the drug scaffold to the scaffold of a natural or previously identified binding partner for the target of interest. For example, if one wished to design an inhibitor of ATP-binding, a potential scaffold should be aligned onto the adenine ring of the nucleotide-bound structure of the target protein; potential variable groups would branch off from the aligned (and, for the design calculation, fixed) scaffold. Structures of protein-protein complexes may also be used. For example, a patch of spacially contigious hotspot residues may be a fruitful site for aligning a scaffold. Selection methods, such as phage display, can be used to select binding peptides, often resulting in tight binders (Livnah et al 1996). Indeed, kits for peptide display are commercially available. Structures of selected peptides bound to their targets suggest potential binding sites for small molecules that are better suited as pharmaceuticals than peptides. It may also be useful to try orientation variations of the aligned scaffold (Wollacott & Desjarlais 2001).

The example discussed here uses the location of the central residues of the RALPPLPRY peptide bound to the src SH3 domain. The peptoid has five positions corresponding to residues 73-77 of the peptide. The proline at position 75 of the peptide (position 3 of the peptoid) was kept fixed, resulting in four fully variable positions.

The next step is to create ligand parameter files for each of the candidate R groups. A pdb file must be created for each R group, including the scaffold atoms required to build R-group atoms. These coordinates must conform to the rules in the ligand parameter file section, with unique atomnames, avoiding reserved names. Please also read the ligand parameter file section for information on converting a pdb file of a ligand to a resparam section of a ligand parameter file, and ligand_files/make_ligand_files/notes for details particular to this example.

Depending on the scaffold, it may be desirable to assign scaffold atoms to U in each R-group ligand file, and then include an additional ligand file for just the scaffold. This may be preferable for systems (such as rings) in which branch points the sidechains grow from are not identical in terms of atomtypes (Figure 12.2.3b). For other systems, such as the peptoid, it may be better to include the scaffold with the R-group.

After creating the resparam section of the ligand parameter file, the ligamers must be generated. The transform matrix bringing each sidechain to the correct location for each branch point must be identified. The ligamers are position-dependent and should be marked as such, as described in the ligamer format description. For the peptoid example, the transform matricies were determined for each position in ligand_files/make_ligand_files/find_transform_matrix/ and summarized in transform_matricies. Since four positions are being floated, and each R-group is permitted at each position, a transform matrix must be found for each position. In addition, each peptoid scaffold position also has a unique variable dihedral (psi; N1->CA1->C1->O1). 

If the R-groups have rotatable bonds, these must be marked in the atomlines of the resparam section, and the rotamers included in the ligamer lines. The rotamers considered in the peptoid example were not optimized for the scaffold, and naively used -180.0, 60.0, -60.0 (trans, gauche+, and gauche-) for each rotatable bond. Varying these to values more consistent with the scaffold may be useful, as would including more rotamers. Since each position has its own transform matrix, each rotamer must be attached to each transform matrix. For example, see the ligand file for the valine peptoid analog (ligand_files/val.ligand). The file ligand_files/prt.ligand defines the central proline residue and the terminal capping groups.

The design was done with N_sub_peptide.SH3.design.input. As discussed above, the dummy residuetype U must be defined as the only choice for all the ligand positions, before the actual variable positions are defined.  Multistate optimization is used to explicitly model the unbound state of the protein and ligand. Single-state optimization may also be used, and is in fact used to initialize the multistate optimization. However, as discussed in the ligand parameter file section, the automatically generated reference state energy considers only the base coordinates. If the base structure is strained for a particular choice, that R-group will be over-represented in the solution. Multistate optimization avoids this problem, since the conformation is optimized for each candidate solution. The optimal solution is shown in Figure 12.2.3c.

12.3 Ligand hydrogen-bonding satisfaction: UNSAT_HBOND_REFSTATE

Protein ligand binding sites satisfy the hydrogen-bonding groups on a ligand. This is especially important for binding specificity. However, hydrogen-bonding is usually less favorable than other types of interactions, due to the unfavorable burial of polar groups. As expected, designs that simply optimize the energy of the bound complex tend to be hydrophobic. In order to drive the selection of sequences that bind the ligand of interest specifically, unsatisfied hydrogen-bonding atoms on a ligand incur an additional UNSAT_HBOND_REFSTATE (default -VDW_CUTOFF; -1000 kcal/mol) energy to the specificity reference state; this is equivalent to a penalty for unsatisfied hydrogen-bonding atoms. This value may be defined in the input file or forcefield file:
UNSAT_HBOND_REFSTATE     reference state energy of unsatisfied hbond-atom; default -1000

The optimal value of this parameter is yet to be determined.

12.4 Ligand implementation

Functions regarding ligands are mostly in ligand_stuff.cpp and ligand_stuff.h. However, ligand-specific modifications are scattered in a few other files. The numerous modifications in read_forcefield.cpp, input_stuff.cpp, and readpdbfile.cpp deal with parsing ligand-related inputs, and are documented in the source code. Ligand-specific modifications in lookup_table.cpp prune ligamers that do not interact with the backbone (ie: unbound) and deal with the slightly different book-keeping that results from the use of static structures for ligamers. Other than these minor perturbations, the vast majority of functions in EGAD do not distinguish between proteins and free-moving ligands.

If a LIGAND entry is present, the LIGANDFILENAME array of strings is created; each string contains the filename of a LIGAND parameter file. read_forcefield.cpp: read_forcefield calls ligand_stuff.cpp: read_ligand_data for each ligand. This function parses the ligand files. read_forcefield.cpp: read_forcefield attaches the ligand resparam and rotamerlib entries into the main resparam and rotamerlib arrays. The resparam entry for a ligand is marked with resparam->ligand_flag=1.

The rotamerlib entry has pdbATOM *ligand_base_coords; if this is not NULL, it implies that the ligamers are transform matricies that use these coordinates as a base. In the rotamerlib->rotamer array, there is a pdbATOM *ligand_coords array; if this is not NULL, the coordinates are used in lieu of transform matrix ligamers (also, rotamerlib->numChi=1 and rotamerlib->rotamer[n]->chi[1] = n). All of the book-keeping required for building ligand coordinates are handled by functions in ligand_stuff.cpp; these functions are called by the appropriate functions in dihedral_cartesian.cpp, but should be otherwise invisible to most other functions.

After returning to input_stuff.cpp: input_stuff, ligand_stuff.cpp: initialize_stuff_for_ligands is called. This function parses base coordinates from the template structure, if they were included. It also sets up virtual backbone atoms for each ligand. These virtual backbone atoms are required to permit the rest of the program to treat ligands as if they were ordinary sidechains. If ligamers are automatically generated, this function also re-sets the transform matricies so that they are centered at the centroid of the base coordinates.

back to Table of Contents