back to Table of Contents
12. Ligands
12.1 Overview
Free-moving, static, or flexible ligands are supported by EGAD. An
arbitrary number of ligands may be considered simultaneously. Each
ligand that is being considered must have a LIGAND
entry in the inputfile:
LIGAND ligand_parameter_file
The format of ligand parameter files, as well as an explanation of how
to build these files, is described in detail in the Appendix:
Ligand parameter files.
Unless otherwise specified, a ligand is considered to be a VARIABLE_POSITION.
If LIGAND
entries are listed, it will be assumed that the ligands are moving
according to the ligamers (ligand-rotamers) in the files. ligands are
treated as if they are in chain l
(lowercase L 'l').
The first LIGAND
listed is 1l,
the second is 2l,
etc.
For example:
START
...
LIGAND
myligand_A
# default to 1l
LIGAND
myligand_B
# default to 2l
LIGAND
myligand_C
# default to 3l
...
VARIABLE_POSITIONS
1 R,Q,M.
5 all
END
Internally, myligand_A
is at position 1l, myligand_B
is at position 2l, and myligand_C
is at position 3l. By
default, all three of these are free to move, following the defined
ligamers in the respective parameter files.
Like sidechains, ligands can also be defined as being fixed to the
coordinates in the template pdb file, or the base coordinates in the
ligand parameter file. If a stationary ligand with the coordinates
listed in the TEMPLATE
file or the LIGAND
file is desired, define the ligand as being fixed in FIXED_POSITIONS.
For example:
START
...
LIGAND
myligand_A # default to 1l
LIGAND
myligand_B # default to 2l
LIGAND
myligand_C # default to 3l
...
VARIABLE_POSITIONS
1 R,Q,M.
5 all
6 L.
9 all
10 hydrophobic
END
FIXED_POSTIONS
1l
3l
END
Eventhough 2l
is not explicitly defined as a VARIABLE_POSITION,
myligand_B
is moving. In contrast, since 1l and 3l are
listed as FIXED_POSITIONS,
myligand_A
and myligand_C
are fixed.
If OTHER_RESIDUES none
is defined, then all the ligands are kept fixed, unless explicitly
listed as a VARIABLE_POSITION.
12.1.1
Multiple ligand choices
Ligands can also be listed as VARIABLE_POSITIONS
explicitly. As with sidechains, multiple choices are possible. As demonstrated below, this may be useful for
virtually screening combinatorial drug libraries.
If multiple ligand choices are defined for a given position, the
positions corresponding to ligands based at other positions should be
set to the
dummy residue U.
For example:
START
...
LIGAND
myligand_A # a default to 1l
LIGAND
myligand_B # b
default to 2l
LIGAND
myligand_C # q default to
3l; "c" belongs to
CYD
...
VARIABLE_POSITIONS
1l a,b,q.
2l U.
3l U.
...
END
In this example, either myligand_A,
myligand_B,
or myligand_C
are permitted, but not all three at once.
Consider the following:
...
VARIABLE_POSITIONS
1l a,q.
3l U.
...
END
In this case, myligand_B,
which corresponds to position 2l, is also
included, since 2l is not
set to U.
However, only either myligand_A
or myligand_C,
but not both at once, are permitted.
If multiple ligands are permitted at a given position, the choice of
reference state energy may be important, as it is for standard protein
design. If the reference state energy is not defined in the ligand
parameter file, the program will automatically calculate the energy for
the isolated ligand base coordinates.
For rigid ligands, this may be reasonable, but, as for protein
sidechains, some form of conformational averaging might be required for
flexible ligands. A possible approximation would be to use a minimized
structure for the ligand base coordinates.
12.2 Ligand binding protein design examples
The design and characterization of ligand-binding proteins is a topic
for several PhD. theses, and is actively being explored in our lab. See the Handel lab website.
The examples
described here are for demonstrating inputfile formats only, and have
not been experimentally characterized. If you are seriously interested
in a ligand-binding design project, you may wish to collaborate with
the lab.
Many variables such
as the positions allowed to change identity, positions allowed to
change rotamers, the fine-ness of the ligand rotamer and sidechain
rotamer space, the requirement of hydrogen-bonding
satisfaction, and
models of the unbound state need to be
explored more fully. One important test that needs to be done more
fully is
predicting the relative affinities of mutant proteins for their
ligands. This may lend insight into the limitations of the energy
function for
ligand binding design. However, the fact that the effect of mutations
on protein-protein complex affinities are
predicted well is promising.
In addition, EGAD is able to predict the relative affinities of
aspartate aminotransferase mutants for maleate inhibitor (Voorhies
et al in preparation). A
structural analysis of ligand-binding designs for several different
types of ligands in their natural receptor contexts may also be useful
for identifying problems, as it has for protein design (Pokala and Handel 2005).
Examples are shown in examples/ligands/
for maltose binding protein (MBP). The construction of the LIGAND files are discussed in the Appendix,
as well as in examples/energy_function/ligands/
.
12.2.1 Redesign of maltose binding protein
(MBP) for maltose
The first example is the redesign of MBP for its natural ligand,
maltose. First, the
sidechain neighbors of maltose are identified with JOBTYPE list_variable_positions
(examples/ligands/find_maltose_neighbors/maltose_neighbors.detailed_floating_positions_list).
These residues are allowed to change sequence. It may be necessary to
permit additional residues to change identity.
The different inputfiles in examples/ligands/design_maltose_bp/
examine the effects of floating the rotamer conformations of
non-ligand-contacting residues. The inclusion of OTHER_RESIDUES
neighbors permits the neighbors of
ligand-contacting residues to change conformation, but not sequence,
while leaving other positions fixed (design_maltose_bp.input).
Using OTHER_RESIDUES none keeps these
neighbors fixed (design_maltose_bp_fix_neighbors.input).
The sequences designed by these are similar, but distinct.
12.2.2 Redesign of maltose binding protein
for an alternative
ligand
The second example redesigns MBP so it binds a different small
molecule, bar (8-hydroxy-2-oxa-bicyclo[3.3.1]non-6-ene-3,5-dicarboxylic
acid. As
suggested by Hellinga and colleagues, a simple area to place a
ligand
is in the ligand-binding site of a natural binding protein . The ligand
of interest can be crudely docked into the binding site, using the ALIGN_STRUCTURES
functionality, aligning over ligand atoms. In this example, the
coordinates of bar bound to another protein are extracted and docked to
the maltose in MBP (design_bar_bp/align_bar_to_maltose.input).
Since both ligands have oxygen-containing six-member rings, the ligands
were aligned to overlay these atoms. The choice of alignment is
completely arbitrary. Another option may be to align the centroid of
the new ligand with the bound ligand. The docked coordinates are used
as a base for the rotations and translations, as discussed in the ligand
parameter file section.
The sidechain neighbors of maltose defined as described above are
redesigned to bind bar (design_bar_bp/design_bar_bp.input
and design_bar_bp/design_bar_bp_fix_neighbors.input).
12.2.3
Drug design - design of an N-substituted peptoid for binding src SH3
Small-molecule combinatorial libraries have a scaffold (akin to protein
backbone) and variable groups (akin to protein sidechains). The example
described here outlines a scheme for using EGAD for selecting an
optimal combination from such a library. This example is naive and
should be considered to be only a demonstration; some of the caveats
and thoughts on improvement are also discussed. The
files for this are all
in examples/ligands/drug_design/.
The first step is choosing a scaffold. In principle, any scaffold may
be used. This example uses N-substituted peptoids (Figure 12.2.3a). As discussed
by
Lim and
colleagues, these molecules have peptide-like linkages, but
since they are N-substituted, they are resistant to proteases (Nguyen
et al 1998).
The second step is the identification of a targeted binding site for
the
scaffold. A simple approach is to align the drug scaffold to the
scaffold of a natural or previously identified binding partner for the
target of interest. For
example, if one
wished to design an inhibitor of ATP-binding, a potential scaffold
should be aligned onto the adenine ring of the nucleotide-bound
structure of the target protein; potential variable groups would branch
off from the aligned (and, for the design calculation, fixed) scaffold.
Structures of protein-protein complexes may also be used. For example,
a patch of spacially contigious hotspot residues may be a fruitful
site for aligning a scaffold. Selection methods, such as phage display,
can be used to
select binding peptides, often resulting in tight binders (Livnah et al 1996).
Indeed, kits
for peptide display
are commercially available. Structures of selected peptides bound to
their targets suggest potential binding sites for small molecules that
are better suited as pharmaceuticals than peptides. It may also be
useful to try orientation variations of the aligned scaffold (Wollacott & Desjarlais 2001).
The example
discussed here uses the location of the central residues of the
RALPPLPRY peptide bound to the src
SH3 domain. The peptoid has five positions corresponding to residues
73-77 of the peptide. The proline at position 75 of the peptide
(position 3 of the peptoid) was kept fixed, resulting in four fully
variable positions.
The next step is to create ligand parameter files for each
of the candidate R groups. A pdb file must be created for each R group,
including the scaffold atoms required to build R-group atoms. These
coordinates must conform to the rules in the ligand parameter file section, with
unique atomnames, avoiding reserved names.
Please also read the ligand
parameter file section for information on converting a pdb file of
a ligand to a resparam section of a
ligand parameter file, and ligand_files/make_ligand_files/notes
for details particular to this example.
Depending on the scaffold, it may be desirable to assign scaffold atoms
to U
in each R-group ligand file, and then include an additional ligand file
for just the scaffold. This may be preferable for systems (such as
rings) in which branch points the sidechains grow from are not
identical in terms of atomtypes (Figure
12.2.3b). For other systems, such as the peptoid, it may be better
to
include the scaffold with the R-group.
After creating the resparam section of the ligand parameter file, the ligamers must be generated. The transform
matrix bringing each sidechain to the correct location for each
branch point must be identified. The ligamers are position-dependent
and should be marked as such, as described in the ligamer format description. For the
peptoid example, the transform matricies were determined for each
position in ligand_files/make_ligand_files/find_transform_matrix/
and summarized in transform_matricies.
Since four positions are being floated, and each R-group is permitted
at each position, a transform matrix must be found for each position.
In addition, each peptoid scaffold position also has a unique variable
dihedral (psi;
N1->CA1->C1->O1).
If the R-groups have rotatable bonds, these must be marked in the atomlines of the resparam section, and the
rotamers included in the ligamer lines.
The rotamers considered in the peptoid example were not optimized for
the scaffold, and naively used -180.0,
60.0, -60.0 (trans, gauche+, and gauche-) for
each rotatable bond. Varying these to values more consistent
with the scaffold may be useful, as would including more
rotamers. Since each position has its own transform matrix, each
rotamer must be attached to each transform matrix. For example, see the
ligand file for the valine peptoid analog (ligand_files/val.ligand).
The file ligand_files/prt.ligand
defines the central proline residue and the terminal capping groups.
The design was done with N_sub_peptide.SH3.design.input.
As discussed above, the dummy residuetype U must be
defined as the only choice for all the ligand positions, before the
actual variable positions are defined. Multistate optimization
is used to explicitly model the unbound state of the protein and
ligand. Single-state optimization may also be used, and is in fact used
to initialize the multistate optimization.
However, as discussed in the ligand parameter
file section, the automatically generated reference state energy
considers only the base coordinates. If the base structure is strained
for a particular choice, that R-group will be over-represented in the
solution. Multistate optimization avoids this problem, since the
conformation is optimized for each candidate solution. The optimal
solution is
shown in Figure 12.2.3c.
12.3 Ligand hydrogen-bonding satisfaction: UNSAT_HBOND_REFSTATE
Protein ligand binding sites satisfy the hydrogen-bonding groups on a
ligand. This is especially important for binding specificity. However,
hydrogen-bonding is usually less favorable than other types of
interactions, due to the unfavorable burial of polar groups. As
expected, designs that simply optimize the energy of the bound complex
tend to be hydrophobic. In order to drive the selection of sequences
that bind the ligand of interest specifically, unsatisfied
hydrogen-bonding atoms on a ligand incur an additional UNSAT_HBOND_REFSTATE (default -VDW_CUTOFF; -1000
kcal/mol) energy to the specificity reference state; this is equivalent
to a penalty for unsatisfied hydrogen-bonding atoms. This value may be
defined in the input file or forcefield file:
UNSAT_HBOND_REFSTATE
reference state energy of unsatisfied hbond-atom; default -1000
The optimal value of this parameter is yet to be determined.
12.4 Ligand implementation
Functions regarding ligands are mostly in ligand_stuff.cpp
and ligand_stuff.h. However, ligand-specific modifications are
scattered in a few other files. The numerous modifications in read_forcefield.cpp, input_stuff.cpp,
and readpdbfile.cpp deal with parsing
ligand-related inputs, and are documented in the source code.
Ligand-specific modifications in lookup_table.cpp
prune ligamers that do not interact with the backbone (ie: unbound) and
deal with the slightly different book-keeping that results from the use
of static structures for ligamers. Other than these minor
perturbations, the vast majority of functions in EGAD do not
distinguish between proteins and free-moving ligands.
If a LIGAND entry is present, the LIGANDFILENAME array of strings is created; each
string contains the filename of a LIGAND
parameter file. read_forcefield.cpp:
read_forcefield calls ligand_stuff.cpp:
read_ligand_data for each ligand. This function parses the
ligand files. read_forcefield.cpp: read_forcefield
attaches the ligand resparam and rotamerlib entries into the main
resparam and rotamerlib arrays. The resparam entry for a ligand is
marked with resparam->ligand_flag=1.
The rotamerlib entry has pdbATOM
*ligand_base_coords; if this is not NULL,
it implies that the ligamers are transform matricies that use these
coordinates as a base. In the rotamerlib->rotamer
array, there is a pdbATOM *ligand_coords
array; if this is not NULL, the
coordinates are used in lieu of transform matrix ligamers (also, rotamerlib->numChi=1 and rotamerlib->rotamer[n]->chi[1]
= n). All of the book-keeping required for building ligand
coordinates are handled by functions in ligand_stuff.cpp;
these functions are called by the appropriate functions in dihedral_cartesian.cpp, but should be otherwise
invisible to most other functions.
After returning to input_stuff.cpp: input_stuff,
ligand_stuff.cpp: initialize_stuff_for_ligands
is called. This function parses base coordinates from the template
structure, if they were included. It also sets up virtual backbone
atoms for each ligand. These virtual backbone atoms are required to
permit the rest of the program to treat ligands as if they were
ordinary sidechains. If ligamers are automatically generated, this
function also re-sets the transform matricies so that they are centered
at the centroid of the base coordinates.
back to Table of Contents