1. Introduction
1.1 Overview
EGAD (Egad! a Genetic Algorithm for protein Design) is primarily a program
for protein design on fixed backbone scaffolds. All the major
rotamer-optimization methods have been implemented in EGAD: genetic
algorithms (GA), Monte Carlo simulated annealing (MC), self-consistent
mean-field optimization (SCMF), fast and accurate side-chain topology and
energy refinement (FASTER), and dead-end elimination (DEE). EGAD can also
consider multiple structures simultaneously, for designing specific
binding proteins or locking proteins into specific conformational states.
In addition to natural protein residues, it can handle free-moving
ligands with or without rotatable bonds.
EGAD runs on a single processor, but it can also parallelize certain
jobs (such as scanning mutagenesis and multistate optimization) to
complete them more quickly.
The energy function used by EGAD can predict the effect of mutations on
folding stability and protein-protein complex affinity to within ~1 kcal/mol.
It includes standard forcefield terms, such as van der Waals (vdW),
coulombic electrostatics, and torsional energies, as well as the
generalized Born (GB) continuum electrostatics model and
surface-area-dependent energies for solvation (Pokala and Handel 2004;
Pokala and Handel 2005).
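For a sense of scale, an energy difference of ~1 kcal/mol can be converted into a ratio of equilibrium constants. The relation below is standard thermodynamics, not something taken from the EGAD source; the temperature of 298 K and the symbols K_mut and K_wt (folding or association equilibrium constants of the mutant and wild type) are assumptions made for illustration.

```latex
\Delta\Delta G = \Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{wt}}
              = -RT \ln\frac{K_{\mathrm{mut}}}{K_{\mathrm{wt}}},
\qquad
RT \approx \left(1.987\times10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}}\right)\left(298\ \mathrm{K}\right)
   \approx 0.59\ \mathrm{kcal/mol}

|\Delta\Delta G| = 1\ \mathrm{kcal/mol}
\;\Longrightarrow\;
\frac{K_{\mathrm{wt}}}{K_{\mathrm{mut}}} \ \text{or}\ \frac{K_{\mathrm{mut}}}{K_{\mathrm{wt}}}
= e^{1/0.59} \approx 5.4
```

Under these assumptions, an error of ~1 kcal/mol corresponds to roughly a five-fold uncertainty in the predicted stability or affinity constant.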
Some of the tasks EGAD can perform are:
- optimizing sequences while considering multiple structures, for the design of specific binding proteins, conformational switching, etc.
- predicting the effects of mutations on protein stability and protein-complex formation to within ~1 kcal/mol
- predicting the pKa values of ionizable groups in proteins
- generating tables that display the distribution of energetic interactions in protein structures
- minimizing protein structure energies by adjusting dihedral angles
For a more complete discussion of protein design, please see my thesis
and these reviews (Pokala and Handel 2001; Kortemme and Baker 2004;
Kuhlman and Baker 2004; Ventura and Serrano 2004; Pokala and Handel 2005).
Note, however, that many of the code implementation details have changed
between the thesis version of EGAD and the current version.
This documentation is for the stand-alone EGAD program. If you use this
program, please cite:
Pokala N, Handel TM. Energy
functions for protein design: adjustment with protein-protein
complex affinities, models for the unfolded state, and negative design
of solubility & specificity. J Mol Biol. 347(1): 203-227. (2005).
The EGAD program takes user-written input files. An EGAD input file is
like a script but, unlike a real scripting language, is very limited in
its syntax. Most of this manual deals with setting up input files for
performing different tasks. Each section includes sample inputs and
outputs, along with their interpretation. Example input files that may be
copied and modified are available in examples/.
At the end of each section, there is a description of the relevant
functions and data structures. Reading these descriptions should not
be necessary for using EGAD as an end-user. Source code files (.h or .cpp) that
are referred to are in source_code/.
Unless otherwise specified, angles are in degrees,
energies in kcal/mol, distances and coordinates in Å, and
temperatures
in K.
This program is distributed under the GNU General Public License (see http://www.gnu.org/copyleft/gpl.html
and COPYRIGHT_AND_LICENSE). It is provided as
is, with absolutely no warranty, expressed or implied.
1.2 Installation
Download, uncompress, and untar the EGAD.tar.gz
tarball:
mycomputer % gunzip EGAD.tar.gz
mycomputer % tar -xvf EGAD.tar
This should create a directory called EGAD/.
Within it are the source_code/, lib/, bin/, and examples/ directories.
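As a quick sanity check after unpacking, the following sketch verifies that the four directories named above are present. The function name check_egad_tree is hypothetical and is not part of the EGAD distribution.

```shell
# check_egad_tree DIR: verify that DIR contains the four subdirectories
# that the tarball is expected to create. Prints each missing directory
# and returns non-zero if at least one is absent.
check_egad_tree() {
    root=$1
    missing=0
    for sub in source_code lib bin examples; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub/"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run from the directory where the tarball was unpacked):
# check_egad_tree EGAD && echo "EGAD tree looks complete"
```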
Included are library and binary files for x86 Linux (compiled with gcc version 2.96). If the EGAD/bin/EGAD.exe executable does not work for
you, re-compilation may be necessary.
The default compiler is g++; if you do not
have it, modify the CC = g++ line in EGAD/source_code/Makefile and EGAD/source_code/DEE/Makefile. Testing and
development have been done using gcc version 2.96;
however, the program should be compatible with newer versions.
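The edit to the CC line can also be scripted. The sketch below assumes the line begins with CC and assigns g++, as quoted above; set_compiler is a hypothetical helper name, and clang++ is only an illustrative replacement compiler. It relies on the -i option of sed, which is not POSIX but is accepted in this form by both GNU and BSD sed.

```shell
# set_compiler COMPILER MAKEFILE...: rewrite the "CC = g++" line in each
# Makefile so that it names COMPILER instead, keeping the original file
# as MAKEFILE.bak. Lines other than the CC assignment are left untouched.
set_compiler() {
    compiler=$1
    shift
    for makefile in "$@"; do
        sed -i.bak "s|^CC *= *g++.*|CC = $compiler|" "$makefile"
    done
}

# Example (run from the directory that contains EGAD/):
# set_compiler clang++ EGAD/source_code/Makefile EGAD/source_code/DEE/Makefile
```

If the result does not compile, the untouched originals can be restored from the .bak copies.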
Compile the program:
mycomputer % cd EGAD/source_code
mycomputer % make clean all
This should create the EGAD.exe executable
in the EGAD/bin/ directory.
1.2.1 Compilation and execution issues
Some of the example files refer to directories or files whose paths
may not exist on your machine. This is especially the case for LOOKUP_TABLE_DIRECTORY
entries. Please modify these lines as needed.
Depending on the flavor of UNIX being used, there may be some
compilation or execution problems. EGAD assumes that the UNIX commands mv, rm, touch, cp, and cat are in /bin/
and that diff is in /usr/bin/.
Setting up symbolic links (aliases) for these commands may prevent
problems. For example, on Mac OS X, touch is in /usr/bin/, not /bin/.
The following will set up the symbolic link:
sudo ln -s /usr/bin/touch /bin/touch
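To find out in advance which of these commands are not where EGAD expects them, a small check along the following lines may help. This is a sketch, not part of EGAD: check_cmd is a hypothetical helper, and it only prints suggested ln -s commands without running them.

```shell
# check_cmd DIR NAME: print nothing if DIR/NAME is an executable file.
# Otherwise, print a suggested symlink command pointing at wherever NAME
# is found on PATH, or a note if it cannot be found at all.
check_cmd() {
    dir=$1
    name=$2
    if [ -x "$dir/$name" ]; then
        return 0
    fi
    actual=$(command -v "$name" 2>/dev/null || true)
    if [ -n "$actual" ]; then
        echo "sudo ln -s $actual $dir/$name"
    else
        echo "# $name not found on PATH"
    fi
    return 0
}

# Locations assumed by EGAD, as listed above.
for c in mv rm touch cp cat; do
    check_cmd /bin "$c"
done
check_cmd /usr/bin diff
```

Review each suggested line before running it, since creating links in system directories requires administrator rights.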
On Mac OS X, the following (or similar) compilation error may occur:
DeeTable.cpp:301: no matching function for call to `min(unsigned int&, size_t)'
If this happens, add the following line to source_code/DEE/DeeTable.cpp
after the #include <math.h> line near the top of the file:
#include "../moremath.h"
If you get a ranlib error such as:
ld: archive: ../lib//libEgadLib.a has no table of contents, add one with ranlib(1) (can't load from it)
create the library table of contents manually. This can be done by:
ranlib ../lib//libEgadLib.a
ranlib ../lib//libDee.a
make all
If you are running on a single-processor system that does not allow
ssh logins (the default on Mac OS X), include the following in the command line
for jobs that may include automated parallelization (scanning
mutagenesis or multistate optimization):
EGAD.exe blah.input parallel 1 cpus
This mimics a parallelized job with 1 CPU and
bypasses any ssh jobs that EGAD might otherwise launch.
Otherwise, set up passwordless ssh.
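If this form is needed often, it can be wrapped in a small shell function. The sketch below is an assumption-laden convenience, not part of EGAD: EGAD_EXE and run_egad_single_cpu are hypothetical names, and the default path should be adjusted to wherever EGAD.exe lives on your system.

```shell
# EGAD_EXE (hypothetical variable): path to the EGAD executable.
EGAD_EXE=${EGAD_EXE:-EGAD/bin/EGAD.exe}

# run_egad_single_cpu INPUTFILE: run EGAD on INPUTFILE as a mock
# parallel job with one CPU, using the "parallel 1 cpus" arguments
# described above so that no ssh jobs are launched.
run_egad_single_cpu() {
    "$EGAD_EXE" "$1" parallel 1 cpus
}

# Example:
# run_egad_single_cpu blah.input
```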
1.3 EGAD source code
Since EGAD contains some potentially useful C/C++ functions (g++ -Wall compliant; copyrighted under the GNU General Public
License),
the implementation details of some of the more important or major
functions in the source code are discussed in the relevant sections.
Reading these sections should not be necessary for using EGAD
as an end-user. These implementation descriptions are less complete
than one would like. The reader is urged to look at the relevant source
code in order to gain a better understanding, since there is no
substitute for actually following the code and reading the internal
documentation. The descriptions here are for the April 2005 version of
EGAD. There are admittedly some functions and data structures that could
have been defined or written in a more flexible or elegant manner. Some
of this inelegance is due to the fact that this program was initially
written in a "bottom-up" manner by an inexperienced amateur programmer.
The program has been re-written and re-organized several times in order
to make it more powerful, as well as (one hopes) aesthetically
pleasing. Unfortunately there are still vestigial remnants of these
"growing pains." Most of these deal with inputting user data and
template PDB files at the start of the run. For a few tasks that are
performed infrequently in the course of a run, the implementation may
not be optimal. However, since these tasks are performed
infrequently, it was not deemed necessary to improve them. In contrast,
the code for highly repetitive tasks, such as rotamer optimization and
lookup table generation/access, is much cleaner and better optimized.
Linking to the library requires the following header:
#include "/blah/EGAD/source_code/egad.h"
Compile with:
mycomputer % g++ myprogram.cpp -lm -L/blah/EGAD/lib/ -lEgadLib -o my_executable.exe
If you are interested in writing your own programs, or in making major
modifications, it is highly recommended that you examine the EGAD
library project.
This is a library of C++ functions for protein design, and has been
designed to be highly object-oriented. In addition, the source code
documentation is far more extensive.