1. Introduction

1.1 Overview

EGAD (Egad! a Genetic Algorithm for protein Design) is primarily a tool for protein design on fixed-backbone scaffolds. It implements all the major rotamer-optimization methods: genetic algorithms (GA), Monte Carlo simulated annealing (MC), self-consistent mean-field optimization (SCMF), fast and accurate side-chain topology and energy refinement (FASTER), and dead-end elimination (DEE). It can also consider multiple structures simultaneously, for designing specific binding proteins or locking proteins into specific conformational states. In addition to natural protein residues, EGAD can consider free-moving ligands with or without rotatable bonds. EGAD runs on a single processor, but it can also parallelize certain jobs to complete them more quickly.

The energy function used by EGAD can predict the effect of mutations on folding stability and protein-protein complex affinity to within ~1 kcal/mol. It includes standard forcefield terms, such as van der Waals (vdW), Coulombic electrostatics, and torsional energies, as well as the generalized Born (GB) continuum electrostatics model and surface-area-dependent energies for solvation (Pokala and Handel 2004; Pokala and Handel 2005).

Some of the tasks EGAD can perform are:
    - optimizing sequences while considering multiple structures, for the design of specific binding proteins, conformational switching, etc.
    - predicting the effects of mutations on protein stability and protein-complex formation to within ~1 kcal/mol
    - predicting the pKs of ionizable groups in proteins
    - generating tables that display the distribution of energetic interactions in protein structures
    - minimizing protein structure energies by adjusting dihedral angles

For a more complete discussion of protein design, please see my thesis and these other reviews (Pokala and Handel 2001; Kortemme and Baker 2004; Kuhlman and Baker 2004; Ventura and Serrano 2004; Pokala and Handel 2005). However, many of the code implementation details have changed between the thesis version of EGAD and the current version.

This documentation is for the stand-alone EGAD program. If you use this program, please cite:
Pokala N, Handel TM. Energy functions for protein design: adjustment with protein-protein complex affinities, models for the unfolded state, and negative design of solubility & specificity. J Mol Biol. 347(1): 203-227. (2005).

The EGAD program takes user-written input files. An EGAD input file resembles a script but, unlike a real scripting language, has a very limited syntax. Most of this manual deals with setting up input files for different tasks. Each section includes sample inputs and outputs along with their interpretation. Example input files, which may be copied and modified, are available in examples/.

At the end of each section, there is a description of the relevant functions and data structures. Reading these sections should not be necessary for using EGAD as an end-user. Source code files (.h or .cpp) that are referred to are in source_code/.

Unless otherwise specified, angles are in degrees, energies in kcal/mol, distances and coordinates in Å, and temperatures in K.

This program is distributed under the GNU General Public License (http://www.gnu.org/copyleft/gpl.html and COPYRIGHT_AND_LICENSE). It is provided as is, with absolutely no warranty, expressed or implied.

1.2 Installation

Download, uncompress, and untar the EGAD.tar.gz tarball:
mycomputer % gunzip EGAD.tar.gz
mycomputer % tar -xvf EGAD.tar


This should create a directory called EGAD/. Within it are the source_code/, lib/, bin/, and examples/ directories.
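
A quick way to confirm that the tarball unpacked completely is to check for the four directories named above. The following is a minimal sketch for a POSIX shell; the function name check_egad_layout is hypothetical and is not shipped with EGAD.

```shell
#!/bin/sh
# Hypothetical helper (not part of EGAD): confirm that the unpacked tree
# contains the four directories the manual lists.
# Usage: check_egad_layout EGAD_DIR
check_egad_layout() {
    root=$1
    status=0
    for sub in source_code lib bin examples; do
        # -d is true only if the path exists and is a directory
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub" >&2
            status=1
        fi
    done
    return $status
}

# Example, run from the directory where the tarball was unpacked:
#   check_egad_layout EGAD && echo "EGAD layout looks complete"
```

The function returns a non-zero status if any directory is absent, so it can be combined with && as in the commented example.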

Included are library and binary files for x86 Linux (compiled with gcc version 2.96). If the EGAD/bin/EGAD.exe executable does not work for you, recompilation may be necessary.

The default compiler is g++; if you do not have it, modify the CC = g++ line in EGAD/source_code/Makefile and EGAD/source_code/DEE/Makefile. Testing and development have been done using gcc version 2.96, but the program should be compatible with newer versions.

Compile the program:
mycomputer % cd EGAD/source_code
mycomputer % make clean all

This should create the EGAD.exe executable in the EGAD/bin/ directory.

1.2.1   Compilation and execution issues

Some of the example files refer to directories or files that have paths your machine may not have. This is especially the case for LOOKUP_TABLE_DIRECTORY entries. Please modify these lines as needed.
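
To see which example files need attention, the entries can be listed with grep. This is a minimal sketch; the function name find_lookup_table_entries is hypothetical, and it only reports matching lines. It does not assume anything about the input file syntax and leaves the editing to you.

```shell
#!/bin/sh
# Hypothetical helper (not part of EGAD): print every line under DIR that
# mentions LOOKUP_TABLE_DIRECTORY, prefixed with file name and line number,
# so the paths can be reviewed and edited by hand.
# Usage: find_lookup_table_entries DIR
find_lookup_table_entries() {
    # -r: search the directory recursively; -n: show line numbers
    grep -rn "LOOKUP_TABLE_DIRECTORY" "$1"
}

# Example, run from the directory that contains EGAD/:
#   find_lookup_table_entries EGAD/examples
```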

Depending on the flavor of UNIX being used, there may be some compilation or execution problems. EGAD assumes that the UNIX commands mv, rm, touch, cp, and cat are in /bin/ and that diff is in /usr/bin/. Setting up symbolic links for any commands that live elsewhere may prevent problems. For example, on Mac OS X, touch is in /usr/bin/, not /bin/. The following will set up the symbolic link:
    sudo ln -s /usr/bin/touch /bin/touch
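
Before creating any links, it may help to find out which of the expected commands are actually missing. The sketch below assumes a POSIX shell; the function name missing_in_dir is hypothetical and is not part of EGAD. It prints the full path of each command that is not present and executable in the expected directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of EGAD): report which commands are absent
# from the directory where EGAD expects to find them.
# Usage: missing_in_dir DIR CMD...
missing_in_dir() {
    dir=$1
    shift
    for cmd in "$@"; do
        # -x is true only if the file exists and is executable
        if [ ! -x "$dir/$cmd" ]; then
            echo "$dir/$cmd"
        fi
    done
}

# Locations taken from the paragraph above
missing_in_dir /bin mv rm touch cp cat
missing_in_dir /usr/bin diff
```

No output means every command is where EGAD expects it. Each path that is printed is a candidate for a symbolic link like the one shown above.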

On Mac OS X, the following (or similar) compilation error may occur:
    DeeTable.cpp:301: no matching function for call to `min(unsigned int&, size_t)'
If this happens, add the following line to source_code/DEE/DeeTable.cpp after the #include <math.h> line near the top of the file:
    #include "../moremath.h"

If you get a ranlib error such as:
ld: archive: ../lib//libEgadLib.a has no table of contents, add one with ranlib(1) (can't load from it)
create the library table of contents manually. This can be done by:
ranlib ../lib//libEgadLib.a
ranlib ../lib//libDee.a
make all

If you are running on a single-processor system that does not allow for ssh logins (Mac OS X by default), include the following in the command line for jobs that may include automated parallelization (scanning mutagenesis or multistate optimization):
EGAD.exe blah.input parallel 1 cpus
This mimics a parallelized job with 1 cpu, and bypasses any ssh jobs that may be launched by EGAD.
Otherwise, set up passwordless ssh.
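
One common way to set up passwordless ssh with OpenSSH is sketched below. This is an illustration under stated assumptions, not an EGAD-specific procedure: the function names, the key path, the ed25519 key type, and the host name localhost are all placeholders, and your site may require a different arrangement.

```shell
#!/bin/sh
# Sketch only: function names, key path, and host are placeholders,
# not EGAD settings.

# Create a key pair with an empty passphrase unless the key file exists.
# Usage: make_passwordless_key KEYFILE
make_passwordless_key() {
    keyfile=$1
    if [ ! -f "$keyfile" ]; then
        ssh-keygen -q -t ed25519 -N "" -f "$keyfile"
    fi
}

# Return 0 if HOST accepts a login without prompting for a password.
# BatchMode=yes makes ssh fail instead of asking interactively.
# Usage: can_login_without_password HOST
can_login_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Typical sequence, run by hand (left commented out here):
#   make_passwordless_key "$HOME/.ssh/id_ed25519"
#   ssh-copy-id -i "$HOME/.ssh/id_ed25519.pub" localhost
#   can_login_without_password localhost && echo "passwordless ssh works"
```

The ssh-copy-id step asks for the account password once; after that, the final check should succeed without any prompt.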

1.3 EGAD source code

Since EGAD contains some potentially useful C/C++ functions (g++ -Wall compliant; copyright under the GNU public license), the implementation details of the more important functions in the source code are discussed in the relevant sections. Reading these sections should not be necessary for using EGAD as an end-user. These implementation descriptions are less complete than one would like. The reader is urged to look at the relevant source code to gain a better understanding, since there is no substitute for following the code and reading the internal documentation. The descriptions here are for the April 2005 version of EGAD.

There are admittedly some functions and data structures that could have been defined or written in a more flexible or elegant manner. Some of this inelegance arises because the program was initially written in a "bottom-up" manner by an inexperienced amateur programmer. The program has been rewritten and reorganized several times to make it more powerful, as well as (one hopes) aesthetically pleasing. Unfortunately, there are still vestigial remnants of these "growing pains." Most of them deal with reading user data and template pdb files at the start of the run.

For a few tasks that are performed infrequently in the course of a run, the implementation might not be optimal; since these tasks run so rarely, it was not deemed necessary to improve them. In contrast, the code for highly repetitive tasks, such as rotamer optimization and lookup table generation/access, is much cleaner and better optimized.

Linking to the library requires the following header:
#include "/blah/EGAD/source_code/egad.h"
Compile with:
mycomputer % g++ myprogram.cpp -lm -L/blah/EGAD/lib/ -lEgadLib -o my_executable.exe

If you are interested in writing your own programs, or in making major modifications, it is highly recommended that you examine the EGAD library project. This is a library of C++ functions for protein design, and has been designed to be highly object-oriented. In addition, the source code documentation is far more extensive.
