EGAD Library Manual

Energy Function

Solvation Energies

With the basic vaccuum potential set up, we can focus on the more complicated parts of the energy function. One of the hardest parts of protein design is accurately modeling the effects of water on protein structure. The environment in the core of a protein is distinct from that on the surface. Some amino acid sidechains strongly prefer the hydrophobic environment of the core to the hydrated protein surface. Another consequence of solvation is that Coulombic interactions are much stronger in the interior of a protein than on the exterior.

EGAD uses a continuum solvation model. This means that individual water molecules are not modeled. The protein is treated as a low dielectric object suspended in a continuous, high-dielectirc medium. This is because modeling the many conformations of water in the context of a design calculation is prohibitively expensive.

Pseudo-sidechain Approximation

In order to measure how solvent exposed a sidechain is at a particular position, we must know the identities of all the surrounding atoms. Knowing this, we could simply subtract off the surface area that was "covered up" by these surrounding atoms from the total surface area of the sidechain and obtain the surface area that must be exposed to solvent.

Of course, in protein design, the identities of some positions are unknown. One of the innovations that the original EGAD introduced was to use a pseudo-sidechain to approximate the volume an unknown position might occupy. This approximation allowed the original EGAD program to obtain impressive correlations with experimentally determined energy values.

While the concept of a pseudo-sidechain is simple, the methods used to determine the size of the sidechains is beyond the scope of this manual. For more information, see:

Solvent Accessible Surface Area

The hydrophobic effect is known to be an important determinant of protein structure. It is a term that describes the tendency of non-polar groups to clump together in solution to minimize unfavorable interactions with water. Because of this, the cores of proteins are known to be greasy, while the exterior is usually polar.

The hydrophobicity of a compound is often measured by the free energy of transfer from vaccuum into water (there are other ways to measure hydrophobicity). Regardless of measurement technique, it is observed that the transfer energy is proportional to the exposed surface area of the compound:

E=SUM(gamma*A)

Where the summation is over all atoms i in the compound, gamma is an empirical solvation factor for that atom type, and A is the solvent accessible surface area of that atom. Calculation of the accessible surface area of each atom is done by sampling discrete points on a sphere and testing for intersections with other atoms (or pseudo-sidechains, as the case may be). These sphere test points are defined in something called the SASA Points file.

Example 3.6: Loading SASA test points

// Continue from example 3.5

// Load the SASA Points file from disk
std::ifstream finSasaPoints("sasa.points");
SasaPointsFile sasaPoints;
sasaPoints.Load(finSasaPoints);
finSasaPoints.close();

The empirical solvation parameters are atom-specific and found in the Forcefield File. All of the relevant data for the SASA energy term is collected into a SASA parameters object:

Example 3.7: Loading SASA parameters

// Continue from example 3.6

// Load SASA parameters from points file and Forcefield File
SasaParams sasaParams;
sasaParams.LoadPoints(sasaPoints);
sasaParams.LoadFromForcefield(forcefield);

As mentioned earlier, the unknown positions in the protein are modeled using a pseudo-sidechain model. EGAD Library pseudo-sidechain models are very similar to Protein objects, but only offer a subset of the Protein interface necessary to build pseudo-sidechains. We have to create one for the SASA energy term to use:

Example 3.8: Creating a pseudo-sidechain model

// Continue from example 3.7

// Create a pseudo-sidechain model appropriate for SASA
SasaPseudoSidechainModel pseudoSC(&myProtein, &atomTypes);

Now that all the groundwork is laid, constructing the actual SASA energy term is trivial:

Example 3.9: Setting up the SASA energy term

// Continue from example 3.8

// Create the SASA energy term
SasaEnergyTerm energyS(&myProtein, &sasaParams, &pseudoSC);

Generalized Born

The other solvation effect that has been mentioned is the difference in the strength of Coulombic interactions between the protein core and surface. Basically, this problem boils down to determining what the effective dielectric is between two arbitrary atoms in the protein. The Poisson-Boltzmann (PB) continuum dielectric model is the gold standard for this form of calculation, but is far too slow to be usable in protein design. The family of generalized Born (GB) models are fast approximations to PB that give similar results.

The technique defines a Born raidus for each atom in the protein that is analogous to an ionic radius: the Born radius measures the average distance from the atom to the protein surface. The effective dielectric is scaled according to this Born radius between the values for the protein surface and core.

Generalized Born Energy

The epsilons are the dielectric constants for water and protein, r is the distance between the atoms, q is the charge on each atom, and a is the born radius of the atom.

The bulk of the calculation is in determining the Born radii of each atom in the protein. This involves an ugly equation (even uglier than the last one) and requires the VDW radius of the atoms under consideration. Like in the SASA term, a pseudo-sidechain model is required to estimate the effect of unknown positions. For more information on this ugly equation, and the GB model, see:

The ultimate effect of this model is to modulate the dielectric between atoms based on their relative burial in the protein. This is a more advanced treatment than simply setting the dielectric to a constant value in the Coulombic energy term, but is also (currently) the slowest part of the entire energy function.

Example 3.10: Setting up the Generalized Born energy term

// Continue from example 3.9

// Create a pseudo-sidechain model appropriate for G.B.
BornPseudoSidechainModel pseudoB(&myProtein, &atomTypes);

// Create the G.B. energy term
// vdwParameters was set up when creating the Lennard Jones energy 
// term in example 3.1
BornRadii energyB(&myProtein, &vdwParameters, &pseudoB);