Minimizers
Stochastic Algorithms
The second class of minimizers we will consider are those that are stochastic. In other words, they involve random elements and are not guaranteed to produce the same result each time. Unlike deterministic minimizers, stochastic methods often scale well for large spaces and provide answers that are "good enough" if not perfectly correct.
Simulated Annealing (Monte Carlo)
Possibly the most well known stochastic minimization method is the Metropolis Monte Carlo (also known as Simulated Annealing). This algorithm is, essentially, a random walk through the energy landscape with a preference for taking downward steps. The procedure is simple:
- Take a step (make a mutation)
- Evaluate the energy of the new sequence
- Accept or deny mutation, based on energy difference
- Repeat
The decision to accept or deny the new, mutated sequence is based on the energy of the new sequence. If the energy is lower, the mutation is accepted unconditionally. If the energy is higher (unfavorable), the mutation has a Boltzmann weighted probability of being accepted:

Where kB is Boltzmann's constant, E2 and E1 are the energies of the new and old state, respectively, and T is temperature.
Temperature is a way of expressing how random the simulation should be. At the limit of infinite temperature, the Monte Carlo is walking completely randomly (all steps are accepted). At the limit of zero temperature, the Monte Carlo is walking only downhill.
The term "simulated annealing" refers to the fact that Monte Carlo simulations usually start with high temperature values and gradually lower them, analogous to annealing two strands of DNA. These temperature cycles are unique to each application. While the original EGAD had a single temperature cycle, EGAD Library allows the user control over the temperature parameter during the minimization.
Because this is a random process, the first thing we require is a random number generator. EGAD Library provides one. We prefer this generator over system-dependent ones so that results are consistent across multiple platforms.
Important
The EGAD Library generator is a pseudo-random number generator. It requires a seed to initialize its state. If you use the same seed for two Monte Carlo simulations, you will get the same result. This is useful to allow exact reproducibility, but it is important to be careful about recording your seed values and not reusing them accidentally.
Example 4.8: Initializing the random number generator
// Continue from example 3.13 // Our seed value will be 42, though a much larger integer is // recommended for any real application RandNum random(42);
Setting up the Monte Carlo minimizer involves creating an object for each of the steps above and threading them together in a master cycle object, like we did before for DEE. It will be much easier this time, because the order of the objects is obvious and their role is easy to interpret.
The first step is to introduce a random mutation. We do this using a Mutagen object. There are different types of mutagens that differ in how they introduce mutations. The one we want will introduce exactly one random mutation at a random position in the sequence each time it is called upon.
Example 4.9: Initializing the ForceMutagen
// Continue from example 4.8 // The mutagen requires the energy table to determine how many // choices are available at each position in the sequence. ForceMutagen mutagen(&energyTable, &random); // Because mutagens are shared between different minimizers, we // wrap our mutagen of choice inside the Monte Carlo mutator. MCMutator mutator(&mutagen);
The next step is to evaluate the mutation. This is done by using the energy function to calculate the energy of the new sequence and comparing it to the energy of the old sequence, as described above.
Example 4.10: Initializing the evaluator
// Continue from example 4.9 // The evaluator needs the energy table to look up the energy // of the new sequence. MCEvaluator evaluator(&energyTable, &random);
The rest is taken care of by the master cycle. Setting this up is going to look very familiar:
Example 4.11: Setting up the Monte Carlo cycle
// Continue from example 4.10 // The MCCycle controls the execution of each element MCCycle simulation; simulation.AddElement(&mutator); simulation.AddElement(&evaluator);
The actual minimization process can be as simple or as complicated as you choose to make it. Each call to Minimize() will execute one step of the Monte Carlo simulation. A step is one repeat of mutate, evaluate, accept or deny. There is no convergence for a Monte Carlo simulation, so the return value will always be ETYPE_UNKNOWN.
There are a few factors to consider. How high should the temperature start at? How quickly should it drop? How many steps should the minimization take?
We will set up a very basic temperature cycle. Starting with a completely random sequence, we will take 5,000 steps. The temperature will begin at 10,010 and drop 2 degrees each step. We want to be careful to design our cycle so that the temperature doesn't drop to or below zero!
Example 4.12: Running a Monte Carlo simulation
// Continue from example 4.11
// Create a starting sequence with enough positions.
std::vector<unsigned int> vecStartSequence(energyTable.NumPos(),0);
// This is another type of mutagen (remember ForceMutagen from 4.8)
// that, instead of introducing one mutation, randomizes the entire
// sequence.
TotalMutagen randomize(&anp;energyTable, &random);
randomize.Mutate(vecStartSequence);
// Set the random sequence as the start of the simulation.
simulation.SetSequence(vecStartSequence);
// Start at a temperature of 10,010
double fCurrentTemp = 10010.0;
unsigned int iNumSteps = 0;
// Keep cycling until we've taken 5,000 steps
while (iNumSteps++ < 5000){
// Set the temperature the evaluator should use
evaluator.SetTemp(fCurrentTemp);
// Take a step
simulation.Minimize();
// Decrement the temperature by 2 degrees
fCurrentTemp -= 2.0;
}
This might seem like more control than you want, at this point. However, this level of control allows a large amount of flexibility without sacrificing function. For more "plug and play," the common Monte Carlo cycle used in the original EGAD is available as a header file in many client applications (reference?).
Heuristic Quench
One problem with Monte Carlo is its tendency to take upward steps, even if it has found a local minimum. The decreasing temperature is one way to counteract this, but often it is coupled with a second minimization that can quickly navigate to a local minimum. This is called a quench.
What is dubbed "heuristic quench" in EGAD Library is a minimizer that does the following:
- Randomly select a position in the sequence
- Iterate over all possible rotamers at that position and find the best one for that context
- Repeat until convergence
Convergence, in this case, is when all positions have been quenched in this fashion without a single mutation. This process will find what are called "first-order" minima. While this is good for finding a local minimum after a Monte Carlo simulation, it is not reliable for a complete minimization because it only accepts downward steps.
Example 4.13: Using the heuristic quench
// Continue from example 4.12 // Give the quench an energy table and random number generator. // The second parameter is a seed value that is currently ignored. HeuristicQuench quench(&energyTable, 0, &random); // Quench the Monte Carlo solution. quench.SetSequence(simulation.GetSequence()); quench.Minimize();
It turns out that a higher order version of this algorithm, called FASTER, has been used in protein design. This has not yet been implemented in EGAD Library, but is planned in the future:
- "Fast and accurate side-chain topology and energy refinement (FASTER) as a new method for protein structure optimization." Proteins 2002 48: 31-43.
Genetic Algorithm
Last, but not least, is the stochastic minimizer responsible for the "GA" in EGAD. Genetic algorithms attempt to find a minimum by mimicking the process of evolution. Members of an initial population compete for the privilege to mate and produce offspring for successive generations.
In protein design, the population members are sequences. Each position in the sequence is a gene, with the rotamer choice as an allele. Mating with another individual involves randomly swapping alleles to form two offspring. The probability of an individual mating (fitness) is proportional to its energy value.

In the previous diagram, sequence 1 is the most fit in the parent population. During mating, sequence 1 recombines twice: once with sequence 2 and once with sequence 4. The children of 1+2 have improved their fitness by swapping alleles, while those of 1+4 decreased in fitness. Parent sequence 3 has the lowest fitness and is least likely to recombine. In this case, it did not and its alleles are lost.
Random mutations are also introduced into the population (sounds like a job for a mutagen, right?). Each generation should keep the best alleles from the previous generation along with these mutations. For example, in the previous diagram, the dark brown allele at the first position has propagated to 3 out of 4 individuals in the child population.
Genetic algorithms have many parameters. How many generations to progress through? How often should mutations be made? What should the size of the population be? Once again, there are no universal answers, and EGAD Library allows you to make the choice for yourself.
The first step is to initialize our population. The initial population can come from any source: it could be random or the resut of multiple Monte Carlo simulations.
Example 4.14: Initializing the population
// Continue from example 4.8
// Use a TotalMutagen to create random sequences
TotalMutagen randomize(&anp;energyTable, &random);
// Create a population 100 random individuals
GAPopulation population;
for (unsigned int i = 0; i < 100; ++i){
std::vector<unsigned int> vecIndividual(energyTable.NumPos(), 0);
randomize.Mutate(vecIndividual);
population.AddIndividual(vecIndividual);
}
Next, we have to set up the elements of the genetic algorithm. There is an evaluator that calculates the fitness score of each population member, a recombinase that performs a round of mating based on these fitness scores, and a mutator that introduces random mutations into the population.
Example 4.15: Setting up the GA Elements
// Continue from example 4.14 // GAEvaluator calculates fitness scores using the energy table GAEvaluator evaluator(&energyTable); // GARecombinase mates individuals to form a new population. GARecombinase recombinase(&random); // We want a mutagen that introduces random mutations at a // specified frequency. In this case, we have set the mutation // rate to 10%. RandMutagen mutagen(&energyTable, &random, 0.1); // GAMutator applies a mutagen to each individual in the population GAMutator mutator(&mutagen);
Like in the Monte Carlo, the evaluator uses a Boltzmann probability to convert energy values to fitness scores. It has a temperature parameter that dictates how "random" the genetic algorithm will be. We do not intend to adjust the temperature during the genetic algorithm, so we have not bothered to specify it.
All that remains is to create a master cycle and specify the order of the elements. Then we are ready to evolve:
Example 4.16: Running a genetic algorithm
// Continue from example 4.15
// Create the master cycle and add elements
GACycle simulation(&population);
simulation.AddElement(&evaluator);
simulation.AddElement(&recombinase);
simulation.AddElement(&mutator);
// Evolve for 100 generations
for (unsigned int i = 0; i < 100; ++i){
simulation.Minimize();
}
Each time Minimize() is called, the three elements are executed in succession. The recombinse will replace the parent population with the generated child population.
Important
We did not include an evaluate step at the end of our GA cycle, so the child population we end up with will not have fitness scores calculated for them. The GetSequence/GetEnergy functions for the GA cycle only report the first individual in the population, not the best. It does not have that information. To get the best sequence, all of the individuals must be checked and the lowest energy sequence taken.
It is possible to iterate over the population to retrieve all of the individuals.
Example 4.17: Iterating over the individuals in a population
// Continue from example 4.16
// Print the energy values for each result individual
for (GAPopulation::const_iterator i = population.begin();
i != population.end(); ++i){
std::cout << "Energy value for individual " << i - population.begin()
<< " is " << energyTable.TotalEnergy((*i).vecGenes) << std::endl;
}