EGAD Library Manual

Minimizers

Deterministic Algorithms

The first class of minimizers we will consider are those that are deterministic. In other words, they are known to produce the same result every time they are run. This class of minimizer is also guaranteed to find the global minimum energy conformation. Of course, this convenience comes with a cost. Deterministic minimizers are often not tractable for very large design space.

Exhaustive Enumeration

The simplest minimizer is one that searches every possible sequence. This is, of course, also the weakest minimization algorithm and is useless for most protein design applications. Still, there are cases where an exhaustive search of a smaller space can be helpful.

The following example is useful because it illustrates both the setup and execution of a simple minimizer. All the minimizers in EGAD Library are derived from a base class that defines this common interface, and they can all be used in a similar fashion.

Example 4.1: Setting up the exhaustive search minimizer

// Continue from example 3.13

// We will be using the pre-calculated energy table from the 
// previous chapter.
ExhaustiveSearch exSearch(&energyTable);

// Run the minimization and save the convergence result
ERROR_TYPE convergence = exSearch.Minimize();

// Check if the minimizer converged
if (convergence != ETYPE_OK) {
  // We didn't converge, something went horribly wrong!
  // Should print an error message here ...
  return EXIT_FAILURE;
}

// We converged, so let's get our results.
std::vector<unsigned int> vecResult = exSearch.GetSequence();
double fEnergy = exSearch.GetEnergy();

The idea of convergence might seem strange here. An exhaustive search of all possible solutions, by definition, will find the lowest energy sequence. This means it should always converge. However, there are minimizers that do not always converge and something might have gone wrong during the search, so we should check for convergence before assuming we have the correct answer.

The answer, in this case, is an array of integer values corresponding to choices in the design space. If there are 10 floating positions, then there will be 10 integers, each indicating the optimal rotamer choice at that position. These can be passed back to the physical model in order to understand their physical meaning in the context of the protein structure.

Example 4.2: Printing a result structure

// Continue from example 4.1

// Create an output stream to write the structure to
std::ofstream foutResult("result.pdb");

// Use the Protein object to write the result structure
myProtein.WritePDB(foutResult, vecResult);
foutResult.close();

// Loop over the result and print the names of the amino acids to stdout
std::cout << "Printing solution:\t";
for (unsigned int i = 0; i < vecResult.size(); ++i){
  std::cout << myProtein.BuildPosition(myProtein.FloatToPosIndex(i),
                                       vecResult[i]);
  std::cout << "."
}
std::cout << std::endl;

Branch and Terminate

One step up from exhaustive enumeration is a common search algorithm called "branch and bound." The idea is that, if we know the best energy we have encountered, we can ignore any sequences that are obviously above that energy.

The protein conformation space is expressed as a tree, with each position being a level of the tree and each conformation being a branch from that level. A single conformation is represented by a path from the root of the tree to the tip of one branch.

Search Tree Diagram

In the diagram above, position 1 has 4 choices, 2 has 2 choices, and 3 has 3 choices. The blue path indicates choice 1,1,1. By using bounding expressions, a search algorithm can determine if a partial sequence (red) has any chance of beating the current best energy sequence (blue). If it can't, the rest of that branch can be ignored (grey box).

The Mayo lab has established bounding criterion useful for protein design and dubbed their algorithm "branch and terminate." For most spaces, branch and terminate is faster than exhaustive search while still guaranteeing the best solution. However, for very small spaces, calculating the bounding criterion may be a higher overhead than running a simple exhaustive search.

Example 4.3: Setting up the branch and terminate minimizer

// Continue from example 3.13

// Create the minimizer object
BranchAndTerminate btSearch(&energyTable);

// Run the minimization and save the convergence result
ERROR_TYPE convergence = btSearch.Minimize();
std::vector<unsigned int> vecResult = btSearch.GetSequence();

For more information about branch and terminate:

Dead End Elimination

Possibly the most advanced and powerful deterministic algorithm available to protein design, Dead End Elimination (DEE) is able to address much larger spaces than either branch and terminate or exhaustive enumeration can. This increased power comes with a huge increase in complexity.

DEE uses logical rules to eliminate parts of the design space that cannot be part of the global minimum conformation. As an illustration, we will consider a simple elimination criterion called the "Goldstein" criterion.

Take two rotamers, r and t, at the same position, i. Consider r first. For each other position, find the highest energy rotamer that interacts with r. By summing these interactions together, we can find the worst-case energy that r can have, in any context. Now consider t. For each other position, find the lowest energy rotamer that interacts with t. We now have a best-case energy for t, in any context.

Goldstein Criterion Math

If the worst-case energy for r is better than the best-case energy for t, then r eliminates t from the design space. There is no context where t could be a better choice than r at position i. This rule alone is powerful, but there are many more rules that, when applied in sequence, can reduce the size of the space significantly.

More complex criterion have been described by both the Mayo and Hellinga labs. They are implemented in the library, but not described in detail here. For more information about them, see:

DEE rules are usually applied iteratively until the space converges (there is only one solution left), or there are no more productive eliminations to make. Setting up DEE involves wrapping an energy function in an object that allows eliminations to be made. A DEETable object can record the elimination status of rotamers and shrink the space appropriately.

Example 4.4: Setting up the DEE Elimination table

// Continue from example 3.13

// Wrap the precalculated energy table in an elimination table
DEETable elimTable(&energyTable);

// Create the DEE master cycle object
DEEMasterCycle master(&elimTable, 1, 5.0*log(10.0));

We passed the elimination table to a master cycle object. The master cycle is the DEE minimizer, but is currently useless because we have not specified what rules to apply and in what order. The other parameters in the constructor will make sense soon.

Important

The order that DEE rules are applied in is arbitrary. Different orders can have large effects on the speed of convergence. It is also possible for a DEE minimization to get stuck. If DEE is unable to converge the space, the best alternative is to take the eliminated space and use another minimizer on it.

Next, the DEE rules have to be set up. This isn't exactly rocket science:

Example 4.5: Setting up DEE Rules

// Continue from example 4.4

// Logical Singles
// This rule is used to check the elimination table for any "obvious"
// eliminations. It should be run after each rule to make sure the table
// makes sense.
DEELogicalSingles filterLS;

// Goldstein Singles
// This rule was just described earlier.
DEEGoldsteinSingles filterGS;

// Split Singles
// This is an advanced rule described in the Looger paper above.
DEESplitSingles filterSS;

There are (many) other rules, but for the sake of simplicity, we'll only use these three. Now the master cycle must be instructed how to use them:

Example 4.6: Adding rules to a master cycle

// Continue from example 4.5

// Before starting any cycle of minimization, make sure the table is
// logically consistent.
master.AddFilter(&filterLS);

// Run Goldstein first, followed by Split Singles
master.AddFilter(&filterGS);
master.AddFilter(&filterLS);
master.AddFilter(&filterSS);

Whenever we call the Minimize() function on the master cycle, it will run the Logical, Goldstein, Logical, and Split rules, in that order. The second parameter to the master cycle constructor (example 4.4) was the number of times to repeat the cycle per call to Minimize(). There are three possible convergence outcomes to a Minimize() call. The most likely is that the space has not converged yet. The second is that the space has converged to a single, unique solution.

The last possibility involves the third constructor parameter (example 4.4). This is a threshold value for the number of possible solutions left in the space. When the minimizer has eliminated enough of the space to pass this threshold, it is usually more efficient to switch to an exhaustive search algorithm to find the minimum solution. It makes sense to do this for threshold values where an exhaustive search would take a second or two, while successive rounds of DEE would take minutes or hours.

These three possibilities are checked via the return value of the Minimize() function:

Example 4.7: Minimizing using DEE

// Continue from example 4.6

// DEE might get stuck, so impose a maximum number of cycles
const unsigned int MAX_CYCLES = 10;
unsigned int iNumCycles = 0;

// ETYPE_NOT_FOUND means the space has not converged or passed
// the necessary threshold value.
ERROR_TYPE result = ETYPE_NOT_FOUND;

while (result == ETYPE_NOT_FOUND){
    // Perform a round of eliminations
    result = master.Minimize();

    // Check if we have exceeded our cycle count
    if (++iNumCycles > 10)
        break;
}

// Now we check the exit status of the minimization
std::vector<unsigned int> vecResult;
switch (result){
  case ETYPE_OK:   // Converged on a single solution (lucky!)
    vecResult = master.GetSequence();
    break;

  case ETYPE_OUT_OF_RANGE: // Threshold value passed
    // Use branch and terminate to search remaining space
    BranchAndTerminate branch(&elimTable);
    branch.Minimize();
    vecResult = branch.GetSequence();
    break;

  default:
    std::cerr << "DEE could not converge space!" << std::endl;
    break;
}

// Finally, we convert the sequence from the eliminated table 
// space to the space of the underlying table.
elimTable.ConvertChoices(&vecResult);

The last step deserves further consideration. What is the difference between the eliminated table and the original table? When a position is eliminated from the table, it disappears from the space. For example, if there were 4 rotamers at position 1, and one of them was eliminated, the elimination table would report only 3 rotamers at position 1.

Of course, if the space converges, there is only one rotamer at each position. In the ETYPE_OK case in example 4.7, the vector contains all zeros. In the ETYPE_OUT_OF_RANGE case, however, it contains indices of rotamers in the elimination table. We don't care about these indices; we want to know about the indicies in the original table we were working with. The final step converts the indicies to those.

Important

This feature is more important than it might seem. We've taken advantage of it here, already. After eliminating as much as DEE can, the elimination table can be used by another minimizer (branch and terminate) to search the remaining space. This can be used on larger spaces with minimizers like Monte-Carlo, as well. Always remember to convert the choices back to those of the original energy table!