Physical Model
Atom and Residue Data
The physical model in EGAD Library is built from the bottom up. Atomic parameter data must be set up before molecules can be built; small molecules are then built and combined into larger macromolecules such as proteins.
Atoms and Atom Types
Atom data is stored in a class called Atom, which holds each atom's coordinates, serial number, charge, name and type. The coordinates define the atom's location in 3D space. Serial numbers are not required to be unique, but they are useful for distinguishing atoms within a molecule. An atom's charge is used in some energy calculations. Atom names are arbitrary strings, usually corresponding to the names assigned in PDB files.
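To make the description of the Atom class concrete, here is a minimal C++ sketch of a struct holding the same kinds of data, plus the interatomic distance that pairwise energy terms are built on. `AtomSketch` and `Distance` are hypothetical names chosen for illustration; they are not part of EGAD Library, whose Atom class has its own interface.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the Atom class described above; member names
// are illustrative only and are not EGAD Library's actual API.
struct AtomSketch {
    std::array<double, 3> coord{0.0, 0.0, 0.0}; // position in 3D space
    int serial = 0;                             // need not be unique
    double charge = 0.0;                        // used by some energy terms
    std::string name;                           // e.g. "CA", as in PDB files
    unsigned int typeId = 0;                    // ID assigned by a type library
};

// Euclidean distance between two atoms, the basic quantity that
// pairwise energy terms are evaluated from.
double Distance(const AtomSketch& a, const AtomSketch& b) {
    double sum = 0.0;
    for (std::size_t k = 0; k < 3; ++k) {
        const double d = a.coord[k] - b.coord[k];
        sum += d * d;
    }
    return std::sqrt(sum);
}
```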
Atom types are used by energy functions to look up additional parameters for an atom. The set of atom types depends on the application and is defined in an input file called the Forcefield File. The types are stored in an atom type library, which reads data from a Forcefield File and assigns a unique ID to each atom type.
An example of where atom types are useful is the Lennard-Jones energy term. To calculate an energy, this term requires an equilibrium atomic radius for each atom type. This data is not stored in the Atom class, so the energy term uses the type ID to look it up from another source (the type library ensures that the IDs used by the Atom class and by the other data source are consistent).
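The type-ID indirection can be sketched as follows, under stated assumptions. `TypeTableSketch` hands out one consecutive ID per type name, and the Lennard-Jones parameters live in separate arrays indexed by that ID. All names here are hypothetical, and the 12-6 functional form and combining rules (summed radii, geometric-mean well depth) are assumptions for illustration, not necessarily what EGAD Library uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type library: one unique, consecutive ID per type name.
// Not EGAD Library's actual TypeLib interface.
class TypeTableSketch {
public:
    // Returns the existing ID for a name, or assigns the next free one.
    unsigned int Intern(const std::string& typeName) {
        auto it = ids_.find(typeName);
        if (it != ids_.end()) return it->second;
        const unsigned int id = static_cast<unsigned int>(names_.size());
        ids_.emplace(typeName, id);
        names_.push_back(typeName);
        return id;
    }
    std::size_t Size() const { return names_.size(); }
private:
    std::unordered_map<std::string, unsigned int> ids_;
    std::vector<std::string> names_;
};

// Per-type parameters stored outside the atom, indexed by type ID.
struct LjParamsSketch {
    std::vector<double> radius;  // equilibrium radius per type ID
    std::vector<double> epsilon; // well depth per type ID
};

// Assumed 12-6 form: E = eps * [(r0/r)^12 - 2 (r0/r)^6],
// with r0 = R_i + R_j and eps = sqrt(eps_i * eps_j).
double LjEnergy(const LjParamsSketch& p, unsigned int typeI,
                unsigned int typeJ, double r) {
    assert(r > 0.0);
    const double r0 = p.radius.at(typeI) + p.radius.at(typeJ);
    const double eps = std::sqrt(p.epsilon.at(typeI) * p.epsilon.at(typeJ));
    const double s6 = std::pow(r0 / r, 6);
    return eps * (s6 * s6 - 2.0 * s6);
}
```

Because the IDs are consecutive integers, the parameter lookup is a plain array index rather than a string comparison. In this form the energy reaches its minimum of `-eps` when the distance equals the summed radii.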
Example 2.1: Reading atom types from a Forcefield File
// Open an input stream for the Forcefield File
std::ifstream finForcefield; // defined in <fstream>
finForcefield.open("forcefield.txt");
// Read Forcefield File into an EGAD Library object
Egad::ForcefieldFile forcefield; // defined in "EGAD_ForcefieldFile.h"
forcefield.Load(finForcefield);
// Close input stream for Forcefield File
finForcefield.close();
// Create an atom type library from the forcefield data
Egad::TypeLib atomTypes; // defined in "EGAD_TypeLib.h"
atomTypes.LoadFromForcefield(forcefield);
Residues and Rotamer Libraries
Most atoms in a protein design calculation are not floating freely in solution; rather, they are bound to other atoms to form molecules. The word residue is often used to describe an amino acid molecule that is part of a protein. The library supports arbitrary molecules through its Molecule class, a container of atoms that also stores the connectivity between them. The Residue class is a child of Molecule that contains additional functions for easy access to backbone atoms.
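A minimal sketch of the Molecule/Residue relationship: a molecule stores atoms and bonds (as pairs of atom indices), and a residue subclass adds shortcuts to backbone atoms. The class names, the index-pair bond representation, the "not found" convention (returning `Size()`) and the use of the standard PDB backbone names N, CA, C and O are assumptions for illustration, not EGAD Library's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical atom record used only by this sketch.
struct AtomRecord {
    std::string name;
    double x = 0.0, y = 0.0, z = 0.0;
};

// Hypothetical molecule: a container of atoms plus connectivity.
class MoleculeSketch {
public:
    explicit MoleculeSketch(std::string name) : name_(std::move(name)) {}
    std::size_t AddAtom(const AtomRecord& atom) {
        atoms_.push_back(atom);
        return atoms_.size() - 1;
    }
    // Connectivity is stored as pairs of atom indices.
    void AddBond(std::size_t i, std::size_t j) {
        assert(i < atoms_.size() && j < atoms_.size());
        bonds_.emplace_back(i, j);
    }
    // Index of the first atom with this name, or Size() if absent.
    std::size_t FindAtom(const std::string& atomName) const {
        for (std::size_t i = 0; i < atoms_.size(); ++i)
            if (atoms_[i].name == atomName) return i;
        return atoms_.size();
    }
    std::size_t Size() const { return atoms_.size(); }
    std::size_t BondCount() const { return bonds_.size(); }
    const std::string& Name() const { return name_; }
protected:
    std::string name_;
    std::vector<AtomRecord> atoms_;
    std::vector<std::pair<std::size_t, std::size_t>> bonds_;
};

// A residue adds convenience accessors for backbone atoms.
class ResidueSketch : public MoleculeSketch {
public:
    using MoleculeSketch::MoleculeSketch;
    std::size_t N() const { return FindAtom("N"); }
    std::size_t CA() const { return FindAtom("CA"); }
    std::size_t C() const { return FindAtom("C"); }
    std::size_t O() const { return FindAtom("O"); }
};
```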
In order to build protein structures, EGAD Library requires a collection of prototype residue structures, referred to as a residue library. Usually this includes one example of each residue type that can exist in the protein. These prototypes are used, for example, when reading proteins from PDB file data. However, as discussed earlier, each residue type may adopt multiple rotameric conformations, so a single prototype per residue type does not by itself capture the conformations available during design.
Data for creating a residue library is contained in a resparam file. This file defines connectivity and torsional building data for prototype residues. A resparam file contains only one entry for each type of residue.
Example 2.2: Creating a residue library
// Continue from example 2.1
// Open an input stream for the resparam file
std::ifstream finResparam; // defined in <fstream>
finResparam.open("resparam.txt");
// Read resparam file into an EGAD Library object
Egad::ResparamFile_2 resparam; // defined in "EGAD_ResparamFile.h"
resparam.Load(finResparam);
// Close input stream for resparam file
finResparam.close();
// Create a residue library from the resparam file data
Egad::ResLib residueLibrary; // defined in "EGAD_ResLib.h"
residueLibrary.LoadFromResparam(resparam, atomTypes);
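The resparam file's "torsional building data" is not specified further here. A common way to use such data is to place each new atom from three already-placed atoms using a bond length, a bond angle and a torsion angle. The sketch below shows that placement with the NeRF (natural extension reference frame) construction. It assumes the building data takes this internal-coordinate form; `PlaceAtom` and the vector helpers are hypothetical names, not EGAD Library functions.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 Sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
double Dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
Vec3 Cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
Vec3 Normalize(const Vec3& a) {
    const double len = std::sqrt(Dot(a, a));
    assert(len > 0.0);
    return {a[0] / len, a[1] / len, a[2] / len};
}
double DegToRad(double deg) { return deg * std::acos(-1.0) / 180.0; }

// Places atom D from built atoms A, B, C so that |CD| = bond,
// angle(B, C, D) = angle and dihedral(A, B, C, D) = torsion (radians).
// A, B and C must not be collinear.
Vec3 PlaceAtom(const Vec3& a, const Vec3& b, const Vec3& c,
               double bond, double angle, double torsion) {
    const Vec3 bc = Normalize(Sub(c, b));
    const Vec3 n = Normalize(Cross(Sub(b, a), bc));
    const Vec3 m = Cross(n, bc); // unit length: n is perpendicular to bc
    const double dx = -bond * std::cos(angle);
    const double dy = bond * std::sin(angle) * std::cos(torsion);
    const double dz = bond * std::sin(angle) * std::sin(torsion);
    return {c[0] + dx * bc[0] + dy * m[0] + dz * n[0],
            c[1] + dx * bc[1] + dy * m[1] + dz * n[1],
            c[2] + dx * bc[2] + dy * m[2] + dz * n[2]};
}
```

With a torsion of 180 degrees the new atom lands trans to A across the B-C bond; with 0 degrees it lands cis.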
Residue libraries can be made into rotamer libraries by loading data from a rotamer file. Rotamer data consists of sets of chi angles, one angle (with its standard deviation) per rotatable bond in the residue. Rotamer libraries containing this data can be downloaded from the Dunbrack Lab in a format similar to that of EGAD Library rotamer files. When a rotamer file is read into an existing residue library, the single prototype structure for each residue type is expanded into a collection of prototype conformations (one per rotamer). These prototypes can then be used to define potential mutations for a protein.
Example 2.3: Making a residue library into a rotamer library
// Continue from example 2.2
// Open an input stream for the rotamer file
std::ifstream finRotamers; // defined in <fstream>
finRotamers.open("rotamers.txt");
// Read rotamer file into an EGAD Library object
Egad::RotamerFile rotamers; // defined in "EGAD_RotamerFile.h"
rotamers.Load(finRotamers);
// Close input stream for rotamer file
finRotamers.close();
// Expand the current residue library to contain rotamers
residueLibrary.LoadRotamers(rotamers);
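The expansion from one prototype per type to one conformation per rotamer can be sketched as below. Side-chain coordinates are omitted: each conformation only records the chi angles that would be applied to the prototype. The record layouts and function name are hypothetical, and keeping a single chi-less conformation for types with no rotamer entry (such as glycine) is an assumption about how such types are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical rotamer record: one chi angle (degrees) and standard
// deviation per rotatable side-chain bond.
struct RotamerSketch {
    std::vector<double> chi;
    std::vector<double> chiSd;
};

// Hypothetical conformation: a residue type name plus the chi angles
// that would be applied to the prototype's side chain.
struct ConformationSketch {
    std::string name;
    std::vector<double> chi;
};

// Expands each prototype into one conformation per rotamer. Types with no
// rotamer entry keep a single conformation with no chi angles (assumption).
std::vector<ConformationSketch> ExpandRotamers(
        const std::vector<std::string>& prototypes,
        const std::map<std::string, std::vector<RotamerSketch>>& rotamers) {
    std::vector<ConformationSketch> library;
    for (const std::string& name : prototypes) {
        auto it = rotamers.find(name);
        if (it == rotamers.end() || it->second.empty()) {
            library.push_back({name, {}});
            continue;
        }
        for (const RotamerSketch& r : it->second) {
            assert(r.chi.size() == r.chiSd.size());
            library.push_back({name, r.chi});
        }
    }
    return library;
}
```

All conformations of one type keep the type's name, which is what makes the name-based searches described next possible.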
Accessing a Rotamer Library
There are multiple ways to access residues contained within a rotamer library. The simplest means of access will be familiar to anyone who has worked with arrays in C:
Example 2.4: Array style access to rotamer library
// Continue from example 2.3
// Check how many residues exist in the library
unsigned int iSize = residueLibrary.Size();
// For each residue in the library, print its name
for (unsigned int i = 0; i < iSize; ++i){
    std::cout << "Residue number " << i << " is named " <<
        residueLibrary[i].Name() << std::endl;
}
Another method of accessing residues is to search for them by name. Each Molecule has an associated name (such as "ARG" for arginine). Every rotamer of arginine should share its name, "ARG":
Example 2.5: Searching for a residue type
// Continue from example 2.3
// Find the index of the first residue to match the name "ARG"
unsigned int iIndex = residueLibrary.FindResidue("ARG");
// The three letter codes used in this example correspond to the abbreviations
// for these residues listed in the resparam file.
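Because every rotamer of a residue type shares the type's name, a name search is naturally a linear scan, and collecting all matches yields every rotamer of that type. The helpers below sketch this over a plain list of names. They are hypothetical; in particular, returning `names.size()` when nothing matches is an assumed convention, and the value `FindResidue` returns for a missing name should be checked in the library's headers before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Index of the first entry with the given name, or names.size() if none
// matches (assumed "not found" convention for this sketch).
std::size_t FindFirst(const std::vector<std::string>& names,
                      const std::string& target) {
    for (std::size_t i = 0; i < names.size(); ++i)
        if (names[i] == target) return i;
    return names.size();
}

// Indices of every entry sharing the name, i.e. every rotamer of one type.
std::vector<std::size_t> FindAll(const std::vector<std::string>& names,
                                 const std::string& target) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < names.size(); ++i)
        if (names[i] == target) hits.push_back(i);
    return hits;
}
```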
Modifying the Rotamer Library
It may be necessary to modify the rotamer library after it has been loaded. The simplest operations are adding and deleting residues, but a more common need is to exclude some residues in the library from the design calculation. For example, one might wish to keep cysteine, glycine and proline residues in the library so that proteins containing them can be properly built from PDB files, while preventing any position from mutating to these amino acids.
It is possible to mark certain residues so that they will not be considered as mutations during a design calculation. The role of this behavior will become clearer when discussing proteins:
Example 2.6: Setting the mutation status for a residue type
// Continue from example 2.3
// We do not want to allow cysteine, proline and glycine mutations
residueLibrary.AllowMutation("CYS",false);
residueLibrary.AllowMutation("PRO",false);
residueLibrary.AllowMutation("GLY",false);
// The three letter codes used in this example correspond to the abbreviations
// for these residues listed in the resparam file.
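Conceptually, the mutation status can be thought of as a per-entry flag: excluded types stay in the library (so they can still be built from PDB files) but are filtered out when candidate mutations are enumerated. The sketch below illustrates that idea; `LibraryEntrySketch`, `AllowMutationSketch` and `MutationCandidates` are hypothetical stand-ins, not EGAD Library's implementation of `AllowMutation`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical library entry: a residue name plus a flag controlling
// whether design positions may mutate to it.
struct LibraryEntrySketch {
    std::string name;
    bool allowMutation;
};

// Sets the flag on every entry with the given name (all rotamers of a type).
void AllowMutationSketch(std::vector<LibraryEntrySketch>& lib,
                         const std::string& name, bool allow) {
    for (LibraryEntrySketch& e : lib)
        if (e.name == name) e.allowMutation = allow;
}

// Indices a design calculation would consider as candidate mutations.
std::vector<std::size_t> MutationCandidates(const std::vector<LibraryEntrySketch>& lib) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < lib.size(); ++i)
        if (lib[i].allowMutation) out.push_back(i);
    return out;
}
```

The entries themselves are never removed, which is the point of a flag over deletion: the library can still build every residue type it contains.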