back to Table of Contents

A.2 Energy function file formats

A.2.1 Forcefield file

A.2.1.1 General format

Comments

START
atomtype torsiontype
svdW evdw united_radius ASP # comment
...
pseudoatomtype   U        0.000   0.000 united_radius 0.000 # comment
...
END
constant_label   constant         # comment
...
file_label       filename         # comment
...
END

A.2.1.2 Atomtype line format

Between START and the first END, there needs to be a line for each atomtype alluded to in the resparam file.
For example (names from structure_types.h: ATOMPARAM):
CH2      CT       3.500    0.066 1.750 0.007 # sp3 methylene carbon
        "
CH2" = atomtype                 
        "CT" = torsiontype         (for torsion functions)
        "
3.500" = sigma              (van der waals sigma (svdW) )
        "
0.066" = welldepth      (van der waals well depth (evdW) )
        "1.750" = united_radius  (atom radius for surface area and born radii)
   
    "0.007" = asp                             (atomic solvation (water->octanol) parameter (ASP); for solubility design)

The van der Waals parameters are assumed to be in the OPLS-AA format. The united atom radii are for surface area and Born radii calculations . The atomic solvation parameters are based on empirical fitting of small molecule water/octanol transfer free energies.

atomtype must be unique. BHY, *BK, and PSE* are reserved for pseudoatoms (*=wildcard). U is reserved as an atomtype for zero radius, zero charge pseudo atoms, as well as a torsiontype for all pseudoatoms. CH3 is used for methyl carbons. HN is required for backbone amide protons. SH is required for CYS SG. The CT torsiontype is required for all sp3 carbons. ZZZ is reserved as an end flag.

The real atoms do not need to be in any particular order, except that atoms with the same torsiontype should be listed contiguously. Pseudoatoms must be listed contiguously after the real atoms. Spacing within a line is not important. Do not skip lines.

A.2.1.3 Constant line and file line formats

After END, forcefield constants and files are listed. For example:
VDW_ATTRACTIVE_FACTOR 2.0 # favorable vdW scale
TORSION_FILE torsion_opls # torsion params

These constants and files do not need to be listed in any particular order. Some of these can be defined in the inputfile as well; the inputfile values take precedence. The ROTAMER_LIBRARY filename may also be defined at the top of the RESPARAM_FILE (see resparam file formats below). Although the labels do not need to be in all-caps, it is recommended for the purposes of clarity. Do not skip lines.

These are the constants and files that may be defined in the forcefield file, along with their defaults. If a default is not listed, the value MUST be defined here or in the inputfile, if permitted.

VDW_RADII_SCALE                    1.0 (scale factor on vdw radius)
VDW_LINEAR_START_POINT            1000 (default does not include linearization; in JMB paper, optimal is 1.5)
VDW_CLASH_ENERGY                  2.75 kcal/mol (atompairs with EvdW
clash)
VDW_CUTOFF                        1000.0 (E≥VDW_CUTOFF are truncated to this)
VDW_ATTRACTIVE_FACTOR              1.0 (scale Evdw < 0)
VDW_REPULSIVE_FACTOR               1.0 (scale Evdw > 0)
Ep                                 0.0 (protein dielectric; must be defined here or in inputfile
FRACTION_HYDROPHOBIC_SASA_CUTOFF   1000.0 (for solubility filter)
TRANSFER_FREE_ENERGY_DENSITY_CUTOFF      1000.0 (for solubility filter)
CORESURF_MEA_SASA                  26.0     (core-surface def.)
CORESURF_MEA_BORN                  3.0      (core-surface def.)
GENERAL_ASP                        0.0 (general atomic solvation parameter)        
HYDROPHOBIC_ASP                    0.0 (atomic solvation parameter for hphob)
NONBOND_FACTOR_1_4                  (not in inputfile; scale vdw/coulomb for 1,4 energy)
WEIGHT_VDW                         1.0 (empirical weight on vdw energies)
WEIGHT_ELECTROSTATICS              1.0 (empirical weight on electrostatic energies)
WEIGHT_1_4                         1.0 (empirical weight on 1,4 energies)
UNSAT_HBOND_REFSTATE (-VDW_CUTOFF)       # reference state for unsatisfied ligand hbond groups
N_HBOND_UNSATISFIED_REFSTATE     (UNSAT_HBOND_REFSTATE) # ref.state for unsat bkbn N
H_HBOND_UNSATISFIED_REFSTATE     (UNSAT_HBOND_REFSTATE) # ref.state for unsat bkbn H
O_HBOND_UNSATISFIED_REFSTATE     (UNSAT_HBOND_REFSTATE) # ref.state for unsat bkbn O
OVERALL_ENERGY_SCALE               1.0 (empirical slope for stability prediction; JMB paper, optimal is 0.2 )
BORN_P1                                      (not in inputfile) (Born radii calculation const)
BORN_P2                                      (not in inputfile) (Born radii calculation const)
BORN_P3                                      (not in inputfile) (Born radii calculation const)
BORN_P4                                      (not in inputfile) (Born radii calculation const)
BORN_P5                                      (not in inputfile) (Born radii calculation const)
BORN_LAMBDA                                  (not in inputfile) (Born radii calculation const)
VAR_FIX_BORN_SCALE                           (not in inputfile) (Born radii calculation const)
INTRA_ROTAMER_FLAG                  1 (1 = measure intra-rotamer energy)
TORSION_FILE                                 (not in inputfile)
RESPARAM_FILE            
ROTAMER_LIBRARY           (can also be defined in resparam file or input file)
SASA_POINTS_FILE          (not in inputfile; can be generated automatically if not defined)

A.2.1.4 Comments

Comments are allowed at the top of the file, before START. After that, comments are allowed following the data on a line.

A.2.1.5 Description of the current file and tips for modification

The current file is examples/energy_function/forcefield.

The vdW parameters are from OPLS-AA (Jorgensen et al 1996) Within EGAD, the vdW combining rules are assumed to be the same as OPLS-AA (geometric for s and e). However, energy_functions.cpp: calc_vdw_tables can be easily modified to reflect another rule set. The actual vdW energies are calculated by the appropriately named functions listed in energy_functions.h. These probably do not need to be modified.

The UNSAT_HBOND_REFSTATE value may need adjustment, as discussed in the ligand-binding design section. Depending on the system, OVERALL_ENERGY_SCALE may also need to be varied. However, the remaining values should be left as they are, unless an extensive re-optimization has been performed. The optimization of these values are described in .

A.2.2 Torsion parameter file

A.2.2.1 General format

comments

START
atom1    atom2    atom3    atom4             k1       k2       k3       # comment
...
atom1    atom2    atom3    atom4             k1       k2       k3       # comment
STOP

There is no required order for the lines. Do not skip lines.

A.2.2.2 Torsion parameter format

A torsion parameter line has the following format:
atom1    atom2     atom3     atom4       k1       k2       k3
         atom1..4 = torsiontype strings
        
k1, k2, k3 = fourier series coefficients

The dihedral is between atom1 and atom4, via the atom2-atom3 bond.
The energy E for a dihedral angle f is:
E = WEIGHT_1_4*(1/2)*k1*[1 + cos(f)] + k2*[1 - cos(2*f)] + k3*[1 + cos(3*f)]      

For example:
CO CT N CN -2.365 0.912 -0.850

For dihedral angle f between CO and CN, via CT-N bond, the energy E is
E = WEIGHT_1_4*(1/2)*-2.365*[1 + cos(f)] + 0.912*[1 - cos(2*f)] + -0.850*[1 + cos(3*f)]    

A.2.2.3 Comments

Comments are allowed at the top of the file, before START. After that, comments are allowed following the data on a line.

A.2.2.4 Description of the current file and tips for modification

The current file is examples/energy_function/torsion_opls.

The torsional parameters are from OPLS-AA (Jorgensen et al 1996); NONBOND_FACTOR_1_4 should be defined as 0.5 in the forcefield file.

Within EGAD, the torsional energy model is OPLS-AA. However, it should be easy to modify energy_functions.cpp: torsion_energy in order to use another model. Since the WEIGHT_1_4*(1/2) factor is pre-multiplied to the fourier-coefficients in read_forcefield.cpp: read_torsiondata, this function may also need to be modified to use another model.

A.2.3 Resparam files

A.2.3.1 General format

comments

ROTAMER_LIBRARY filename (optional)
START (START VERSION 2 if not version 1)
residuetype header line
atomline
         ...
done
residuetype header line
atomline
         ...
done
...
residuetype header line
atomline
         ...
done
STOP
PSE atomline (freq-weighted avg centroid pseudoatom location)


Older resparam files have a slightly different format for the residuetype header line. In the interest of backwards compatibility, newer files must have "START VERSION 2" at the beginning of the parse-able section.

The order of real residues between START and STOP does not matter. However, the order of atoms within a given residue is extremely important. Spacing within any of the lines is unimportant. Do no skip lines.

If a ROTAMER_LIBRARY filename is defined both here and elsewhere, the file defined here will be disregarded.

A.2.3.2 Residuetype header formats
        
The residuetype header line contains attributes about a residuetype.

In the older version (names from structure_types.h: RESPARAM):
ASP D 4 4.00 6.481 60.3742 -0.000799 0.000008 0.00287 2.435 0.0 1.0 2.0 1.632 241.78
         "ASP" = residuetype
         "
D" = one_letter_code
         "
4" = residuecode        (residue ID number)
         "
4.00" = pKa                      (if not ionizable, pKa should be listed as 20.0)
         "
6.841" = max_vdw        (sidechain-bkbn vdW cutoff)
         "
60.3742 -0.000799 0.000008" = a b c for E_tripeptide
               
(boltzmann avg'd tripeptide energy w/ entropy: E_tripeptide = aT2 + bT + c, T=TEMPERATURE)
         "0.00287" = S                    (sidechain entropy in tripeptide; from (Creamer 2000))
         "
2.435" = E_random       (random-sequence-structure (RSS) energy; with respect to E_tripeptide at 298 K)
         "
0.0" = E_specificity    (universal specificity energy (w/ respect to E_random) )
         "1.0 2.0 1.632" = E_specificity_core, E_specificity_interfacial, E_specificity_surface
                              (core, interfacial, surface specificity energies (w/ respect to E_random); if E_specificity 0, do not have non-zero values for these energies               "241.78" = hydrophobic SASA in unfolded state; E_tripeptide += sasa*HYDROPHOBIC_ASP

The newer version is more flexible. Multiple header types can be used within the same file. A major difference is that the RSS and specificity energies are listed as absolute energies, rather than with respect to tripeptide or RSS values.
Example (names from structure_types.h: RESPARAM):
ASP D 4 4.00 5.944 0.660 0.00287 -22.229 -23.792 -23.535
       "ASP" = residuetype
         "
D" = one_letter_code
         "
4" = residuecode        (residue ID number)
         "
4.00" = pKa                      (if not ionizable, pKa should be listed as 20.0)
         "
5.944" = max_vdw        (sidechain-bkbn vdW cutoff)
         "
0.660" = E_tripeptide (tripeptide energy, w/o entropy) )
         "
0.00287" = S                        (sidechain entropy in tripeptide; from (Creamer 2000))
         "
-22.229" = E_random     (random-sequence-structure (RSS) energy)
         "
-23.792" = E_specificity        (universal specificity energy)
         "
-23.535" = E_unfolded   (unfolded state energy; for calculating Pseudo_DELTA_G_folding)

In lieu of the universal specificity energy, one may instead specify three numbers for the core, interfacial, surface specificity energies (which are NOT w/ respect to RSS or tripeptide energies):
TRP W 19 20.00 1.383825 102.172289 0.003900 -7.9378 -16.337 0.000000 -13.521
         same as previous example, except instead of E_specificity, E_specificity_reference_state
        "-16.337"  = E_specificity_hbond (E_specificity_core in source code)
        "0.000000" = E_specificity_interfacial
        "-13.521
"      =   E_solubility     (E_specificity_surface in source code)


residuetype must be a unique three character string. "*TE" (*=wildcard) is reserved for charged termini. "ZZZ" and "END" are also reserved. "UUU" is reserved for the UUU virtual residuetype. residuetype should start with a letter.
        
residuecode numbers should be unique integers <100; ≥ 100 are for charged termini. residuecode numbers are >0, except for the case of ionizable groups (pKa ≠ 20.0). For these, one of the forms (usually the prevalent form at pH 7.0), has residuecode = p > 0. Its conjugate form has residuecode = -p.

one_letter_code must be a unique character (with the exception of charged termini, which have more letters). For natural amino acids, the resparam file uses the standard FASTA code. For conjugate acids/bases, the prevalent form at pH 7.0 has the upper-case version, while the conjugate has the lower-case form.

For non-ionizable groups (or for groups for which the conjugate partner is not also described), pKa should be listed as 20.0.

The max_vdw sidechain-bkbn vdW cutoff energy is used to reject rotamers from consideration. These were estimated by surveying a subset of the PDB. The distribution of vdW energies from natural structures is gaussian; max_vdw is defined as the average plus three standard deviations.

The reference state energies are described in greater detail in (Pokala and Handel 2005).

A.2.3.3 Atomline format

The format for the atom lines in EGAD is a direct descendent of the AMBER94 topology file (Cornell et al 1996).

As an example, here is the residue parameter section for TRP:
TRP W 19 20.00 1.422176 15.057 0.0039 -15.089999 0.000000 -16.865708
    4  N   N   -1 26  6  4  0  0  0 1.335 116.600  180.000 -0.500 D
    5  H   HN  -2 4  26  6  0  6  0 1.010 119.800    0.000  0.300 H
    6  CA  CT  -3 4  26  6  0  0  0 1.449 121.900  180.000  0.140 -
    7  HA  HC  -4 6   4 26  0  0  0 1.090 109.500  300.000  0.060 -
    8  CB  CH2 -5 6   4 26  0  0  0 1.525 111.100   60.000 -0.120 -
    9  1HB HC   6 8   6  4  8 11 25 1.090 109.500  300.000  0.060 -
    10 2HB HC   7 8   6  4  8 11 25 1.090 109.500   60.000  0.060 -
    11 CG  C*   8 8   6  4  0  0  0 1.510 115.000 -180.000  0.075 -
    12 CD1 CA   8 11  8  6  0 25  0 1.340 127.000 -180.000 -0.115 -
    13 HD1 HA   8 12 11  8  0  0  0 1.090 120.000    0.000  0.115 -
    14 NE1 NW   8 12 11  8  0 25  0 1.430 107.000  180.000 -0.570 D
    15 HE1 HNW  8 14 12 11  0  0  0 1.010 125.500  180.000  0.420 H
    16 CE2 CQ   8 14 12 11 25 11  8 1.310 109.000    0.000  0.130 -
    17 CZ2 CA   8 16 14 12  0 25  0 1.400 128.000  180.000 -0.115 -
    18 HZ2 HA   8 17 16 14  0  0  0 1.090 120.000    0.000  0.115 -
    19 CH2 CA   8 17 16 14  0  0  0 1.390 116.000  180.000 -0.115 -
    20 HH2 HA   8 19 17 16  0  0  0 1.090 120.000  180.000  0.115 -
    21 CZ3 CA   8 19 17 16  0  0  0 1.350 121.000    0.000 -0.115 -
    22 HZ3 HA   8 21 19 17  0  0  0 1.090 120.000  180.000  0.115 -
    23 CE3 CA   8 21 19 17 25 11  8 1.410 122.000    0.000 -0.115 -
    24 HE3 HA   8 23 21 19  0  0  0 1.090 120.000  180.000  0.115 -
    25 CD2 CB   8 23 21 19 11  8  6 1.400 117.000    0.000 -0.055 -
    26 C   CN  -9 6   4 26  0  8  0 1.522 111.100  180.000  0.500 -
    27 O   O  -10 26  6  4  0  0  0 1.229 120.500    0.000 -0.500 A2
done

Atom lines have the following format (names from structure_types.h: ATOMRESPARAM):
23 CE3 CA   8 21 19 17 25 11  8 1.410 122.000    0.000 -0.115 -
         "23" = atomnumber
         "
CE3" = atomname
         "
CA" = atomtype
         "
8" = other_info
         "21" = contactAtomNumber
         "
19" = twobondAtomNumber
         "
17" = dihedAtomNumber
         "
25" = ringContactAtom
         "
11" = ringTwoBondAtom
         "
8" = ringDihedAtom
         "
1.410" = bondlength    
         "122.000" = bondangle
         "0.000" = dihedral
         "
-0.115" = charge
         "
-" = donorflag  (optional)

Each residue must begin with N, H, CA, HA, and CB. The atomnumber and order for these atoms are conserved in all residuetypes (4 is always N, 8 is always CB, etc). The last two atoms listed must always be C and O.

atomnumber:
        unique ID number for an atom within a residue
        For historical reasons (from the AMBER94 file), atoms are numbered starting with 4 for N,
atomname:
        unique string for an atom within a residue
        nomenclature for protein amino acid atoms follows standard PDB naming conventions
atomtype:
        forcefield atomtype for this atom; must correspond to an atomtype listed in the forcefield file
other_info number:
        another atom identification number
        other_info numbering usually starts with 1 for N
        Numbering for backbone atoms (including CB, all atoms for gly, pro residues) is denoted by a minus sign.
        Atoms within the same rigid body (such as a ring) should have identical other_info numbers.
        For example, the members of the trp ring all have other_info 8.

Residues are built outward from given anchor backbone coordinate atoms. contactAtomNumber, twobondAtomNumber, and dihedAtomNumber are used to identify atoms whose coordinates are required for building the atom of interest; given a dihedral angle, three atomic coordinates are required to build the fourth one (see dihedral_cartesian.cpp: chi2xyz).

When appropriate, backbone atoms are built in a similar manner, N-terminus to C-terminus. Within a given residue parameter section, the backbone atoms N, H, CA, HA, C, O, and CB are described as if the backbone formed a closed ring, in which C for this residue is directly bonded to N for this residue.

contactAtomNumber:
        atomnumber of the atom from whose coordinates this atom is directly built from.
twobondAtomNumber:
        contactAtomNumber and twobondAtomNumber form the bond whose rotation is described by dihedral.
        This atom forms bondangle angle with twobondAtomNumber, with contactAtomNumber as the vertex.
dihedAtomNumber:
   dihedral
is the dihedral angle between this atom and dihedAtomNumber

The ring atomnumbers are used to identify other atoms this atom is connected to via ring closure; these are used for energy calculations, rather than for atom coordinate generation. If 0, ignore.

ringContactAtom:
         atomnumber of an atom this atom is directly bonded to, via ring closure.
ringTwoBondAtom:
         atomnumber of an atom this atom is two bonds away from, via ring closure.
ringDihedAtom:
         atomnumber of an atom this atom is three bonds away from, via ring closure.

Given this minimal connectivity data, this information is sufficient to infer the connectivity between any pair of atoms within a residue.

bondlength:
        bondlength to contactAtomNumber atom,
bondangle:
        bondangle between this atom, contactAtomNumber atom, and twobondAtomNumber atom.
   contactAtomNumber
is the vertex
dihedral:
        dihedral angle to dihedAtomNumber atom.
        IMPORTANT: if listed dihedral<0.0, then this bond is classified as rotatable.
            In this example, the dihedrals for atoms 11 and 12 (CG and CD1) are <0.
            These values will be replaced internally by values from the rotamer library.
donorflag:
         Gives information about the hydrogen-bonding status of an atom. This field is optional; default to '-'.
         '-' = not involved with hydrogen bonds
         'H' = donate-able hydrogen
         'D' = can donate hydrogens
         'An' = hydrogen bond acceptor; can accept n hydrogens (n is integer)
         'Bn' = both a donor and an acceptor; can accept n hydrogens (n is integer)
                          
A.2.3.4 Comments

Comments are permitted before the START line, but not in any data-containing lines.

A.2.3.5 Description of the current file

The current file is examples/energy_function/resparam.

The bondlengths and angles are taken directly from the AMBER94 topology file (Cornell et al 1996). The charges are from OPLS-AA (Jorgensen et al 1996). The determination of the reference state energy values is described in (Pokala and Handel 2005).

The UUU virtual residue is primarily used for giving free-moving ligands an anchor backbone position.

A.2.4 Rotamer library file

A.2.4.1 General format

comments

START (START VERSION 2 if not version 1)
rotamer header line
rotamerline
         ...
rotamer header line
rotamerline
         ...
...
rotamer header line
rotamerline
         ...
STOP


Older rotamerlib files have a slightly different format for the rotamer header line. In the interest of backwards compatibility, newer files must have "START VERSION 2" at the beginning of the parse-able section.

The order of residues between START and STOP does not matter. There is no required ordered of rotamers within a section. Spacing within any of the lines is unimportant. Do not skip lines.

A.2.4.2 Rotamer headerline - Old version

In the older version, the headers have the following format:
restype  num_library_rotamers     number_of_rotamers       numChi
    restype
                   (residuetype; must correspond to a residuetype in the resparam file)
    num_library_rotamers
  (number of rotamers listed here for this residuetype)
    number_of_rotamers
      (the desired final number of rotamers for this residuetype, after spreading)
    numChi
                    (number of dihedral angles)

Based on these values:
library_spread_ratio = number_of_rotamers/num_library_rotamers
Dihedrals up to chiln(library_spread_ratio)/ln(3) are spread 1 std. dev. from each of the listed rotamers.

For example:
PHE 5 15 2
         "PHE" = restype          
         "
5" = num_library_rotamers       (number of rotamers listed here for this residuetype)
         "
15" = number_of_rotamers                 (number of rotamers we would like, after spreading)
         "
2" = numChi                                (number of variable dihedral angles for this residuetype)
In this example, library_spread_ratio = 3. This means that dihedrals up chi1 will be spread 1 std. dev. from each of the listed rotamers, for a total of 15 rotamers.

If we want to spread both chi1 and chi2:
PHE 5 45 2
   
"45" = number_of_rotamers                 (number of rotamers we would like, after spreading)
For this, library_spread_ratio = 9. This means that rotamers through chi2 will be spread 1 std. dev. from each of the listed rotamers.

If we do not wish to spread rotamers, then:
PHE 5 5 2
The library_spread_ratio = 1. Since ln(1)/ln(3) = 0, no dihedrals are spread.

A.2.4.3 Rotamer headerline - New version

Defining spreading is much simpler in the newer rotamerfiles. In newer rotamer library files, there are two possible formats.

If we wish to spread rotamers:
restype  numChi   chi_angle_to_spread_to
    restype
                (residuetype; must correspond to a residuetype in the resparam file)
        numChi                     (number of dihedral angles)
        chi_angle_to_spread_to   (spread 1 std dev to chichi_angle_to_spread_to)

The number of rotamers do not need to be defined; they will be counted by the program.

For example:
PHE 2    1
"PHE" = residuetype                        (number of rotamers listed here for this residuetype)
         "2" = numChi                               (number of variable dihedral angles for this residuetype)
         "1" = chi_angle_to_spread_to
This will result in spreading chi1 1 std. dev, for a total of 15 rotamers.
                                   
If we wish to spread both chi1 and chi2:
PHE 2    2
"PHE" = residuetype                        (number of rotamers listed here for this residuetype)
         "2" = numChi                               (number of variable dihedral angles for this residuetype)
         "2" = chi_angle_to_spread_to
This means that rotamers through chi2 will be spread 1 std. dev. from each of the listed rotamers.

Of course, if we do not wish to spread the rotamers for this residue, simply leave the last field empty:
PHE 2
        "
PHE" = residuetype              (number of rotamers listed here for this residuetype)
      "2" = numChi                                 (number of variable dihedral angles for this residuetype)

A.2.4.4 Rotamerline format

Rotamer lines are taken directly from Dunbrack's backbone-independent library (Dunbrack and Cohen 1997).

restype class n(r1) n(r1234) p(r1234) s p(r234|r1) s c1 s1 c2 s2 c3 s3 c4  s4
    1 2 3 4
   
restype          (residuetype; must correspond to a residuetype in the header line)
          
p(r1234)         (frequency of this rotamer class in the database)
          
cn                 (average value for this angle class for chin in this rotamer class)
          
sn                 (std. dev. (sigma) observed for this angle class for chin in this rotamer class)

The only important entries for EGAD are the restype, p(r1234), and the chin, sigman fields.

For example:
PHE 2 1 0 0 2058 1935 31.97 0.49 93.99 0.43 -178.8 11.2 77.0 13.8

            "
PHE" = restype
            "
2 1 0 0" = class rotamer class; chi1 = class[2], chi2= class[1]
         "31.97" = p(r1234) frequency of PHE rotamer class [2 1 0 0] in the database
            "
-178.8" = chi1
         "11.2" = sigma1
         "77.0" = chi2
         "13.8" = sigma2

A.2.4.5     Position-specific rotamers

If an additional field follows the rotamerline described, it is read as a position-specific rotamer. For the PHE example discussed here, this rotamer is used only for position 15:
PHE 2 1 0 0   2058   1935   31.97   0.49   93.99   0.43 -178.8 11.2     77.0 13.8    15

The following rotamer is used only for position 12A:
PHE 2 1 0 0   2058   1935   31.97   0.49   93.99   0.43 -178.8 11.2     77.0 13.8    12A

A.2.4.6 Comments

Comments are permitted before the START line, but not in any data-containing lines.

A.2.4.7 Description of current file and tips for modification

The current file is examples/energy_function/dunbrack.rotamers.

Rotamer lines are taken directly from Dunbrack's backbone-independent library (Dunbrack and Cohen 1997). In the current file, rare R, M, K, and F rotamers were trimmed. Spreading (for chi1 only) is done for all rotamers with less than 27 library rotamers.

Many of the fields in Dunbrack's rotamerline format are not used by EGAD. However, since Dunbrack periodically updates his library on his website, it was felt that it would be simplest to support his format. It is necessary to insert header lines at the appropriate locations. If conjugate acids/bases are included, it is necessary to include a rotamer library entry for each form.

A.2.5 Ligand parameter files

Ligand files have three parts. The first part is a resparam entry that describes connectivities and geometries. The second part lists the ligamers. The third part is a pdb-style list of the base coordinates used as a starting point for transformations.

A.2.5.1 General format

comments

START VERSION 2
residuetype header line
atomline
         ...
STOP
START VERSION 2
ligamer header line
         ligamer
         ...
STOP
START
pdb-style base coordinate line
...
END
STOP

Each section is bracketed by START and STOP. Coordinates are defined as pdb-style lines; individual entries are bracketed by START and END.

A.2.5.2 Resparam entry section

This section has a format identical to the resparam entries described above. There are a few more header-line variations allowed for ligands:
PRT w 49
         "PRT" = residuetype
         "w" = one letter code
         "49" = residue ID number

PRT w 49 12.4
         "12.4" = ligand-backbone vdW cutoff; ligamers with vdW interaction energy greater than this are rejected

PRT w 49 12.4 -5.4
         "-5.4" = solvation energy cutoff; ligamers with solvation energy greater than this are rejected

The residuetypes, one letter codes, and residue ID numbers must be unique and not used by any residue in the main resparam file.

By default, the ligand-backbone vdW cutoff is set to 10 kcal/mol and solvation energy cutoff set to 1000 kcal/mol.

If the hydrogen-bonding specificity energy is not defined, then the defined hydrogens, donors, and acceptors are treated as described above for backbone hydrogen-bonding atoms.

If the reference state energy is not defined in the ligand parameter file, the program will automatically calculate the energy for the isolated ligand base coordinates. For rigid ligands, this may be reasonable, but, as for protein sidechains, some form of conformational averaging might be required for flexible ligands. A possible approximation would be to use a minimized structure for the ligand base coordinates.

A.2.5.2.1 Generation of resparam section from a ligand pdb file

The ExtendedSidechain program by Mark Voorhies (ExtendedSidechain documentation) may be used to create this section from a pdb file (myligand.pdb) of the ligand. An example of this is given in the examples/energy_function/ligands/maltose/ directory.

If the coordinates in myligand.pdb are from a crystal structure, hydrogens should be added by some program such as Pymol or PRODRG. If the ligand of interest is somewhere in the pdb, find it at the MSD database; this server also adds hydrogens. Make sure that the coordinates of the heavy atoms are not changed significantly if the hydrogen-built structure is minimized.

Edit the ligand pdb file so that each line has a "ATOM" instead of "HETATM" This may be done with awk:
cat myligand.pdb | awk 'sub("HETATM","ATOM "); print ' > temp ; mv temp myligand.pdb

This edited file may be used as input for ExtendedSidechain:
ExtendedSidechain AUU.pdb myligand.pdb -reslib_file old_resparam_file_for_ligand_building -anchor_res 216 -anchor_chain Q -res_outfile myligand.param -remove_flexibility true

The other argument files are in examples/energy_function/ligands/. AUU.pdb is a structure of a tripeptide that ExtendedSidechain gets the anchor (-anchor_res 216 -anchor_chain Q) to build the ligand resparam description from the ligand coordinates listed in myligand.pdb. ExtendedSidechain requires using an older version of the EGAD resparam file (old_resparam_file_for_ligand_building).

The output (myligand.param) must be modified. The atomtype CH3 on ligands sometimes causes problems; change these to CT. Change the backbone atomtypes to U. Finally, make sure that the ligandtype, one_letter_code, and residuecode are unique (not already used in the resparam file), and that atomnames are not reserved (not listed in examples/energy_function/RESERVED_NAMES). Define hydrogen-bonding properties for each atom, and indicate rotatable bonds as described above for amino-acid residues.

Determining optimal methods for charge assignments is a major challenge in and of itself. While there are many ways to assign charges, a simple and self-consistent way is to use the OPLS-AA rules, since these charges have been optimized in concert with the van der Waals terms to reproduce small molecule data, and should be compatible with the EGAD energy function . Charge assignments with this rule set may be automated, and do not require quantum calculations. The OPLS-AA atom descriptions are listed in examples/energy_function/ligands/oplsaa.txt.

A.2.5.3 Ligamer section

A.2.5.3.1   Automatic generation of ligamers

Ligamers may be defined in several ways. The simplest way is for the program to generate them automatically.
(resparam section here)
START VERSION 2
PRT GENERATE_LIGAMERS
STOP
(base coordinate section here)

The maximal displacement, translation stepsize, and rotation stepsize may also be user-defined:
(resparam section here)
START VERSION 2
PRT GENERATE_LIGAMERS    rotation_stepsize        translation_stepsize     max_displacement
STOP

(base coordinate section here)

The defaults (if stepsizes and displacement are not user-defined):
         rotation_stepsize = 20.0 degrees
         translation_stepsize = 1.0
Å
         max_displacement = 2.0 Å

These will generate a grid of ligand positions. The ligamers are translated max_displacement from the base coordinates, with translation_stepsize steps, along each axis (x,y,z). At each of these points, 13 arbitrary rotation axes (x, y, & z axes and all diagonals with the origin at the ligand centroid) are defined <x y z>:
<1 1 1> <1 0 0> <0 1 0> <0 0 1> <1 1 0> <1 0 1> <0 1 1> <1 -1 0> <1 0 -1>
<0 1 -1> <1 -1 -1> <-1 1 -1> <-1 -1 1>

The ligand is rotated -180.0 -> 180.0, in rotation_stepsize steps, around each axis. In total, the default settings create 29252 ligamers. The backbone-clash vdW cutoff criterion reduces this number significantly.

For the maltose example (examples/energy_function/ligands/maltose/maltose.ligand), the stepsizes and maximum displacement are manually defined, resulting in a much more coarse sampling than the default values:
MAL GENERATE_LIGAMERS 36 1 2
For this example:
                  rotation_stepsize = 36.0 degrees
         translation_stepsize = 1.0
Å
         max_displacement = 2.0
Å

Automatically generated ligamers also include the base coordinates as a ligamer. Automatic generation is not an option for ligands with rotatable bonds.

A.2.5.3.2   Ligamer line format

Ligamers may also be directly defined by transform matrices. If this is done, the header-line for this section must have the residuetype and the number 12 (12 "dihedrals" per ligamer) . For example:
(resparam section here)
START VERSION 2
PRT 12
1 0 0 0 0 1 0 1 0 0 1 0
...
STOP
(base coordinate section here)


A structure with rotatable bonds can be defined in a similar manner. For example:
(resparam section here)
START VERSION 2
PRT 15
1 0 0 0 0 1 0 1 0 0 1 0 90.0 -70.0 180.0
...
STOP
(base coordinate section here)


In this example, there are 15 "dihedrals" (12 transform matrix + 3 actual dihedrals). The dihedral angles are listed after the transform matrix elements ( 90.0 -70.0 180.0 ). The rotatable bonds must be defined in the resparam entry as is done with regular amino acid sidechains.

As discussed above for conventional rotamers, ligamers can be defined as being position specific. For example:
START VERSION 2
PRT 15
1 0 0 0 0 1 0 1 0 0 1 0 90.0 -70.0 180.0    2l
1 0 0 5 0 1 0 1 0 0 1 0 180.0 -70.0 60.0    1l
...
STOP


The first ligamer is used only for ligand position 2l, while the second is used only for 1l. This option may be useful if multiple copies of the same ligand are permitted.

The base coordinates are not included as a choice, unless the stationary transform matrix ligamer is also listed:
1 0 0 0 0 1 0 0 0 0 1 0

A.2.5.4 Base coordinate section

Following the STOP of the ligamer section, a base coordinate entry must be defined. If the ligand coordinates are listed in the template pdb file, the coordinates from the template file will supercede this one. Nevertheless, coordinates must also be defined here.

The base coordinates are a pdb entry, bracketed by START and END. A final STOP line marks the end of the file. For example:
(resparam section here)
(ligamer section here)
START
ATOM 60 1HB SYL 4 13.876 19.001 19.708 0.060 1.250
...
ATOM 74 3HZ SYL 4 16.477 20.457 16.826 0.330 0.814
END
STOP

A.2.5.5 Ligamers defined by coordinates

Ligamers and base coordinates can be combined and defined by successive pdb-style coordinate entries. To indicate this, the header line should have COORDINATES in place of the "number of dihedral angles" entry. For example:
(resparam section here)
START VERSION 2
SYL COORDINATES
START
ATOM 160 1HB SYL 10 1.740 11.485 3.560 0.060 1.250
...
END
START
ATOM 60 1HB SYL 4 13.876 19.001 119.708 0.060 1.250
...
END
...
END
STOP

If coordinates are defined, there is no need to define a base coordinate entry.

Coordinates can also be listed in a separate file:
(resparam section here)
START VERSION 2
SYL COORDINATE_FILE coord_file
STOP    

The coordinate file lists coordinates sequentially, bracketed by START and END for each entry, and terminating with a STOP at the end of the file, as is done when they are listed in the main ligand parameter file.

back to Table of Contents