back to Table of Contents
A.2 Energy function file
formats
A.2.1 Forcefield file
A.2.1.1 General format
Comments
START
atomtype torsiontype svdW evdw united_radius ASP # comment
...
pseudoatomtype
U 0.000 0.000
united_radius 0.000 # comment
...
END
constant_label
constant
# comment
...
file_label
filename
# comment
...
END
A.2.1.2 Atomtype line format
Between START and the first END, there needs to be a line for each atomtype
alluded to in the resparam file.
For example (names from structure_types.h:
ATOMPARAM):
CH2
CT
3.500 0.066 1.750 0.007 # sp3 methylene carbon
"CH2" = atomtype
"CT" = torsiontype
(for torsion functions)
"3.500" = sigma
(van der waals sigma (svdW) )
"0.066" = welldepth
(van der waals well depth (evdW) )
"1.750" = united_radius
(atom radius for surface area and born radii)
"0.007" = asp
(atomic solvation
(water->octanol) parameter (ASP); for solubility design)
The van der Waals parameters are assumed to be in the OPLS-AA format.
The united atom radii are for surface area and Born radii calculations
. The atomic solvation parameters are based on empirical fitting of
small molecule water/octanol transfer free energies.
atomtype must be unique. BHY, *BK, and PSE* are reserved for pseudoatoms (*=wildcard). U is reserved as an atomtype
for zero radius, zero charge pseudo atoms, as well as a torsiontype for all pseudoatoms. CH3 is used for methyl carbons. HN is required for backbone amide protons. SH is required for CYS SG.
The CT torsiontype
is required for all sp3 carbons. ZZZ is
reserved as an end flag.
The real atoms do not need to be in any particular order, except that
atoms with the same torsiontype should be
listed contiguously. Pseudoatoms must be listed contiguously after the
real atoms. Spacing within a line is not important. Do not skip lines.
A.2.1.3 Constant line and file line formats
After END, forcefield constants and files
are listed. For example:
VDW_ATTRACTIVE_FACTOR 2.0 #
favorable vdW scale
TORSION_FILE
torsion_opls # torsion params
These constants and files do not need to be listed in any particular
order. Some of these can be defined in the inputfile as well; the
inputfile values take precedence. The ROTAMER_LIBRARY
filename may also be defined at the top of the RESPARAM_FILE
(see
resparam file formats below).
Although the labels do not need to be in all-caps, it is recommended
for the purposes of clarity. Do not skip lines.
These are the constants and files that may be defined in the forcefield
file, along with their defaults. If a default is not listed, the value
MUST be defined here or in the inputfile, if permitted.
VDW_RADII_SCALE
1.0 (scale factor on vdw
radius)
VDW_LINEAR_START_POINT
1000 (default does not include linearization; in JMB paper, optimal is
1.5)
VDW_CLASH_ENERGY
2.75 kcal/mol
(atompairs with EvdW ≥ clash)
VDW_CUTOFF
1000.0 (E≥VDW_CUTOFF are
truncated to this)
VDW_ATTRACTIVE_FACTOR
1.0 (scale Evdw < 0)
VDW_REPULSIVE_FACTOR
1.0 (scale Evdw > 0)
Ep
0.0 (protein dielectric;
must be defined here or in inputfile
FRACTION_HYDROPHOBIC_SASA_CUTOFF 1000.0 (for solubility
filter)
TRANSFER_FREE_ENERGY_DENSITY_CUTOFF
1000.0 (for solubility filter)
CORESURF_MEA_SASA
26.0 (core-surface def.)
CORESURF_MEA_BORN
3.0 (core-surface def.)
GENERAL_ASP
0.0 (general atomic solvation
parameter)
HYDROPHOBIC_ASP
0.0 (atomic solvation parameter for hphob)
NONBOND_FACTOR_1_4
(not in inputfile; scale vdw/coulomb for 1,4 energy)
WEIGHT_VDW
1.0 (empirical weight on vdw energies)
WEIGHT_ELECTROSTATICS
1.0 (empirical weight on electrostatic energies)
WEIGHT_1_4
1.0 (empirical weight on 1,4 energies)
UNSAT_HBOND_REFSTATE (-VDW_CUTOFF)
# reference state for unsatisfied ligand hbond groups
N_HBOND_UNSATISFIED_REFSTATE
(UNSAT_HBOND_REFSTATE) # ref.state for unsat bkbn N
H_HBOND_UNSATISFIED_REFSTATE
(UNSAT_HBOND_REFSTATE) # ref.state for unsat bkbn H
O_HBOND_UNSATISFIED_REFSTATE
(UNSAT_HBOND_REFSTATE) # ref.state for unsat bkbn O
OVERALL_ENERGY_SCALE
1.0 (empirical slope for stability prediction; JMB paper, optimal is
0.2 )
BORN_P1
(not in inputfile) (Born radii calculation const)
BORN_P2
(not in inputfile) (Born radii calculation const)
BORN_P3
(not in inputfile) (Born radii calculation const)
BORN_P4
(not in inputfile) (Born radii calculation const)
BORN_P5
(not in inputfile) (Born radii calculation const)
BORN_LAMBDA
(not in inputfile) (Born radii calculation const)
VAR_FIX_BORN_SCALE
(not in inputfile) (Born radii calculation const)
INTRA_ROTAMER_FLAG
1 (1 =
measure intra-rotamer energy)
TORSION_FILE
(not in inputfile)
RESPARAM_FILE
ROTAMER_LIBRARY
(can also be defined
in resparam file or input file)
SASA_POINTS_FILE
(not in inputfile; can be generated automatically if not defined)
A.2.1.4 Comments
Comments are allowed at the top of the file, before START. After that, comments are allowed
following the data on a line.
A.2.1.5 Description of the current file and
tips for modification
The current file is examples/energy_function/forcefield.
The vdW parameters are from OPLS-AA (Jorgensen
et al 1996) Within EGAD, the
vdW
combining rules are assumed to be the same as OPLS-AA (geometric for s and e). However, energy_functions.cpp: calc_vdw_tables can be
easily modified to reflect another rule set. The actual vdW energies
are calculated by the appropriately named functions listed in energy_functions.h. These probably do not need
to be modified.
The UNSAT_HBOND_REFSTATE value may need
adjustment, as discussed in the ligand-binding design section.
Depending on the system, OVERALL_ENERGY_SCALE may
also need to be varied. However, the remaining values should be left as
they are, unless an extensive re-optimization has been performed. The
optimization of these values are described in .
A.2.2 Torsion parameter file
A.2.2.1 General format
comments
START
atom1
atom2
atom3
atom4
k1
k2
k3
# comment
...
atom1
atom2
atom3
atom4
k1
k2
k3
# comment
STOP
There is no required order for the lines. Do not skip lines.
A.2.2.2 Torsion parameter format
A torsion parameter line has the following format:
atom1 atom2
atom3 atom4
k1
k2
k3
atom1..4 = torsiontype strings
k1, k2, k3 = fourier series coefficients
The dihedral is between atom1 and atom4, via the atom2-atom3 bond.
The energy E for a dihedral angle f is:
E = WEIGHT_1_4*(1/2)*k1*[1 + cos(f)] + k2*[1 - cos(2*f)] + k3*[1 + cos(3*f)]
For example:
CO CT N CN -2.365 0.912 -0.850
For dihedral angle f between CO and CN, via CT-N bond, the
energy E is
E = WEIGHT_1_4*(1/2)*-2.365*[1 + cos(f)] + 0.912*[1
- cos(2*f)] + -0.850*[1 + cos(3*f)]
A.2.2.3 Comments
Comments are allowed at the top of the file, before START. After that, comments are allowed
following the data on a line.
A.2.2.4 Description of the current file and
tips for modification
The current file is examples/energy_function/torsion_opls.
The torsional parameters are from OPLS-AA (Jorgensen
et al 1996); NONBOND_FACTOR_1_4 should be defined as 0.5 in the forcefield file.
Within EGAD, the torsional energy model is OPLS-AA.
However, it should be easy to modify energy_functions.cpp:
torsion_energy in order to use another model. Since the WEIGHT_1_4*(1/2) factor is
pre-multiplied to the fourier-coefficients in read_forcefield.cpp:
read_torsiondata, this function may also need to be modified to
use another model.
A.2.3 Resparam files
A.2.3.1 General format
comments
ROTAMER_LIBRARY
filename (optional)
START (START VERSION
2 if not version 1)
residuetype header line
atomline
...
done
residuetype header line
atomline
...
done
...
residuetype header line
atomline
...
done
STOP
PSE atomline (freq-weighted avg centroid pseudoatom location)
Older resparam files have a slightly different format for the
residuetype header line. In the interest of backwards compatibility,
newer files must have "START VERSION 2" at
the beginning of the parse-able section.
The order of real residues between START
and STOP does not matter. However, the
order of atoms within a given residue is extremely important. Spacing
within any of the lines is unimportant. Do no skip lines.
If a ROTAMER_LIBRARY filename is defined
both here and elsewhere, the file defined here will be disregarded.
A.2.3.2 Residuetype header formats
The residuetype header line contains attributes about a residuetype.
In the older version (names from structure_types.h:
RESPARAM):
ASP D 4 4.00 6.481 60.3742
-0.000799 0.000008 0.00287 2.435 0.0 1.0 2.0 1.632 241.78
"ASP" = residuetype
"D" = one_letter_code
"4" = residuecode
(residue ID number)
"4.00" = pKa
(if not ionizable, pKa
should be listed as 20.0)
"6.841" = max_vdw
(sidechain-bkbn vdW cutoff)
"60.3742 -0.000799 0.000008" = a b c for E_tripeptide
(boltzmann avg'd
tripeptide energy w/ entropy: E_tripeptide = aT2 + bT + c, T=TEMPERATURE)
"0.00287" = S
(sidechain entropy in tripeptide; from (Creamer
2000))
"2.435" = E_random
(random-sequence-structure (RSS) energy; with respect to E_tripeptide
at 298
K)
"0.0" = E_specificity
(universal specificity energy (w/ respect to E_random) )
"1.0 2.0 1.632"
= E_specificity_core, E_specificity_interfacial, E_specificity_surface
(core, interfacial,
surface specificity energies (w/ respect to E_random); if
E_specificity ≠ 0, do not have non-zero values for
these energies
"241.78" =
hydrophobic SASA in unfolded state; E_tripeptide += sasa*HYDROPHOBIC_ASP
The newer version is more flexible. Multiple header types
can be used within the same file. A major difference is that the RSS
and specificity energies are listed as absolute energies, rather than
with respect to tripeptide or RSS values.
Example (names from structure_types.h: RESPARAM):
ASP D 4 4.00 5.944 0.660 0.00287
-22.229 -23.792 -23.535
"ASP" = residuetype
"D" = one_letter_code
"4" = residuecode
(residue ID number)
"4.00" = pKa
(if not ionizable, pKa should be
listed as 20.0)
"5.944" = max_vdw
(sidechain-bkbn vdW cutoff)
"0.660" = E_tripeptide (tripeptide energy, w/o entropy) )
"0.00287" = S
(sidechain entropy in tripeptide;
from (Creamer
2000))
"-22.229" = E_random
(random-sequence-structure (RSS) energy)
"-23.792" = E_specificity
(universal specificity energy)
"-23.535" = E_unfolded
(unfolded state energy; for calculating Pseudo_DELTA_G_folding)
In lieu of the universal specificity energy, one may instead specify
three numbers for the core, interfacial, surface specificity energies
(which are NOT w/ respect to RSS or tripeptide energies):
TRP W 19 20.00 1.383825 102.172289
0.003900 -7.9378 -16.337 0.000000 -13.521
same as previous example, except instead of E_specificity, E_specificity_reference_state
"-16.337" = E_specificity_hbond (E_specificity_core in source code)
"0.000000" =
E_specificity_interfacial
"-13.521" = E_solubility
(E_specificity_surface in
source code)
residuetype must be a unique three
character string. "*TE" (*=wildcard) is
reserved for charged termini. "ZZZ" and "END" are also reserved. "UUU"
is reserved for the UUU virtual
residuetype. residuetype should start with
a letter.
residuecode numbers
should be unique
integers <100; ≥
100 are for charged termini. residuecode
numbers are >0, except for the case of ionizable groups (pKa ≠
20.0). For these, one of the forms (usually the prevalent form
at pH 7.0), has residuecode = p > 0.
Its conjugate form has residuecode = -p.
one_letter_code must be a unique character
(with the exception of charged termini, which have more letters). For
natural amino acids, the resparam file uses the standard FASTA code.
For conjugate acids/bases, the prevalent form at pH 7.0 has the
upper-case version, while the conjugate has the lower-case form.
For non-ionizable groups (or for groups for which the conjugate partner
is not also described), pKa should be
listed as 20.0.
The max_vdw sidechain-bkbn vdW cutoff
energy is used to reject rotamers from consideration. These were
estimated by surveying a subset of the PDB. The distribution of vdW
energies from natural structures is gaussian; max_vdw
is defined as the average plus three standard deviations.
The reference state energies are described in greater detail in (Pokala
and Handel 2005).
A.2.3.3 Atomline format
The format for the atom lines in EGAD is a direct descendent of the
AMBER94 topology file (Cornell et al 1996).
As an example, here is the residue parameter section for TRP:
TRP W 19 20.00 1.422176 15.057
0.0039 -15.089999 0.000000 -16.865708
4 N N -1 26
6 4 0 0 0 1.335 116.600 180.000 -0.500 D
5 H HN -2 4 26
6 0 6 0 1.010 119.800 0.000
0.300 H
6 CA CT -3 4 26
6 0 0 0 1.449 121.900 180.000 0.140 -
7 HA HC -4 6 4
26 0 0 0 1.090 109.500 300.000 0.060 -
8 CB CH2 -5 6 4 26
0 0 0 1.525 111.100 60.000 -0.120 -
9 1HB HC 6 8 6
4 8 11 25 1.090 109.500 300.000 0.060 -
10 2HB HC 7 8 6
4 8 11 25 1.090 109.500 60.000 0.060 -
11 CG C* 8 8 6
4 0 0 0 1.510 115.000 -180.000 0.075 -
12 CD1 CA 8 11 8 6 0
25 0 1.340 127.000 -180.000 -0.115 -
13 HD1 HA 8 12 11 8 0
0 0 1.090 120.000 0.000 0.115 -
14 NE1 NW 8 12 11 8 0
25 0 1.430 107.000 180.000 -0.570 D
15 HE1 HNW 8 14 12 11 0 0 0
1.010 125.500 180.000 0.420 H
16 CE2 CQ 8 14 12 11 25 11 8 1.310
109.000 0.000 0.130 -
17 CZ2 CA 8 16 14 12 0 25 0
1.400 128.000 180.000 -0.115 -
18 HZ2 HA 8 17 16 14 0
0 0 1.090 120.000 0.000 0.115 -
19 CH2 CA 8 17 16 14 0
0 0 1.390 116.000 180.000 -0.115 -
20 HH2 HA 8 19 17 16 0
0 0 1.090 120.000 180.000 0.115 -
21 CZ3 CA 8 19 17 16 0
0 0 1.350 121.000 0.000 -0.115 -
22 HZ3 HA 8 21 19 17 0
0 0 1.090 120.000 180.000 0.115 -
23 CE3 CA 8 21 19 17 25 11 8 1.410
122.000 0.000 -0.115 -
24 HE3 HA 8 23 21 19 0
0 0 1.090 120.000 180.000 0.115 -
25 CD2 CB 8 23 21 19 11 8 6
1.400 117.000 0.000 -0.055 -
26 C CN -9 6 4
26 0 8 0 1.522 111.100 180.000 0.500 -
27 O O -10 26 6
4 0 0 0 1.229 120.500 0.000 -0.500
A2
done
Atom lines have the following format (names from structure_types.h:
ATOMRESPARAM):
23 CE3 CA 8 21 19 17 25
11 8 1.410 122.000 0.000 -0.115 -
"23" = atomnumber
"CE3" = atomname
"CA" = atomtype
"8" = other_info
"21" = contactAtomNumber
"19" = twobondAtomNumber
"17" = dihedAtomNumber
"25" = ringContactAtom
"11" = ringTwoBondAtom
"8" = ringDihedAtom
"1.410" = bondlength
"122.000" = bondangle
"0.000" = dihedral
"-0.115" = charge
"-" = donorflag
(optional)
Each residue must begin with N, H, CA, HA, and CB. The atomnumber and order for these atoms are
conserved in all residuetypes (4 is always N, 8 is always CB, etc). The last two atoms listed must always
be C and O.
atomnumber:
unique ID number for an atom
within a residue
For historical reasons (from the
AMBER94 file), atoms are numbered starting with 4
for N,
atomname:
unique string for an atom within
a residue
nomenclature for protein amino
acid atoms follows standard PDB naming conventions
atomtype:
forcefield atomtype for this
atom; must correspond to an atomtype
listed in the forcefield file
other_info number:
another atom identification number
other_info
numbering usually starts with 1 for N
Numbering for backbone atoms
(including CB, all atoms for gly, pro
residues) is denoted by a minus sign.
Atoms within the same rigid body
(such as a ring) should have identical other_info
numbers.
For example, the members of the
trp ring all have other_info 8.
Residues are built outward from given anchor backbone coordinate atoms.
contactAtomNumber, twobondAtomNumber,
and dihedAtomNumber are used to identify
atoms whose coordinates are required for building the atom of interest;
given a dihedral angle, three atomic coordinates are required to build
the fourth one (see dihedral_cartesian.cpp:
chi2xyz).
When appropriate, backbone atoms are built in a similar manner,
N-terminus to C-terminus. Within a given residue parameter section, the
backbone atoms N, H,
CA, HA, C, O, and CB are described as if the backbone formed a
closed ring, in which C for this residue
is directly bonded to N for this residue.
contactAtomNumber:
atomnumber
of the atom from whose coordinates this atom is directly built from.
twobondAtomNumber:
contactAtomNumber
and twobondAtomNumber form the bond whose
rotation is described by dihedral.
This atom forms bondangle angle with twobondAtomNumber,
with contactAtomNumber as the vertex.
dihedAtomNumber:
dihedral is the dihedral angle between this atom
and dihedAtomNumber
The ring atomnumbers are used to identify other atoms this atom is
connected to via ring closure; these are used for energy calculations,
rather than for atom coordinate generation. If 0,
ignore.
ringContactAtom:
atomnumber of an atom this atom is directly
bonded to, via ring closure.
ringTwoBondAtom:
atomnumber of an atom this atom is two
bonds away from, via ring closure.
ringDihedAtom:
atomnumber of an atom this atom is three
bonds away from, via ring closure.
Given this minimal connectivity data, this information is sufficient to
infer the connectivity between any pair of atoms within a residue.
bondlength:
bondlength to
contactAtomNumber atom,
bondangle:
bondangle between
this atom, contactAtomNumber atom, and twobondAtomNumber atom.
contactAtomNumber is the vertex
dihedral:
dihedral angle to dihedAtomNumber atom.
IMPORTANT: if listed dihedral<0.0, then this bond is classified as
rotatable.
In this
example, the dihedrals for atoms 11 and 12 (CG
and CD1) are <0.
These values
will be replaced internally by values from the rotamer library.
donorflag:
Gives information about the hydrogen-bonding status of an atom. This
field is optional; default to '-'.
'-' = not involved with hydrogen bonds
'H' = donate-able hydrogen
'D' = can donate hydrogens
'An' = hydrogen bond acceptor; can accept n hydrogens (n is
integer)
'Bn' = both a donor and an acceptor; can
accept n hydrogens (n
is integer)
A.2.3.4 Comments
Comments are permitted before the START
line, but not in any data-containing lines.
A.2.3.5 Description of the current file
The current file is examples/energy_function/resparam.
The bondlengths and angles are taken directly from the AMBER94 topology
file (Cornell
et al 1996). The charges
are from OPLS-AA (Jorgensen et al 1996).
The determination of the reference state energy values is described in (Pokala
and Handel 2005).
The UUU virtual residue is primarily used
for giving free-moving ligands an anchor backbone position.
A.2.4 Rotamer library file
A.2.4.1 General format
comments
START (START VERSION 2 if not
version 1)
rotamer header line
rotamerline
...
rotamer header line
rotamerline
...
...
rotamer header line
rotamerline
...
STOP
Older rotamerlib files have a slightly different format for the rotamer
header line. In the interest of backwards compatibility, newer files
must have "START VERSION 2" at the
beginning of the parse-able section.
The order of residues between START and STOP does not matter. There is no required
ordered of rotamers within a section. Spacing within any of the lines
is unimportant. Do not skip lines.
A.2.4.2 Rotamer headerline -
Old version
In the older version, the headers have the following
format:
restype
num_library_rotamers
number_of_rotamers
numChi
restype
(residuetype; must correspond to a residuetype in the resparam file)
num_library_rotamers
(number of rotamers listed here for this residuetype)
number_of_rotamers
(the desired final number of rotamers for this residuetype, after
spreading)
numChi
(number of dihedral angles)
Based on these values:
library_spread_ratio = number_of_rotamers/num_library_rotamers
Dihedrals up to chiln(library_spread_ratio)/ln(3)
are spread 1 std. dev. from each of the listed
rotamers.
For example:
PHE 5 15 2
"PHE" = restype
"5" = num_library_rotamers
(number of rotamers listed here for this residuetype)
"15" = number_of_rotamers
(number of rotamers we would like, after spreading)
"2" = numChi
(number of variable dihedral angles for this residuetype)
In this example, library_spread_ratio = 3. This means that
dihedrals up chi1
will be spread 1 std. dev. from each of the listed rotamers, for a
total of 15 rotamers.
If we want to spread both chi1
and chi2:
PHE 5 45 2
"45" = number_of_rotamers
(number of rotamers we would like, after spreading)
For this, library_spread_ratio = 9. This means that rotamers
through chi2
will be spread
1 std. dev. from each of the listed rotamers.
If we do not wish to spread rotamers, then:
PHE 5 5 2
The library_spread_ratio = 1. Since ln(1)/ln(3) = 0, no
dihedrals are spread.
A.2.4.3 Rotamer headerline - New version
Defining spreading is much simpler in the newer rotamerfiles. In newer
rotamer library files, there are two possible formats.
If we wish to spread rotamers:
restype
numChi
chi_angle_to_spread_to
restype
(residuetype; must
correspond to a residuetype in the resparam file)
numChi
(number of dihedral angles)
chi_angle_to_spread_to
(spread 1 std dev to chichi_angle_to_spread_to)
The number of rotamers do not need to be defined; they will be counted
by the program.
For example:
PHE 2
1
"PHE" = residuetype
(number of rotamers listed here for this residuetype)
"2" = numChi
(number of variable dihedral
angles for this residuetype)
"1" = chi_angle_to_spread_to
This will result in spreading chi1
1 std. dev, for a total of 15 rotamers.
If we wish to spread both chi1
and chi2:
PHE 2
2
"PHE" = residuetype
(number of rotamers listed here for this residuetype)
"2" = numChi
(number of variable dihedral
angles for this residuetype)
"2" = chi_angle_to_spread_to
This means that rotamers through chi2
will be spread 1 std. dev. from each of the listed
rotamers.
Of course, if we do not wish to spread the rotamers for this residue,
simply leave the last field empty:
PHE 2
"PHE" = residuetype
(number of rotamers listed
here for this residuetype)
"2" = numChi
(number of variable dihedral angles for this residuetype)
A.2.4.4 Rotamerline format
Rotamer lines are taken directly from Dunbrack's
backbone-independent library (Dunbrack and Cohen 1997).
restype class n(r1) n(r1234)
p(r1234) s p(r234|r1) s c1 s1 c2 s2 c3 s3 c4
s4
1 2 3 4
restype
(residuetype; must correspond to a residuetype in the header line)
p(r1234)
(frequency of this rotamer class in the database)
cn
(average value for this angle class for chin in this rotamer class)
sn
(std. dev. (sigma) observed for this angle class for chin in this rotamer
class)
The only important entries for EGAD are the restype, p(r1234), and the chin, sigman
fields.
For example:
PHE 2 1 0 0 2058 1935 31.97 0.49 93.99 0.43 -178.8 11.2 77.0 13.8
"PHE" = restype
"2 1 0 0" =
class rotamer class; chi1
= class[2], chi2=
class[1]
"31.97" = p(r1234) frequency of PHE rotamer class [2 1 0 0] in the database
"-178.8" = chi1
"11.2" = sigma1
"77.0" = chi2
"13.8" = sigma2
A.2.4.5
Position-specific rotamers
If an additional field follows the rotamerline described, it is read as
a position-specific rotamer. For the PHE example
discussed here, this rotamer is used only for position 15:
PHE 2
1 0 0 2058 1935 31.97
0.49 93.99 0.43 -178.8
11.2 77.0 13.8 15
The following rotamer is used only for position 12A:
PHE 2
1 0 0 2058 1935 31.97
0.49 93.99 0.43 -178.8
11.2 77.0 13.8 12A
A.2.4.6 Comments
Comments are permitted before the START
line, but not in any data-containing lines.
A.2.4.7 Description of current file and
tips for modification
The current file is examples/energy_function/dunbrack.rotamers.
Rotamer lines are taken directly from Dunbrack's backbone-independent
library (Dunbrack
and Cohen 1997). In the current file, rare R, M, K, and F rotamers
were
trimmed. Spreading (for chi1 only)
is done for all rotamers with less than 27 library rotamers.
Many of the fields in Dunbrack's rotamerline format are not used by
EGAD. However, since Dunbrack periodically updates his library on his website,
it was felt that it would be
simplest to support his format. It is necessary to insert header lines
at the appropriate locations. If conjugate acids/bases are included, it
is necessary to include a rotamer library entry for each form.
A.2.5 Ligand parameter files
Ligand files have three parts. The first part is a resparam entry that
describes connectivities and geometries. The second part lists the
ligamers. The third part is a pdb-style list of the base coordinates
used as a starting point for transformations.
A.2.5.1 General format
comments
START VERSION 2
residuetype header line
atomline
...
STOP
START VERSION 2
ligamer header line
ligamer
...
STOP
START
pdb-style base coordinate line
...
END
STOP
Each section is bracketed by START and STOP. Coordinates are defined as pdb-style
lines; individual entries are bracketed by START
and END.
A.2.5.2 Resparam entry section
This section has a format identical to the resparam entries described
above. There are a few more header-line variations allowed for ligands:
PRT w 49
"PRT" = residuetype
"w" = one letter code
"49"
= residue ID number
PRT w 49 12.4
"12.4" = ligand-backbone vdW cutoff;
ligamers with vdW interaction energy greater than this are rejected
PRT w 49 12.4 -5.4
"-5.4" = solvation energy cutoff; ligamers
with solvation energy greater than this are rejected
The residuetypes, one letter codes, and residue ID numbers must be
unique and not used by any residue in the main resparam file.
By default, the ligand-backbone vdW cutoff is set to 10 kcal/mol and
solvation energy cutoff set to 1000 kcal/mol.
If the hydrogen-bonding specificity energy is not defined, then the
defined hydrogens, donors, and acceptors are treated as described above
for backbone hydrogen-bonding atoms.
If the reference state energy is not defined in the ligand parameter
file, the program will automatically calculate the energy for the
isolated ligand base coordinates.
For rigid ligands, this may be reasonable, but, as for protein
sidechains, some form of conformational averaging might be required for
flexible ligands. A possible approximation would be to use a minimized
structure for the ligand base coordinates.
A.2.5.2.1 Generation of resparam section
from a ligand pdb file
The ExtendedSidechain program by Mark
Voorhies (ExtendedSidechain documentation)
may be used to create this section from a pdb file (myligand.pdb)
of the ligand. An example of this is given in the examples/energy_function/ligands/maltose/
directory.
If the coordinates in myligand.pdb are
from a crystal structure, hydrogens should be added by some program
such as Pymol or PRODRG.
If
the ligand of interest is somewhere in the pdb, find it at the MSD
database;
this server also adds hydrogens. Make sure that the coordinates of the
heavy atoms are not changed significantly if the hydrogen-built
structure is
minimized.
Edit the ligand pdb file so that each line has a "ATOM"
instead of "HETATM" This may be done with awk:
cat myligand.pdb | awk
'sub("HETATM","ATOM "); print ' > temp ; mv temp myligand.pdb
This edited file may be used as input for ExtendedSidechain:
ExtendedSidechain AUU.pdb
myligand.pdb -reslib_file old_resparam_file_for_ligand_building
-anchor_res 216 -anchor_chain Q -res_outfile myligand.param
-remove_flexibility true
The other argument files are in examples/energy_function/ligands/.
AUU.pdb is a structure of a tripeptide that
ExtendedSidechain gets the anchor (-anchor_res 216 -anchor_chain Q) to build the
ligand resparam description from the ligand coordinates listed in myligand.pdb. ExtendedSidechain
requires using an older version of the EGAD resparam file (old_resparam_file_for_ligand_building).
The output (myligand.param) must be
modified. The atomtype CH3 on ligands
sometimes causes problems; change these to CT.
Change the backbone atomtypes to U. Finally, make sure that the ligandtype, one_letter_code,
and residuecode are unique (not already
used in the resparam file), and that atomnames are not reserved (not listed in examples/energy_function/RESERVED_NAMES).
Define hydrogen-bonding properties for each atom, and indicate
rotatable bonds as described above for amino-acid residues.
Determining optimal methods for charge assignments is a major challenge
in and of itself. While there are many ways to assign charges, a simple
and self-consistent way is to use the OPLS-AA rules, since these
charges have been optimized in concert with the van der Waals terms to
reproduce small molecule data, and should be compatible with the EGAD
energy function . Charge assignments with this rule set may be
automated, and do not require quantum calculations. The OPLS-AA atom
descriptions are listed in examples/energy_function/ligands/oplsaa.txt.
A.2.5.3 Ligamer section
A.2.5.3.1 Automatic generation
of ligamers
Ligamers may be defined in several ways. The simplest way is for the
program to generate them automatically.
(resparam section here)
START VERSION 2
PRT GENERATE_LIGAMERS
STOP
(base coordinate section here)
The maximal displacement, translation stepsize, and
rotation stepsize may also be user-defined:
(resparam section here)
START VERSION 2
PRT GENERATE_LIGAMERS
rotation_stepsize
translation_stepsize
max_displacement
STOP
(base coordinate section here)
The defaults (if stepsizes and displacement are not user-defined):
rotation_stepsize = 20.0 degrees
translation_stepsize = 1.0 Å
max_displacement = 2.0 Å
These will generate a grid of ligand
positions. The
ligamers are translated max_displacement
from the base coordinates, with translation_stepsize
steps, along each axis (x,y,z). At each of these points, 13 arbitrary
rotation axes (x, y, & z axes and all diagonals with the origin at
the ligand centroid) are defined <x y z>:
<1 1 1> <1 0 0> <0 1 0> <0 0
1> <1 1 0> <1 0 1> <0 1 1> <1 -1 0> <1 0 -1>
<0 1 -1> <1 -1 -1> <-1 1 -1> <-1 -1 1>
The ligand is rotated -180.0 -> 180.0, in rotation_stepsize
steps, around each axis. In total, the default settings create 29252
ligamers. The backbone-clash vdW cutoff criterion reduces this number
significantly.
For the maltose example (examples/energy_function/ligands/maltose/maltose.ligand),
the stepsizes and maximum displacement are manually defined, resulting
in a much more coarse sampling than the default values:
MAL GENERATE_LIGAMERS 36 1 2
For this example:
rotation_stepsize = 36.0 degrees
translation_stepsize = 1.0 Å
max_displacement = 2.0 Å
Automatically generated ligamers also include the base coordinates as a
ligamer.
Automatic generation is not an option for ligands with rotatable bonds.
A.2.5.3.2
Ligamer line format
Ligamers may also be directly defined by transform matrices. If
this is
done, the header-line for this section must have the residuetype and
the number 12 (12
"dihedrals" per ligamer) . For example:
(resparam section here)
START VERSION 2
PRT 12
1 0 0 0 0 1 0 1 0 0 1 0
...
STOP
(base coordinate section here)
A structure with rotatable bonds can be defined in a similar manner.
For example:
(resparam section here)
START VERSION 2
PRT 15
1 0 0 0 0 1 0 1 0 0 1 0 90.0 -70.0 180.0
...
STOP
(base coordinate section here)
In this example, there are 15 "dihedrals" (12 transform matrix +
3 actual dihedrals). The dihedral angles are listed after the transform
matrix elements ( 90.0 -70.0 180.0 ). The
rotatable bonds must be defined in the resparam entry as is done with
regular amino acid sidechains.
As discussed above for conventional rotamers,
ligamers can be defined as being position specific. For example:
START
VERSION 2
PRT 15
1 0 0 0 0 1 0
1 0 0 1 0 90.0 -70.0 180.0 2l
1 0 0 5 0 1 0
1 0 0 1 0 180.0 -70.0 60.0 1l
...
STOP
The first ligamer is used only for ligand position 2l, while
the second is used only for 1l. This
option may be useful if multiple copies of the same ligand are
permitted.
The base coordinates are not included as a choice, unless the
stationary transform matrix ligamer is also listed:
1 0 0
0 0 1 0 0 0 0 1 0
A.2.5.4 Base coordinate section
Following the STOP of the ligamer section, a base coordinate entry must
be defined. If the ligand coordinates are listed in the template pdb
file, the coordinates from the template file will supercede this one.
Nevertheless, coordinates must also be defined here.
The base coordinates are a pdb entry, bracketed by START
and END. A final STOP
line marks the end of the file. For example:
(resparam section here)
(ligamer section here)
START
ATOM 60 1HB SYL 4 13.876 19.001 19.708 0.060 1.250
...
ATOM 74 3HZ SYL 4 16.477 20.457 16.826 0.330 0.814
END
STOP
A.2.5.5 Ligamers defined by coordinates
Ligamers and base coordinates can be combined and defined by successive
pdb-style coordinate entries. To indicate this, the header line should
have COORDINATES in place of the "number
of dihedral angles" entry. For example:
(resparam section here)
START VERSION 2
SYL COORDINATES
START
ATOM 160 1HB SYL 10 1.740 11.485 3.560 0.060 1.250
...
END
START
ATOM 60 1HB SYL 4 13.876 19.001 119.708 0.060 1.250
...
END
...
END
STOP
If coordinates are defined, there is no need to define a base
coordinate entry.
Coordinates can also be listed in a separate file:
(resparam section here)
START VERSION 2
SYL COORDINATE_FILE coord_file
STOP
The coordinate file lists coordinates sequentially, bracketed by START and END for
each entry, and terminating with a STOP at
the end of the file, as is done when they are listed in the main ligand
parameter file.
back to Table of Contents