A.1 Output pdb file format
The header of the output pdb file contains a great deal of information.
Much of it records the run parameters (TEMPERATURE,
pH, JOBTYPE, LOOKUP_TABLE_DIRECTORY, various flags and
weights, etc.), most of which are described above.
A.1.1 ENERGY entries
The ENERGY entries are defined as follows:
ENERGY type_of_energy: value (in kcal/mol)
E_vdw: van der Waals energy of the structure
E_coulomb: Coulombic electrostatics energy
of the structure
E_1_4: energy between atoms three bonds
apart; torsion
E_born: sum of the Born self energies of
the atoms
E_pol: the cross-polarization energy of the
structure; opposite in direction to E_coulomb;
describes environment-dependent shielding of electrostatic interactions
E_sasa: the SASA-dependent energy
E_hbond: the explicit hydrogen-bonding
energy; turned off by default
TdS: sidechain entropy of the unfolded
state multiplied by the TEMPERATURE
If SCMF was used, this value is the sidechain entropy of the final
ensemble minus the sidechain entropy of the unfolded state (all
multiplied by TEMPERATURE). The entropies
at each position are listed further down in the pdb file; their format
is described in the scmf section.
These reference state energies are described in much greater detail in .
E_rss: the energy of the
random-sequence-structure unfolded state model
E_specificity: specificity negative design
energy; the more favorable this is, the more likely it is that a
sequence may adopt non-target conformations
E_solubility: solubility negative design
energy; simulated energy of this protein in a folded aggregate
E_working: the energy that was actually
optimized. For single-sequence rotamer optimization, SPECIFICITY_FLAG is set to 0, so these energies are not calculated during
the optimization. For all calculations, pseudoatoms are used to
approximate SASA and Born radii, so this energy includes these
approximations.
These energies summarize the energies from above:
E_structure: the energy of this structure,
without any reference state energies; sum of the energies listed above,
E_vdw through E_hbond
E_reference: the total energy of non-native
states, including the unfolded state, aggregates, and non-target
conformations; E_rss + E_specificity +
E_solubility
E_total: E_structure - E_reference
The following energies are scaled by OVERALL_ENERGY_SCALE,
an empirically determined scale factor that results in quantitative
prediction of mutant stabilities and protein-protein affinities (see Pokala and Handel 2005
for details):
E_folded: OVERALL_ENERGY_SCALE*E_structure
E_unfolded: OVERALL_ENERGY_SCALE*E_rss
Pseudo_DELTA_G_folding: E_folded - E_unfolded
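The summary relationships above can be checked from the individual terms. The following is a minimal sketch, not part of the program itself: it assumes each header line looks like "ENERGY name: value", and the function names read_energies and summarize are hypothetical. The scale factor passed to summarize must be the OVERALL_ENERGY_SCALE from the run; the value used in any example is only a placeholder.

```python
import re

# Assumed line shape: "ENERGY <name>: <value>". Real files may space this differently.
ENERGY_LINE = re.compile(r"^ENERGY\s+(\w+):\s*(\S+)")

# Terms summed into E_structure (E_vdw through E_hbond) and into E_reference.
STRUCTURE_TERMS = ["E_vdw", "E_coulomb", "E_1_4", "E_born", "E_pol", "E_sasa", "E_hbond"]
REFERENCE_TERMS = ["E_rss", "E_specificity", "E_solubility"]


def read_energies(lines):
    """Collect ENERGY entries from header lines into a dict of floats."""
    energies = {}
    for line in lines:
        match = ENERGY_LINE.match(line)
        if not match:
            continue
        try:
            energies[match.group(1)] = float(match.group(2))
        except ValueError:
            pass  # skip entries whose value is not a plain number
    return energies


def summarize(energies, overall_energy_scale):
    """Recompute the summary energies from the individual terms."""
    e_structure = sum(energies.get(name, 0.0) for name in STRUCTURE_TERMS)
    e_reference = sum(energies.get(name, 0.0) for name in REFERENCE_TERMS)
    e_folded = overall_energy_scale * e_structure
    e_unfolded = overall_energy_scale * energies.get("E_rss", 0.0)
    return {
        "E_structure": e_structure,
        "E_reference": e_reference,
        "E_total": e_structure - e_reference,
        "E_folded": e_folded,
        "E_unfolded": e_unfolded,
        "Pseudo_DELTA_G_folding": e_folded - e_unfolded,
    }
```

Comparing these recomputed values with the summary entries written in the file is a quick consistency check when post-processing many output files.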
Some of the SASA entries are broken down by atom type and are
self-explanatory. The others are:
sasa_hphob: hydrophobic SASA; sasa_sp3_S + sasa_sp2 + sasa_H
sasa_total: total SASA
fraction_sasa_hphob: fraction hydrophobic
SASA; sasa_hphob/sasa_total
transfer_free_energy_density: water-octanol
transfer free energy/sasa_total; defunct
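The two derived SASA quantities are simple arithmetic on the per-type values. A small sketch, assuming the individual areas have already been read from the file as numbers; the function name is hypothetical:

```python
def hydrophobic_sasa_summary(sasa_sp3_S, sasa_sp2, sasa_H, sasa_total):
    """Return (sasa_hphob, fraction_sasa_hphob) from the per-type areas."""
    sasa_hphob = sasa_sp3_S + sasa_sp2 + sasa_H
    # Guard against an empty structure, where the fraction is undefined.
    fraction = sasa_hphob / sasa_total if sasa_total > 0 else 0.0
    return sasa_hphob, fraction
```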
The SEARCH_SPACE entries describe the size
of the search space.
A.1.2 ROTAMER entries
The ROTAMER entries describe the sidechain
conformations as follows:
ROTAMER seq_position position_type one_letter_code χ1 ... χn
The position_type entry has two parts
separated by a period ".". The first part
describes whether a position is classified as core (c),
surface (s), interfacial (i), or is a ligand (l);
these definitions are as described in . The second part describes
whether a position was allowed to change rotamer (rot),
sequence (des), or was kept fixed to the
conformation from the template structure (fix).
For example:
ROTAMER 93 s.rot T -173.8
Thr 93 is a surface (s) position whose
rotamer was allowed to change (rot). Its χ1 is -173.8°.
ROTAMER entries in an output pdb file may
be used as user-defined rotamers in an
input file.
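A ROTAMER entry can be split into its fields as sketched below. This is an illustration under assumptions: fields are taken to be whitespace-separated, seq_position is taken to be a plain integer, and the names RotamerEntry and parse_rotamer are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Codes for the two halves of position_type, as described in the text.
LOCATION = {"c": "core", "s": "surface", "i": "interfacial", "l": "ligand"}
FREEDOM = {"rot": "rotamer", "des": "sequence", "fix": "fixed"}


@dataclass
class RotamerEntry:
    seq_position: int
    location: str          # core, surface, interfacial, or ligand
    freedom: str           # what was allowed to change at this position
    one_letter_code: str
    dihedrals: List[float]  # sidechain dihedral values after the one-letter code


def parse_rotamer(line: str) -> RotamerEntry:
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "ROTAMER":
        raise ValueError("not a ROTAMER entry: %r" % line)
    location_code, _, freedom_code = tokens[2].partition(".")
    return RotamerEntry(
        seq_position=int(tokens[1]),
        location=LOCATION[location_code],
        freedom=FREEDOM[freedom_code],
        one_letter_code=tokens[3],
        # Residues without sidechain dihedrals simply yield an empty list.
        dihedrals=[float(t) for t in tokens[4:]],
    )
```

For the example entry above, parse_rotamer returns position 93, a surface position whose rotamer was free to change, residue T, and a single dihedral of -173.8.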
A.1.3 SEQUENCE entry
The SEQUENCE entry lists the sequence.
Missing residues are indicated by ".".
For single-chain structures without missing residues, the molecular
weight and extinction coefficient at 280 nm are also calculated.
Masses are also given for the sequence with an additional N-terminal Met,
and for the sequence with the GA tail left at the N-terminus after cleavage
of a tag with TEV protease (assuming expression in the pSV series of
E. coli expression vectors).
If positions were allowed to change their amino-acid identity, a
compressed sequence is listed as VARIABLE_SEQUENCE.
The residues listed are only for those variable positions. For example,
if a design permitted only positions 10, 15, 20, and 50 to vary, and
the optimal solution has Thr10, Val15, Gln20, and Arg50, this entry
would list:
VARIABLE_SEQUENCE
TVQR
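Because VARIABLE_SEQUENCE lists only the residues, it has to be paired with the list of variable positions from the design setup to recover which residue sits where. A minimal sketch, assuming the residues appear in the same order as the positions; the function name is hypothetical:

```python
def expand_variable_sequence(variable_positions, variable_sequence):
    """Map each variable position to its one-letter residue code."""
    if len(variable_positions) != len(variable_sequence):
        raise ValueError("number of positions and residues differ")
    return dict(zip(variable_positions, variable_sequence))
```

For the example above, expand_variable_sequence([10, 15, 20, 50], "TVQR") pairs position 10 with T, 15 with V, 20 with Q, and 50 with R.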
A.1.4 CHARGE, HBOND, CLASH, LENGTH, and CENTROID entries
The predicted overall CHARGE at the pH of
the calculation is also given, assuming that ionizable groups do not
have altered pKs.
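To illustrate what a charge estimate with unaltered pKs involves, the sketch below sums Henderson-Hasselbalch fractional charges over the ionizable groups of a single chain. It is not the program's own routine: the pK values are typical textbook-style numbers chosen for illustration, all cysteines are assumed to be free, and the name net_charge is hypothetical.

```python
# Illustrative model pK values (assumptions, not taken from the program).
POSITIVE_PK = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PK = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERMINUS_PK = 8.0
C_TERMINUS_PK = 3.1


def _positive_fraction(pk, ph):
    """Fraction of a basic group that is protonated (charge +1)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pk))


def _negative_fraction(pk, ph):
    """Fraction of an acidic group that is deprotonated (charge -1)."""
    return 1.0 / (1.0 + 10.0 ** (pk - ph))


def net_charge(sequence, ph):
    """Estimate the net charge of one chain at the given pH."""
    charge = _positive_fraction(N_TERMINUS_PK, ph) - _negative_fraction(C_TERMINUS_PK, ph)
    for residue in sequence:
        if residue in POSITIVE_PK:
            charge += _positive_fraction(POSITIVE_PK[residue], ph)
        elif residue in NEGATIVE_PK:
            charge -= _negative_fraction(NEGATIVE_PK[residue], ph)
    return charge
```

Groups whose pK is shifted by the protein environment will deviate from such an estimate, which is the limitation the assumption above refers to.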
The HBOND entries describe the
hydrogen-bonding properties of the structure. The number_of_hbonds
entry lists the number of hydrogen bonds; these are broken down into
total, side-side, side-backbone, and backbone-backbone. The number_of_unsatisfied_hbond_groups entry lists
the number of unsatisfied hydrogen-bonding atoms; satisfaction is
defined as described in .
For example:
HBOND number_of_hbonds: 40 0 2 38 total side-side side-bkbn bkbn-bkbn
HBOND number_of_unsatisfied_hbond_groups: 25 7 18 total side bkbn
This structure has 40
hydrogen bonds in total. There are 0
between sidechain atoms, 2 between
sidechain and backbone atoms, and 38
between backbone atoms. This structure has 25
unsatisfied hydrogen-bonding atoms in total. There are 7 sidechain atoms that are unsatisfied, while 18 backbone atoms are unsatisfied.
Unsatisfied atoms are marked with a "!" as
discussed below.
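The HBOND counts can be read into named fields as sketched here. The sketch assumes the counts follow the colon as whitespace-separated integers and that any trailing labels can be ignored; parse_hbond and the field names are hypothetical.

```python
# Field names for each kind of HBOND entry, in the order the counts are listed.
HBOND_FIELDS = {
    "number_of_hbonds": ("total", "side_side", "side_bkbn", "bkbn_bkbn"),
    "number_of_unsatisfied_hbond_groups": ("total", "side", "bkbn"),
}


def parse_hbond(line):
    """Return (entry_name, {field: count}) for one HBOND line."""
    head, _, rest = line.partition(":")
    head_tokens = head.split()
    if len(head_tokens) != 2 or head_tokens[0] != "HBOND":
        raise ValueError("not an HBOND entry: %r" % line)
    name = head_tokens[1]
    counts = []
    for token in rest.split():
        try:
            counts.append(int(token))
        except ValueError:
            break  # reached the trailing text labels
    fields = HBOND_FIELDS[name]
    if len(counts) != len(fields):
        raise ValueError("unexpected number of counts in %r" % line)
    return name, dict(zip(fields, counts))
```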
The CLASH entries describe clashes in the
structure. The number of clashes is reported. For each clash, there is
an entry:
atom1(seq residue name) atom2(seq residue name) r E_lennard_jones E_vdw_working
where r is the distance
between the atoms (Å), E_lennard_jones is the vdW
energy of the atom pair without modifications, and E_vdw_working is the vdW energy of the
atom pair including modifications to the Lennard-Jones function. As the
example shows, each atom is written as its sequence position, residue
name, and atom name. A
clash is defined if E_vdw_working > VDW_CLASH_ENERGY. For example:
CLASH 173 ASN O 173 ASN CG 2.503237 15.050219 10.297471
The O atom of ASN 173 forms a
clash with CG of ASN
173. The atoms are 2.503237
Å apart. Their energy with the unmodified Lennard-Jones
potential is 15.050219 kcal/mol, while
their energy with the softened vdW potential is 10.297471
kcal/mol.
As discussed previously, energy differences between two structures whose
numbers of clashes differ by more than two are not predicted
reliably.
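A CLASH entry, and the clash-count caveat above, can be handled as in the following sketch. It assumes whitespace-separated fields in the order shown in the example; Clash, parse_clash, and comparison_is_reliable are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Clash:
    seq1: int
    residue1: str
    atom1: str
    seq2: int
    residue2: str
    atom2: str
    distance: float          # r, in Angstroms
    e_lennard_jones: float   # unmodified Lennard-Jones energy, kcal/mol
    e_vdw_working: float     # energy with the modified vdW function, kcal/mol


def parse_clash(line):
    tokens = line.split()
    if len(tokens) != 10 or tokens[0] != "CLASH":
        raise ValueError("not a per-clash CLASH entry: %r" % line)
    return Clash(
        int(tokens[1]), tokens[2], tokens[3],
        int(tokens[4]), tokens[5], tokens[6],
        float(tokens[7]), float(tokens[8]), float(tokens[9]),
    )


def comparison_is_reliable(n_clashes_a, n_clashes_b):
    """Energy differences are not reliable if clash counts differ by more than two."""
    return abs(n_clashes_a - n_clashes_b) <= 2
```

Checking comparison_is_reliable before comparing the energies of two output structures applies the caveat stated above.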
The LENGTH entry has the longest
inter-atom distance (Å) in the structure. The CENTROID entry lists the x,y,z coordinates of
the centroid of the structure.
A.1.5 ATOM entries
The ATOM entries are in traditional pdb
format, with a few modifications and additions. The SASA of the atom is
listed in the occupancy column. The Born radius is listed in the
B-factor column. The number that follows is the energy of the atom,
including interactions with other atoms and solvation. If an atom has
not satisfied its hydrogen-bonding potential, it is marked with a "!". For example:
ATOM 757 1HZ LYS 50 17.482 16.702 28.616 0.00 1.76 -2.608348 !
This atom has a SASA of 0.00 Å², a Born radius of 1.76 Å, and
interactions (including solvation) with an energy of -2.608348 kcal/mol. The "!"
indicates that its hydrogen-bonding potential is not satisfied; it is buried and
does not form any hydrogen bonds.
For the last atom in a residue (indicated by an "@"),
there are four additional entries. The first is the energy of the
sidechain, followed by the energy of the entire residue, including
backbone atoms. The next two numbers are the SASA of the sidechain,
followed by the SASA of the entire residue, including backbone atoms.
For example:
ATOM 713 OD2 ASP 47 20.371 7.323 23.443 36.79 1.63 -13.172937 @ -24.334115 2.313291 89.453524 99.383265
This atom has a SASA of 36.79 Å², a Born radius of 1.63 Å, and an
energy of -13.172937 kcal/mol. The entire sidechain has an energy of
-24.334115 kcal/mol. The energy of the residue, including the backbone,
is 2.313291 kcal/mol. The SASA of the sidechain is 89.453524 Å², while
the SASA of the residue, including the backbone, is 99.383265 Å².
Since there is no "!", this atom's hydrogen bonding is satisfied.
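The extra fields of an ATOM entry can be pulled out as sketched below. This is an illustration under assumptions: it splits on whitespace, which works for the examples shown here but can fail on real fixed-column pdb lines where adjacent fields touch, and it expects the per-atom energy to be present (OUTPUT_ENERGY_PER_ATOM_FLAG left at 1). The name parse_egad_atom is hypothetical.

```python
RESIDUE_FIELDS = ("sidechain_energy", "residue_energy", "sidechain_sasa", "residue_sasa")


def parse_egad_atom(line):
    """Extract coordinates, SASA, Born radius, energy, and flags from an ATOM line."""
    tokens = line.split()
    if not tokens or tokens[0] != "ATOM":
        raise ValueError("not an ATOM entry: %r" % line)

    # "!" marks an atom whose hydrogen-bonding potential is unsatisfied.
    unsatisfied = "!" in tokens
    tokens = [t for t in tokens if t != "!"]

    # "@" marks the last atom of a residue and is followed by four residue-level values.
    residue = None
    if "@" in tokens:
        at = tokens.index("@")
        extras = [float(t) for t in tokens[at + 1:at + 5]]
        if len(extras) != 4:
            raise ValueError("expected four values after '@' in %r" % line)
        residue = dict(zip(RESIDUE_FIELDS, extras))
        tokens = tokens[:at]

    if len(tokens) < 7:
        raise ValueError("too few fields in %r" % line)
    # The last six numbers are x, y, z, SASA (occupancy column),
    # Born radius (B-factor column), and the atom energy.
    x, y, z, sasa, born_radius, energy = (float(t) for t in tokens[-6:])
    return {
        "xyz": (x, y, z),
        "sasa": sasa,
        "born_radius": born_radius,
        "energy": energy,
        "hbond_unsatisfied": unsatisfied,
        "residue": residue,
    }
```

A parser that reads the standard pdb columns for the coordinates and splits only the remainder of the line would be more robust, but the column positions of the additional fields are not specified here.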
A.1.6 Output pdb file options
If OUTPUT_ENERGY_PER_ATOM_FLAG is set
to 0 (default 1),
the atom energies that are listed in the pdb file are not calculated.
If OUTPUT_COORD_FLAG is set to 0, the coordinates are not printed to
the file. For user-launched jobs, the default is 1;
i.e., coordinates are printed. However, for scanning mutagenesis and
multistate sequence optimization, the default is 0 for the intermediary files (which are deleted
anyway). Writing a structure file can take a significant amount of
time. For scanning mutagenesis, if the files for the individual mutants
are desired, set CLEAN_UP_FLAG 0 and OUTPUT_COORD_FLAG 1.
A.1.7 Implementation and use
The function that creates the pdb file is output_stuff.cpp:
output_PROTEIN.
The energies and surface areas may be read from a pdb file into ENERGY and SASA_SUM
structs by
rotamer_calc_master.cpp:
get_ENERGY_and_SASA_SUM_from_egad_pdb_file.
A CHROMOSOME struct corresponding to a
given pdb file may be generated by rotamer_calc_master.cpp:
pdbfile_to_CHROMOSOME. The resulting CHROMOSOME
will have the sequence and rotamers that correspond to the structure
file.