
A.1 Output pdb file format

The header of the output pdb file contains a great deal of information. Many of the entries record the run parameters (TEMPERATURE, pH, JOBTYPE, LOOKUP_TABLE_DIRECTORY, various flags and weights, etc.), most of which are described above.

A.1.1 ENERGY entries

The ENERGY entries have the following format:
ENERGY   type_of_energy: value (in kcal/mol)

E_vdw: van der Waals energy of the structure
E_coulomb: Coulombic electrostatics energy of the structure
E_1_4: energy between atoms three bonds apart; torsion
E_born: sum of the Born self energies of the atoms
E_pol: the cross-polarization energy of the structure; opposite in direction to E_coulomb; describes environment-dependent shielding of electrostatic interactions
E_sasa: the SASA-dependent energy
E_hbond: the explicit hydrogen-bonding energy; turned off by default

TdS: sidechain entropy of the unfolded state multiplied by the TEMPERATURE
If SCMF was used, this value is the sidechain entropy of the final ensemble minus the sidechain entropy of the unfolded state (all multiplied by TEMPERATURE). The entropies at each position are listed further down in the pdb file; their format is described in the scmf section.

These reference state energies are described in much greater detail in .
E_rss: the energy of the random-sequence-structure unfolded state model
E_specificity: specificity negative design energy; the more favorable this is, the more likely it is that a sequence may adopt non-target conformations
E_solubility: solubility negative design energy; simulated energy of this protein in a folded aggregate

E_working: the energy that was actually optimized. For single-sequence rotamer optimization, SPECIFICITY_FLAG is set to 0, so these energies are not calculated during the optimization. For all calculations, pseudoatoms are used to approximate SASA and Born radii, so this energy includes these approximations.

These energies summarize the energies from above:
E_structure: the energy of this structure, without any reference state energies; sum of the energies listed above, E_vdw through E_hbond
E_reference: the total energy of non-native states, including the unfolded state, aggregates, and non-target conformations; E_rss + E_specificity + E_solubility
E_total: E_structure - E_reference

The following energies are scaled by OVERALL_ENERGY_SCALE, an empirically determined scale factor that results in quantitative prediction of mutant stabilities and protein-protein affinities (see Pokala and Handel 2005 for details):
E_folded: OVERALL_ENERGY_SCALE*E_structure
E_unfolded: OVERALL_ENERGY_SCALE*E_rss
Pseudo_DELTA_G_folding: E_folded - E_unfolded
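
The relationships among the summary energies can be checked with a short script. The following is a minimal Python sketch, not part of the program itself. It assumes that each header line looks exactly like the format line above ("ENERGY   name: value"), and the function names read_energies and summarize are hypothetical. The value of OVERALL_ENERGY_SCALE is not assumed here; it must be supplied by the caller.

```python
import re

# Assumed line shape, taken from the format line above: "ENERGY   name: value".
ENERGY_LINE = re.compile(r"^ENERGY\s+(\w+):\s*(\S+)")

# Terms summed into E_structure (E_vdw through E_hbond) and E_reference.
STRUCTURE_TERMS = ("E_vdw", "E_coulomb", "E_1_4", "E_born",
                   "E_pol", "E_sasa", "E_hbond")
REFERENCE_TERMS = ("E_rss", "E_specificity", "E_solubility")


def read_energies(lines):
    """Collect {name: value in kcal/mol} from the ENERGY header lines."""
    energies = {}
    for line in lines:
        match = ENERGY_LINE.match(line)
        if not match:
            continue
        try:
            energies[match.group(1)] = float(match.group(2))
        except ValueError:
            # Not a numeric ENERGY entry; skip it rather than guess.
            continue
    return energies


def summarize(energies, overall_energy_scale):
    """Recompute the summary energies from their components.

    Raises KeyError if a component is missing from the header.
    """
    e_structure = sum(energies[t] for t in STRUCTURE_TERMS)
    e_reference = sum(energies[t] for t in REFERENCE_TERMS)
    e_folded = overall_energy_scale * e_structure
    e_unfolded = overall_energy_scale * energies["E_rss"]
    return {
        "E_structure": e_structure,
        "E_reference": e_reference,
        "E_total": e_structure - e_reference,
        "E_folded": e_folded,
        "E_unfolded": e_unfolded,
        "Pseudo_DELTA_G_folding": e_folded - e_unfolded,
    }
```

Comparing the recomputed values with the ones printed in the header is a quick sanity check on a parser; small differences from rounding in the printed values are to be expected.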

Some of the SASAs are broken down by type of atom and are self-explanatory. The others:
sasa_hphob: hydrophobic SASA; sasa_sp3_S + sasa_sp2 + sasa_H
sasa_total: total SASA
fraction_sasa_hphob: fraction hydrophobic SASA; sasa_hphob/sasa_total
transfer_free_energy_density: water-octanol transfer free energy/sasa_total; defunct
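
The two derived SASA quantities follow directly from the definitions above. As a minimal Python sketch (the function name hydrophobic_sasa_summary is hypothetical, and the zero-area guard is an added assumption, not documented behavior):

```python
def hydrophobic_sasa_summary(sasa_sp3_S, sasa_sp2, sasa_H, sasa_total):
    """Return (sasa_hphob, fraction_sasa_hphob) from the per-type SASAs."""
    sasa_hphob = sasa_sp3_S + sasa_sp2 + sasa_H
    # Guard against an empty structure; this choice is an assumption.
    fraction = sasa_hphob / sasa_total if sasa_total > 0 else 0.0
    return sasa_hphob, fraction
```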

The SEARCH_SPACE entries describe the size of the search space.

A.1.2 ROTAMER entries

The ROTAMER entries describe the sidechain conformations as follows:
ROTAMER seq_position position_type one_letter_code χ1 ... χn

The position_type entry has two parts separated by a period ".". The first part describes whether a position is classified as core (c), surface (s), interfacial (i), or is a ligand (l); these definitions are as described in . The second part describes whether a position was allowed to change rotamer (rot), sequence (des), or was kept fixed to the conformation from the template structure (fix). For example:
ROTAMER 93 s.rot T -173.8

Thr 93 is a surface (s) position whose rotamer was allowed to change (rot). Its χ1 is -173.8.
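
A ROTAMER entry can be split into its parts as sketched below in Python. This is an illustration only: the RotamerEntry class and the parse_rotamer function are hypothetical names, the fields are assumed to be whitespace-separated as in the example, and seq_position is assumed to be a plain integer.

```python
from dataclasses import dataclass
from typing import List

# Codes documented for the two halves of position_type.
LOCATION_CODES = {"c": "core", "s": "surface", "i": "interfacial", "l": "ligand"}
MOBILITY_CODES = {
    "rot": "rotamer allowed to change",
    "des": "sequence allowed to change",
    "fix": "fixed to template conformation",
}


@dataclass
class RotamerEntry:
    seq_position: int
    location_code: str      # one of LOCATION_CODES
    mobility_code: str      # one of MOBILITY_CODES
    one_letter_code: str
    angles: List[float]     # trailing dihedral values; may be empty


def parse_rotamer(line):
    """Parse one 'ROTAMER seq_position position_type code angles...' line."""
    fields = line.split()
    if len(fields) < 4 or fields[0] != "ROTAMER":
        raise ValueError("not a ROTAMER entry: %r" % line)
    location, sep, mobility = fields[2].partition(".")
    if not sep or location not in LOCATION_CODES or mobility not in MOBILITY_CODES:
        raise ValueError("unrecognized position_type: %r" % fields[2])
    return RotamerEntry(
        seq_position=int(fields[1]),
        location_code=location,
        mobility_code=mobility,
        one_letter_code=fields[3],
        angles=[float(value) for value in fields[4:]],
    )
```

For the example entry, parse_rotamer("ROTAMER 93 s.rot T -173.8") yields position 93, location code "s", mobility code "rot", residue "T", and a single angle.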

ROTAMER entries in an output pdb file may be used as user-defined rotamers in an inputfile.

A.1.3 SEQUENCE entry

The SEQUENCE entry lists the sequence. If there are missing residues, they are indicated by ".".

For single-chain structures without missing residues, the molecular weight and the extinction coefficient at 280 nm are also calculated. The masses are also given for the sequence with an additional N-terminal Met, or with the GA tail that remains at the N-terminus after a tag is cleaved with TEV protease (assuming expression in the pSV series of E. coli expression vectors).

If positions were allowed to change their amino-acid identity, a compressed sequence is listed as VARIABLE_SEQUENCE. The residues listed are only for those variable positions. For example, if a design permitted only positions 10, 15, 20, and 50 to vary, and the optimal solution has Thr10, Val15, Gln20, and Arg50, this entry would list:
VARIABLE_SEQUENCE        TVQR
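
The compression can be illustrated with a short Python sketch. The helper names are hypothetical, and the ordering of residues by increasing sequence position is an assumption inferred from the example above.

```python
def variable_sequence(residue_at, variable_positions):
    """Build the compressed string from a {position: one_letter_code} map.

    Assumes residues are listed in order of increasing position.
    """
    return "".join(residue_at[pos] for pos in sorted(variable_positions))


def expand_variable_sequence(compressed, variable_positions):
    """Inverse operation: map each variable position back to its residue."""
    positions = sorted(variable_positions)
    if len(compressed) != len(positions):
        raise ValueError("sequence length does not match number of positions")
    return dict(zip(positions, compressed))
```

With positions 10, 15, 20, and 50 holding T, V, Q, and R, variable_sequence returns "TVQR", matching the entry shown above.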

A.1.4 CHARGE, HBOND, CLASH, LENGTH, and CENTROID entries

The predicted overall CHARGE at the pH of the calculation is also given, assuming that ionizable groups do not have altered pKs.

The HBOND entries describe the hydrogen-bonding properties of the structure. The number_of_hbonds entry lists the number of hydrogen bonds; these are broken down into total, side-side, side-backbone, and backbone-backbone. The number_of_unsatisfied_hbond_groups entry lists the number of unsatisfied hydrogen-bonding atoms; satisfaction is defined as described in .
For example:
HBOND number_of_hbonds: 40 0 2 38 total side-side side-bkbn bkbn-bkbn
HBOND number_of_unsatisfied_hbond_groups: 25 7 18 total side bkbn
This structure has 40 hydrogen bonds in total. There are 0 between sidechain atoms, 2 between sidechain and backbone atoms, and 38 between backbone atoms. This structure has 25 unsatisfied hydrogen-bonding atoms in total. There are 7 sidechain atoms that are unsatisfied, while 18 backbone atoms are unsatisfied.
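
Because each HBOND line carries its own column labels, the counts can be paired with the labels mechanically. The Python sketch below assumes the layout of the two example lines (counts first, then the same number of labels); parse_hbond and counts_are_consistent are hypothetical names.

```python
def parse_hbond(line):
    """Return (entry_name, {label: count}) for one HBOND line."""
    head, sep, rest = line.partition(":")
    head_fields = head.split()
    if not sep or len(head_fields) != 2 or head_fields[0] != "HBOND":
        raise ValueError("not an HBOND entry: %r" % line)
    tokens = rest.split()
    half = len(tokens) // 2
    counts, labels = tokens[:half], tokens[half:]
    if half == 0 or len(counts) != len(labels):
        raise ValueError("counts and labels do not pair up: %r" % line)
    return head_fields[1], dict(zip(labels, (int(c) for c in counts)))


def counts_are_consistent(counts):
    """Check that 'total' equals the sum of the other categories."""
    others = sum(v for k, v in counts.items() if k != "total")
    return counts["total"] == others
```

For the first example line this gives the name number_of_hbonds with total 40, side-side 0, side-bkbn 2, and bkbn-bkbn 38, and the consistency check passes for both example lines.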

Unsatisfied atoms are marked with a "!" as discussed below.

The CLASH entries describe clashes in the structure. The number of clashes is reported. For each clash, there is an entry:
atom1(seq residue name) atom2(seq residue name) r E_lennard_jones E_vdw_working
where r is the distance between the atoms, E_lennard_jones is the vdW energy of the atom pair without modifications, and E_vdw_working is the vdW energy of the atom pair including the modifications to the Lennard-Jones function. A pair is counted as a clash if E_vdw_working > VDW_CLASH_ENERGY. For example:
CLASH 173 ASN O 173 ASN CG 2.503237 15.050219 10.297471
The O atom of ASN 173 forms a clash with CG of ASN 173. The atoms are 2.503237 Å apart. Their energy with the unmodified Lennard-Jones potential is 15.050219 kcal/mol, while their energy with the softened vdW potential is 10.297471 kcal/mol.
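
The per-clash entries can be read with a few lines of Python. This sketch assumes the ten whitespace-separated fields of the example line; the Clash type and the function names are hypothetical. Lines that begin with CLASH but do not have this shape (such as the line reporting the number of clashes, whose exact format is not shown here) are skipped.

```python
from typing import NamedTuple, Tuple


class Clash(NamedTuple):
    atom1: Tuple[int, str, str]   # (seq, residue, atom name)
    atom2: Tuple[int, str, str]
    r: float                      # distance in Angstroms
    e_lennard_jones: float        # unmodified Lennard-Jones energy, kcal/mol
    e_vdw_working: float          # softened vdW energy, kcal/mol


def parse_clash(line):
    """Parse one per-clash entry laid out like the example line."""
    f = line.split()
    if len(f) != 10 or f[0] != "CLASH":
        raise ValueError("not a per-clash entry: %r" % line)
    return Clash(
        atom1=(int(f[1]), f[2], f[3]),
        atom2=(int(f[4]), f[5], f[6]),
        r=float(f[7]),
        e_lennard_jones=float(f[8]),
        e_vdw_working=float(f[9]),
    )


def read_clashes(lines):
    """Collect every line that parses as a per-clash entry."""
    clashes = []
    for line in lines:
        if not line.startswith("CLASH"):
            continue
        try:
            clashes.append(parse_clash(line))
        except ValueError:
            continue  # e.g. a summary line with a different layout
    return clashes
```

Counting the entries returned by read_clashes for two structures gives the difference in the number of clashes between them.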

As discussed previously, energy differences between two structures whose numbers of clashes differ by more than two are not predicted reliably.

The LENGTH entry has the longest inter-atom distance (Å) in the structure. The CENTROID entry lists the x,y,z coordinates of the centroid of the structure.
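
Both quantities are simple to recompute from the coordinates. The Python sketch below is illustrative only: it assumes the centroid is the unweighted mean of all atom positions (the text does not say whether any weighting is applied), and it uses a brute-force search over all atom pairs, which is adequate for a sketch but quadratic in the number of atoms.

```python
import math
from itertools import combinations


def centroid(coords):
    """Unweighted mean of a non-empty list of (x, y, z) positions."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def longest_distance(coords):
    """Largest distance between any two positions (0.0 for fewer than two)."""
    return max((math.dist(a, b) for a, b in combinations(coords, 2)),
               default=0.0)
```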

A.1.5 ATOM entries

The ATOM entries are in traditional pdb format, with a few modifications and additions. The SASA of the atom is listed in the occupancy column. The Born radius is listed in the B-factor column. The number that follows is the energy of the atom, including interactions with other atoms and solvation. If an atom has not satisfied its hydrogen-bonding potential, it is marked with a "!". For example:
ATOM 757 1HZ LYS 50 17.482 16.702 28.616 0.00 1.76 -2.608348 !
This atom has a SASA of 0.00 Å² and a Born radius of 1.76 Å, and has interactions (including solvation) with an energy of -2.608348 kcal/mol. The "!" indicates that its hydrogen-bonding potential is not satisfied; it is buried and does not form any hydrogen bonds.

For the last atom in a residue (indicated by an "@"), there are four additional entries. The first is the energy of the sidechain, followed by the energy of the entire residue, including backbone atoms. The next two numbers are the SASA of the sidechain, followed by the SASA of the entire residue, including backbone atoms. For example:
ATOM 713 OD2 ASP 47 20.371 7.323 23.443 36.79 1.63 -13.172937 @ -24.334115 2.313291 89.453524 99.383265
This atom has a SASA of 36.79 Å², a Born radius of 1.63 Å, and an energy of -13.172937 kcal/mol. The entire sidechain has an energy of -24.334115 kcal/mol. The energy of the residue, including the backbone, is 2.313291 kcal/mol. The SASA of the sidechain is 89.453524 Å², while the SASA of the residue, including the backbone, is 99.383265 Å². Since there is no "!", this atom's hydrogen bonding is satisfied.
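
The extended ATOM lines can be read as sketched below in Python. Several assumptions are made and should be checked against real output files: fields are taken to be whitespace-separated as in the two examples (standard pdb files are fixed-column and may carry a chain identifier, which this sketch does not handle), the per-atom energy is assumed to be present, and the "!" marker is accepted anywhere after the energy. The function name parse_atom and the dictionary keys are hypothetical.

```python
RESIDUE_KEYS = ("sidechain_energy", "residue_energy",
                "sidechain_sasa", "residue_sasa")


def parse_atom(line):
    """Parse one extended ATOM line into a dictionary."""
    f = line.split()
    if len(f) < 11 or f[0] != "ATOM":
        raise ValueError("not an extended ATOM entry: %r" % line)
    extra = f[11:]
    atom = {
        "serial": int(f[1]),
        "name": f[2],
        "residue": f[3],
        "seq": int(f[4]),
        "xyz": (float(f[5]), float(f[6]), float(f[7])),
        "sasa": float(f[8]),          # occupancy column, square Angstroms
        "born_radius": float(f[9]),   # B-factor column, Angstroms
        "energy": float(f[10]),       # kcal/mol
        "hbond_unsatisfied": "!" in extra,
        "residue_summary": None,      # filled in only for "@" lines
    }
    # Drop the "!" marker so it cannot be mistaken for a number below.
    extra = [token for token in extra if token != "!"]
    if "@" in extra:
        start = extra.index("@") + 1
        values = [float(v) for v in extra[start:start + 4]]
        if len(values) != 4:
            raise ValueError("expected four values after '@': %r" % line)
        atom["residue_summary"] = dict(zip(RESIDUE_KEYS, values))
    return atom
```

Applied to the two example lines, the first yields hbond_unsatisfied set to True with no residue summary, and the second yields hbond_unsatisfied set to False together with the four per-residue values.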

A.1.6 Output pdb file options

If OUTPUT_ENERGY_PER_ATOM_FLAG is set to 0 (default 1), the atom energies that are listed in the pdb file are not calculated.

If OUTPUT_COORD_FLAG is set to 0, the coordinates are not printed to the file. For user-launched jobs, the default is 1 (i.e., coordinates are printed). However, for scanning mutagenesis and multistate sequence optimization, the default is set to 0 for the intermediary files (which are deleted anyway). Writing a structure file can take a significant amount of time. For scanning mutagenesis, if the files for the individual mutants are desired, set CLEAN_UP_FLAG 0 and OUTPUT_COORD_FLAG 1.

A.1.7 Implementation and use

The function that creates the pdb file is output_stuff.cpp: output_PROTEIN.

The energies and surface areas may be read from a pdb file into ENERGY and SASA_SUM structs by
rotamer_calc_master.cpp: get_ENERGY_and_SASA_SUM_from_egad_pdb_file.

A CHROMOSOME struct corresponding to a given pdb file may be generated by rotamer_calc_master.cpp: pdbfile_to_CHROMOSOME. The resulting CHROMOSOME will have the sequence and rotamers that correspond to the structure file.
