2. Descriptions of main, program flow, overview of data-structures
When running EGAD, you must refer to the full path for the executable:
mycomputer % ~myusername/EGAD/bin/EGAD.exe myinputfile.input &
Alternatively, create the appropriate alias in your .aliases file.
Because EGAD jobs can take a while to run, it is recommended that they be
run in the background (&) from the calling shell.
Additional arguments are necessary for parallelization
or batch processing.
2.1 main
The EGAD.cpp: main function parses the command line and passes the
inputfile name to input_stuff.cpp: input_stuff, which parses the
inputfile. input_stuff initializes a PROTEIN data structure, protein,
which acts as the main object within the program. After initialization,
protein is returned to main. Based on the specified jobtype, main passes
protein to the functions that manage the different jobtypes. After the
run is completed, protein is passed to output_stuff.cpp: output_PROTEIN,
which writes the final structure, energies, and other information to a
pdb-style structure file, as well as other files as needed (described
in the pdb file and other sections). Additional command-line parameters
are used for multi-processor parallelization; these are discussed in the
parallelization section.
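The control flow described above can be sketched roughly as follows.
This is a minimal illustration, not EGAD's actual code: the Protein
struct, the *_sketch function names, and the jobtype strings
"example_job_a" and "example_job_b" are hypothetical stand-ins. The
jobtype is passed as a second argument only to keep the sketch
self-contained; in EGAD it is read from the inputfile.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for PROTEIN (the real struct is in structure_types.h).
struct Protein {
    std::string inputfile;
    std::string jobtype;
    bool job_done;
};

// Stand-in for input_stuff: builds and returns the main object.
Protein input_stuff_sketch(const std::string& inputfile,
                           const std::string& jobtype) {
    Protein p;
    p.inputfile = inputfile;
    p.jobtype = jobtype;
    p.job_done = false;
    return p;
}

// Dispatch on jobtype; returns false for an unrecognized jobtype.
// The jobtype names are placeholders, not real EGAD jobtypes.
bool run_job_sketch(Protein& p) {
    if (p.jobtype == "example_job_a" || p.jobtype == "example_job_b") {
        p.job_done = true;  // a real job manager would do the work here
        return true;
    }
    return false;
}

// Stand-in for output_PROTEIN: writes a one-line summary.
void output_protein_sketch(const Protein& p, std::ostream& out) {
    out << "results for " << p.inputfile << " (" << p.jobtype << ")\n";
}

// Mirrors the flow of main: parse, dispatch, write output.
// Returns 0 on success and 1 on a caught input error.
int egad_main_sketch(const std::vector<std::string>& args, std::ostream& out) {
    if (args.size() < 2) {
        std::cerr << "ERROR: expected an inputfile and a jobtype\n";
        return 1;
    }
    Protein protein = input_stuff_sketch(args[0], args[1]);
    if (!run_job_sketch(protein)) {
        std::cerr << "ERROR: unknown jobtype " << protein.jobtype << "\n";
        return 1;
    }
    output_protein_sketch(protein, out);
    return 0;
}
```

Keeping a single object that is created by the parser, handed to one job
manager, and then handed to the output routine matches the description
of protein as the main object within the program.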
2.2 Exit status, errors, I/O errors, and warnings
If EGAD is successful, it exits with a value of 0.
Errors due to inappropriate or incompatible inputs that are caught by
the program cause it to exit with a value of 1.
Error messages are printed to stderr with the "ERROR" or "IO_ERROR"
prefix. Messages are also written to name_of_inputfile.EGAD_MESSAGE_LOG.
In a few functions, errors that are more likely due to a bug in the
calling function than to faulty or incompatible inputs result in an
ERROR line on stderr explaining the problem and a corefile, rather than
a soft exit; the corefile permits the use of a debugger to identify the
location of the offending function.
Warning messages are printed to stderr with the "WARNING" prefix. These
messages are not fatal; they are for the user's information only.
Warnings cover things such as input options that are no longer
supported, template pdb files with missing atoms, non-obvious
assumptions made by the program that the user should be aware of, and
older jobtypes and options that are still supported but should be
replaced with newer ones.
Error messages and warnings, except those resulting in core dumps (most
likely due to a bug), may be turned off by including
QUIET_FLAG 1     # default 0
in the inputfile. By default, slave processes launched by EGAD
include QUIET_FLAG 1.
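As an illustration of the message prefixes and of QUIET_FLAG, the
reporting logic might look something like the sketch below. The
Severity enum, the function names, and the "PREFIX: text" layout are
assumptions made for this example; only the three prefixes and the
quiet behavior come from the description above. Writing to
name_of_inputfile.EGAD_MESSAGE_LOG is not modeled here.

```cpp
#include <iostream>
#include <string>

// Hypothetical message categories matching the documented prefixes.
enum class Severity { Warning, Error, IoError };

// Map a category to the prefix printed on stderr.
std::string prefix_for(Severity s) {
    switch (s) {
        case Severity::Warning: return "WARNING";
        case Severity::Error:   return "ERROR";
        case Severity::IoError: return "IO_ERROR";
    }
    return "ERROR";  // not reached; keeps compilers quiet
}

// Print a prefixed message unless quiet_flag is set.
// Returns true if the message was actually printed.
bool report(Severity s, const std::string& text, bool quiet_flag,
            std::ostream& err = std::cerr) {
    if (quiet_flag) {
        return false;  // QUIET_FLAG 1: suppress errors and warnings
    }
    err << prefix_for(s) << ": " << text << '\n';
    return true;
}
```

For example, report(Severity::Warning, "template is missing atoms", false)
prints a WARNING line, while the same call with quiet_flag set to true
prints nothing.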
File manipulation and access functions in EGAD have wrappers (see io.cpp, io.h, and FUNCTION_LIST).
In general, these functions attempt to perform the desired task 25
times over 50 secs, printing an IO_ERROR message each time an attempt
fails. If the task still cannot be performed, the defined fail value is
returned to the calling function, which can decide whether the problem
is fatal or not. This is useful for overcoming problems due to
networking or collisions between independent processes operating on the
same file. However, if the failure is due to improper file permissions,
the wrapper function will exit the program completely, returning 1 and
printing an appropriate IO_ERROR message. In that case, the indicated
file or directory should have its permissions changed, or the offending
file or inputfile should be modified appropriately (see relevant
sections).
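The retry behavior of the I/O wrappers can be sketched as below. This is
a generic retry loop written under stated assumptions, not the code in
io.cpp: the name retry_io, the callable-based interface, and the even
2-second spacing (25 attempts spread over roughly 50 secs) are
hypothetical. The immediate exit on a permissions failure is not
modeled.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <thread>

// Try `task` up to max_attempts times, pausing `delay` between failures.
// Returns true on success, or false (standing in for the "fail value")
// if every attempt failed; the caller decides whether that is fatal.
bool retry_io(const std::function<bool()>& task,
              const std::string& what,
              int max_attempts = 25,
              std::chrono::milliseconds delay = std::chrono::milliseconds(2000)) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (task()) {
            return true;
        }
        std::cerr << "IO_ERROR: " << what << " failed (attempt "
                  << attempt << " of " << max_attempts << ")\n";
        if (attempt < max_attempts) {
            std::this_thread::sleep_for(delay);  // wait before retrying
        }
    }
    return false;
}
```

Returning a fail value instead of exiting lets the calling function
treat, say, a missing optional file differently from a missing required
one.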
2.3 Premature exit or early program advancement
To exit the program gracefully before it finishes, or to jump to the
next step, create one of the signal files whose names are listed in the
logfile.
touch escape.job_specific_label.pid (pid = process ID number)
will create the "escape hatch" file, and lead to premature program
termination or advancement to the next step. Depending on the job, the
actual escape or jump may not happen instantaneously.
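One plausible way such an escape hatch could be polled is sketched
below. The function names and the loop are hypothetical and are not
taken from EGAD; the file name pattern follows the touch example above,
but the exact names to use should always be taken from the logfile.
Checking only between units of work is one reason an escape might not
take effect instantaneously.

```cpp
#include <fstream>
#include <string>

// Build a name of the form escape.job_specific_label.pid.
std::string escape_file_name(const std::string& job_label, long pid) {
    return "escape." + job_label + "." + std::to_string(pid);
}

// True if the escape file exists and can be opened for reading.
bool escape_requested(const std::string& path) {
    std::ifstream f(path);
    return f.good();
}

// Hypothetical job loop: checks the escape file before each step and
// returns the number of steps actually completed.
int run_steps(int max_steps, const std::string& escape_path) {
    int done = 0;
    while (done < max_steps) {
        if (escape_requested(escape_path)) {
            break;  // user asked to stop or advance
        }
        // ... one unit of work would go here ...
        ++done;
    }
    return done;
}
```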
2.4 Data structure overview
A PROTEIN struct contains all the
information about a given protein template, running parameters for the
jobs that are to be performed on it, any data these jobs may require,
and the results from these jobs (see structure_types.h:
PROTEIN). In principle, multiple PROTEIN
objects may co-exist within the same program simultaneously; however,
this has been tested only with programs that run the same jobs with the
same energy function parameters on the individual PROTEIN
structures. The data-structures within PROTEIN
are discussed in the relevant sections.
Two important structures within PROTEIN
are VARIABLE_POSITION *var_pos and LOOKUP_ENERGY *lookupEnergy. var_pos
contains the information about each residue in the protein, such as
backbone coordinates, classification as core or surface, and the
residues permitted at that position. This structure is discussed
further in the input variable positions section. lookupEnergy
is the pair energy lookup table used during rotamer optimization. It is
built using var_pos as a guiding template.
Both of these structures have pointers to relevant elements of the RESPARAM *resparam (residue parameters) and ROTAMERLIB *rotamerlib (rotamer library) arrays
within PROTEIN for fast access to the
energy function and rotamer data. var_pos
also has pointers to the relevant elements within lookupEnergy.
Moves within EGAD are primarily done in dihedral space. The CHROMOSOME and GENE
(and bkbnGENE
for moving backbone) structs are the primary repositories and conduits
for the dihedral representation of the protein structure. An individual
CHROMOSOME represents a protein structure,
and contains information about the energy, and information for the
optimization method, such as mating frequencies for genetic algorithms.
GENE and bkbnGENE are linked lists within CHROMOSOME that carry the
actual information necessary for building the coordinates, such as
dihedral angles. GENE has pointers to elements of lookupEnergy and
var_pos, permitting rapid access to pair energy lookup table values
during rotamer optimization.
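The relationship between CHROMOSOME and its GENE list can be pictured
with a stripped-down sketch. The struct and field names below
(GeneSketch, ChromosomeSketch, position_index, chi, first_gene) are
assumptions for illustration; the real definitions are in
structure_types.h and carry more data, including the pointers into
lookupEnergy and var_pos.

```cpp
#include <cstddef>

// Hypothetical, simplified gene: one node of a singly linked list.
struct GeneSketch {
    int position_index;   // which variable position this gene describes
    double chi[4];        // side-chain dihedral angles (assumed layout)
    GeneSketch* next;     // next gene in the chromosome, or nullptr
};

// Hypothetical, simplified chromosome: one protein structure.
struct ChromosomeSketch {
    double energy;           // energy of the structure
    GeneSketch* first_gene;  // head of the linked list of genes
};

// Walk the list and count the genes.
std::size_t count_genes(const ChromosomeSketch& c) {
    std::size_t n = 0;
    for (const GeneSketch* g = c.first_gene; g != nullptr; g = g->next) {
        ++n;
    }
    return n;
}

// Find the gene for a given position, or nullptr if there is none.
const GeneSketch* find_gene(const ChromosomeSketch& c, int position_index) {
    for (const GeneSketch* g = c.first_gene; g != nullptr; g = g->next) {
        if (g->position_index == position_index) {
            return g;
        }
    }
    return nullptr;
}
```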
Actual atom coordinates in EGAD are stored in the pdbATOM and
mini_pdbATOM data-structures (defined in structure_types.h). The main
difference between the two is that pdbATOM contains fields for the
atomname, residuetype, atom_number, seqpos_text, and chain_id. pdbATOM
is used almost exclusively for input and output. mini_pdbATOM is the
primary atom data-structure within EGAD. The atom_ptr element within
these structures provides a direct link to the relevant forcefield
parameters (ATOMRESPARAM) for that atom (charge, vdW parameters, bonding
neighbors, etc.). These are discussed further in the energy function
section.
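The difference between the two atom structures can be illustrated with
the sketch below. The field names atomname, residuetype, atom_number,
seqpos_text, chain_id, and atom_ptr come from the text; everything else
(the *Sketch type names, the field types, the x/y/z layout, the two
parameter fields, and the to_mini helper) is assumed for illustration
and may differ from structure_types.h.

```cpp
#include <string>

// Hypothetical stand-in for ATOMRESPARAM (forcefield parameters).
struct AtomResParamSketch {
    double charge;
    double vdw_radius;
};

// Compact atom used for internal calculations.
struct MiniPdbAtomSketch {
    double x, y, z;                      // coordinates (assumed layout)
    const AtomResParamSketch* atom_ptr;  // link to forcefield parameters
};

// Full atom record used mainly for input and output.
struct PdbAtomSketch {
    double x, y, z;
    const AtomResParamSketch* atom_ptr;
    std::string atomname;
    std::string residuetype;
    int atom_number;
    std::string seqpos_text;
    char chain_id;
};

// Drop the labeling fields, keeping coordinates and the parameter link.
MiniPdbAtomSketch to_mini(const PdbAtomSketch& a) {
    return MiniPdbAtomSketch{a.x, a.y, a.z, a.atom_ptr};
}
```

Keeping the text labels out of the internal atom type keeps it small,
which is consistent with mini_pdbATOM being the primary atom
data-structure and pdbATOM being reserved for input and output.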