
2. Description of main, program flow, and overview of data-structures

When running EGAD, you must refer to the full path for the executable:
mycomputer % ~myusername/EGAD/bin/EGAD.exe myinputfile.input &

Alternatively, create the appropriate alias in your .aliases file.

Since EGAD jobs can take a while to run, it is recommended that they be run in the background (&) from the calling shell. Additional arguments are necessary for parallelization or batch processing.

2.1 main

The EGAD.cpp: main function parses the command line and sends the name of the inputfile to input_stuff.cpp: input_stuff, which parses the inputfile. input_stuff initializes a data structure PROTEIN protein, which acts as the main object within the program. After initialization, protein is returned to main. Based on the specified jobtype, main passes protein to the functions that manage that jobtype. After the run is completed, protein is sent to output_stuff.cpp: output_PROTEIN, which writes the final structure, energies, and other information to a pdb-style structure file, as well as other files as needed (described in the pdb file section and other sections). There are additional command-line parameters used for multi-processor parallelization; these are discussed in the parallelization section.
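The flow described above can be sketched as follows. This is a minimal illustration, not the actual EGAD source: the PROTEIN fields, the trace vector, the job manager functions, and the jobtype names "job_a" and "job_b" are all hypothetical placeholders, and the jobtype is passed in directly rather than read from the inputfile.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for PROTEIN (the real struct is in structure_types.h).
struct PROTEIN {
    std::string inputfile;
    std::string jobtype;
    std::vector<std::string> trace;  // illustration only: stages visited so far
};

// Stand-in for input_stuff.cpp: input_stuff. The real function reads the
// jobtype from the inputfile; here it is passed in to keep the sketch small.
PROTEIN input_stuff(const std::string& inputfilename, const std::string& jobtype) {
    PROTEIN protein;
    protein.inputfile = inputfilename;
    protein.jobtype = jobtype;
    protein.trace.push_back("input_stuff");
    return protein;
}

// Hypothetical job managers; "job_a" and "job_b" are placeholder jobtypes.
void run_job_a(PROTEIN& protein) { protein.trace.push_back("run_job_a"); }
void run_job_b(PROTEIN& protein) { protein.trace.push_back("run_job_b"); }

// Stand-in for output_stuff.cpp: output_PROTEIN.
void output_PROTEIN(PROTEIN& protein) { protein.trace.push_back("output_PROTEIN"); }

// Mirrors the described flow: parse, dispatch on jobtype, write output.
// Returns 0 on success and 1 on a caught input error.
int egad_flow_sketch(const std::string& inputfilename,
                     const std::string& jobtype,
                     std::vector<std::string>& trace_out) {
    if (inputfilename.empty()) return 1;  // no inputfile given

    PROTEIN protein = input_stuff(inputfilename, jobtype);
    assert(!protein.trace.empty());

    if (protein.jobtype == "job_a") {
        run_job_a(protein);
    } else if (protein.jobtype == "job_b") {
        run_job_b(protein);
    } else {
        trace_out = protein.trace;
        return 1;  // unknown jobtype: treated here as an input error
    }

    output_PROTEIN(protein);
    trace_out = protein.trace;
    return 0;
}
```

The point of the sketch is that protein is created once, handed to exactly one job manager, and then handed to the output stage, so every stage works on the same object.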

2.2 Exit status, errors, I/O errors, and warnings

If EGAD is successful, it exits with a value of 0. When the program catches an error caused by inappropriate or incompatible inputs, it exits with a value of 1. Error messages are printed to stderr with the "ERROR" or "IO_ERROR" prefix. Messages are also written to name_of_inputfile.EGAD_MESSAGE_LOG.

In a few functions, errors that are more likely due to a bug in the calling function than to faulty or incompatible inputs do not produce a soft exit. Instead, they print an ERROR line to stderr explaining the problem and generate a corefile; the corefile permits the use of a debugger to identify the location of the offending function.

Warning messages are printed to stderr with the "WARNING" prefix. They are not fatal and are provided only for the user's information. Typical causes include input options that are no longer supported, template pdb files with missing atoms, non-obvious assumptions made by the program that the user should be aware of, and older jobtypes and options that are still supported but should be replaced with newer ones.

Error messages and warnings, except those resulting in core dumps (most likely due to a bug), may be turned off by including
QUIET_FLAG 1     # default 0
in the inputfile. By default, slave processes launched by EGAD include QUIET_FLAG 1.

File manipulation and access functions in EGAD have wrappers (see io.cpp, io.h, and FUNCTION_LIST). In general, these wrappers attempt to perform the desired task 25 times over 50 secs, printing an IO_ERROR message after each failed attempt. If the task still cannot be performed, the defined fail value is returned to the calling function, which can decide whether the problem is fatal or not. This is useful for overcoming problems due to networking or collisions between independent processes operating on the same file. However, if the failure is due to improper file permissions, the wrapper function prints an appropriate IO_ERROR message and exits the program completely, returning 1. In that case, the permissions of the indicated file or directory should be changed, or the offending file or inputfile modified appropriately (see relevant sections).
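The retry behaviour of such a wrapper might look like the sketch below. It is not the code in io.cpp: the function name open_with_retry, its signature, and the message wording are hypothetical, and the even 2-second spacing is only an assumption derived from "25 times over 50 secs". The fail value here is a null pointer, and a permission failure is detected through errno == EACCES.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <thread>

// Hypothetical retrying wrapper around std::fopen.
// Returns the open file, or nullptr (the "fail value") once all attempts fail.
std::FILE* open_with_retry(const std::string& path, const char* mode,
                           int max_attempts = 25,
                           std::chrono::milliseconds pause = std::chrono::seconds(2)) {
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        errno = 0;
        std::FILE* fp = std::fopen(path.c_str(), mode);
        if (fp != nullptr) return fp;

        if (errno == EACCES) {
            // Retrying cannot fix bad permissions, so exit immediately.
            std::fprintf(stderr, "IO_ERROR: permission denied for %s\n", path.c_str());
            std::exit(1);
        }

        std::fprintf(stderr, "IO_ERROR: cannot open %s (attempt %d of %d)\n",
                     path.c_str(), attempt, max_attempts);
        if (attempt < max_attempts) std::this_thread::sleep_for(pause);
    }
    return nullptr;  // caller decides whether this is fatal
}
```

Returning a fail value instead of exiting keeps the policy decision with the caller: a missing optional file can be skipped, while a missing template structure can be treated as fatal.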

2.3 Premature exit or early program advancement

If the user wishes to exit the program gracefully before it has finished, or to jump to the next step, the logfile lists the names of files that may be created to signal this.
touch escape.job_specific_label.pid (pid= process ID number)
will create the "escape hatch" file, and lead to premature program termination or advancement to the next step. Depending on the job, the actual escape or jump may not happen instantaneously.
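A polling loop of the following kind would produce the behaviour described above, including the delay between creating the file and the actual exit. This is only a sketch: the helper names, the literal "escape.label.pid" pattern, and the point in the loop where the check happens are assumptions, and the authoritative file names are the ones listed in the logfile.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: builds a name of the form escape.<label>.<pid>.
std::string escape_file_name(const std::string& job_label, long pid) {
    assert(!job_label.empty());
    return "escape." + job_label + "." + std::to_string(pid);
}

// True if the escape hatch file exists and can be opened for reading.
bool escape_requested(const std::string& filename) {
    std::ifstream f(filename);
    return f.good();
}

// Illustration-only cleanup helper; whether EGAD removes the file itself
// is not stated in this section.
void remove_escape_file(const std::string& filename) {
    std::remove(filename.c_str());
}

// Runs up to max_steps steps, checking for the escape file before each one.
// Returns the number of steps actually completed.
int run_steps(int max_steps, const std::string& escape_file) {
    int done = 0;
    for (; done < max_steps; ++done) {
        if (escape_requested(escape_file)) break;  // graceful early exit
        // ... one optimization step would run here ...
    }
    return done;
}
```

Because the file is only checked between steps, a long step must finish before the request is noticed, which is consistent with the escape not happening instantaneously.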

2.4 Data structure overview

A PROTEIN struct contains all the information about a given protein template, running parameters for the jobs that are to be performed on it, any data these jobs may require, and the results from these jobs (see structure_types.h: PROTEIN). In principle, multiple PROTEIN objects may co-exist within the same program simultaneously; however, this has been tested only with programs that run the same jobs with the same energy function parameters on the individual PROTEIN structures. The data-structures within PROTEIN are discussed in the relevant sections.

Two important structures within PROTEIN are VARIABLE_POSITION *var_pos and LOOKUP_ENERGY *lookupEnergy. var_pos contains the information about each residue in the protein, such as backbone coordinates, classification as core or surface, and the residues permitted at that position. This structure is discussed further in the input variable positions section. lookupEnergy is the pair energy lookup table used during rotamer optimization. It is built using var_pos as a guiding template. Both of these structures have pointers to relevant elements of the RESPARAM *resparam (residue parameters) and ROTAMERLIB *rotamerlib (rotamer library) arrays within PROTEIN for fast access to the energy function and rotamer data. var_pos also has pointers to the relevant elements within lookupEnergy.

Moves within EGAD are primarily done in dihedral space. The CHROMOSOME and GENE (and bkbnGENE for moving backbone) structs are the primary repositories and conduits for the dihedral representation of the protein structure. An individual CHROMOSOME represents a protein structure, and contains information about the energy, and information for the optimization method, such as mating frequencies for genetic algorithms. GENE and bkbnGENE are linked-lists within CHROMOSOME that carry the actual information necessary for building the coordinates, such as dihedral angles. GENE has pointers to elements of lookupEnergy and var_pos, permitting rapid access to pair energy lookup table values during rotamer optimization.
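As a rough picture of the dihedral representation, a CHROMOSOME can be thought of as an energy plus a linked list of GENE nodes, each holding the dihedral angles for one position. The field names below (seq_position, chi, next, energy, genes) and the helper functions are hypothetical; the real definitions in structure_types.h also carry pointers into lookupEnergy and var_pos, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified GENE: one node per variable position.
struct GENE {
    int seq_position = 0;
    std::vector<double> chi;     // side-chain dihedral angles, in degrees
    std::unique_ptr<GENE> next;  // next node in the linked list
};

// Hypothetical, simplified CHROMOSOME: one candidate protein structure.
struct CHROMOSOME {
    double energy = 0.0;
    std::unique_ptr<GENE> genes;  // head of the GENE list
};

// Prepends a gene to the chromosome's list.
void push_gene(CHROMOSOME& c, int seq_position, std::vector<double> chi) {
    std::unique_ptr<GENE> g(new GENE());
    g->seq_position = seq_position;
    g->chi = std::move(chi);
    g->next = std::move(c.genes);
    c.genes = std::move(g);
}

// Walks the list and counts every stored dihedral angle.
std::size_t count_dihedrals(const CHROMOSOME& c) {
    std::size_t n = 0;
    for (const GENE* g = c.genes.get(); g != nullptr; g = g->next.get()) {
        n += g->chi.size();
    }
    return n;
}
```

Coordinates are never stored in this sketch; they would be rebuilt from the angles when needed, which is what makes moves in dihedral space cheap to represent.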

Actual atom coordinates in EGAD are stored in the pdbATOM and mini_pdbATOM data-structures (defined in structure_types.h). The main difference between the two is that pdbATOM contains fields for the atomname, residuetype, atom_number, seqpos_text, and chain_id. pdbATOM is used almost exclusively for input and output. mini_pdbATOM is the primary atom data-structure within EGAD. The atom_ptr element within these structures provides a direct link to the relevant forcefield parameters (ATOMRESPARAM) for that atom (charge, vdW parameters, bonding neighbors, etc). These are discussed further in the energy function section.
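The relationship between the two atom structures can be illustrated as below. The member names atomname, residuetype, atom_number, seqpos_text, chain_id, and atom_ptr come from the text above; every type, the coordinate fields x, y, z, the ATOMRESPARAM members, and the to_mini helper are assumptions made for this sketch, not the definitions in structure_types.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of the per-atom forcefield parameters.
struct ATOMRESPARAM {
    double charge;
    double vdw_radius;
    double vdw_well_depth;
};

// Compact atom used internally: coordinates plus a link to the parameters.
struct mini_pdbATOM {
    double x, y, z;
    const ATOMRESPARAM* atom_ptr;
};

// Full atom used for input and output: adds the pdb labelling fields.
struct pdbATOM {
    double x, y, z;
    const ATOMRESPARAM* atom_ptr;
    std::string atomname;
    std::string residuetype;
    int atom_number;
    std::string seqpos_text;
    char chain_id;
};

// Hypothetical helper: drops the labelling fields, keeps coordinates and link.
mini_pdbATOM to_mini(const pdbATOM& a) {
    assert(a.atom_ptr != nullptr);
    mini_pdbATOM m;
    m.x = a.x;
    m.y = a.y;
    m.z = a.z;
    m.atom_ptr = a.atom_ptr;
    return m;
}
```

Keeping the text fields out of the internal structure makes each atom smaller, while atom_ptr still gives energy calculations direct access to the parameters without a name lookup.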
