9. Pair energy lookup table generation

9.1 Overview

To make energy calculations for rotamer optimization efficient, EGAD employs a pair energy lookup table. The EGAD energy function has been parameterized for pairwise decomposition, permitting approximation of intrinsically non-pairwise energies, such as continuum electrostatics and surface-area dependent solvation. For rotamer optimization, rotamer-rotamer and rotamer-backbone energies are pre-calculated and stored in a lookup table. These pairwise partial energies are summed as needed during the optimization to determine the total energy (Figure 9.1.1).
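
The summation described above can be sketched as follows. This is a minimal Python illustration, not EGAD code: the function name, the dictionary layout, and the convention that a missing pair entry means "no interaction" are all assumptions made for the example.

```python
def total_energy(choice, e_backbone, e_pair):
    """Sum precalculated partial energies for one rotamer assignment.

    choice     : dict position -> rotamer id
    e_backbone : dict (position, rotamer) -> rotamer-backbone energy
    e_pair     : dict (pos_i, rot_i, pos_j, rot_j) -> pair energy, i < j;
                 a missing key is treated as a non-interacting pair
                 (an assumption of this sketch)
    """
    positions = sorted(choice)
    # One rotamer-backbone term per position.
    energy = sum(e_backbone[(p, choice[p])] for p in positions)
    # One rotamer-rotamer term per position pair i < j.
    for a, i in enumerate(positions):
        for j in positions[a + 1:]:
            energy += e_pair.get((i, choice[i], j, choice[j]), 0.0)
    return energy
```

Because every term is a table lookup, evaluating a new rotamer assignment costs only additions, which is what makes repeated evaluation during optimization cheap.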

9.1.1 Rotamer complexity, disk and memory usage

Protein design is a complex combinatorial optimization problem. For total designs, log10(complexity) scales linearly with the number of positions allowed to change amino acid identity (Figure 9.1.1.1a). The number of combinations ranges from 10^60 for 26 fully variable residues to 10^424 for 194 fully variable residues. Despite these enormous numbers, as discussed below, the rotamer optimization methods in EGAD are able to identify the lowest-energy sequence for many of these problems.
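
The linear scaling follows from the fact that the number of combinations is the product of the rotamer counts at each position, so its logarithm is a sum over positions. The Python sketch below illustrates this; the function name and the uniform count of 203 rotamers per position are assumptions chosen only to match the order of magnitude quoted above, not values taken from EGAD.

```python
import math

def log10_complexity(rotamers_per_position):
    """Return log10 of the number of rotamer combinations.

    The combination count is the product of the per-position rotamer
    counts, so the log is computed as a sum to avoid huge integers.
    """
    return sum(math.log10(n) for n in rotamers_per_position)
```

With the hypothetical count of 203 rotamers at each of 26 positions, log10_complexity([203] * 26) comes to roughly 60, in line with the 10^60 figure for the smallest problem.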

Memory and disk usage scale with the number of variable positions (and thus with log10(complexity)). In the worst case, the storage needed for the pairwise lookup table scales quadratically with the number of moving positions. However, since many rotamer pairs are too far apart to interact, the actual scaling exponent is ~1.3. Memory usage ranges from 93 MB for 26 fully variable positions to just over 1 GB for 194 fully variable positions; these values are well within the memory range of inexpensive mass-produced computers, suggesting that even very large problems can be addressed without specialized equipment (Figure 9.1.1.1b). Disk usage for these problems is significantly larger (up to 5 GB for 194 fully variable positions), but is a small fraction of standard hard-drive sizes (Figure 9.1.1.1c).

9.2 Inputfile options

9.2.1 LOOKUP_TABLE_DIRECTORY

If LOOKUP_TABLE_DIRECTORY is not defined, it defaults to
temp_lookup_directory.pid/       (pid = process ID number)
If the program runs to completion successfully, this directory is automatically removed. However, if there is a crash or premature exit, this directory must be removed manually. For most situations, it is recommended that the lookup table be saved to a defined location.
LOOKUP_TABLE_DIRECTORY   directory_path/directory_name

The directory path directory_path must exist, and must be write-accessible. However, the directory directory_name within directory_path need not exist a priori; it can be created by the program.
For example, suppose the directory /homedir/joeuser/lookup_tables exists. Then,
LOOKUP_TABLE_DIRECTORY   /homedir/joeuser/lookup_tables/srcSH2
will create the directory srcSH2 within the directory /homedir/joeuser/lookup_tables/.
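
The rule above (the parent path must exist and be writable, the final directory may be created) can be expressed as a short pre-flight check. The Python sketch below is illustrative only; the function name is hypothetical and EGAD performs its own checks internally.

```python
import os

def prepare_lookup_directory(path):
    """Create the final directory component if its parent is usable.

    Mirrors the rule in the text: directory_path must already exist and
    be writable; directory_name may be created on demand.
    """
    path = os.path.abspath(path)
    parent = os.path.dirname(path)
    if not os.path.isdir(parent):
        raise FileNotFoundError(f"parent directory does not exist: {parent}")
    if not os.access(parent, os.W_OK):
        raise PermissionError(f"parent directory is not writable: {parent}")
    os.makedirs(path, exist_ok=True)  # only the last component can be new
    return path
```

Running such a check before a long job avoids discovering a missing or read-only parent directory only after the calculation has started.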

Since these directory trees can become quite large (hundreds of MB to more than a GB), it is recommended that the location be defined, and that the write-accessible area be large enough for the problem at hand. Plots of protein size vs. disk usage are shown for the worst-case scenario of total design (see Figure 9.1.1.1c). For most uses, however, the required disk space will be much smaller.

If at all possible, it is advantageous to have the lookup table directory on a disk local to the host performing the calculation (or the master for some parallelized jobs). Although this is not necessary, the network overhead required to access a remote disk can be significant.

Saving to disk permits concurrent or subsequent processes to reuse the same data, assuming that the forcefield and template pdb structure are identical. This can save a tremendous amount of CPU time. Each process loads the data it needs from the disk. If required data is not available, a process can calculate it, and save it to disk, permitting other processes to use it. This scheme permits straightforward parallelization of lookup table generation and utilization, as discussed later.

9.2.2 PRECALCULATION_LEVEL

For some jobs, it may not be necessary to calculate or load the entire lookup table into memory at once. For these cases, set
PRECALCULATION_LEVEL 0
This will calculate and/or load only the sidechain-backbone energies. The sidechain-sidechain energies, which scale between O(n) and O(n^2), are calculated and/or loaded as needed. As in the default case (complete precalculation), these sidechain-sidechain energies are saved to disk.

For almost all non-parallelized jobs, all the sidechain-sidechain pair energies are likely to be considered at some point during the run, so there is no significant advantage to using this option. Therefore, this option is used almost exclusively for launching parallel rotamer calculation foremen (see multistate design and scanning mutagenesis sections).

9.2.3 JOBTYPE LOOKUP_TABLE_SLAVE

The lookup table will be loaded/calculated for any rotamer optimization job. However, there may be cases in which all that is wanted is the lookup table. For these cases, set
JOBTYPE LOOKUP_TABLE_SLAVE
This causes the program to exit upon completion of lookup table generation, instead of starting a rotamer optimization.

9.3 Brief description of lookup table generation

If LOGFILE_FLAG 1 (default), a file pid.lookup_log (pid= process ID number) is created. It serves as a progress monitor. It exists while the lookup table is being loaded/calculated, and is automatically deleted upon completion. As the lookup table reading/writing progresses, the files pid.lookup_log.0.25, pid.lookup_log.0.5, and pid.lookup_log.0.75 are created, signaling that the process has completed 25%, 50%, and 75% of the job respectively. If multiple processes are using the same disk, the existence of pid.lookup_log files can signal that a process is still in the disk-access intensive stage of lookup table generation, and the relative state of progress; deletion of these files upon completion can act as a signal to another process to start its own disk-access intensive lookup table generation.
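
A script that coordinates several processes on one disk could poll these marker files along the following lines. This Python sketch is an illustration under assumptions: the function name is hypothetical, and the directory in which the pid.lookup_log files appear is passed in explicitly because the text does not state it.

```python
import os

def lookup_progress(directory, pid):
    """Report the state of lookup table generation for one process.

    Returns None when pid.lookup_log is absent (not started, or already
    finished), otherwise the highest fraction whose marker file exists
    (0.0 if no marker has been written yet).
    """
    base = os.path.join(directory, f"{pid}.lookup_log")
    if not os.path.exists(base):
        return None
    done = 0.0
    for fraction in ("0.25", "0.5", "0.75"):
        if os.path.exists(f"{base}.{fraction}"):
            done = float(fraction)
    return done
```

A return value of None after the log file had previously been seen indicates that the disk-intensive phase is over and another process can begin its own.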

If a pre-existing lookup table directory is being used, the LOOKUP_TABLE_DIRECTORY/forcefield_info file is checked. Processes that share the same lookup table can have different residuetypes at variable positions, or even completely different variable and fixed positions. However, these processes must have identical forcefield parameters and use the same template structure. If there is any discrepancy at all, the program will exit with an ERROR message pointing out the problem. If a process is creating a new table, it will create the directory tree and write the forcefield info file.

This file, like all the lookup table files, is a binary file, and is therefore architecture-dependent; one of the forcefield_info checks is for architecture equivalence. Note that SGI and x86 machines cannot share these files. While it might have been more flexible to save these datafiles as text, parsing would have been much slower and disk usage significantly greater.

The sidechain-backbone energies are loaded. If these energies are not available for a particular residue at a given position, the coordinates and energies are calculated and saved. Sidechain rotamers that have a vdW interaction energy with the fixed backbone greater than the maximum defined in the resparam file are filtered out, since it is unlikely (but not guaranteed) that these are present in the lowest energy solution. These data are all saved to disk.

At this point, reference state energies are subtracted from the sidechain-backbone energies. Since different reference state models may be tested or used with the same template structure and forcefield, these referenced energies are NOT saved to disk, thus permitting the same lookup table on disk to be shared by processes that employ different reference state models. The subtraction of reference state energies adds trivially to the time.

The interaction list that describes what residuetypes at other positions a given residuetype at a given position interacts with is loaded. If such a list file does not exist, it is created. If such a file exists, but does not contain information about new residuetypes at other positions that were not present in the earlier process that created it, the file is updated and saved. This interaction list information is used to guide calculation and memory allocation for sidechain-sidechain energies.

Sidechain-sidechain pair energy calculation is perhaps the most time- and memory-intensive step. For many problems, it takes even longer than the optimization itself. Be patient! After this is done, the pid.lookup_log files are deleted.

The data stored in these files are not significantly compressible; running gzip -r on lookup table directories results in negligible changes (≤ 0.5%) in disk usage (assayed by du -sk).

9.4 Parallelization of lookup table generation

The lookup table can be calculated in either a serial or a parallel manner. Parallelization of lookup table generation can speed things up tremendously for large problems, like total design. In general, for any problem with more than 20 variable positions that have all residuetypes as options, parallelizing lookup table generation gives a significant speedup. For smaller problems, such as single-sequence rotamer optimization of a small protein, the speedup may be smaller due to network overhead and built-in delays designed to prevent process collisions. For small jobs (≤ 10^20 rotamer combinations), EGAD automatically switches to serial mode.

Details on running parallel jobs are given in the Parallelization of EGAD jobs section.

9.4.1 Brief description of parallelized lookup table generation

The master process creates a number of slave inputfiles (inputfilename.master_pid.slave.input, master_pid = process ID number of master process) in the current working directory (these will be deleted when the parallel phase is completed). The master process launches (via ssh) independent foremen processes to each of the defined slave hosts. Each foreman process in turn launches child slave processes which perform the actual calculations.

During the calculation, a slave inputfile (inputfilename.master_pid.slave.input) is mv'd to inputfilename.master_pid.slave.working to indicate that it is being worked on by a slave process. If the slave job exits prematurely (but gracefully), the inputfilename.master_pid.slave.working file is mv'd back to inputfilename.master_pid.slave.input, putting it back in the queue. After a job corresponding to a slave inputfile is finished, the slave working file is mv'd to inputfilename.master_pid.slave.done. The master process uses the existence of files with these suffixes as progress meters. After all the slave inputfiles are done (in reality, the master has timers to prevent hung slave jobs from stalling everything else), the master deletes all the slave inputfiles, and loads the entire lookup table into memory, filling missing parts (if any). Finally, the rotamer optimization job defined in the inputfile is run, as in the single-processor version. The foremen processes kill themselves automatically after all the slave input files have been deleted.
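
The suffix-based queue described above relies on renaming as the claim operation: whichever process renames a .slave.input file first owns the job. The Python sketch below shows the idea; the function names are hypothetical, and it assumes that all processes see the same filesystem and that rename is atomic there (true for POSIX local filesystems, less certain over some network mounts).

```python
import os

def claim_job(input_path):
    """Try to claim a slave inputfile by renaming .input -> .working.

    Returns the new path, or None if another process got there first.
    """
    if not input_path.endswith(".slave.input"):
        raise ValueError("expected a .slave.input file")
    working = input_path[: -len("input")] + "working"
    try:
        os.rename(input_path, working)
    except FileNotFoundError:
        return None  # already claimed (or deleted) by someone else
    return working

def finish_job(working_path, succeeded):
    """Mark a job done, or put it back in the queue after a failure."""
    stem = working_path[: -len("working")]
    target = stem + ("done" if succeeded else "input")
    os.rename(working_path, target)
    return target
```

A losing process simply sees the rename fail and moves on to the next inputfile.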

IO_ERRORs may pop up during this process. These are usually benign, and are likely due to process collisions in which one of the slave processes dies, or to master/foreman jobs trying to rename or delete files that have already been renamed or deleted by another process.

9.5 Description of lookup table generation code

9.5.1 Description of LOOKUP_ENERGY datastructures

A schematic of the organization of the lookup table is shown in Figure 9.5.1.1. Please also look at the appropriate entries in structure_types.h. The PROTEIN struct contains the LOOKUP_ENERGY lookupEnergy array, which contains an element for each non-P/G position i. protein->lookupEnergy[i] contains within it the LOOKUP_ENERGY_RESIDUE lookupRes array, which contains an element for each residue i_res permitted at i.

lookupEnergy[i].lookupRes[i_res] contains information about residue choice i_res, such as the average surface areas for the rotamers, and whether this residuetype should be considered to be fixed for the purpose of adding up pair energies (fixed_flag). It also contains the roots for two trees. LOOKUP_ENERGY_RESIDUE_RESIDUE lookupResRes contains information about what residues j_res at other positions j>i interact with residue i,i_res. This tree exists only transiently, and is cleared from memory before loading/calculating sidechain-sidechain pair energies. The LOOKUP_ENERGY_ROTAMER lookupRot array contains an element for each rotamer i_res_rot.

lookupEnergy[i].lookupRes[i_res].lookupRot[i_res_rot] (ERESROT) contains information about rotamer i_res_rot. It has ROTAMER rotamer, which has information about this rotamer. energy_var_fix contains the actual energy between this rotamer and the fixed atoms, minus the reference state energy. It also has the atomic coordinates for this rotamer in the milli_pdbATOM sideAtoms and ROTAMERLET rotamerlet (for local minimization of vdW energies) arrays. Finally, it contains the LOOKUP_ENERGY_X lookupX array, which contains an element for each other position j>i.

ERESROT.lookupX[j-i] contains the LOOKUP_ENERGY_RESIDUE_X lookupResX array for the interaction of residues at position j>i with rotamer i,i_res,i_res_rot. For each permitted residue j_res at position j>i, ERESROT.lookupX[j-i].lookupResX[j_res-1] contains the LOOKUP_ENERGY_ROTAMER_X lookupRotX array. For each rotamer j_res_rot for residue j_res at position j>i, the interaction energy with rotamer i,i_res,i_res_rot is stored in
ERESROT.lookupX[j-i].lookupResX[j_res-1].lookupRotX[j_res_rot-1].energy_var_var (ERESROT.JRESROT.energy_var_var).

For fixed positions and non-interacting residue and rotamer pairs, special values are defined by lookup_table.cpp: initialize_lookuptable_pointers. If residue i,i_res is defined as fixed, lookupEnergy[i].lookupRes[i_res].lookupRot[1].energy_var_fix is set to FIXED_POSITION_PTR. If rotamer i,i_res,i_res_rot does not interact with any residuetype j_res at position j, ERESROT.lookupX[j-i].lookupResX is set to NON_INTERACT_LOOKUP_RES_X. If rotamer i,i_res,i_res_rot does not interact with any rotamer for residuetype j_res at j, then ERESROT.lookupX[j-i].lookupResX[j_res-1].lookupRotX is set to NON_INTERACT_LOOKUP_ROT_X. If residues i,i_res and j,j_res have at least one interacting rotamer pair, but the energies have not yet been calculated or read, ERESROT.lookupX[j-i].lookupResX[j_res-1].lookupRotX is set to NULL; the reading function must calculate or read these values. If rotamer i,i_res,i_res_rot does not interact with rotamer j,j_res,j_res_rot, ERESROT.JRESROT.energy_var_var is set to NON_INTERACT_PTR. Functions that read energies from the lookup table must recognize these special values and act accordingly. For examples, see CHROMOSOME_to_lookupEnergy.cpp: CHROMOSOME_to_lookupEnergy, dee_utilities.cpp: initialize_lookuptable_for_dee and dee_utilities.cpp: get_energy.
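
The way a reading function has to branch on these special values can be sketched in Python. This is not the EGAD implementation: the sentinels are modelled as plain objects, NULL is modelled as None, and returning 0.0 for non-interacting pairs is an assumption about what "act accordingly" means. Only the index conventions (j-i, j_res-1, j_res_rot-1) are taken from the text.

```python
# Stand-ins for the special values named in the text (assumed to be
# distinguishable flags; here they are unique Python objects).
NON_INTERACT_LOOKUP_RES_X = object()
NON_INTERACT_LOOKUP_ROT_X = object()
NON_INTERACT_PTR = object()

def read_pair_energy(lookup_x, j_offset, j_res, j_rot):
    """Return the pair energy, 0.0 for non-interacting pairs, or None
    when the block still has to be calculated or read from disk."""
    res_x = lookup_x[j_offset]            # lookupX[j-i].lookupResX
    if res_x is NON_INTERACT_LOOKUP_RES_X:
        return 0.0                        # no residuetype at j interacts
    rot_x = res_x[j_res - 1]              # lookupResX[j_res-1].lookupRotX
    if rot_x is NON_INTERACT_LOOKUP_ROT_X:
        return 0.0                        # no rotamer of j_res interacts
    if rot_x is None:
        return None                       # NULL: caller must load/calculate
    energy = rot_x[j_rot - 1]             # lookupRotX[j_res_rot-1]
    return 0.0 if energy is NON_INTERACT_PTR else energy
```

Checking the coarse flags first means that most non-interacting pairs are rejected without touching the rotamer-level arrays.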

9.5.2 Discussion and rationale for the implemented strategy, thoughts on improvement

The lookup table is stored on disk as a directory tree that resembles the organization of the data in memory.

All the coordinates for residues at position i are stored in the LOOKUP_TABLE_DIRECTORY/coordinates/i/ directory. The coordinates for the max_vdw-filtered rotamers (including the native rotamer) for residuetype XYZ at position i are saved in the file LOOKUP_TABLE_DIRECTORY/coordinates/i/XYZ.i.structure. The coordinates for just the native rotamer of residuetype XYZ at position i are stored in LOOKUP_TABLE_DIRECTORY/coordinates/i/XYZ.i.structure.fixed. This file is read/written if position i is defined as fixed.

The sidechain-backbone energies for all residues at position i are stored in the LOOKUP_TABLE_DIRECTORY/var_fix/i/ directory. Elements of the LOOKUP_ENERGY_RESIDUE and the LOOKUP_ENERGY_ROTAMER for residuetype XYZ at position i (including the native rotamer) are saved in the file LOOKUP_TABLE_DIRECTORY/var_fix/i/XYZ.i.var_fix_energy. The LOOKUP_ENERGY_RESIDUE and LOOKUP_ENERGY_ROTAMER for just the native rotamer of residuetype XYZ at position i are stored in LOOKUP_TABLE_DIRECTORY/coordinates/i/XYZ.i.var_fix_energy.fixed. This file is read/written if position i is defined as fixed.

The interaction tables (LOOKUP_ENERGY_RESIDUE_RESIDUE) for all residues at position i are stored in the LOOKUP_TABLE_DIRECTORY/interaction_lists/i directory. The table for residuetype XYZ at position i is saved in the file LOOKUP_TABLE_DIRECTORY/interaction_lists/i/XYZ.i.interaction_list. These files are created only if position i is not defined as fixed.

The sidechain-sidechain interaction energies for all residues at position i with all residues j>i are stored in the LOOKUP_TABLE_DIRECTORY/var_var/i/i.j directory. The sidechain-sidechain interaction energies for residuetype XYZ at position i with residuetype ABC at position j>i are stored in the file LOOKUP_TABLE_DIRECTORY/var_var/i/i.j/XYZ.i.ABC.j.var_var_energy. If position i is fixed, the energies with ABC at position j are stored in LOOKUP_TABLE_DIRECTORY/var_var/i/i.j/XYZ.i.f.ABC.j.var_var_energy. Similarly, if j is fixed, the data are stored in LOOKUP_TABLE_DIRECTORY/var_var/i/i.j/XYZ.i.ABC.j.f.var_var_energy. If both i and j are fixed, the energy is NOT saved to disk. If i,XYZ and j,ABC do not have at least one interacting rotamer pair, no file is created; based on the LOOKUP_ENERGY_RESIDUE_RESIDUE interaction table, the program will know not to look for or create the file.
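
The naming rules for the sidechain-sidechain energy files can be summarized as a small path builder. The Python sketch below follows the file names given above; the function name and its arguments are hypothetical and are not part of EGAD.

```python
import os

def var_var_energy_path(table_dir, i, res_i, j, res_j,
                        i_fixed=False, j_fixed=False):
    """Build the sidechain-sidechain energy file path for positions i < j.

    Returns None when both positions are fixed, since that energy is
    not saved to disk.
    """
    if not i < j:
        raise ValueError("expected i < j")
    if i_fixed and j_fixed:
        return None
    # A fixed position is marked by an extra ".f" after its number.
    left = f"{res_i}.{i}" + (".f" if i_fixed else "")
    right = f"{res_j}.{j}" + (".f" if j_fixed else "")
    name = f"{left}.{right}.var_var_energy"
    return os.path.join(table_dir, "var_var", str(i), f"{i}.{j}", name)
```

For example, LEU at position 3 (fixed) paired with ASP at position 7 maps to var_var/3/3.7/LEU.3.f.ASP.7.var_var_energy under the lookup table directory.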

This scheme results in the creation of many small files. This uses up disk blocks/inodes less efficiently than an alternative scheme that uses fewer larger files, or even one large file. However, the small file strategy does have some advantages over the large file scheme. If larger files were used, the process that created the file initially would need to know a priori the requirements for subsequent jobs (or simply assume the worst case), and place spacer material that could be overwritten later. Indeed, this scheme is used for the storage of interaction lists. However, for sidechain-sidechain pair energy storage, this strategy would result in large files with a lot of filler material. An alternative strategy would be to use some sort of table of contents system so information could be appended as needed and found later as needed (see below for some thoughts on this).

Parallelization is most efficiently accomplished if the job can be broken into small parts. For this problem, the natural job block units are the residue-backbone and residue-pair. If a few large files were used, parallelization would have been less efficient, since a given file can be written to by only one process at a time. For many problems, explicit parallelization is not employed or necessary. However, there is implicit parallelization if succeeding processes follow each other. If these processes have some lookup table elements in common, this scheme makes it very straightforward for each process to read only what it needs, and to write anything that it needed to calculate, making that data available for succeeding processes.

A compromise strategy would be to write data as individual files, thus avoiding the process-collision problem during parallelization, and then use tar to concatenate these files into larger files, freeing up space in partially filled disk blocks. When data is needed, the directory containing the files of interest is untarred, the files are read, and the directory re-tarred. Unfortunately, running tar dynamically is slow (this scheme was tested in an older version of the program). An even better method would be to read the data directly from the tar file. Tar archives store a header in front of each member, so an individual file can be located by scanning the headers (or by keeping a separate index of member offsets) and read without unpacking the archive. Another option would be to create a custom, highly optimized table-of-contents system for accessing files within a large concatenated file. In any case, since these lookup table directories are not meant to be kept for long-term storage, and since disk capacities keep increasing, attempts to further improve this presently functional implementation were abandoned. As mentioned above, since these files are not significantly compressible, there is little to be gained by compression.

9.5.3 Description of lookup table generation functions

lookup_table.cpp: generate_lookup_table manages the generation of protein->lookupEnergy from the information stored in protein->var_pos. lookup_table.cpp: initialize_lookuptable_pointers is called to initialize special pointers and values used for flagging fixed positions and non-interacting residue and rotamer pairs (see above). The function lookup_table_disk_stuff.cpp: check_lookup_table creates the lookup table directory and writes the forcefield_info file. If the directory already exists, forcefield parameters are compared to those in the existing forcefield_info file. If there is any discrepancy, the program will exit with an explanatory ERROR message.

lookup_table_disk_stuff.cpp: load_lookupRes_from_disk is called to read sidechain coordinates and sidechain-backbone energies from disk. If the information for a residuetype at a given position is not available, the coordinates must be generated, and the sidechain-backbone energies calculated.

In order to calculate approximate surface areas and Born radii, pseudoatom coordinates are generated and placed in the mini_pdbATOM fixed_atoms array. The surface areas and associated energies for proline and glycine atoms are calculated, and then later added to the energy of every rotamer at the first variable position.

For a given residue choice at a position, the coordinates for each rotamer are generated, and its energies calculated in turn. These coordinates and energies are stored temporarily in the PSEUDO_LOOKUP_ENERGY_ROTAMER pseudo_lookupRotamer array. PSEUDO_LOOKUP_ENERGY_ROTAMER is a datatype local to lookup_table.cpp. The information in this struct is condensed into LOOKUP_ENERGY_ROTAMER for longer-term storage. The Born radii and sidechain-backbone energies are calculated by calling pairwise_energy_calc.cpp: var_fixed_energy_calc. Once the approximate Born radii are calculated by this function, the internal Born energies within the rotamer are calculated by traversing the COULOMBIC coulombic linked-list that was generated by energy_functions.cpp: intrinisic_rotamer_energy during energy function input (see energy function input section).

If the vdW energy between a rotamer and backbone exceeds resparam.max_vdw for this residuetype, the rotamer is flagged for rejection. Acceptable rotamers have their surface areas calculated. After all the rotamers for a given residuetype at a given position are measured, Boltzmann probabilities based on the sidechain-backbone vdW energy (apparent T = 500 K, so RT ≈ 1 kcal/mol) are calculated for acceptable rotamers. Acceptable rotamers with extremely low probabilities (< 10^-6) are flagged for rejection. These probabilities are also used to calculate the average surface area, hydrophobic surface area, and water-octanol transfer free energy for a residuetype at a position; these are used for solubility filters (see solubility design section). Finally, coordinates and energies for un-rejected rotamers are condensed and stored in LOOKUP_ENERGY_ROTAMER. These newly calculated coordinates, energies, surface areas, and Born radii are saved to disk by the function lookup_table_disk_stuff.cpp: save_lookupRes_to_disk.
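
The two-stage rotamer filter and the probability-weighted averages can be sketched as follows. This Python example is a simplified illustration, not the EGAD routine: the function names are hypothetical, energies are assumed to be in units where RT = 1, and renormalizing the probabilities over the surviving rotamers when averaging is an assumption of the sketch.

```python
import math

def filter_rotamers(vdw_energies, max_vdw, rt=1.0, min_prob=1e-6):
    """Return {rotamer index: probability} for rotamers that survive.

    Step 1 rejects rotamers whose backbone vdW energy exceeds max_vdw.
    Step 2 assigns Boltzmann weights exp(-E/RT) to the rest and rejects
    those with probability below min_prob.
    """
    kept = {k: e for k, e in enumerate(vdw_energies) if e <= max_vdw}
    if not kept:
        return {}
    e_min = min(kept.values())  # shift energies for numerical stability
    weights = {k: math.exp(-(e - e_min) / rt) for k, e in kept.items()}
    z = sum(weights.values())
    probs = {k: w / z for k, w in weights.items()}
    return {k: p for k, p in probs.items() if p >= min_prob}

def boltzmann_average(values, probs):
    """Probability-weighted average of a per-rotamer quantity
    (for example a surface area), renormalized over the survivors."""
    total = sum(probs.values())
    return sum(values[k] * p for k, p in probs.items()) / total
```

With RT = 1, a rotamer that lies about 14 energy units above the best one already falls below the 10^-6 threshold, since exp(-14) is roughly 8 x 10^-7.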

To prevent errors due to collisions between independent processes, any files that are created (or updated) by a process are named (or renamed) as filename.PID (PID = process ID number). After writing the file, the file is renamed to the appropriate filename.
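
The write-then-rename pattern described above looks roughly like this in Python. The function name is hypothetical; the sketch assumes that the temporary file and the final file are on the same filesystem, so that the rename is a single atomic step on POSIX systems.

```python
import os

def write_via_rename(filename, data):
    """Write bytes to filename.PID first, then rename into place.

    Other processes never see a partially written file under the final
    name, because the rename happens only after the write has finished.
    """
    tmp = f"{filename}.{os.getpid()}"
    with open(tmp, "wb") as handle:
        handle.write(data)
    os.replace(tmp, filename)  # overwrites any existing file in one step
    return filename
```

Including the process ID in the temporary name keeps two processes that calculate the same data from writing into each other's half-finished file; whichever renames last simply provides the final copy.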

After all the sidechain-backbone energies are read/calculated, reference state energies are subtracted from each rotamer's lookupRot[i_res_rot].energy_var_fix. Doing this calculation after the sidechain-backbone energies have been loaded from disk permits the same lookup table directory to be used with different reference state models. In any case, since this step scales O(n) and is simply a subtraction, only a trivial amount of time is spent.
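
The point of subtracting reference energies only in memory can be illustrated with a short Python sketch. The names and the dictionary layout are hypothetical, and keying the reference energy by residuetype alone is an assumption; the actual reference state models are described elsewhere in this manual.

```python
def apply_reference_energies(raw_var_fix, reference):
    """Return a new table with reference state energies subtracted.

    raw_var_fix : dict (position, residuetype, rotamer) -> energy as
                  stored on disk
    reference   : dict residuetype -> reference state energy
    The input table is left untouched, just as the on-disk data stay
    unreferenced.
    """
    return {
        key: energy - reference[key[1]]
        for key, energy in raw_var_fix.items()
    }
```

Two runs with different reference models can call this on the same loaded data and obtain different in-memory tables without invalidating the shared files.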

The next step is to load and/or calculate the LOOKUP_ENERGY_RESIDUE_RESIDUE interaction table for each residuetype at each position. lookup_table_disk_stuff.cpp: load_lookupResRes_from_disk is used to read these from disk. If there is missing information, sidechain pairs that have at least one interacting rotamer pair (atom pair distance ≤ FORCEFIELD_DISTANCE_CUTOFF (10 Å)) are identified. This search is done in a hierarchical manner. First, positions with interacting CB atoms are defined as interacting for all residue/rotamer pairs. Remaining position pairs are assayed. These data are then written to disk by lookup_table_disk_stuff.cpp: save_lookupResRes_to_disk.
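
A hierarchical interaction test of this kind can be sketched in Python. Only the first stage (the CB-CB check) and the 10 Å cutoff come from the text; the second stage shown here, an exhaustive atom-pair search over all rotamers, is an assumption made to complete the example, and the function name is hypothetical.

```python
import math

CUTOFF = 10.0  # Angstrom, the FORCEFIELD_DISTANCE_CUTOFF quoted in the text

def positions_interact(cb_i, cb_j, rotamers_i, rotamers_j, cutoff=CUTOFF):
    """Two-stage test for whether two positions interact.

    cb_i, cb_j   : (x, y, z) of the CB atoms
    rotamers_i/j : list of rotamers, each a list of (x, y, z) atoms
    """
    if math.dist(cb_i, cb_j) <= cutoff:
        return True  # stage 1: close CB atoms settle it for all pairs
    # Stage 2 (assumed): look for any atom pair within the cutoff.
    return any(
        math.dist(a, b) <= cutoff
        for rot_i in rotamers_i for a in rot_i
        for rot_j in rotamers_j for b in rot_j
    )
```

The cheap first stage avoids the expensive all-atom search for positions that are obviously close.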

If parameters.disk_lookup_table_flag = 1 (PRECALCULATION_LEVEL 1; default), energies for all interacting rotamers are read and/or calculated by lookup_table_disk_stuff.cpp: load_lookupResX_from_disk. This function allocates memory, sets non-interacting sidechain pair flags, loads, calculates, and writes sidechain pair energies to and from disk as needed. The rotamer pair energies are calculated by calling pairwise_energy_calc.cpp: rotamer_pair_energy, which in turn calls pairwise_energy_calc.cpp: var_var_energy_calc. Rotamer coordinates are freed from memory after they are no longer needed.

If PRECALCULATION_LEVEL 0, then memory is allocated for sidechain pairs, and non-interacting sidechain pair flags are set. However, memory is not allocated for rotamer pairs; ERESROT.lookupX[j-i].lookupResX[j_res-1].lookupRotX is set to NULL, signalling the reading function to allocate memory and calculate or load these values as needed with CHROMOSOME_to_lookupEnergy.cpp: load_calc_save_lookupResX. In order to use this function, extern char LOOKUP_TABLE_DIRECTORY[] must be defined to the proper directory (currently set in input_stuff.cpp: input_stuff, as well as at the beginning of each rotamer optimization algorithm control function to protein->parameters.lookup_energy_table_directory).
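
The on-demand behaviour at PRECALCULATION_LEVEL 0 amounts to lazy loading with the NULL entry as the trigger. The Python sketch below shows the pattern; NULL is modelled as None, the function name is hypothetical, and the load_calc_save argument is a stand-in for the load/calculate/save step named in the text rather than its real signature.

```python
def get_rot_x(res_x, j_res, load_calc_save):
    """Return the rotamer-pair block for residuetype j_res, filling it
    on first use.

    res_x          : list indexed by j_res - 1; None marks a block that
                     has not been calculated or read yet
    load_calc_save : callable(j_res) -> block; hypothetical stand-in
                     for the load/calculate/save step
    """
    block = res_x[j_res - 1]
    if block is None:
        block = load_calc_save(j_res)
        res_x[j_res - 1] = block  # keep it so later reads are cheap
    return block
```

Only blocks that the optimization actually touches are ever allocated, which is what keeps the memory footprint small for the parallel foreman jobs mentioned in section 9.2.2.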

Positions with one and only one rotamer are marked as fixed for the purposes of the rotamer optimization. Energies between fixed positions and moving positions are calculated and added to ERESROT.energy_var_fix for the moving positions. Energies between fixed positions are added up and appended to parameters.fixedatoms_energy. The energy_var_fix energies for the fixed positions are also appended to parameters.fixedatoms_energy. Finally, ERESROT.energy_var_fix for these fixed positions is set to FIXED_POSITION_PTR, as described in the data-structure section, and the lookup table branches rooted from this position are pruned.

9.5.4 Description of parallel lookup table generation functions

If a process is defined as a lookup_table_master in the command-line, the inputfile is parsed by input_stuff.cpp: input_stuff. The command-line is parsed by EGAD.cpp: parse_command_line. If available_processors are defined in the command-line, these are read and used to write the AVAILABLE_PROCESSORS_FILE /tmp/avail_processors.PID (PID = process ID number).

protein and the name of the inputfile are sent to parallel_egad.cpp: lookup_table_master. This function reads the AVAILABLE_PROCESSORS_FILE, and counts the number of available CPUs (num_processors).

The file inputfile_top.PID is created. It is essentially the parameter section from the original inputfile, except:
OTHER_RESIDUES none
LOGFILE_FLAG 0 # don't create and write logfiles
QUIET_FLAG 1 # don't print error messages       
This file is used to generate the parameter section for all the slave inputfiles.

num_processors job_frag files are created. Each of these has lines for a subset of the variable position lines from the original inputfile (shortcuts are expanded). The file PID.n.job_frag has the lines for variable positions starting at the nth line and every num_processors-th line after that (lines n, n + num_processors, n + 2*num_processors, and so on).

These job_frag files are used, along with the inputfile_top.PID file, to create the slave inputfiles. The first num_processors slave inputfiles PID.n.slave.input are simply PID.n.job_frag appended to the contents of inputfile_top.PID. When these are run, the coordinates, sidechain-self and sidechain-backbone energies for these positions are calculated and saved. The sidechain-pair energies between these positions are calculated and saved as well.

The remaining inputfiles are created by generating variable position sections from all non-redundant pairs of job_frag files, and appending these to the contents of inputfile_top.PID. When these are run, energies between sidechains from the different job_frag blocks are calculated; the coordinates and energies between sidechains within a given block, as well as their self and backbone interaction energies, should have been calculated by the first num_processors slave inputfiles discussed earlier.
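
The way the variable-position lines are split into slave jobs can be sketched in Python. The function names are hypothetical, and the round-robin split (every num_processors-th line) is one reading of the description above; the sketch only builds the variable-position sections, not complete inputfiles.

```python
from itertools import combinations

def make_job_frags(variable_lines, num_processors):
    """Deal variable-position lines round-robin into job fragments."""
    return [variable_lines[n::num_processors] for n in range(num_processors)]

def make_slave_jobs(variable_lines, num_processors):
    """Return the variable-position section of every slave inputfile.

    The first num_processors jobs are the fragments themselves; the
    rest combine every non-redundant pair of fragments.
    """
    frags = make_job_frags(variable_lines, num_processors)
    jobs = list(frags)
    jobs += [a + b for a, b in combinations(frags, 2)]
    return jobs
```

Under this scheme the number of slave inputfiles is num_processors + num_processors*(num_processors - 1)/2, so 4 processors give 4 single-fragment jobs and 6 fragment-pair jobs.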

The master process launches/forks foremen processes to the slaves by calling io.cpp: launch_command. It then waits until the slave inputfiles are all mv'd to PID.n.slave.working. If only one slave job remains to be launched, yet has not started after five minutes, the master goes to the next step and waits for all the PID.n.slave.working files to be moved to PID.n.slave.done. If after five minutes, a slave process is still not done, the master assumes that a job has crashed, and moves on. The master removes all the temporary slave files, and then loads the lookup table into memory, filling in missing pieces, if any, and then performs the rotamer optimization job it was assigned to do.
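
The master's wait-with-timeout behaviour can be sketched as a polling loop. This Python example is illustrative only: the function name and polling interval are assumptions, and the real master applies its timers per job rather than through a single helper like this.

```python
import os
import time

def wait_for_files(paths, timeout, poll=1.0):
    """Wait until every path exists or until timeout seconds pass.

    Returns the list of paths that are still missing (empty on
    success), so the caller can move on instead of stalling on a hung
    job.
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(poll)
```

With the five-minute limit quoted above, the call would use timeout=300; any paths returned correspond to jobs whose pieces the master fills in itself when it loads the table.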

A lookup table foreman process is defined in the command-line argument that launched it. In practice, foremen jobs should only be launched by a master EGAD process, not manually. When detected by main, the inputfile name, along with the master_PID and number of slave inputfiles, are parsed and sent to parallel_egad.cpp: lookup_table_foreman. This function checks for the existence of slave inputfiles. If it finds one, it mv's it from master_PID.n.slave.input to master_PID.n.slave.working. This renamed file is used as input for a child (not forked) EGAD lookup_table_slave process. If this slave process returns 0 (success), master_PID.n.slave.working is renamed master_PID.n.slave.done. Otherwise, master_PID.n.slave.working is returned to the queue by mv'ing it back to master_PID.n.slave.input. The foreman keeps checking for remaining slave inputfiles until all are working. Once all the slave inputs are working, it waits until all are done. If one of these other jobs crashed, it picks up the re-queued inputfile and runs it. Finally, after all the jobs are done, or when the master removes all the slave inputfiles, the foreman process exits.