
13. Parallelization of EGAD jobs

13.1 Overview

Parallelization can be used by EGAD to speed up calculations. As discussed above, parallelization is useful for lookup table calculations. For multiple-state jobs, parallelization is absolutely required.

EGAD follows a master-foreman-slave model. The master launches foreman jobs on the slave hosts. The master also creates slave inputfiles. On a slave host, a foreman process determines which slave jobs are not finished or are not being worked on, launches a local slave process accordingly, and marks the slave job as being worked on. After the slave process is finished, the foreman marks the slave job as completed, and launches the next slave job. After all the slave jobs are completed, the foreman automatically exits. If a few slave jobs are taking significantly longer than others (if the slave host crashed, for example), the master will automatically re-queue the incomplete jobs. In this manner, slaves of different speeds can all be used simultaneously, with de facto load balancing; faster slaves get more to do.
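
The foreman behaviour described above can be sketched as follows. This is a minimal Python illustration, not EGAD's actual code: the ".done" marker suffix and the function names try_claim, foreman and run_slave are hypothetical, and the sketch assumes a shared directory on which exclusive file creation is atomic.

```python
import os


def try_claim(job_input):
    """Atomically create a '.working' marker; return True if this foreman claimed the job."""
    try:
        # O_EXCL makes creation fail if the marker already exists.
        fd = os.open(job_input + ".working", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another foreman is already working on this job
    os.close(fd)
    return True


def foreman(job_inputs, run_slave):
    """Run every job that is neither finished nor claimed, one at a time."""
    completed = []
    for job in job_inputs:
        if os.path.exists(job + ".done"):
            continue  # already finished (hypothetical completion marker)
        if not try_claim(job):
            continue  # being worked on elsewhere
        run_slave(job)                      # the local slave process
        open(job + ".done", "w").close()    # mark the job as completed
        os.remove(job + ".working")         # release the claim
        completed.append(job)
    return completed
```

Because each foreman simply takes the next unclaimed job, a faster host ends up processing more jobs, which is the de facto load balancing mentioned above.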

By default, foreman/slave processes are assigned a low priority (nice +10 level) to prevent the user from being accused of being a CPU-hog on shared systems.
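
As an illustration of the priority setting, the sketch below prefixes a command with the standard Unix nice utility. The function name with_nice and its arguments are hypothetical; only the +10 level and the on/off behaviour of NICE_SLAVE_FLAG come from this manual.

```python
def with_nice(command, nice_slave_flag=1, level=10):
    """Return the command prefixed with `nice -n <level>`, unless the flag is 0."""
    if nice_slave_flag == 0:
        return list(command)  # run at normal priority
    return ["nice", "-n", str(level)] + list(command)
```

For example, with_nice(["EGAD.exe", "slave.input"]) yields a command that runs at nice +10, while passing nice_slave_flag=0 returns the command unchanged.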

13.2 Requirements

The master and slave nodes may be on the same machine, or on different machines. However, they must all be able to read and write to the defined LOOKUP_TABLE_DIRECTORY. It is highly recommended, for the reasons described above, that LOOKUP_TABLE_DIRECTORY be on a disk local to the master node. The slaves must also be able to use the same EGAD executable.

For clusters, the user who owns the master EGAD process must have passwordless ssh activated between the master and slave hosts. If the cluster has a single-system image, treat it as a single multi-processor machine.

13.3 Inputfile options

There are a few inputfile options relevant for parallelization:

BATCH_QUEUE_PREFIX     (none)
        (any command-line prefixes required for batch queuing systems; see your sysadmin about this)
NICE_SLAVE_FLAG        1
        (if 0, the slave jobs will not be nice'd; be careful with this option! The default, 1, sets them to nice +10)
CPU_FILE               (none)
        (a file that lists available slave hosts; format below; may also be defined in the command-line)
EXECUTABLE_FILENAME    (taken from argv[0])
        (name of the EGAD executable; extracted automatically from the command-line, but may also be specified)
NUM_SYSTEM_CPU         0
        (set > 0 for multiprocessor systems)

13.4 Launching a parallel job
                                                                       
Parallelization must be defined in the command-line. This prevents accidental launching of a parallel job that would drain system resources from multiple machines. For lookup table calculations, "lookup_table_master" must be indicated. For other parallel jobs, "parallel" must be indicated. In the examples listed here, "lookup_table_master" is used; however, "parallel" must be substituted for parallelization of other jobs such as scanning mutagenesis or multistate optimization.

If the CPU_FILE (described below) or NUM_SYSTEM_CPU is defined in the inputfile, the following will suffice:
EGAD.exe inputfile lookup_table_master   (for lookup table parallelization)
EGAD.exe inputfile parallel              (for other parallel jobs)

For multiprocessor systems, NUM_SYSTEM_CPU may be defined in the command-line:
EGAD.exe inputfile lookup_table_master n cpus   (for lookup table parallelization)
EGAD.exe inputfile parallel n cpus              (for other parallel jobs)

For clusters, if the CPU_FILE is not defined in the inputfile, then the command-line must define the names of the slave hosts. This can be done in two ways. For the first way, a CPU_FILE is defined in the command-line:
EGAD.exe inputfile lookup_table_master cpu_file: cpu_file   (for lookup table parallelization)
EGAD.exe inputfile parallel cpu_file: cpu_file              (for other parallel jobs)
For example:
schwing 36% EGAD.exe gb1.total_design.input lookup_table_master cpu_file: ../energy_function/cpu_list

In the second way, the hostnames may be listed in the command-line:
EGAD.exe inputfile lookup_table_master n cpus: host(1) host(2) .... host(n)
If the master CPU is permitted to behave as a slave during the parallel phase of the calculation (recommended, since the master process uses very little system resources during the parallel phase), its name should also be included in the slave processor list.
For example:
schwing 36% EGAD.exe gb1_design.input lookup_table_master 5 cpus: doh schwing whoa excellent doh
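
The command-line forms listed in this section can be assembled programmatically, for instance from a wrapper script. The helper below is a hypothetical Python sketch (egad_command is not part of EGAD); it only reproduces the argument orderings shown above.

```python
def egad_command(inputfile, mode, hosts=None, cpu_file=None, n_cpus=None, exe="EGAD.exe"):
    """Build the argument list for one of the parallel launch forms."""
    if mode not in ("lookup_table_master", "parallel"):
        raise ValueError("mode must be 'lookup_table_master' or 'parallel'")
    args = [exe, inputfile, mode]
    if cpu_file is not None:
        # Cluster, first way: name a CPU_FILE.
        args += ["cpu_file:", cpu_file]
    elif hosts:
        # Cluster, second way: list the slave hosts explicitly.
        args += [str(len(hosts)), "cpus:"] + list(hosts)
    elif n_cpus is not None:
        # Multiprocessor system: only the CPU count is given.
        args += [str(n_cpus), "cpus"]
    # Otherwise CPU_FILE or NUM_SYSTEM_CPU is expected in the inputfile.
    return args
```

Joining the returned list with spaces for the hosts doh, schwing, whoa, excellent and doh reproduces the example command above.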

13.5 CPU_FILE

The CPU_FILE is simply a list of available slave hosts, one per line. It must not contain any empty lines, and must end with a newline.
For example (examples/energy_function/cpu_list):
cowabunga
schwing
excellent        # cpu 1
doh
excellent        # cpu 2

This file defines five slave CPUs on four hosts. Since excellent has two CPUs, it is listed twice and can be sent two slave jobs. If the master CPU is permitted to behave as a slave during the parallel phase of the calculation (recommended, since the master process uses very little system resources during the parallel phase), its name should also be included in the slave processor list.
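
A CPU_FILE can be checked before launching a job. The following Python sketch is an illustration only: parse_cpu_file is a hypothetical helper, and treating text after "#" as a comment is an assumption based on the example file above, not a documented rule.

```python
from collections import Counter


def parse_cpu_file(text):
    """Return one host name per slave slot, enforcing the format rules above."""
    if not text.endswith("\n"):
        raise ValueError("CPU_FILE must end with a newline")
    hosts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Assumption: anything after '#' is a comment, as in the example file.
        name = line.split("#", 1)[0].strip()
        if not name:
            raise ValueError(f"line {lineno}: empty lines are not allowed")
        hosts.append(name)
    return hosts


def slots_per_host(hosts):
    """Count how many slave jobs each host may run at once."""
    return Counter(hosts)
```

Applied to the example file, parse_cpu_file returns five entries, and slots_per_host reports two slots for excellent and one for each of the other hosts.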

13.6 Implementation of parallelization for rotamer calculations

The coarse-grained parallelization described here follows naturally from the problems of protein design. As discussed in the lookup table section, residues serve as natural job blocks for parallelizing lookup table generation. For other jobs, such as scanning mutagenesis and multistate optimization, individual rotamer optimization jobs serve as natural job blocks. This type of parallelization is very simple to implement, is robust, especially with respect to crashed slaves, provides de facto load balancing between slaves of different speeds, and can be run on any cluster of machines that can access the same directories. This scheme scales extremely well, approaching linearity with the number of CPUs. This scheme also avoids the problems of thread timing and process synchronization that can potentially complicate finer-grained parallelization. On the downside, there is a significant disk access overhead with this scheme as presently implemented.

A potentially better implementation may involve using MPI or some other parallelization standard to operate at this coarse-grained level. For larger single-state sequence design problems, finer-grained message-passing type parallelization may be more useful. A suggested level of coarseness is at the level of running individual independent MCs for MC or for scoring GA population members. For FASTER, parallelization may be most straightforward at the point where the optimal rotamers at all other positions are calculated. However, for single sequence rotamer optimization, or for small design problems, the parallelization overhead may not be worth it. In addition, even for the largest designs tried, satisfactory solutions are found within a day of wall-clock time, while smaller designs are solved within hours; very rarely will a total design solution need to be identified significantly faster than this.

In general, the master creates the required slave inputfiles, and launches foreman jobs to the slave machines by calling io.cpp: launch_command. The foreman jobs are launched to the hosts listed in AVAILABLE_PROCESSORS_FILE (inputted CPU_FILE or temp file that contains the slave host names from the command-line). io.cpp: ssh_command is used to send the remote commands. For multiprocessor systems, the foreman jobs are spawned locally by ssh_command.

Depending on the job, the foreman launches independent slave jobs on the local processor, or runs the slave process within itself. After all the slaves are completed, the foreman process will wait for some period of time (5-30 minutes) to see if new slave inputfiles are created. After the wait period, the foreman will exit. If appropriate, the master can guess if slave processes are hanging or crashed, and re-queue slave processes as needed.
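
One way a master could guess that a slave job is hanging or has crashed is to look at the age of its "being worked on" marker. The Python sketch below illustrates that idea under stated assumptions: the ".working" and ".done" marker suffixes, the age threshold and the function name requeue_stalled are hypothetical, and EGAD's own heuristic may differ.

```python
import os
import time


def requeue_stalled(job_inputs, max_age_seconds, now=None):
    """Remove stale '.working' markers so that a foreman can pick the jobs up again."""
    now = time.time() if now is None else now
    requeued = []
    for job in job_inputs:
        marker = job + ".working"
        if os.path.exists(job + ".done") or not os.path.exists(marker):
            continue  # finished, or not claimed by anyone
        if now - os.path.getmtime(marker) > max_age_seconds:
            os.remove(marker)  # the job becomes available again
            requeued.append(job)
    return requeued
```

A threshold well above the run time of a typical slave job keeps slow but healthy slaves from being re-queued unnecessarily.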

There are two parallelization master functions in EGAD that monitor slave progress. parallel_egad.cpp: lookup_table_master deals with lookup table parallelization, and is described above. It launches the foremen and monitors the progress of slave jobs, re-queuing slave jobs as needed. Since the slave inputs have completely predictable names, there is no need for a file that lists the names of the slaves. The foreman for lookup table parallelization is parallel_egad.cpp: lookup_table_foreman. This function monitors the existence of slave inputfiles, and launches slave processes accordingly, as described above.

For scanning mutagenesis and multistate optimization, rotamer_calc_master.cpp: wait_for_slaves_to_finish monitors completion and re-queues slave jobs as needed. This function uses the slavefilelist file that lists the slave jobs of interest. This function does not actually launch the foreman processes. For scanning mutagenesis, scanning_mutagenesis.cpp: scanning_mutagenesis calls io.cpp: launch_command. For multistate optimization, since different slave hosts are allowed to process only a distinct subset of the slave jobs, multistate.cpp: multistate_design launches the foremen on its own, via io.cpp: ssh_command.

In contrast to the lookup table foreman, rotamer_calc_foreman.cpp: rotamer_calc_foreman does not actually launch independent jobs. The scheme described here avoids the expensive overhead of re-initializing the program for every rotamer-optimization. All possible rotamer backbone energies are calculated or read upfront, while rotamer pair energies are calculated or read only as needed; if calculated de novo, the energies are saved for other processes to use.
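
The "calculate or read only as needed, then save for other processes" pattern can be sketched as a small cache. This Python example is illustrative only: the class name PairEnergyCache, the JSON file format and the file naming are hypothetical and do not reflect EGAD's lookup table format.

```python
import json
import os


class PairEnergyCache:
    """Load a pair-energy table from disk if present; otherwise compute and save it."""

    def __init__(self, directory, compute):
        self.directory = directory   # shared directory visible to all processes
        self.compute = compute       # function (i, j) -> table of energies
        self.memory = {}             # tables already loaded by this process

    def get(self, i, j):
        key = (i, j)
        if key in self.memory:
            return self.memory[key]
        path = os.path.join(self.directory, f"pair_{i}_{j}.json")
        if os.path.exists(path):
            with open(path) as handle:
                table = json.load(handle)      # computed earlier by some process
        else:
            table = self.compute(i, j)         # calculated de novo
            tmp = f"{path}.tmp{os.getpid()}"
            with open(tmp, "w") as handle:
                json.dump(table, handle)
            os.replace(tmp, path)              # publish the finished file in one step
        self.memory[key] = table
        return table
```

Writing to a temporary file and renaming it means that another process never reads a half-written table; this is a design choice of the sketch, not a statement about EGAD.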

rotamer_calc_foreman opens and loops through parameters.slave_file_list_filename, a file that lists the names of the actual slave inputfiles created by the master. If a file is neither being worked on by another processor (indicated by the existence of a slavefile.working file) nor already completed (indicated by the existence of slave_outputprefix.pdb), this foreman process opens the file and parses the VARIABLE_POSITIONS section into a dummy PROTEIN structure. When this is parsed, the function VARIABLE_POSITIONS.cpp: modify_VARIABLE_POSITIONS is not called, since it was already called during program initiation with the actual working_protein PROTEIN structure. The share_lookup_table function is then called; it modifies working_protein such that the in_use_flag values are set to 1 for residue CHOICEs and rotamers (in the lookupRot entry) that are relevant for the permitted sequences, and set to 0 for all other CHOICEs and rotamers. These flags are used by the optimization algorithms to limit the search to the sequences of interest for this particular slave inputfile. Once the usage flags are set, the optimization is run, and the solution is output to disk. During the optimization, rotamer pair energies that have not yet been loaded or calculated are calculated and saved to disk (CHROMOSOME_to_lookupEnergy.cpp: load_calc_save_lookupResX is called by energy evaluation functions as needed). The .working file is then removed, and the next file is read and processed. If parameters.slave_file_list_filename is deleted by the master, the foremen wait for its re-generation; after 30 minutes, they exit.
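
The effect of the usage flags can be illustrated with a simplified data model. In the Python sketch below, the dictionary layout and the function names set_in_use_flags and choices_in_use are hypothetical stand-ins for the PROTEIN structure and share_lookup_table; only the meaning of in_use_flag (1 for permitted, 0 otherwise) is taken from the description above.

```python
def set_in_use_flags(protein, permitted):
    """Switch choices and their rotamers on (1) or off (0) for one slave job.

    protein:   {position: [choice, ...]}; each choice is a dict with the keys
               "residue", "in_use_flag" and "rotamers" (a list of dicts).
    permitted: {position: set of residue names allowed by the slave inputfile}.
    """
    for position, choices in protein.items():
        # Assumption of this sketch: positions missing from `permitted` are switched off.
        allowed = permitted.get(position, set())
        for choice in choices:
            flag = 1 if choice["residue"] in allowed else 0
            choice["in_use_flag"] = flag
            for rotamer in choice["rotamers"]:
                rotamer["in_use_flag"] = flag


def choices_in_use(protein):
    """List the (position, residue) pairs that the optimizer may still search."""
    return [(position, choice["residue"])
            for position, choices in sorted(protein.items())
            for choice in choices
            if choice["in_use_flag"]]
```

Because only flags are changed, the same in-memory structure and lookup energies can be reused for the next slave inputfile without re-initializing the program.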
