back to Table of Contents
13. Parallelization of EGAD
jobs
13.1 Overview
Parallelization can be used by EGAD to speed up calculations. As
discussed above, parallelization is useful for lookup
table calculations. For multiple-state jobs,
parallelization is absolutely required.
EGAD follows a master-foreman-slave model. The master launches foreman
jobs on the slave hosts. The master also creates slave inputfiles. On a
slave host, a foreman process determines which slave jobs are not
finished or are not being worked on, launches a local slave process
accordingly, and marks the slave job as being worked on. After the
slave process is finished, the foreman marks the slave job as
completed, and launches the next slave job. After all the slave jobs
are completed, the foreman automatically exits. If a few slave jobs are
taking significantly longer than others (if the slave host crashed, for
example), the master will automatically re-queue the incomplete jobs.
In this manner, slaves of different speeds can all be used
simultaneously, with de facto load balancing; faster slaves get
more to do.
By default, foreman/slave processes are assigned a low priority (nice +10 level) to prevent the user from being
accused of being a CPU-hog on shared systems.
13.2 Requirements
The master and slave nodes may be on the same machine, or on different
machines. However, they must all be able to read and write to the
defined LOOKUP_TABLE_DIRECTORY. It is
highly recommended, for the reasons described above, that LOOKUP_TABLE_DIRECTORY be on a disk local to the
master node. It is also required that the slaves also be able to use
the same EGAD executable.
For clusters, the user who owns the master EGAD process must have
passwordless ssh activated between the
master and slave hosts. If the cluster has a single-system image, treat
it as a single multi-processor machine.
13.3 Inputfile options
There are a few inputfile options relevant for parallelization:
BATCH_QUEUE_PREFIX
(none)
(any command line prefixes
required for batch queuing systems; see your
sysadmin about this)
NICE_SLAVE_FLAG
1
(if 0,
the slave jobs will not be nice'd; be
careful with this option! default 1 sets
them to nice +10)
CPU_FILE
(none)
(a file that lists available
slave hosts; format below; may also be
defined in the command-line)
EXECUTABLE_FILENAME
(taken from argv[0])
(name of the EGAD executable;
extracted automatically from the
command-line, but may also be specified)
NUM_SYSTEM_CPU
0
(for > 0 for multiprocessor systems)
13.4 Launching a parallel job
Parallelization must be defined in the command-line. This prevents
accidental launching of a parallel job that would drain system
resources from multiple machines. For lookup table calculations, "lookup_table_master" must be indicated. For
other parallel jobs, "parallel" must be
indicated. In the examples listed here, "lookup_table_master"
is used; however, "parallel" must be
substituted for parallelization of other jobs such as scanning mutagenesis
or multistate
optimization.
If the CPU_FILE (described below) or NUM_SYSTEM_CPU is defined in the inputfile, the
following will suffice:
EGAD.exe inputfile
lookup_table_master
(for lookup table parallelization)
EGAD.exe inputfile parallel
(for other parallel jobs)
For multiprocessor systems, define NUM_SYSTEM_CPU
may be defined in the command-line:
EGAD.exe inputfile
lookup_table_master n cpus (for lookup table parallelization)
EGAD.exe inputfile parallel n cpus
(for other parallel jobs)
For clusters, if the CPU_FILE is not
defined in the inputfile, then the command-line must define the names
of the slave hosts. This can be done in two ways. For the first way, a CPU_FILE is defined in the command-line:
EGAD.exe inputfile
lookup_table_master cpu_file: cpu_file (for lookup table parallelization)
EGAD.exe inputfile
parallel cpu_file: cpu_file
(for other parallel jobs)
For example:
schwing 36% EGAD.exe
gb1.total_design.input lookup_table_master cpu_file:
../energy_function/cpu_list
In the second way, the hostnames may be listed in the command-line:
EGAD.exe inputfile
lookup_table_master n cpus: host(1) host(2) .... host(n)
If the master CPU is permitted to behave as a slave
during the parallel phase of the calculation (recommended, since the
master process uses very little system resources during the parallel
phase), its name should also be included in the slave processor list.
For example:
schwing 36% EGAD.exe
gb1_design.input lookup_table_master 5 cpus: doh schwing whoa excellent
doh
13.5 CPU_FILE
The CPU_FILE is simply a list of available
slave hosts. It may not contain any empty lines, and must end with a
newline.
For example (examples/energy_function/cpu_list):
cowabunga
schwing
excellent
# cpu 1
doh
excellent
# cpu 2
This file defines five slave hosts. Since excellent
has two CPUs, it can be sent two slave jobs. If the master CPU is
permitted to behave as a slave during the parallel phase of the
calculation (recommended, since the master process uses very little
system resources during the parallel phase), its name should also be
included in the slave processor list.
13.6 Implementation of parallelization for
rotamer calculations
The coarse-grained parallelization described here follows naturally
from the problems of protein design. As discussed in the lookup table
section, residues serve as natural job blocks for parallelizing lookup
table generation. For other jobs, such as scanning mutagenesis and
multistate optimization, individual rotamer optimization jobs serve as
natural job blocks. This type of parallelization is very simple to
implement, is robust, especially with respect to crashed slaves,
provides de facto load balancing between slaves of different
speeds, and can be run on any cluster of machines that can access the
same directories. This scheme scales extremely well, approaching
linearity with number of CPUs. This scheme also avoids the problems of
thread timing and process synchronization that can potentially
complicate finer-grained parallelization. On the down side, there is a
significant disk access overhead with this scheme as presently
implemented.
A potentially better implementation may involve using MPI or some other
parallelization standard to operate at this coarse-grained level. For
larger single-state sequence design problems, finer-grained
message-passing type parallelization may be more useful. A suggested
level of coarseness is at the level of running individual independent
MCs for MC or for scoring GA population members. For FASTER,
parallelization may be most straightforward at point where the optimal
rotamers at all other positions are calculated. However, for single
sequence rotamer optimization, or for small design problems, the
parallelization overhead may not be worth it. In addition, even for the
largest designs tried, satisfactory solutions are found within a day of
wall-clock time, while smaller designs are solved within
hours; very rarely will a total design solution need to be identified
significantly faster than this.
In general, the master creates the required slave inputfiles, and
launches foreman jobs to the slave machines by calling io.cpp: launch_command. The foreman jobs are
launched to the hosts listed in AVAILABLE_PROCESSORS_FILE
(inputted CPU_FILE or temp file that
contains the slave host names from the command-line). io.cpp: ssh_command is used to send the remote
commands. For multiprocessor systems, the foreman jobs are spawned
locally by ssh_command.
Depending on the job, the foreman launches independent slave jobs on
the local processor, or runs the slave process within itself. After all
the slaves are completed, the foreman process will wait for some period
of time (5-30 minutes) to see if new slave inputfiles are created.
After the wait period, the foreman will exit. If appropriate, the
master can guess if slave processes are hanging or crashed, and
re-queue slave processes as needed.
There are two parallelization master functions in EGAD that monitor
slave progress. parallel_egad.cpp:
lookup_table_master deals with lookup table parallelization, and
is described above. It launches foreman and monitors progress of slave
jobs, re-queuing slave jobs as needed. Since the slave inputs have
completely predictable names, there is no need for a file that lists
the names of the slaves. The foreman for lookup table parallelization
is parallel_egad.cpp: lookup_table_foreman.
This function monitors the existence of slave inputfiles, and launches
slave processes accordingly, as described above.
For scanning mutagenesis and multistate optimization, rotamer_calc_master.cpp: wait_for_slaves_to_finish
monitors completion and re-queues slave jobs as needed. This function
uses the slavefilelist file that lists the
slave jobs of interest. This function does not actually launch the
foremen processes. scanning_mutagenesis.cpp:
scanning_mutagenesis calls io.cpp:
launch_command. For multistate optimization, since different
slave hosts are allowed to process only a distinct subset of the slave
jobs, multistate.cpp: multistate_design
launches foreman on its own, via io.cpp:
ssh_command.
In contrast to the lookup table foreman, rotamer_calc_foreman.cpp:
rotamer_calc_foreman does not actually launch independent jobs.
The scheme described here avoids the expensive overhead of
re-initializing the program for every rotamer-optimization. All
possible rotamer backbone energies are calculated or read upfront,
while rotamer pair energies are calculated or read only as needed; if
calculated de novo, the energies are saved for other processes
to use.
rotamer_calc_foreman opens and loops
through parameters.slave_file_list_filename,
a file which lists the names of the actual slave inputfiles created by
the master. If a file not being worked on by another processor (slavefile.working file exists) or is not
completed (slave_outputprefix.pdb exists),
this foreman process opens the file and parses the VARIABLE_POSITIONS
section into a dummy PROTEIN structure.
When this is parsed, the function VARIABLE_POSITIONS.cpp:
modify_VARIABLE_POSITIONS is not called, since it was already
called during program initiation with the actual working_protein
PROTEIN structure. The share_lookup_table function is called and
modifies working_protein such that the in_use_flag values are set to 1 for residue CHOICEs
and rotamers (in the lookupRot entry) that
are relevant for the permitted sequences, and set to 0 for all others CHOICEs
and rotamers. These flags are used by the optimization algorithms to
limit the search only to the sequences of interest for this particular
slave inputfile. Once the usage flags are set, the optimization is run,
and the solution output to disk. During the optimization, rotamer pair
energies that have not been loaded or calculated are calculated and
saved to disk (CHROMOSOME_to_lookupEnergy.cpp:
load_calc_save_lookupResX is called by energy evaluation
functions as needed). The .working file is
removed, and the next file read and processed. If parameters.slave_file_list_filename
is deleted by the master, the foremen wait for its
re-generation; after 30 minutes, they exit.
back to Table of Contents