2. Environment Configuration
- 2.1 Codebase and TestProg Directory Structure
- 2.2 Bash Environment Variables
- 2.3 Command Scripts for Executing an Application Program
The Beehive system codebase and applications are contained in two directories. The following example is for codebase version 3.9.3.
$BEEHIVE_HOME should be set to the path of the directory that contains the Beehive directory. The Beehive directory
holds two directories, one for the codebase and the other for application programs, as shown below:
    .
    |__ Beehive
        |-- BeehiveCodeBase
        |   |__ BeehiveCodeBase-V3.9.3
        |       |__ src
        |           |-- HashTable.thrift
        |           |-- schema.thrift
        |           |-- beehive
        |           |   |-- util
        |           |   |-- server
        |           |   |-- thrift
        |           |   |-- validationService
        |           |   |-- workpool
        |           |   |__ worker
        |           |
        |           |__ lib
        |               |__ <Dependencies for Thrift>
        |
        |__ BeehiveTestProgs
            |__ BeehiveTestProgs-V3.9.3
                |__ src
                    |__ TestProgs
                        |-- GraphColoring
                        |-- ShortestPath
                        |-- MaxClique
                        |-- <Various application program examples>
                        |__ util
                            |__ BeehiveAppLoader.java
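The layout above can be checked before going further. The following is a minimal sketch, assuming that the root passed in is the directory that contains Beehive; the function name check_beehive_layout is hypothetical and is not part of the Beehive distribution.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Beehive): verify that the main
# directories shown in the tree above exist.
# $1 - directory that contains Beehive, $2 - version suffix, e.g. V3.9.3
check_beehive_layout() {
    local root="$1"
    local version="$2"
    local missing=0
    local d
    for d in \
        "$root/Beehive/BeehiveCodeBase/BeehiveCodeBase-$version/src/beehive" \
        "$root/Beehive/BeehiveCodeBase/BeehiveCodeBase-$version/src/lib" \
        "$root/Beehive/BeehiveTestProgs/BeehiveTestProgs-$version/src/TestProgs"
    do
        if [ ! -d "$d" ]; then
            echo "missing: $d" >&2
            missing=1
        fi
    done
    # Returns 0 when every directory exists, 1 otherwise.
    return "$missing"
}
```

For the example version, it could be called as check_beehive_layout "$BEEHIVE_HOME" V3.9.3.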
The following environment variables should be set for bash, as shown below:
    export BEEHIVE_CODEBASE_VERSION="BeehiveCodeBase-V3.9.3"
    export BEEHIVE_TESTPROGS_VERSION="BeehiveTestProgs-V3.9.3"
    export BEEHIVE_CODEBASE="$BEEHIVE_HOME/Beehive/BeehiveCodeBase/$BEEHIVE_CODEBASE_VERSION/src"
    export BEEHIVE_TESTPROGS_SRC="$BEEHIVE_HOME/Beehive/BeehiveTestProgs/$BEEHIVE_TESTPROGS_VERSION/src"
    export BEEHIVE_TESTPROGS="$BEEHIVE_TESTPROGS_SRC/TestProgs"
    export THRIFT_JARS="$BEEHIVE_CODEBASE/lib/*"
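A misspelled or unset variable is easy to miss, because the shell silently expands it to an empty string. The sketch below reports which of the variables named in this section are unset or empty; the function name check_beehive_env is hypothetical and is not part of Beehive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Beehive): print each variable used in
# this section and flag any that is unset or empty.
check_beehive_env() {
    local name
    local value
    local status=0
    for name in BEEHIVE_HOME BEEHIVE_CODEBASE_VERSION BEEHIVE_TESTPROGS_VERSION \
                BEEHIVE_CODEBASE BEEHIVE_TESTPROGS_SRC BEEHIVE_TESTPROGS THRIFT_JARS
    do
        # Look up the variable whose name is stored in $name.
        # eval is safe here because the names come from the fixed list above.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name" >&2
            status=1
        else
            echo "$name=$value"
        fi
    done
    # Returns 0 when all variables are set, 1 otherwise.
    return "$status"
}
```

Running check_beehive_env after sourcing the exports should list seven name=value lines and no "unset" messages.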
To use the example graphs for testing, set GRAPH_HOME appropriately, as shown in the example below:
    export GRAPH_HOME="/project/cluster16/GraphGen"
    export GRAPH_FGRAPH="$GRAPH_HOME/F"
The Java classpath should be set as follows:
    export CLASSPATH=".:$BEEHIVE_CODEBASE:$BEEHIVE_TESTPROGS_SRC:$GRAPH_FGRAPH:$GRAPH_HOME:$THRIFT_JARS:$CLASSPATH"
    export PATH=".:$JAVA_HOME:$THRIFT_JARS:$PATH"
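Because the classpath is assembled from several variables, it helps to view it one entry per line. The following is a small sketch; the function name list_classpath_entries is hypothetical. Note that an entry ending in /* (such as the THRIFT_JARS value) is a Java classpath wildcard that the JVM expands to the JAR files in that directory, so it is kept as a literal string here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Beehive): print each colon-separated
# classpath entry on its own line so a wrong or empty entry is easy to spot.
# $1 - a classpath string, e.g. "$CLASSPATH"
list_classpath_entries() {
    printf '%s\n' "$1" | tr ':' '\n'
}
```

For example, list_classpath_entries "$CLASSPATH" prints the current entries in search order; an empty line in the output indicates a variable that expanded to nothing.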
An application program is launched using three command scripts, which are executed in the example code directory. For example, for the GraphColoring problem, these scripts are executed in the $BEEHIVE_TESTPROGS/GraphColoring directory.
We first need to create a file containing the list of the cluster nodes on which we want to execute the parallel program. For example, to execute a program on a cluster of four nodes, we create a file, say named 4-nodes, containing the hostnames of the nodes as follows:
Example hostlist file:: 4-nodes

    nuclear01.cs.umn.edu
    nuclear02.cs.umn.edu
    nuclear03.cs.umn.edu
    nuclear04.cs.umn.edu
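Since every later script reads the hostlist one hostname at a time, a stray blank line or a wrong node count is worth catching early. The following is a minimal sketch; the function name list_hosts is hypothetical and is not part of Beehive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Beehive): print the hostnames in a
# hostlist file, skipping blank lines.
# $1 - hostlist file, e.g. 4-nodes
list_hosts() {
    grep -v '^[[:space:]]*$' "$1"
}
```

For the example file, list_hosts 4-nodes | wc -l should report 4.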
For the GraphColoring problem, a Java program called
GCTest.java will be
executed on each node in the cluster.
The details of developing a parallel program such as GCTest.java are
discussed in the following chapters.
Before we launch the execution, there are several other steps that need to be performed as shown below:
- Create a configuration file, say named configFile, and store it in the
application program's directory, e.g.
GraphColoring. The details of preparing the configFile are given in the next chapter.
Start the ValidationService on a dedicated host. Suppose that we will execute it on
a host named
jupiter.cs.umn.edu. The details are given below:
- Log onto the host running the ValidationService and make sure the bash
environment variables are correctly set. Execute the file called
startValidator which contains the following command:
java beehive.workpool.GlobalWorkpoolImpl configFile
- Make sure the ValidationService is running; it will print out the configFile.
Three scripts launch the execution of the GraphColoring parallel program: run-command-cluster_GC.sh, start-GC.sh, and runGC.
The program execution is launched using run-command-cluster_GC.sh, which runs start-GC.sh on each cluster node, which in turn executes runGC.
Before you launch the program, you need to follow the steps detailed below:
Step 1: Edit runGC
The structure of this file is shown below:
File:: runGC

    java -Xms4096m -Xmx8192m -XX:+UseG1GC TestProgs.GraphColoring.GCTest 4-nodes configFile jupiter.cs.umn.edu $GRAPH_FGRAPH/fgraph-50000-100-100.pp 50000 configFile 2>&1 &
The command given in runGC will be executed on each of the cluster nodes when the program
execution is launched. Several options are given to the JVM, setting the initial and maximum memory and
the garbage collector to be used. These options play a critical role when executing a program on
large data sets. The command also specifies the input graph to be used, which in this case is
fgraph-50000-100-100 (a 50K-node graph). The hostlist filename (4-nodes in this example)
is given as an argument to the program. The hostname for the ValidationService, which is
jupiter.cs.umn.edu in this example, is also given as one of the arguments to the GCTest program.
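To make the argument order described above easier to adapt, the command can be assembled from named pieces. This is only a sketch: the function name build_rungc_command is hypothetical, it prints the command rather than running it, and it assumes that the argument following the hostlist is the configuration file. The trailing arguments that the text does not describe are passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Beehive): print the java command used in
# runGC, with the documented arguments given names.
build_rungc_command() {
    local hostlist="$1"     # hostlist file, e.g. 4-nodes
    local config="$2"       # configuration file (assumed meaning), e.g. configFile
    local validator="$3"    # host running the ValidationService
    local graph="$4"        # input graph file
    shift 4                 # remaining arguments are passed through as-is

    # -Xms / -Xmx set the initial and maximum heap size;
    # -XX:+UseG1GC selects the G1 garbage collector.
    local jvm_opts="-Xms4096m -Xmx8192m -XX:+UseG1GC"
    local main_class="TestProgs.GraphColoring.GCTest"

    printf '%s %s %s %s %s %s %s\n' \
        "java" "$jvm_opts" "$main_class" \
        "$hostlist $config" "$validator" "$graph" "$*"
}
```

Printing the command first, for example with build_rungc_command 4-nodes configFile jupiter.cs.umn.edu "$GRAPH_FGRAPH/fgraph-50000-100-100.pp" 50000 configFile, makes it possible to confirm that $GRAPH_FGRAPH expands to the intended path before editing runGC.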
Step 2: Edit start-GC.sh
The structure of
start-GC.sh is shown below:
File:: start-GC.sh

    # This script is run on each cluster host machine in order to set up a beehive node on that machine.
    # To be used with run_command_cluster.sh to set up beehive nodes on multiple machines
    # This starts the Worker processes on nodes
    dir="$BEEHIVE_TESTPROGS/GraphColoring" #modify to change the directory
    cmd="./runGC"
    #tcsh
    # terminate any running rmiregistry
    pgrep rmiregistry | xargs kill -9
    ps -ef | grep java | grep GC | tr -s ' ' | cut -f2 -d' ' | xargs kill -9
    # run the command to start beehive process
    cd "$dir"
    $cmd
You need to make sure that the variable dir points to the
program directory and that cmd is set to ./runGC.
Step 3: Edit run-command-cluster_GC.sh
The structure of this shell command file is shown below. You need to make sure that the variable
script_dir is correctly set to the directory containing start-GC.sh:
    #!/usr/bin/env bash
    # $1 - node lists file
    script_dir="$BEEHIVE_TESTPROGS/GraphColoring"
    for node in $(cat "$1")
    do
        echo "running command on $node"
        ssh $node "sh $script_dir/start-GC.sh &" &
    done
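Before contacting the cluster, it can be useful to see exactly which ssh commands the loop above would issue. The following dry-run sketch prints them instead of executing them; the function name print_cluster_commands is hypothetical and is not part of Beehive.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of Beehive): print the ssh command
# that would be run for each node, without executing anything.
# $1 - hostlist file, $2 - directory containing start-GC.sh
print_cluster_commands() {
    local node
    # The "|| [ -n "$node" ]" part handles a last line with no trailing newline.
    while IFS= read -r node || [ -n "$node" ]; do
        [ -n "$node" ] || continue    # skip blank lines
        echo "ssh $node \"sh $2/start-GC.sh &\" &"
    done < "$1"
}
```

Calling print_cluster_commands 4-nodes "$BEEHIVE_TESTPROGS/GraphColoring" should print one ssh line per node, which also confirms that script_dir resolves to the expected path.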
Step 4: Launch Parallel Program Execution
You are now ready to start the parallel execution of the GraphColoring program on
a 4-node cluster. Execute run-command-cluster_GC.sh in the GraphColoring directory, passing the hostlist file (4-nodes) as its argument.
Step 5: Terminating Program Execution
In case you want to terminate the program execution due to some errors or other reasons, execute the following command:
clear_my_java.sh 4-nodes GC
The first argument is the hostlist file and the second argument is a unique string appearing in
the program name, which happens to be GCTest.java in this example.