Environment Configuration and Command Scripts

Codebase and TestProgs Directory Structure

$BEEHIVE_HOME should be set to the path of the Beehive directory, which contains two directories: one for the codebase and one for the application programs, as shown below:

.
├── BeehiveCodeBase
│   └── Beehive
│       ├── lib
│       └── src
│           ├── beehive
│           │   ├── server
│           │   ├── thrift
│           │   ├── util
│           │   ├── validation
│           │   ├── validationService
│           │   ├── worker
│           │   └── workpool
│           ├── HashTable.thrift
│           └── schema.thrift
└── BeehiveTestProgs
    └── BeehivePrograms
        └── src
            └── TestProgs
                ├── GraphColoring
                ├── MaxClique
                ├── ShortestPath
                └── util
                    └── BeehiveAppLoader.java

Bash Environment Variables

The following environment variables should be set for bash:

export BEEHIVE_HOME=""
export BEEHIVE_CODEBASE_VERSION="Beehive"
export BEEHIVE_TESTPROGS_VERSION="BeehivePrograms"
export BEEHIVE_CODEBASE="$BEEHIVE_HOME/BeehiveCodeBase/$BEEHIVE_CODEBASE_VERSION/src"
export BEEHIVE_TESTPROGS_SRC="$BEEHIVE_HOME/BeehiveTestProgs/$BEEHIVE_TESTPROGS_VERSION/src"
export BEEHIVE_TESTPROGS="$BEEHIVE_TESTPROGS_SRC/TestProgs"
export THRIFT_JARS="$BEEHIVE_HOME/BeehiveCodeBase/$BEEHIVE_CODEBASE_VERSION/lib/*"

To use the example graphs for testing, set GRAPH_HOME appropriately, as in the example below:

export GRAPH_HOME="/project/cluster16/GraphGen"
export GRAPH_FGRAPH="$GRAPH_HOME/Fgraph"

The Java classpath should be set as follows:

export CLASSPATH=".:$BEEHIVE_CODEBASE:$BEEHIVE_TESTPROGS_SRC:$GRAPH_FGRAPH:$GRAPH_HOME:$THRIFT_JARS:$CLASSPATH"
export PATH=".:$THRIFT_JARS:$PATH"
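
A quick sanity check can catch an unset or misspelled variable before a run fails with a class-loading error. The following is a minimal sketch and not part of the Beehive distribution: the function name check_vars is hypothetical, and the variable names passed to it are the ones this section defines.

```shell
# check_vars is a hypothetical helper, not part of Beehive.
# It prints the name of every listed variable that is unset or empty
# and returns non-zero if any were missing.
check_vars() {
    missing=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

check_vars BEEHIVE_HOME BEEHIVE_CODEBASE BEEHIVE_TESTPROGS_SRC \
           BEEHIVE_TESTPROGS THRIFT_JARS GRAPH_HOME GRAPH_FGRAPH \
    || echo "set the variables listed above before continuing"
```

Running it in the shell used to launch the program is enough; it only inspects the environment and changes nothing.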

Command Scripts for Executing an Application Program

An application program is launched using three command scripts, which are executed in the application program's directory. For example, for the GraphColoring problem, the scripts are executed in the following directory:

$BEEHIVE_HOME/BeehiveTestProgs/$BEEHIVE_TESTPROGS_VERSION/src/TestProgs/GraphColoring/

We first need to create a file containing the list of cluster nodes on which the parallel program will execute. For example, to execute a program on a cluster of four nodes, we create a file, say named 4-nodes, containing the hostnames of the nodes as follows:

Example hostlist file:: 4-nodes
nuclear01.cs.umn.edu
nuclear02.cs.umn.edu
nuclear03.cs.umn.edu
nuclear04.cs.umn.edu
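
Stray blank lines, extra whitespace, or a duplicated hostname in the hostlist can lead to confusing errors at launch time. As a small sketch (the helper name list_hosts is hypothetical and not a Beehive script), the file can be normalized and checked before it is used:

```shell
# list_hosts is a hypothetical helper, not part of Beehive.
# It prints the first field of every non-blank line, one hostname per line.
list_hosts() {
    awk 'NF { print $1 }' "$1"
}

# Write the example hostlist from this section
printf '%s\n' \
    nuclear01.cs.umn.edu \
    nuclear02.cs.umn.edu \
    nuclear03.cs.umn.edu \
    nuclear04.cs.umn.edu > 4-nodes

# Count the nodes, then report any hostname listed more than once
list_hosts 4-nodes | wc -l
list_hosts 4-nodes | sort | uniq -d
```

The second pipeline prints nothing when every hostname is unique.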

For the GraphColoring problem, a Java program called GCTest.java is executed on each node in the cluster. The details of developing a parallel program such as GCTest.java are discussed in the following chapters.

Before launching the execution, the following steps must be performed:

Create a configuration file, named for example configFile, and store it in the application program's directory, e.g. GraphColoring. The details of preparing the configFile are given in the next chapter.
Start the ValidationService on a dedicated host. Suppose that it will run on a host named jupiter.cs.umn.edu. The steps are as follows:
1. Log on to the host that will run the ValidationService and make sure the bash environment variables are set correctly. Then execute the script called startValidator, which contains the following command:
```
java beehive.validation.GlobalValidationService configFile
```
2. Verify that the ValidationService is running; on startup it prints out the configFile.

Executing an Application Program

Four scripts are involved in running the GraphColoring parallel program. The first three launch the execution, and the fourth terminates it:

runGC
start-GC.sh
run-command-cluster_GC.sh
clear_my_java.sh

`runGC`

The structure of this file is shown below:

java -Xms4096m -Xmx8192m -XX:+UseG1GC TestProgs.GraphColoring.GCTest \
4-nodes \
configFile \
jupiter.cs.umn.edu \
$GRAPH_FGRAPH/fgraph-50000-100-100.pp \
50000 \
configFile \
2>&1 &

The command given in runGC is executed on each of the cluster nodes when the program execution is launched. Several options are passed to the JVM: -Xms4096m and -Xmx8192m set the initial and maximum heap sizes, and -XX:+UseG1GC selects the garbage collector. These options play a critical role when executing a program on large data sets. The command also specifies the input graph to be used, which in this case is fgraph-50000-100-100.pp (a 50K-node graph). The hostlist filename (4-nodes in this example) is given as an argument to the program. The hostname of the ValidationService, which is jupiter.cs.umn.edu in this example, is also given as one of the arguments to the GCTest program.
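
To make the positional arguments easier to see, the same command can be written with named shell variables. This is only an illustrative sketch: print_gc_command and its variable names are hypothetical, the argument order is copied from the runGC listing (including the repeated configFile), and reading 50000 as the number of nodes in the graph is an assumption based on the graph's file name.

```shell
# print_gc_command is a hypothetical helper that only prints the command;
# it does not start the JVM.
print_gc_command() {
    hostlist=$1      # file listing the cluster nodes, e.g. 4-nodes
    config=$2        # configuration file
    validator=$3     # host running the ValidationService
    graph=$4         # input graph file
    size=$5          # assumed to be the number of nodes in the graph
    echo "java -Xms4096m -Xmx8192m -XX:+UseG1GC" \
         "TestProgs.GraphColoring.GCTest" \
         "$hostlist $config $validator $graph $size $config"
}

print_gc_command 4-nodes configFile jupiter.cs.umn.edu \
    "$GRAPH_FGRAPH/fgraph-50000-100-100.pp" 50000
```

Printing the command before running it makes it easy to spot an argument that is missing or out of order.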

`start-GC.sh`

The structure of start-GC.sh is shown below:

File:: start-GC.sh
# This script is run on each cluster host to set up a Beehive node on that machine.
# It is used with run-command-cluster_GC.sh to set up Beehive nodes on multiple machines.
# It starts the Worker processes on the nodes.
dir="$BEEHIVE_TESTPROGS/GraphColoring" #modify to change the directory
cmd="./runGC"
#tcsh
# terminate any running rmiregistry
pgrep rmiregistry | xargs kill -9
ps -ef | grep java | grep GC | tr -s ' ' | cut -f2 -d' ' | xargs kill -9
# run the command to start beehive process
cd "$dir" || exit 1
$cmd

You need to make sure that the variable dir points to the GraphColoring program directory and that cmd is set to ./runGC.

`run-command-cluster_GC.sh`

The structure of this shell command file is shown below. You need to make sure that the variable script_dir is correctly set to the GraphColoring directory.

#!/usr/bin/env bash
# $1 - node lists file
script_dir="$BEEHIVE_TESTPROGS/GraphColoring"
for node in `cat $1`
do
    echo "running command on $node"
    ssh $node "sh $script_dir/start-GC.sh &" &
done
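
Before starting processes on every node, it can be useful to preview what the launcher will do. The sketch below is a variant of the loop above, not the Beehive script itself: launch_on_hosts and the DRY_RUN switch are hypothetical names. It reads the hostlist line by line, skips blank lines, and passes -n to ssh so that ssh does not consume the rest of the hostlist from standard input.

```shell
# launch_on_hosts is a hypothetical variant of the launcher loop.
# $1 - hostlist file, $2 - path of the script to run on each node.
# With DRY_RUN=1 it only prints what it would do.
launch_on_hosts() {
    hostfile=$1
    script=$2
    while IFS= read -r node || [ -n "$node" ]; do
        [ -z "$node" ] && continue      # skip blank lines
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would run: ssh $node sh $script"
        else
            echo "running command on $node"
            # -n redirects ssh's stdin from /dev/null
            ssh -n "$node" "sh $script &" &
        fi
    done < "$hostfile"
}
```

For example, in bash, `DRY_RUN=1 launch_on_hosts 4-nodes "$BEEHIVE_TESTPROGS/GraphColoring/start-GC.sh"` lists the nodes that would be contacted without opening any connection.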

You are now ready to start the parallel execution of the GraphColoring program on a 4-node cluster. Execute the following command in the GraphColoring directory:

run-command-cluster_GC.sh 4-nodes

`clear_my_java.sh`

To terminate the program execution, whether because of errors or for any other reason, execute the following command:

clear_my_java.sh 4-nodes GC

The first argument is the hostlist file. The second argument is a unique string that appears in the program name, which is GCTest.java in this example.