Replicated Validation
Motivation
The centralized validation service was proving to be a bottleneck when running analytics problems on large graphs; the replicated validation service was developed to remove this bottleneck. Here we present how to configure and use the replicated validation service.
Sample Configuration File
Below is a typical configuration file for a replicated server. We will go through each configuration parameter in detail. Note that each line that starts with "#" is a comment.
# Replicated Validator Flags
# Replicated Server Selection Policy : RANDOM, KEY_BASED
Validator.serverIdentifyPolicy=RANDOM
Validator.truncation.allowed=true
Validator.truncation.scanLimit=1000
Validator.truncation.windowSize=2000
Validator.writeStatsToFile=false
Validator.nodeSize=1000000 # Number of nodes in the graph
# Validator.RPC_MECHANISM=RMI will use Java RMI for both client-side and peer interface
Validator.RPC_MECHANISM=THRIFT
# clientPort and serverPort are the starting port numbers, increasing with validation service replica index number
Validator.thrift.clientPort=5000
Validator.thrift.serverPort=6000
Validator.clientThriftConnectionPool=20
Validator.serverThriftConnectionPool=20
Validator.clusteredGraph=src/TestProgs/GraphColoring/fgraph-20-10-10.pp_bkt_dir
# Validator testing flags
Validator.debug=false
Validator.MSI=175
Validator.ServerReplica=2
Validator.rwSetLengthCount=8
Validator.acrossCluster=0
Below are descriptions of the configuration parameters.
- Validator.useReplicated
- This can be set to either true or false. Setting it to true will make the Beehive system use the replicated validation service instead of a single validation service.
- Validator.nodeSize
- This can be set to any integer value. The replicated validation service needs to know the total number of nodes in the graph in order to partition the graph among the replicas.
- Validator.serverIdentifyPolicy
- This can be set to either RANDOM or KEY_BASED. It determines how clients choose a replica for validation. When the policy is RANDOM, each client picks a replica at random. When the policy is KEY_BASED, the client chooses the replica that is responsible for the majority of the keys in the RWSet it sends (a sketch of both policies appears after this parameter list).
- Validator.truncation.allowed
- This can be set to either true or false. While validation is in progress, the conflict table keeps growing. To prevent the heap space from overflowing, the conflict table is truncated periodically. This flag controls whether table truncation is turned on.
- Validator.truncation.scanLimit
- This can be set to any integer value; by default it is set to 1000. The truncation thread loops over the entries of the conflict table; scanLimit tells the thread how many entries to scan before advancing the truncation timestamp.
- Validator.truncation.windowSize
- This can be set to any integer value; by default it is set to 2000. The truncation window size tells the truncator to remove all entries in the conflict table whose commit timestamp lags the stable timestamp by more than the specified window size (a sketch appears after this parameter list).
- Validator.debug
- This can be set to either true or false. Developers can turn debugging on or off; when on, trace output is printed to the console every 10 seconds.
- Validator.writeStatsToFile
- This can be set to either true or false. This flag controls whether commit- and abort-related details are captured. It helps gather statistics on the total number of aborts (due to conflicts, locks, or truncation) and the total number of commits performed by each validation service replica.
- Validator.RPC_MECHANISM
- This can be set to either THRIFT or RMI. Communication between the replicated validation servers and the Beehive client can use either Apache Thrift or Java RMI.
- Validator.thrift.clientPort
- When the RPC mechanism is set to THRIFT, the client port is the starting port number at which the replicated validation service listens for client validation requests. Port numbers increase with the replica index: with clientPort=5000, for example, the first replica listens on 5000, the second on 5001, and so on.
- Validator.thrift.serverPort
- When the RPC mechanism is set to THRIFT, the server port is the starting port number at which each replicated validation service listens for validation requests from its peer replicas. This MUST be different from Validator.thrift.clientPort.
- Validator.clientThriftConnectionPool
- This can be set to any integer value; by default it is set to 20. It specifies the number of Thrift connection handles that are created and reused by the Beehive client to make validation requests to the service replicas.
- Validator.serverThriftConnectionPool
- This can be set to any integer value; by default it is set to 20. It specifies the number of Thrift connection handles that are created and reused by each validation service replica to make calls to its peer replicas.
- Validator.clusteredGraph
- It tells the path to the metis file so that the data can be partitioned based on the cluster partitioning done by metis. For example, the metadata will contain the following information:
clusterID  from  to
0          0     3
1          4     7
2          8     11
3          12    15
4          16    19

So, if we have 2 replicated servers, the first three ranges will belong to replica 1 and the last two ranges will belong to replica 2 (one possible mapping is sketched below).
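The exact range-to-replica assignment is not spelled out above, but a contiguous split of the cluster ranges would reproduce the example. The following minimal Java sketch shows such a mapping; the class and method names are invented for illustration and are not the actual Beehive code:

// Hypothetical sketch of a contiguous cluster-range-to-replica mapping.
class ClusterPartition {
    // from[i]..to[i] is the key range of cluster i, as in the metadata above.
    static int replicaForKey(int key, int[] from, int[] to, int numReplicas) {
        int numClusters = from.length;
        // Contiguous split: with 5 clusters and 2 replicas, clusters 0-2 map
        // to replica 0 and clusters 3-4 to replica 1 (0-based indices).
        int clustersPerReplica = (int) Math.ceil((double) numClusters / numReplicas);
        for (int c = 0; c < numClusters; c++) {
            if (key >= from[c] && key <= to[c]) {
                return c / clustersPerReplica;
            }
        }
        throw new IllegalArgumentException("key out of range: " + key);
    }
}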
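As referenced under Validator.serverIdentifyPolicy, here is a minimal sketch of the two selection policies, assuming a fixed replica count and an RWSet that exposes its keys; all names are illustrative, not the actual Beehive API:

import java.util.List;
import java.util.Random;

// Hypothetical sketch of the RANDOM and KEY_BASED selection policies.
class ReplicaSelector {
    private final int numReplicas; // e.g., Validator.ServerReplica
    private final Random random = new Random();

    ReplicaSelector(int numReplicas) { this.numReplicas = numReplicas; }

    // RANDOM: each client simply picks a replica uniformly at random.
    int selectRandom() {
        return random.nextInt(numReplicas);
    }

    // KEY_BASED: pick the replica responsible for the majority of the keys
    // in the RWSet; replicaForKey encodes the partitioning (e.g., the metis
    // cluster ranges from Validator.clusteredGraph, as sketched above).
    int selectKeyBased(List<Integer> rwSetKeys) {
        int[] votes = new int[numReplicas];
        for (int key : rwSetKeys) {
            votes[replicaForKey(key)]++;
        }
        int best = 0;
        for (int r = 1; r < numReplicas; r++) {
            if (votes[r] > votes[best]) best = r;
        }
        return best;
    }

    // Placeholder partitioning; a range-based version is sketched above.
    private int replicaForKey(int key) {
        return key % numReplicas;
    }
}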
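Similarly, the interplay of Validator.truncation.scanLimit and Validator.truncation.windowSize can be summarized with a hypothetical sketch; the conflict-table representation and names below are assumptions, not the actual Beehive implementation:

import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of one pass of conflict-table truncation.
class ConflictTableTruncator {
    // Remove entries whose commit timestamp lags the stable timestamp by more
    // than windowSize, scanning at most scanLimit entries per pass; the
    // truncation timestamp then advances and the next pass resumes.
    static void truncate(Map<Long, Long> commitTsByEntry,
                         long stableTs, int scanLimit, long windowSize) {
        int scanned = 0;
        Iterator<Map.Entry<Long, Long>> it = commitTsByEntry.entrySet().iterator();
        while (it.hasNext() && scanned < scanLimit) {
            Map.Entry<Long, Long> e = it.next();
            scanned++;
            if (stableTs - e.getValue() > windowSize) {
                it.remove(); // entry is older than the truncation window
            }
        }
    }
}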
How to Use the Replicated Service
Choosing the nodes to run the Replicated Validation Service
There are three entities which get started for replicated validation:
- STS Manager
- CTS Manager
- Replicated Server
The three entities above will run on the corresponding hosts specified in a file that will be passed as an argument to run-replicated-validator-cluster.sh. The following is an example of such a file:
cs-spatial-301.cs.umn.edu
cs-spatial-302.cs.umn.edu
cs-spatial-303.cs.umn.edu
cs-spatial-304.cs.umn.edu
The file is configured such that the STS Manager runs on the first host, cs-spatial-301.cs.umn.edu, the CTS Manager on the second, cs-spatial-302.cs.umn.edu, and the Replicated Servers on the remaining; in this scenario, we have two Replicated Servers running on cs-spatial-303.cs.umn.edu and cs-spatial-304.cs.umn.edu. Note that all of the host names can be identical if one chooses to run all three entities on the same host.
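To make the role assignment concrete, the following hypothetical Java snippet shows how such a file would be interpreted; the parsing code is illustrative and the actual scripts may read the file differently:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical illustration of the hosts-file role assignment:
// line 1 -> STS Manager, line 2 -> CTS Manager, remaining lines -> replicas.
class ValidationNodes {
    public static void main(String[] args) throws IOException {
        List<String> hosts = Files.readAllLines(Paths.get("validation-nodes"));
        String stsManager = hosts.get(0);                        // first host
        String ctsManager = hosts.get(1);                        // second host
        List<String> replicas = hosts.subList(2, hosts.size()); // the rest
        System.out.println("STS Manager:        " + stsManager);
        System.out.println("CTS Manager:        " + ctsManager);
        System.out.println("Replicated Servers: " + replicas);
    }
}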
Modifying the Configuration File
In order to run the replicated validation service, one must make the following changes to the configuration file:
- Set the Validator.useReplicated value to true
- Set the Validator.nodeSize to the number of nodes in the graph.
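For example, for the one-million-node graph used in the sample configuration file above, the two settings would read:

Validator.useReplicated=true
Validator.nodeSize=1000000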
Running the Replicated Server
To run the Replicated Server, run the following commands:
clear_my_java.sh validation-nodes Ce #Ce is short for CertifierReplica
run-replicated-validator-cluster.sh validation-nodes
The first command will stop any replicated validation service running on the hosts provided in validation-nodes. The second command will start the replicated validation service.
Running the Application Program
To start the application program use the corresponding scripts. Below is an example of how to start the Graph Coloring application:
clear_my_java.sh 4-nodes GC
run-command-cluster.sh 4-nodes
The first command will stop any Graph Coloring applications running on the hosts provided in 4-nodes. The second command will start the Graph Coloring application provided that you have configured it to use the file containing the host names of the replicated servers, e.g.,
java -Xmx4096m TestProgs.GraphColoring \
    4-nodes \
    configFile \
    validation-nodes \
    TestCases/input-100-10-20 100 \
    configFile \
    1 99 \
    2>&1 &
Bypassing the CTS Server
In some instances, running a replicated server may not be necessary, but one may still want to lighten the load on the validator. That is, instead of having the validator be the point of contact for both validation and retrieving the latest STS, its responsibilities are split between a validator server and an STS server. As a result, the validator is responsible only for validation, and the STS server is responsible only for maintaining the STS. This can be achieved by bypassing the CTS server. To do so:
- Add/configure the following option in the configFile:
Validator.twoConfig=true
- Within the file containing the hostnames of the replicated servers, e.g., validation-nodes, ensure that there are only three hostnames, where the first corresponds to the STS server and the second and third correspond to the validation servers. For example:
cs-spatial-301.cs.umn.edu
cs-spatial-302.cs.umn.edu
cs-spatial-302.cs.umn.edu