Replicated Validation
Motivation
The centralized validation service was proving to be a bottleneck when running analytics problems on large graphs; the replicated validation service was developed to remove this bottleneck. Here we present how to configure and use the replicated validation service.
Sample Configuration File
Below is a typical configuration file for a replicated server. We will go through each configuration parameter in detail. Note that each line that starts with "#" is a comment.
# Replicated Validator Flags
# Replicated Server Selection Policy : RANDOM, KEY_BASED
Validator.serverIdentifyPolicy=RANDOM
Validator.truncation.allowed=true
Validator.truncation.scanLimit=1000
Validator.truncation.windowSize=2000
Validator.writeStatsToFile=false
Validator.nodeSize=1000000 # Number of nodes in the graph
# Validator.RPC_MECHANISM=RMI will use Java RMI for both client-side and peer interface
Validator.RPC_MECHANISM=THRIFT
# clientPort and serverPort are the starting port numbers, increasing with validation service replica index number
Validator.thrift.clientPort=5000
Validator.thrift.serverPort=6000
Validator.clientThriftConnectionPool=20
Validator.serverThriftConnectionPool=20
Validator.clusteredGraph=src/TestProgs/GraphColoring/fgraph-20-10-10.pp_bkt_dir
# Validator testing flags
Validator.debug=false
Validator.MSI=175
Validator.ServerReplica=2
Validator.rwSetLengthCount=8
Validator.acrossCluster=0
Below are descriptions of the configuration parameters.
- Validator.useReplicated
- This can be set to either true or false. Setting it to true will make the Beehive system use the replicated validation service instead of a single validation service.
- Validator.nodeSize
- This can be set to any integer value. The replicated validation service needs to know the total number of nodes in the graph in order to partition the graph among the replicas.
- Validator.serverIdentifyPolicy
- This can be set to either RANDOM or KEY_BASED. It determines how clients choose a replica for validation. When the policy is RANDOM, each client picks a replica at random. When the policy is KEY_BASED, the client chooses the replica that is responsible for the majority of the keys in the RWSet it sends (a sketch of both policies appears after this parameter list).
- Validator.truncation.allowed
- This can be set to either true or false. While validation is in progress, the conflict table keeps growing. To prevent the heap space from overflowing, the conflict table is truncated periodically. This flag controls whether table truncation is turned on.
- Validator.truncation.scanLimit
- This can be set to any integer value; by default it is set to 1000. The truncation thread loops over the entries of the conflict table; scanLimit tells the thread how many entries to scan before advancing the truncation timestamp.
- Validator.truncation.windowSize
- This can be set to any integer value; by default it is set to 2000. The truncation window size tells the truncator to remove all entries in the conflict table whose commit timestamp lags the stable timestamp by more than the specified window size (a sketch appears after this parameter list).
- Validator.debug
- This can be set to either true or false. Developers can turn debugging on or off; when on, trace output is printed to the console every 10 seconds.
- Validator.writeStatsToFile
- This can be set to either true or false. This flag controls whether commit- and abort-related details are captured. It helps gather statistics on the total number of aborts (due to conflicts, locks, or truncation) and the total number of commits performed by each validation service replica.
- Validator.RPC_MECHANISM
- This can be set to either THRIFT or RMI. Communication between the replicated validation servers and the Beehive client can use either Apache Thrift or Java RMI.
- Validator.thrift.clientPort
- When the RPC mechanism is set to THRIFT, the client port is the starting port number at which the replicated validation service listens for client validation requests. Port numbers increase with the replica index: with clientPort=5000, for example, the first replica listens on 5000, the second on 5001, and so on.
- Validator.thrift.serverPort
- When the RPC mechanism is set to THRIFT, the server port is the starting port number at which each replicated validation service listens for validation requests from its peer replicas. This MUST be different from Validator.thrift.clientPort.
- Validator.clientThriftConnectionPool
- This can be set to any integer value; by default it is set to 20. It specifies the number of Thrift connection handles that are created and reused by the Beehive client to make validation requests to the service replicas.
- Validator.serverThriftConnectionPool
- This can be set to any integer value; by default it is set to 20. It specifies the number of Thrift connection handles that are created and reused by each validation service replica to make calls to its peer replicas.
- Validator.clusteredGraph
- It tells the path to the metis file so that the data can be partitioned based on the cluster partitioning done by metis. For example, the metadata will contain the following information:
clusterID  from  to
0          0     3
1          4     7
2          8     11
3          12    15
4          16    19

So, if we have 2 replicated servers, the first three ranges will belong to replica 1 and the last two ranges will belong to replica 2 (one possible mapping is sketched below).
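The exact range-to-replica assignment is not spelled out above, but a contiguous split of the cluster ranges would reproduce the example. The following minimal Java sketch shows such a mapping; the class and method names are invented for illustration and are not the actual Beehive code:

// Hypothetical sketch of a contiguous cluster-range-to-replica mapping.
class ClusterPartition {
    // from[i]..to[i] is the key range of cluster i, as in the metadata above.
    static int replicaForKey(int key, int[] from, int[] to, int numReplicas) {
        int numClusters = from.length;
        // Contiguous split: with 5 clusters and 2 replicas, clusters 0-2 map
        // to replica 0 and clusters 3-4 to replica 1 (0-based indices).
        int clustersPerReplica = (int) Math.ceil((double) numClusters / numReplicas);
        for (int c = 0; c < numClusters; c++) {
            if (key >= from[c] && key <= to[c]) {
                return c / clustersPerReplica;
            }
        }
        throw new IllegalArgumentException("key out of range: " + key);
    }
}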
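As referenced under Validator.serverIdentifyPolicy, here is a minimal sketch of the two selection policies, assuming a fixed replica count and an RWSet that exposes its keys; all names are illustrative, not the actual Beehive API:

import java.util.List;
import java.util.Random;

// Hypothetical sketch of the RANDOM and KEY_BASED selection policies.
class ReplicaSelector {
    private final int numReplicas; // e.g., Validator.ServerReplica
    private final Random random = new Random();

    ReplicaSelector(int numReplicas) { this.numReplicas = numReplicas; }

    // RANDOM: each client simply picks a replica uniformly at random.
    int selectRandom() {
        return random.nextInt(numReplicas);
    }

    // KEY_BASED: pick the replica responsible for the majority of the keys
    // in the RWSet; replicaForKey encodes the partitioning (e.g., the metis
    // cluster ranges from Validator.clusteredGraph, as sketched above).
    int selectKeyBased(List<Integer> rwSetKeys) {
        int[] votes = new int[numReplicas];
        for (int key : rwSetKeys) {
            votes[replicaForKey(key)]++;
        }
        int best = 0;
        for (int r = 1; r < numReplicas; r++) {
            if (votes[r] > votes[best]) best = r;
        }
        return best;
    }

    // Placeholder partitioning; a range-based version is sketched above.
    private int replicaForKey(int key) {
        return key % numReplicas;
    }
}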
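Similarly, the interplay of Validator.truncation.scanLimit and Validator.truncation.windowSize can be summarized with a hypothetical sketch; the conflict-table representation and names below are assumptions, not the actual Beehive implementation:

import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of one pass of conflict-table truncation.
class ConflictTableTruncator {
    // Remove entries whose commit timestamp lags the stable timestamp by more
    // than windowSize, scanning at most scanLimit entries per pass; the
    // truncation timestamp then advances and the next pass resumes.
    static void truncate(Map<Long, Long> commitTsByEntry,
                         long stableTs, int scanLimit, long windowSize) {
        int scanned = 0;
        Iterator<Map.Entry<Long, Long>> it = commitTsByEntry.entrySet().iterator();
        while (it.hasNext() && scanned < scanLimit) {
            Map.Entry<Long, Long> e = it.next();
            scanned++;
            if (stableTs - e.getValue() > windowSize) {
                it.remove(); // entry is older than the truncation window
            }
        }
    }
}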
How to Use the Replicated Service
Choosing the nodes to run the Replicated Validation Service
There are three entities which get started for replicated validation:
- STS Manager
- CTS Manager
- Replicated Server
The three entities above will run on the corresponding hosts specified in a file that will be passed as an argument to run-replicated-validator-cluster.sh. The following is an example of such a file:
cs-spatial-301.cs.umn.edu
cs-spatial-302.cs.umn.edu
cs-spatial-303.cs.umn.edu
cs-spatial-304.cs.umn.edu
The file is configured such that the STS Manager runs on the first host, cs-spatial-301.cs.umn.edu, the CTS Manager on the second, cs-spatial-302.cs.umn.edu, and the Replicated Servers on the remaining; in this scenario, we have two Replicated Servers running on cs-spatial-303.cs.umn.edu and cs-spatial-304.cs.umn.edu. Note that all of the host names can be identical if one chooses to run all three entities on the same host.
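To make the role assignment concrete, the following hypothetical Java snippet shows how such a file would be interpreted; the parsing code is illustrative and the actual scripts may read the file differently:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical illustration of the hosts-file role assignment:
// line 1 -> STS Manager, line 2 -> CTS Manager, remaining lines -> replicas.
class ValidationNodes {
    public static void main(String[] args) throws IOException {
        List<String> hosts = Files.readAllLines(Paths.get("validation-nodes"));
        String stsManager = hosts.get(0);                        // first host
        String ctsManager = hosts.get(1);                        // second host
        List<String> replicas = hosts.subList(2, hosts.size()); // the rest
        System.out.println("STS Manager:        " + stsManager);
        System.out.println("CTS Manager:        " + ctsManager);
        System.out.println("Replicated Servers: " + replicas);
    }
}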
Modifying the Configuration File
In order to run the replicated validation service, one must make the following changes to the configuration file:
- Set the Validator.useReplicated value to true
- Set the Validator.nodeSize to the number of nodes in the graph.
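For example, for the one-million-node graph used in the sample configuration file above, the two settings would read:

Validator.useReplicated=true
Validator.nodeSize=1000000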
Running the Replicated Server
To run the Replicated Server, run the following commands:
clear_my_java.sh validation-nodes Ce #Ce is short for CertifierReplica
run-replicated-validator-cluster.sh validation-nodes
The first command will stop any replicated validation service running on the hosts provided in validation-nodes. The second command will start the replicated validation service.
Running the Application Program
To start the application program use the corresponding scripts. Below is an example of how to start the Graph Coloring application:
clear_my_java.sh 4-nodes GC
run-command-cluster.sh 4-nodes
The first command will stop any Graph Coloring applications running on the hosts provided in 4-nodes. The second command will start the Graph Coloring application provided that you have configured it to use the file containing the host names of the replicated servers, e.g.,
java -Xmx4096m TestProgs.GraphColoring \
    4-nodes \
    configFile \
    validation-nodes \
    TestCases/input-100-10-20 100 \
    configFile \
    1 99 \
    2>&1 &
Bypassing the CTS Server
In some instances, running a replicated server may not be necessary, but one may still want to lighten the load on the validator. That is, instead of having the validator be the point of contact for both validation and retrieving the latest STS, its responsibilities are split between a validator server and an STS server. As a result, the validator is responsible only for validation, and the STS server is responsible only for maintaining the STS. This can be achieved by bypassing the CTS server. To do so:
- Add/configure the following option in the configFile:
Validator.twoConfig=true
- Within the file containing the hostnames of the replicated servers, e.g., validation-nodes, ensure that there are only three hostnames, where the first corresponds to the STS server and the second and third correspond to the validation servers. For example:
cs-spatial-301.cs.umn.edu
cs-spatial-302.cs.umn.edu
cs-spatial-302.cs.umn.edu