8. Replicated Validation Service



The centralized validation service was proving to be a bottleneck when running analytics problems on large graphs; the replicated validation service was the solution. Here we present how to configure and use the replicated validation service.


8.1 Sample Configuration File

Below is a typical configuration file for a replicated server. We will go through each parameter in detail. Note that each line that starts with "#" is a comment.

    # Replicated Validator Flags
    # Replicated Server Selection Policy : RANDOM, KEY_BASED
    Validator.serverIdentifyPolicy=RANDOM
    Validator.truncation.allowed=true
    Validator.truncation.scanLimit=1000
    Validator.truncation.windowSize=2000
    Validator.writeStatsToFile=false
    Validator.nodeSize=1000000 # Number of nodes in the graph
    #Validator.RPC_MECHANISM=RMI will use Java RMI for both client-side and peer interface
    Validator.RPC_MECHANISM=THRIFT
    # clientPort and serverPort are the starting port numbers, increasing with validation service replica index number
    Validator.thrift.clientPort=5000
    Validator.thrift.serverPort=6000
    Validator.clientThriftConnectionPool=20
    Validator.serverThriftConnectionPool=20
    Validator.clusteredGraph=src/TestProgs/GraphColoring/fgraph-20-10-10.pp_bkt_dir

    # Validator testing flags
    Validator.debug=false
    Validator.MSI=175
    Validator.ServerReplica=2
    Validator.rwSetLengthCount=8
    Validator.acrossCluster=0

Below is a description of each configuration parameter.

  1. Validator.useReplicated

    This can be set to either true or false. Setting it to true will make the Beehive system use the replicated validation service instead of a single validation service.

  2. Validator.nodeSize

    This can be set to any integer value. The replicated validation service needs to know the total number of nodes in the graph in order to partition the graph among the replicas.
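    For illustration, a minimal sketch of how such a partition might map node IDs to replicas, assuming an even split of the ID space (the actual Beehive partitioning, in particular when Validator.clusteredGraph is set, may differ):

```java
// Hypothetical sketch: even split of the node ID space among replicas.
// The real Beehive partitioning may differ (see Validator.clusteredGraph).
public class NodePartition {
    static int replicaFor(long nodeId, long nodeSize, int numReplicas) {
        long perReplica = (nodeSize + numReplicas - 1) / numReplicas; // ceiling division
        return (int) (nodeId / perReplica);
    }

    public static void main(String[] args) {
        // With Validator.nodeSize=1000000 and 2 replicas:
        System.out.println(replicaFor(0, 1_000_000, 2));       // replica 0
        System.out.println(replicaFor(999_999, 1_000_000, 2)); // replica 1
    }
}
```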

  3. Validator.serverIdentifyPolicy

    This can be set to either RANDOM or KEY_BASED. It determines how clients choose a replica for validation. When the policy is RANDOM, each client picks a replica at random. When the policy is KEY_BASED, the client chooses the replica that is responsible for the majority of the keys sent in the RWSet.
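    The two policies can be sketched as follows. This is an illustrative assumption of the logic (the method names and the even key-to-replica mapping are hypothetical), not the actual Beehive implementation:

```java
import java.util.List;
import java.util.Random;

public class ReplicaSelector {
    // RANDOM: pick any replica uniformly at random.
    static int pickRandom(int numReplicas, Random rng) {
        return rng.nextInt(numReplicas);
    }

    // KEY_BASED: pick the replica that owns the most keys in the RWSet.
    // An even split of the key space is assumed here for illustration.
    static int pickKeyBased(List<Long> rwSetKeys, int numReplicas, long nodeSize) {
        long perReplica = (nodeSize + numReplicas - 1) / numReplicas;
        int[] counts = new int[numReplicas];
        for (long key : rwSetKeys) counts[(int) (key / perReplica)]++;
        int best = 0;
        for (int r = 1; r < numReplicas; r++) if (counts[r] > counts[best]) best = r;
        return best;
    }
}
```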

  4. Validator.truncation.allowed

    This can be set to either true or false. While validation is in progress, the conflict table keeps growing. To prevent the heap space from overflowing, the conflict table is truncated periodically. This flag controls whether table truncation is turned on.

  5. Validator.truncation.scanLimit

    This can be set to any integer value; by default it is set to 1000. The truncation thread loops through the entries of the conflict table. The scan limit tells the thread how many entries to scan before advancing the truncation timestamp.

  6. Validator.truncation.windowSize

    This can be set to any integer value; by default it is set to 2000. The truncation window size tells the truncator to remove all entries in the conflict table whose commit timestamp trails the stable timestamp by more than the specified window size.
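    Taken together, the two truncation parameters behave roughly as in the following sketch. The conflict-table representation and method names here are assumptions for illustration, not the actual Beehive data structures:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class Truncator {
    // Scan at most scanLimit entries (ordered by commit timestamp) and remove
    // those whose timestamp trails the stable timestamp by more than windowSize.
    static int truncate(TreeMap<Long, Object> conflictTable,
                        long stableTs, int scanLimit, long windowSize) {
        int removed = 0, scanned = 0;
        Iterator<Map.Entry<Long, Object>> it = conflictTable.entrySet().iterator();
        while (it.hasNext() && scanned < scanLimit) {
            Map.Entry<Long, Object> e = it.next();
            scanned++;
            if (stableTs - e.getKey() > windowSize) {
                it.remove();
                removed++;
            } else {
                break; // entries are ordered, so all remaining are within the window
            }
        }
        return removed;
    }
}
```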

  7. Validator.debug

    This can be set to either true or false. Developers can turn debugging on or off; when on, trace information is printed to the console every 10 seconds.

  8. Validator.writeStatsToFile

    This can be set to either true or false. Turn this flag on to capture commit- and abort-related details. It helps gather statistics on the total number of aborts (due to conflicts, locks, or truncation) and the total number of commits performed by each validation service replica.

  9. Validator.RPC_MECHANISM

    This can be set to either THRIFT or RMI. Communication between the replicated validation servers and the Beehive clients can use either Apache Thrift or Java RMI.

  10. Validator.thrift.clientPort

    When the RPC mechanism is set to THRIFT, the client port is the port number on which the replicated validation service listens for client validation requests.

  11. Validator.thrift.serverPort

    When the RPC mechanism is set to THRIFT, the server port is the port number on which the replicated validation service listens for validation requests from the other servers. This MUST be different from Validator.thrift.clientPort.
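    Per the comment in the sample configuration, each replica derives its ports from the configured base values and its replica index. A sketch of the assumed scheme:

```java
public class ValidatorPorts {
    // Ports increase with the replica index, starting from the configured bases.
    static int clientPort(int basePort, int replicaIndex) { return basePort + replicaIndex; }
    static int serverPort(int basePort, int replicaIndex) { return basePort + replicaIndex; }

    public static void main(String[] args) {
        // With Validator.thrift.clientPort=5000 and Validator.thrift.serverPort=6000,
        // replica index 1 would listen on 5001 (clients) and 6001 (peers).
        System.out.println(clientPort(5000, 1) + " " + serverPort(6000, 1));
    }
}
```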

  12. Validator.clientThriftConnectionPool

    This can be set to any integer value; by default it is set to 20. It specifies the total number of Thrift connection handles that will be created and reused by the Beehive client to make validation requests to the service replicas.

  13. Validator.serverThriftConnectionPool

    This can be set to any integer value; by default it is set to 20. It specifies the total number of Thrift connection handles that will be created and reused by each validation service replica to make calls to its peer replicas.

  14. Validator.clusteredGraph

    It specifies the path to the METIS file, which is used to partition the data based on the cluster partitioning produced by METIS.
    For example, the metadata will contain the following information:

        clusterID   from   to
        0           0      3
        1           4      7
        2           8      11
        3           12     15
        4           16     19
    So, say we have 2 replicated servers: the first three ranges will belong to replica 1 and the last two ranges to replica 2.
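    The range assignment from the example above can be sketched as follows. This is a hypothetical reconstruction; the replica numbering and the contiguous-group assignment rule are assumptions inferred from the example:

```java
public class ClusterAssignment {
    // Assign cluster ranges to replicas in contiguous groups, matching the
    // example: 5 ranges, 2 replicas -> ranges 0-2 on replica 1, 3-4 on replica 2.
    static int replicaForRange(int clusterId, int numRanges, int numReplicas) {
        int perReplica = (numRanges + numReplicas - 1) / numReplicas; // ceiling division
        return clusterId / perReplica + 1; // replicas numbered from 1
    }

    // Find the cluster range containing a node, using the (from, to)
    // pairs from the metadata.
    static int rangeForNode(int node, int[][] ranges) {
        for (int i = 0; i < ranges.length; i++)
            if (node >= ranges[i][0] && node <= ranges[i][1]) return i;
        return -1;
    }
}
```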

8.2 How to Use the Replicated Service

The following is an example on how to run a replicated validation service using the Shortest Path problem:

  1. Choosing the nodes to run the replicated validation service:

    Three entities are started for the replicated validation service:

    1. STS Manager
    2. CTS Manager
    3. Replicated Server

    The three entities above will run on the corresponding hosts specified in a file that will be passed as an argument to run-replicated-validator-cluster.sh. The following is an example of such a file:

    cs-spatial-301.cs.umn.edu cs-spatial-302.cs.umn.edu cs-spatial-303.cs.umn.edu cs-spatial-304.cs.umn.edu

    The file is configured such that the STS Manager runs on the first host, cs-spatial-301.cs.umn.edu, the CTS Manager on the second, cs-spatial-302.cs.umn.edu, and the Replicated Servers on the remaining; in this scenario, we have two Replicated Servers running on cs-spatial-303.cs.umn.edu and cs-spatial-304.cs.umn.edu.

    Note that all of the host names can be identical if one chooses to run all three entities on the same host.
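    The role assignment described above can be sketched as follows (a hypothetical helper for illustration; the actual scripts parse the hosts file themselves):

```java
import java.util.Arrays;
import java.util.List;

public class HostRoles {
    // First host runs the STS Manager, second the CTS Manager,
    // and all remaining hosts run Replicated Servers.
    static String stsManager(List<String> hosts) { return hosts.get(0); }
    static String ctsManager(List<String> hosts) { return hosts.get(1); }
    static List<String> replicatedServers(List<String> hosts) {
        return hosts.subList(2, hosts.size());
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList(
            "cs-spatial-301.cs.umn.edu", "cs-spatial-302.cs.umn.edu",
            "cs-spatial-303.cs.umn.edu", "cs-spatial-304.cs.umn.edu");
        System.out.println(replicatedServers(hosts)); // the two Replicated Servers
    }
}
```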


  2. Modifying the configuration file

    In order to run the replicated validation service, one must make the following changes to the configuration file:

    • Set the Validator.useReplicated value to true
    • Set the Validator.nodeSize to the number of nodes in the graph.

  3. Running the Replicated Server

    To run the Replicated Server, run the following commands:

    1. clear_my_java.sh validation-nodes C3; # Ce is short for CertifierReplica
    2. run-replicated-validator-cluster.sh validation-nodes;

    The first command will stop any replicated validation service running on the hosts provided in validation-nodes. The second command will start the replicated validation service.


  4. Running the application program command:

    To start the application program use the corresponding scripts. Below is an example of how to start the Graph Coloring application:

    1. clear_my_java.sh 4-nodes GC;
    2. run-command-cluster.sh 4-nodes;

    The first command will stop any Graph Coloring application running on the hosts provided in 4-nodes. The second command will start the Graph Coloring application, provided that you have configured it to use the file containing the host names of the replicated servers, e.g.:

        java -Xmx4096m TestProgs.GraphColoring 4-nodes configFile validation-nodes TestCases/input-100-10-20 100 configFile 1 99 2>&1 &

8.3 Bypassing the CTS Server

In some instances, running a replicated server may not be necessary, but one still wants to lighten the load on the Validator. That is, instead of having the Validator be the point of contact for both validation and retrieving the latest STS, its responsibilities are split between a validator server and an STS server. As a result, the validator is only responsible for validation and the STS server is only responsible for maintaining the STS. This can be achieved by bypassing the CTS server. To do so:

  1. Add/configure the following option in the configFile: Validator.twoConfig=true.
  2. Within the file containing the hostnames of the replicated servers, e.g., validation-nodes, ensure that there are only three hostnames, where the first corresponds to the STS server and the second and third correspond to the validation server. For example: cs-spatial-301.cs.umn.edu cs-spatial-302.cs.umn.edu cs-spatial-302.cs.umn.edu