Fine-grain Remote Data Access

Motivation

Fetching an entire node object hosted on a remote Beehive server can be expensive in terms of communication costs. The NodeData class was briefly introduced in the AppLoader section. Here we discuss the remote, fine-grained operations that Beehive provides to cut down these costs. We begin with the motivation and an example of how to fetch specific properties of a remote node without fetching the entire node from the remote server. We follow up with updating specific properties of a node without having to write the entire node back to the system. Next, we discuss how to invoke node-specific methods on a remote server. We end with the batching primitives, which cut down communication costs further.

Fetching node properties

Depending on the domain, some applications need only a subset of a node's properties. Take the graph coloring problem, for example. Each node records whether it has been marked and, if so, its color. If a node has already been marked, there is no need to color it. One could fetch the entire node to determine whether it has been marked, but that is expensive because the whole node object is sent across the network. With NodeData, Beehive lets application programmers fetch specific properties of a node instead. Continuing with graph coloring, one can fetch just the value of marked, which is a boolean, so a much smaller payload is sent over the network. Beehive's StorageSystem provides this primitive as StorageSystem::getNodeProperties, which takes a NodeData as its argument. The application developer is responsible for filling in the NodeData with the desired properties using NodeData::addProperty, which is discussed in the AppLoader section. Using the graph coloring problem as an example:

// Assume nodeId is given. (byte) 1 is the unique byte identifier of the
// Node class we are fetching properties for. This is discussed in
// the AppLoader section.
NodeData getMarked = new NodeData(nodeId, (byte) 1);

// Specifies which property to fetch
getMarked.addProperty("marked", null);

// Fetches the results from the node on the remote server and waits
// for a response
NodeData results = storageSystem.getNodeProperties(getMarked);

// Retrieves the value of the property on the remote server
boolean marked = results.getProperty("marked").booleanValue();
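
To make the cost difference concrete, the following sketch compares the serialized size of a whole node with that of a single boolean property. It is only an illustration: ColoringNode and PayloadSizeSketch are hypothetical stand-ins rather than Beehive classes, and standard Java serialization is used as a rough proxy, since Beehive's actual wire format is not described here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class PayloadSizeSketch {
    // Hypothetical stand-in for an application node; not a Beehive class.
    static final class ColoringNode implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        boolean marked;
        int color;
        final List<String> neighborIds = new ArrayList<>();

        ColoringNode(String id) {
            this.id = id;
        }
    }

    // Returns the number of bytes Java serialization produces for the value.
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        ColoringNode node = new ColoringNode("node-0");
        for (int i = 1; i <= 100; i++) {
            node.neighborIds.add("node-" + i);
        }
        System.out.println("whole node:  " + serializedSize(node) + " bytes");
        System.out.println("marked only: "
                + serializedSize(Boolean.valueOf(node.marked)) + " bytes");
    }
}
```

Running the sketch prints both sizes. The size of the boolean stays constant, while the size of the whole node grows with the number of neighbor ids it holds.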

Remotely Updating node properties

Just as one can get specific properties of a node without fetching the remote node, one can remotely update specific properties of a node. Again, this reduces network costs because the node itself does not have to be sent across the network. Beehive achieves this with the updateNodeProperties primitive and the NodeData class. Continuing with the graph coloring problem, suppose we want to mark an unmarked node:

NodeData updateMarked = new NodeData(nodeId, (byte) 1);
updateMarked.addProperty("marked", Boolean.TRUE);
storageSystem.updateNodeProperties(updateMarked);

Remotely Invoking node methods

Our third topic is how to invoke specific methods of a node on the server that hosts it and retrieve the results. Beehive achieves this with the exec primitive and the NodeData class. Consider the K-Nearest-Neighbor problem, where each node maintains a collection of its K nearest neighbors. If one already has the node's id and the ids of the nodes to be added to or removed from the collection, then instead of fetching the node from the remote server and writing it back, one can use exec to reduce network traffic. The following is an example:

List nbrsToAdd = new ArrayList<>();

// assume nbrsToAdd is populated with nodeIds for the sake of the example
NodeData addNbrs = new NodeData(nodeId, (byte) 1);
addNbrs.addProperty("addNeighbors", nbrsToAdd);
NodeData results = storageSystem.exec(addNbrs);
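
The example above passes a method name ("addNeighbors") together with its argument. How the hosting server turns that pair into a method call is not described here. As a rough illustration only, the sketch below shows one way a name-based dispatch could be done with Java reflection. ExecDispatchSketch, KnnNode, and invokeByName are hypothetical names, and this should not be read as Beehive's implementation.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExecDispatchSketch {
    // Hypothetical node type for the K-Nearest-Neighbor example.
    public static final class KnnNode {
        private final List<String> neighbors = new ArrayList<>();

        // Adds the given ids and returns the new neighbor count.
        public Integer addNeighbors(List<String> ids) {
            neighbors.addAll(ids);
            return neighbors.size();
        }
    }

    // Finds a public one-argument method by name and invokes it on the target.
    static Object invokeByName(Object target, String methodName, Object argument) {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() == 1) {
                try {
                    return m.invoke(target, argument);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("call to " + methodName + " failed", e);
                }
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    public static void main(String[] args) {
        KnnNode node = new KnnNode();
        Object count = invokeByName(node, "addNeighbors", Arrays.asList("n1", "n2"));
        System.out.println("neighbors after call: " + count);
    }
}
```

The point of the sketch is that only the method name and its argument need to travel to the server; the node itself stays where it is hosted.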

Batching Primitives

We can cut down communication costs further by accumulating a batch of fine-grained remote operations and then making one call to obtain the results for every node in the batch. This avoids issuing a separate remote operation for each node. Suppose that, in the graph coloring problem, one wants to get the marked property of multiple nodes. One can then use Beehive's batching primitive:

List<NodeData> getMarked = new ArrayList<>();

// Three nodes that we want to get the value of "marked" for
NodeData getA = new NodeData(nodeId_A, (byte) 1);
NodeData getB = new NodeData(nodeId_B, (byte) 1);
NodeData getC = new NodeData(nodeId_C, (byte) 1);

// Configuring the respective NodeDatas
getA.addProperty("marked", null);
getB.addProperty("marked", null);
getC.addProperty("marked", null);

// Adding the NodeDatas to the batch
getMarked.add(getA);
getMarked.add(getB);
getMarked.add(getC);

// Retrieves the marked value for each node in the batch without having
// to make a getNodeProperties for each node.
List<NodeData> results = storageSystem.getNodePropertiesParallel(getMarked);

// The ordering of the NodeData results may not necessarily be the same
// ordering as the one being passed in, i.e. getMarked. For example,
// the following may get the "marked" value for nodeId_C.
boolean marked = results.get(0).getProperty("marked").booleanValue();
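
Because the result order is not guaranteed, positional access such as results.get(0) cannot be tied to a particular node. One way to handle this is to index the results by node id before reading them. The sketch below assumes that each returned result exposes the id of the node it describes. The Result class, its getNodeId accessor, and the use of String ids are hypothetical stand-ins for illustration and are not confirmed parts of NodeData.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchResultIndex {
    // Hypothetical stand-in for a returned NodeData: a node id plus properties.
    static final class Result {
        private final String nodeId;
        private final Map<String, Object> properties = new HashMap<>();

        Result(String nodeId) {
            this.nodeId = nodeId;
        }

        String getNodeId() {
            return nodeId;
        }

        Result with(String name, Object value) {
            properties.put(name, value);
            return this;
        }

        Object getProperty(String name) {
            return properties.get(name);
        }
    }

    // Indexes results by node id so lookups do not depend on list order.
    static Map<String, Result> byNodeId(List<Result> results) {
        Map<String, Result> index = new HashMap<>();
        for (Result r : results) {
            index.put(r.getNodeId(), r);
        }
        return index;
    }

    // Reads the "marked" property for one node, failing loudly if it is absent.
    static boolean isMarked(Map<String, Result> index, String nodeId) {
        Result r = index.get(nodeId);
        if (r == null) {
            throw new IllegalArgumentException("no result for node " + nodeId);
        }
        return (Boolean) r.getProperty("marked");
    }

    public static void main(String[] args) {
        // Results arrive in a different order than they were requested.
        List<Result> results = Arrays.asList(
                new Result("C").with("marked", true),
                new Result("A").with("marked", false),
                new Result("B").with("marked", true));
        Map<String, Result> index = byNodeId(results);
        System.out.println("A marked? " + isMarked(index, "A"));
    }
}
```

With the index in place, each lookup names the node it wants, so the order in which the batch returns no longer matters.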

Batching primitives also exist for StorageSystem::updateNodeProperties and StorageSystem::exec: they are StorageSystem::updateNodePropertiesParallel and StorageSystem::execParallel, respectively.