4. Core Classes for Application Programming



Here we describe the core classes that programmers will need to be familiar with for application programming with Beehive.


4.1 Node

Beehive provides a Java class called Node which represents a vertex in a graph. Each Node in Beehive maintains the following information:

Programmers can extend this class to add application-specific functionality by writing a subclass of Node. For example, consider the GraphColoring problem, which requires each Node to maintain its color and whether or not it has been marked. A possible subclass, then, would be GCNode, where additional fields and behaviors can be defined:

        public class GCNode extends Node implements Externalizable {
            boolean marked;
            int color;
            // Auxiliary methods
        }
    

It is important to note that Node is Externalizable, since nodes can be retrieved from a remote server, or created on one server but deposited on another. The fields that need to persist across the network must therefore be saved and retrieved in the writeExternal and readExternal methods, respectively. Using GCNode as an example:

        public void writeExternal(ObjectOutput out) throws IOException {
            super.writeExternal(out);           // Required
            out.writeBoolean(this.marked);
            out.writeInt(this.color);
        }

        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            super.readExternal(in);             // Required
            this.marked = in.readBoolean();
            this.color = in.readInt();
        }
    

Note that the fields provided by the Node class are not automatically saved and retrieved; thus, the calls to super are required.

It is also important to note that Beehive stores all graph vertices as Node objects, so that several subclasses of Node can coexist in the StorageSystem. Therefore, all node retrievals return a Node object; it is up to the programmer to type-check the nodes and perform the appropriate type-cast(s) if necessary.
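
For illustration, here is a minimal sketch of this check-and-cast pattern; the retrieval call getNode and the variable storage are placeholders for whichever StorageSystem method and reference the application actually uses (see Section 4.4 and the JAVADOC).

        Node n = storage.getNode(nodeId);       // placeholder retrieval call; see Section 4.4
        if (n instanceof GCNode) {
            GCNode gcNode = (GCNode) n;         // safe downcast after the type check
            gcNode.marked = true;
            gcNode.color = 1;
        }
    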


4.2 NodeData

Beehive provides a NodeData class which supports fine-grained operations within the application. The NodeData class provides the capability to query and update specific properties of a Node without having to retrieve the entire Node object from the StorageSystem. This reduces communication costs when subclasses of Node are complex and only a subset of properties is needed. NodeData achieves this by maintaining the following:

An example of NodeData usage can be found in Chapter 6.
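
Until then, the sketch below only conveys the intent; the constructor and method names are assumptions made for illustration, not the actual Beehive API.

        // Illustrative only: the names below are assumed, not the actual Beehive API.
        NodeData data = new NodeData(nodeId);    // refer to a node by its id
        data.addProperty("color", 3);            // stage an update to a single property
        storage.update(data);                    // ship only the changed property, not the whole Node
    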


4.3 Edges

Beehive provides an Edge class to represent the connections between graph vertices. The Beehive Edge class maintains no information; using the base Edge class therefore serves no purpose other than representing a connection between any two nodes. Programmers can extend this class to add application-specific functionality by writing a subclass of Edge. For example, in the Single Source Shortest Path problem, edges maintain the weight (distance) between two adjacent nodes. A possible subclass, then, would be SPEdge, where additional fields and behaviors can be defined:

        public class SPEdge extends Edge implements Externalizable {
            int weight;
            // Auxiliary methods
        }
    

It is important to note that each Edge is Externalizable, since edges are stored within a Node, which is itself Externalizable. The fields that need to persist across the network must therefore be saved and retrieved in the writeExternal and readExternal methods, respectively. Using SPEdge as an example:

        public void writeExternal(ObjectOutput out) throws IOException {
            super.writeExternal(out);       //Required
            out.writeInt(this.weight);
        }

        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            super.readExternal(in);         //Required
            this.weight = in.readInt();
        }
    

Note that even though the Beehive Edge class does not maintain any information, a call to super is still required, in case further development of the Beehive model requires Edges to maintain some state.

It is also important to note that Beehive stores all connections as Edge objects in each Node, to facilitate applications that involve hypergraph structures. Therefore, all edge retrievals return an Edge object; it is up to the programmer to type-check the edges and perform the appropriate type-cast(s) if necessary.
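
The same check-and-cast pattern used for nodes applies to edges; in the sketch below, the getEdges accessor is an assumption used for illustration rather than the actual Beehive API.

        for (Edge e : node.getEdges()) {        // hypothetical accessor for a node's edges
            if (e instanceof SPEdge) {
                SPEdge spEdge = (SPEdge) e;     // safe downcast after the type check
                int weight = spEdge.weight;     // use the weight in the shortest-path computation
            }
        }
    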


4.4 StorageSystem

Beehive provides the StorageSystem class as an abstraction for storing all of the nodes needed during the application’s lifetime. The programmer can use the StorageSystem to add, update, remove, locate, and relocate nodes in the application. The StorageSystem provides all of its functionality with location transparency, so that programmers do not need to be concerned with where nodes reside. This is achieved by having each server in the cluster maintain a local StorageSystem that keeps track of its peers.

The full listing of the functionality provided by the StorageSystem can be found in the JAVADOC. Here we present a summary of the methods of importance to the programmer:

Excluding getAllKeys, the StorageSystem provides a “bulk” variant of each of the methods just described. For example, one can provide a list of nodes to put, get, update, or execute upon in parallel. See Chapter 6 for more details.
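
As a rough sketch of the single versus bulk calling pattern (the method names are assumptions for illustration; consult the JAVADOC for the exact signatures):

        // Method names are illustrative only; see the JAVADOC for the actual signatures.
        storage.putNode(node);                            // single-node variant
        storage.putNodes(Arrays.asList(node1, node2));    // hypothetical bulk variant, handled in parallel
    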


4.5 Task

As mentioned in Chapter 1, a Task is an abstraction in the Beehive computation model that maintains information specific to the application problem. Each task reads and/or updates the data of some vertices in the graph. Upon completion, a task can create additional tasks. Each Task maintains the following information:

  • nodeId -- The id of the node that the computation is to be performed on
  • phase -- The phase the task is in during the computation. This is application-driven, so that programmers can define several phases for their tasks. For example, we define several phases in the base Task class:
    • STARTUP -- The task is currently preprocessing the graph or creating the initial computation tasks to distribute
    • GENERIC -- The task is currently doing some generic computation
    • TERMINATION -- The task is currently terminating
  • status -- The status of the task during its lifetime. There are four statuses:
    • TASK_INIT -- The task was just created
    • TASK_WAITING -- The task is in the workpool waiting to be picked up for execution
    • TASK_FINISHED -- The task has been validated, committed, and removed from the workpool
  • target location -- The workpool in the cluster in which the task should ideally be deposited.
  • affinity level -- How strongly the task is bound to the target location when it is deposited. There are three levels of affinity:
    • NO_AFFINITY -- Deposit the task in an arbitrary workpool in the cluster
    • WEAK_AFFINITY -- Deposit the task at the target location if possible
    • STRONG_AFFINITY -- Deposit the task at the target location

Programmers can extend this class to add application-specific computations by writing a subclass of Task. For example, consider the Single Source Shortest Path problem. Each task can maintain the current shortest distance from a node to the source node as well as the weight of an edge that could potentially decrease the distance. Thus, a possible subclass would be SPTask, where additional fields and behaviors can be defined:

        public class SPTask extends Task {
            int distFromSrc;
            int edgeWeight;
            String senderId;
            // Auxiliary methods
        }
    

It is important to note that each task is stored as a Task object in the workpool. This is so that multiple types of tasks can co-exist in the workpool for more abstract applications. As a result, it is up to the programmer to type-check the tasks and make the appropriate type-cast(s) if necessary.
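
As an illustration of how the fields described above might come together when an application creates a new task, consider the following sketch; the constructor, setter names, and variables (neighborId, serverId) are assumptions made for illustration, not necessarily the actual Beehive API.

        // Setter names and variables below are assumed for illustration only.
        SPTask next = new SPTask();
        next.setNodeId(neighborId);                 // the node the computation should run on
        next.setPhase(Task.GENERIC);                // an application-defined phase
        next.setTargetLocation(serverId);           // the preferred workpool for the task
        next.setAffinityLevel(Task.WEAK_AFFINITY);  // deposit it there if possible
    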

In general, each task carries the information that is needed for the computation that is to be done on a node by the Worker class (Chapter 5). Each task can also carry the computation logic that can be invoked by the Worker class at run time.


4.6 LocalWorkpool

Beehive stores all created tasks in a workpool. Each server in the cluster maintains its own LocalWorkpool but can deposit Tasks into its peers' workpools. A LocalWorkpool maintains two task-pools: a ready-pool, containing tasks that are ready to be picked up and executed, and an active-pool, containing tasks that have been picked up and are currently being executed. In addition to maintaining these two task-pools, the LocalWorkpool is responsible for interfacing with the Validator to validate any completed tasks. Completed tasks that are valid are removed from the LocalWorkpool entirely; those that are invalid may either be retried immediately or placed back into the ready-pool for re-execution.
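
The sketch below gives a conceptual picture of the two task-pools and the task lifecycle; it is not the actual Beehive implementation.

        // Conceptual sketch only; not the actual Beehive implementation.
        Queue<Task> readyPool = new ConcurrentLinkedQueue<>();   // tasks waiting to be executed
        Set<Task> activePool  = ConcurrentHashMap.newKeySet();   // tasks currently being executed

        Task t = readyPool.poll();      // a worker picks up a ready task
        if (t != null) {
            activePool.add(t);          // the task is tracked while it executes
        }
        // On completion the task is handed to the Validator: valid tasks are removed
        // from the workpool entirely; invalid tasks are retried immediately or
        // returned to readyPool for re-execution.
    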

A full listing of the functionality provided by a LocalWorkpool can be found in the JAVADOC. Here we present a summary of the functionality that is of importance to the programmer: