Beehive is a Java-based framework for parallel programming of graph applications using a transactional model of task execution. Many data analytics applications for large-scale graph data require parallel processing in a cluster computing environment. Parallelism in many graph problems tends to be fine-grained and irregular, and it is not easy to extract through static analysis and data partitioning. This is called amorphous parallelism. Graph problems with amorphous parallelism cannot be easily partitioned for programming under the MapReduce model. The Beehive framework addresses this problem with a transactional model of parallel programming: vertex-centric computation tasks for a problem are executed as serializable transactions using an optimistic model of concurrency control.
Graph data is stored in the RAM of the cluster nodes using a key-value based storage model, which provides location transparency in accessing data. Node (graph vertex) ids are used as keys. A node object contains directed edges to other nodes in the graph. Beehive allows different types of edges between two nodes; this facilitates applications involving hypergraph structures.
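The storage model described above can be sketched as follows. This is a minimal illustrative class, not Beehive's actual API: the class and method names are assumptions, showing only how a vertex keyed by its id can carry multiple types of directed edges.

```java
import java.util.*;

// Hypothetical sketch of a vertex record in the key-value store: the vertex
// id serves as the key, and outgoing edges are grouped by edge type, which
// allows multiple edge types between the same pair of vertices.
public class Node {
    private final long id;                       // key in the key-value store
    private final Map<String, Set<Long>> edges;  // edge type -> target vertex ids

    public Node(long id) {
        this.id = id;
        this.edges = new HashMap<>();
    }

    public long getId() { return id; }

    // Add a directed edge of the given type to another vertex.
    public void addEdge(String edgeType, long targetId) {
        edges.computeIfAbsent(edgeType, t -> new HashSet<>()).add(targetId);
    }

    // All neighbors reachable via edges of the given type.
    public Set<Long> neighbors(String edgeType) {
        return edges.getOrDefault(edgeType, Collections.emptySet());
    }
}
```

Grouping edges by type keeps lookups for one relation cheap while still permitting several distinct relations between the same two vertices.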
The two abstractions in the Beehive computation model are tasks and transactions:
Task -- Vertex-centric computation for a task is specific to the application problem and the algorithm. A task reads and/or updates data of some vertices. A task can create new tasks on its completion.
Transaction -- Each task is executed as a transaction based on the optimistic concurrency control model [Kung & Robinson 1981]. In case of a read-write or write-write conflict among concurrently executed transactional tasks, only one commits while the others are aborted.
On abort, the task is re-executed as a new transaction.
On commit, the updates are written to the global storage and any newly created tasks are added to the task pool.
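The task abstraction above can be sketched as a small Java interface. This is a hypothetical illustration, assuming names (VertexTask, execute) that are not Beehive's actual API; it shows only the contract implied by the text: a task operates on a vertex and may spawn new tasks on completion.

```java
import java.util.*;

// Hypothetical sketch of the task abstraction: a vertex-centric unit of work
// that reads/updates vertex data and may create new tasks when it completes.
public abstract class VertexTask {
    protected final long vertexId;   // the vertex this task operates on

    protected VertexTask(long vertexId) {
        this.vertexId = vertexId;
    }

    // Application-specific computation; returns any newly created tasks,
    // which the framework would add to the task pool on commit.
    public abstract List<VertexTask> execute();
}
```

An application would subclass this once per algorithm step, e.g. a relaxation task for shortest paths that spawns tasks for updated neighbors.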
Execution phases of a transactional task are based on the optimistic execution model [Kung & Robinson 1981] as illustrated below. The Beehive framework runs a Validation Service on one of the dedicated cluster nodes. A transactional task executes the following steps. If the validation phase results in abort, the task is re-executed as a new transaction.
Transaction (task) acquires the 'Start-Timestamp' when it begins execution
Transaction executes Read and Compute phases
Validation service checks that no transaction that committed after the start-timestamp has any read-write or write-write conflict with this transaction.
Validation service assigns a Commit-Timestamp to the transaction on commit.
Transaction writes the updates to the global key-value storage
Adds any new tasks to the task pool
Reports completion to the global validation service
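The validation step described above can be sketched in a few lines of Java. This is a minimal sketch of backward validation in the style of Kung & Robinson, not Beehive's actual Validation Service: all names are assumptions, and read/write sets are modeled as sets of item keys.

```java
import java.util.*;

// Hypothetical sketch of optimistic (backward) validation: a transaction
// commits only if no transaction that committed after its start timestamp
// wrote an item that this transaction read or wrote.
public class Validator {
    // commit timestamp -> write set of that committed transaction
    private final TreeMap<Long, Set<Long>> committedWrites = new TreeMap<>();
    private long clock = 0;  // logical clock issuing timestamps

    // Returns a commit timestamp on success, or -1 to signal abort.
    public synchronized long validate(long startTs, Set<Long> readSet, Set<Long> writeSet) {
        // Examine only transactions that committed after this one started.
        for (Set<Long> ws : committedWrites.tailMap(startTs, false).values()) {
            for (long item : ws) {
                if (readSet.contains(item) || writeSet.contains(item)) {
                    return -1;  // read-write or write-write conflict: abort
                }
            }
        }
        long commitTs = ++clock;
        committedWrites.put(commitTs, new HashSet<>(writeSet));
        return commitTs;
    }

    // Current logical time, used here as a transaction's start timestamp.
    public synchronized long currentTime() { return clock; }
}
```

On abort (-1), the caller would simply re-run the task as a new transaction with a fresh start timestamp, matching the retry behavior described above.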
Each cluster node executes a Java process which contains the following components:
A local workpool containing tasks to be executed
A key-value based storage system, called HashtableServer, storing a subset of the items in the globally shared storage system
A pool of worker threads which fetch and execute tasks from the local workpool.
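The worker-pool component above can be sketched as follows. This is an illustrative sketch, assuming names (WorkerPool, addTask) that are not Beehive's actual API; it shows only the pattern of worker threads blocking on a local workpool and executing tasks as they arrive.

```java
import java.util.concurrent.*;

// Hypothetical sketch of the per-node worker pool: each worker thread
// repeatedly fetches a task from the local workpool and executes it.
public class WorkerPool {
    private final BlockingQueue<Runnable> workpool = new LinkedBlockingQueue<>();
    private final ExecutorService workers;

    public WorkerPool(int numWorkers) {
        workers = Executors.newFixedThreadPool(numWorkers);
        for (int i = 0; i < numWorkers; i++) {
            workers.submit(() -> {
                try {
                    while (true) {
                        Runnable task = workpool.take();  // block until a task arrives
                        task.run();                       // execute the fetched task
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();   // exit on shutdown
                }
            });
        }
    }

    // Add a task to the local workpool (called on commit of another task).
    public void addTask(Runnable task) { workpool.add(task); }

    public void shutdown() { workers.shutdownNow(); }
}
```

In the real framework each fetched task would be run as a transaction against the HashtableServer; here Runnable stands in for that transactional execution.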