1. Introduction

Beehive is a Java based framework for parallel programming of graph applications using a transactional model of task execution. Many data analytic applications for large scale graph data require parallel processing utilizing a cluster computing environment. Parallelism in many graph problems tends to be fine-grained and irregular, and it is not easy to extract parallelism through static analysis and data partitioning. This is called amorphous paralleism.

Graph problems with amorphous paralleism cannot be easily partitioned for programming using the MapReduce model. The Beehive framework addresses this problem based on a transactional model of parallel programming. In Beehive, vertex-centric computation tasks for a problem are executed as serializable translations using an optimistc model for concurrency control.

1.1 Beehive Computation Model

Graph data is stored in the RAM of the cluster nodes using a key-value based storage model which provides locaiton-transparancy in accessing data. Node (graph vertex) ids are used as keys. A node object contains directed edges for other nodes in the graph. Beehive allows different types of edges between two nodes, this facilitates applications involving hyper graph structures.

The two abstractions in the Beehive computation model are task and transactions:

Execution phases of a transactional task are based on the optimistic execution model [Kung & Robinson 1981] as illustrated below.

Beehive's framework executes a Validation Service on one of the dedicated cluster nodes. A transactional task executes the following steps during its execution. If the validation phase results in abort, then the task is re-executed as a new transaction.

  1. Transaction (task) aquires the 'Start-Timestamp' when it begins execution

  2. Transaction executes Read and Compute phases

  3. Validation service checks that no concurrent transactions committed after the start-timestamp has any read-write or write-write conflicts.

  4. Validation service assigns a Commit-Timestamp to the transaction on commit.

  5. Transaction writes the updates to the global key-value storage

  6. Adds any new tasks to the taskpool

  7. Reports completion to the global validation service

1.2 Beehive System Architecture

Each cluster nodes executes a Java process which contains the following components:

  1. A local workpool containing tasks to be executed

  2. Key-value based storage system called HashtableServer, storing a subset of the items in the globally shared storage system

  3. A pool of worker threads which fetch and execute tasks from the local workpool.