Programming Model
The Computation Model
Graph data is stored in the RAM of the cluster nodes using a key-value based storage model which provides location-transparency in accessing data. Node ( graph vertex) IDs are used as keys. A node object contains directed edges for other nodes in the graph. Beehive allows different types of edges to exist between two nodes; this facilitates applications involving hyper graph structures. The two abstractions in the Beehive computation model are Tasks and Transactions
- Task
- A Vertex-centric computation for a task is specific to the application program and the algorithm. A task reads and/or updates data of some vertices. A task can create new tasks on its completion.
- Transaction
- Each task is executed as a transaction based on the optimistic concurrency control model Kung & Robinson, 1981. In any case of read-write and write-write conflict among any concurrently executing transactional tasks, only one commits while the others are aborted.
Execution phases of a transactional task are based on the optimistic execution model Kung & Robinson, 1981 as illustrated below. Beehive's framework executes a Validation Service on one of the dedicated cluster nodes. A transactional task executes the following steps during its execution. If the validation phase results in abort, then the task is re-executed as a new transaction.
- Transaction (task) acquires the Start-Timestamp when it begins execution.
- Transaction executes Read and Compute phases
- Validation service checks that no concurrent transactions committed after the start-timestamp has any read-write or write-write conflicts.
- Validation service assigns a Commit-Timestamp to the transaction on commit
- Transaction writes the updates to the global key-value storage
- Adds any new tasks to the taskpool
- Reports completion to the global validation service
The System Architecture
Each cluster node executes a Java process which contains the following components:
- A local workpool containing tasks to be executed
- Key-value based storage system called HashtableServer, storing a subset of the items in the globally shared storage syste
- A pool of worker threads which fetch and execute tasks from the localworkpool