Ensuring high
availability of critical services in the face of attacks, overload conditions, or
network outages is an important requirement in many areas. This project is
investigating a system architecture for building highly available services that
are resilient to such adverse conditions. The objective is to facilitate the
construction of adaptive and autonomic services incorporating recovery and
protection mechanisms in their deployment to ensure high availability. The
techniques being investigated are based on dynamic replication, relocation, and
regeneration of services under overload conditions. The design and development
of distributed mechanisms to actively monitor the components of a service in
different domains, together with their operating conditions, is an important
requirement for this research goal.

This project is being conducted using the facilities of the
PlanetLab infrastructure. This research uses mobile agent technology
for the deployment, relocation, and replication of service components. In the system
architecture being investigated, mobile agents are utilized as mobile service
containers to support relocation of a service over the Internet. This project is
also investigating an agent-based framework for monitoring the operating
conditions of services and their hosting environments at different nodes on the
PlanetLab. This research is identifying metrics and models for resource
utilization, load conditions, and available capacities at different nodes on the
PlanetLab to drive the autonomic mechanisms for service relocation,
regeneration, and replication. The PlanetLab environment poses unique challenges
because the resource capacities available to a service are allocated under a
proportional-share model and can change unpredictably with the usage of other
users and applications. To locate mobile and replicated services, this project
has developed a DHT-based facility using the Pastry system. This DHT-based
facility directs client requests to service replicas according to different
service-access models, such as anycast and multicast.
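The location facility described above can be illustrated with a minimal, single-process sketch. It relies on one property of Pastry: a key is delivered to the live node whose 128-bit nodeId is numerically closest to it. Everything else is an assumption for illustration and not the project's actual code: the class and method names (`LocationDirectory`, `register`, `anycast`, `multicast`) are hypothetical, SHA-1 truncated to 128 bits stands in for the ID hash, a linear scan over known node IDs replaces Pastry's prefix routing, and anycast simply picks a random registered replica.

```python
import hashlib
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

ID_BITS = 128          # Pastry uses 128-bit identifiers
RING = 1 << ID_BITS    # size of the circular identifier space


def make_id(name: str) -> int:
    """Hash a name into a 128-bit ID (assumption: top 128 bits of SHA-1)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")


def ring_distance(a: int, b: int) -> int:
    """Numerical distance between two IDs, wrapping around the ring."""
    d = abs(a - b) % RING
    return min(d, RING - d)


@dataclass
class LocationDirectory:
    """Hypothetical, single-process stand-in for a Pastry-based location facility.

    Real Pastry reaches the responsible node by prefix routing in O(log N)
    hops; this sketch simply scans a known list of node IDs.
    """
    node_ids: List[int]
    # responsible node ID -> {service key -> replica addresses}
    store: Dict[int, Dict[int, Set[str]]] = field(default_factory=dict)

    def responsible_node(self, key: int) -> int:
        # The key is stored at the node whose ID is numerically closest to it.
        return min(self.node_ids, key=lambda n: ring_distance(n, key))

    def _bucket(self, service: str) -> Set[str]:
        # Create the replica set for this service on first use.
        key = make_id(service)
        per_node = self.store.setdefault(self.responsible_node(key), {})
        return per_node.setdefault(key, set())

    def register(self, service: str, addr: str) -> None:
        """Called by a service agent after it is deployed or has migrated."""
        self._bucket(service).add(addr)

    def unregister(self, service: str, addr: str) -> None:
        """Called by a service agent before it leaves its current node."""
        self._bucket(service).discard(addr)

    def multicast(self, service: str) -> List[str]:
        """Multicast access: every registered replica receives the request."""
        key = make_id(service)
        return sorted(self.store.get(self.responsible_node(key), {}).get(key, set()))

    def anycast(self, service: str, rng: Optional[random.Random] = None) -> Optional[str]:
        """Anycast access: any single registered replica may serve the request."""
        replicas = self.multicast(service)
        if not replicas:
            return None
        return (rng or random).choice(replicas)


if __name__ == "__main__":
    # Hypothetical node names and replica addresses, for illustration only.
    nodes = [make_id(f"node{i}.example.org") for i in range(8)]
    directory = LocationDirectory(nodes)
    directory.register("echo-service", "10.0.0.1:9000")
    directory.register("echo-service", "10.0.0.2:9000")
    print(directory.multicast("echo-service"))  # ['10.0.0.1:9000', '10.0.0.2:9000']
    # A replica migrates: it unregisters its old address and registers the new one.
    directory.unregister("echo-service", "10.0.0.1:9000")
    directory.register("echo-service", "10.0.0.3:9000")
    print(directory.anycast("echo-service"))    # one of the two live replicas
```

Because every lookup for the same service name hashes to the same key, clients always reach the same directory entry, and a migrating agent only needs to update that one entry to become reachable at its new location.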
Our current activities on this project have focused on the design and evaluation
of a system for deploying highly available and migratable services in shared
infrastructures, such as the PlanetLab,
where the available resource capacities at a node can fluctuate significantly. A
migratable service can monitor its operating conditions and autonomously
relocate itself to another node when the available resource capacities at the
current node fall below certain acceptable limits. Here we investigate the
mechanisms for service relocation and the client-side protocols for accessing
migratory services. For services that must be highly available, the "blackout
periods", i.e., the intervals during which clients are unable to access a
migrating service, must be minimized and kept within tolerable limits. We designed
and implemented a migratable service using a mobile agent and evaluated its
performance in terms of the blackout periods and the service agent's ability
to migrate autonomously in the network. We used replication of service
agents to reduce the blackout periods and developed coordination protocols
for autonomous agent migration within a group of service agents. We also developed
an infrastructure service for monitoring the PlanetLab nodes for available
resource capacities in order to assist a migratory service in selecting a target
node for relocation.
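The relocation decision described above (migrate when capacities at the current node fall below acceptable limits, then pick a target from monitoring data) can be sketched as follows. This is a hypothetical illustration, not the project's mechanism: the report fields, the threshold values, and the names `NodeReport` and `MigrationPolicy` are assumptions. The CPU estimate assumes a simplified proportional-share model in which the service receives `my_shares / (my_shares + shares of other active slices)` of the CPU, as if every active slice were CPU-bound.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeReport:
    """Hypothetical snapshot published by a node-monitoring service."""
    name: str
    other_active_shares: int  # CPU shares held by other currently active slices
    free_mem_mb: float


def estimated_cpu_fraction(my_shares: int, other_active_shares: int) -> float:
    """Proportional-share estimate of this service's CPU fraction, assuming
    every active slice is CPU-bound and shares are the only factor."""
    return my_shares / (my_shares + other_active_shares)


@dataclass
class MigrationPolicy:
    min_cpu: float = 0.10      # hypothetical acceptable lower limit on CPU fraction
    min_mem_mb: float = 256.0  # hypothetical acceptable lower limit on free memory
    margin: float = 1.5        # a target must exceed the limits by this factor
    my_shares: int = 1         # shares held by the service's own slice

    def cpu(self, r: NodeReport) -> float:
        return estimated_cpu_fraction(self.my_shares, r.other_active_shares)

    def acceptable(self, r: NodeReport, factor: float = 1.0) -> bool:
        return (self.cpu(r) >= self.min_cpu * factor
                and r.free_mem_mb >= self.min_mem_mb * factor)

    def choose_target(self, current: NodeReport,
                      candidates: Iterable[NodeReport]) -> Optional[str]:
        """Return the node to migrate to, or None if the service should stay."""
        if self.acceptable(current):
            return None  # current node is still within acceptable limits
        eligible = [r for r in candidates
                    if r.name != current.name and self.acceptable(r, self.margin)]
        if not eligible:
            return None  # no suitable target; keep running and re-check later
        return max(eligible, key=self.cpu).name


if __name__ == "__main__":
    policy = MigrationPolicy()
    here = NodeReport("node-a", other_active_shares=19, free_mem_mb=900.0)  # 1/20 = 5% CPU
    others = [
        NodeReport("node-b", other_active_shares=4, free_mem_mb=1200.0),  # 20% CPU
        NodeReport("node-c", other_active_shares=1, free_mem_mb=300.0),   # 50% CPU, low memory
        NodeReport("node-d", other_active_shares=7, free_mem_mb=2000.0),  # 12.5% CPU
    ]
    print(policy.choose_target(here, others))  # node-b
```

The `margin` factor creates a hysteresis band: a node only needs to meet the limits to keep the service, but must beat them by 50% to attract it, which discourages an agent from oscillating between nodes that hover near the thresholds.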
Sponsors
National Science Foundation (NSF) Grants: 0834357 and 0708604