Rollback Algorithm and Crash Recovery Based on Fault-Sensitive Graphs
LIU Ying, CHEN Dao-xu, XIE Li, CAO Jian-nong
Received: September 15, 1998    Revised: February 02, 1999
Abstract: The extended graph-oriented distributed programming model (ExGOM) provides a system framework that supports dynamic configuration. Dynamic configuration covers expanding and shrinking the system during execution, upgrading it while it runs, and reconfiguring it after a fault occurs. One problem in post-fault reconfiguration is how to restore the system to the consistent state that existed just before the fault. This paper focuses on that problem and proposes an asynchronous checkpoint rollback algorithm and a crash recovery strategy based on fault-sensitive graphs. Both handle the case in which a single host suffering a transient failure runs multiple faulty processes. Compared with other asynchronous rollback and recovery algorithms, the proposed algorithm localizes the fault region and rolls back only the fault-sensitive nodes, which reduces system overhead.
Foundation items: This research was supported by the National 863 High-Tech Program of China (No.863-306-ZT02-03-01) and a research grant from The Hong Kong Polytechnic University.
LIU Ying, CHEN Dao-xu, XIE Li, CAO Jian-nong. Rollback Algorithm and Crash Recovery Based on Fault-Sensitive Graphs. Journal of Software, 2000, 11(2): 235-239