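Zhang Chengbo, Li Ying, Jia Tong. State-of-the-Art Survey of Fault Tolerance for Distributed Graph Processing Jobs. Journal of Software (软件学报), 2021, 32(7): 24-0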
State-of-the-Art Survey of Fault Tolerance for Distributed Graph Processing Jobs
Received: 2020-09-15; Revised: 2020-10-26
DOI: 10.13328/j.cnki.jos.006269
Keywords: graph data; fault and failure; distributed graph processing; fault tolerance; uncertain software systems
Foundation item: Key-Area Research and Development Program of Guangdong Province (No. 2020B010164003)
Abstract:
As graph data grow in scale and graph processing jobs grow in complexity, distributing graph processing has become inevitable. However, graph processing jobs face severe reliability problems at runtime, caused by uncertainty originating both inside and outside the distributed graph processing system. This paper first analyzes the sources of uncertainty in distributed graph processing frameworks and the robustness of different types of graph processing jobs, and then proposes a framework for evaluating fault tolerance for distributed graph processing jobs along three dimensions: cost, efficiency, and quality. Drawing on related work from China and abroad, it then analyzes, evaluates, and compares in depth four fault-tolerance mechanisms for distributed graph processing: checkpointing-based, logging-based, replication-based, and algorithm-compensation-based fault tolerance. Finally, it outlines directions for future research.