Journal of Software:2010.21(12):3199-3210

Adaptive Scalable RPC Timeout Mechanism for Large Scale Clusters
QIAN Ying-Jin,XIAO Nong,JIN Shi-Yao
Received:April 28, 2009    Revised:August 12, 2009
Abstract: In distributed systems built on RPC (remote procedure call), timeouts are the usual means of failure detection and are typically reported on a per-call basis. Stress testing on a very large Lustre storage cluster showed that the traditional fixed timeout mechanism causes many unnecessary timeouts, especially under heavy server load. This paper proposes an Adaptive Scalable RPC Timeout (AST) mechanism that takes network conditions, server load, scalability, and performance into account. Under AST, the timeout value set by a client is adjusted dynamically according to the congestion of the network and the server. In addition, the server can send an extra message notifying the client to modify the original timeout value of an RPC. A series of simulations and validations indicates that AST is a more suitable failure detection model for RPC, and that it improves system responsiveness, reliability, and stability without significant negative impact on performance, even on large-scale cluster systems.
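The abstract names two ideas: a client-side timeout that follows observed network and server congestion, and a server notification that changes the timeout of an RPC already in flight. The Python sketch below illustrates both under stated assumptions. It is not the AST algorithm from the paper, whose details the abstract does not give. The class names, the sliding-window maximum, the 1.25 safety margin, and the floor and ceiling values are all hypothetical choices made for illustration.

```python
from collections import deque


class AdaptiveTimeout:
    """Client-side timeout estimator (illustrative, not the paper's algorithm)."""

    def __init__(self, initial=5.0, floor=1.0, ceiling=600.0, window=8, margin=1.25):
        self.initial = initial    # timeout used before any sample exists (s)
        self.floor = floor        # assumed lower bound on the timeout (s)
        self.ceiling = ceiling    # assumed upper bound on the timeout (s)
        self.margin = margin      # assumed safety factor over the worst sample
        self.service = deque(maxlen=window)  # recent server processing times (s)
        self.network = deque(maxlen=window)  # recent network latencies (s)

    def observe(self, service_time, network_latency):
        """Record the measurements taken from one completed RPC."""
        self.service.append(service_time)
        self.network.append(network_latency)

    def timeout(self):
        """Return the timeout to use for the next RPC."""
        if not self.service:
            return self.initial
        # Worst recent server time plus worst recent latency, scaled by the margin.
        estimate = self.margin * (max(self.service) + max(self.network))
        return min(self.ceiling, max(self.floor, estimate))


class PendingRpc:
    """An in-flight request whose deadline the server may ask to extend."""

    def __init__(self, sent_at, timeout):
        self.deadline = sent_at + timeout

    def extend(self, extra):
        """Apply a server notification asking for `extra` more seconds."""
        self.deadline += extra

    def expired(self, now):
        """Report whether the request has passed its deadline at time `now`."""
        return now >= self.deadline
```

In this sketch a busy server produces longer observed service times, so `timeout()` grows with load and shrinks again once the slow samples leave the window. A call to `extend()` models the server-to-client notification: a request that would have expired is kept alive without resending it.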
Foundation items: Supported by the National Natural Science Foundation of China under Grant No. 60736013
Reference text:


QIAN Ying-Jin,XIAO Nong,JIN Shi-Yao.Adaptive Scalable RPC Timeout Mechanism for Large Scale Clusters.Journal of Software,2010,21(12):3199-3210