Microservice Resilience Risk Identification and Analysis Based on Chaos Engineering
Fund Project: National Natural Science Foundation of China (U1934212); National Key Research and Development Program of China (2020YFB2103300)

    Abstract:

    Microservice architecture has become the mainstream architectural pattern for Internet applications in recent years. Compared with traditional software architectures, however, a microservice architecture system has a more complex deployment structure, which exposes it to more potential threats that can cause faults and to a greater diversity of fault symptoms. Because traditional measures such as reliability cannot fully capture a microservice architecture system's ability to cope with failures, microservice developers have begun to use the term "resilience" to describe this ability. To improve the resilience of a microservice architecture system, developers usually need to design specific mechanisms for different system environment disruptions. How to judge whether a system environment disruption poses a risk to microservice resilience, and how to find as many of these resilience risks as possible before the system is released, are research questions in microservice development. Building on the microservice resilience measurement model proposed in the authors' previous research and integrating chaos engineering practice, this study proposes resilience risk identification and analysis approaches for microservice architecture systems. The identification approach continuously injects random system environment disruptions into the target system and monitors variations in service performance to find potential resilience risks, which greatly reduces the human effort required for risk identification. For each identified resilience risk, the analysis approach uses the performance monitoring data collected during chaos engineering and a causality search algorithm to build influence chains among system performance indicators, and it presents the most probable chains to system operators for further analysis. Finally, the effectiveness of the proposed approaches is demonstrated by a case study on a microservice architecture system.
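
The identification loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy latency simulation, the `Disruption` type, the candidate disruptions, and the 50% degradation threshold are all assumptions made for illustration, and the abstract does not specify the actual resilience measurement model.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Disruption:
    """A hypothetical environment disruption applied to a toy system."""
    name: str
    latency_factor: float  # assumed effect: multiplies the baseline latency


def observe_latency(rng, disruption=None, baseline_ms=100.0, samples=20):
    """Simulate monitoring: mean latency with +/-5% noise, optionally disrupted."""
    factor = disruption.latency_factor if disruption else 1.0
    return mean(baseline_ms * factor * rng.uniform(0.95, 1.05)
                for _ in range(samples))


def identify_risks(disruptions, rounds=50, max_degradation=0.5, seed=0):
    """Inject randomly chosen disruptions and flag those whose relative
    slowdown exceeds the (assumed) acceptable degradation threshold."""
    rng = random.Random(seed)
    normal = observe_latency(rng)  # performance without any disruption
    risks = set()
    for _ in range(rounds):
        disruption = rng.choice(disruptions)  # random disruption generation
        degraded = observe_latency(rng, disruption)
        if (degraded - normal) / normal > max_degradation:
            risks.add(disruption.name)
    return sorted(risks)


# Hypothetical candidates; a real chaos experiment would inject actual faults.
CANDIDATES = [
    Disruption("cpu_stress", 1.2),
    Disruption("network_delay", 2.0),
    Disruption("pod_kill", 3.5),
]

if __name__ == "__main__":
    print(identify_risks(CANDIDATES))
```

In this toy setting, a disruption counts as a resilience risk only when the observed performance loss exceeds the threshold, so a mild slowdown such as the assumed `cpu_stress` case is tolerated while larger degradations are reported for further analysis.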

Citation

Yin Kanglin, Du Qingfeng. Microservice resilience risk identification and analysis based on chaos engineering. Journal of Software, 2021, 32(5): 1231-1255

History
  • Received: July 10, 2020
  • Revised: December 15, 2020
  • Online: February 07, 2021
  • Published: May 06, 2021