Abstract: Root cause analysis plays a critical role in ensuring the stability and efficiency of modern software systems, particularly in cloud computing and microservice-based systems. Large language models (LLMs), with their powerful natural language processing and data analysis capabilities, have opened up new approaches to root cause analysis. LLM-based agents extend these capabilities further, offering higher levels of automation and more precise problem localization. While existing research has explored the application of LLMs to root cause analysis, research on LLM-based agents for this task is still at an early stage. To address this gap, this survey provides a comprehensive analysis and summary of current research on LLM-based agents for root cause analysis in cloud computing and microservice-based systems. The main contents include: (1) an overview of the architecture of LLM-based agents and the types of data involved in root cause analysis; (2) a systematic analysis of how LLM-based agents are applied to root cause analysis across the main stages of information collection, root cause localization, and effectiveness evaluation; and (3) an exploration of the main challenges and future directions for LLM-based agent technologies in root cause analysis tasks.