• Volume 37, Issue 8, 2026 Table of Contents
    • Survey on Code Generation with LLM-based Agents

      2026, 37(8). DOI: 10.13328/j.cnki.jos.007593


      Abstract:Code generation agents powered by large language models (LLMs) are revolutionizing the software development paradigm. Distinct from previous code generation techniques, code generation agents are characterized by three core features: (1) Autonomy: the ability to independently manage the entire workflow, from task decomposition to coding and debugging. (2) Expanded task scope: capabilities that extend beyond generating code snippets to encompass the full software development lifecycle (SDLC). (3) Enhanced engineering practicality: a shift in research emphasis from algorithmic innovation toward practical engineering challenges, such as process management, system reliability, and tool integration. This domain has recently witnessed rapid development and an explosion of research, demonstrating significant application potential. This paper presents a systematic survey of the field of LLM-based code generation agents. We trace the technology's developmental trajectory from its inception and systematically categorize its core techniques, including both single-agent and multi-agent architectures. Furthermore, this survey details the applications of LLM-based agents across the full SDLC, summarizes mainstream evaluation benchmarks and metrics, and catalogs representative tools. Finally, by analyzing the primary challenges, we propose several foundational, long-term directions for future research in the field.

    • Survey on Application of LLM-based Agents in Root Cause Analysis of Software Systems

      2026, 37(8):1-25. DOI: 10.13328/j.cnki.jos.007594


      Abstract:Root cause analysis plays a critical role in ensuring the stability and efficiency of modern software systems, particularly in cloud computing and microservice-based systems. Large language models (LLMs), with their powerful natural language processing and data analysis capabilities, have provided new solutions for root cause analysis. LLM-based agents have further enhanced root cause analysis capabilities, such as higher levels of automation and more precise problem localization. While existing research has explored the application of LLMs in root cause analysis, research on LLM-based agents is still at an early stage. To address this gap, this survey provides a comprehensive analysis and summary of current research on LLM-based agents for root cause analysis in cloud computing and microservices systems. The main contents include (1) an overview of the architecture of LLM-based agents and the types of data involved in root cause analysis; (2) a systematic analysis of how LLM-based agents are applied to root cause analysis through the main stages of information collection, root cause localization, and effectiveness evaluation; (3) an exploration of the main challenges and future directions of LLM-based agent technologies in root cause analysis tasks.

    • SmartGen-AADL: Multi-agent-driven System Requirements Analysis and AADL Model Generation

      2026, 37(8):1-37. DOI: 10.13328/j.cnki.jos.007595


      Abstract:Modeling embedded systems is an essential component of model-based software development. The architecture analysis and design language (AADL), with its ability to formally express hardware-software structures and interaction relationships, is widely applied in system design. Large language models (LLMs) provide a new pathway for generating architecture models from natural language requirements. However, existing approaches exhibit significant limitations in requirement semantic understanding, boundary identification of AADL components, and construction of connection relationships, which constrain their practicality and the quality of generated models. To address these challenges, this study proposes an intelligent modeling approach for embedded systems, termed SmartGen-AADL. The overall framework is built upon a multi-agent collaboration mechanism, integrating key techniques such as semantic parsing, structural recognition, and prompt-enhanced generation, thus enabling high-quality transformation from natural language requirements into structured AADL models. The method consists of three core stages: (1) a structural agent identifies system architectures from system architecture documents and extracts standardized requirement statements; (2) a sub-problem agent performs item-level analysis and interaction mining to refine requirement granularity and explicitly model component interactions; (3) a component generation agent incorporates structural guidance and retrieval-augmented generation (RAG) of similar components into semantic prompts, guiding the LLM to produce component code that conforms to AADL syntax. To support this process, a knowledge base of “itemized requirements-AADL components” and a semantic alignment dataset of “system architecture documents-AADL architectures” are constructed. 
Experimental results on 15 embedded system application scenarios demonstrate that, compared with approaches solely relying on prompt engineering, the proposed multi-agent collaborative modeling method achieves significant improvements across four mainstream LLMs. Among them, the performance gains are most pronounced on the DeepSeek-r1 model: the component code error rate is reduced by an average of 34.37%, FBERT semantic similarity is increased by 6.21%, structural matching accuracy improves by more than 20%, and human evaluation scores rise by approximately 0.7 points. Furthermore, results from the ablation study reveal that the sub-problem identification mechanism enhances control over modeling granularity. The system structure tree contributes to component organization and hierarchical topology information. The retrieval-augmented generation mechanism supplies external knowledge support and reduces hallucination. Communication connection recognition ensures interface completeness and closed interaction loops. The synergy of these four mechanisms substantially promotes alignment between natural language requirements and the AADL modeling language, thereby improving model consistency.
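The retrieval-augmented generation step described above can be sketched in miniature: given an itemized requirement, fetch the most similar "requirement - AADL component" pairs from a knowledge base and splice them into the generation prompt as few-shot examples. The knowledge-base entries, the token-overlap similarity measure, and the prompt layout below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of retrieval-augmented prompting for AADL component
# generation. Knowledge-base contents and similarity metric are toy choices.

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two requirement statements."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Toy knowledge base of itemized requirements paired with AADL snippets.
KNOWLEDGE_BASE = [
    ("the sensor thread samples temperature every 100 ms",
     "thread temp_sensor\n  properties Period => 100 ms;\nend temp_sensor;"),
    ("the controller process receives sensor data over a port",
     "process controller\n  features data_in: in data port;\nend controller;"),
]

def retrieve_similar(requirement: str, k: int = 1):
    """Return the top-k most similar knowledge-base entries."""
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda e: jaccard(requirement, e[0]), reverse=True)
    return ranked[:k]

def build_prompt(requirement: str) -> str:
    """Assemble a generation prompt enriched with retrieved examples."""
    examples = retrieve_similar(requirement)
    shots = "\n\n".join(f"Requirement: {r}\nAADL:\n{c}" for r, c in examples)
    return f"{shots}\n\nRequirement: {requirement}\nAADL:"
```

In the actual method the retrieved examples would come from the constructed "itemized requirements-AADL components" knowledge base, typically via embedding similarity rather than token overlap.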

    • Benchmarking Large Language Models for Software Testing Knowledge Q&A

      2026, 37(8):1-17. DOI: 10.13328/j.cnki.jos.007596


      Abstract:Large language models (LLMs) have demonstrated remarkable performance in general tasks. However, their trustworthiness, robustness, and applicability in specialized domains remain insufficiently assessed. Using the compilation of software testing textbooks as a representative application scenario, this study constructs 700 carefully designed test questions covering 100 core testing concepts and methods and systematically assesses five representative LLMs in terms of reading comprehension, question-answering (Q&A), and text generation. The experimental results indicate that LLMs generally exhibit strong performance on most questions, achieving high levels of accuracy, completeness, and fluency. However, issues of reliability, such as hallucination and reasoning bias, persist, particularly when addressing current research trends and complex concepts. Further analysis reveals that LLM-generated content provides broader knowledge coverage and greater educational value compared with traditional textbooks, offering effective support for revising and teaching software testing materials. This study not only delineates the specific capability boundaries and typical deficiencies of LLMs in processing domain knowledge but also provides empirical evidence and methodological insights for advancing Q&A-driven intelligent evaluation in professional education and applications.

    • Question-answering Capability Evaluation of Large Language Models on Linux Kernel Development Knowledge

      2026, 37(8):1-28. DOI: 10.13328/j.cnki.jos.007597


      Abstract:Large language models (LLMs) have shown great potential in software development question-answering (QA) tasks, providing new approaches for acquiring and understanding code knowledge. However, in complex system software represented by the Linux kernel, the actual capabilities of LLMs in code implementation, understanding key mechanisms, tracing evolutionary history, and analyzing design decisions remain insufficiently validated. Existing benchmarks mainly target general-purpose tasks and suffer from insufficient domain depth, difficulty saturation, and misalignment with real engineering practices, making it difficult to ensure the objectivity, accuracy, and comprehensiveness of domain-specific development knowledge QA. To objectively evaluate the QA capabilities of LLMs in complex system software, this study proposes a benchmark dataset construction method for LLM QA capability evaluation, constructs the high-quality QA benchmark for the Linux kernel (LKQABench), and further designs a multi-judge collaborative code knowledge QA evaluation method (MJ-CCE). LKQABench is built from real technical QA data in developer communities, refined through semantic analysis and human review, resulting in 202 standard QA pairs covering major Linux kernel subsystems and multiple cognitive dimensions. MJ-CCE defines a collaborative scoring and voting mechanism among multiple judge models, evaluating answers across three dimensions: key points coverage, factual correctness, and clarity of expression. Experiments on LKQABench show that current LLMs achieve satisfactory performance on single-point knowledge questions related to kernel implementation but exhibit significant shortcomings, such as missing key points and incomplete reasoning chains, when tackling cross-topic integration, deep reasoning, and version-evolution-related questions. 
This study not only delineates the capability boundaries of LLMs in software development knowledge QA but also provides empirical evidence to support their continuous optimization in this domain.
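A multi-judge scheme in the spirit of MJ-CCE can be sketched as follows: several judge models each score an answer on the three dimensions (key-point coverage, factual correctness, clarity of expression), per-dimension scores are averaged, and a majority vote decides the verdict. The judges here are stubs standing in for LLM judges, and the 1-5 scale and pass threshold are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of multi-judge collaborative scoring and voting.
# Scale, threshold, and stub scores are illustrative assumptions.
from statistics import mean

DIMENSIONS = ("coverage", "correctness", "clarity")

def aggregate(judge_scores, pass_threshold=3.0):
    """judge_scores: list of dicts mapping dimension -> score on a 1-5 scale.
    Returns per-dimension mean scores and a strict-majority pass/fail vote."""
    means = {d: mean(s[d] for s in judge_scores) for d in DIMENSIONS}
    # Each judge votes "pass" if its own average score clears the threshold.
    votes = [mean(s[d] for d in DIMENSIONS) >= pass_threshold
             for s in judge_scores]
    verdict = sum(votes) > len(votes) / 2  # strict majority of judges
    return means, verdict

# Three stub judges scoring one candidate answer.
scores = [
    {"coverage": 4, "correctness": 5, "clarity": 4},
    {"coverage": 3, "correctness": 4, "clarity": 3},
    {"coverage": 2, "correctness": 3, "clarity": 2},
]
means, verdict = aggregate(scores)
```

The voting step is what makes the evaluation "collaborative": no single judge model's bias decides the outcome.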

    • Extracting Protocol Interactions via LLM Based on Linguistic Expression Pattern Analysis

      2026, 37(8):1-18. DOI: 10.13328/j.cnki.jos.007598


      Abstract:Extracting protocol interactions from specification documents written in natural language is useful, especially for verifying the correctness of a protocol before its implementation and application, or for generating test cases for protocol-connected systems directly from specification documents. Existing approaches for this purpose rely on deep learning or large language models (LLMs). Deep learning approaches require large-scale, high-quality annotated datasets; they may not work well across protocols in different domains due to limitations imposed by the training data, and they are difficult to transfer. LLM-based approaches offer better generalizability, but existing work only uses simple prompt templates: it does not carefully exploit extraction examples in LLM prompting, and the information extraction process lacks optimization, which limits its effectiveness. To address these challenges, this study proposes an enhanced LLM-based method for extracting protocol interactions from protocol texts, based on linguistic expression pattern analysis. Specifically, real-world protocol description texts are first analyzed to summarize common linguistic expression patterns in such texts. Then, representative protocol description examples exhibiting these patterns are selected, and corresponding extraction rules are distilled. Further, these examples and rules are integrated to design a rule-retrospection chain-of-thought method for LLM-based protocol interaction extraction. Finally, multi-path inference and self-verification techniques are used to optimize the task execution process. Experimental results on multiple protocol datasets show that the proposed method outperforms baseline methods in terms of precision and recall of protocol interaction extraction, confirming its effectiveness.
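The multi-path inference step can be sketched as a majority filter: run the extraction prompt several times (stubbed below with precomputed outputs) and keep only the protocol interactions that a strict majority of paths agree on. The example interactions and the majority rule are illustrative assumptions; the paper additionally applies a self-verification pass that this sketch omits.

```python
# Hedged sketch of multi-path inference for protocol interaction extraction.
# Path outputs are stubs; a real run would sample an LLM multiple times.
from collections import Counter

def majority_interactions(paths, min_votes=None):
    """paths: list of interaction sets extracted by independent LLM runs.
    Returns interactions supported by a strict majority of paths."""
    if min_votes is None:
        min_votes = len(paths) // 2 + 1
    counts = Counter(i for p in paths for i in set(p))
    return {i for i, c in counts.items() if c >= min_votes}

# Three stubbed extraction paths over the same specification text;
# each interaction is a (sender, message, receiver) triple.
paths = [
    {("client", "SYN", "server"), ("server", "SYN-ACK", "client")},
    {("client", "SYN", "server"), ("server", "SYN-ACK", "client"),
     ("server", "FIN", "client")},
    {("client", "SYN", "server"), ("client", "ACK", "server")},
]
agreed = majority_interactions(paths)
```

Interactions proposed by only one path (here the spurious FIN) are dropped, which is how multi-path sampling suppresses isolated extraction errors.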

    • Boosting LLM-based Fault Localization with Parallel Exploration

      2026, 37(8):1-19. DOI: 10.13328/j.cnki.jos.007592


      Abstract:Software fault localization is a critical problem in software engineering. In recent years, fault localization methods based on large language models (LLMs) have shown great promise. However, existing methods maintain only a single decision path for the LLM, which limits the search scope and results in suboptimal fault localization performance. To this end, this study proposes PRIME, an enhanced LLM-based fault localization method built on parallel exploration. The LLM's search scope is broadened by a parallel exploration mechanism over candidate fault locations, and the multiple candidates predicted by the LLM are then ranked with a node importance evaluation method to produce optimized fault localization results. Comparative analysis against other fault localization methods, comprehensive ablation experiments, and parameter influence analysis verify that the proposed method effectively enhances the fault localization performance of LLMs. Compared with existing methods, PRIME improves the Top-1 metric by over 18%, and its improvements in the MAP and MRR metrics reach 15% and 25%, respectively.
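The combination of parallel exploration and candidate ranking can be sketched as follows: several exploration paths each propose a ranked list of suspect code locations, and candidates are aggregated with a reciprocal-rank score so that locations reached by more paths, and ranked higher within them, rise to the top. The scoring formula and the stub paths are illustrative assumptions, not PRIME's node-importance method.

```python
# Hedged sketch of aggregating parallel exploration paths into one ranking.
# Reciprocal-rank scoring is a simple stand-in for node importance.
from collections import defaultdict

def rank_candidates(paths):
    """paths: list of ranked candidate lists (best first) from parallel runs.
    Returns candidate locations sorted by descending aggregate score."""
    score = defaultdict(float)
    for path in paths:
        for rank, loc in enumerate(path):
            # Earlier positions contribute more; repeat hits accumulate.
            score[loc] += 1.0 / (rank + 1)
    return sorted(score, key=score.get, reverse=True)

# Three stubbed exploration paths over the same buggy program
# (hypothetical method names, for illustration only).
paths = [
    ["Parser.parse", "Lexer.next", "Main.run"],
    ["Lexer.next", "Parser.parse"],
    ["Parser.parse", "Cache.get"],
]
ranked = rank_candidates(paths)
```

A location that only one path happens to visit cannot outrank one that several independent paths converge on, which is the intuition behind broadening the search with parallel exploration.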

Contact Information
  • Journal of Software
  • Sponsor: Institute of Software, CAS, China
  • Postal code: 100190
  • Phone: 010-62562563
  • Email: jos@iscas.ac.cn
  • Website: https://www.jos.org.cn
  • ISSN 1000-9825
  • CN 11-2560/TP
  • Domestic price: CNY 70
Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn
Technical Support: Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063