• Volume 37, Issue 7, 2026 Table of Contents
    • Multi-Data Flow Static Analysis Method for Vulnerability Detection in Web Applications

      2026, 37(7). DOI: 10.13328/j.cnki.jos.007581


      Abstract: As one of the core technologies for Web application vulnerability detection, Static Application Security Testing (SAST) has achieved widespread industrial adoption. However, existing static analysis tools face significant challenges in handling the complex logical structures inherent in modern Web applications, such as asynchronous request patterns and multi-source input semantics, due to limitations in the underlying design of their taint analysis algorithms. To address this issue, this study proposes a multi-data flow analysis method designed for the static detection of Web security vulnerabilities. The approach extends traditional taint analysis through multi-dimensional enhancements to improve both detection performance and generalization. Vertically, a multi-stage data flow analysis is introduced to comprehensively consider data dependencies across different control flow paths through correlation and iterative algorithms. Horizontally, a multi-tag data flow analysis mechanism distinguishes different input sources via taint tags, thereby capturing finer-grained program context semantics and enhancing detection accuracy. Based on this methodology, a vulnerability detection prototype system named MultiFlow has been developed for Java/JavaScript Web applications. Experimental evaluation on a dataset of 60 real-world Web applications and third-party libraries demonstrates the effectiveness of MultiFlow's multi-data flow analysis, which attains precision rates of 87.18%, 75.00%, and 83.72% for Stored XSS, Broken Access Control, and Prototype Pollution, respectively, and has obtained 8 CVE IDs. Compared with existing approaches, MultiFlow achieves higher precision and recall with reduced analysis overhead, validating its practical applicability.
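      The multi-tag idea described above — distinguishing input sources with taint tags that propagate to sinks — can be sketched in a few lines. This is an illustrative reconstruction of the general technique, not MultiFlow's implementation; all class and function names here are hypothetical.

```python
# Illustrative multi-tag taint tracking: values carry a set of tags naming
# their input sources; tags propagate through operations and are checked
# against a policy at each sink.

class Tainted:
    def __init__(self, value, tags=frozenset()):
        self.value = value
        self.tags = frozenset(tags)

    def concat(self, other):
        # Propagation rule: the result inherits tags from both operands.
        other_tags = other.tags if isinstance(other, Tainted) else frozenset()
        other_val = other.value if isinstance(other, Tainted) else other
        return Tainted(self.value + other_val, self.tags | other_tags)

def html_sink(node, dangerous=frozenset({"http_param", "cookie"})):
    """Report a potential XSS if any dangerous source tag reaches the sink."""
    return sorted(node.tags & dangerous)

name = Tainted("<script>", tags={"http_param"})  # user-controlled input
page = Tainted("Hello, ").concat(name)
print(html_sink(page))  # -> ['http_param']
```

      Because each value records *which* sources reached it, a checker can apply different sink policies per source, which is the finer-grained context the abstract refers to.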

    • Cross-project Software Defect Prediction Method Based on Personalized Federated Learning

      2026, 37(7). DOI: 10.13328/j.cnki.jos.007582


      Abstract: To address the dual challenges of data privacy and project heterogeneity in cross-project software defect prediction, this paper proposes a framework named PRIDE-SDP. Its core contribution lies in the deep integration of three key techniques: a personalized federated learning paradigm that customizes a dedicated prediction model for each heterogeneous project, an (ε, δ)-differential privacy mechanism with rigorous mathematical guarantees that protects data locally, and a dedicated Temporal-Contextual Fusion Network (TCFN) designed to efficiently capture software metric features. Experiments on six dataset groups covering 27 open-source projects and 3 enterprise projects validate the framework's effectiveness: compared with state-of-the-art cross-project defect prediction baselines, PRIDE-SDP achieves an average AUC improvement of 10.7% and an F1-score improvement of 7.3%. Performance on enterprise datasets is even stronger, with average improvements of 45.2% in MCC, 29.5% in Effort@20%, and 35.4% in F1-score over advanced baselines. Meanwhile, even under strong privacy guarantees, the framework retains on average over 98% of its optimal performance, and in membership inference attack experiments it reduces the attacker's accuracy by more than 36% on average. These results demonstrate that PRIDE-SDP effectively balances privacy protection and personalized adaptation while maintaining high performance.
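      The local (ε, δ)-differential privacy step mentioned above is commonly realized with the Gaussian mechanism: clip each model update to bound its sensitivity, then add calibrated noise. The sketch below shows that standard construction; it is not PRIDE-SDP's exact mechanism, and the parameter values are illustrative.

```python
# Sketch of local (epsilon, delta)-DP via the Gaussian mechanism, as commonly
# used in federated learning clients before sending updates to the server.
import math
import random

def clip(vec, clip_norm):
    """Scale the update so its L2 norm is at most clip_norm (bounds sensitivity)."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]

def gaussian_sigma(epsilon, delta, sensitivity):
    # Classic calibration for the Gaussian mechanism (valid for epsilon <= 1).
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

def privatize(update, epsilon=0.5, delta=1e-5, clip_norm=1.0):
    sigma = gaussian_sigma(epsilon, delta, clip_norm)
    clipped = clip(update, clip_norm)
    return [x + random.gauss(0.0, sigma) for x in clipped]

noisy = privatize([3.0, 4.0])  # raw norm 5.0 is clipped to 1.0 before noising
```

      The clipping bound is what makes the noise scale well-defined: without it, a single outlier update would have unbounded sensitivity and no finite σ would suffice.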

    • Exception Trigger Stream-based Fault Localization with Automated Try Block Injection

      2026, 37(7). DOI: 10.13328/j.cnki.jos.007583


      Abstract: Software fault localization begins with a program execution failure and locates its root cause at the code level by analyzing abnormal internal states during execution. The current mainstream spectrum-based and mutation-based fault localization techniques, as well as the state-of-the-art technique SmartFL, use coverage information, mutation information, and program-semantic information, respectively, as windows onto the program's internal state at runtime. These three types of information are too broad and insufficiently targeted, and they are limited by bottlenecks such as ties in statement risk values, high mutation cost, and large information scale, respectively. EXPECT, a recently proposed fault localization technique that uses a new information source, the exception trigger stream, monitors the program's abnormal internal state through exception handling statements (try-catch blocks) and achieves effectiveness surpassing the aforementioned mainstream methods. EXPECT's premise, however, is that the faulty program contains enough exception handling statements. In the real open-source community, many programs lack a sound exception handling mechanism, so their code contains only sparse or even no exception handling statements, which directly undermines the basis on which EXPECT runs. To this end, this paper proposes INSPECT, a software fault localization method based on exception handling statement injection. By automatically injecting temporary exception handling statements into the faulty program as checkpoints on its internal state, and by designing a more sophisticated algorithm for calculating program statements' risk values, INSPECT extends EXPECT's scope to more general programs that contain no exception handling statements. In other words, INSPECT effectively expands the application scenarios of exception trigger information, an efficient data source for fault localization, and thus delivers higher generalization. Experimental results show that INSPECT outperforms the state-of-the-art technique with improvements of 95.25%, 55.92%, and 16.65% (simulated faults) as well as 93.39%, 57.54%, and 13.92% (real-world faults) in the best, average, and worst EXAM metrics, respectively, and 311.47% (simulated faults) and 283.31% (real-world faults) in the MRR metric.
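      The two metrics reported above can be computed from a ranked list of suspicious statements. The definitions below follow common fault-localization usage (EXAM: fraction of statements inspected before the first faulty one is reached; MRR: mean reciprocal rank of the first faulty statement) and may differ from the paper's exact tie-breaking rules.

```python
# EXAM and MRR for fault localization, computed over risk-ranked statements.

def exam(ranked_stmts, faulty):
    """EXAM score: (1-based rank of first faulty statement) / total statements."""
    for i, stmt in enumerate(ranked_stmts, start=1):
        if stmt in faulty:
            return i / len(ranked_stmts)
    return 1.0  # fault never ranked: whole program must be inspected

def mrr(rankings_and_faults):
    """Mean reciprocal rank of the first faulty statement over many programs."""
    total = 0.0
    for ranked, faulty in rankings_and_faults:
        rank = next(i for i, s in enumerate(ranked, 1) if s in faulty)
        total += 1.0 / rank
    return total / len(rankings_and_faults)

cases = [(["s3", "s1", "s7", "s2"], {"s7"}),   # fault ranked 3rd of 4
         (["s5", "s9"], {"s5"})]                # fault ranked 1st of 2
print(exam(*cases[0]))  # -> 0.75
print(mrr(cases))       # -> (1/3 + 1) / 2 = 0.666...
```

      Lower EXAM and higher MRR are better, which is why the abstract reports improvements in both directions.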

    • Safety Analysis of C/C++ Foreign Language Calls in Python Software Packages Repository

      2026, 37(7). DOI: 10.13328/j.cnki.jos.007584


      Abstract: The safety of software package repositories is a critical aspect of software supply chain analysis, but existing research and tools often lack effective analysis of foreign language calls within packages. PyPI, the official package repository for Python, hosts a vast collection of Python packages from various application domains. In addition to programs written in Python, these packages often include C/C++ programs called through Python's foreign interface, the Python/C API. Analyzing the safety of foreign language calls in the Python package repository is therefore crucial for ensuring the safety and reliability of the software supply chain. By analyzing official interoperability documentation and relevant methods and tools for cross-language program analysis, we establish a bug benchmark suite for Python-C/C++ interoperability programs. The suite includes test cases for 15 bug patterns across 9 categories, covering 5 language features (memory, type, exception, concurrency, and numerical issues), as well as interoperability bugs in the 16 most-installed PyPI packages that involve C/C++ foreign calls. By evaluating state-of-the-art Python-C/C++ interoperability bug checkers on the suite, we conduct a comparative analysis of the soundness, completeness, and scalability of existing research and tools, summarizing the current state and future directions of Python-C/C++ interoperability safety analysis. By analyzing more than 700 reported warnings, we identify 21 new real-world bugs of 3 bug patterns in 6 PyPI repositories.
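      As a concrete example of the kind of exception-category bug pattern such a benchmark covers: `PyLong_AsLong` returns -1 both on failure and for a genuine -1, so C code must check `PyErr_Occurred()` afterward. The toy text-based scan below illustrates detecting that pattern; real checkers work on ASTs or IR, and this helper is purely hypothetical.

```python
# Toy scan for one Python/C API bug pattern: a PyLong_AsLong call with no
# PyErr_Occurred() check nearby, which silently turns a conversion failure
# into the value -1.

def missing_error_check(c_source, lookahead=3):
    """Return 1-based line numbers of PyLong_AsLong calls lacking a check."""
    lines = c_source.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        if "PyLong_AsLong" in line:
            window = "\n".join(lines[i + 1:i + 1 + lookahead])
            if "PyErr_Occurred" not in window:
                flagged.append(i + 1)
    return flagged

bad = "long v = PyLong_AsLong(obj);\nreturn v * 2;"
good = ("long v = PyLong_AsLong(obj);\n"
        "if (PyErr_Occurred()) return -1;\n"
        "return v * 2;")
print(missing_error_check(bad))   # -> [1]
print(missing_error_check(good))  # -> []
```

      A pattern-per-category suite of such small positive/negative cases is what lets the paper measure checker soundness and completeness side by side.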

    • PCLog: Adaptive Log Anomaly Detection via Proximal Policy Optimization and Behavior Cloning

      2026, 37(7). DOI: 10.13328/j.cnki.jos.007585


      Abstract: Log data record system operating status, user behavior, and error information. Log-based anomaly detection enables rapid identification of potential security risks or performance bottlenecks, thereby enhancing operational efficiency and facilitating fault diagnosis. However, existing log anomaly detection methods still face significant challenges, such as the inability to effectively handle changes in log patterns caused by system updates and the lack of an efficient feedback mechanism to consistently maintain high detection performance. To address these issues, we propose a new log anomaly detection framework, PCLog, which uses the Proximal Policy Optimization (PPO) algorithm as an agent, interpreting log events as actions and log feature vectors as states. PCLog learns the system's behavior patterns by maximizing the cumulative rewards from normal sequences and performs anomaly detection based on the probability distribution of subsequent actions. Additionally, when detection performance degrades, PCLog collects erroneous predictions as expert demonstrations and applies behavioral cloning from imitation learning to maximize the likelihood of the expert data. This allows the model to better approximate expert behavior, dynamically correcting itself, effectively reducing false positives, and enhancing the long-term reliability of the system. Experimental results on three widely used log datasets (HDFS, BGL, and OpenStack) show that PCLog outperforms existing methods, exhibiting high adaptability to dynamic log patterns.
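      The detection step described above — flagging a log event when it is unlikely under the learned next-action distribution — can be sketched with a toy lookup table standing in for the PPO agent's policy. This is an illustration of the general mechanism, not PCLog's implementation; the `policy` data is hypothetical.

```python
# Detection via the predicted next-event distribution: the observed log event
# is flagged as anomalous when it is not among the top-k likely candidates.

def detect(policy, history, observed, top_k=2):
    """Flag `observed` if it is not among the top_k most likely next events."""
    dist = policy.get(tuple(history), {})
    ranked = sorted(dist, key=dist.get, reverse=True)[:top_k]
    return observed not in ranked

# Toy transition probabilities learned from normal sequences (hypothetical).
policy = {("open", "read"): {"read": 0.6, "close": 0.3, "delete": 0.1}}

print(detect(policy, ["open", "read"], "close"))   # -> False (expected)
print(detect(policy, ["open", "read"], "delete"))  # -> True  (anomalous)
```

      In the real framework the distribution comes from a learned policy network rather than a table, and misflagged events feed back into behavioral cloning as expert demonstrations.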

    • Empirical Study of Communication Among Open-source Software Practitioners on Discord

      2026, 37(7). DOI: 10.13328/j.cnki.jos.007586


      Abstract: Discord is an increasingly popular online chat platform in the field of open-source software development. Recent research on Discord focuses on software development-related technologies that help practitioners efficiently retrieve information, filter duplicate questions in conversation history, and build discourse datasets for information mining. However, a significant gap remains: no study has yet comprehensively examined the communication characteristics among practitioners in the open-source software community on Discord. An in-depth analysis of these characteristics is of great significance for promoting efficient interaction in open-source software development and strengthening social and technical collaboration. To fill this gap, this paper conducts an empirical study of the communication patterns of open-source practitioners on Discord based on two datasets. The first contains 616,443 utterances from seven open-source software communities; the second is a manually selected subset of the first, containing 17,289 utterances. Using these two datasets, the communication behavior of open-source software practitioners on Discord is characterized at the discourse and conversation levels, focusing on participation, interaction patterns, and discussion topics. In addition, the impact of these communication characteristics on the response time and resolution status of questions raised on Discord is evaluated. Based on the findings, this paper explores implications for online chat platform vendors, provides recommendations for the open-source software community, and suggests directions for future research.

    • CAnalyzer: A Software Composition Analysis Technique for C/C++ Source Code

      2026, 37(7). DOI: 10.13328/j.cnki.jos.007587


      Abstract: Third-party libraries (TPLs) are widely used in software development but also pose risks such as security vulnerabilities and license conflicts. To address these challenges, Software Composition Analysis (SCA) technology has emerged, aiming to help developers detect security vulnerabilities, outdated patches, and license compliance issues by identifying and analyzing the open-source components and dependencies used in software, thereby securing the software supply chain. However, existing SCA tools in the C/C++ domain face three major limitations: a lack of comprehensive TPL feature libraries, difficulty in detecting library-granularity reuse, and insufficient analysis of TPL dependency relationships. To address these issues, this research proposes CAnalyzer, a C/C++ source code software composition analysis technique designed for library-granularity reuse detection. CAnalyzer integrates data from 15 platforms to build a feature library containing 33,100 TPLs and 30,047,290 functions. By applying feature library preprocessing and a multi-threshold matching strategy, it significantly improves the accuracy of TPL component identification. Additionally, CAnalyzer automatically constructs TPL interdependencies by analyzing source code dependency directives. Experimental results show that CAnalyzer achieves an accuracy of 90.63% and a recall of 86.57% in component recognition; its TPL identification accuracy and comprehensiveness outperform CENTRIS, TPLite, and OSSFP. In TPL dependency detection, CAnalyzer achieves a recall of 94.79% and a precision of 98.99%. CAnalyzer has been adopted by the OpenHarmony community, helping identify 166 external components across 689 code repositories, further highlighting its value in open-source community management.
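      The core matching step of feature-library SCA tools like the one described above is hashing function bodies and testing overlap against each TPL's known function set under thresholds. The sketch below shows that general idea; the normalization, hashing, and threshold values are illustrative, not CAnalyzer's actual design.

```python
# Library-granularity reuse detection via function-hash matching against a
# TPL feature library, with a minimum-function and an overlap-ratio threshold.
import hashlib

def func_hash(body):
    # Normalize whitespace so formatting differences don't change the hash.
    canon = " ".join(body.split())
    return hashlib.sha256(canon.encode()).hexdigest()

def match_tpls(target_funcs, feature_lib, ratio_threshold=0.5, min_funcs=2):
    """Report TPLs whose known functions sufficiently overlap the target's."""
    target = {func_hash(f) for f in target_funcs}
    hits = []
    for tpl, funcs in feature_lib.items():
        lib = {func_hash(f) for f in funcs}
        shared = len(target & lib)
        if shared >= min_funcs and shared / len(lib) >= ratio_threshold:
            hits.append(tpl)
    return sorted(hits)

lib = {"zlib": ["int inflate(z_stream *s) { ... }",
                "int deflate(z_stream *s) { ... }"],
       "libpng": ["void png_read(png_struct *p) { ... }"]}
target = ["int inflate(z_stream *s) { ... }",
          "int  deflate(z_stream *s)  { ... }",  # different spacing, same hash
          "static void helper(void) {}"]
print(match_tpls(target, lib))  # -> ['zlib']
```

      Using multiple thresholds (here `min_funcs` and `ratio_threshold`) trades precision against recall: a lone shared utility function should not count as whole-library reuse.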

    • Logical Defect Detection for Large Language Model Synthesized Code in Software Supply Chain Security

      2026, 37(7). DOI: 10.13328/j.cnki.jos.007588


      Abstract: With the rapid advancement of large language models (LLMs) in code synthesis, their generated code is increasingly applied in intelligent foundational software supply chains, which integrate a large number of third-party modules and components developed with LLM-generated code. However, since LLMs are primarily trained on open-source code, defects and security vulnerabilities in the training data may cause potential errors in the generated code and security risks in the software supply chain. Academic research has proposed targeted testing techniques such as EvalPlus, but these approaches struggle to cover critical paths in the supply chain because of their reliance on probability-based case generation, which makes it difficult to uncover deep-seated logical defects. To overcome these limitations, we propose a defect detection method for LLM-generated code in software supply chains that integrates symbolic execution. The method employs a symbolic execution mounting mechanism to automatically identify input parameters in LLM-generated code and perform adaptation and symbolic binding. It then guides the symbolic execution engine to conduct precise constraint analysis on the program's critical execution paths and to generate efficient boundary test cases. These test cases can expose deep logical defects that traditional methods often fail to detect. We evaluated our approach on 11 high-ranking LLMs from the LMSYS Chatbot Arena, using benchmarks from prominent defect detection frameworks and engines. The results show that our method is more effective at detecting logical defects in LLM-generated code, achieving an average pass rate reduction of 3.99% to 18.98% and a coverage enhancement of 3.31% to 8.19% over existing research. These findings highlight the method's effectiveness in improving the correctness of LLM-generated code and enhancing the security of intelligent foundational software supply chains.
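      The boundary-test idea at the heart of the method above can be shown in miniature: once path constraints over an input are known, tests are generated at the edges of each path's feasible region, where off-by-one and comparison-direction defects hide. In this sketch simple integer intervals stand in for solver-produced constraints; a real pipeline would use a symbolic execution engine and an SMT solver.

```python
# Boundary test-case generation from per-path interval constraints: for each
# feasible [lo, hi] region, emit both endpoints plus the just-outside values.

def boundary_inputs(intervals):
    """Emit lo, hi, lo-1, hi+1 for every feasible [lo, hi] path constraint."""
    cases = set()
    for lo, hi in intervals:
        if lo > hi:
            continue  # infeasible path: no test cases
        cases.update({lo, hi, lo - 1, hi + 1})
    return sorted(cases)

# Hypothetical constraints from two paths of some f(x): x in [0, 9] and x in [10, 99].
print(boundary_inputs([(0, 9), (10, 99)]))
# -> [-1, 0, 9, 10, 99, 100]
```

      Random or probability-based generation rarely lands exactly on 9/10-style path boundaries, which is why the abstract reports lower pass rates (more defects exposed) for the constraint-guided cases.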

    • Research on Open-source Software Supply Chain Attacks from an Industrial R&D Perspective

      2026, 37(7). DOI: 10.13328/j.cnki.jos.007589


      Abstract: Open-source software, as the backbone of modern industrial software ecosystems, has become deeply embedded in industrial software delivery pipelines: it shortens release cycles, reduces costs, and enhances system compatibility. At the same time, software supply chain attacks against industry are rising year by year. This paper analyzes the inherent tension between R&D efficiency and software security from an industrial R&D perspective, revealing an implicit consensus among industrial stakeholders that, under the efficiency constraints imposed by process compliance, they can only react passively to open-source software supply chain threats. Drawing on practical cases, the paper shows that R&D security management based on process compliance cannot cope with open-source software supply chain attacks. From the industrial perspective, it then proposes an analytical framework for open-source software supply chain attacks that divides them into three stages: threats during open-source co-construction, threats during closed-source development, and attacks during product use, and it summarizes the attack modes and techniques at each stage. Finally, it proposes a security rebalancing framework for the open-source software supply chain from three perspectives: collaborative governance of the open-source ecosystem, continuous compliance and attack-surface reduction in industrial R&D, and adaptive protection for deployed products.

    • Java New Feature Test Program Generation Based on Large Language Model

      2026, 37(7). DOI: 10.13328/j.cnki.jos.007590


      Abstract: Since its inception, the Java programming language has continuously evolved. With the ongoing emergence of new language features and programming paradigms, Java's expressiveness and execution efficiency have steadily improved, driving progress in the broader software ecosystem. To ensure the security and stability of the Java ecosystem, researchers have proposed various test program generation techniques targeting Java compilers and virtual machines to detect potential defects. However, existing approaches primarily focus on mature Java syntax, making it difficult to effectively test newly introduced language features. To address this limitation, this paper proposes LumiX, a novel test program generation method for new Java features based on large language models (LLMs). First, LumiX leverages LLMs to analyze natural language descriptions of new Java features and summarize their usage patterns. It then uses historically bug-revealing test programs as seed inputs and applies program analysis tools to extract reusable variables and functions. Combining this extracted information with the summarized usage descriptions, LumiX guides the LLM to generate code snippets that incorporate the new features. These snippets are inserted into the seed programs to produce complete test programs exercising the new language features. Finally, LumiX employs a dual-layer differential testing strategy: it compiles and runs the generated test programs with different Java compilers (javac and ECJ) and virtual machines (HotSpot and OpenJ9), identifying potential defects through output discrepancies. Experimental results show that LumiX effectively generates test programs covering new Java features and enhances the testing capabilities of existing tools. Applied to the latest Java compilers and virtual machines, LumiX discovered 16 previously unknown bugs, 12 of which have been confirmed or fixed by developers.
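      The differential-testing strategy described above reduces to a simple harness: run the same generated program through two independent implementations and report any output discrepancy as a potential defect. In this sketch, toy callables stand in for the javac/ECJ and HotSpot/OpenJ9 pairs; the seeded "bug" is hypothetical.

```python
# Differential testing: inputs on which two implementations disagree are
# candidate bug reports, since at least one implementation must be wrong.

def differential_test(impl_a, impl_b, inputs):
    """Return (input, output_a, output_b) triples where the two disagree."""
    discrepancies = []
    for prog in inputs:
        out_a, out_b = impl_a(prog), impl_b(prog)
        if out_a != out_b:
            discrepancies.append((prog, out_a, out_b))
    return discrepancies

# Toy stand-ins: a reference evaluator of tiny arithmetic "programs" and a
# deliberately buggy one that mishandles any program containing a '7'.
reference = eval
buggy = lambda src: eval(src) + (1 if "7" in src else 0)

print(differential_test(reference, buggy, ["1+1", "3*7", "2**4"]))
# -> [('3*7', 21, 22)]
```

      The "dual-layer" aspect applies this twice: once across compilers on the emitted bytecode, and once across virtual machines on the runtime output.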

Contact Information
  • Journal of Software
  • Sponsor: Institute of Software, CAS, China
  • Postal code: 100190
  • Phone: 010-62562563
  • Email: jos@iscas.ac.cn
  • Website: https://www.jos.org.cn
  • ISSN 1000-9825
  • CN 11-2560/TP
  • Domestic price: 70 RMB
Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing. Postal Code: 100190
Phone: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn
Technical Support: Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063