Welcome to Journal of Software website!

微信服务号

微信订阅号

Current Issue
Online First
Archive
Click Rank
Most Downloaded
综述文章
专刊文章
分辑系列

Article Search

Search by issue

Select AllDeselectExport

Display Method:

Preface

ZHANG Lu, LIU Hui, JIANG Jia-Jun, WANG Bo

2024,35(7):3069-3070, DOI: 10.13328/j.cnki.jos.007114

[Abstract] (310) [HTML] (159) [PDF 481.72 K] (393)

Abstract:

Empirical Study on Data Leakage Problem in Neural Program Repair

LI Qing-Yuan, ZHONG Wen-Kang, LI Chuan-Yi, GE Ji-Dong, LUO Bin

2024,35(7):3071-3092, DOI: 10.13328/j.cnki.jos.007110

[Abstract] (417) [HTML] (103) [PDF 2.60 M] (1104)

Abstract:
Repairing software defects is an inevitable and significant problem in the field of software engineering, while automated program repair (APR) techniques aim to alleviate software defect problem by repairing the defective programs automatically, accurately, and efficiently. In recent years, with the rapid development of deep learning, the field of automated program repair has emerged a method that utilizes deep neural networks to automatically capture the relationship between defective programs and their patches, called neural program repair (NPR). In terms of the number of defects that can be correctly repaired on the benchmark, NPR tools have significantly outperformed non-deep learning APR tools. However, a recent study found that the performance improvement of NPR systems may be due to the presence of test data in the training data, i.e., the data leakage. Inspired by this, to further investigate the causes and effects of data leakage in NPR systems and to evaluate existing systems more fairly, this study: (1) systematically categorizes and summarizes the existing NPR systems, defines the data leakage of NPR systems based on this classification, and designs the data leakage detection method for each category of system; (2) conducts a large-scale testing of existing models according to the data leakage detection method in the previous step and investigates the effect of data leakage on model realism and evaluation performance and the impact on the model itself; (3) analyzes the collection and filtering strategies of existing NPR system datasets, improves and supplements them, then constructs a pure large-scale NPR training dataset based on the improved strategy with the existing popular dataset, and verifies the effectiveness of this dataset in preventing data leakage. From the experimental results, it is found that the ten NPR systems studied in this investigation all had data leakage on the evaluation dataset, among which the NPR system RewardRepair had the more serious data leakage problem, with 24 data leaks on the Defects4J (v1.2.0) benchmark, and the leakage ratio was as high as 53.33%. In addition, data leakage has an impact on the robustness of the NPR system, and all five NPR systems investigated had reduced robustness due to data leakage. As a result, data leakage is a very common problem and can lead to unfair performance evaluation results of NPR systems and affect therobustness of the NPR system on the benchmark. When training NPR models, researchers should avoid data leakage as much as possible and consider the impact of data leakage on the evaluation of the performance of NPR systems to evaluate the NPR systems as fairly as possible.

Comparison Research on Rule-based and Learning-based Mutation Techniques

GONG Zhi-Hao, CHEN Yi-Zhou, CHEN Jun-Jie, HAO Dan

2024,35(7):3093-3114, DOI: 10.13328/j.cnki.jos.007113

[Abstract] (354) [HTML] (71) [PDF 2.86 M] (1011)

Abstract:
Mutation testing is an effective software testing technique. It helps improve the defect detection capability of an existing test suite by generating mutants that simulate software defects. The quality of mutants has a significant impact on the effectiveness of mutation testing. The traditional mutation testing approach typically employs manually designed syntactic rule-based mutation operators to generate mutants, and has achieved some academic success. In recent years, many studies have started to incorporate deep learning techniques to generate mutants by learning historical code from open source projects. This new approach has achieved preliminary progress in mutant generation. A comprehensive comparison of the two mutation techniques, i.e. rule-based and learning-based, which have different mechanisms but both aim to improve the defect detection capability of the test suite by mutation, is crucial for mutation testing and its downstream tasks. To handle the problem, this study designs and implements an empirical study of rule-based and learning-based mutation techniques, aiming to understand the performance of mutation techniques with different mechanisms on the task of mutation testing, as well as the variability of the generated mutants in terms of program semantics. Specifically, this study uses the Defect4J v1.2.0 dataset to compare the syntactic rule-based mutation techniques represented by MAJOR and PIT with the deep learning-based mutation techniques represented by DeepMutation, μBERT, and LEAM. The experimental results show that both rule-based and learning-based mutation techniques can effectively support mutation testing practices, but MAJOR has the best testing performance and is able to detect 85.4% of real defects. In terms of semantic representation, MAJOR has the strongest semantic representation capability, and its constructed test suite is able to kill more than 95% of the mutants generated by other mutation techniques. In terms of defect representation, both types of techniques are unique.

AmazeMap: Microservices Fault Localization Method Based on Multi-level Impact Graph

LI Ya-Xiao, LI Qing-Shan, WANG Lu, JIANG Yu-Xuan

2024,35(7):3115-3140, DOI: 10.13328/j.cnki.jos.007104

[Abstract] (471) [HTML] (65) [PDF 3.30 M] (1158)

Abstract:
Due to the large number of complex service dependencies and componentized modules, a failure in one service often causes one or more related services to fail, making it increasingly difficult to locate the cause of the failure. Therefore, how to effectively detect system faults and locate the root cause of faults quickly and accurately is the focus of current research in the field of microservices. Existing research generally builds a failure relationship model by analyzing the relationship between failures and services and metrics, but there are problems such as insufficient utilization of operation and maintenance data, incomplete modeling of fault information, coarse granularity of root cause localization, etc. Therefore, this study proposes AmazeMap, for which a multi-level fault impact graph modeling method and a microservice fault localization method are designed based on the fault impact graph. Specifically, the multi-level fault impact graph modeling method can comprehensively model the fault information by mining the collected temporal metric data and trace data while system running and considering the interrelationships between different levels; the fault localization method narrows the scope of fault impact, discovers the root cause from service instances and metrics, and finally outputs the most probable root cause of fault and metrics sequence. Based on an open-source benchmark microservice system and the AIOps contest dataset, this study designs experiments to validate AmazeMap, and also compares it with the existing methods. The results confirm AmazeMap’s effectiveness, accuracy, and efficiency.

Fuzzing Approach of Clustering Analysis-driven in Seed Scheduling

ZHANG Wen, CHEN Jin-Fu, CAI Sai-Hua, ZHANG Chi, LIU Yi-Song

2024,35(7):3141-3161, DOI: 10.13328/j.cnki.jos.007105

[Abstract] (486) [HTML] (81) [PDF 3.66 M] (1081)

Abstract:
As a widely used automated software testing technique, the primary goal of fuzzy testing is to explore as many code areas of the program under test as possible, thereby achieving higher coverage as well as detecting more bugs or errors. Most of existing fuzzy testing methods schedule the seed based on the historical mutation data of the seed, which is simpler to implement but ignores the distribution of program space explored by the seed, resulting in that the testing may fall into only a single region of the program to be probed, and causing the waste of testing resources. This study proposes the Cluzz, a fuzzing approach of clustering analysis-driven in seed scheduling. Firstly, Cluzz analyzes the difference between seeds in the feature space by combining the distribution of seed execution path coverage, and uses cluster analysis to classify the distribution of seeds execution in the program space. And then, Cluzz prioritizes the seeds according to the path coverage patterns of different seed clusters and the results of cluster analysis, explores the rare code regions and prioritizes the seeds with higher evaluation scores. Secondly, energy is allocated to the seeds by their evaluation scores, and the interesting inputs obtained from mutations are retained and categorized to update the seed cluster information. Cluzz reevaluates the seeds based on the updated seed clusters to ensure the validity of seeds during testing process, thereby exploring more unknown code regions in a limited time and improving the coverage of the program under test. Finally, the Cluzz is implemented on three current mainstream fuzzers and extensive testing work is conducted on eight popular real-world programs. The results show that Cluzz can detect an average of 1.7 times more unique crashes than a regular fuzzer, and it also outperforms a benchmark fuzzer by an average of 22.15% in terms of the number of new edges found. In addition, compared with the existing seed scheduling methods, the comprehensive performance of Cluzz is better than that of other benchmark fuzzers.

GUI Fuzzing Framework for Mobile Apps Based on Multi-modal Representation

ZHANG Shao-Kun, LI Yuan-Chun, LEI Han-Wen, JIANG Peng, LI Ding, GUO Yao, CHEN Xiang-Qun

2024,35(7):3162-3179, DOI: 10.13328/j.cnki.jos.007106

[Abstract] (463) [HTML] (112) [PDF 2.57 M] (1098)

Abstract:
GUI fuzzing plays a crucial role in enhancing the reliability and compatibility of mobile apps. However, most existing GUI fuzzing methods are inefficient, mainly because they are coarse-grained, relying solely on single-modal features to understand the GUI pages holistically. The excessive abstraction of app states leads to the neglect of many details, resulting in an insufficient understanding of GUI states and widgets. To address this issue, a GUI fuzzing framework called GUIFuzzer for mobile apps is proposed based on multi-modal representation. This framework leverages multi-modal features, such as visual features, layout context features, and fine-grained meta-attribute features, to jointly infer the semantics of GUI widgets. Then, it trains a multi-level reward-driven deep reinforcement learning model to optimize the GUI event selection strategy, thus improving the efficiency of fuzz testing. The proposed framework is evaluated on a large number of real apps. Experimental results show that GUIFuzzer significantly improves the coverage of fuzz testing compared with existing competitive baselines. A case study is also conducted on customized search for specific targets, namely sensitive API triggering, which further demonstrates the practicality of the GUIFuzzer framework.

APP Software Defect Tracking and Analysis Method Oriented to Version Evolution

LIU Hai-Yi, JIANG Ying, ZHAO Ze-Jiang

2024,35(7):3180-3203, DOI: 10.13328/j.cnki.jos.007107

[Abstract] (319) [HTML] (157) [PDF 3.16 M] (1001)

Abstract:
The speed of evolution in mobile application (APP) software market is accelerating. Effective analysis of software defects can help developers understand and repair software defects in time. However, the analysis object of existing research is not enough, which leads to isolated, fragmented information, and poor information quality. In addition, because of insufficient consideration of data verification and version mismatch issues, there are some errors in the analysis results, resulting in invalid software evolution. In order to provide more effective defect analysis results, an APP software defect tracking and analysis method oriented to version evolution (ASD-TAOVE) is proposed. First, the content of APP software defects is extracted from multi-source, heterogeneous APP software data, and the causal relationship of defect events is discovered. Then, a verification method for APP software defect content is designed, which is based on information entropy combined with text features and structural features to calculate the defect suspicious formula for verification and construction of APP software defect content heterogeneity graph. In order to consider the impact of version evolution, an APP software defect tracking analysis method is designed to analyze the evolution relationship of defects in version evolution. The evolution relationship can be transformed into the defect/evolutionary meta-paths which are useful for defect analysis. Finally, this study designs a heterogeneous information network based on deep learning to complete APP software defect analysis. The experimental results of four research questions (RQ) confirmed the effectiveness of ASD-TAOVE method of defect content verification and tracking analysis in the process of version-oriented evolution, and the accuracy of defect identification increased by about 9.9% and 5% respectively (average 7.5%). Compared with baseline methods, the ASD-TAOVE method can analyze more APP software data and provide effective defect information.

External Links

You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn
Technical Support：Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063