• Issue 6, 2023 Table of Contents
    • Special Issue's Articles
    • Preface

      2023, 34(6):2507-2508. DOI: 10.13328/j.cnki.jos.006854

    • GKCI: An Improved GNN-based Key Class Identification Method

      2023, 34(6):2509-2525. DOI: 10.13328/j.cnki.jos.006846

      Abstract:Researchers use key classes as starting points for software understanding and maintenance. If such key classes contain defects, they may pose a significant security risk to the software, so identifying them can improve software reliability and stability. Most existing methods are non-trainable: they compute each node's score according to a fixed rule and cannot fully utilize the structural information available in the software network. To solve these problems, a supervised deep learning method based on graph neural network technology is proposed. First, the project is built as a software network, the network embedding method Node2Vec is used to learn node representations, and each representation is mapped to a score through a simple dense network. Second, the aggregation function of the graph neural network (GNN) is improved to aggregate importance scores instead of node embeddings, taking the direction and weight of edges into account when aggregating the scores of neighbor nodes. Finally, nodes are ranked in descending order of the score predicted by the model. To evaluate its effectiveness, the proposed method is applied to eight Java open-source software systems. The experimental results show that it outperforms the benchmark methods: among the top 10 key-class candidates, it achieves 6.4% higher recall and 3.5% higher precision than the state of the art.
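
      A minimal Python sketch of the aggregation idea described above: importance scores, rather than embeddings, are combined along weighted, directed edges. The direction discount, the mixing factor alpha, and the random initial scores are illustrative assumptions, not the paper's exact formulation.

        import numpy as np

        def aggregate_scores(scores, edges, alpha=0.5, out_discount=0.5):
            # One aggregation round over a weighted, directed software
            # network: each class receives its neighbors' scores, with
            # outgoing dependencies discounted (assumed factor).
            agg = np.zeros_like(scores)
            for src, dst, w in edges:
                agg[dst] += w * scores[src]                 # incoming edge
                agg[src] += out_discount * w * scores[dst]  # outgoing edge
            return alpha * scores + (1 - alpha) * agg

        # placeholder scores; in GKCI they come from Node2Vec embeddings
        # passed through a dense layer
        scores = np.random.rand(5)
        edges = [(0, 1, 1.0), (1, 2, 2.0), (3, 1, 1.0), (4, 0, 0.5)]
        ranking = np.argsort(-aggregate_scores(scores, edges))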

    • Obfuscation-resilient Android Malware Detection Based on Graph Convolutional Networks

      2023, 34(6):2526-2542. DOI: 10.13328/j.cnki.jos.006848

      Abstract:Since its release, Android has become the most widely used mobile operating system in the world, owing to its open-source nature, rich hardware ecosystem, and diverse application markets. At the same time, the explosive growth of Android devices and Android applications (apps for short) has made the platform the target of 96% of mobile malware. Among current detection methods, directly extracting simple program features while ignoring program semantics is fast but less accurate, whereas converting program semantics into graph models for analysis improves accuracy but incurs high runtime overhead and scales poorly. To address these challenges, the program semantics of an app is distilled into a function call graph, and API calls are abstracted to convert the call graph into a simpler graph. The nodes of the simplified graph are then encoded as feature vectors, which are fed into a graph convolutional network (GCN) model to train a classifier with triplet loss (i.e., SriDroid). Experimental analysis on 20 246 Android apps shows that SriDroid achieves 99.17% malware detection accuracy with sound robustness.
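
      A minimal sketch of the triplet loss used to train such a classifier, assuming Euclidean distance and a margin of 1.0 (the paper's embedding network and triplet mining strategy are omitted):

        import numpy as np

        def triplet_loss(anchor, positive, negative, margin=1.0):
            # Pull embeddings of the same class (e.g., the same malware
            # family) together and push different classes apart.
            d_pos = np.sum((anchor - positive) ** 2)
            d_neg = np.sum((anchor - negative) ** 2)
            return max(0.0, d_pos - d_neg + margin)

        a, p, n = np.random.rand(3, 64)   # stand-in 64-d graph embeddings
        print(triplet_loss(a, p, n))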

    • Slice-level Vulnerability Detection and Interpretation Method Based on Graph Neural Network

      2023, 34(6):2543-2561. DOI: 10.13328/j.cnki.jos.006849

      Abstract:As software becomes more complex, the need for research on vulnerability detection keeps increasing: the rapid discovery and patching of software vulnerabilities can minimize the damage they cause. As an emerging detection approach, deep learning-based vulnerability detection can learn vulnerability patterns automatically from vulnerable code, saving considerable human effort. However, such methods are not yet mature. Function-level detection methods have coarse granularity and low accuracy; slice-level detection methods can effectively reduce sample noise, but two problems remain. On the one hand, most existing methods experiment on artificial vulnerability datasets, so their ability to detect vulnerabilities in real environments remains in doubt; on the other hand, they only detect whether a slice sample contains a vulnerability, and the detection results lack interpretability. To address these issues, this study proposes a slice-level vulnerability detection and interpretation method based on graph neural networks. The method first normalizes the C/C++ source code and extracts slices to reduce the interference of redundant information; second, a graph neural network model embeds the slices into vector representations that preserve the structural information and vulnerability features of the source code; then the slice vectors are fed into the vulnerability detection model for training and prediction; finally, the trained detection model and the vulnerability slice to be explained are fed into the vulnerability interpreter to obtain the specific lines of vulnerable code. The experimental results show that, for vulnerability detection, the method achieves an F1 score of 75.1% on real-world vulnerabilities, 41.2%-110.4% higher than the comparative methods. For vulnerability interpretation, the method reaches 73.6% accuracy when limited to the top 10% of critical nodes, 8.9% and 24.9% higher than the other two interpreters, with time overhead reduced by 42.5% and 15.4%, respectively. Finally, the method correctly detects and explains 59 real vulnerabilities in four open-source software projects, demonstrating its practicality in real-world vulnerability discovery.
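
      A minimal sketch of the normalization step, using a regex over simple declarations; a real implementation would rely on a C/C++ parser, and the symbolic naming scheme (VAR1, VAR2, ...) is an assumption in line with common slice-based detectors:

        import re

        def normalize_slice(lines):
            # Map user-defined variable names to symbolic names so the
            # model learns vulnerability patterns, not identifiers.
            decl = re.compile(r'\b(?:int|char|float|double|size_t)\s+(\w+)')
            mapping = {}
            for line in lines:
                for name in decl.findall(line):
                    mapping.setdefault(name, f'VAR{len(mapping) + 1}')
            if not mapping:
                return lines
            pattern = re.compile(r'\b(' + '|'.join(map(re.escape, mapping)) + r')\b')
            return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in lines]

        print(normalize_slice(["int len = strlen(src);",
                               "char buf[10];",
                               "memcpy(buf, src, len);"]))
        # ['int VAR1 = strlen(src);', 'char VAR2[10];', 'memcpy(VAR2, src, VAR1);']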

    • Evolutionary Coupling Analysis Method of Software Entity Based on Episode Mining

      2023, 34(6):2562-2585. DOI: 10.13328/j.cnki.jos.006853

      Abstract:Analyzing the evolutionary coupling between the entities of a software system is helpful for activities such as co-change candidate prediction, software supply chain risk identification, code vulnerability traceability, defect prediction, and architecture problem localization. Evolutionary coupling between two entities indicates that they tend to be changed together in the software revision history. Existing methods show low accuracy in detecting frequent "having-distance" co-changes, i.e., co-changes that occur in nearby rather than identical commits in the revision history. To address this problem, this study proposes an evolutionary coupling analysis method that combines association rule mining, episode mining, and latent semantic indexing (association rule, MINEPI, and LSI based method, AR-MIM) to mine such "having-distance" co-change relations. The effectiveness of AR-MIM is verified by comparing it against four baseline methods on a dataset collected from 58 Python projects, containing 242 074 training instances and 330 660 ground-truth instances. The results show that the precision, recall, and F1 score of AR-MIM are better than those of existing methods for co-change candidate prediction.
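
      A minimal sketch of the association-rule half of such a pipeline (the MINEPI episode-mining and LSI components are omitted); the thresholds and the toy commit history are illustrative:

        from collections import Counter
        from itertools import combinations

        def mine_cochange(commits, min_support=2, min_conf=0.6):
            # commits: list of sets of entities changed together.
            # Returns rules (a, b, support, confidence). Mining
            # "having-distance" co-changes would slide a window over
            # consecutive commits instead of a single commit.
            pair, single = Counter(), Counter()
            for changed in commits:
                for f in changed:
                    single[f] += 1
                for a, b in combinations(sorted(changed), 2):
                    pair[(a, b)] += 1
                    pair[(b, a)] += 1
            return [(a, b, n, n / single[a]) for (a, b), n in pair.items()
                    if n >= min_support and n / single[a] >= min_conf]

        history = [{"db.py", "api.py"}, {"db.py", "api.py", "ui.py"}, {"db.py"}]
        print(mine_cochange(history))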

    • Verification of Steering Angle Safety for Self-driving Cars Using Convex Optimization

      2023, 34(6):2586-2605. DOI: 10.13328/j.cnki.jos.006851

      Abstract:Providing formal guarantees for self-driving cars is a challenging task, since the input-output space (i.e., all possible combinations of inputs and outputs) is too large to explore exhaustively. This paper presents an automated verification technique that ensures steering angle safety for self-driving cars by combining convex optimization with deep learning verification (DLV), an automated framework for verifying the safety of image classification neural networks. DLV is extended with a convex optimization technique for fail-safe trajectory planning, which solves the judgment problem for the predicted steering angle and thus enables verification of steering angle safety. The benefits of the proposed approach are demonstrated on NVIDIA's end-to-end self-driving architecture, a crucial ingredient in many modern self-driving cars. The experimental results indicate that the proposed technique can successfully find adversarial misclassifications (i.e., incorrect steering decisions) within given regions and families of manipulations whenever they exist. The outcome is therefore either safety verification (no misclassification is found across all DNN layers, in which case the network can be regarded as stable or reliable w.r.t. steering decisions) or falsification (in which case the adversarial examples can be used to fine-tune the network).
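
      A minimal sketch of the falsification loop under stated assumptions: a brightness/contrast manipulation family, a fixed deviation threshold as the safety criterion, and a toy stand-in model (DLV's actual layer-by-layer region refinement is omitted):

        import numpy as np

        def find_adversarial(model, image, threshold_deg=10.0):
            # Search a small manipulation family; report any variant whose
            # predicted steering angle deviates from the original prediction
            # by more than threshold_deg degrees.
            base = model(image)
            for gain in (0.8, 0.9, 1.1, 1.2):        # contrast
                for bias in (-20.0, 0.0, 20.0):      # brightness
                    candidate = np.clip(image * gain + bias, 0, 255)
                    angle = model(candidate)
                    if abs(angle - base) > threshold_deg:
                        yield gain, bias, angle

        # toy stand-in: mean intensity mapped linearly to [-30, 30] degrees
        toy_model = lambda img: img.mean() / 255.0 * 60.0 - 30.0
        image = np.full((66, 200), 128.0)            # NVIDIA input resolution
        print(list(find_adversarial(toy_model, image)))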

    • Review Articles
    • Review on Exception Analysis Methods for Software Supply Chain

      2023, 34(6):2606-2627. DOI: 10.13328/j.cnki.jos.006850

      Abstract:Software occupies an increasingly important position in various fields of the national economy. In an era when everything is interconnected, the interaction, analysis, and collaboration of information are becoming ever more common, dependencies among programs are increasing, and higher requirements are therefore placed on system reliability and robustness. A software supply chain consists of open-source and third-party components, and its security problems have become a focus of both academia and industry in recent years. As an important part of open-source software, library functions are closely related to software supply chain security. To improve development efficiency, software libraries and application programming interfaces (APIs) are frequently used during programming, but errors or vulnerabilities in library functions may be exploited by attackers to compromise the security of the software supply chain. These errors or vulnerabilities are often related to exceptions in library functions. Therefore, this study surveys exception analysis methods for library functions from the two aspects of accuracy and efficiency, describes the basic idea and key steps of each method, and gives preliminary solutions to the challenges faced by library function exception analysis. Exception analysis of library functions in the software supply chain helps enhance the robustness of software systems and ensure the security of the software supply chain.

    • Special Issue's Articles
    • Software Supply Chain Analysis Techniques for Java Ecosystem

      2023, 34(6):2628-2640. DOI: 10.13328/j.cnki.jos.006852

      Abstract:With the prosperity of open-source software, almost all software companies use reusable open-source components as basic building blocks of their software products, thus forming the software supply chain. The software supply chain improves development efficiency and reduces labor costs for software companies, but it may also introduce new security problems. In particular, if one component has a high-risk vulnerability, the software supply chain inevitably spreads it to everything that depends on the component, amplifying its impact. For example, through the software supply chain, the Log4j2 vulnerability caused a catastrophic security issue for the whole Java ecosystem. Unfortunately, current research on the Java software supply chain mainly focuses on a single component or a group of components and misses impact studies at the ecosystem scale. Therefore, this paper presents the essential software supply chain analysis techniques for studying component and vulnerability impact on the Java ecosystem. More specifically, a formal definition of component dependencies in the software supply chain is first given. Next, new techniques are proposed and an analysis tool is built to analyze all component dependencies in the Java ecosystem, covering over 8.8 million component versions and 65 million dependencies. Finally, Log4j2, the logging library affected by the vulnerability, is used as an example to evaluate its impact on the whole Java ecosystem. The results show that the vulnerability affects 15.12% of the components (71 082) and 16.87% of the component versions (1 488 971) in the ecosystem, and that the vulnerability fix rate is only 29.13%.
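
      A minimal sketch of the core impact computation over a dependency graph: a breadth-first search over reversed edges collects every component that transitively depends on a vulnerable one. The tiny graph and component names are illustrative:

        from collections import deque

        def impact_set(depends_on, vulnerable):
            # depends_on: {component: set of components it depends on}.
            # Returns all components that transitively depend on `vulnerable`.
            reverse = {}
            for comp, deps in depends_on.items():
                for dep in deps:
                    reverse.setdefault(dep, set()).add(comp)
            seen, queue = set(), deque([vulnerable])
            while queue:
                current = queue.popleft()
                for dependent in reverse.get(current, ()):
                    if dependent not in seen:
                        seen.add(dependent)
                        queue.append(dependent)
            return seen

        deps = {"app": {"spring"}, "spring": {"log4j-core"}, "cli": {"log4j-core"}}
        print(impact_set(deps, "log4j-core"))   # {'app', 'cli', 'spring'} in some order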

    • Attack Detection Method Based on Indicator-dependent Model Construction and Monitoring

      2023, 34(6):2641-2668. DOI: 10.13328/j.cnki.jos.006847

      Abstract:With the continuous evolution of attack techniques, defense is becoming rapidly more difficult. To identify and block attacks in a timely and effective manner, numerous detection-based defenses have been proposed in academia and industry. Current attack detection methods mainly focus on attack behaviors, finding attacks by identifying attack signatures or locating abnormal activities; these two approaches suffer from insufficient generalization and from being attack-specific, respectively, and can be bypassed by attackers' carefully crafted behaviors, resulting in false positives and false negatives. Nevertheless, it is observed that attacks and their variants usually leverage different attack mechanisms to bypass specific defenses while pursuing the same attack purpose. Since the purpose remains the same, the impact of these attacks on the system is similar, so the induced system impact does not grow in proportion to the large increase in attack methods. Based on this observation, an attack detection method based on indicator-dependent model construction and monitoring is proposed to detect attack variants more effectively. The proposed model focuses on the impact of exploits on the system rather than on the various attack behaviors, and is therefore more generalizable. Based on the model, multi-level monitoring technology is adopted to quickly capture and locate attack traces, finally achieving accurate detection of target attacks and their variants and effectively reducing the false alarm rate. The effectiveness of the proposed method is verified experimentally by comparison with existing behavior-based detection methods on an attack set composed of the DARPA Transparent Computing program data and typical APT attacks. The experimental results show that the proposed solution achieves 99.30% detection accuracy with an acceptable performance cost.

    • Heterogeneous Defect Prediction Based on Simultaneous Semantic Alignment

      2023, 34(6):2669-2689. DOI: 10.13328/j.cnki.jos.006495

      Abstract:Heterogeneous defect prediction (HDP) addresses the situation where the source project and the target project use different features: it uses heterogeneous feature data from the source project to predict the defect proneness of software modules in the target project. HDP has made definite progress, but its overall performance is still unsatisfactory. Most previous HDP methods reduce the difference between domains by learning domain-invariant feature subspaces. However, the source and target domains usually exhibit substantial heterogeneity, which makes such domain alignment ineffective. The reason is that these methods ignore the latent knowledge that the classifier should produce similar classification probability distributions for the same category in the two domains, and they fail to mine the intrinsic semantic information contained in the data. In addition, because collecting training data for newly launched or historical legacy projects relies on expert knowledge and is time-consuming, laborious, and error-prone, this study explores heterogeneous defect prediction based on only a small number of labeled modules in the target project. On this basis, a heterogeneous defect prediction method based on simultaneous semantic alignment (SHSSAN) is proposed. On the one hand, it explores implicit knowledge learned from labeled source projects to transfer relevance between categories and achieve implicit semantic information transfer. On the other hand, to learn semantic representations of unlabeled target data, it performs centroid matching through target pseudo-labels to achieve explicit semantic alignment. At the same time, SHSSAN can effectively handle class imbalance and linearly inseparable data, and makes full use of the label information in the target project. Experiments on public heterogeneous datasets containing 30 different projects show that, compared with the state-of-the-art CTKCCA, CLSUP, MSMDA, KSETE, and CDAA methods, SHSSAN improves F-measure by 6.96%, 19.68%, 19.43%, 13.55%, and 9.32%, and AUC by 2.02%, 3.62%, 2.96%, 3.48%, and 2.47%, respectively.
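
      A minimal numpy sketch of the explicit-alignment step, assuming both domains have already been mapped into a shared feature space (which SHSSAN learns; that part is omitted). The centroid update rule and the toy data are illustrative:

        import numpy as np

        def centroid_pseudo_label(src_X, src_y, tgt_X, iters=5):
            # Class centroids from labeled source data give target samples
            # pseudo-labels by nearest centroid; centroids are then refined
            # with the pseudo-labeled target data.
            classes = np.unique(src_y)
            centroids = np.stack([src_X[src_y == c].mean(0) for c in classes])
            for _ in range(iters):
                dists = ((tgt_X[:, None, :] - centroids[None]) ** 2).sum(-1)
                pseudo = classes[dists.argmin(1)]
                for i, c in enumerate(classes):
                    if (pseudo == c).any():
                        centroids[i] = (centroids[i] + tgt_X[pseudo == c].mean(0)) / 2
            return pseudo

        rng = np.random.default_rng(0)
        src_X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
        src_y = np.array([0] * 20 + [1] * 20)
        tgt_X = np.vstack([rng.normal(0.2, 0.3, (10, 2)), rng.normal(1.8, 0.3, (10, 2))])
        print(centroid_pseudo_label(src_X, src_y, tgt_X))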

    • Crowdsourcing Software Development Oriented Fault Localization

      2023, 34(6):2690-2707. DOI: 10.13328/j.cnki.jos.006498

      Abstract:Fault localization is an essential precondition for program repair in software development. To this end, researchers have proposed automated fault localization (AFL) methods to facilitate the task. Such approaches take full advantage of information such as the execution traces and results of given test cases, and are significantly effective in reducing the difficulty of fault localization. In competitive crowdsourcing software development, one task can receive multiple competing implementations (solutions). This study proposes a novel approach to AFL in crowdsourcing software engineering. Its key insight is that, when locating faulty statements in a program, the competing implementations can be regarded as reference programs: by searching the reference programs for a reference statement for each statement in the buggy program, the approach computes each statement's suspiciousness score with the help of its references. Given a set of test cases and a buggy program, the tests are run and an initial suspiciousness score for each statement is calculated by a widely used spectrum-based fault localization (SBFL) approach; the score of each statement is then adjusted according to its similarity with statements in the competing implementations. The proposed approach is evaluated on 118 real-world buggy programs accompanied by competing implementations. The evaluation results suggest that, compared with SBFL approaches, it reduces the cost of fault localization by more than 25%.
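
      A minimal sketch under stated assumptions: Ochiai is one widely used SBFL formula (the abstract does not fix which one is used), and the similarity-based adjustment shown here, where a statement with no close match in reference solutions becomes more suspicious, is a hypothetical instance of the described idea:

        import math

        def ochiai(ef, ep, total_failed):
            # ef/ep: failing/passing tests covering the statement.
            return 0.0 if ef == 0 else ef / math.sqrt(total_failed * (ef + ep))

        def adjust(base, similarity, weight=0.5):
            # Statements that closely match a statement in a competing
            # (presumably correct) implementation become less suspicious.
            return base * (1 + weight * (1 - similarity))

        # covered by 3 of 4 failing tests and 1 passing test,
        # with a 0.9-similar reference statement in another solution
        base = ochiai(ef=3, ep=1, total_failed=4)
        print(base, adjust(base, similarity=0.9))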

    • Test Case Prioritization Technique in Continuous Integration Based on Reinforcement Learning

      2023, 34(6):2708-2726. DOI: 10.13328/j.cnki.jos.006506

      Abstract:As software delivery increasingly emphasizes speed and reliability, continuous integration (CI) has attracted more and more attention in recent years. Developers continually integrate working copies into the mainline to realize software evolution, and each integration runs automated tests to verify whether the update introduces faults. However, as software grows, test suites contain more and more test cases, and as software evolves, the coverage and fault detection ability of test cases also change across CI cycles, so traditional test case prioritization techniques may become inapplicable. Techniques based on reinforcement learning can adjust prioritization strategies dynamically according to test feedback, but the existing reinforcement learning-based techniques proposed in recent years do not comprehensively consider the information in the test suite during prioritization, which limits their effectiveness. This study proposes a new test case prioritization method for CI, called the pointer ranking method. The method takes features such as the history information of test cases as inputs. In each CI cycle, the agent uses the attention mechanism to attend to all candidate test cases and obtains a prioritization result; after test execution, it obtains the updating direction from the feedback. It constantly adjusts its prioritization strategy in the "prioritization, test execution, test feedback" loop and finally achieves satisfactory prioritization performance. This study verifies the effectiveness of the proposed method on five large-scale datasets, explores the impact of history length on performance, examines the method's effectiveness on datasets that contain only regression test cases, and measures its execution efficiency. The study reaches the following conclusions. First, compared with existing techniques, the pointer ranking method can adjust its strategy along with the evolution of the software and effectively enhance the fault detection ability of test sequences in CI. Second, it is robust to history length: a small amount of history information suffices for optimal performance. Third, it handles both regression and newly added test cases well. Finally, its time overhead is small; considering its better and more stable prioritization performance, the pointer ranking method is very competitive.
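
      For reference, the fault detection ability of a test sequence is commonly scored by APFD (average percentage of faults detected); a minimal sketch follows (whether this paper uses plain APFD or a variant such as NAPFD is an assumption):

        def apfd(order, faults):
            # APFD = 1 - sum(first-detection positions)/(n*m) + 1/(2n),
            # where positions are 1-based indices in `order` of the first
            # test exposing each fault; faults: {fault: detecting tests}.
            n, m = len(order), len(faults)
            pos = {t: i + 1 for i, t in enumerate(order)}
            first = sum(min(pos[t] for t in tests) for tests in faults.values())
            return 1 - first / (n * m) + 1 / (2 * n)

        faults = {"f1": {"t3"}, "f2": {"t1", "t4"}}
        print(apfd(["t3", "t1", "t2", "t4"], faults))   # 0.75  (good order)
        print(apfd(["t2", "t4", "t1", "t3"], faults))   # 0.375 (poor order)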

    • TWE-NMF Topic Model-based Approach for Mashup Service Clustering

      2023, 34(6):2727-2748. DOI: 10.13328/j.cnki.jos.006508

      Abstract:With the development of the Internet and service-oriented technology, a new type of Web application, the Mashup service, has become popular on the Internet and grown rapidly. How to find high-quality services among the large number of Mashup services has become a focus of attention. Studies have shown that discovering and clustering services with similar functions can effectively improve the accuracy and efficiency of service discovery. Current methods mainly focus on mining the hidden functional information in Mashup services and applying specific clustering algorithms such as K-means. However, Mashup service documents are usually short texts, and traditional topic mining algorithms such as LDA have difficulty representing short texts and achieving satisfactory clustering results. To solve this problem, this study proposes a non-negative matrix factorization model combining tags and word embeddings (TWE-NMF) to discover topics for Mashup services. The method first normalizes the Mashup services, then uses a Dirichlet process multinomial mixture model based on improved Gibbs sampling to automatically estimate the number of topics, and next combines word embeddings and service tag information with non-negative matrix factorization to compute Mashup topic features. A spectral clustering algorithm is then used to cluster the Mashup services. Finally, the performance of the method is comprehensively evaluated. Compared with existing service clustering methods, the experimental results show significant improvement in evaluation indicators such as precision, recall, F-measure, purity, and entropy.
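
      A minimal sketch of the NMF core, using the classic Lee-Seung multiplicative updates for the Frobenius-norm objective; TWE-NMF augments this objective with word-embedding and tag terms, which are omitted here:

        import numpy as np

        def nmf(V, k, iters=200, eps=1e-9):
            # Factorize V (documents x terms) ~ W @ H with nonnegative
            # factors; rows of H are topics, rows of W are document-topic
            # weights.
            rng = np.random.default_rng(0)
            W = rng.random((V.shape[0], k)) + eps
            H = rng.random((k, V.shape[1])) + eps
            for _ in range(iters):
                H *= (W.T @ V) / (W.T @ W @ H + eps)
                W *= (V @ H.T) / (W @ H @ H.T + eps)
            return W, H

        V = np.random.default_rng(1).random((6, 12))   # toy doc-term matrix
        W, H = nmf(V, k=3)
        print(np.linalg.norm(V - W @ H))               # reconstruction error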

    • Network Representation Learning Model Integrating Accompanying Information

      2023, 34(6):2749-2764. DOI: 10.13328/j.cnki.jos.006486

      Abstract:Network representation learning is regarded as a key technology for improving the efficiency of information network analysis. It maps network nodes to low-dimensional vectors in a latent space and efficiently preserves the structure and characteristics of the original network in these vectors. In recent years, many studies have intensively explored network topology and node features, with fruitful applications in many network analysis tasks. In fact, besides these two kinds of key information, the accompanying information widely existing in networks reflects various complex relationships and plays an important role in network construction and evolution. To improve the efficiency of network representation learning, a novel model integrating accompanying information, named NRLIAI, is proposed. The model employs variational auto-encoders (VAE) to propagate and process information: in the encoder, it aggregates and maps the network topology and node features with graph convolutional operators; in the decoder, it reconstructs the network, with the accompanying information guiding representation learning. The proposed model solves the problem that existing methods fail to utilize accompanying information effectively; at the same time, it possesses generative ability, which reduces overfitting during learning. On several real-world network datasets, this study conducts extensive comparative experiments between NRLIAI and existing methods through node classification and link prediction tasks, and the experimental results confirm the feasibility of the proposed model.

    • Learnable Weighting Mechanism in Model-based Reinforcement Learning

      2023, 34(6):2765-2775. DOI: 10.13328/j.cnki.jos.006489

      Abstract:Model-based reinforcement learning methods train a model to simulate the environment with the collected samples and use the imaginary samples generated by the model to optimize the policy, giving them the potential to improve sample efficiency. Nevertheless, owing to the shortage of training samples, the environment model is often inaccurate, and the imaginary samples it generates can be detrimental to training. For this reason, a learnable weighting mechanism is proposed that reduces this negative effect by weighting the generated samples. The effect of the imaginary samples on training is quantified as the difference between the losses on the real samples before and after the value and policy networks are updated with the imaginary samples. The experimental results show that a reinforcement learning algorithm using this weighting mechanism is superior to existing model-based and model-free algorithms on multiple tasks.
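
      A minimal sketch of the quantification idea: an imaginary sample (or batch) whose gradient step lowers the loss on real samples gets a weight near 1, and one that raises it gets a weight near 0. The sigmoid squashing and the temperature beta are assumptions, not the paper's learned mechanism:

        import math

        def imaginary_sample_weight(real_loss_before, real_loss_after, beta=5.0):
            # Positive (before - after) means the imaginary sample helped
            # on real data -> weight close to 1; harmful -> close to 0.
            return 1.0 / (1.0 + math.exp(-beta * (real_loss_before - real_loss_after)))

        print(imaginary_sample_weight(0.80, 0.72))   # helpful -> ~0.60
        print(imaginary_sample_weight(0.80, 0.95))   # harmful -> ~0.32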

    • Leveraging Spatial-semantic Information in Object Detection and Segmentation

      2023, 34(6):2776-2788. DOI: 10.13328/j.cnki.jos.006509

      Abstract:High-quality feature representation can boost performance in object detection and other computer vision tasks. Modern object detectors resort to versatile feature pyramids to enrich representation power, but neglect that pathways of different directions should use different fusing operations to meet their different needs of information flow. This study proposes separated spatial semantic fusion (SSSF), which uses a channel attention block (CAB) in the top-down pathway to pass semantic information, and a spatial attention block (SAB) with a bottleneck structure in the bottom-up pathway to pass precise location signals to the top level with fewer parameters and less computation than plain spatial attention without dimension reduction. SSSF is effective and generalizes well: it improves AP by over 1.3% for object detection, improves the fusing operation of the top-down path for semantic segmentation by about 0.8% over plain addition, and boosts instance segmentation performance in all metrics for both bounding box AP and mask AP.
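
      A minimal numpy sketch of a squeeze-and-excitation-style channel attention block of the kind the top-down pathway uses; the bottleneck ratio and random weights are placeholders, and the paper's exact CAB design may differ:

        import numpy as np

        def channel_attention(x, w1, w2):
            # Squeeze: global average pooling over spatial dims; excite:
            # two-layer bottleneck + sigmoid; rescale each channel.
            z = x.mean(axis=(1, 2))                               # (C,)
            s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))
            return x * s[:, None, None]

        rng = np.random.default_rng(0)
        C = 16
        x = rng.random((C, 8, 8))                  # feature map (C, H, W)
        w1 = rng.normal(size=(C // 4, C))          # reduction by 4 (assumed)
        w2 = rng.normal(size=(C, C // 4))
        print(channel_attention(x, w1, w2).shape)  # (16, 8, 8)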

    • Double Layer Index for Continuous k-nearest Neighbor Queries on Moving Objects

      2023, 34(6):2789-2803. DOI: 10.13328/j.cnki.jos.006492

      Abstract:For a given set of moving objects, a continuous k-nearest neighbor (CKNN) query q over the objects continuously identifies and monitors the k nearest objects as the objects and the query point move. In real life, many location-based applications in transportation, social networks, e-commerce, and other fields involve the basic problem of processing CKNN queries over moving objects. Most existing work on CKNN queries determines a query range containing the k nearest neighbors through multiple iterations, where each iteration has to count the objects within the current query range, and this counting dominates the query cost. To address this issue, this study proposes a double-layer index called GGI, which consists of a grid index and a Gaussian mixture function simulating the varying distribution of objects. The bottom layer of GGI employs a grid index to maintain the moving objects, while the upper layer constructs a Gaussian mixture model to simulate their distribution in two-dimensional space. Based on GGI, an incremental search algorithm called IS-CKNN is proposed to process CKNN queries. The algorithm directly determines, based on the Gaussian mixture model, a query region that contains at least k neighbors of q, which greatly reduces the number of iterations. When the objects and the query point move, an efficient incremental query strategy is further employed, which maximizes the reuse of existing query results and reduces the computation of the current query. Finally, extensive experiments on one real and two synthetic datasets confirm the superiority of the proposed approach.
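
      A minimal sketch of the grid layer's candidate search: objects are hashed to cells, and square rings of cells around the query are scanned until enough candidates are found. The single extra safety ring is a simplification (an exact search keeps expanding until the next ring cannot contain anything closer), and GGI's Gaussian-mixture layer for predicting the initial region is omitted:

        import math
        from collections import defaultdict

        def build_grid(objects, cell):
            grid = defaultdict(list)
            for oid, (x, y) in objects.items():
                grid[(int(x // cell), int(y // cell))].append(oid)
            return grid

        def knn(grid, objects, q, k, cell):
            # Scan rings of cells around the query; once k candidates are
            # seen, scan one extra ring, then rank by exact distance.
            qc = (int(q[0] // cell), int(q[1] // cell))
            cand, r, last = [], 0, None
            while last is None or r <= last:
                for dx in range(-r, r + 1):
                    for dy in range(-r, r + 1):
                        if max(abs(dx), abs(dy)) == r:
                            cand.extend(grid.get((qc[0] + dx, qc[1] + dy), []))
                if len(cand) >= k and last is None:
                    last = r + 1
                r += 1
            return sorted(cand, key=lambda o: math.dist(q, objects[o]))[:k]

        objs = {1: (0.5, 0.5), 2: (1.2, 0.7), 3: (5.0, 5.0), 4: (0.9, 1.1)}
        grid = build_grid(objs, cell=1.0)
        print(knn(grid, objs, q=(1.0, 1.0), k=2, cell=1.0))   # [4, 2]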

    • Multi-stage Method for Online Vertical Data Partitioning Based on Spectral Clustering

      2023, 34(6):2804-2832. DOI: 10.13328/j.cnki.jos.006496

      Abstract:Vertical data partitioning technology logically stores database table attributes that satisfy certain semantic conditions in the same physical block, so as to reduce the cost of data access and improve the efficiency of query processing. A query usually involves only some of a table's attributes, so a subset of the attributes can produce exact query results. Reasonable vertical partitioning therefore lets most queries be answered without scanning the whole table, reducing data access and improving query processing efficiency. Traditional vertical partitioning methods are mainly based on heuristic rules set by experts: the partitioning granularity is coarse, and they cannot tailor the partitions to the characteristics of the workload. Besides, when the workload or the number of attributes grows large, the execution time of existing methods is too long to meet the performance requirements of online, real-time database tuning. Therefore, a method called spectral clustering based vertical partitioning (SCVP) is proposed for the online environment, adopting a phased solution to reduce the time complexity of the algorithm and speed up partitioning. First, SCVP reduces the solution space by adding constraints, that is, it generates initial partitions by spectral clustering. Second, SCVP designs an algorithm to search the solution space, that is, it optimizes the initial partitions by combining frequent itemset mining with greedy search. To further improve performance under high-dimensional attributes, an improved version of SCVP called SCVP-R (spectral clustering based vertical partitioning redesign) is proposed. SCVP-R optimizes the partition combiner component of SCVP by introducing a sympatric-competition mechanism, a double-elimination mechanism, and a loop mechanism. The experimental results on different datasets show that SCVP and SCVP-R execute faster and perform better than the current state-of-the-art vertical partitioning method.
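
      A minimal sketch of the workload-driven input to such spectral methods: an attribute affinity matrix counting how often attribute pairs are accessed by the same query, to which spectral clustering (omitted here) can be applied to obtain initial partitions. Whether SCVP builds exactly this matrix is an assumption:

        import numpy as np

        def attribute_affinity(workload, n_attrs):
            # workload: list of (accessed attribute indices, frequency).
            # A[i, j] accumulates how often i and j appear in one query.
            A = np.zeros((n_attrs, n_attrs))
            for attrs, freq in workload:
                for i in attrs:
                    for j in attrs:
                        if i != j:
                            A[i, j] += freq
            return A

        workload = [((0, 1), 30), ((1, 2), 20), ((3, 4), 50)]
        print(attribute_affinity(workload, n_attrs=5))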

    • >Review Articles
    • Survey on Security and Privacy of Federated Learning Models

      2023, 34(6):2833-2864. DOI: 10.13328/j.cnki.jos.006658

      Abstract:As data silos emerge and ever more importance is attached to personal privacy protection, the application of centralized learning is restricted, whereas federated learning has attracted great attention since its appearance because, as a distributed machine learning framework, it can accomplish model training without leaking users' data. As federated learning is increasingly widely applied, its security and privacy protection capabilities have also begun to be questioned. This study offers a systematic summary and analysis of the achievements of domestic and foreign researchers in recent years on the security and privacy of federated learning models. Specifically, the study outlines the background of federated learning, clarifies its definition and workflow, and analyzes its vulnerabilities. Then, the security threats and privacy risks facing federated learning are systematically analyzed and compared, and existing defense methods are summarized. Finally, the prospects of this research area and the challenges ahead are presented.

    • Adaptive Detection Path Configuration for In-band Network Telemetry in SDN Based on Graph Segmentation

      2023, 34(6):2865-2877. DOI: 10.13328/j.cnki.jos.006494

      Abstract:Software-defined networking (SDN) is a new network architecture that separates the control and forwarding planes and can schedule and optimize network resources based on global information. Precise scheduling, however, requires accurate measurement of network-wide information, including the status of all switching devices and all link information in the topology. In-band network telemetry (INT) can collect such information while forwarding data packets, and configuring detection paths that cover the entire network is one of the key problems INT must solve. Existing detection path configuration methods for INT have the following problems: (1) a large number of detection nodes must be deployed in advance, which increases maintenance overhead; (2) detection paths are too long, so the length of a detection packet exceeds the network's MTU; (3) redundant detection paths make the measurement traffic account for too large a share of overall network traffic; and (4) recovery of the detection paths takes too long when the topology changes dynamically. To solve these problems, an adaptive detection path configuration method for in-band network telemetry in SDN based on graph segmentation (ACGS) is proposed. The basic idea is to divide the network topology by graph segmentation, controlling the topology scale to restrict the detection path length; to solve for an Euler circuit in each resulting subgraph, obtaining a detection path that traverses every directed edge of the subgraph exactly once and thereby avoiding excessive detection nodes and high path redundancy; and to combine local with global adjustment to shorten detection path recovery when the topology changes dynamically. The experimental results prove that ACGS can realize INT detection path configuration in SDN with moderate detection path length, fewer detection nodes, lower detection path redundancy, and faster adjustment under dynamically changing topology.
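
      The per-subgraph step relies on an Euler circuit: assuming, as the abstract suggests, that each direction of a link is a separate edge, every node's in-degree equals its out-degree, so a circuit covering each directed edge exactly once exists. A minimal sketch using Hierholzer's algorithm (the switch names are illustrative):

        def euler_circuit(adj, start):
            # Hierholzer's algorithm on a directed graph {node: successors};
            # consumes each edge exactly once. Assumes in-degree equals
            # out-degree at every node and the edge set is connected.
            adj = {u: list(vs) for u, vs in adj.items()}   # consumable copy
            stack, circuit = [start], []
            while stack:
                u = stack[-1]
                if adj.get(u):
                    stack.append(adj[u].pop())
                else:
                    circuit.append(stack.pop())
            return circuit[::-1]

        # tiny telemetry subgraph: each direction of a link is one edge
        links = {"s1": ["s2", "s3"], "s2": ["s1"], "s3": ["s1"]}
        print(euler_circuit(links, "s1"))   # ['s1', 's3', 's1', 's2', 's1']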

    • Layered Quantum Key Distribution Protocol with a Small Number of Participants

      2023, 34(6):2878-2891. DOI: 10.13328/j.cnki.jos.006504

      Abstract:Layered key structures play an important role in quantum communication. Besides the use of EPR and GHZ states, asymmetric high-dimensional multi-particle entanglement also provides a new way to realize layered quantum key distribution, and it is more efficient in the number of quantum channel uses than conventional quantum key distribution over bipartite links. This study introduces five layered key structures for three users and gives partitionable key structures for four and five users. For each of the layered key structures introduced, the two methods above are compared to obtain the protocol with the highest idealized key rate. When the quantum network has more than three users and the key structure is partitionable, it is proved that the idealized key rate of each layer can reach 1 by using EPR and GHZ states. Finally, the partitionable key structures for four and five users are used as examples to elaborate the description.

    • Subversion Attack and Improvement of ECDSA Signature Scheme

      2023, 34(6):2892-2905. DOI: 10.13328/j.cnki.jos.006516

      Abstract:The Snowden incident revealed that certain cryptosystems had indeed been subverted. The elliptic curve digital signature algorithm (ECDSA) is widely used because of its shorter signatures at the same security level, for example, for signing Bitcoin transactions; however, whether ECDSA can be subverted, and how to resist such an attack, remained open questions. This study answers them. Firstly, it shows how to use a pseudorandom function (PRF) to compute a value that replaces the randomness used in ECDSA; the subverted ECDSA enables an adversary to extract the signing private key from at most three consecutive signatures. Secondly, the hash of the private key, the message, and the random signature component is used as a second random number to improve the ECDSA scheme, yielding a signature scheme that resists subversion attacks: even if an adversary replaces a component of the new signature algorithm, it cannot extract any information about the signing key. Finally, both the proposed and the existing algorithms are implemented, and the implementation demonstrates that the proposed scheme has advantages in computational complexity and efficiency.
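
      A minimal sketch of the countermeasure's core idea: derive the second random number as a hash of the private key, the message, and the random signature component. The encoding, hash choice, and reduction below are assumptions, not the paper's precise construction:

        import hashlib, secrets

        # secp256k1 group order, the curve used for Bitcoin signatures
        N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

        def second_nonce(priv_key: int, message: bytes, r_component: int) -> int:
            # k2 = H(sk || msg || r), reduced into [1, N-1]; a subverted
            # randomness source alone can no longer bias the nonce unnoticed.
            h = hashlib.sha256()
            h.update(priv_key.to_bytes(32, "big"))
            h.update(message)
            h.update(r_component.to_bytes(32, "big"))
            return int.from_bytes(h.digest(), "big") % (N - 1) + 1

        sk = secrets.randbelow(N - 1) + 1
        print(hex(second_nonce(sk, b"transfer 1 BTC", r_component=12345)))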

    • Grid Dividing for Single-stage Instance Segmentation

      2023, 34(6):2906-2921. DOI: 10.13328/j.cnki.jos.006493

      Abstract:In recent years, single-stage instance segmentation methods have made preliminary progress in real-world applications thanks to their high efficiency, but two drawbacks remain compared with two-stage counterparts. (1) Low accuracy: a single-stage method performs no multiple rounds of refinement, so its accuracy is still some distance from what real-world applications require. (2) Low flexibility: most existing single-stage methods are specially designed models that are not compatible with general object detectors. This study presents an accurate and flexible framework for single-stage instance segmentation with the following two key designs. (1) To improve accuracy, a grid dividing binarization algorithm is proposed: the bounding box region is first divided into several grid cells, and instance segmentation is then performed on each cell. In this way, the original full-object segmentation task is simplified into per-cell sub-tasks, which significantly reduces the complexity of the feature representation and further improves segmentation accuracy. (2) To be compatible with object detectors, a plug-and-play module is designed that can be seamlessly plugged into most existing object detection methods, enabling them to perform instance segmentation. The proposed method achieves excellent performance on public datasets such as MS COCO, outperforming most existing single-stage methods and even some two-stage methods.
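
      A minimal sketch of the dividing step: a binary mask inside its bounding box is split into an s x s grid of independent, simpler segmentation targets. The cell count and toy mask are illustrative:

        import numpy as np

        def grid_divide(mask, s):
            # Split a binary object mask (cropped to its bounding box)
            # into s x s cells; each cell is segmented independently.
            h, w = mask.shape
            ys = np.linspace(0, h, s + 1, dtype=int)
            xs = np.linspace(0, w, s + 1, dtype=int)
            return [[mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                     for j in range(s)] for i in range(s)]

        mask = np.zeros((8, 8), dtype=np.uint8)
        mask[2:6, 2:6] = 1                      # toy square object
        cells = grid_divide(mask, s=2)
        print(cells[0][0])                      # top-left 4x4 cell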

    • Semantic Relationships Guided Facial Action Unit Analysis

      2023, 34(6):2922-2941. DOI: 10.13328/j.cnki.jos.006497

      Abstract:The main purpose of facial action unit (AU) analysis is to identify the state of each facial action unit, which can be applied in many scenarios such as lie detection, autonomous driving, and intelligent healthcare. In recent years, with the spread of deep learning in computer vision, facial action unit analysis has attracted extensive attention. Facial action unit analysis can be divided into two tasks: facial action unit recognition and facial action unit intensity estimation; however, existing studies usually address only one of them. More importantly, these methods usually focus on designing or learning complex feature representations while ignoring the semantic correlations between facial action units. In fact, facial action units often exhibit strong interrelationships, and effectively using such semantic knowledge for learning and reasoning is key to facial action unit analysis. This study models the semantic relationships of facial action units by analyzing the co-occurrence and mutual exclusion of AUs in various facial behaviors, organizes the facial AUs in the form of a structured knowledge graph, and proposes an AU semantic relationship embedded representation learning (SRERL) framework. Experiments for both facial action unit analysis tasks are conducted on three benchmarks: BP4D, DISFA, and FERA2015. The experimental results show that the proposed method outperforms previous work and achieves state-of-the-art performance. Furthermore, experiments on the BP4D+ dataset and an occlusion evaluation on the BP4D dataset demonstrate the outstanding generalization and robustness of the proposed method.

    • Two-scale Real Image Blind Denoising with Self-supervised Constraints

      2023, 34(6):2942-2958. DOI: 10.13328/j.cnki.jos.006512

      Abstract:Existing image denoising algorithms achieve decent performance on images with additive white Gaussian noise (AWGN), but generalize poorly to real-world images with unknown noise. Motivated by the significant progress of deep convolutional neural networks (CNNs) in image denoising, a novel two-scale blind image denoising algorithm based on self-supervised constraints is proposed. First, the algorithm leverages the denoised result from the small-scale network branch to provide additional useful information for large-scale image denoising, so as to achieve favorable denoised results. Second, the network is composed of a noise estimation subnetwork and a non-blind denoising subnetwork: the noise estimation subnetwork first predicts a noise map, and the non-blind denoising subnetwork then denoises the image under the guidance of that map. In view of the fact that real noisy images lack corresponding clean images as labels, an edge-preserving self-supervised constraint based on the total variation (TV) prior is proposed, which generalizes the network to different real noisy datasets by adjusting smoothing parameters. To keep the image background consistent, a background guidance module (BGM) is proposed that builds a self-supervised constraint from the information difference between multi-scale Gaussian-blurred images and thus assists the network in denoising. In addition, a structural similarity attention mechanism (SAM) is proposed to guide the network to attend to fine structural details in images, so as to recover denoised images with cleaner texture details. The experimental results on the SIDD, DND, and Nam benchmarks indicate that the proposed self-supervised blind denoising algorithm is superior to some supervised deep-learning denoising methods and improves the generalization performance of the network significantly.
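
      A minimal numpy sketch of the anisotropic total variation term that such an edge-preserving constraint builds on; the adjustable smoothing weight corresponds to the parameters mentioned above, while its value here is illustrative:

        import numpy as np

        def tv_loss(img, weight=0.1):
            # Anisotropic total variation: sum of absolute differences
            # between horizontally and vertically adjacent pixels.
            dh = np.abs(img[:, 1:] - img[:, :-1]).sum()
            dv = np.abs(img[1:, :] - img[:-1, :]).sum()
            return weight * (dh + dv)

        denoised = np.random.default_rng(0).random((8, 8))   # stand-in output
        print(tv_loss(denoised))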

    • Research on Trusted Startup of Virtual Machine Based on Non-interference Theory

      2023, 34(6):2959-2978. DOI: 10.13328/j.cnki.jos.006507

      Abstract:As a new type of high-value computing system, cloud computing has been widely used in various industry fields, and Classified Protection of Cybersecurity 2.0 requires dynamic trust verification when active immune trusted computing technology is applied to it. In the cloud computing mode, the virtual machine is the direct carrier through which users consume cloud services, and its trusted startup is the basis for the trustworthiness of the virtual machine's operating environment. However, since a virtual machine runs on a physical node as a process, its startup process is highly dynamic and subject to unexpected interference between multiple virtual machine domains. Existing trusted startup schemes for virtual machines provide insufficient dynamic protection during startup and do not eliminate unexpected interference between virtual domains. To solve these problems, this study proposes a trusted startup scheme for virtual machines based on non-interference theory. First, based on non-interference theory, a run-time trust theorem for virtual machine processes is proposed; a definition of the trusted startup of a virtual machine is then given, and the corresponding judgment theorem is proved. Next, according to this theorem, monitoring and control logic is designed on the basis of system calls to actively measure and control the virtual machine startup process. Finally, experimental evaluation shows that the proposed scheme can effectively eliminate unexpected interference between virtual machines in a complex cloud environment, ensure the dynamic trustworthiness of the startup process, and greatly reduce the performance overhead.
