• Volume 36, Issue 5, 2025 Table of Contents
    • Compositional Probabilistic Barrier Certificate Generation Based on PAC Learning

      2025, 36(5):1907-1923. DOI: 10.13328/j.cnki.jos.007176 CSTR: 32375.14.jos.007176


      Abstract: Safety verification of continuous dynamical systems is an important research problem, yet existing verification methods have long been limited in the scale of problems they can handle. For a given continuous dynamical system, this study proposes an algorithm that generates a set of compositional probably approximately correct (PAC) barrier certificates through a counterexample-guided approach. A formal description of the infinite-time-domain safety verification problem is given in terms of probability and statistics. By formulating and solving a mixed-integer program based on the big-M method, the barrier certificate problem is transformed into a constrained optimization problem, and nonlinear inequalities are linearized over intervals using the mean value theorem. Finally, this study implements the compositional PAC barrier certificate generator CPBC and evaluates its performance on 11 benchmark systems. The experimental results show that CPBC successfully verifies the safety of each dynamical system under different specified safety thresholds. Compared with existing methods, the proposed method generates reliable probabilistic barrier certificates for complex or high-dimensional systems more efficiently, with verified examples reaching up to hundreds of dimensions.
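      The counterexample-guided, sampling-based flavor of PAC verification can be illustrated with a small check. The sketch below is a minimal illustration, not the paper's CPBC tool: the dynamics f, the candidate certificate B, the set samplers, and the standard PAC bound N >= ln(1/delta)/ln(1/(1-epsilon)) are all assumptions chosen for the example.

```python
# Minimal sketch: PAC-style sampling check of a candidate barrier certificate
# B(x) for dx/dt = f(x). Violations are returned as counterexamples that a
# learner could use to refine the candidate.
import math
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.array([-x[1], x[0] - x[1]])          # example dynamics (assumed)
B = lambda x: x @ x - 1.0                             # candidate certificate (assumed)
gradB = lambda x: 2.0 * x

def pac_sample_count(eps: float, delta: float) -> int:
    # If all N i.i.d. samples satisfy the conditions, the violation
    # probability is at most eps with confidence 1 - delta.
    return math.ceil(math.log(delta) / math.log(1.0 - eps))

def check(eps=0.01, delta=1e-6):
    n = pac_sample_count(eps, delta)
    counterexamples = []
    for _ in range(n):
        x0 = rng.uniform(-0.5, 0.5, size=2)           # sample of the initial set
        xd = rng.uniform(-2.0, 2.0, size=2)           # sample of the domain
        if B(x0) > 0:                                 # initial set must lie in {B <= 0}
            counterexamples.append(("init", x0))
        if gradB(xd) @ f(xd) > 0:                     # Lie-derivative condition
            counterexamples.append(("lie", xd))
    return counterexamples

print(pac_sample_count(0.01, 1e-6), "samples;", len(check()), "counterexamples")
```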

    • Software Traceability Recovery Framework Based on Active Learning and Semi-supervised Learning

      2025, 36(5):1924-1948. DOI: 10.13328/j.cnki.jos.007178 CSTR: 32375.14.jos.007178


      Abstract: Software traceability is considered critical to trustworthy software engineering, as it ensures software reliability by tracking the software development process. Despite significant progress in automatic software traceability recovery techniques in recent years, their application in real-world commercial software projects falls short of expectations. This study investigates the application of learning-based software traceability recovery classifier models in commercial software projects and uncovers three critical challenges in industrial settings that contribute to underperforming traceability models: low-quality raw data, data sparsity, and class imbalance. In response, STRACE(AL+SSL), a software traceability recovery framework that integrates active learning and semi-supervised learning, is proposed. By strategically selecting valuable samples for annotation and generating high-quality pseudo-labeled samples, STRACE(AL+SSL) effectively harnesses unlabeled data to address these data-related challenges. Multiple comparative experiments are conducted on nearly one million issue-commit trace pair samples from 10 different enterprise projects, and the results validate the effectiveness of the proposed framework for real-world software traceability recovery tasks. The ablation results show that the unlabeled samples selected by active learning in STRACE(AL+SSL) play a crucial role in the traceability recovery task. Additionally, the optimal combination of sample selection strategies in STRACE(AL+SSL) is confirmed: CBST-Adjust for semi-supervised sample rebalancing and SMI_Flqmi, recognized for its cost-effectiveness and efficiency, for active learning.
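      The AL+SSL interplay can be made concrete with a toy loop. This is a minimal sketch of the general idea, not the STRACE implementation: the feature arrays, the `annotate` oracle, and the confidence thresholds are illustrative assumptions.

```python
# Minimal sketch: active learning (annotate the most uncertain samples) plus
# semi-supervised learning (pseudo-label the most confident ones).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(40, 8))
y_lab = (X_lab[:, 0] > 0).astype(int)
X_unl = rng.normal(size=(1000, 8))                    # unlabeled trace pairs

def annotate(X):                                      # stand-in for a human annotator
    return (X[:, 0] > 0).astype(int)

for _ in range(5):
    if len(X_unl) == 0:
        break
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    proba = clf.predict_proba(X_unl)[:, 1]
    conf = np.abs(proba - 0.5)                        # distance from decision boundary
    query = np.argsort(conf)[:20]                     # AL: most uncertain -> annotate
    pseudo = np.setdiff1d(np.where(conf > 0.49)[0], query)  # SSL: most confident
    X_lab = np.vstack([X_lab, X_unl[query], X_unl[pseudo]])
    y_lab = np.concatenate([y_lab, annotate(X_unl[query]),
                            (proba[pseudo] > 0.5).astype(int)])
    keep = np.ones(len(X_unl), bool); keep[query] = False; keep[pseudo] = False
    X_unl = X_unl[keep]
print(len(X_lab), "labeled/pseudo-labeled samples")
```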

    • Web API Recommendation by Fusing Latent Related Words and Heterogeneous Association Compatibility

      2025, 36(5):1949-1973. DOI: 10.13328/j.cnki.jos.007185 CSTR: 32375.14.jos.007185


      Abstract: Service descriptions provide limited information about application scenarios, creating a gap between Web API recommendations for Mashup service components based on functional similarity calculation and the desired results; the accuracy of function matching therefore needs to be improved. While some researchers utilize collaborative associations among Web APIs to enhance recommendation compatibility, they overlook the adverse effects of functional associations on Mashup service creation, thereby limiting gains in recommendation diversity. To address this issue, this study proposes a Web API recommendation method for Mashup service components that fuses latent related words and heterogeneous association compatibility. The method extracts latent related words associated with application scenarios for both Mashup requirements and Web APIs and integrates them into the generation of function vectors. By improving the accuracy of functional similarity matching, it obtains a high-quality candidate set of Web API components. Function associations and collaboration associations are modeled as heterogeneous service associations, and heterogeneous association compatibility replaces the collaboration compatibility of traditional methods, thus enhancing the recommendation diversity of Web APIs. Compared with existing methods, the proposed approach improves Recall, Precision, and NDCG by 4.17% to 16.05%, 4.46% to 16.62%, and 5.57% to 17.26%, respectively, while reducing the diversity index ILS by 8.22% to 15.23%. The Recall and Precision values for cold-start Web API recommendation reach 47.71% and 46.58% of those for non-cold-start recommendation, respectively. Experimental results demonstrate that the proposed method not only enhances the quality of Web API recommendation but also yields favorable results for cold-start recommendations.
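      The "similar candidates first, then diversity-aware selection" pipeline can be sketched with an MMR-style greedy re-ranker, used here as a simple stand-in for the paper's heterogeneous association compatibility step; the embeddings are random placeholders.

```python
# Minimal sketch: retrieve a candidate set by functional similarity, then
# greedily re-rank to trade relevance against redundancy (MMR-style).
import numpy as np

rng = np.random.default_rng(0)
api_vecs = rng.normal(size=(200, 64))                 # Web API function vectors
api_vecs /= np.linalg.norm(api_vecs, axis=1, keepdims=True)
query = rng.normal(size=64); query /= np.linalg.norm(query)  # Mashup requirement

rel = api_vecs @ query                                # functional similarity
cand = np.argsort(-rel)[:30]                          # high-quality candidate set

selected, lam = [], 0.7                               # lam trades relevance vs diversity
while len(selected) < 10:
    best, best_score = None, -np.inf
    for i in cand:
        if i in selected:
            continue
        div = max((api_vecs[i] @ api_vecs[j] for j in selected), default=0.0)
        score = lam * rel[i] - (1 - lam) * div        # MMR objective
        if score > best_score:
            best, best_score = i, score
    selected.append(best)
print(selected)
```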

    • Measurement Study on Performance of Serverless Platforms

      2025, 36(5):1974-2005. DOI: 10.13328/j.cnki.jos.007191 CSTR: 32375.14.jos.007191


      Abstract: Serverless computing is an emerging cloud computing paradigm that allows developers to focus on application logic without managing complex underlying tasks, enabling them to quickly build applications at a finer granularity, namely the function level. With the increasing popularity of serverless computing, major cloud vendors have introduced commercial serverless platforms one after another. However, the characteristics of these platforms have yet to be systematically studied and reliably compared. A comprehensive analysis of these characteristics can help developers choose an appropriate serverless platform and develop and execute serverless applications in the right way. To this end, an empirical study is conducted on the characteristics of mainstream commercial serverless platforms, including AWS Lambda, Google Cloud Functions, Microsoft Azure Functions, and Alibaba Function Compute. The study consists of two major parts: feature summarization and runtime performance analysis. In the feature summarization, the official documents of these platforms are reviewed, and their key features are summarized and compared in terms of development, deployment, and runtime. In the runtime performance analysis, representative benchmarks are applied to analyze the runtime performance of these platforms along multiple dimensions. Specifically, key factors in the cold-start performance of applications, such as programming languages and memory sizes, are analyzed first, and the task-execution performance of the platforms is then discussed. Based on the results of both parts, this study sums up a series of findings and provides practical insights and potential research opportunities for developers, cloud vendors, and researchers.
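      One way to observe the cold-start effect the abstract mentions is simple client-side timing. This is a rough sketch under stated assumptions: FUNC_URL is a hypothetical HTTP-triggered function, the first call after a long idle period is treated as a cold start, and platform-reported initialization metrics would be more precise.

```python
# Minimal sketch: approximate cold-start vs. warm-start latency by timing
# HTTP invocations of a deployed function.
import statistics
import time

import requests

FUNC_URL = "https://example.com/my-function"          # hypothetical endpoint

def invoke_and_time():
    t0 = time.perf_counter()
    r = requests.get(FUNC_URL, timeout=60)
    r.raise_for_status()
    return time.perf_counter() - t0

# First call after a long idle period approximates a cold start;
# immediately repeated calls approximate warm starts.
cold = invoke_and_time()
warm = [invoke_and_time() for _ in range(10)]
print(f"cold ~ {cold:.3f}s, warm median ~ {statistics.median(warm):.3f}s")
```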

    • Multi-granularity Fusion Android Test Sequence Reduction Based on Event Labeling

      2025, 36(5):2006-2025. DOI: 10.13328/j.cnki.jos.007203 CSTR: 32375.14.jos.007203


      Abstract: Crash test sequences generated by Android automated testing tools often contain many redundant events, which makes test replay, defect comprehension, and repair difficult; consequently, a great number of test sequence reduction works have been proposed. However, existing works focus only on application interface changes and ignore internal state changes during program execution. Moreover, they model application states at a single abstract granularity, such as control-widget granularity or activity granularity, resulting in long reduced sequences or inefficient reduction. This study proposes an Android test sequence reduction method that fuses multiple granularities based on event labeling. By leveraging the Android lifecycle management mechanism and data flow analysis to label the critical events that trigger crashes, the method narrows the sequence reduction space, and it adopts a strategy of rough selection at low granularity followed by detailed reduction at high granularity. Finally, a crash test sequence set covering complex scenarios such as inter-application interaction and user input is collected, and comparisons with other test sequence reduction works on this set verify the effectiveness of the proposed method.
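      The coarse-to-fine idea can be sketched with a generic delta-debugging-style reducer that protects labeled critical events. This is illustrative only: `replays_crash` is a hypothetical replay oracle, and the paper's labeling comes from lifecycle and data-flow analysis rather than the hard-coded set used here.

```python
# Minimal sketch: drop chunks of non-critical events first (low granularity),
# then try dropping single events (high granularity), keeping the crash alive.
def replays_crash(seq):
    # stand-in oracle: this fake crash needs events 3 and 7, in order
    return 3 in seq and 7 in seq and seq.index(3) < seq.index(7)

def reduce(seq, critical):
    kept = list(seq)
    for size in (8, 4, 2):                            # coarse passes
        i = 0
        while i < len(kept):
            chunk = kept[i:i + size]
            if all(e not in critical for e in chunk):
                trial = kept[:i] + kept[i + size:]
                if replays_crash(trial):
                    kept = trial
                    continue
            i += size
    for e in list(kept):                              # fine pass
        trial = [x for x in kept if x != e]
        if replays_crash(trial):
            kept = trial
    return kept

print(reduce(list(range(12)), critical={3, 7}))       # -> [3, 7]
```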

    • Test Case Generation for Path Coverage of MPI Program Guided by Surrogate-assisted Multi-task Evolutionary Optimization

      2025, 36(5):2026-2042. DOI: 10.13328/j.cnki.jos.007204 CSTR: 32375.14.jos.007204


      Abstract: In path coverage testing of a message passing interface (MPI) program based on evolutionary optimization, the fitness of evolutionary individuals needs to be evaluated by repeatedly executing the MPI program, which often incurs high computational costs. Therefore, this study proposes an approach that generates test cases for path coverage of MPI programs guided by surrogate-assisted multi-task evolutionary optimization, significantly reducing the number of actual executions of the MPI program and thereby improving testing efficiency. Firstly, a surrogate model is trained for each target sub-path in the target path of an MPI program. Then, the fitness of evolutionary individuals is estimated using the surrogate model corresponding to each target sub-path, and a candidate set of test cases is formed. Finally, all surrogate models are updated based on the candidate set and the actual fitness for each target sub-path. The proposed approach is applied to basis path coverage testing of seven benchmark MPI programs and compared with several state-of-the-art approaches. The experimental results show that the proposed approach significantly improves testing efficiency while remaining highly effective in generating test cases.
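      The estimate-then-execute loop can be sketched compactly. This is a minimal illustration, not the paper's algorithm: the "program execution" is replaced by a toy distance-based fitness, and the per-sub-path surrogates are plain k-NN regressors.

```python
# Minimal sketch: one surrogate per target sub-path estimates fitness; only
# the most promising offspring are actually "executed", then surrogates are
# retrained on the enlarged archive.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

def true_fitness(x):  # stand-in for executing the MPI program on input x
    targets = [np.array([3.0, -2.0]), np.array([-1.0, 4.0])]   # two "sub-paths"
    return [float(np.linalg.norm(x - t)) for t in targets]     # lower is better

pop = rng.uniform(-5, 5, size=(30, 2))
fits = np.array([true_fitness(x) for x in pop])
surrogates = [KNeighborsRegressor(5).fit(pop, fits[:, j]) for j in range(2)]

for _ in range(20):
    children = pop[rng.integers(0, len(pop), 60)] + rng.normal(0, 0.5, (60, 2))
    est = np.column_stack([s.predict(children) for s in surrogates])
    cand = children[np.argsort(est.sum(1))[:10]]      # cheap surrogate screening
    cand_fit = np.array([true_fitness(x) for x in cand])  # few real executions
    pop = np.vstack([pop, cand]); fits = np.vstack([fits, cand_fit])
    best = np.argsort(fits.sum(1))[:30]; pop, fits = pop[best], fits[best]
    surrogates = [KNeighborsRegressor(5).fit(pop, fits[:, j]) for j in range(2)]
print(fits.min(0))                                    # best per-sub-path distances
```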

    • Interaction Prediction of Multi-granularity Software System Based on Graph Neural Network

      2025, 36(5):2043-2063. DOI: 10.13328/j.cnki.jos.007207 CSTR: 32375.14.jos.007207


      Abstract: The interactions between elements in contemporary software systems are notably intricate, encompassing relationships between packages, classes, and functions. Accurate comprehension of these relationships is pivotal for optimizing system structures and enhancing software quality. Analyzing inter-package relationships can unveil dependencies between modules, helping developers manage and organize software architectures more effectively. A clear understanding of inter-class relationships contributes to building code repositories that are more scalable and maintainable, while a clear understanding of inter-function relationships facilitates rapid identification and resolution of logical errors, enhancing the robustness and reliability of the software. However, current prediction of software system interactions confronts challenges such as granularity disparities, inadequate features, and version changes. To address these challenges, this study constructs software network models at three granularities: software packages, classes, and functions. It introduces a novel approach combining local and global features to reinforce the analysis and prediction of software systems through feature extraction and link prediction on software networks. The approach leverages the node2vec method to learn local features of software networks and combines Laplacian eigenvector encodings to represent the global positional information of nodes. Subsequently, a Graph Transformer model is employed to further optimize the node attribute feature vectors, culminating in the interaction prediction task. Extensive experimental validations are conducted on three Java open-source projects, encompassing within-version and cross-version interaction prediction tasks. The experimental results demonstrate that, compared to benchmark methods, the proposed approach achieves average increases of 8.2% and 8.5% in AUC and AP values, respectively, in within-version prediction tasks, and average increases of 3.5% and 2.4% in AUC and AP values, respectively, in cross-version prediction tasks.
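      The local-plus-global feature construction can be shown in a few lines. This sketch computes Laplacian eigenvector positional encodings on a toy undirected network and concatenates them with placeholder local features (the paper uses node2vec embeddings; random vectors stand in for them here).

```python
# Minimal sketch: Laplacian positional encoding concatenated with per-node
# local features, forming the input to a downstream link predictor.
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = (rng.random((n, n)) < 0.08).astype(float)
A = np.triu(A, 1); A = A + A.T                        # undirected software network

D = np.diag(A.sum(1))
L = D - A                                             # unnormalized graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)
k = 8
lap_pe = eigvecs[:, 1:k + 1]                          # skip the trivial eigenvector
# (note: eigenvector signs are arbitrary; sign flipping is a common augmentation)

local = rng.normal(size=(n, 32))                      # stand-in for node2vec output
node_feats = np.hstack([local, lap_pe])               # input to the Graph Transformer
print(node_feats.shape)                               # (50, 40)
```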

    • Gaussian Mixture Based Hierarchical Auto-encoder Model for Sentiment Drift Detection

      2025, 36(5):2064-2078. DOI: 10.13328/j.cnki.jos.007180 CSTR: 32375.14.jos.007180


      Abstract: The most significant feature of social network sentiment data is its dynamic nature. To tackle public sentiment drift analysis, this study proposes a Gaussian mixture based hierarchical variational auto-encoder (GHVAE) model for detecting sentiment drifts. Specifically, the GHVAE adopts a Gaussian mixture distribution as the prior of the latent distribution, which matches the multi-center property of the latent space and improves model performance. Moreover, the built-in drift measurement algorithm of the original HVAE model is revised to enlarge the distances among large drift scores and improve classification performance. Several comparison and ablation experiments are conducted to validate the performance of the GHVAE. The results indicate that the novel components of the GHVAE improve sentiment drift detection.
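      The Gaussian-mixture-in-latent-space idea can be illustrated with a simple drift score. This is a minimal sketch, not the GHVAE: random arrays stand in for latent codes from two time windows, and the score is just the mean log-likelihood drop under a mixture fitted to the earlier window.

```python
# Minimal sketch: fit a Gaussian mixture to earlier-window latents and score
# a later window by its likelihood drop; a large drop suggests drift.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
z_old = rng.normal(0, 1, size=(500, 16))              # earlier-window latents
z_new = rng.normal(1.5, 1, size=(500, 16))            # later-window latents (shifted)

gmm = GaussianMixture(n_components=3, random_state=0).fit(z_old)
drift_score = gmm.score(z_old) - gmm.score(z_new)     # mean log-likelihood drop
print(f"drift score: {drift_score:.2f}")              # large value -> sentiment drift
```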

    • Group-wise Contrastive Learning Based Sequence-aware Skill Discovery

      2025, 36(5):2079-2093. DOI: 10.13328/j.cnki.jos.007184 CSTR: 32375.14.jos.007184


      Abstract:Reinforcement learning has achieved remarkable results in decision-making tasks like intelligent dialogue systems, yet its efficiency diminishes notably in scenarios with intricate structures and scarce rewards. Researchers have integrated the skill discovery framework into reinforcement learning, aiming to maximize skill disparities to establish policies and boost agent performance in such tasks. However, the constraint posed by the limited diversity of sampled trajectory data confines existing skill discovery methods to learning a single skill per reinforcement learning episode. Consequently, this limitation results in subpar performance in complex tasks requiring sequential skill combinations within a single episode. To address this challenge, a group-wise contrastive learning based sequence-aware skill discovery method (GCSSD) is proposed, which integrates contrastive learning into the skill discovery framework. Initially, to augment trajectory data diversity, the complete trajectories interacting with the environment are segmented and grouped, employing contrastive loss to learn skill embedding representations from grouped trajectories. Subsequently, skill policy training is conducted by combining the skill embedding representation with reinforcement learning. Lastly, to enhance performance in tasks featuring diverse sequential skill combinations, the sampled trajectories are segmented into skill representations and embedded into the learned policy network, facilitating the sequential combination of learned skill policies. Experimental results demonstrate the efficacy of the GCSSD method in tasks characterized by sparse rewards and sequential skill combinations, showcasing its capability to swiftly adapt to tasks with varying sequential skill combinations using learned skills.
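      The group-wise contrastive objective can be grounded with a small InfoNCE computation. This is a minimal sketch under stated assumptions, not the GCSSD implementation: random vectors stand in for a segment encoder's output, and segments from the same trajectory group are treated as positives.

```python
# Minimal sketch: group-wise InfoNCE over trajectory-segment embeddings.
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(24, 32))                       # 24 trajectory segments
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
groups = np.repeat(np.arange(6), 4)                   # 6 groups of 4 segments
tau = 0.1                                             # temperature

sim = emb @ emb.T / tau
losses = []
for i in range(len(emb)):
    mask = np.ones(len(emb), bool); mask[i] = False   # exclude the anchor itself
    logits = sim[i][mask]
    pos = (groups[mask] == groups[i])                 # same-group = positive
    log_den = np.log(np.exp(logits).sum())
    losses.append(np.mean(log_den - logits[pos]))     # average -log softmax over positives
print(f"group-wise contrastive loss: {np.mean(losses):.3f}")
```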

    • Multi-modal Self-supervised Point Cloud Representation Learning Based on Bidirectional Fit Mask Reconstruction

      2025, 36(5):2094-2113. DOI: 10.13328/j.cnki.jos.007187 CSTR: 32375.14.jos.007187


      Abstract:Point cloud self-supervised representation learning is conducted in an unlabeled pre-training manner, exploring the structural relationships of 3D topological geometric spaces and capturing feature representations. This approach can be applied to downstream tasks, such as point cloud classification, segmentation, and object detection. To enhance the generalization and robustness of the pretrained models, this study proposes a multi-modal self-supervised method for learning point cloud representations. The method is based on bidirectional fit mask reconstruction and comprises three main components: (1) The “bad teacher” model, guided by the inverse density scale, employs a bidirectional fit strategy that utilizes inverse density noise representation and global feature representation to expedite the convergence of the mask region towards the true value. (2) The StyleGAN-based auxiliary point cloud generation model, grounded in local geometric information, generates stylized point clouds and fuses them with mask reconstruction results while adhering to threshold constraints. The objective is to mitigate the adverse effects of noise on representation learning during the reconstruction process. (3) The multi-modal teacher model aims to enhance the diversity of the 3D feature space and prevent the collapse of modal information. It relies on the triple feature contrast loss function to fully extract the latent information contained in the point cloud-image-text sample space. The proposed method is evaluated on ModelNet, ScanObjectNN, and ShapeNet datasets for fine-tuning tasks. Experimental results demonstrate that the pretrained model achieves state-of-the-art performance in various point cloud recognition tasks, including point cloud classification, linear support vector machine classification, few-shot classification, zero-shot classification, and part segmentation.

    • Label Screening Method for Generalization and Robustness Trade-off in Convolutional Neural Network

      2025, 36(5):2114-2129. DOI: 10.13328/j.cnki.jos.007188 CSTR: 32375.14.jos.007188


      Abstract: Although convolutional neural networks (CNNs) are widely used in image recognition due to their excellent generalization performance, adversarial samples contaminated by noise can easily deceive fully trained network models, posing security risks. Many existing defense methods improve model robustness, but most inevitably sacrifice generalization. To alleviate this issue, a label-filtered weight parameter regularization method is proposed, which uses the label information of samples during training to balance the generalization and robustness of models. Previous robust training methods suffer from two main issues: 1) robustness is mainly enhanced by increasing the quantity or complexity of training samples, which not only diminishes the dominant role of clean samples in training but also significantly increases the training workload; 2) the label information of samples is used only for comparison with model predictions to control the direction of parameter updates, neglecting the additional information hidden in the labels. The proposed method selects the weight parameters that play a decisive role in classifying samples by filtering on the correct labels of samples and the classification labels of adversarial samples, and it regularizes these parameters to achieve a balance between model generalization and robustness. Experiments and analysis on the MNIST, CIFAR-10, and CIFAR-100 datasets demonstrate that the proposed method achieves good training results.

    • Denoising Graph Auto-encoder for Unsupervised Social Media Text Summarization

      2025, 36(5):2130-2150. DOI: 10.13328/j.cnki.jos.007199 CSTR: 32375.14.jos.007199


      Abstract: Social media text summarization aims to provide concise summaries for large-scale social media short texts (referred to as posts) on specific topics. Given the brief and informal content of posts, traditional methods confront the challenges of sparse features and insufficient information. Recent research has leveraged social relationships among posts to refine post contents and remove redundant information, but it neglects the unreliable noise relationships present in real social media contexts, leading to erroneous assessments of post importance and diversity. Therefore, this study proposes a novel unsupervised model, DSNSum, which improves summarization performance by removing noise relationships from the social network. Firstly, the noise relationships in real social relationship networks are statistically verified. Secondly, two noise functions are designed based on sociological theories, and a denoising graph auto-encoder (DGAE) is constructed to mitigate the influence of noise relationships and refine post contents based on credible social relationships. Finally, a sparse reconstruction framework is utilized to select posts that maintain coverage, importance, and diversity to form a summary of a given length. Experimental results on 22 topics from two real social media platforms (Twitter and Sina Weibo) demonstrate the efficacy of the proposed model and provide new insights for subsequent research in related fields.

    • Parallel Multi-scale Spatio-temporal Graph Convolutional Network for 3D Human Pose Estimation

      2025, 36(5):2151-2166. DOI: 10.13328/j.cnki.jos.007200 CSTR: 32375.14.jos.007200


      Abstract: As human pose estimation (HPE) methods based on graph convolutional networks (GCNs) cannot sufficiently aggregate the spatiotemporal features of skeleton joints, which restricts discriminative feature extraction, this study builds a parallel multi-scale spatio-temporal graph convolutional network (PMST-GNet) model to improve the performance of 3D HPE. Firstly, a diagonally dominant spatiotemporal attention graph convolutional layer (DDA-STGConv) is designed to construct a cross-domain spatiotemporal adjacency matrix and model joint features based on self-constraint and attention-mechanism constraints, thereby enhancing information interaction among nodes. Then, a graph topology aggregation function is devised to construct different graph topologies, and a parallel multi-scale sub-graph network module (PM-SubGNet) is constructed with DDA-STGConv as its basic unit. Finally, a multi-scale feature cross-fusion block (MFEB) is designed, through which multi-scale information among PM-SubGNets can interact to improve the feature representation of the GCN and thus better extract the context information of skeleton joints. Experimental results on the mainstream 3D HPE datasets Human3.6M and MPI-INF-3DHP show that the proposed PMST-GNet model performs well in 3D HPE and is superior to current mainstream GCN-based algorithms such as Sem-GCN, GraphSH, and UGCN.

    • Text and Table Numerical Question-answering Model Based on Multi-granularity Cell Contrast

      2025, 36(5):2167-2187. DOI: 10.13328/j.cnki.jos.007206 CSTR: 32375.14.jos.007206


      Abstract: In the task of numerical question-answering over texts and tables, models are required to perform numerical reasoning based on the given texts and tables, generating a computational program of multi-step numerical calculations whose result serves as the answer to the question. To model the texts and tables, current work linearizes the table into a series of cell sentences through templates and then designs a generator based on the texts and cell sentences to produce the computational program. However, this approach faces a specific problem: the differences between template-generated cell sentences are minimal, making it difficult for the generator to distinguish cell sentences essential for answering the question (supporting cell sentences) from those irrelevant to it (distracting cell sentences); as a result, the model generates incorrect computational programs based on distracting cell sentences. To tackle this issue, this study proposes a multi-granularity cell semantic contrast (MGCC) approach for the generator. Its main purpose is to enlarge the representation distances between supporting and distracting cell sentences, thereby helping the generator differentiate between them. Specifically, the contrast mechanism is composed of a coarse-grained cell semantic contrast and fine-grained contrasts over constituent elements, including row names, column names, and cell values. The experimental results validate that the proposed MGCC approach enables the generator to outperform the benchmark model on the FinQA and MultiHiertt numerical reasoning datasets. On the FinQA dataset, it improves answer accuracy by up to 3.38%; notably, on the more challenging hierarchical table dataset MultiHiertt, it yields a 7.8% increase in generator accuracy. Compared with GPT-3 combined with chain of thought (CoT), MGCC achieves improvements of 5.44% and 1.69% on the FinQA and MultiHiertt datasets, respectively. Subsequent analytical experiments further verify that the multi-granularity cell semantic contrast contributes to the model's improved differentiation between supporting and distracting cell sentences.
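      The core contrastive intuition, pushing supporting cells toward the question representation and distracting cells away, can be shown with a margin loss. This is a simplified stand-in for MGCC's multi-granularity contrasts; the vectors are random placeholders for encoder outputs.

```python
# Minimal sketch: hinge loss requiring every supporting cell sentence to be
# closer to the question than every distracting one, by a margin.
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=64); q /= np.linalg.norm(q)       # question representation
cells = rng.normal(size=(12, 64))
cells /= np.linalg.norm(cells, axis=1, keepdims=True)
support = np.zeros(12, bool); support[:3] = True      # 3 supporting, 9 distracting

sim = cells @ q                                       # cosine similarity to question
margin = 0.4
gaps = margin - (sim[support][:, None] - sim[~support][None, :])
loss = np.maximum(gaps, 0).mean()                     # all supporting/distracting pairs
print(f"cell contrast loss: {loss:.3f}")
```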

    • Framework of Two-party Threshold Schemes for SM2 Digital Signature Algorithms

      2025, 36(5):2188-2211. DOI: 10.13328/j.cnki.jos.006978 CSTR: 32375.14.jos.006978


      Abstract: Many two-party threshold schemes for the SM2 digital signature algorithm have been proposed in recent years; such schemes can significantly enhance the security of the private key for SM2 digital signatures. According to the method of key splitting, public schemes can be divided into two types: multiplicative key splitting and additive key splitting. These schemes can be further subdivided according to the construction of the signature random number. This study proposes a framework of two-party threshold schemes for the SM2 digital signature algorithm, which provides a secure basic computation procedure for two-party threshold schemes and admits various constructions of the signature random number. Instantiating the framework with these constructions yields a variety of two-party threshold schemes for SM2, covering 23 known schemes as well as a variety of new ones.
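      For reference, the SM2 signing equations and the two splitting types named above can be written out. The concrete splits shown are common forms from the published schemes, given here as background notation rather than the paper's full framework or its random-number constructions.

```latex
% SM2 signing of digest e, base point G of prime order n, private key d_A:
% choose k uniformly from [1, n-1], compute (x_1, y_1) = [k]G, then
\begin{align*}
  r &= (e + x_1) \bmod n, \\
  s &= \left( (1 + d_A)^{-1} \, (k - r \, d_A) \right) \bmod n .
\end{align*}
% Two-party key splitting (one common form of each type; each party holds
% only its share d_i, and (r, s) is computed jointly so that d_A is never
% reconstructed):
%   multiplicative:  (1 + d_A)^{-1} \equiv d_1 d_2 \pmod{n}
%   additive:        d_A \equiv d_1 + d_2 \pmod{n}
```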

    • Data Poisoning Attacks and Defense Methods for Frequency Estimation in Local Differential Privacy

      2025, 36(5):2212-2228. DOI: 10.13328/j.cnki.jos.007179 CSTR: 32375.14.jos.007179


      Abstract: Local differential privacy (LDP) is widely used to collect and analyze sensitive data while protecting user privacy. However, it is vulnerable to data poisoning attacks by malicious users. The k-subset mechanism and the wheel mechanism are LDP schemes with optimal utility for frequency estimation, yet their resistance to data poisoning attacks has not been analyzed and evaluated in depth. Therefore, data poisoning attack methods are designed to assess the resistance of the k-subset and wheel mechanisms to such attacks. First, the random perturbed-value attack and the random item attack are discussed, and maximal-gain attack methods against the k-subset and wheel mechanisms are then constructed. These attacks maximize the frequencies of attacker-selected target items by sending carefully crafted poisoning data to the data collector through fake users. The attack gains are rigorously analyzed and compared in theory, and the effects of the data poisoning attacks on the k-subset and wheel mechanisms are evaluated experimentally. Finally, defensive measures are proposed to mitigate the effects of data poisoning attacks.
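      The attack surface is easiest to see on a concrete frequency estimator. The sketch below uses generalized randomized response (GRR) as a simpler stand-in for the k-subset and wheel mechanisms analyzed in the paper: fake users skip the perturbation entirely and all report the target item, inflating its unbiased estimate.

```python
# Minimal sketch: poisoning an LDP frequency estimate (GRR) with fake users.
import numpy as np

rng = np.random.default_rng(0)
d, eps, n_real, n_fake, target = 16, 2.0, 10000, 500, 7
p = np.exp(eps) / (np.exp(eps) + d - 1)               # prob. of keeping own item
q = 1.0 / (np.exp(eps) + d - 1)                       # prob. of any other item

def grr(v):
    if rng.random() < p:
        return v
    return int(rng.choice([x for x in range(d) if x != v]))

real = [grr(int(v)) for v in rng.integers(0, d, n_real)]  # honest perturbed reports
fake = [target] * n_fake                              # fake users: maximal gain
counts = np.bincount(real + fake, minlength=d)
n = n_real + n_fake
f_hat = (counts / n - q) / (p - q)                    # unbiased GRR estimator
print(f"estimated freq of target item: {f_hat[target]:.3f} (true ~ {1/d:.3f})")
```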

    • Review Articles
    • Survey on Acoustic-sensing-based Authentication on Mobile Devices

      2025, 36(5):2229-2253. DOI: 10.13328/j.cnki.jos.007181 CSTR: 32375.14.jos.007181


      Abstract: With the popularity of mobile devices and users' growing requirements for privacy protection, user authentication on mobile devices has attracted widespread attention. Recently, the audio infrastructure of mobile devices has provided greater flexibility and scalability for designing novel user authentication schemes with excellent performance. After surveying a large number of related works, this study first classifies acoustic sensing-based user authentication schemes on mobile devices according to their authentication metrics and sensing methods and describes the corresponding attack models. It then analyzes and compares user authentication schemes based on a single authentication metric and those based on acoustic sensing. Finally, in light of the problems in existing works, this study gives two metrics (security and practicability) for measuring the performance of user authentication systems and discusses future research directions.

    • Reward Fraud Attack and Defense for Federated Learning Based on Gradient Scale-up

      2025, 36(5):2254-2269. DOI: 10.13328/j.cnki.jos.007186 CSTR: 32375.14.jos.007186


      Abstract: In federated learning, incentive mechanisms play a crucial role in enticing high-quality data contributors to participate and acquire superior models. However, existing research often neglects the potential misuse of these mechanisms: participants may manipulate their locally trained models to dishonestly maximize their rewards. This issue is thoroughly examined in this study. Firstly, the problem of reward fraud in federated learning is defined, and the concept of the reward-cost ratio is introduced to assess the effectiveness of various reward fraud techniques and defense mechanisms. Secondly, an attack named the "gradient scale-up attack" is proposed, which manipulates model gradients to exploit the incentive system: it computes corresponding scaling factors and uses them to inflate the contribution of the local model so as to gain more rewards. Finally, an efficient defense mechanism is proposed that identifies malicious participants by examining the L2-norms of model updates, effectively thwarting gradient scale-up attacks. Extensive analysis and experimental validation on datasets such as MNIST demonstrate that the proposed attack significantly increases rewards, while the corresponding defense effectively mitigates fraudulent behavior.
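      The attack-defense pair reduces to a few lines of linear algebra. This is an illustrative sketch, not the paper's exact contribution metric or protocol: an attacker scales its update by a large factor, and a simple median-based L2-norm filter screens it out.

```python
# Minimal sketch: gradient scale-up attack and an L2-norm screening defense.
import numpy as np

rng = np.random.default_rng(0)
honest = [rng.normal(0, 0.1, 100) for _ in range(9)]  # honest model updates
attacker = honest[0] * 10.0                           # scaled-up malicious update
updates = honest + [attacker]

norms = np.array([np.linalg.norm(u) for u in updates])
med = np.median(norms)
accepted = [u for u, m in zip(updates, norms) if m <= 3 * med]  # norm filter
print(f"norms: {norms.round(2)}; accepted {len(accepted)} of {len(updates)}")
```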

    • DEFAULT Lightweight Cryptosystem Against Statistical Fault Analysis Based on Algebraic Relationship

      2025, 36(5):2270-2287. DOI: 10.13328/j.cnki.jos.007210 CSTR: 32375.14.jos.007210


      Abstract: DEFAULT, a new lightweight cryptosystem presented at Asiacrypt 2021, is designed to protect the information security of Internet of Things (IoT) devices such as microchips, microcontrollers, and sensors. Under the ciphertext-only attack assumption, this study proposes a statistical fault analysis of the DEFAULT cipher that exploits algebraic relationships. The analysis uses a random nibble-oriented fault model and not only combines the statistical distributions of the intermediate states before and after fault injection but also takes advantage of the algebraic relationship and novel distinguishers, including the Anderson-Darling test with square Euclidean imbalance (AD-SEI), with maximum likelihood estimation (AD-MLE), and with Hamming weight (AD-HW). The analysis requires at least 1344 faults to recover the 128-bit secret key of DEFAULT with 99% reliability. The theoretical analysis and experimental results show that the DEFAULT lightweight cryptosystem is not resistant to statistical fault attacks based on algebraic relationships. This study provides an important reference for the security analysis of other lightweight cryptosystems.
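      The statistical-distinguisher idea can be illustrated with a plain squared Euclidean imbalance (SEI) score on nibble distributions. The fault model, bias, and data below are illustrative assumptions, not DEFAULT's actual fault propagation: the point is only that a correct key guess yields a biased recovered nibble while wrong guesses look uniform.

```python
# Minimal sketch: SEI of an observed nibble distribution vs. uniform.
import numpy as np

rng = np.random.default_rng(0)

def sei(samples, n_vals=16):
    p_hat = np.bincount(samples, minlength=n_vals) / len(samples)
    return float(np.sum((p_hat - 1.0 / n_vals) ** 2))

# 1344 samples, matching the fault count cited in the abstract
biased = rng.choice(16, size=1344, p=np.array([4] + [12 / 15] * 15) / 16)
uniform = rng.integers(0, 16, size=1344)
# Correct key guess -> fault-biased intermediate nibble -> large SEI;
# wrong guesses -> near-uniform -> small SEI.
print(f"SEI(correct guess) = {sei(biased):.4f}, SEI(wrong guess) = {sei(uniform):.4f}")
```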

    • Malicious Node Detection Based on Semi-supervised and Self-supervised Graph Representation Learning

      2025, 36(5):2288-2307. DOI: 10.13328/j.cnki.jos.007211 CSTR: 32375.14.jos.007211


      Abstract: In real-world scenarios, rich interaction relationships often exist among users on platforms such as e-commerce, consumer review, and social networking sites. Constructing these relationships into a graph and applying graph neural networks (GNNs) for malicious user detection has become a research trend in recent years. However, due to the small proportion of malicious users, their disguises, and high labeling costs, traditional GNN methods are limited by data imbalance, data inconsistency, and label scarcity. This study proposes a semi-supervised graph representation learning-based method for detecting malicious nodes, which improves GNN-based node representation learning and classification. Specifically, a class-aware malicious node detection (CAMD) method is constructed, which introduces a class-aware attention mechanism, GNN encoders for inconsistent data, and a class-aware imbalance loss function to address data inconsistency and imbalance. Furthermore, to address the limitation of CAMD in detecting malicious nodes when labels are scarce, a graph contrastive learning-based method, CAMD+, is proposed. CAMD+ introduces data augmentation, self-supervised graph contrastive learning, and class-aware graph contrastive learning to enable the model to learn from unlabeled data and fully utilize the scarce label information. Finally, extensive experimental results on real-world datasets verify that the proposed methods outperform all baseline methods and achieve good detection performance under different degrees of label scarcity.
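      The class-imbalance ingredient is easy to make concrete. The sketch below shows a generic inverse-frequency weighted cross-entropy, an illustrative stand-in for CAMD's class-aware imbalance loss; labels and predicted probabilities are synthetic.

```python
# Minimal sketch: class-weighted cross-entropy for imbalanced node labels,
# giving minority (malicious) nodes larger weights.
import numpy as np

rng = np.random.default_rng(0)
labels = (rng.random(1000) < 0.05).astype(int)        # ~5% malicious nodes
probs = np.clip(rng.random(1000), 1e-7, 1 - 1e-7)     # predicted P(malicious)

counts = np.bincount(labels, minlength=2)
w = len(labels) / (2.0 * counts)                      # inverse-frequency weights
sample_w = w[labels]
ce = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
print(f"weighted loss: {np.mean(sample_w * ce):.3f}  (class weights: {w.round(2)})")
```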

    • Degree Corrected General Stochastic Block Model for Community Detection in Attributed Network

      2025, 36(5):2308-2320. DOI: 10.13328/j.cnki.jos.007197 CSTR: 32375.14.jos.007197


      Abstract: Stochastic block models can fit the generation of various networks and mine the implicit structures and potential connections within them, giving them significant advantages in community detection. General stochastic block (GSB) models discover general communities based on link communities, but they are only applicable to directed, non-attributed networks. This study proposes a degree-corrected general stochastic block (DCGSB) model for undirected attributed networks, which models both network topology information and node attributes. The DCGSB model assumes that the generation of both network topology and node attributes follows power-function-style distributions, and it introduces node degrees to characterize the scale-free property of networks, allowing the model to better fit the generation of real networks. The expectation-maximization algorithm is employed to estimate the model parameters, and node-community memberships are obtained by hard partitioning to complete community detection. Experiments are conducted on three real attributed network datasets with different network structures, comparing the proposed model with ten existing community detection algorithms. Results show that the DCGSB model not only inherits the advantages of GSB models in identifying general communities but also outperforms the ten algorithms in community detection, owing to the introduction of attribute information and node degrees.

    • PLTree: High-performance Learning Index for Persistent Memory

      2025, 36(5):2321-2341. DOI: 10.13328/j.cnki.jos.007198 CSTR: 32375.14.jos.007198


      Abstract: Persistent memory (PM), serving as a supplement to and potential replacement for main memory, offers a lower cost for data storage while ensuring data persistence. However, traditional PM-tailored index structures such as B+ trees fail to fully exploit the distribution characteristics of data to optimize read and write performance on PM. Recent research has sought to enhance read and write performance on PM and support index persistence through the data distribution awareness of learned indexes. Nonetheless, existing persistent learned index designs incur additional PM accesses and perform poorly on real-world data. To address the performance degradation of persistent learned indexes under real data distributions, this study proposes PLTree, a learned index on a DRAM/PM hybrid architecture. PLTree optimizes read and write performance under real data distributions through three approaches: (1) a two-stage index construction method that eliminates last-mile search in internal nodes and reduces PM accesses; (2) model-based search for efficient queries on PM, accelerated by metadata in DRAM; and (3) a log-based hierarchical overflow buffer structure tailored to PM characteristics to optimize write performance. The results show that, compared with existing persistent memory indexes (APEX, FPTree, uTree, NBTree, and DPTree), PLTree achieves significantly better index construction performance (1.9x to 34x) across various datasets. In single-threaded scenarios, PLTree improves query and insertion performance by 1.26x to 4.45x and 2.63x to 6.83x on average, respectively. In multi-threaded scenarios, PLTree surpasses the baselines by up to 10.2x and 23.7x in query and insertion performance, respectively.
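      The "model-based search" idea that PLTree builds on can be shown in miniature: a model predicts a key's position in a sorted array, and a search bounded by the model's worst-case error finishes the lookup. This sketch illustrates the general learned-index technique with a single linear model, not PLTree's PM-specific layout.

```python
# Minimal sketch: learned-index lookup = model prediction + bounded search.
import bisect

import numpy as np

rng = np.random.default_rng(0)
keys = np.sort(rng.pareto(2.0, 100000))               # skewed, sorted key set

pos = np.arange(len(keys))
a, b = np.polyfit(keys, pos, 1)                       # train: position ~ a*key + b
err = pos - (a * keys + b)
lo_err, hi_err = int(err.min()) - 1, int(err.max()) + 1  # worst-case error bounds

def lookup(k):
    guess = int(a * k + b)
    lo = max(0, guess + lo_err); hi = min(len(keys), guess + hi_err + 1)
    i = lo + bisect.bisect_left(keys[lo:hi].tolist(), k)  # last-mile search
    return i if i < len(keys) and keys[i] == k else -1

probe = keys[12345]
print(lookup(probe) == 12345, hi_err - lo_err)        # found; search window size
```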

    • Clustering Frequent Pattern Mining for Time-ordered Transaction Data

      2025, 36(5):2342-2361. DOI: 10.13328/j.cnki.jos.007209 CSTR: 32375.14.jos.007209


      Abstract: This study addresses, for the first time, the problem of mining cluster frequent patterns in time-ordered transaction data. To eliminate the redundant operations of the Naive algorithm for this problem, the improved cluster frequent pattern mining (ICFPM) algorithm is proposed. The algorithm uses two optimization strategies: it uses the defined parameter minCF to effectively reduce the search space of mining results, and it refers to the discrimination results of (n-1)-itemsets to accelerate the discrimination of cluster frequent n-itemsets. The algorithm also applies the ICFPM-list structure to reduce the overhead of constructing candidate n-itemsets. Simulation experiments on two real-world datasets demonstrate the effectiveness of the ICFPM algorithm. Compared with the Naive algorithm, ICFPM improves substantially in both time and space efficiency, making it an effective method for cluster frequent pattern mining.

    • Low-latency Microkernel IPC Design for SPARC Architecture

      2025, 36(5):2362-2380. DOI: 10.13328/j.cnki.jos.007192 CSTR: 32375.14.jos.007192


      Abstract: Microkernels migrate system services to user mode. Thanks to this isolation architecture, microkernels offer high reliability, which meets the needs of the aerospace field. SPARC processors are widely applied in the control equipment of spacecraft, satellite payloads, and planetary rovers. However, the register window mechanism of SPARC affects the performance of inter-process communication (IPC) on microkernels, and its inter-processor interrupt (IPI) mechanism also seriously affects the efficiency of cross-core IPC. As a key mechanism, IPC is vital to the overall performance of applications on microkernels. Based on an analysis of the register window mechanism, this study redesigns and implements a register bank mechanism in which register windows are allocated and managed by the kernel, thereby implementing BankedIPC on SPARC. Meanwhile, since IPI underperforms on SPARC, FlexIPC is designed to optimize the performance of cross-core IPC. These approaches are employed to optimize the general synchronous IPC implemented on ChCore, a self-developed microkernel. Tests show that the average IPC performance of the microkernel on SPARC improves by about 2x after optimization, with application performance improved by up to 15%.

    • Quantum Circuit Mapping for Distributed Superconducting Quantum Computing Architecture

      2025, 36(5):2381-2400. DOI: 10.13328/j.cnki.jos.007167 CSTR: 32375.14.jos.007167


      Abstract: In recent years, research on superconducting qubit interconnect technology has made important progress, providing an effective way to build a distributed computing architecture for superconducting quantum computers. The distributed superconducting architecture imposes strict constraints on the execution of quantum circuits in terms of network topology, qubit connectivity, and quantum state transfer protocols. To execute and schedule quantum circuits on a distributed architecture, a circuit mapping process is required to transform the circuits to fit the underlying architecture and then distribute the transformed circuits to multiple QPUs. The distributed circuit mapping process necessitates inserting additional quantum operations into the original circuit, and such operations, especially inter-QPU state transfer operations, are susceptible to noise and have high error rates. Therefore, minimizing the number of additional operations inserted by the mapping process is critical to improving the overall computation success rate. This study constructs an abstract model of distributed quantum computing based on the technical features of superconducting qubit interconnects and today's superconducting QPUs, and it proposes a distributed quantum circuit mapping approach based on this model. The approach consists of two main components: a distributed qubit mapping algorithm and a qubit state routing algorithm. The former formulates the distribution of qubits to different QPUs as a combinatorial optimization problem and employs simulated annealing enhanced with local search to find an initial mapping with the optimal total routing cost. The latter constructs several heuristic qubit routing rules for different scenarios and integrates them systematically to minimize the additional operations inserted during mapping. The abstract model shields the technical details of the underlying architecture that are irrelevant to circuit mapping, making the mapping method applicable to a class of such networks rather than a specific one. Moreover, the proposed approach can serve as an ancillary tool for designing and evaluating the network topology of distributed systems. The experimental results show that, compared with the baseline approach, the proposed approach reduces the number of intra-chip operations (SWAP gates) and inter-chip operations (ST gates) by 69.69% and 85.88% on average, respectively, with a time overhead similar to that of existing algorithms.
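      The initial-mapping step, formulated above as combinatorial optimization, can be sketched with plain simulated annealing: assign logical qubits to QPUs so as to minimize the number of two-qubit gates that cross QPU boundaries. This is illustrative only; the paper's method adds local search and the routing heuristics, and the circuit here is random.

```python
# Minimal sketch: simulated annealing over qubit-to-QPU assignments,
# minimizing inter-QPU two-qubit gates under per-QPU capacity limits.
import math
import random

random.seed(0)
n_qubits, n_qpus, cap = 12, 3, 4
gates = [(random.randrange(n_qubits), random.randrange(n_qubits)) for _ in range(60)]

def cost(assign):
    return sum(1 for a, b in gates if assign[a] != assign[b])

assign = [i % n_qpus for i in range(n_qubits)]        # balanced initial assignment
cur, T = cost(assign), 5.0
while T > 0.01:
    i, j = random.sample(range(n_qubits), 2)
    if assign[i] == assign[j]:
        T *= 0.995; continue
    assign[i], assign[j] = assign[j], assign[i]       # swap preserves capacities
    new = cost(assign)
    if new <= cur or random.random() < math.exp((cur - new) / T):
        cur = new                                     # accept the move
    else:
        assign[i], assign[j] = assign[j], assign[i]   # revert
    T *= 0.995
print(cur, assign)                                    # inter-QPU gate count, mapping
```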

Contact Information
  • Journal of Software
  • Sponsor: Institute of Software, CAS, China
  • Postal code: 100190
  • Phone: 010-62562563
  • Email: jos@iscas.ac.cn
  • Website: https://www.jos.org.cn
  • ISSN 1000-9825
  • CN 11-2560/TP
  • Domestic price: CNY 70
Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn