XU Xue-Zheng , YANG De-Heng , WANG Lu , WANG Tao , HUANG An-Wen , LI Qiong
2025, 36(9):3919-3936. DOI: 10.13328/j.cnki.jos.007292 CSTR: 32375.14.jos.007292
Abstract:The memory consistency model defines constraints on memory access orders for parallel programs in multi-core systems and is an important architectural specification jointly followed by software and hardware. Sequential consistency (SC) per location is a classic axiom of memory consistency models, which specifies that all memory access operations to the same address in a multi-core system follow SC. It has been widely employed in the memory consistency models of classic architectures such as x86/TSO, Power, and ARM, and plays an important role in chip memory consistency verification, system software, and parallel program development. RISC-V is an open-source architectural specification, and its memory model is defined by global memory orders, preserved program orders, and three axioms (the load value axiom, atomicity axiom, and progress axiom). It does not directly include SC per location as an axiom, which poses challenges to existing memory model verification tools and system software development. This study formalizes SC per location as a theorem based on the axioms and rules defined in the RISC-V memory model. The proof abstracts the construction of memory access sequences to the same arbitrary address into deterministic finite automata for inductive proof. This study is a theoretical supplement to the formal methods of RISC-V memory consistency.
LI Yi-Jin , DU Shao-Min , ZHAO Jia-Cheng , WANG Xue-Ying , ZHA Yong-Quan , CUI Hui-Min
2025, 36(9):3937-3953. DOI: 10.13328/j.cnki.jos.007357 CSTR: 32375.14.jos.007357
Abstract:Instruction-level parallelism is a fundamental challenge in processor architecture research. Very long instruction word (VLIW) architecture is widely used in the field of digital signal processing to enhance instruction-level parallelism. In VLIW architecture, the instruction issue order is determined by the compiler, making performance highly dependent on the compiler’s instruction scheduling. To explore the potential of RISC-V VLIW architecture and further enrich the RISC-V ecosystem, this study focuses on optimizing instruction scheduling algorithms for RISC-V VLIW architecture. For a single scheduling region, integer linear programming (ILP) scheduling can achieve optimal solutions but suffers from high computational complexity, whereas list scheduling offers lower complexity at the cost of potentially suboptimal solutions. To leverage the strengths of both approaches, this study proposes a hybrid instruction scheduling algorithm: scheduling regions where list scheduling has not reached the optimal solution are located with an IPC theoretical model, and ILP scheduling then further processes the located regions. The theoretical model is based on data flow analysis, accounts for both instruction dependencies and hardware resources, and provides a theoretical upper bound for IPC with linear complexity. The accuracy of the IPC theoretical model is critical to the success of hybrid scheduling and reaches 95.74% in this study. On the given benchmark, the IPC model identifies that 94.62% of scheduling regions have reached optimal solutions with list scheduling, leaving only 5.38% requiring further refinement with ILP scheduling. The proposed hybrid scheduling algorithm achieves the scheduling quality of ILP scheduling while maintaining a complexity comparable to that of list scheduling.
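The shape of such an IPC upper bound can be sketched as follows; this is a minimal illustration assuming one resource bound (issue width) and one dependence bound (the latency-weighted critical path of the dependence DAG), not the paper's exact model, and all names are ours:

```python
import math

def ipc_upper_bound(n_instrs, issue_width, preds, latencies):
    """Linear-complexity IPC upper bound for one scheduling region:
    the achievable cycle count is at least the resource bound and at
    least the latency-weighted critical path of the dependence DAG."""
    # Resource bound: at most `issue_width` instructions issue per cycle.
    resource_cycles = math.ceil(n_instrs / issue_width)
    # Dependence bound: assumes instructions 0..n-1 are topologically ordered;
    # preds[i] lists the predecessors of instruction i.
    finish = [0] * n_instrs
    for i in range(n_instrs):
        start = max((finish[p] for p in preds[i]), default=0)
        finish[i] = start + latencies[i]
    critical_path = max(finish, default=0)
    return n_instrs / max(resource_cycles, critical_path)
```

If list scheduling already attains the bound for a region, that region needs no further work; only regions falling short would be handed to the ILP scheduler.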
HAN Jin-Chi , WANG Zhi-Dong , MA Hao , SONG Wei
2025, 36(9):3954-3969. DOI: 10.13328/j.cnki.jos.007358 CSTR: 32375.14.jos.007358
Abstract:Cache simulators are indispensable tools for exploring cache architectures and researching cache side channels. Spike, the standard implementation of the RISC-V instruction set, offers a comprehensive environment for RISC-V-based cache research. However, its cache model suffers from limitations, such as low simulation granularity and notable discrepancies with the cache structures of real processors. To address these limitations, this study introduces flexible cache architectural simulator (FlexiCAS), a modified and extended version of Spike’s cache model. The modified simulator, referred to as Spike-FlexiCAS, supports a wide range of cache architectures with flexible configuration and easy extensibility. It enables arbitrary combinations of cache features, including coherence protocols and implementation methods. In addition, FlexiCAS can simulate cache behavior independently of Spike. The performance evaluations demonstrate that FlexiCAS significantly outperforms the cache model of ZSim, the fastest execution-driven simulator available.
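The behavior such a simulator models can be illustrated with a minimal set-associative, LRU-replacement cache in Python; FlexiCAS itself supports far richer configurations (coherence protocols, multiple levels, alternative implementations), and everything here is illustrative:

```python
from collections import OrderedDict

class SetAssocCache:
    """Minimal execution-driven cache model: set-associative with LRU
    replacement, tracking hits and misses per access."""
    def __init__(self, sets, ways, line_bytes):
        self.sets, self.ways, self.line = sets, ways, line_bytes
        # one OrderedDict per set: tag -> None, in LRU-to-MRU order
        self.data = [OrderedDict() for _ in range(sets)]

    def access(self, addr):
        line_addr = addr // self.line
        idx, tag = line_addr % self.sets, line_addr // self.sets
        s = self.data[idx]
        if tag in s:
            s.move_to_end(tag)       # refresh LRU position
            return True              # hit
        if len(s) >= self.ways:
            s.popitem(last=False)    # evict least-recently-used line
        s[tag] = None
        return False                 # miss
```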
LI Chuan-Dong , YI Ran , LUO Ying-Wei , WANG Xiao-Lin , WANG Zhen-Lin
2025, 36(9):3970-3984. DOI: 10.13328/j.cnki.jos.007359 CSTR: 32375.14.jos.007359
Abstract:Memory virtualization, a core component of virtualization technology, directly impacts the overall performance of virtual machines. Current memory virtualization approaches often involve a tradeoff between the overhead of two-dimensional address translation and that of page table synchronization. Traditional shadow paging employs an additional software-maintained page table to achieve address translation performance comparable to native systems. However, synchronization of shadow page tables relies on write protection, frequently causing VM-exits that significantly degrade system performance. In contrast, the nested paging approach leverages hardware-assisted virtualization, allowing the guest page table and nested page table to be directly loaded into the MMU. While this eliminates page table synchronization, two-dimensional page table traversal incurs substantial performance penalties for address translation. This study proposes lazy shadow paging (LSP), which reduces page table synchronization overhead while retaining the high address translation efficiency of shadow page tables. Leveraging the privilege model and hardware features of the RISC-V architecture, LSP analyzes the access patterns of guest OS page tables and binds synchronization to translation lookaside buffer (TLB) flushes, reducing the software overhead of page table updates by deferring costs until the first access to a relevant page, thereby minimizing VM-exits. In addition, it introduces a fast path for handling VM-exits, exploiting the fine-grained TLB interception and privilege-level features of RISC-V to further optimize performance. Experimental results demonstrate that under the baseline RISC-V architecture, LSP reduces VM-exits by up to 50% compared to traditional shadow paging in micro-benchmark tests. For typical applications in the SPEC2006 benchmark suite, LSP reduces VM-exits by up to 25% compared to traditional shadow paging and decreases memory accesses per TLB miss by 12 compared to nested paging.
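The deferral idea can be sketched with a toy model, under our simplified reading of the abstract (this is not the RISC-V implementation): guest page-table writes are merely recorded, and the shadow page table is synchronized lazily, either in a batch at a TLB flush or on the first translation that touches a stale entry:

```python
class LazyShadowPager:
    """Toy model of lazy shadow paging: guest page-table writes do not
    trap; synchronization is deferred to a TLB flush (batched) or to the
    first translation that touches a stale entry."""
    def __init__(self):
        self.guest_pt = {}    # vpn -> pte, maintained by the guest OS
        self.shadow_pt = {}   # vpn -> pte, the table the real MMU walks
        self.stale = set()    # guest entries modified since the last sync
        self.vm_exits = 0

    def guest_pt_write(self, vpn, pte):
        self.guest_pt[vpn] = pte   # no write-protection trap here
        self.stale.add(vpn)

    def tlb_flush(self):
        self.vm_exits += 1         # one exit amortizes all pending updates
        for vpn in self.stale:
            self.shadow_pt[vpn] = self.guest_pt[vpn]
        self.stale.clear()

    def translate(self, vpn):
        if vpn in self.stale:      # first access to a stale translation
            self.vm_exits += 1
            self.shadow_pt[vpn] = self.guest_pt[vpn]
            self.stale.discard(vpn)
        return self.shadow_pt.get(vpn)
```

Traditional shadow paging would instead take one VM-exit per guest page-table write; here three writes followed by a flush cost a single exit.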
HAN Liu-Tong , ZHANG Hong-Bin , XING Ming-Jie , WU Yan-Jun , ZHAO Chen
2025, 36(9):3985-4005. DOI: 10.13328/j.cnki.jos.007360 CSTR: 32375.14.jos.007360
Abstract:The performance acceleration of high-performance libraries on CPUs can be achieved by leveraging SIMD hardware through vectorization. Implementing vectorization requires programming methods tailored to the target SIMD hardware, which vary significantly across different SIMD extensions. To avoid redundant implementations of algorithm optimizations on various platforms and enhance the maintainability of algorithm libraries, a hardware abstraction layer (HAL) is often introduced. However, most existing HAL designs are based on fixed-length vector registers, aligning with the fixed-length nature of conventional SIMD extension instruction sets. This design fails to accommodate the variable-length vector registers introduced by the RISC-V vector extension. Treating RISC-V vector extensions as fixed-length vectors within traditional HAL designs results in unnecessary overhead and performance degradation. To address this problem, the study proposes a HAL design method compatible with both variable-length vector extensions and fixed-length SIMD extensions. Using this approach, the universal intrinsic functions in the OpenCV library are redesigned and optimized to better support RISC-V vector extension devices while maintaining compatibility with existing SIMD platforms. Performance comparisons between the optimized and original OpenCV libraries reveal that the redesigned universal intrinsic functions efficiently integrate RISC-V vector extensions into the HAL optimization framework, achieving a 3.93× performance improvement in core modules. These results validate the effectiveness of the proposed method, significantly enhancing the execution performance of high-performance libraries on RISC-V devices. In addition, the proposed approach has been open-sourced and integrated into the OpenCV repository, demonstrating its practicality and application value.
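The loop shape that variable-length vector code requires, and that fixed-length HALs fail to express, is strip-mining; a sketch in Python (the `vsetvl` behavior is simulated by `min`, and the inner loop stands in for a single vector instruction):

```python
def vec_add(dst, a, b, vlmax=8):
    """Strip-mining pattern of variable-length vector code: each iteration
    asks for a vector length vl (the hardware grants vl <= VLMAX) instead
    of assuming the fixed lane count that conventional SIMD HALs bake in."""
    n, i = len(a), 0
    while i < n:
        vl = min(n - i, vlmax)         # stands in for vsetvl
        for j in range(i, i + vl):     # stands in for one vector add
            dst[j] = a[j] + b[j]
        i += vl
    return dst
```

Because `vl` adapts each iteration, no scalar remainder loop is needed even when the trip count is not a multiple of the register width.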
WANG Xiao-Bing , CHANG Jia-Jun , LI Chun-Yi , YANG Xiao-Yu , ZHAO Liang
2025, 36(9):4006-4035. DOI: 10.13328/j.cnki.jos.007222 CSTR: 32375.14.jos.007222
Abstract:Smart contracts are scripts running on the Ethereum blockchain capable of handling intricate business logic, with most written in Solidity. As security concerns surrounding smart contracts intensify, a formal verification method employing the modeling, simulation, and verification language (MSVL) alongside propositional projection temporal logic (PPTL) is proposed. A SOL2M converter is developed, facilitating semi-automatic modeling from Solidity to MSVL programs. However, a proof of the operational semantic equivalence of Solidity and MSVL has been lacking. This study first defines Solidity’s operational semantics using big-step semantics across four levels: semantic elements, evaluation rules, expressions, and statements. Subsequently, it establishes equivalence relations between states, expressions, and statements in Solidity and MSVL. In addition, leveraging the operational semantics of both languages, it employs structural induction to prove expression equivalence and rule induction to establish statement equivalence.
XIE Sheng-Long , LI Qing-Shan , DAI Jie , CUI Di
2025, 36(9):4036-4055. DOI: 10.13328/j.cnki.jos.007267 CSTR: 32375.14.jos.007267
Abstract:Bug triaging is the process of assigning bug reports to developers suitable for resolving the reported bugs, ensuring timely fixes. Current research in bug triaging mainly focuses on the text classification of bug reports. However, according to the Pareto principle, the data distribution of bug reports used for classification is unbalanced, which may lead to ineffective triaging for inactive developers. Additionally, existing classification models often neglect to model developers and struggle to capture the correlations between bugs and developers, affecting the efficiency of bug triaging. To address these issues, this study proposes a collaborative bug triaging method based on multimodal fusion (CBT-MF). This method first preprocesses bug reports and constructs a bug-developer bipartite graph. To mitigate the impact of the unbalanced distribution of bug fix records, the bipartite graph data is enhanced using K-means clustering and positive-negative sampling. To represent developer information, node features are extracted from the bipartite graph using a graph convolutional network model. Finally, correlations between bugs and developers are captured by matching inner products, and Bayesian personalized ranking (BPR) is utilized for bug report recommendation and triaging. Comprehensive experiments conducted on publicly available datasets demonstrate that CBT-MF outperforms several state-of-the-art methods in bug triaging.
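The final matching and ranking step rests on two standard pieces, the inner-product score and the Bayesian personalized ranking loss; a minimal sketch (embedding sources and names are illustrative):

```python
import math

def match_score(bug_vec, dev_vec):
    """Inner product between a bug embedding and a developer embedding
    (e.g. as produced by a GCN over the bug-developer bipartite graph)."""
    return sum(b * d for b, d in zip(bug_vec, dev_vec))

def bpr_loss(score_pos, score_neg):
    """BPR loss for one (bug, matched developer, sampled negative
    developer) triple: -log sigmoid(score_pos - score_neg), which pushes
    the matched developer's score above the sampled negative's."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_pos - score_neg))))
```

Minimizing the loss over observed fix records with sampled negatives yields a ranking of developers per bug; the top-ranked developers become the triaging recommendation.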
MA Jie , SUN Wang-Chun , WANG Ping-Hui , ZHANG Ruo-Fei , LI Shuai-Peng , SU Zhou
2025, 36(9):4056-4071. DOI: 10.13328/j.cnki.jos.007247 CSTR: 32375.14.jos.007247
Abstract:As large language models (LLMs) continue to evolve, they have shown impressive performance in open-domain tasks. However, they exhibit limited effectiveness in domain-specific question-answering due to a lack of domain-specific knowledge. This limitation has attracted widespread attention from researchers in the field. Current research attempts to infuse domain knowledge into LLMs through a retrieve-answer approach to enhance their performance. However, this method often retrieves additional, irrelevant data, leading to a degradation in LLM effectiveness. Therefore, this study proposes a method for knowledge graph question answering based on the relevance of knowledge. This method focuses on distinguishing essential knowledge required for specific questions from noisy data. Under a framework of retrieval-relevance assessment-answering, this method guides LLMs to select appropriate knowledge for accurate answers. Moreover, this study introduces a dataset named Mecha-QA for question-answering using a mechanical domain knowledge graph, covering traditional machinery manufacturing and additive manufacturing, to promote research that integrates LLMs with knowledge graph question answering in this field. To validate the effectiveness of the proposed method, experiments are conducted on the Aero-QA dataset in the aerospace domain and the Mecha-QA dataset. Results demonstrate that the proposed method significantly improves the performance of LLMs in knowledge graph question answering in vertical domains.
LI Wen-Yi , DAI Wei , NAN Jing , LIU Cong-Hu
2025, 36(9):4072-4092. DOI: 10.13328/j.cnki.jos.007257 CSTR: 32375.14.jos.007257
Abstract:Due to the difficulty of determining the structure and training the parameters of recurrent neural networks (RNNs), an incrementally constructed random RNN (IRRNN) is proposed to realize the incremental construction of RNN structures and the random learning of network parameters. The IRRNN establishes an incremental constraint mechanism for hidden nodes and uses a candidate node pool strategy to realize the optimal selection of hidden nodes, avoiding the blindness of randomly constructing the network. Two incremental random learning methods, termed IR-1 and IR-2, are designed for local and global optimization of model parameters, and their universal approximation property is proved. Meanwhile, the dynamic properties of the IRRNN model are studied to analyze its generalization performance. Experiments validate that the IRRNN exhibits favorable dynamic properties, compactness, and accuracy.
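The candidate node pool idea can be sketched as follows; this is a generic constructive-random-network selection step under our assumptions (tanh nodes, a correlation-style score), not the paper's exact criterion:

```python
import math, random

def best_candidate(residual, inputs, pool_size=10, rng=random):
    """Candidate-pool sketch: draw pool_size random tanh hidden nodes and
    keep the one whose output best matches the current residual
    (score = <e, h>^2 / <h, h>), instead of accepting the first random
    node blindly."""
    best = None
    for _ in range(pool_size):
        w = [rng.uniform(-1.0, 1.0) for _ in inputs[0]]
        b = rng.uniform(-1.0, 1.0)
        # candidate node's output on every training sample
        h = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
             for x in inputs]
        num = sum(e * hi for e, hi in zip(residual, h)) ** 2
        den = sum(hi * hi for hi in h) or 1e-12
        score = num / den
        if best is None or score > best[0]:
            best = (score, w, b)
    return best
```

The selected node is appended to the network and the residual is updated before the next incremental step.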
HU Si-Yu , ZHOU Yuan-Chang , ZHAO Tong , WANG Lin-Wang , JIA Wei-Le , TAN Guang-Ming
2025, 36(9):4093-4109. DOI: 10.13328/j.cnki.jos.007258 CSTR: 32375.14.jos.007258
Abstract:Molecular dynamics simulation plays an important role in material simulation, biopharmaceuticals, and other areas. In recent years, the development of AI-for-Science has greatly improved the accuracy of neural network force fields in predicting energy, force, and other properties, compared to traditional methods using potential functions. Neural network force field models may face challenges such as hyperparameter tuning and gradient explosion when trained with first-order methods. Based on an optimizer named reorganized layer extended Kalman filtering, this study provides several strategies to avoid hyperparameters and offers theoretical evidence for preventing gradient explosion. This study also proposes an alternate training method and analyzes its accuracy gains and time costs. A performance model of block thresholding is proposed, and its effectiveness is explored. Additionally, the property of preventing gradient explosion is proven, and the optimizer’s robustness with respect to activation functions and weight initialization is validated. Four typical neural network force field models are tested on 11 representative datasets. Results show that when the proposed optimizer and the first-order optimizer achieve comparable prediction accuracy, the proposed optimizer is 8 to 10 times faster than the first-order optimizer. It is believed that the proposed training method can inspire other AI-for-Science applications.
WANG Xin-Ao , CHEN Ke , SHOU Li-Dan , LUO Xin-Yuan , CHEN Gang
2025, 36(9):4110-4133. DOI: 10.13328/j.cnki.jos.007259 CSTR: 32375.14.jos.007259
Abstract:High-quality training data is instrumental in pre-trained language models (PLMs), yet privacy concerns often preclude the centralized collection of data from many professional domains. Federated learning offers a solution by enabling model training while safeguarding data privacy. However, the limited resources of federated learning clients pose a challenge to the training of pre-trained language models. This study addresses this issue through several steps. Firstly, it defines the problem of completing model training with limited resources and explores strategies to balance computational and communication costs for optimizing training efficiency. Secondly, it introduces an efficient federated learning framework for BERT further pre-training and fine-tuning (FedBT). FedBT facilitates the training of the BERT model on federated learning clients, encompassing both further pre-training and downstream task fine-tuning. Depending on the application context, FedBT selectively trains key parameters of the BERT model at the clients, uploading only the updated parameters to the server for aggregation. This approach significantly reduces both computational and communication overhead during training. Finally, extensive experiments are conducted on datasets from multiple professional domains. Results demonstrate that FedBT reduces client-side computational costs to 34.31% and communication costs to 7.04% during further pre-training. In downstream task fine-tuning, it reduces client-side computational costs to 48.26% and communication costs to 20.19%. The accuracy achieved in both pre-training and downstream task fine-tuning is comparable to traditional federated learning methods that train the entire model.
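The client/server division of labor can be sketched in a few lines; the parameter-name prefixes below are illustrative, and the aggregation shown is plain FedAvg over the uploaded subset rather than FedBT's exact rule:

```python
def select_update(state, key_prefixes):
    """Client side: keep only the sub-modules designated as key parameters
    for this application context and upload just those."""
    return {k: v for k, v in state.items()
            if any(k.startswith(p) for p in key_prefixes)}

def aggregate(updates):
    """Server side: average the uploaded partial parameter sets; parameters
    never uploaded are never transferred or aggregated, which is where the
    communication savings come from."""
    return {k: sum(u[k] for u in updates) / len(updates)
            for k in updates[0]}
```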
HAO Zhi-Feng , WANG Fei-Xia , CHEN Zheng-Ming , QIAO Jie , CAI Rui-Chu
2025, 36(9):4134-4152. DOI: 10.13328/j.cnki.jos.007261 CSTR: 32375.14.jos.007261
Abstract:Causal discovery aims to uncover causal relationships among variables from observational data, serving as a crucial method for understanding various phenomena and changes in natural, social, and technological systems. A mainstream approach for causal discovery is a constraint-based algorithm, which determines the causal structure among variables by examining their conditional independence. However, data collection in the real world often faces challenges such as limited sample sizes and high variance among nodes due to resource or technical constraints. In these scenarios, the accuracy of conditional independence tests is greatly affected, leading to erroneous deletion of causal edges of some variables in learned causal graphs, thereby impacting the accuracy of the algorithm’s output. To address this issue, this study proposes an enhanced method for conditional independence testing, which focuses on minimizing the interference of irrelevant external noise on the variables being tested, thereby improving the accuracy of conditional independence tests. Based on this enhanced method, the paper introduces a structure learning algorithm based on heuristic search, which iteratively searches for mistakenly deleted causal edges on a graph with an initial structure. This algorithm reconstructs the causal structure by combining enhanced conditional independence tests with score optimization. Experimental results show that, compared to existing methods, the proposed algorithm significantly improves both the F1 score and the structural Hamming distance (SHD) on simulated, Bayesian network, and real data, demonstrating its ability to more accurately reveal underlying causal structures in observational data with limited samples and high-variance nodes.
HUANG Qiao-Juan , CAO Cun-Gen , WANG Ya , WANG Shi
2025, 36(9):4153-4186. DOI: 10.13328/j.cnki.jos.007262 CSTR: 32375.14.jos.007262
Abstract:Commonsense knowledge is usually not explicitly expressed in natural languages but is implicitly understood in human cognition. Providing machines with commonsense knowledge has been a longstanding aim in artificial intelligence. Initially, this study manually constructs a high-precision, event-centric commonsense knowledge graph (ECKG) for seed events in Chinese. It contains 26 606 commonsense event triples encompassing causal, temporal, conditional, and other common event relationships. Although the constructed ECKG holds considerable value, its limited scale curtails practical applications. Besides, large-scale event commonsense knowledge graphs are rare in current studies. To overcome these challenges, this paper uses large language models from the GPT series to expand the above-mentioned three event relationships and sub-events of the proposed ECKG. The expansion method involves three primary steps. Firstly, specific prompts for event knowledge (ek-prompts) are designed by combining the events in the ECKG with four relationships, and GPT-4-Turbo is used to generate corresponding event triples. Secondly, the triples of the ECKG are integrated with accurate triples obtained by ek-prompts to create a specialized dataset. Additionally, GPT-3.5-Turbo is fine-tuned on the dataset to generate more specific event triples and validate the accuracy of new triples. Lastly, by analyzing the similarities among events in the ECKG and implementing an event-sharing mechanism, similar events within the same relationship are interconnected, ensuring consistency across similar event triples. Experimental results show that the newly acquired triples are of high quality, particularly those of the temporal relationships, with an accuracy rate of 98.2%. 
Ultimately, the proposed expansion method appends 2 433 012 commonsense event triples to the original ECKG, significantly expanding its scale and providing more commonsense knowledge for many applications in artificial intelligence.
CHEN Jian-Wei , SHEN Ying-Long , YANG Fan , LAI Yong-Xuan
2025, 36(9):4187-4206. DOI: 10.13328/j.cnki.jos.007264 CSTR: 32375.14.jos.007264
Abstract:Mainstream methods for scene text detection often use complex networks with many layers to improve detection accuracy, which requires high computational costs and large storage space, thus making them difficult to deploy on embedded devices with limited computing resources. Knowledge distillation assists in training lightweight student networks by introducing soft target information from teacher networks, thus achieving model compression. However, existing knowledge distillation methods are mostly designed for image classification and extract the soft probability distributions of teacher networks as knowledge. The amount of information carried by such methods is highly correlated with the number of categories, resulting in insufficient information when directly applied to the binary classification task in text detection. To address this problem for scene text detection, this study introduces the concept of information entropy and proposes a knowledge distillation method based on mask entropy transfer (MaskET). MaskET combines information entropy with traditional knowledge distillation methods to increase the amount of information transferred to student networks. Moreover, to eliminate the interference of background information in images, MaskET only extracts the knowledge within the text area by adding mask operations. Experiments conducted on six public benchmark datasets, namely ICDAR 2013, ICDAR 2015, TD500, TD-TR, Total-Text and CASIA-10K, show that MaskET outperforms the baseline model and other knowledge distillation methods. For example, MaskET improves the F1 score of MobileNetV3-based DBNet from 65.3% to 67.2% on the CASIA-10K dataset.
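One plausible form of such a loss, under our reading of the abstract (the entropy weighting and names below are assumptions, not the paper's exact formulation): distill only inside the text-region mask, and weight each position by the information entropy of the teacher's binary text probability so that informative, uncertain positions dominate:

```python
import math

def masked_entropy_distill(teacher, student, mask):
    """Sketch of a mask-and-entropy distillation loss for binary text maps:
    background positions are masked out, and each remaining position's
    teacher-vs-student binary cross-entropy is weighted by the teacher's
    binary entropy at that position."""
    total = weight = 0.0
    eps = 1e-7
    for t, s, m in zip(teacher, student, mask):
        if not m:
            continue                       # mask out background positions
        t = min(max(t, eps), 1.0 - eps)
        s = min(max(s, eps), 1.0 - eps)
        h = -(t * math.log(t) + (1 - t) * math.log(1 - t))    # teacher entropy
        bce = -(t * math.log(s) + (1 - t) * math.log(1 - s))  # soft-label BCE
        total += h * bce
        weight += h
    return total / weight if weight else 0.0
```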
ZHAO Jia-Ning , WANG Jing-Jing , LUO Jia-Min , ZHOU Guo-Dong
2025, 36(9):4207-4222. DOI: 10.13328/j.cnki.jos.007303 CSTR: 32375.14.jos.007303
Abstract:Phrasal visual grounding, a fundamental and critical research task in the field of multimodal studies, aims at predicting fine-grained alignment relationships between textual phrases and image regions. Despite the remarkable progress achieved by existing phrasal visual grounding approaches, they all ignore the implicit alignment relationships between textual phrases and their corresponding image regions, commonly referred to as implicit phrase-region alignment. Predicting such relationships can effectively evaluate the ability of models to understand deep multimodal semantics. Therefore, to effectively model implicit phrase-region alignment relationships, this study proposes an implicit-enhanced causal modeling (ICM) approach for phrasal visual grounding, which employs the intervention strategies of causal reasoning to mitigate the confusion caused by shallow semantics. To evaluate models’ ability to understand deep multimodal semantics, this study annotates a high-quality implicit dataset and conducts a large number of experiments. Multiple sets of comparative experimental results demonstrate the effectiveness of the proposed ICM approach in modeling implicit phrase-region alignment relationships. Furthermore, the proposed ICM approach outperforms some advanced multimodal large language models (MLLMs) on the implicit dataset, further promoting the research of MLLMs towards more implicit scenarios.
HAN Jiang , ZHANG Zhen-Feng , LIU Yu-Guo , HU Ke-Xin , HE Shuang-Yu
2025, 36(9):4223-4240. DOI: 10.13328/j.cnki.jos.007274 CSTR: 32375.14.jos.007274
Abstract:In the industrial field, currently used access permission control technologies are increasingly struggling to address access control issues of distributed systems deployed in wide-area internet scenarios. This situation is particularly exacerbated when dealing with large-scale information systems distributed across multiple trust domains, leading to a growing number of vulnerabilities. Consensus-based access control policy sharing technologies can facilitate the secure and expeditious attainment of consensus decisions among access control nodes deployed across trust domains. This study first proposes a consensus-based access permission control (CBAC) model for multiple nodes and presents the Super-Dumbo consensus algorithm for access control engines, which features robust security and high performance. Super-Dumbo surmounts the performance bottlenecks of Dumbo2 by optimizing the design of key steps encompassing message broadcasting, random coin toss procedures, and consensus algorithm constructs. Notably, it reduces computational overhead such as digital signature verification, thereby effectively enhancing bandwidth utilization. This achieves a substantial improvement in performance metrics such as throughput and latency, aligning with the requirements of the CBAC access control model, which demands low latency and high throughput from the underlying consensus algorithm.
LI Jia-Wei , WANG Sheng-Chun , WANG Wen-Cheng
2025, 36(9):4241-4249. DOI: 10.13328/j.cnki.jos.007307 CSTR: 32375.14.jos.007307
Abstract:In terms of point-in-polygon tests, a recently proposed grid method exhibits high computational efficiency. This method organizes the polygon fragments within each grid cell into stripe structures, ensuring that the edges in each stripe intersect both the left and right boundaries of the stripe. In this way, localization computation is enhanced, and GPUs can be used for convenient parallel computation, resulting in a detection efficiency superior to that of various previous methods. However, stripe structures constructed from grid cells generate redundant stripes. Besides, the method has a high space requirement for stripe construction, making it inconvenient to construct stripe structures on GPUs. In response, this study proposes constructing stripe structures from grid rows. Thus, redundant stripes are eliminated and the space required for construction is reduced, so stripe structures can be constructed on GPUs and work efficiency is improved. Experimental results show that, compared with the original method, the new method significantly accelerates the construction of stripe structures, even by over 40 times. Moreover, it has a faster detection speed and can handle dynamic polygons more efficiently.
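The per-edge operation that the stripe structure localizes is the classic even-odd crossing test; with stripes, only the short edge list stored in the stripe containing the query point needs this check, rather than the whole polygon. A self-contained sketch:

```python
def point_in_polygon(pt, edges):
    """Even-odd (ray-casting) test: count crossings of a rightward
    horizontal ray from pt with the given edges. Edges are
    ((x1, y1), (x2, y2)) pairs; in the stripe method they would be
    only the edges of one stripe."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in edges:
        if (y1 > y) != (y2 > y):               # edge spans the ray's y
            t = (y - y1) / (y2 - y1)
            if x < x1 + t * (x2 - x1):         # crossing lies to the right
                inside = not inside
    return inside
```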
ZHAO Ya-Ru , ZHANG Jian-Biao , CAO Yi-Hao , HUANG Hao-Xiang
2025, 36(9):4250-4270. DOI: 10.13328/j.cnki.jos.007266 CSTR: 32375.14.jos.007266
Abstract:With the proliferation of massive data and the ever-growing demand for intelligent applications, ensuring data security has become a critical measure for enhancing data quality and realizing data value. The cloud-edge-client architecture has emerged as a promising technology for efficient data processing and optimization. Federated learning (FL), an efficient decentralized machine learning paradigm that can provide privacy protection for data, has garnered extensive attention from academia and industry in recent years. However, FL has demonstrated inherent vulnerabilities that render it highly susceptible to poisoning attacks. Most existing methods for defending against poisoning attacks rely on a continuous update space, but in practical scenarios those methods may be less robust when facing flexible attack strategies and varied attack scenarios. Therefore, this study proposes FedDiscrete, a defense method for resisting poisoning attacks in cloud-edge FL (CEFL) systems. The key idea is to compute local rankings on the client side using the scores of network model edges to create a discrete update space. To ensure fairness among clients participating in the FL task, this study also introduces a contribution metric. In this way, FedDiscrete can penalize potential attackers by allocating updated global rankings. Extensive experiments demonstrate that the proposed method exhibits significant advantages and robustness against poisoning attacks, and is applicable to both independent and identically distributed (IID) and non-IID scenarios, providing protection for CEFL systems.
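The discrete update space can be sketched as follows; the rank-sum aggregation shown is an illustrative choice on our part, not necessarily FedDiscrete's exact rule:

```python
def local_ranking(scores):
    """Client side: map continuous edge scores to a discrete rank vector
    (rank 0 = highest score); only these ranks are uploaded, so a poisoned
    client cannot inject arbitrarily large continuous values."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def global_ranking(client_rankings):
    """Server side: sum ranks across clients (lower total = better edge)
    and re-rank; a client whose ranking disagrees with the majority shifts
    the final order by at most its bounded rank values."""
    totals = [sum(col) for col in zip(*client_rankings)]
    return local_ranking([-t for t in totals])
```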
PU Lang , LIN Chao , WU Wei , GU Jing-Jing , HE De-Biao
2025, 36(9):4271-4284. DOI: 10.13328/j.cnki.jos.007271 CSTR: 32375.14.jos.007271
Abstract:Cloud storage has become an important part of the digital economy as it brings great convenience to users’ data management. However, complex and diverse network environments and third parties that are not fully trusted pose great threats to users' privacy. To protect users’ privacy, data is usually encrypted before storage, but the ciphertext generated by traditional encryption techniques hinders subsequent data retrieval. Public-key encryption with keyword search (PEKS) technology can provide a confidential retrieval function while guaranteeing data encryption, but the traditional PEKS scheme is vulnerable to keyword guessing attacks due to the small number of common keywords. Public-key authenticated encryption with keyword search (PAEKS) introduces authentication technology based on PEKS, which can further improve security. However, most of the existing PAEKS schemes are designed based on foreign cryptographic algorithms, which do not meet the development needs of independent innovation of cryptography in China. This study proposes an SM9-PAEKS scheme, which can effectively improve user-side retrieval efficiency by redesigning algorithm structure and transferring time-consuming operations to a resource-rich cloud server. Scheme security is also proved under the random oracle model based on q-BDHI and Gap-q-BCCA1 security assumptions. Finally, theoretical analysis and experimental results show that compared with the optimal communication cost among similar schemes, SM9-PAEKS can reduce the total computational overhead by at least 59.34% with only 96 bytes of additional communication cost, and the computational overhead reduction of keyword trapdoor generation is particularly significant, about 77.55%. This study not only helps to enrich national security algorithm applications but also provides theoretical and technical support for data encryption and retrieval in cloud storage.
WANG Shang , ZHANG Chen-Xi , PENG Xin
2025, 36(9):4285-4310. DOI: 10.13328/j.cnki.jos.007315 CSTR: 32375.14.jos.007315
Abstract:As an essential type of observability data, distributed tracing data plays a crucial role in operation and maintenance tasks such as performance analysis, fault diagnosis, and system understanding. Due to the rapid increase in system scale and complexity, the volume of tracing data grows exponentially, imposing higher storage requirements. To mitigate the storage cost of tracing data, data compression becomes a crucial approach. Existing compression methods fail to fully exploit the features of tracing data to achieve efficient compression, nor do they support complex queries on compressed data. This study introduces a neural-network-based approach for compressing and querying distributed tracing data. It employs a novel redundancy extraction technique to identify pattern and structural redundancies within tracing data, and leverages neural network models and arithmetic coding to achieve efficient data compression. Meanwhile, the method enables efficient querying of compressed data without decompressing all the data. Tracing datasets of various sizes are collected from four open-source microservice systems to evaluate the proposed method. Experimental results show that the proposed method achieves relatively high compression ratios (105.5–197.6), four times those of state-of-the-art general compression algorithms on average. Additionally, the querying efficiency of the proposed method on compressed data is validated, showing faster performance than existing query tools in the best cases.
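The pattern-redundancy idea behind such trace compression can be illustrated with a toy sketch (all names here are hypothetical; the paper's actual pipeline additionally relies on neural models and arithmetic coding, which are omitted): identical call structures across traces are stored once in a pattern table, and each trace is reduced to a pattern id plus its varying fields.

```python
def compress_traces(traces):
    """Toy pattern-redundancy extraction: a trace is a (span-operation
    list, durations) pair; identical call structures are stored once in
    a pattern table, and each trace becomes a pattern id plus the
    fields that actually vary (here, the durations)."""
    patterns, encoded = {}, []
    for ops, durations in traces:
        # len(patterns) is the next fresh id; setdefault reuses an old one
        pid = patterns.setdefault(tuple(ops), len(patterns))
        encoded.append((pid, durations))
    table = [list(k) for k, _ in sorted(patterns.items(), key=lambda kv: kv[1])]
    return table, encoded

def decompress_traces(table, encoded):
    """Invert compress_traces; each entry can be decoded independently,
    which is what makes querying without full decompression possible."""
    return [(table[pid], durations) for pid, durations in encoded]
```

Because every encoded entry decodes on its own, a query can scan pattern ids without reconstructing the whole dataset — a property the paper's querying component also depends on.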
HE Yu-Lin , LAI Jun-Long , CUI Lai-Zhong , YIN Jian-Fei , HUANG Zhe-Xue
2025, 36(9):4311-4326. DOI: 10.13328/j.cnki.jos.007286 CSTR: 32375.14.jos.007286
Abstract:Social network link prediction can help reveal the potential connections between network nodes, and has important practical value in friend recommendation and cooperation prediction. However, existing link prediction algorithms ignore the medium- and long-term development trends of social network time series, and do not consider the interactions between nodes in the network from a long-term perspective. To address these problems, a spatiotemporal attention-based multi-granularity link prediction algorithm is proposed, which integrates the spatiotemporal features of social network time series at different granularities to improve the accuracy of link prediction. Firstly, the weights of the social network snapshot graphs are constructed with a time decay function, and a graph-weighted moving average strategy is proposed to generate social network time series at different granularities reflecting short-term, medium-term, and long-term trends. Then, a neural network based on the multi-head attention mechanism is used to extract the global temporal features of the social network sequences. Next, combining the historical interaction information of nodes within the sequences, a neural network based on the masked attention mechanism adaptively constructs the network topology from a long-term perspective, dynamically adjusting the interactions between nodes; it is combined with a graph convolutional network to model spatial information. Finally, a fusion attention neural network is proposed to extract useful information from the short-term, medium-term, and long-term spatiotemporal features and fuse them to accurately predict future links in social networks. Experimental comparisons with seven existing link prediction algorithms on four public social network datasets confirm the effectiveness and superiority of the proposed method.
QIAN Zhong-Sheng , WANG Ya-Hui , YU Qing-Yuan , FAN Fu-Yu , FU Ting-Feng
2025, 36(9):4327-4348. DOI: 10.13328/j.cnki.jos.007289 CSTR: 32375.14.jos.007289
Abstract:Cross-domain recommendation (CDR) alleviates the cold start problem by transferring user-item rating patterns from a dense-rating auxiliary domain to a sparse-rating target domain, and has been widely studied in recent years. The single-domain clustering methods adopted by most CDR algorithms fail to effectively utilize overlapping information and are insufficiently adapted to CDR, resulting in inaccurate clustering results. In CDR, graph convolutional network (GCN) methods can fully utilize the associations between nodes to improve recommendation accuracy. However, GCN-based CDR often employs static graph learning for node embedding, ignoring the fact that user preferences may change across recommendation scenarios, which causes poor model performance across different recommendation tasks and fails to effectively mitigate data sparsity. To this end, a multi-layer recurrent GCN CDR model based on a pseudo-overlap detection mechanism is proposed. Firstly, by fully leveraging overlapping data with the community clustering algorithm Louvain, a pseudo-overlap detection mechanism is designed to mine user trust relationships and similar user communities, thereby enhancing the adaptability and accuracy of clustering in CDR. Secondly, a multi-layer recurrent GCN consisting of an embedding learning module and a graph learning module is proposed to learn dynamic domain-shared features, domain-specific features, and dynamic graph structures. By iteratively enhancing the two modules, the latest user preferences are obtained to alleviate data sparsity. Finally, a multi-layer perceptron (MLP) is employed to model user-item interactions and obtain predicted ratings. Comparisons with 12 related models across four groups of data domains demonstrate the effectiveness of the proposed method, with average improvements of 5.47%, 3.44%, and 2.38% in the MRR, NDCG, and HR metrics, respectively.
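A hypothetical sketch of what a pseudo-overlap detection step might look like (this is not the paper's mechanism — the function names, the cosine criterion, and the threshold `tau` are all assumptions made for illustration): users from the auxiliary and target domains whose preference vectors are highly similar are linked as pseudo-overlapping pairs, augmenting the truly overlapping users used to align the two domains.

```python
import math

def cosine(a, b):
    """Cosine similarity between two preference vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def pseudo_overlap_pairs(aux_users, tgt_users, tau=0.8):
    """Link auxiliary-domain and target-domain users whose preference
    vectors have cosine similarity >= tau as pseudo-overlapping pairs;
    such pairs can supplement the (often scarce) truly overlapping users."""
    return [(i, j)
            for i, a in aux_users.items()
            for j, b in tgt_users.items()
            if cosine(a, b) >= tau]
```

In practice the candidate pairs would be restricted to communities found by Louvain rather than compared all-against-all, which is what keeps such a step tractable.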
GOU Xiang-Yang , ZOU Lei , YU Xu
2025, 36(9):4349-4372. DOI: 10.13328/j.cnki.jos.007256 CSTR: 32375.14.jos.007256
Abstract:In recent years, graph stream analysis has gained increasing importance in both research and industry. A graph stream is a continuous sequence of edges received from a data source at a high speed. Those edges form a dynamic graph that is continuously changing. Various analyses can be performed on graph streams. Among them, triangle counting is one of the most basic operations. However, the large volume and high update speed of graph streams make it inefficient to count triangles accurately on them. It is also unnecessary, as most applications for triangle counting can tolerate small errors. Therefore, approximate triangle counting in graph streams has been a hot research topic. This study focuses on sample-based approximate triangle counting in graph streams with a sliding window model. Sliding window models focus on the most recent edges in a graph stream and consider older edges as expired. They are widely applied in various industrial scenarios and research. This study combines a count-before-sample strategy with the state-of-the-art approximate triangle counting algorithm and designs a set of novel strategies to deal with the difficulty brought by edge expiration. Extensive experiments are conducted on real-world datasets to evaluate the proposed algorithm. Results prove that the algorithm decreases the estimation error of the state-of-the-art method by more than 70%.
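The count-before-sample strategy mentioned above can be sketched in a few lines. This is a minimal illustration only — it assumes undirected edges with implicit integer timestamps, and the `window` and `p` parameters with a simple expire-on-arrival policy are simplified stand-ins for the paper's expiration-handling strategies:

```python
import random
from collections import deque

def stream_triangles(edges, window, p, seed=0):
    """Count-before-sample: each arriving edge is first checked against
    the current sample for triangles it closes (the count is scaled by
    1/p^2, since each of the two sampled wedge edges survived sampling
    with probability p), and only then is the edge itself sampled with
    probability p. Edges older than `window` steps are expired."""
    rng = random.Random(seed)
    sample = deque()  # (timestamp, u, v) with u < v
    est = 0.0
    for t, (u, v) in enumerate(edges):
        while sample and sample[0][0] <= t - window:
            sample.popleft()  # drop expired edges before counting
        nbr_u = {b if a == u else a for (_, a, b) in sample if u in (a, b)}
        nbr_v = {b if a == v else a for (_, a, b) in sample if v in (a, b)}
        est += len(nbr_u & nbr_v) / (p * p)  # wedges (u,w),(v,w) closed now
        if rng.random() < p:
            sample.append((t, u, v) if u < v else (t, v, u))
    return est
```

With `p = 1.0` the sketch degenerates to exact counting over the window; smaller `p` trades accuracy for memory, which is the regime where the paper's error-reduction strategies matter.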
LI Jun-Xia , SU Jing-Feng , CUI Ying , LIU Qing-Shan
2025, 36(9):4373-4387. DOI: 10.13328/j.cnki.jos.007265 CSTR: 32375.14.jos.007265
Abstract:Image-level weakly supervised semantic segmentation usually uses convolutional neural networks (CNNs) to generate class activation maps that locate targets. However, CNNs have a limited capacity to perceive global information, which results in excessively narrow foreground regions. Recently, Transformer-based weakly supervised semantic segmentation has utilized self-attention mechanisms to capture global dependencies, addressing this inherent defect of CNNs. Nevertheless, the initial class activation map generated by a Transformer often contains considerable background noise around the target area, resulting in unsatisfactory performance if used directly. This study comprehensively utilizes both the class-to-patch and patch-to-patch attention generated by a Transformer to optimize the initial class activation map. At the same time, a semantic modulation strategy is designed to correct errors in the class-to-patch attention using the semantic context information of the patch-to-patch attention. Finally, a class activation map that accurately covers more of the target area is obtained. On this basis, a novel Transformer-based model for weakly supervised semantic segmentation is constructed. The mIoU of the proposed method reaches 72.7% and 71.9% on the PASCAL VOC 2012 validation and test sets, respectively, and 42.3% on the MS COCO 2014 validation set, demonstrating improved performance in weakly supervised semantic segmentation.
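The idea of refining class-to-patch scores with patch-to-patch affinity can be illustrated generically (this is not the paper's exact formulation; the function name and the `power` parameter are assumptions): propagating activation scores through a row-normalized affinity matrix lets semantically related patches reinforce each other while isolated background activations are damped.

```python
def refine_cam(cls2patch, patch2patch, power=2):
    """Generic attention-propagation refinement: treat the row-normalized
    patch-to-patch affinity as a transition matrix and apply it `power`
    times to the class-to-patch activation vector."""
    n = len(cls2patch)
    trans = []
    for row in patch2patch:
        s = sum(row) or 1.0  # guard against an all-zero affinity row
        trans.append([v / s for v in row])
    cam = list(cls2patch)
    for _ in range(power):
        cam = [sum(trans[i][j] * cam[j] for j in range(n)) for i in range(n)]
    return cam
```

An identity affinity leaves the activation map unchanged, while a uniform affinity averages it out — real patch-to-patch attention sits in between, smoothing scores only among related patches.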
ZHU Jian-Qiu , HUA Yang , SONG Xiao-Ning
2025, 36(9):4388-4402. DOI: 10.13328/j.cnki.jos.007285 CSTR: 32375.14.jos.007285
Abstract:Face anti-spoofing is a powerful guarantee of the practical security of facial recognition technology. However, the constant evolution of spoofing attack methods poses significant challenges to existing detection methods. To address the growing number of unknown scenarios and attack methods, a two-stream face anti-spoofing model based on visual attention and domain feature fusion is proposed. First, a visual attention-based feature extraction module is proposed to strengthen the model's capacity to extract content features from global information. Second, a novel style feature fusion module is designed to optimize the feature representation of a sample by fusing content features with low-level textural style features. Third, a feature mapping strategy based on a Siamese network is developed and the contrastive loss function is modified, to improve model robustness and to avoid gradient oscillation during training, respectively. Furthermore, domain adversarial training (DAT) is used to reduce the model's sensitivity to differences between sample data domains and further improve its generalization. Extensive experimental results verify the generality and strong robustness of the proposed method, demonstrating that it outperforms existing models in cross-domain performance on mainstream datasets.