JIANG Wei , YANG Si-Fan , WANG Yi-Bo , ZHANG Li-Jun
2025, 36(11):4893-4905. DOI: 10.13328/j.cnki.jos.007383 CSTR: 32375.14.jos.007383
Abstract:Stochastic optimization algorithms are recognized as essential for addressing large-scale data and complex models in machine learning. Among these, variance reduction methods, such as the STORM algorithm, have gained attention for their ability to achieve the optimal convergence rate of $ {\mathrm{O}}\left({T}^{-1/3}\right) $. However, traditional variance reduction methods typically depend on specific problem parameters (e.g., the smoothness constant, noise variance, and gradient upper bound) for setting the learning rate and momentum, limiting their practical applicability. To overcome this limitation, this study proposes an adaptive variance reduction method based on a normalization technique, which eliminates the need for prior knowledge of problem parameters while maintaining the optimal convergence rate. Compared with existing adaptive variance reduction methods, the proposed approach offers several advantages: (1) no reliance on additional assumptions, such as bounded gradients, bounded function values, or excessively large initial batch sizes; (2) achievement of the optimal convergence rate of $ {\mathrm{O}}\left({T}^{-1/3}\right) $ without an extra $ {\mathrm{O}}\left(\mathrm{log}\,T\right) $ term; (3) a concise and straightforward proof, facilitating extensions to other stochastic optimization problems. The superiority of the proposed method is further validated through numerical experiments, which demonstrate enhanced performance compared with other approaches.
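For orientation, the classical STORM estimator that this line of work builds on combines a fresh stochastic gradient with a momentum-corrected reuse of the same sample at the previous iterate; the normalization idea then replaces the raw step with a unit-length direction so the step size does not depend on gradient magnitude. Below is a minimal one-dimensional sketch on a toy quadratic; the learning rate, momentum, and noise level are illustrative choices, and this is not the paper's adaptive algorithm:

```python
import random

def storm_normalized(x0, steps=300, lr=0.05, momentum=0.1, seed=0):
    """STORM recursion with a normalized step on f(x) = x^2 (toy sketch).

    d_t = g_t(x_t) + (1 - a) * (d_{t-1} - g_t(x_{t-1})), where both
    gradients share the same noise sample, then x is moved by a fixed
    step of size lr in the direction of d_t.
    """
    rng = random.Random(seed)
    x = x0
    d = 2.0 * x + rng.gauss(0.0, 0.1)        # initial gradient estimate
    for _ in range(steps):
        x_new = x - lr * d / (abs(d) + 1e-12)  # normalized update
        noise = rng.gauss(0.0, 0.1)            # one sample shared by both points
        g_new = 2.0 * x_new + noise
        g_old = 2.0 * x + noise
        d = g_new + (1.0 - momentum) * (d - g_old)
        x = x_new
    return x
```

Starting from x0 = 5, the iterate settles into a small neighborhood of the minimizer at 0 without any parameter being tuned to the noise variance.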
TANG Jia-Xin , WANG Xuan , LAI Wei , LU Ze-Yu , GUO Zhao-Qiang , YANG Yi-Biao , ZHOU Yu-Ming
2025, 36(11):4906-4952. DOI: 10.13328/j.cnki.jos.007376 CSTR: 32375.14.jos.007376
Abstract:Software vulnerabilities are code segments in software that are prone to exploitation. Ensuring that software is not easily attacked is a crucial security requirement in software development. Software vulnerability prediction involves analyzing and predicting potential vulnerabilities in software code. Deep learning-driven software vulnerability prediction has become a popular research field in recent years, spanning a long period and producing numerous studies and substantial achievements. To review relevant research findings and summarize the research hotspots, a survey of 151 studies related to deep learning-driven software vulnerability prediction published between 2017 and 2024 is conducted. It summarizes the research problems, progress, and challenges discussed in the literature, providing a reference for future research.
WEI Qiu-Yang , ZHAO Xu-Feng , ZHU Xue-Yang , ZHANG Wen-Hui , LU Yi-Han
2025, 36(11):4953-4974. DOI: 10.13328/j.cnki.jos.007356 CSTR: 32375.14.jos.007356
Abstract:Since the advent of Bitcoin, blockchain technology has profoundly influenced numerous fields. However, the absence of effective communication mechanisms between heterogeneous and isolated blockchain systems has hindered the advancement and sustainable development of the blockchain ecosystem. In response, cross-chain technology has emerged as a rapidly evolving field and a focal point of research. The decentralized nature of blockchain, coupled with the complexity of cross-chain scenarios, introduces significant security challenges. This study proposes a formal analysis of the IBC (inter-blockchain communications) protocol, one of the most widely adopted cross-chain communication protocols, to assist developers in designing and implementing cross-chain technologies with enhanced security. The IBC protocol is formalized using TLA+, a temporal logic specification language, and its critical properties are verified through the model-checking tool TLC. An in-depth analysis of the verification results reveals several issues impacting the correctness of packet transmission and token transfer. Corresponding recommendations are proposed to mitigate these security risks. The findings have been reported to the IBC developer community, with most of them receiving acknowledgment.
GONG Yuan-Jun , HUANG Jian-Jun , YOU Wei , SHI Wen-Chang , LIANG Bin , BIAN Pan , ZHANG Jian
2025, 36(11):4975-4989. DOI: 10.13328/j.cnki.jos.007362 CSTR: 32375.14.jos.007362
Abstract:The longest common subsequence (LCS) is a practical metric for assessing code similarity. However, traditional LCS-based methods face challenges in scalability and in effectively capturing critical semantics for identifying code fragments that are textually different but semantically similar, due to their reliance on discrete representation-based token encoding. To address these limitations, this study proposes an LCS-oriented embedding method that encodes code into low-dimensional dense vectors, effectively capturing semantic information. This transformation enables the computationally expensive LCS calculation to be replaced with efficient vector arithmetic, further accelerated using an approximate nearest neighbor algorithm. To support this approach, an embeddable LCS-based distance metric is developed, as the original LCS metric is non-embeddable. Experimental results demonstrate that the proposed metric outperforms tree-based and literal similarity metrics in detecting complex code clones. In addition, two targeted loss functions and corresponding training datasets are designed to prioritize retaining critical semantics in the embedding process, allowing the model to identify textually different but semantically similar code elements. This improves performance in detecting complex code similarities. The proposed method demonstrates strong scalability and high accuracy in detecting complex clones. When applied to similar bug identification, it has reported 23 previously unknown bugs, all of which have been confirmed by developers in real-world projects. Notably, several of these bugs are complex and challenging to detect using traditional LCS-based techniques.
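As background, the exact LCS computation that the embedding replaces is a quadratic-time dynamic program, and one common way to turn it into a distance is shown below. This is only the classical baseline metric; the paper's embeddable variant is a different construction:

```python
def lcs_length(a, b):
    # Dynamic-programming longest common subsequence over token sequences.
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def lcs_distance(a, b):
    # A common LCS-based normalized distance in [0, 1]; identical
    # sequences get 0. Not the paper's embeddable metric.
    if not a and not b:
        return 0.0
    return 1.0 - lcs_length(a, b) / max(len(a), len(b))
```

The O(mn) cost of `lcs_length` per pair is exactly what makes large-scale clone search with raw LCS impractical, motivating the embedding approach.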
TIAN Xin-Lei , DONG Yi-Yi , ZHANG Ji-Xian , WANG Wei-Jia
2025, 36(11):4990-5007. DOI: 10.13328/j.cnki.jos.007368 CSTR: 32375.14.jos.007368
Abstract:Falcon, a post-quantum digital signature algorithm, has been selected as one of the first schemes standardized by the National Institute of Standards and Technology (NIST). Its core algorithms, however, are highly error-prone in practical implementations, raising risks of cryptographic misuse. Ensuring the correctness of Falcon through formal verification is therefore essential. This study introduces a comprehensive proof framework that bridges the gap between Falcon’s mathematical specification and its real-world implementation. Within the EasyCrypt proof system, the correctness of Falcon’s Montgomery modular multiplication, NTT, and FFT algorithms is formally verified, and proof techniques for integer Gaussian sampling are further explored. Moreover, Falcon’s signing and verification implementations are presented and optimized using Jasmin hybrid programming, thereby providing both formal correctness guarantees and practical efficiency.
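For readers unfamiliar with the first of those algorithms: Montgomery modular multiplication computes a*b*R^(-1) mod n using only shifts and masks in place of division by n, which is why it is attractive in constant-time cryptographic code. The sketch below is a plain Python reference of the textbook algorithm, not Falcon's verified implementation:

```python
def montgomery_multiply(a, b, n, r_bits):
    """Textbook Montgomery reduction: returns a*b*R^{-1} mod n, R = 2^r_bits.

    Requires n odd and n < R; a, b are assumed already in Montgomery form.
    """
    r = 1 << r_bits
    assert n % 2 == 1 and n < r
    n_prime = -pow(n, -1, r) % r         # n' with n * n' == -1 (mod R)
    t = a * b
    m = (t * n_prime) & (r - 1)          # m = t * n' mod R, via masking
    u = (t + m * n) >> r_bits            # t + m*n is divisible by R exactly
    return u - n if u >= n else u        # single conditional subtraction
```

For example, with n = 17 and R = 32, `montgomery_multiply(7, 9, 17, 5)` returns a value u satisfying u*32 ≡ 7*9 (mod 17), which is the defining property verified in the Falcon proofs.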
PAN Xing-Lu , ZHAO Xian-Lin , LIU Chen-Xiao , ZOU Yan-Zhen , XIE Bing
2025, 36(11):5008-5030. DOI: 10.13328/j.cnki.jos.007369 CSTR: 32375.14.jos.007369
Abstract:With the widespread adoption of programming naming conventions and the increasing emphasis on self-explanatory code, traditional summarizing code comments, which often merely restate the literal meaning of the code, are losing appeal among developers. Instead, developers value supplementary code comments that provide additional information beyond the code itself to facilitate program understanding and maintenance. However, generating such comments typically requires external information resources beyond the code base, and the diversity of supplementary information presents significant challenges to existing methods. This study leverages Issue reports as a crucial external information source and proposes an Issue-based retrieval augmentation method using large language models (LLMs) to generate supplementary code comments. The proposed method classifies the supplementary information found in Issue reports into five categories, retrieves Issue sentences containing this information, and generates corresponding comments using LLMs. In addition, the code relevance and Issue verifiability of the generated comments are evaluated to minimize hallucinations. Experiments conducted on two popular LLMs, ChatGPT and GPT-4o, demonstrate the effectiveness of the proposed method. Compared to existing approaches, the proposed method significantly improves the coverage of manual supplementary comments from 33.6% to 72.2% for ChatGPT and from 35.8% to 88.4% for GPT-4o. Moreover, the generated comments offer developers valuable supplementary information, which proves essential for understanding tricky code.
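To make the retrieval step concrete, a deliberately simple lexical retriever is sketched below: it ranks Issue sentences by token overlap with the code being commented. The function names and Jaccard scoring are illustrative assumptions; the paper's retriever and its five information categories are more elaborate:

```python
def jaccard(a, b):
    # Jaccard similarity between two token collections.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve_issue_sentences(code_tokens, issue_sentences, top_k=3):
    """Rank Issue sentences by lexical overlap with the code under comment.

    A toy stand-in for retrieval augmentation: the top-k sentences would
    be passed to an LLM as context for generating a supplementary comment.
    """
    scored = sorted(
        issue_sentences,
        key=lambda s: jaccard(code_tokens, s.split()),
        reverse=True,
    )
    return scored[:top_k]
```

A usage sketch: `retrieve_issue_sentences(["parse", "config", "file"], sentences, top_k=1)` surfaces the Issue sentence most lexically related to the code, which then grounds the generated comment and makes it verifiable against the Issue.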
WU Jiang-Hao , DUAN Liang , YUE Kun , LI Ang-Sheng , YANG Pei-Zhong
2025, 36(11):5031-5044. DOI: 10.13328/j.cnki.jos.007374 CSTR: 32375.14.jos.007374
Abstract:Attributed graphs are increasingly used to represent data with relational structures, and detecting anomalies in them is gaining attention. Due to their characteristics, such as rich attribute information and complex structural relationships, various types of anomalies may exist, including global, structural, and community anomalies, which often remain hidden within the graph’s deep structure. Existing methods face challenges such as loss of structural information and difficulty in identifying abnormal nodes. Structural information theory leverages encoding trees to represent hierarchical relationships within data and establishes correlations across different levels by minimizing structural entropy, effectively capturing the graph’s essential structure. This study proposes an anomaly detection method for attributed graphs based on structural entropy. First, by integrating the structural and attribute information of attributed graphs, a K-dimensional encoding tree representing the hierarchical community structure is constructed through structural entropy minimization. Next, using the node attributes and hierarchical community information within the encoding tree, scoring mechanisms for detecting structural and attribute anomalies are designed based on Euclidean distance and connection strength between nodes. This approach identifies abnormal nodes and detects various types of anomalies. The proposed method is evaluated through comparative tests on several attributed graph datasets. Experimental results demonstrate that it effectively detects different types of anomalies and significantly outperforms existing state-of-the-art methods.
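A simplified stand-in for the attribute-anomaly scoring idea, the Euclidean distance between a node's attributes and its community's mean, can be sketched as follows. The paper scores against the hierarchical communities of the encoding tree; flat community labels are assumed here for brevity:

```python
import math

def attribute_anomaly_scores(attributes, communities):
    """Score each node by the Euclidean distance from its attribute vector
    to the mean attribute vector of its community.

    attributes: list of equal-length attribute vectors, one per node.
    communities: list of community ids, one per node (flat, for brevity).
    """
    # Accumulate per-community sums and counts.
    sums, counts = {}, {}
    for vec, c in zip(attributes, communities):
        acc = sums.setdefault(c, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
        counts[c] = counts.get(c, 0) + 1
    means = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    # Distance to the community mean serves as the anomaly score.
    return [
        math.sqrt(sum((v - m) ** 2 for v, m in zip(vec, means[c])))
        for vec, c in zip(attributes, communities)
    ]
```

Nodes whose attributes sit far from their community's center receive high scores; the paper's full method additionally weighs connection strength and the multiple levels of the encoding tree.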
CAO Si-Cong , SUN Xiao-Bing , BO Li-Li , WU Xiao-Xue , LI Bin , CHEN Ting , LUO Xia-Pu , ZHANG Tao , LIU Wei
2025, 36(11):5045-5061. DOI: 10.13328/j.cnki.jos.007375 CSTR: 32375.14.jos.007375
Abstract:Software vulnerabilities pose significant threats to real-world systems. In recent years, learning-based vulnerability detection methods, especially deep learning-based approaches, have gained widespread attention due to their ability to extract implicit vulnerability features from large-scale vulnerability samples. However, due to differences in features among different types of vulnerabilities and the problem of imbalanced data distribution, existing deep learning-based vulnerability detection methods struggle to accurately identify specific vulnerability types. To address this issue, this study proposes MulVD, a deep learning-based multi-class vulnerability detection method. MulVD constructs a structure-aware graph neural network (SA-GNN) that can adaptively extract local and representative vulnerability patterns while rebalancing the data distribution without introducing noise. The effectiveness of the proposed approach in both binary and multi-class vulnerability detection tasks is evaluated. Experimental results demonstrate that MulVD significantly improves the performance of existing deep learning-based vulnerability detection techniques.
QU Yu-Bin , HUANG Song , CHEN Xiang , WANG Xing-Ya , LI Long , WANG Dan , YAO Yong-Ming , JU Xiao-Lin
2025, 36(11):5062-5081. DOI: 10.13328/j.cnki.jos.007379 CSTR: 32375.14.jos.007379
Abstract:In recent years, deep learning-based vulnerability detection models have demonstrated impressive capabilities in detecting vulnerabilities. Previous research has widely explored adversarial attacks that use variable renaming to introduce disturbances into source code and evade detection. However, the effectiveness of introducing multiple disturbances through various transformation techniques has not been adequately investigated. In this study, multiple synonymous transformation operators are applied to introduce disturbances into source code. A combination optimization strategy based on genetic algorithms is proposed, enabling the selection of the source code transformation operators with the highest fitness to guide the generation of adversarial code segments capable of evading vulnerability detection. The proposed method is implemented in a framework named non-vulnerability generator (NonVulGen) and evaluated against deep learning-based vulnerability detection models. When applied to recently developed deep learning models, an average attack success rate of 91.38% is achieved against the CodeBERT-based model and 93.65% against the GraphCodeBERT-based model, representing improvements of 28.94% and 15.52% over state-of-the-art baselines, respectively. To assess the generalization ability of the proposed attack method, common models including Devign, ReGVD, and LineVul are targeted, achieving average success rates of 98.88%, 97.85%, and 92.57%, respectively. Experimental results indicate that adversarial code segments generated by NonVulGen cannot be effectively distinguished by deep learning-based vulnerability detection models. Furthermore, significant reductions in attack success rates are observed after retraining the models with adversarial samples generated based on the training data, with decreases of 96.83% for CodeBERT, 97.12% for GraphCodeBERT, 98.79% for Devign, 98.57% for ReGVD, and 97.94% for LineVul.
These findings reveal the critical challenge of adversarial attacks in deep learning-based vulnerability detection models and highlight the necessity for model reinforcement before deployment.
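The combination-optimization strategy can be pictured as a genetic search over binary masks selecting which transformation operators to apply. The sketch below is a generic GA with one-point crossover and point mutation; the caller-supplied fitness function stands in for the detector-evasion fitness, and none of this is the NonVulGen implementation:

```python
import random

def genetic_search(num_operators, fitness, pop_size=20, generations=30, seed=0):
    """Generic genetic search over bit masks of candidate transformations.

    Each individual is a mask over `num_operators` operators; `fitness`
    scores a mask (in the attack setting, e.g. the drop in the detector's
    confidence after applying the selected transformations).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(num_operators)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]           # keep the fitter half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, num_operators)
            child = a[:cut] + b[cut:]          # one-point crossover
            i = rng.randrange(num_operators)
            child[i] ^= 1                      # point mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

With a toy fitness such as `sum` (reward masks with more operators enabled), the search quickly concentrates on high-fitness masks, illustrating how operator combinations, rather than single renamings, are selected.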
LUO Shi-Yu , LI Xin-Lei , LUO Jun-Tao , WANG Xin , ZHANG Guo-Feng , CHEN Yang
2025, 36(11):5082-5101. DOI: 10.13328/j.cnki.jos.007380 CSTR: 32375.14.jos.007380
Abstract:With the widespread adoption and rapid advancement of open-source software, the maintenance of open-source software projects has become a critical phase within the software development cycle. As a globally representative developer community, GitHub hosts numerous software project repositories with similar functionalities within the same domain, creating challenges for users when selecting the appropriate project repository for use or further development. Therefore, accurate identification of project repository maintenance status holds substantial practical value. However, the GitHub platform does not provide direct metrics for assessing the maintenance status of repositories. This study proposes an automatic identification method for project repository maintenance status based on machine learning. A classification model, GitMT, has been developed and implemented to achieve this objective. By effectively integrating dynamic time series features and descriptive features, the proposed model enables accurate identification of “active” and “unmaintained” repository status. Through a series of experiments conducted on large-scale real-world data, an AUC value of 0.964 is achieved in maintenance status identification tasks. In addition, this study constructs an open-source dataset centered on the maintenance status of software project repositories—GitMT Dataset: https://doi.org/10.7910/DVN/OJ2NI3.
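For reference, the AUC reported above is the rank-based probability that a randomly chosen "active" repository receives a higher score than a randomly chosen "unmaintained" one. A minimal computation of that metric (not GitMT's feature pipeline) is:

```python
def auc(labels, scores):
    """Rank-based AUC: probability that a positive outscores a negative,
    counting ties as half a win. labels are 0/1; scores are real-valued.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.964 therefore means that for about 96.4% of active/unmaintained repository pairs, the model ranks the active one higher.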
LIU Li-Pei , MAO Jian , LIN Qi-Xiao , LYU Yu-Song , LI Jia-Wei , LIU Jian-Wei
2025, 36(11):5102-5117. DOI: 10.13328/j.cnki.jos.007387 CSTR: 32375.14.jos.007387
Abstract:Mini programs are required to provide privacy policies that inform users about the types and purposes of the privacy data being collected and used. However, inconsistencies between the underlying code and the privacy statements may occur, potentially deceiving users and leading to privacy leakage. Existing methods for detecting such inconsistencies typically rely on converting the code and policies into predefined labels for comparison. This approach introduces information loss during label conversion, resulting in underreporting. In addition, traditional code analysis methods are often ineffective against obfuscated mini program code. To address these limitations, a semantic-analysis-based method for code-to-policy consistency detection in mini programs is proposed. Customized taint analysis is utilized to capture code behaviors based on mini program coding paradigms, and a code language processing model is applied to represent these behaviors as natural language descriptions. By aligning the natural language representation of code behaviors with the stated purposes in privacy policies, expert reviewers can effectively analyze the consistency between the two. Experiments indicate that the proposed taint analysis module covers all three data return methods and four common data flow patterns within mini program APIs, achieving superior sensitivity compared with existing methods. Semantic analysis of tens of thousands of mini programs reveals privacy leakage risks associated with certain high-frequency API calls. Case studies using the MiniChecker tool further identify real-world instances of mini programs where inconsistencies between code and privacy policies are detected.
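The taint-analysis idea can be illustrated with a toy forward propagation over straight-line statements, where a privacy source taints its result and taint flows through arguments until it reaches a sink call. The statement format and the two WeChat-style API names below are illustrative assumptions, far simpler than the paper's analysis of obfuscated mini program code:

```python
def taint_flows(statements, sources, sinks):
    """Toy forward taint propagation.

    statements: list of (target, func, args) tuples in execution order,
    where args names earlier targets. Returns (sink_func, target) pairs
    reached by data originating from a privacy source.
    """
    tainted = set()
    flows = []
    for target, func, args in statements:
        if func in sources:
            tainted.add(target)                 # source call taints its result
        elif any(a in tainted for a in args):
            if func in sinks:
                flows.append((func, target))    # tainted data reaches a sink
            tainted.add(target)                 # taint propagates onward
    return flows
```

In the example below, location data flows through a formatting step into a network request, the kind of behavior that must then be matched against the privacy policy's stated purposes.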
LI Deng , WU A-Ming , HAN Ya-Hong
2025, 36(11):5118-5133. DOI: 10.13328/j.cnki.jos.007321 CSTR: 32375.14.jos.007321
Abstract:Visual-language pre-training (VLP) aims to obtain a powerful multimodal representation by learning on large-scale image-text multimodal datasets. Multimodal feature fusion and alignment is a key challenge in multimodal model training. In most existing visual-language pre-training models, the extracted visual features and text features are directly input into a Transformer model for fusion and alignment. Since the attention mechanism in the Transformer computes pairwise similarities, it is difficult to achieve alignment among multiple entities. Hyperedges in hypergraph neural networks, by contrast, connect multiple entities and encode high-order entity correlations, enabling relationships among multiple entities to be established. This study proposes a visual-language multimodal pre-training method based on multi-entity alignment with hypergraph neural networks. In this method, a hypergraph neural network learning module is introduced into the Transformer multimodal fusion encoder to learn the alignment relationships of multimodal entities, thereby enhancing the entity alignment ability of the multimodal fusion encoder in the pre-trained model. The proposed visual-language pre-training model is pre-trained on large-scale image-text datasets and fine-tuned on multiple visual-language downstream tasks such as visual question answering, image-text retrieval, visual grounding, and natural language visual reasoning. Experimental results indicate that, compared with the baseline method, the proposed method achieves performance improvements on multiple downstream tasks, including a 1.8% accuracy gain on the NLVR2 task.
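The appeal of hyperedges, relating more than two entities in one step, can be seen in a single unlearned round of hyperedge message passing: each hyperedge averages its member nodes' features, and each node then averages the features of its incident hyperedges. The paper's module is a trained hypergraph neural network; this is only the aggregation skeleton, with mean pooling assumed in place of learned weights:

```python
def hypergraph_aggregate(node_feats, hyperedges):
    """One round of node -> hyperedge -> node mean aggregation.

    node_feats: list of feature vectors, one per node.
    hyperedges: list of node-index lists; each hyperedge may connect
    any number of nodes, unlike an ordinary pairwise edge.
    """
    dim = len(node_feats[0])
    # Hyperedge features: mean of member node features.
    edge_feats = [
        [sum(node_feats[n][k] for n in edge) / len(edge) for k in range(dim)]
        for edge in hyperedges
    ]
    # Node update: mean of incident hyperedge features.
    out = []
    for i in range(len(node_feats)):
        member = [e for e, edge in enumerate(hyperedges) if i in edge]
        if not member:
            out.append(node_feats[i][:])        # isolated node kept as-is
        else:
            out.append([sum(edge_feats[e][k] for e in member) / len(member)
                        for k in range(dim)])
    return out
```

A single hyperedge over three nodes pulls all three toward a shared representation in one step, which pairwise attention would need multiple hops to achieve.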
YIN Ming , QIAO Sheng , CHEN Wei , JIANG Ji-Jiao
2025, 36(11):5134-5157. DOI: 10.13328/j.cnki.jos.007322 CSTR: 32375.14.jos.007322
Abstract:Online information comes from numerous and varied sources, and judging in a timely and accurate manner whether a piece of information is a rumor is a crucial issue in research on the cognitive domain of social media. Most previous studies have concentrated on the text content of rumors, user characteristics, or features confined to the propagation mode, ignoring the key clues offered by the collective emotions generated through users’ participation in event discussions and the emotional steady-state characteristics hidden in rumor spread. In this study, a social network rumor detection method oriented by collective emotional stabilization and integrating temporal and spatial steady-state features is proposed. Based on the text features and user behaviors in rumor propagation, the temporal and spatial steady-state features of collective emotions are combined for the first time, achieving strong expressiveness and detection accuracy. Specifically, the method takes the emotional keywords of users’ attitudes toward an event or topic as the basis and uses recurrent neural networks to construct emotional steady-state features of the temporal relationship, giving the collective emotions temporally consistent, strongly expressive features that reflect the convergence of collective emotions over time. A heterogeneous graph neural network is utilized to establish connections between users and keywords, as well as between texts and keywords, so that the collective emotions possess fine-grained spatial steady-state features. Finally, the two types of local steady-state features are fused to obtain global feature expression, and further classification yields the rumor detection results. The proposed method is evaluated on two widely used, publicly available Twitter datasets.
Compared with the best-performing method in the baselines, the accuracy is improved by 3.4% and 3.2% respectively; the T-F1 value is improved by 3.0% and 1.8% respectively; the N-F1 value is improved by 2.7% and 2.3% respectively; the U-F1 value is improved by 2.3% and 1.0% respectively.
XU Sheng , LI Pei-Feng , ZHU Qiao-Ming
2025, 36(11):5158-5177. DOI: 10.13328/j.cnki.jos.007367 CSTR: 32375.14.jos.007367
Abstract:The diversity and complexity of linguistic expressions often lead to event coreference relations being reflected as latent correlations between event mentions. Existing methods predominantly rely on semantic similarity computations based on internal event features, such as triggers and arguments, which limits their ability to address such latent correlations effectively. To overcome this limitation, an external knowledge-enhanced event coreference resolution method is proposed. This approach leverages large language models (LLMs) to generate external knowledge related to coreference, encompassing discourse coherence, logical relationships, and common sense background knowledge. First, the ultra-large language model ChatGPT is utilized to construct training data enriched with external knowledge. Next, foundational LLMs like FlanT5 are fine-tuned on this data to acquire the ability to generate coreference-related external knowledge. Finally, the fine-tuned LLM generates document-level event summaries and chain-of-thought (CoT) style coreference reasoning paths. By integrating internal event features with external knowledge, the proposed method effectively identifies event coreference. Experimental results on the KBP dataset demonstrate that the proposed method outperforms previous state-of-the-art baselines.
DING Rui-Qing , ZHAO Jun-Feng , WANG Le-Ye
2025, 36(11):5178-5196. DOI: 10.13328/j.cnki.jos.007370 CSTR: 32375.14.jos.007370
Abstract:Knowledge graphs (KGs), as structured representations of knowledge, have a wide range of applications in the medical field. Entity alignment, which involves identifying equivalent entities across different KGs, is a fundamental step in constructing large-scale KGs. Although extensive research has focused on this issue, most of it has concentrated on aligning pairs of KGs, typically by capturing the semantic and structural information of entities to generate embeddings, followed by calculating embedding similarity to identify equivalent entities. This study identifies the problem of alignment error propagation when aligning multiple KGs. Given the high accuracy requirements for entity alignment in medical contexts, this study proposes a multi-source Chinese medical knowledge graph entity alignment method (MSOI-Align) that integrates entity semantics and ontology information. The proposed method pairs multiple KGs and uses representation learning to generate entity embeddings. It also incorporates both the similarity of entity names and ontology consistency constraints, leveraging a large language model to filter a set of candidate entities. Subsequently, based on triadic closure theory and the large language model, MSOI-Align automatically identifies and corrects the propagation of alignment errors for the candidate entities. Experimental results on four Chinese medical knowledge graphs show that MSOI-Align significantly enhances the precision of the entity alignment task, with the Hits@1 metric increasing from 0.42 to 0.92 compared to the state-of-the-art baseline. The fused knowledge graph, CMKG, contains 13 types of ontologies, 190,000 entities, and approximately 700,000 triplets. Due to copyright restrictions on one of the KGs, only the fusion of the other three KGs is released, named OpenCMKG.
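The Hits@1 figure quoted above measures how often the nearest cross-graph embedding is the gold-aligned entity. A minimal cosine-similarity version is sketched below, with toy dictionary inputs assumed for illustration; real systems use learned embeddings and approximate nearest-neighbor search:

```python
import math

def cosine(u, v):
    # Cosine similarity of two vectors; 0.0 if either is the zero vector.
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def hits_at_1(source_emb, target_emb, gold):
    """Fraction of source entities whose most similar target embedding
    is the gold-aligned entity (the Hits@1 metric).

    source_emb / target_emb: dict entity name -> embedding vector.
    gold: dict source entity -> its true target-side counterpart.
    """
    hits = 0
    for s, t_gold in gold.items():
        best = max(target_emb, key=lambda t: cosine(source_emb[s], target_emb[t]))
        hits += best == t_gold
    return hits / len(gold)
```

A Hits@1 of 0.92 thus means that for 92% of entities, the single nearest neighbor across graphs is already the correct alignment.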
SUN Chen-Chen , JIN Yu-Yuan , SHEN De-Rong , NIE Tie-Zheng , KOU Yue
2025, 36(11):5197-5212. DOI: 10.13328/j.cnki.jos.007371 CSTR: 32375.14.jos.007371
Abstract:Entity alignment (EA) aims to identify equivalent entities across different knowledge graph (KG). Embedding-based EA methods still have several limitations, listed below. First, the heterogeneous structures within KGs are not fully modeled. Second, the utilization of text information is constrained by word embeddings. Third, alignment inference algorithms are underexplored. To address these limitations, this study proposes a heterogeneous graph attention network for entity alignment (HGAT-EA). HGAT-EA consists of two channels: one for learning structural embeddings and the other for learning character-level semantic embeddings. The first channel employs a heterogeneous graph attention network (HGAT), which fully leverages heterogeneous structures and relation triples to learn entity embeddings. The second channel utilizes character-level literals to learn character-level semantic embeddings. HGAT-EA incorporates multiple views through these channels and maximizes the use of