TAN Si-Wei, LU Li-Qiang, LANG Cong-Liang, CHEN Ming-Shuai, YIN Jian-Wei
2025, 36(8):3431-3443. DOI: 10.13328/j.cnki.jos.007343 CSTR: 32375.14.jos.007343
Abstract:Current quantum programs are generally represented by quantum circuits composed of various quantum gates. If a program contains gates that are directly represented as unitary matrices, these gates need to be transformed into quantum circuits composed of basic gates; this step is called quantum circuit synthesis. However, current synthesis methods may generate circuits with thousands of gates. The quality of such circuits is low, and they are very likely to output incorrect results when deployed on real, noisy quantum hardware. Moreover, when the number of qubits increases to 8, synthesis that keeps the gate count small can take weeks or even months. This study proposes a quantum circuit synthesis method that realizes fast synthesis from unitary matrices to high-quality quantum circuits. First, an iterative method is introduced to approximate the target unitary matrix by inserting circuit modules; during the iteration, a look-ahead strategy with a reward mechanism is proposed to reduce redundant quantum gates. To accelerate synthesis, a pruning method is proposed to reduce the space of candidate circuit modules: the closure of each candidate module is described to characterize its representation space, and modules are then pruned according to the overlap rate of these representation spaces, yielding a small, high-quality candidate set. Furthermore, to reduce the overhead of searching for optimal gate parameters, the selected candidates and the target unitary are packed into a uniform circuit so that the approximation distance can be obtained quickly by computing its expectation on the ground state. Experiments show that, compared with the state-of-the-art quantum circuit synthesis methods QuCT and QFAST, the proposed method reduces the number of gates to 37.0%–62.5% and achieves a 3.7–20.6 times acceleration in 5- to 8-qubit quantum circuit synthesis.
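To make the iterative scheme concrete, the following is a minimal numpy sketch of greedy unitary approximation, not the paper's implementation: candidate modules are reduced to hypothetical one-parameter generators, the look-ahead/reward strategy and the ground-state expectation trick are omitted, and distance is the plain Hilbert-Schmidt measure.

```python
import numpy as np
from scipy.linalg import expm

def hs_distance(u, v):
    """1 - |<U,V>_HS|/d: zero iff u equals v up to a global phase."""
    return 1.0 - abs(np.trace(u.conj().T @ v)) / u.shape[0]

def module(generator, theta):
    """A one-parameter circuit module exp(-i*theta/2*G) for Hermitian G."""
    return expm(-0.5j * theta * generator)

def synthesize(target, generators, max_steps=50):
    angles = np.linspace(-np.pi, np.pi, 65)
    circuit = np.eye(target.shape[0], dtype=complex)
    for _ in range(max_steps):
        # greedily prepend the candidate module that most reduces the distance
        g, t = min(((g, t) for g in generators for t in angles),
                   key=lambda c: hs_distance(target, module(*c) @ circuit))
        circuit = module(g, t) @ circuit
        if hs_distance(target, circuit) < 1e-3:
            break
    return circuit

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
target = module(X, 0.7) @ module(Z, -1.2)                # a reachable toy target
print(hs_distance(target, synthesize(target, [X, Z])))   # small residual distance
```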
LIU Zong-Xin, CHI Zhi-Ming, ZHAO Meng-Yu, HUANG Cheng-Chao, HUANG Xiao-Wei, CAI Shao-Wei, ZHANG Li-Jun, YANG Peng-Fei
2025, 36(8):3444-3461. DOI: 10.13328/j.cnki.jos.007344 CSTR: 32375.14.jos.007344
Abstract:Constraint solving is a fundamental approach to verifying deep neural networks (DNNs). In the field of AI safety, DNNs often undergo modifications to their structure and parameters for purposes such as repair or attack. In such scenarios, the problem of incremental DNN verification arises: determining whether a safety property still holds after the DNN has been modified. To address it, an incremental satisfiability modulo theories (SMT) algorithm based on the Reluplex framework is presented. The proposed algorithm, DeepInc, leverages key features of the configurations from the previous solving procedure and heuristically checks whether they can be reused to prove the correctness of the modified DNN. Experimental results demonstrate that DeepInc outperforms Marabou in efficiency in most cases. Moreover, for cases where the safety property is violated both before and after modification, DeepInc is significantly faster, even compared with the state-of-the-art verifier α,β-CROWN.
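The incremental idea can be illustrated with a deliberately tiny, self-contained toy: a one-ReLU network is checked by case-splitting the ReLU phase, and the phases that closed the previous proof are replayed first on the modified network. DeepInc does this at scale inside the Reluplex framework; the network shape, property, and branch check below are illustrative only.

```python
# Toy: y = w2*relu(w1*x + b1) + b2 over x in [lo, hi], property y <= c.
# Each ReLU phase makes the network linear, so a branch closes by checking
# the interval endpoints (a sound over-approximation of each case).
def branch_max(net, phase, lo, hi):
    w1, b1, w2, b2 = net
    out = (lambda x: w2 * (w1 * x + b1) + b2) if phase == "active" \
        else (lambda x: b2)                  # relu(z) = z or relu(z) = 0
    return max(out(lo), out(hi))             # linear => max at an endpoint

def verify(net, lo, hi, c, phases=("active", "inactive")):
    """Replay `phases` (e.g. the case splits saved from a previous proof)."""
    for phase in phases:
        if branch_max(net, phase, lo, hi) > c:
            return f"branch {phase!r} no longer closes"
    return "SAFE"

old = (1.0, 0.0, 1.0, 0.0)                   # y = relu(x)
saved = ("active", "inactive")               # splits that closed the old proof
print(verify(old, -1.0, 1.0, c=2.0))         # SAFE
new = (1.0, 0.0, 3.0, 0.0)                   # modified weights: y = 3*relu(x)
print(verify(new, -1.0, 1.0, c=2.0, phases=saved))   # replay detects the failure
```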
ZHANG Shan-Qiang, ZHANG Jing-Zhi, SHI Zhi-Ping, WANG Guo-Hui, GUAN Yong
2025, 36(8):3462-3476. DOI: 10.13328/j.cnki.jos.007345 CSTR: 32375.14.jos.007345
Abstract:The single-sphere driven balancing robot is an omnidirectional mobile robot whose flexibility is particularly evident in narrow or complex environments. When designing the kinematics and dynamics of this type of robot, it is crucial to ensure the correctness of the model. Traditional methods based on testing and simulation may not cover all system states and thus might fail to identify certain design flaws or potential safety risks. To ensure that the single-sphere driven balancing robot satisfies the correctness and safety verification requirements for safety-critical robots, a formal model of its kinematics and dynamics is constructed using the HOL Light theorem prover. The model builds on theorem libraries such as the real analysis, matrix analysis, and robot kinematics and dynamics libraries, and involves higher-order logic derivation and proof.
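For orientation, the generic template that such a kinematics/dynamics formalization encodes is the standard rigid-body equation below; the verified robot model instantiates and extends it, so this is only the textbook form, not the paper's model.

```latex
% Standard rigid-body dynamics: mass matrix M, Coriolis/centrifugal terms C,
% gravity vector G, and generalized forces tau over configuration q.
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = \tau
```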
WANG Rong, TIAN Cong, SUN Jun, YU Bin, DUAN Zhen-Hua
2025, 36(8):3477-3493. DOI: 10.13328/j.cnki.jos.007346 CSTR: 32375.14.jos.007346
Abstract:Meta-interpretive learning (MIL) is an approach within inductive logic programming (ILP) that learns a program from examples, metarules, and other background knowledge. MIL uses a depth-first, failure-driven strategy to explore candidate clauses in the program space. This mechanism, however, inevitably leads to repeated proofs of the same goals. A pruning strategy is proposed that uses Prolog's built-in database mechanism to store failed goals together with their corresponding error information, preventing redundant proof attempts. The accumulated error information also serves as a guide that helps the MIL system adjust and optimize its learning in later iterations. The correctness of the pruning algorithm is proved, and the reduction of the program space is calculated theoretically. The proposed method is applied to two existing MIL systems, Metagol and MetagolAI, resulting in two new systems, MetagolF and MetagolAI_F. Empirical results on four different tasks demonstrate that the proposed strategy significantly reduces the time required to learn the same programs.
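The pruning idea renders compactly even outside Prolog (the systems themselves use Prolog's dynamic database): failed goals are recorded with a failure reason and never re-proved. The names and the tiny rule base below are illustrative.

```python
failed = {}                                   # (goal, depth) -> failure reason

def prove(goal, rules, facts, depth):
    if goal in facts:
        return True
    if depth == 0 or (goal, depth) in failed:
        return False                          # pruned, or out of depth budget
    for body in rules.get(goal, []):          # each body is a list of subgoals
        if all(prove(g, rules, facts, depth - 1) for g in body):
            return True
    failed[(goal, depth)] = f"no rule closes {goal!r} at depth {depth}"
    return False

facts = {"parent(ann,bob)", "parent(bob,cat)"}
rules = {"grand(ann,cat)": [["parent(ann,bob)", "parent(bob,cat)"]],
         "grand(ann,dan)": [["parent(ann,bob)", "parent(bob,dan)"]]}
print(prove("grand(ann,cat)", rules, facts, depth=2))   # True
print(prove("grand(ann,dan)", rules, facts, depth=2))   # False, then cached
print(failed)                                 # failure info guides later search
```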
HE Tao, DONG Wei, WEN Yan-Jun
2025, 36(8):3494-3511. DOI: 10.13328/j.cnki.jos.007347 CSTR: 32375.14.jos.007347
Abstract:The operating system serves as the foundational platform for software, and the security of its kernel is often of paramount importance. Rust, a memory-safe programming language that has steadily gained popularity, incorporates safety mechanisms such as lifetimes, ownership, borrow checking, and RAII. Building kernels in Rust has emerged as a prominent area of research. However, systems built with Rust often include unsafe code segments, which prevent the language from offering comprehensive safety guarantees at the language level. Verifying these unsafe segments is therefore essential to ensuring the correctness and reliability of Rust-based kernels. This study proposes a method, called GhostFunc, for the combined verification of safe and unsafe code segments and applies it to a microkernel written in Rust. Different levels of abstraction are applied to the two kinds of code segments, which GhostFunc then links for combined verification. Focusing on the task management and scheduling module, the study formalizes unsafe code segments such as Arc<T> using λRust, presents the formal implementation of GhostFunc, and provides an example verification. All verification is based on theorem proving, and correctness is established in Coq using the Iris separation logic framework.
DU De-Hui, YE Zhen, ZHENG Cheng-Hang, ZHU Zhen-Zhen, LI Jia-Yun
2025, 36(8):3512-3530. DOI: 10.13328/j.cnki.jos.007348 CSTR: 32375.14.jos.007348
Abstract:Extreme scenarios in autonomous driving, together with unpredictable human behaviors, have become key factors limiting the development of autonomous driving systems (ADS). Effectively generating safety-critical scenarios is therefore crucial for enhancing ADS safety. Existing methods for generating autonomous driving scenarios mainly rely on large-scale road data collection, combining data-driven approaches with scenario generalization techniques; they are time-consuming, labor-intensive, and costly, and struggle to generate edge cases. In contrast, model-driven scenario modeling methods can construct logical scenario models that encapsulate complex scene features and effectively generate safety-critical scenarios. The challenge lies in designing a domain-knowledge-based visual scenario modeling language that supports abstract modeling of complex driving scenarios and further exploration of edge-critical scenarios. To address these issues, this study proposes an approach to scenario modeling with SML4ADS2.0 and edge-critical scenario generation for autonomous driving systems. The method models ADS-domain scenarios with an ontology and combines formal quantitative evaluation with importance sampling to generate edge-critical scenarios. First, a model-driven scenario modeling method based on SML4ADS2.0 is proposed, and the language is used to construct models of autonomous driving scenarios. Second, scenario models are converted to stochastic hybrid automata (SHA) through model transformation rules, and the model checking tool UPPAAL-SMC is used for quantitative evaluation and analysis of the scenario models. Importance sampling is then employed to rapidly detect edge scenarios within the scenario space, effectively generating concrete edge-critical scenarios from logical models. Finally, the method is demonstrated through case studies of typical scenarios such as lane changing and overtaking. Experimental results indicate that the approach can effectively model scenarios and generate safety-critical scenarios for ADS.
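The importance-sampling step admits a compact generic sketch: sample scenario parameters from a proposal distribution concentrated near suspected edge cases and reweight by the likelihood ratio so the estimate stays unbiased under the nominal distribution. The braking-deceleration example and all distributions below are assumptions for illustration, not the paper's setup.

```python
import numpy as np
from scipy.stats import norm

def edge_probability(is_edge, p, q, n=100_000):
    xs = q.rvs(size=n)                 # sample from the shifted proposal q
    w = p.pdf(xs) / q.pdf(xs)          # likelihood ratio keeps it unbiased
    return float(np.mean(is_edge(xs) * w))

# nominal braking deceleration ~ N(4,1); proposal shifted toward the risky tail
p, q = norm(4, 1), norm(2, 1)
prob = edge_probability(lambda x: x < 1.5, p, q)   # P[deceleration < 1.5]
print(prob)
```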
ZUO Zheng-Kang, LIU Zeng-Xin, KE Yu-Han, YOU Zhen, WANG Chang-Jing
2025, 36(8):3531-3553. DOI: 10.13328/j.cnki.jos.007349 CSTR: 32375.14.jos.007349
Abstract:Dynamic order-statistic tree structures combine the features of dynamic sets, order statistics, and search trees, supporting efficient data retrieval. They are widely used in fields such as database systems, memory management, and file management. However, current research focuses primarily on structural invariants, such as balance, while neglecting functional correctness. In addition, existing work mainly relies on manual derivation or interactive mechanized verification of specific algorithms, lacking mature, reliable, and general verification frameworks and exhibiting a low level of automation. To address this, a functional modeling and automated verification framework for dynamic order-statistic search tree structures is designed on top of Isabelle, and a verified library of general lemmas is established to reduce developers' verification time and cost. Using the functional modeling framework, unbalanced binary search trees, balanced binary search trees (exemplified by red-black trees), and balanced multi-way search trees (exemplified by 2–3 trees) are instantiated as demonstration cases. With the automated verification framework, these instantiations can be verified automatically by simply applying induction and invoking the auto method once, or by using the try command. This provides a reference for the automated verification of functional and structural correctness of complex data structure algorithms.
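The core operation these structures support is rank selection via a subtree-size field; the plain Python sketch of OS-SELECT below shows the invariant that each verified variant (plain BST, red-black tree, 2–3 tree) maintains. It is illustrative only, not the Isabelle model.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)   # the augmenting invariant

def size(node):
    return node.size if node else 0

def select(node, i):
    """Return the i-th smallest key (1-indexed) in O(height) time."""
    rank = size(node.left) + 1
    if i == rank:
        return node.key
    return select(node.left, i) if i < rank else select(node.right, i - rank)

t = Node(5, Node(2, Node(1), Node(3)), Node(8))
print(select(t, 4))   # -> 5, the 4th smallest key
```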
YU Tao, WANG Shan-Shan, XU Qian-Hui, DONG Xiao-Han, HU Dai-Jin, LUO Jie, YANG Yi-Long, LYU Jiang-Hua, MA Dian-Fu
2025, 36(8):3554-3569. DOI: 10.13328/j.cnki.jos.007350 CSTR: 32375.14.jos.007350
Abstract:The synchronous dataflow language Lustre is commonly used in the development of safety-critical systems. However, existing official code generators and the SCADE KCG code generator have not been formally verified, and their inner workings remain opaque to users. In recent years, translation validation methods that indirectly verify compiler correctness by proving the equivalence between source and target code have proven successful. This study proposes a trusted compilation method for the Lustre language, based on a pushdown automaton compilation approach and a semantic consistency verification method. The proposed method successfully implements a trusted compiler from Lustre to C and rigorously proves the translation process’s correctness using Isabelle.
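The compilation shape such a pass targets can be sketched in a few lines: a Lustre node becomes a state-plus-step-function pair. Below is a toy rendering (in Python for illustration; the verified compiler emits C) of the node n = 0 -> pre n + x.

```python
def make_node():
    """State + step pair compiled from the Lustre node: n = 0 -> pre n + x."""
    state = {"init": True, "pre_n": 0}
    def step(x):
        n = 0 if state["init"] else state["pre_n"] + x   # -> picks the branch
        state["init"], state["pre_n"] = False, n          # advance the stream
        return n
    return step

step = make_node()
print([step(x) for x in [5, 1, 2]])   # -> [0, 1, 3]
```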
XU Jia-Le, WANG Shu-Ling, LI Li-Ming, ZHAN Bo-Hua, LYU Yi, DAI Yi-Bo, CUI She-Cheng, WU Peng, TAN Yu, ZHANG Xue-Jun, ZHAN Nai-Jun
2025, 36(8):3570-3586. DOI: 10.13328/j.cnki.jos.007351 CSTR: 32375.14.jos.007351
Abstract:Operating system (OS) kernels serve as the foundation for designing safety-critical software systems. The correct functioning of computer systems depends on the correctness of the underlying OS kernels, making their formal verification a critical task. However, behaviors inherent in OS kernels, such as multi-task concurrency, data sharing, and race conditions, pose significant challenges for formal verification. In recent years, theorem-proving methods have been widely applied to verify the functionality of OS kernel modules, achieving notable successes. The capability-based access control module in OS kernels provides fine-grained access control, designed to prevent unauthorized users from accessing kernel resources and services. Its implementation involves per-task capability spaces, which form a set of tree structures. Each capability node includes nested, complex data structures, and capability functions frequently access, modify, and recursively delete capability spaces. These factors make the formal verification of capability-based access control significantly more challenging than that of other OS modules. This study employs concurrent separation logic with refinement (CSL-R) to verify the functional correctness of a capability-based access control module in the aerospace embedded domain. The verification establishes refinement between the API functions of the capability module and their abstract specifications. First, the capability data structure is formally modeled, followed by the definition of a global invariant that ensures the consistency of capability spaces. Next, the preconditions and postconditions of internal functions and the abstract specifications of API functions are defined to capture functional correctness. Finally, the refinement between the C implementation of the API functions and their abstract specifications is rigorously proven. All definitions and proofs are formalized in the Coq theorem prover. During verification, errors were identified in the C implementation and subsequently confirmed and corrected by the OS kernel designers.
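An illustrative model (not the verified C implementation) of the data structure described above: each capability node carries rights over a kernel object, derived capabilities form a tree, and revocation recursively deletes the derived subtree.

```python
class CapNode:
    def __init__(self, rights, obj):
        self.rights = rights            # e.g. {"read", "write"}
        self.obj = obj                  # kernel object this capability grants
        self.children = []              # capabilities derived from this one

    def derive(self, rights):
        assert rights <= self.rights    # a child can never gain rights
        child = CapNode(rights, self.obj)
        self.children.append(child)
        return child

    def revoke(self):
        """Recursively delete every capability derived from this node."""
        for child in self.children:
            child.revoke()
        self.children.clear()
        self.rights = set()

root = CapNode({"read", "write"}, obj="frame_42")
child = root.derive({"read"})
grand = child.derive({"read"})
child.revoke()                          # recursively strips grand as well
```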
WANG Jia-Wan, LIU Xi-Tong, BU Lei, LI Xuan-Dong
2025, 36(8):3587-3603. DOI: 10.13328/j.cnki.jos.007352 CSTR: 32375.14.jos.007352
Abstract:Cyber-physical systems (CPS) are widely employed in safety-critical domains, making their safety assurance a critical concern. Formal verification is an effective approach for proving system safety but encounters difficulties on complex real-world CPS. Falsification has been proposed as an alternative that aims to demonstrate system unsafety by identifying counterexample behaviors violating specified safety properties. Existing path-oriented falsification methods for CPS use divide-and-conquer strategies to explore system behaviors along individual paths, effectively uncovering unsafe behaviors. In large-scale CPS, however, these methods are hindered by the path explosion problem: the number of paths grows exponentially with the number of discrete system modes, sharply reducing falsification efficiency and performance. To address this issue, this study introduces two techniques based on system models and specifications: path filtering and dynamic path selection. First, a specification-driven path filtering method rapidly eliminates paths unlikely to contain unsafe behaviors by analyzing the syntax tree of the system specification and the behavioral constraints of each path. Second, a multi-armed bandit algorithm guides the dynamic selection of paths during falsification, adaptively reallocating computational resources to optimize performance. Experimental evaluations on several classical benchmarks demonstrate that the proposed techniques effectively mitigate path explosion and significantly improve the efficiency and performance of CPS falsification; the success rate of identifying unsafe system behaviors more than doubles.
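The dynamic path-selection step can be sketched with the standard UCB1 bandit rule: paths whose trials get closer to violating the property earn higher reward and receive more of the falsification budget. try_path and the reward scale below are illustrative stand-ins for the paper's per-path falsification trial.

```python
import math, random

def falsify(paths, budget, try_path):
    pulls = {p: 0 for p in paths}
    total = {p: 0.0 for p in paths}
    for t in range(1, budget + 1):
        # explore untried paths first, else take the best upper confidence bound
        fresh = [p for p in paths if pulls[p] == 0]
        p = random.choice(fresh) if fresh else max(
            paths, key=lambda q: total[q] / pulls[q]
            + math.sqrt(2 * math.log(t) / pulls[q]))
        reward, counterexample = try_path(p)   # one falsification trial
        if counterexample is not None:
            return counterexample              # unsafe behavior found
        pulls[p] += 1
        total[p] += reward
    return None

trial = lambda p: (random.random(), None)      # toy stand-in for a real trial
print(falsify(["p1", "p2", "p3"], budget=100, try_path=trial))
```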
ZHANG Zhuo-Ruo, CHANG Rui, YANG Shen-Yi, CHEN Fang
2025, 36(8):3604-3636. DOI: 10.13328/j.cnki.jos.007353 CSTR: 32375.14.jos.007353
Abstract:As an emerging system programming language with a focus on safety, Rust ensures memory and concurrency safety through its innovative ownership model and borrowing mechanism. Despite its design for safety, existing research has identified several safety challenges that Rust still faces. Formal verification, grounded in rigorous mathematical principles, provides robust assurances for enhancing Rust’s security. By constructing precise and well-defined semantic models, it becomes possible to formally prove that programs adhering to Rust’s type system meet safety requirements. Automated verification tools for Rust further enable users to validate the safety and correctness of their programs. This study presents a comprehensive and systematic analysis of formal verification approaches for Rust. It begins by introducing Rust’s core semantics and complex features, followed by an exploration of research and verification efforts in Rust’s formal semantics, emphasizing the role and potential of Rust’s type system in formal verification. Next, the study systematically compares the capabilities, supported language features, underlying techniques, and application scenarios of various automated Rust verification tools. These comparisons are instrumental in guiding tool selection and integration within real-world Rust development workflows. In addition, this study summarizes exemplary cases of verified Rust programs, demonstrating the significant impact of formal verification in ensuring program correctness and providing practical tool usage recommendations for developers. Finally, it discusses the key challenges in the field and outlines promising directions for future research, including the verification of unsafe Rust code, concurrent code verification, trustworthy compilation, and large model-driven formal verification. This study aims to establish a strong security foundation for the Rust community and foster the broader adoption of formal verification methods in Rust development.
TIAN Li-Li, DU De-Hui, NIE Ji-Hui, CHEN Yi-Kang, LI Ying-Da
2025, 36(8):3637-3654. DOI: 10.13328/j.cnki.jos.007354 CSTR: 32375.14.jos.007354
Abstract:With the rapid advancement of intelligent cyber-physical systems (ICPS), intelligent technologies are increasingly used in components such as perception, decision-making, and control. Among them, deep reinforcement learning (DRL) is widely applied in ICPS control components owing to its effectiveness in managing complex and dynamic environments. However, the openness of the operating environment and the inherent complexity of ICPS force the exploration of a highly dynamic state space during learning, often resulting in inefficient and poorly generalizing decisions. A common remedy is to abstract a large-scale, fine-grained Markov decision process (MDP) into a smaller-scale, coarse-grained MDP, reducing computational complexity and enhancing solution efficiency. Nonetheless, existing methods fail to adequately ensure consistency among the spatiotemporal semantics of the original states, the abstracted system space, and the real system space. To address these challenges, this study proposes a causal spatiotemporal semantic-driven abstraction modeling method for deep reinforcement learning. First, causal spatiotemporal semantics are introduced to capture the distribution of value changes across time and space. Based on these semantics, a two-stage semantic abstraction of the states constructs an abstract MDP model of the deep reinforcement learning process. Abstraction optimization techniques then fine-tune the abstract model, minimizing semantic discrepancies between abstract states and their corresponding detailed states. Finally, extensive experiments are conducted on lane keeping, adaptive cruise control, and intersection crossing scenarios, and the proposed model is evaluated and analyzed with the PRISM verifier. The results indicate that the proposed abstraction modeling technique is superior in abstraction expressiveness, accuracy, and semantic equivalence.
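A generic sketch of the abstraction step (the causal spatiotemporal semantics themselves do not fit in a few lines): concrete states are mapped to abstract states by a semantic feature function phi, and the abstract MDP's transitions are estimated from logged trajectories. phi and the trajectory format below are assumptions for illustration.

```python
from collections import defaultdict

def build_abstract_mdp(trajectories, phi):
    """Estimate abstract transitions: (phi(s), a) -> {phi(s'): probability}."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for s, a, s_next in traj:
            counts[(phi(s), a)][phi(s_next)] += 1
    return {sa: {t: n / sum(nxt.values()) for t, n in nxt.items()}
            for sa, nxt in counts.items()}

trajs = [[((0.0, 0.1), "keep", (0.2, 0.1)),
          ((0.2, 0.1), "keep", (0.6, 0.2))]]
phi = lambda s: (round(s[0]), round(s[1]))   # illustrative spatial binning
print(build_abstract_mdp(trajs, phi))
```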
LI Rui-Zhi, HE Jin-Tao, OUYANG Dan-Tong
2025, 36(8):3655-3676. DOI: 10.13328/j.cnki.jos.007234 CSTR: 32375.14.jos.007234
Abstract:The minimum weakly connected dominating set problem is a classic NP-hard problem with wide applications in various fields. This study proposes an efficient local search algorithm for it. The algorithm constructs an initial solution based on locked vertices and frequency feedback, ensuring the inclusion of vertices that are certainly or very likely in the optimal solution and thus yielding a high-quality initial solution. Furthermore, a cycle-avoidance method is introduced based on two-hop configuration checking, age properties, and tabu strategies, and a perturbation strategy enables the algorithm to escape local optima effectively. Additionally, effective vertex selection methods help the algorithm choose vertices to add to or remove from the candidate solution by combining two scoring functions, Dscore and Nscore, with the cycle-avoidance strategies. Finally, the proposed local search algorithm is evaluated on four benchmark test instances and compared with four state-of-the-art algorithms and the CPLEX solver. Experimental results demonstrate that the proposed algorithm achieves better performance.
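As a concrete anchor, the feasibility test that any search for this problem relies on is shown below: a set S is a weakly connected dominating set when S dominates every vertex and the subgraph formed by the edges touching S is connected on N[S]. This is the standard definition, not the paper's algorithm; S is assumed non-empty.

```python
from collections import deque

def is_wcds(adj, s):
    """adj: vertex -> set of neighbors; s: non-empty candidate set."""
    s = set(s)
    closed = s | {u for v in s for u in adj[v]}   # N[S]
    if closed != set(adj):                        # must dominate all vertices
        return False
    # BFS restricted to edges with at least one endpoint in S
    start = next(iter(s))
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if (v in s or u in s) and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == closed                         # weakly induced graph connected

adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(is_wcds(adj, {2, 3}))   # True: {2,3} dominates 1..4 and is connected
```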
TIAN Xin-Liang, OUYANG Dan-Tong, ZHOU Hui-Si, JIANG Lu-Yu, TAI Ran, ZHANG Li-Ming
2025, 36(8):3677-3692. DOI: 10.13328/j.cnki.jos.007241 CSTR: 32375.14.jos.007241
Abstract:The minimum load coloring problem (MLCP) is an important NP-complete problem arising from wavelength division multiplexing (WDM), a technology used to build optical communication networks. Since the solution spaces of NP-complete problems grow exponentially with problem size, heuristic algorithms are often used to solve them. A survey of existing research shows that, among heuristic algorithms for the MLCP, local search algorithms perform best. This study proposes two optimization strategies to overcome the limitations of existing local search algorithms in data preprocessing and neighborhood search. First, during data preprocessing, a one-degree vertex rule is proposed to reduce the data size and thus the search space of the MLCP. Second, in the search phase, a strategy termed two-stage best from multiple selections (TSBMS) helps local search algorithms efficiently select a high-quality neighborhood solution for neighborhood spaces of different sizes, effectively improving their performance on data of different scales. The optimized local search algorithm is named IRLTS. Seventy-four classic test instances are used to validate its effectiveness. Experimental results demonstrate that IRLTS outperforms the three best-performing local search algorithms on most instances in terms of both optimal and average solutions. Furthermore, the effectiveness of the proposed strategies is validated experimentally, and the influence of key parameters on IRLTS is analyzed.
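The spirit of the one-degree preprocessing rule can be sketched as iterated peeling of degree-1 vertices; the actual MLCP rule also prescribes how the peeled vertices' load contribution is accounted for, which this sketch omits.

```python
def peel_degree_one(adj):
    """Repeatedly strip degree-1 vertices to shrink the instance."""
    adj = {v: set(ns) for v, ns in adj.items()}
    stack = [v for v, ns in adj.items() if len(ns) == 1]
    while stack:
        v = stack.pop()
        if v not in adj or len(adj[v]) != 1:     # may have changed meanwhile
            continue
        (u,) = adj.pop(v)                        # v's unique neighbor
        adj[u].discard(v)
        if len(adj[u]) == 1:                     # u may now be peelable too
            stack.append(u)
    return adj

adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(peel_degree_one(adj))   # the path graph peels down to one isolated vertex
```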
HE Xian-Hao, HU Yi-Kun, LI Yi-Chen, YAN Yu-Wei, LYU Yi-Sheng, LIAO Qing, LI Yong, LI Ken-Li
2025, 36(8):3693-3708. DOI: 10.13328/j.cnki.jos.007238 CSTR: 32375.14.jos.007238
Abstract:As cities continue to grow, urban transportation systems face increasing challenges, such as traffic congestion and traffic safety. Traffic simulation is a method for solving urban traffic problems: it uses virtual-real computing technologies to process real-time traffic data and optimize urban traffic efficiency, and it is an important means of realizing the parallel-city paradigm in intelligent transportation. However, traditional computing systems often suffer from insufficient computing resources and long simulation delays when running large-scale urban traffic simulations. To address these problems, this study proposes a parallel traffic simulation algorithm for parallel cities based on the parallel-city theory and the heterogeneous architecture of China's new-generation supercomputer, Tianhe. The algorithm accurately simulates traffic elements such as vehicles, roads, and traffic signals, and applies road network partitioning, parallel vehicle driving, and parallel signal control to achieve high-performance traffic simulation. Running on a Tianhe platform with 16 nodes and more than 25000 cores, it simulates real traffic scenarios involving 2.4 million vehicles, 7797 intersections, and 170000 lanes within Beijing's Fifth Ring Road. Compared with traditional single-node simulation, the proposed algorithm reduces the simulation time per step from 2.21 s to 0.37 s, a nearly 6-fold speedup. An urban traffic simulation at the scale of one million vehicles has thus been implemented on a domestic heterogeneous supercomputing platform.
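The partitioned stepping loop has a familiar schematic shape, sketched below with mpi4py: each rank advances the vehicles on its share of the road network, then vehicles that crossed a partition boundary are exchanged all-to-all. The partition function, vehicle model, and movement rule are all illustrative; the paper's implementation targets Tianhe's heterogeneous runtime.

```python
from dataclasses import dataclass
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

@dataclass
class Vehicle:
    vid: int
    lane: int
    pos: float

def owner(lane):                      # illustrative partition: lanes by modulo
    return lane % nprocs

def step(local_vehicles, dt=0.1, speed=10.0):
    outboxes = [[] for _ in range(nprocs)]
    for v in local_vehicles:
        v.pos += speed * dt           # stand-in for car-following/signal logic
        if v.pos > 100.0:             # end of lane: move onto the next lane
            v.lane, v.pos = v.lane + 1, 0.0
        outboxes[owner(v.lane)].append(v)
    # all-to-all exchange of vehicles that crossed partition boundaries
    return [v for box in comm.alltoall(outboxes) for v in box]
```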
ZHAO Hao-Jun, ZOU De-Qing, XUE Wen-Jie, WU Yue-Ming, JIN Hai
2025, 36(8):3709-3725. DOI: 10.13328/j.cnki.jos.007253 CSTR: 32375.14.jos.007253
Abstract:Software concept drift refers to changes over time in the structure and composition of software of the same type. In malware classification, it means that the structural and compositional characteristics of samples from the same family change over time, degrading the performance of fixed-pattern classification algorithms. Existing static malware classification methods suffer significant performance degradation under concept drift, making it difficult to meet practical needs. To address this problem, and given the commonalities between natural language understanding and binary byte-stream analysis, a highly accurate and robust malware classification method is proposed based on BERT and a custom autoencoder architecture. The method extracts execution-oriented opcode sequences through disassembly to reduce redundant information, then uses BERT to understand the contextual semantics of the sequences and embed them as vectors, effectively capturing the deep program semantics of malware samples. Task-relevant features are screened through geometric-median subspace projection and bottleneck autoencoders, and a classifier composed of fully connected layers outputs the result. The practical effectiveness of the method is validated through comparative experiments with nine state-of-the-art malware classification methods in both normal and concept drift scenarios. Experimental results show that the proposed method achieves an F1 score of 99.49% in normal scenarios, outperforming all nine methods, and improves the F1 score by 10.78%–43.71% over them under concept drift.
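The bottleneck autoencoder used for feature screening follows a standard pattern, sketched below in PyTorch with illustrative dimensions (a 768-dimensional BERT embedding is assumed); the geometric-median subspace projection that precedes it in the pipeline is omitted.

```python
import torch
import torch.nn as nn

class BottleneckAE(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                     nn.Linear(256, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 256), nn.ReLU(),
                                     nn.Linear(256, dim))

    def forward(self, x):
        z = self.encoder(x)            # compressed, task-relevant code
        return self.decoder(z), z

model = BottleneckAE()
x = torch.randn(32, 768)               # a batch of BERT sequence embeddings
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)   # reconstruction objective
```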
YUAN Bin, REN Jia-Jun, CHEN Qun-Jin-Ming, ZHANG Chi, ZOU De-Qing, JIN Hai
2025, 36(8):3726-3743. DOI: 10.13328/j.cnki.jos.007260 CSTR: 32375.14.jos.007260
Abstract:Fuzz testing automatically uncovers vulnerabilities in software, but existing fuzzing tools for network protocols cannot fully explore a protocol's internal state space, resulting in limited coverage. Finite state machines comprehensively model network protocol implementations, providing an in-depth view of system behavior and the internal state space. This study proposes a fuzzing method for network protocols based on finite state machines, focusing on the widely used TLS protocol: state machine learning is used to model the TLS implementation, reflecting the protocol's internal state space and system behavior, and fuzzing guided by the learned state machine then reaches greater depth and broader code coverage. A prototype system, SNETFuzzer, is implemented; in a series of comparative experiments, it outperforms existing methods on key metrics such as coverage. SNETFuzzer discovers multiple vulnerabilities, including two new ones, demonstrating its practicality and effectiveness.
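State-machine-guided input generation can be pictured as follows: walk the learned FSM to reach a deep protocol state, then mutate the message that reaches it. The three-state FSM fragment and mutate function below are illustrative, not SNETFuzzer's interface or the real TLS state machine.

```python
import random

def mutate(msg):
    """Toy byte-level mutation of one message."""
    i = random.randrange(len(msg))
    return msg[:i] + chr(random.randrange(256)) + msg[i + 1:]

fsm = {("INIT", "ClientHello"): "HELLO_SENT",
       ("HELLO_SENT", "Certificate"): "CERT_RCVD",
       ("CERT_RCVD", "Finished"): "ESTABLISHED"}

def guided_sequence(fsm, start="INIT", depth=3):
    """Walk the learned FSM to a deep state, then fuzz the last message."""
    state, seq = start, []
    for _ in range(depth):
        moves = [(msg, nxt) for (s, msg), nxt in fsm.items() if s == state]
        if not moves:
            break
        msg, state = random.choice(moves)
        seq.append(msg)
    if seq:
        seq[-1] = mutate(seq[-1])
    return seq

print(guided_sequence(fsm))   # reaches a deep state, fuzzes the final message
```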
ZHAO Xian-Lin, PAN Xing-Lu, ZOU Yan-Zhen, LIU Chen-Xiao, XIE Bing
2025, 36(8):3744-3768. DOI: 10.13328/j.cnki.jos.007252 CSTR: 32375.14.jos.007252
Abstract:Code comment generation is an important research task in software engineering. Mainstream methods train deep learning models to generate comments and rely on metrics such as BLEU to evaluate comment quality on open code comment datasets. Such evaluations mainly reflect the similarity between generated comments and manual reference comments in the datasets. However, the quality of the reference comments in open datasets varies widely, leading to growing doubts about the effectiveness of these metrics. For code comment generation, there is therefore an urgent need for direct and effective methods to evaluate comment quality; such methods can both improve the quality of open comment datasets and strengthen the evaluation of generated comments. This study surveys and analyzes existing quantifiable methods for code comment quality evaluation and applies a set of multi-dimensional metrics to directly evaluate the quality of comments in mainstream open datasets, comments generated by traditional methods, and comments generated by ChatGPT. The study reveals the following findings. 1) The quality of comments in mainstream open datasets needs improvement, with issues such as inaccuracy, poor readability, excessive simplicity, and a lack of useful information. 2) Comments generated by traditional methods are more lexically and semantically similar to the code but lack information that is more useful to developers, such as the high-level intent of the code. 3) One important reason for the low BLEU scores of generated comments is the large number of poor-quality reference comments in the datasets, which lack relevance to the code or exhibit poor naturalness; such reference comments should be filtered or improved. 4) Comments generated by LLMs such as ChatGPT are rich in content but tend to be lengthy; their evaluation needs to be tailored to developer intent and specific scenarios. Based on these findings, the study offers several suggestions for future research on comment generation and comment quality evaluation.
GUO Na, WANG Ya-Qi, JIANG Hao-Nan, GU Yu, XIA Xiu-Feng
2025, 36(8):3769-3786. DOI: 10.13328/j.cnki.jos.007220 CSTR: 32375.14.jos.007220
Abstract:Learned indexes are assisting or gradually replacing traditional index structures thanks to their low memory usage and high query performance. However, the online retraining triggered by data updates prevents them from adapting to workloads with frequent updates. To avoid index reconstruction under frequent updates without significantly increasing memory consumption, this study proposes an adaptive update-distribution-aware learned index named DRAMA. It uses an LSM-Tree-like delayed learning method to actively learn the characteristics of the update distribution, approximate fitting techniques to quickly build the update-distribution model, a model merging strategy to replace frequent retraining, and a hybrid compression technique to reduce the memory footprint of model parameters. The index is constructed and validated on both real and synthetic datasets. The results show that, compared with traditional indexes and state-of-the-art learned indexes, the proposed index effectively reduces query latency in update-heavy environments without additional memory consumption.
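The generic learned-index lookup pattern that such designs build on: a model predicts a key's slot, and a bounded local search around the prediction corrects the model's error. The least-squares segment below is a sketch of this pattern, not DRAMA's structure, which layers update-awareness, model merging, and compression on top.

```python
import bisect, math

class LearnedSegment:
    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)                       # fit position ~ slope*key + b
        xbar, ybar = sum(self.keys) / n, (n - 1) / 2
        sxx = sum((k - xbar) ** 2 for k in self.keys) or 1.0
        self.slope = sum((k - xbar) * (i - ybar)
                         for i, k in enumerate(self.keys)) / sxx
        self.intercept = ybar - self.slope * xbar
        self.err = math.ceil(max(abs(i - self.predict(k))     # max training error
                                 for i, k in enumerate(self.keys))) + 1

    def predict(self, key):
        return self.slope * key + self.intercept

    def lookup(self, key):
        pos = round(self.predict(key))
        lo = max(0, pos - self.err)              # error-bounded search window
        hi = min(len(self.keys), pos + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

seg = LearnedSegment([3, 7, 9, 20, 21, 40])
print(seg.lookup(20))   # -> 3, the slot holding key 20
```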
LI Yun, GAO Ya, YAO Zhi-Xiu, XIA Shi-Chao, WU Guang-Fu
2025, 36(8):3787-3801. DOI: 10.13328/j.cnki.jos.007239 CSTR: 32375.14.jos.007239
Abstract:Traffic flow prediction is an important foundation and active research direction for traffic management in intelligent transportation systems (ITS). Traditional methods typically rely on a large amount of high-quality historical observation data to achieve accurate predictions, but their accuracy drops significantly in the common case of data-scarce traffic networks. To address this problem, a transfer learning model based on spatial-temporal graph convolutional networks (TL-STGCN) is proposed, which leverages traffic flow features from a data-rich source network to help predict future traffic flow in a data-scarce target network. First, a spatial-temporal graph convolutional network with temporal attention learns the spatial and temporal features of traffic flow in both the source and target networks. Second, domain-invariant spatial-temporal features are extracted from the two networks' representations using transfer learning techniques. Finally, these domain-invariant features are used to predict the target network's future traffic flow. Experiments on real-world datasets show that TL-STGCN outperforms existing methods, achieving the lowest mean absolute error, root mean square error, and mean absolute percentage error, demonstrating that it provides more accurate predictions for data-scarce traffic networks.
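One spatial-temporal block in the spirit of ST-GCN can be sketched in PyTorch as a graph convolution over sensors followed by a temporal convolution over time steps; shapes are illustrative, and TL-STGCN adds temporal attention and the transfer components on top of such blocks.

```python
import torch
import torch.nn as nn

class STBlock(nn.Module):
    def __init__(self, a_hat, c_in, c_out, kernel_t=3):
        super().__init__()
        self.a_hat = a_hat                          # normalized adjacency (N, N)
        self.theta = nn.Linear(c_in, c_out)         # spatial graph convolution
        self.tconv = nn.Conv2d(c_out, c_out, (kernel_t, 1),
                               padding=(kernel_t // 2, 0))

    def forward(self, x):                           # x: (batch, T, N, c_in)
        h = torch.relu(self.theta(self.a_hat @ x))  # aggregate sensor neighbors
        h = self.tconv(h.permute(0, 3, 1, 2))       # convolve along time
        return torch.relu(h).permute(0, 2, 3, 1)    # back to (batch, T, N, c_out)

block = STBlock(torch.eye(6), c_in=1, c_out=16)     # toy 6-sensor adjacency
y = block(torch.randn(8, 12, 6, 1))                 # -> shape (8, 12, 6, 16)
```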
ZHANG Zi-Long, JIA Lin-Peng, JIANG Shuo-Xuan, SUN Yi
2025, 36(8):3802-3830. DOI: 10.13328/j.cnki.jos.007269 CSTR: 32375.14.jos.007269
Abstract:Rollup is an emerging off-chain transaction processing solution for blockchains. As applications develop, the need for interoperability among different types of Rollups keeps growing. Existing cross-Rollup interoperability solutions typically rely on third-party service providers, which introduces security risks such as new trust assumptions and single points of failure. Completing interoperability over the native chain requires no new trust assumptions, but it consumes the native chain's computing and storage resources and reduces its transaction throughput, severely limiting cross-Rollup performance. This study therefore proposes a cross-Rollup mechanism based on the native blockchain that aggregates and processes transactions in batches, effectively reducing the average on-chain computation and storage cost per transaction. Specifically, a transaction validity proof scheme based on zero-knowledge proofs significantly reduces the on-chain cost of validity verification; a transaction storage scheme based on index-table data compression reduces the average on-chain storage overhead of cross-Rollup transactions; and an aggregation scale balancing algorithm finds the optimal aggregation scale, balancing on-chain resource consumption against processing latency. Finally, the proposed solution is validated experimentally. The results demonstrate that, under fully trustless conditions, it reduces on-chain computing and storage overheads while balancing on-chain resource consumption and processing latency, and it achieves good system throughput compared with existing cross-Rollup solutions.
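The aggregation-scale trade-off can be made concrete with a toy cost model: a larger batch amortizes the fixed on-chain cost of one proof and commitment over more transactions but waits longer to fill. All constants below are illustrative, not the paper's measured parameters.

```python
def per_tx_cost(batch, fixed_onchain=100.0, per_tx_onchain=1.0,
                arrival_rate=50.0, latency_weight=0.2):
    onchain = fixed_onchain / batch + per_tx_onchain   # amortized proof cost
    latency = batch / arrival_rate                     # time to fill the batch
    return onchain + latency_weight * latency

best = min(range(1, 2001), key=per_tx_cost)            # the balanced batch size
print(best, per_tx_cost(best))
```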
WU Wen-Hao, ZHANG Lei-Lei, PAN Heng, LI En-Han, ZHOU Jian-Er, LI Zhen-Yu
2025, 36(8):3831-3857. DOI: 10.13328/j.cnki.jos.007249 CSTR: 32375.14.jos.007249
Abstract:Traditional detection and defense mechanisms for distributed denial-of-service (DDoS) attacks require traffic mirroring, collection, and centralized remote analysis, which introduces extra performance overhead and cannot achieve real-time protection in high-performance networks. With the development of network devices such as programmable switches, the programmable data plane has emerged as a solid foundation for high-performance DDoS attack detection. However, existing detection methods based on the programmable data plane cannot guarantee accuracy and are difficult to deploy directly on programmable switches (such as Intel Tofino) because of programming constraints. To this end, this study proposes a programmable switch-based mechanism for detecting and defending against DDoS attacks. The mechanism uses the difference between the entropies of source and destination addresses to determine whether a DDoS attack is underway; when one is detected, a traffic filtering mechanism based on the difference in per-address counts between sources and destinations defends against the attack in real time. Experimental results indicate that the proposed mechanism effectively identifies and defends against DDoS attacks: compared with the benchmark method, the accuracy of window-level attack detection increases by 17.75% on average, and the accuracy of packet-level attack filtering increases by 3.7% on average.
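The detection signal is easy to sketch: per window, compare the entropy of source addresses with that of destination addresses. A flood from many (often spoofed) sources onto few victims pushes source entropy up and destination entropy down. The threshold below is illustrative, and the switch implementation approximates this arithmetic within data-plane constraints.

```python
import math
from collections import Counter

def entropy(addresses):
    total = len(addresses)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(addresses).values())

def is_ddos_window(src_addrs, dst_addrs, threshold=2.0):
    return entropy(src_addrs) - entropy(dst_addrs) > threshold

normal = ["10.0.0.%d" % (i % 5) for i in range(100)]
flood = ["1.2.%d.%d" % (i // 256, i % 256) for i in range(100)]
victim = ["10.0.0.1"] * 100
print(is_ddos_window(normal, normal))   # False: balanced entropies
print(is_ddos_window(flood, victim))    # True: many sources, one victim
```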
ZHANG Wen-Qi, LI Xiong, YIN Zhi-Ming, LIANG Wei, HUANG Ke, ZHANG Xiao-Song
2025, 36(8):3858-3882. DOI: 10.13328/j.cnki.jos.007251 CSTR: 32375.14.jos.007251
Abstract:Dynamic searchable symmetric encryption has attracted much attention because it allows users to securely search and dynamically update encrypted documents stored on a semi-trusted cloud server. However, most searchable symmetric encryption schemes support only single-keyword search and fail to achieve conjunctive search while protecting forward and backward privacy. In addition, most schemes are not robust: they cannot handle irrational update requests from a client, such as repeatedly adding or deleting the same keyword/file identifier pair, or deleting non-existent pairs. To address these challenges, this study proposes a robust conjunctive dynamic symmetric searchable encryption scheme that preserves both forward and backward privacy, called RFBC. In this scheme, the server maintains two Bloom filters for each keyword, storing the hash values of added and deleted keyword/file identifier pairs, respectively. When the client sends update requests, the server uses the two Bloom filters to detect and filter irrational updates, guaranteeing robustness. In addition, by combining the status information of the least frequent keyword among the queried keywords, the Bloom filters, and an update counter, RFBC realizes conjunctive search by filtering out file identifiers that do not contain the remaining keywords. Finally, by defining the leakage function, RFBC is proved to be forward private and Type-III backward private through a series of security analyses. Experimental results show that, compared with related schemes, RFBC greatly improves computation and communication efficiency: the computational overhead of update operations in RFBC is about 28% and 61.7% of that in ODXT and BDXT, respectively; the computational overhead of search operations is about 21.9% and 27.3% of that in ODXT and BDXT; and the communication overhead of search operations is about 19.7% and 31.6% of that in ODXT and BDXT. Moreover, as the proportion of irrational updates grows, RFBC's advantage in search efficiency over BDXT and ODXT increases further.
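The robustness check can be sketched with two per-keyword Bloom filters, one over added pairs and one over deleted pairs, letting the server drop irrational updates (re-adding a live pair, deleting a pair that was never added). The hash layout and sizes below are illustrative, and Bloom false positives are ignored for brevity.

```python
import hashlib

class Bloom:
    def __init__(self, m=1 << 16, k=4):
        self.bits, self.m, self.k = bytearray(m // 8), m, k

    def _positions(self, item):
        h = hashlib.sha256(item.encode()).digest()
        return [int.from_bytes(h[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

def rational(op, w, file_id, added, deleted):
    """Accept an update only if it changes the (w, id) pair's live status."""
    pair = f"{w}|{file_id}"
    live = pair in added and pair not in deleted
    return (op == "add" and not live) or (op == "del" and live)

added, deleted = Bloom(), Bloom()
added.add("w1|f9")
print(rational("add", "w1", "f9", added, deleted))   # False: already live
print(rational("del", "w1", "f9", added, deleted))   # True
```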
TANG Chang-Hong, ZHAO Yan-Qi, YANG Xiao-Yi, FENG Qi, YU Yong
2025, 36(8):3883-3895. DOI: 10.13328/j.cnki.jos.007255 CSTR: 32375.14.jos.007255
Abstract:As the Internet of Things and mobile Internet technologies continue to advance, a wide range of mobile devices are connected to the Internet. To identify and authenticate these devices, the digital signatures they submit must be verified. However, many mobile devices have limited computing power and typically store keys locally in software modules or on smart chips, which increases the risk of key exposure. To avoid this risk, threshold signatures are commonly employed.