• Volume 37, Issue 2, 2026 Table of Contents
    • LA-tree: Query-aware Adaptive Learned Multi-dimensional Index

      2026, 37(2):485-507. DOI: 10.13328/j.cnki.jos.007570 CSTR: 32375.14.jos.007570


      Abstract: Structured data analysis typically requires multi-attribute queries over tabular data, making efficient multi-dimensional indexes a key supporting technology for database systems. However, existing multi-dimensional indexing methods face limitations in high-dimensional scenarios. Traditional multi-dimensional indexes partition data uniformly based on data distribution but lack awareness of query features, resulting in limited filtering effectiveness. In contrast, although existing learned multi-dimensional indexes introduce query awareness, they often produce highly unbalanced partitions, yielding some oversized partitions and substantially increased scanning costs. To this end, this study proposes LA-tree, a novel learned tree-based multi-dimensional index that balances awareness of both data distribution and query workload. In the offline construction phase, LA-tree formulates the selection of partitioning dimensions at each node as an optimization problem that minimizes the overall scan ratio of the query workload, and puts forward a hierarchical greedy search algorithm that unifies uniform partitioning with query awareness. In the online query phase, a lightweight linear model and a piecewise linear model are introduced to turn traditional numerical comparisons into fast mapping computations, thereby reducing filtering latency while ensuring the completeness of query results. In dynamic settings, an adaptive incremental update mechanism based on scan volume monitoring is proposed to efficiently adapt to changes in data and query workloads via local subtree reconstruction, thereby avoiding the high cost of rebuilding the entire index. Experimental results demonstrate that LA-tree outperforms existing methods on multiple real-world and benchmark datasets. In static settings, its query time is reduced by an average of 52% compared with the best-performing baseline, while in dynamic settings, its update costs are reduced by 97% compared with full-reconstruction methods, all while maintaining low query latency and a small index size.
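
The online phase hinges on the learned mapping the abstract describes: replacing per-key numerical comparisons with a fitted position prediction plus a small bounded repair search. Below is a minimal, hypothetical sketch of that piecewise-linear idea (illustrative names; not the authors' implementation).

```python
import bisect

class PiecewiseLinearModel:
    """Toy piecewise linear model: map sorted keys to approximate array
    positions so lookups become arithmetic plus a bounded repair search."""

    def __init__(self, keys, num_segments=4):
        self.keys = keys
        n = len(keys)
        step = max(1, n // num_segments)
        self.bounds, self.models = [], []   # per-segment (slope, intercept)
        for lo in range(0, n, step):
            hi = min(lo + step, n) - 1
            dx = keys[hi] - keys[lo]
            slope = (hi - lo) / dx if dx else 0.0
            self.bounds.append(keys[lo])
            self.models.append((slope, lo - slope * keys[lo]))

    def lookup(self, key, err=8):
        i = max(0, bisect.bisect_right(self.bounds, key) - 1)  # pick segment
        slope, intercept = self.models[i]
        pos = int(slope * key + intercept)                     # predicted slot
        lo, hi = max(0, pos - err), min(len(self.keys), pos + err + 1)
        j = bisect.bisect_left(self.keys, key, lo, hi)         # bounded repair
        return j if j < len(self.keys) and self.keys[j] == key else None

keys = sorted(range(0, 3000, 3))
plm = PiecewiseLinearModel(keys)
assert plm.lookup(1998) == keys.index(1998)
assert plm.lookup(1999) is None
```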

    • Review Articles
    • Survey on Solving SMT Formulas with Recursive Definitions

      2026, 37(2):508-542. DOI: 10.13328/j.cnki.jos.007560 CSTR: 32375.14.jos.007560


      Abstract: Programs with recursive data structures, such as lists and trees, are widely used in computer science. Program verification problems are often translated into satisfiability modulo theories (SMT) formulas for solving. Recursive data structures are usually converted into first-order logic formulas that combine algebraic data types (ADTs) with other theories such as integer arithmetic. To express properties of recursive data structures, programs often include recursive functions, which in SMT are represented using assertions with quantifiers and uninterpreted functions. This study focuses on methods for solving SMT formulas that contain both ADTs and recursive functions. Existing techniques are reviewed from three perspectives: SMT solvers, automated theorem provers, and constrained Horn clause (CHC) solvers. Furthermore, the study conducts unified experiments to compare state-of-the-art tools on different benchmarks. It investigates the advantages and limitations of existing tools and techniques on various types of problems and explores potential optimization directions, providing valuable analyses and references for researchers.
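
To make the problem class concrete, the sketch below (not taken from the survey) encodes a recursive length function over an ADT list with z3's Python API; how solvers cope with such quantified recursive definitions is precisely what the surveyed tools differ on.

```python
from z3 import (Datatype, IntSort, Const, RecFunction, RecAddDefinition,
                If, Solver)

# An algebraic data type for integer lists
ListS = Datatype('List')
ListS.declare('nil')
ListS.declare('cons', ('head', IntSort()), ('tail', ListS))
ListS = ListS.create()

# A recursive function defined over the ADT
x = Const('x', ListS)
length = RecFunction('length', ListS, IntSort())
RecAddDefinition(length, [x],
                 If(ListS.is_nil(x), 0, 1 + length(ListS.tail(x))))

s = Solver()
y = Const('y', ListS)
s.add(length(y) == 2)   # ask for a witness list of length 2
print(s.check())        # typically sat; unbounded unfolding can also
                        # lead to unknown or non-termination in general
```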

    • Logical Reasoning Testing of Intelligent Question Answering System

      2026, 37(2):543-562. DOI: 10.13328/j.cnki.jos.007421 CSTR: 32375.14.jos.007421


      Abstract: Intelligent question answering (QA) systems use information retrieval and natural language processing techniques to deliver automated responses to user inquiries. Like other artificial intelligence software, intelligent QA systems are prone to bugs. These bugs can degrade user experience, cause financial losses, or even trigger social panic. Therefore, it is crucial to detect and fix bugs in intelligent QA systems promptly. Automated testing approaches fall into two categories. The first synthesizes hypothetical facts based on questions and predicted answers, then generates new questions and expected answers to detect bugs. The second generates semantically equivalent test inputs by injecting knowledge from existing datasets, ensuring that the answer to the question remains unchanged. However, both methods have limitations in practical use. They rely heavily on the QA system’s output or training set, which results in poor testing effectiveness and generalization, especially for QA systems based on large language models. Moreover, these methods primarily assess semantic understanding while neglecting logical reasoning capabilities. To address this gap, a logic-guided testing technique named QALT is proposed. It designs three logically related metamorphic relations and uses semantic similarity measurement and dependency parsing to generate high-quality test cases. Experimental results show that QALT detected a total of 9247 bugs in two different QA systems, 3150 and 3897 more than the two current state-of-the-art techniques (i.e., QAQA and QAAskeR), respectively. Based on statistical analysis of manually labeled results, QALT detects approximately 8073 true bugs, 2142 more than QAQA and 4867 more than QAAskeR. Moreover, the test inputs generated by QALT reduce the MR violation rate from 22.33% to 14.37% when used to fine-tune the QA system under test.
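
As a concrete picture of the metamorphic-testing pattern (QALT's three actual MRs are defined in the paper), a hypothetical negation-style relation looks like this:

```python
# Hypothetical negation MR: a QA system should not answer "yes" to both
# a question and its negation; if it does, a logical-reasoning bug is
# likely. This only illustrates the pattern, not QALT's exact relations.
def violates_negation_mr(qa, question, negated_question):
    a1 = qa(question).strip().lower()
    a2 = qa(negated_question).strip().lower()
    return a1 == "yes" and a2 == "yes"

def toy_qa(question):            # stand-in for the system under test
    return "Yes"

if violates_negation_mr(toy_qa, "Is water wet?", "Is water not wet?"):
    print("MR violation: likely logical-reasoning bug")
```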

    • SAC-based Ensemble Framework for Multi-view Workload Forecasting in Cloud Computing

      2026, 37(2):563-583. DOI: 10.13328/j.cnki.jos.007424 CSTR: 32375.14.jos.007424


      Abstract: Accurate workload forecasting is essential for effective cloud resource management. However, existing models typically employ fixed architectures to extract sequential features from different perspectives, which limits the flexibility of combining various model structures to further improve forecasting performance. To address this limitation, a novel ensemble framework, SAC-MWF, is proposed for multi-view workload forecasting based on the soft actor-critic (SAC) algorithm. A set of feature sequence construction methods is developed to generate multi-view feature sequences from historical windows at low computational cost, enabling the model to focus on workload patterns from different perspectives. Subsequently, a base prediction model and several feature prediction models are trained on the historical windows and their corresponding feature sequences, respectively, to capture workload dynamics from different views. Finally, the SAC algorithm is employed to integrate these models and generate the final forecast. Experimental results on three datasets demonstrate that SAC-MWF performs well in terms of both effectiveness and computational efficiency.
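
A minimal sketch of what low-cost multi-view feature sequences could look like; the three views below are assumptions chosen for illustration, not the paper's exact construction methods.

```python
import numpy as np

def multi_view_features(window):
    """Derive cheap feature sequences from one historical window, each
    emphasizing a different workload pattern (hypothetical views)."""
    w = np.asarray(window, dtype=float)
    return {
        "trend": np.convolve(w, np.ones(3) / 3, mode="valid"),  # smoothing
        "delta": np.diff(w),                                    # change rate
        "dispersion": np.abs(w - w.mean()),                     # volatility
    }

for name, seq in multi_view_features([3, 4, 9, 7, 8, 12, 11]).items():
    print(name, np.round(seq, 2))
```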

    • Automatic Migration of AI Source Code Between Frameworks Based on Domain Knowledge Graph

      2026, 37(2):584-600. DOI: 10.13328/j.cnki.jos.007451 CSTR: 32375.14.jos.007451


      Abstract: As the foundation of AI, deep learning frameworks play a vital role in driving the rapid progress of AI technologies. However, due to the lack of unified standards, compatibility across different frameworks remains limited. Faithful model transformation enhances interoperability by converting a source model into an equivalent model in the target framework, but the large number and diversity of deep learning frameworks, combined with the increasing demand for custom frameworks, lead to high conversion costs. To address this issue, this study proposes an automatic AI source code migration method between frameworks based on a domain knowledge graph. The method integrates domain knowledge graphs and abstract syntax trees to systematically manage migration challenges. First, the source code is transformed into a framework-specific abstract syntax tree, from which general dependency information and operator-specific details are extracted. By applying the operator and parameter mappings stored in the domain knowledge graph, the code is migrated to the target framework, generating equivalent target model code while significantly reducing engineering complexity. Compared with existing code migration tools, the proposed method supports mutual migration among widely used deep learning frameworks such as PyTorch, PaddlePaddle, and MindSpore. The approach has proven mature and reliable, and part of its implementation is open-sourced in Baidu’s official migration tool, PaConvert.
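
A toy sketch of the AST-plus-mapping pipeline, with a two-entry table standing in for the domain knowledge graph (the real mappings in a tool like PaConvert are far richer and cover many parameter rules):

```python
import ast

# Toy operator and parameter mappings standing in for the knowledge graph
OP_MAP = {('torch', 'abs'): ('paddle', 'abs'),
          ('torch', 'cat'): ('paddle', 'concat')}
KW_MAP = {('paddle', 'concat'): {'dim': 'axis'}}

class OpRewriter(ast.NodeTransformer):
    def visit_Attribute(self, node):
        self.generic_visit(node)
        if isinstance(node.value, ast.Name):
            target = OP_MAP.get((node.value.id, node.attr))
            if target:
                node.value.id, node.attr = target
        return node

    def visit_Call(self, node):
        self.generic_visit(node)          # rewrites node.func first
        f = node.func
        if isinstance(f, ast.Attribute) and isinstance(f.value, ast.Name):
            renames = KW_MAP.get((f.value.id, f.attr), {})
            for kw in node.keywords:      # apply parameter mappings
                kw.arg = renames.get(kw.arg, kw.arg)
        return node

tree = OpRewriter().visit(ast.parse("y = torch.cat([torch.abs(a), b], dim=0)"))
print(ast.unparse(tree))   # y = paddle.concat([paddle.abs(a), b], axis=0)
```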

    • Code Comment Generation Method Based on Semantic Reranking

      2026, 37(2):601-620. DOI: 10.13328/j.cnki.jos.007470 CSTR: 32375.14.jos.007470


      Abstract: Code comments serve as natural-language descriptions of source code functionality, helping developers quickly understand the semantics and purpose of code and thus improving software development and maintenance efficiency. However, writing and maintaining code comments is time-consuming and labor-intensive, often leading to missing, inconsistent, or outdated comments. Therefore, the automatic generation of comments for source code has attracted significant attention. Existing methods typically use information retrieval techniques or deep learning techniques for automatic code comment generation, but both have limitations. Some research has integrated the two techniques, but such approaches often fail to effectively leverage the advantages of both. To address these issues, this study proposes a semantic reranking-based code comment generation method, SRBCS. SRBCS employs a semantic reranking model to rank and select comments generated by various approaches, thus integrating multiple methods and maximizing their respective strengths in the comment generation process. SRBCS is compared with 11 code comment generation approaches on two subject datasets. Experimental results demonstrate that SRBCS effectively integrates different approaches and outperforms existing methods in code comment generation.
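
Schematically, the reranking step scores every candidate comment against the code and keeps the best one. The bag-of-words encoder below is a deliberately crude stand-in for SRBCS's trained semantic reranking model:

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    """Stand-in encoder: token counts instead of learned embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(code, candidate_comments):
    """Order candidates (e.g., from retrieval-based and deep-learning
    generators) by semantic similarity to the code; the top one wins."""
    c = embed(code)
    return sorted(candidate_comments,
                  key=lambda s: cosine(c, embed(s)), reverse=True)

code = "def add(a, b): return a + b"
print(rerank(code, ["sort the input list", "return the sum of a and b"]))
```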

    • Diffusion-model-guided Root Cause Analysis

      2026, 37(2):621-640. DOI: 10.13328/j.cnki.jos.007473 CSTR: 32375.14.jos.007473


      Abstract: Root cause analysis refers to identifying the underlying factors that lead to abnormal failures in complex systems. Causal backward-reasoning methods, founded on structural causal models, are among the most effective approaches to root cause analysis. Most current causality-driven root cause analysis methods require the causal structure to be discovered from data beforehand, making the effectiveness of the analysis heavily dependent on the success of this causal discovery step. Recently, score function-based intervention identification has gained significant attention. By comparing the variance of score function derivatives before and after interventions, this approach detects the set of intervened variables, showing potential to overcome the constraints of causal discovery in root cause analysis. However, mainstream score function-based intervention identification is often limited by the score function estimation step: the analytical solutions used in existing methods struggle to model the real distribution of high-dimensional complex data. In light of recent advances in data generation, this study proposes a diffusion model-guided root cause analysis strategy. Specifically, the proposed method first estimates the score functions of the data distributions before and after the anomaly using diffusion models. It then identifies the set of root cause variables by observing the variance of the first-order derivatives of the overall score function after weighted fusion. Furthermore, to reduce the computational overhead introduced by the pruning operation, an acceleration strategy is proposed that estimates the score function from the initially trained diffusion model, avoiding the cost of re-training the diffusion model after each pruning step. Experimental results on simulated and real-world datasets demonstrate that the proposed method accurately identifies the set of root cause variables. Ablation studies further show that the guidance provided by the diffusion model is critical to the improved performance.

    • Root Cause Change Identification for Failures in Large-scale Online System

      2026, 37(2):641-661. DOI: 10.13328/j.cnki.jos.007482 CSTR: 32375.14.jos.007482


      Abstract: In large-scale online service systems, software changes are frequent and increasing, driven by the need to adapt to rapidly evolving user demands and practices such as continuous integration and delivery. Although engineers rigorously test new software versions before deployment, some defects go unnoticed during testing and end up deployed to the production environment, mainly because the testing and production environments differ significantly in load, scale, and user characteristics. As a result, these defects can impact the system’s availability and stability. To better understand the impact and behavior of defective changes after deployment, this study conducts an empirical analysis using real change failure data from WeChat, a large-scale global instant messaging system, and derives five key findings about defective changes. Based on these empirical findings, this study proposes a lightweight root cause change identification method that automatically identifies the root cause changes behind a failure, assisting operations and maintenance engineers in root cause localization and troubleshooting. To validate the proposed method, a real dataset containing various types of defective changes from WeChat’s production environment is collected, along with a simulated change dataset based on a microservice benchmark system, and a systematic evaluation is conducted. The experimental results show that the proposed method achieves Top-3 root cause change hit rates of 80% and 84% on the WeChat production dataset and the simulated change dataset, respectively, significantly outperforming state-of-the-art defective change detection methods. Moreover, from an engineering practice perspective, the system uses only 2.3 GB of memory and has an average analysis latency of 28.6 s when processing typical-scale failures, meeting the requirements of actual production environments.

    • Code Summarization Enhancing Method with Dependency-aware Hierarchical Neural Network

      2026, 37(2):662-683. DOI: 10.13328/j.cnki.jos.007504 CSTR: 32375.14.jos.007504


      Abstract: As an emerging technique in software engineering, automatic source code summarization aims to generate natural language descriptions for given code snippets. State-of-the-art code summarization techniques utilize encoder-decoder neural models: the encoder extracts semantic representations of the source code, while the decoder translates them into human-readable summaries. However, many existing approaches treat input code snippets as standalone functions, overlooking the contextual dependencies between the target function and its invoked subfunctions. Ignoring these dependencies can omit crucial semantic information and reduce the quality of the generated summaries. To this end, this study proposes a dependency-aware hierarchical code summarization neural model, DHCS, designed to improve code summarization by explicitly modeling the hierarchical dependencies between the target function and its subfunctions. The proposed approach employs a hierarchical encoder consisting of a subfunction encoder and a target function encoder, allowing the model to capture both local and contextual semantic representations effectively. Meanwhile, a self-supervised task, masked subfunction prediction, is introduced to enhance the representation learning of subfunctions. Furthermore, the topic distribution of subfunctions is mined and incorporated into the summary decoder through a topic-aware copy mechanism, enabling key information to be extracted directly from subfunctions and facilitating more effective summary generation for the target function. Finally, extensive experiments are conducted on three real-world datasets constructed for Python, Java, and Go, which clearly validate the effectiveness of the proposed approach.

    • Self-supervised Recommendation Method Based on Multiple Views

      2026, 37(2):684-699. DOI: 10.13328/j.cnki.jos.007419 CSTR: 32375.14.jos.007419


      Abstract: Self-supervised learning (SSL) can extract self-supervised signals from raw data, which holds great potential for improving recommendation performance. However, two key challenges remain in current SSL-based recommendation methods. First, most self-supervised recommendation models apply random perturbations to the same node and use the resulting different outputs as self-supervised signals. Because of the extensive homogeneity in recommender systems, this practice ignores information from neighboring nodes, which harms recommendation performance. Second, while historical user-item interactions and social relationships between users are the focus of current SSL-based recommendation models, the internal relationships between items are often neglected, which also leads to insufficient self-supervised signals. To address these challenges, a self-supervised recommendation method based on multiple views is proposed. The method considers the preference, user, and item perspectives and employs multi-view joint training for self-supervised learning. By combining the social relationships between users, the category relationships between items, and the historical interactions between users and items, self-supervised signals are fully extracted. Experiments conducted on three real public datasets validate that the proposed multi-view self-supervised learning method effectively improves recommendation performance.

    • Self-supervised Graph Representation Learning via Structural Relation Modeling

      2026, 37(2):700-715. DOI: 10.13328/j.cnki.jos.007439 CSTR: 32375.14.jos.007439


      Abstract: This study aims to learn robust graph representations from unlabeled graph data. A novel framework, termed structural relation modeling (SRM), is proposed for self-supervised graph representation learning to alleviate inherent limitations caused by unlabeled data and graph topological imbalances. First, rather than focusing solely on local structures or node embeddings as in most existing methods, this study models complex structural relations, such as local-global relations and node correlations, among nodes, subgraphs, and entire graphs within a unified framework to better capture graph topology and utilize structural self-supervision. Second, a partition-based subgraph sampling mechanism is introduced to mitigate over-aggregation and topological decay induced by graph topological imbalance, enabling more uniform information propagation through mini-batch training. Third, a node regularization strategy is applied to improve training stability and efficiency, resulting in more accurate structural representations. Extensive experiments on node and graph classification across 12 public datasets demonstrate the effectiveness and generalizability of the proposed method.

    • Multiplex Heterogeneous Graph Neural Network for Node Classification

      2026, 37(2):716-731. DOI: 10.13328/j.cnki.jos.007440 CSTR: 32375.14.jos.007440


      Abstract: In recent years, heterogeneous graph convolutional networks have emerged as a mainstream approach for node classification due to their ability to effectively capture semantic information in heterogeneous networks. However, several challenges remain. Most existing studies focus on general heterogeneous networks, where only a single type of edge is assumed between any two nodes. This simplification overlooks the multiple relationships that exist among multi-type nodes in multiplex heterogeneous networks and fails to explicitly explore the impact of different relations on the representations of various node types. Moreover, the over-smoothing issue inherent in graph neural networks limits these models to capturing only low-order local information, making it difficult to learn global correlation patterns in the network. To address these challenges, this study proposes a multiplex heterogeneous graph neural network (MHGNN) designed for node classification. The proposed MHGNN first learns local initial representations of each node type under different relational contexts. It then explicitly models the importance of each relation and effectively integrates the representations of different node types across multiple relations, thus capturing the relational diversity within multiplex heterogeneous networks. In addition, drawing inspiration from the microeconomic concepts of substitutes and complements, the study constructs substitute and complement matrices that encode global similarity features. These matrices are incorporated into the model via graph neural aggregation to enhance the learning of higher-order global semantic information across different node types. Finally, contrastive learning is employed to reconcile and fuse the distinct yet complementary representations learned from both local and global views, yielding the final node embeddings. Extensive experiments conducted on six real-world datasets demonstrate that the proposed MHGNN significantly outperforms state-of-the-art models across various evaluation metrics for node classification.

    • Antelope: 3-party Privacy-preserving Machine Learning Framework Based on GPU

      2026, 37(2):732-748. DOI: 10.13328/j.cnki.jos.007445 CSTR: 32375.14.jos.007445


      Abstract: As concerns over data privacy continue to grow, secure multi-party computation (MPC) has gained considerable research attention due to its ability to protect sensitive information. However, the communication and memory demands of MPC protocols limit their performance in privacy-preserving machine learning (PPML). Reducing interaction rounds and memory overhead in secure computation protocols remains both essential and challenging, particularly in GPU-accelerated environments. This study focuses on the design and implementation of GPU-friendly protocols for linear and nonlinear computations. To eliminate the overhead associated with integer operations, 64-bit integer matrix multiplication and convolution are implemented using CUDA extensions in PyTorch. A most significant bit (MSB) extraction protocol with low communication rounds is proposed, based on 0-1 encoding. In addition, a hybrid multiplication protocol with low communication complexity is introduced to reduce the communication overhead of secure comparison, enabling efficient computation of ReLU activation layers. Finally, Antelope, a GPU-based 3-party framework, is proposed to support efficient privacy-preserving machine learning. The framework significantly narrows the performance gap between secure and plaintext computation and supports end-to-end training of deep neural networks. Experimental results demonstrate that the proposed framework achieves a 29×–101× speedup in training and a 1.6×–35× speedup in inference compared with the widely used CPU-based FALCON (PoPETs 2020). Compared with GPU-based approaches, training performance reaches 2.5×–3× that of CryptGPU (S&P 2021) and 1.2×–1.6× that of Piranha (USENIX Security 2022), while inference is accelerated by factors of 11× and 2.8×, respectively. Notably, the proposed secure comparison protocol exhibits significant advantages when processing small input sizes.
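
Such frameworks build on secret-shared arithmetic over the 64-bit ring that matches the CUDA integer kernels. A minimal sketch of 3-party additive sharing, covering only the linear part (the MSB-extraction and comparison protocols are the paper's contribution and are omitted here):

```python
import secrets

MOD = 1 << 64   # the Z_{2^64} ring matching 64-bit integer kernels

def share3(x):
    """Split x into three additive shares mod 2^64, one per party."""
    r1, r2 = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return [r1, r2, (x - r1 - r2) % MOD]

def reconstruct(shares):
    return sum(shares) % MOD

def add_shares(a, b):
    # Linear operations are local: each party adds its own shares, so
    # additions (and multiplications by public constants) cost no
    # communication rounds.
    return [(x + y) % MOD for x, y in zip(a, b)]

a, b = share3(41), share3(1)
assert reconstruct(add_shares(a, b)) == 42
```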

    • Multi-label Label-specific Feature Learning Based on Invariance Injection

      2026, 37(2):749-761. DOI: 10.13328/j.cnki.jos.007447 CSTR: 32375.14.jos.007447


      Abstract: Label-specific feature learning is an effective strategy for multi-label classification tasks. By tailoring discriminative features to the individual preferences of each label, such features enhance the generalization capability of classification models. Existing methods typically focus on manipulating features to extract those relevant to label discrimination. Rather than following this conventional approach, this study explores a novel perspective for label-specific feature learning based on feature invariance. Specifically, classifiers are made invariant to label-irrelevant features by intentionally manipulating these features for each class label. Accordingly, an invariance-based label-specific feature learning method, termed INVA, is proposed. INVA estimates the feature covariance matrix for each label to capture intra-class variation and thus identify label-irrelevant features. Classifiers are then endowed with invariance to these features by solving a perturbation risk minimization problem. Furthermore, an upper bound of the perturbation risk is derived to enhance computational efficiency. Comprehensive experiments on standard multi-label benchmark datasets demonstrate the effectiveness of the proposed method.

    • Dependency-syntax-information-enhanced Fully Non-autoregressive Translation

      2026, 37(2):762-783. DOI: 10.13328/j.cnki.jos.007463 CSTR: 32375.14.jos.007463


      Abstract: The main challenge of fully non-autoregressive translation (fully NAT) lies in maintaining translation quality comparable to autoregressive translation (AT) while preserving its decoding speed advantage. This challenge arises from the parallel nature of decoding, which prevents fully NAT methods from capturing dependency information on the target side, leading to degraded translation quality. Enhancing the model’s ability to capture dependency information from the source side is therefore a natural approach, especially given that syntactic information has proven effective in improving AT methods. Although significant progress has been made in this area in recent years, there has been limited research on incorporating syntactic information into fully NAT. Through experiments on five translation benchmarks (e.g., those from the Workshop on Machine Translation, WMT), it is found that dependency syntax information is highly beneficial for fully NAT methods, significantly improving translation performance at a decoding speed cost that remains within an acceptable range. The code has been released at https://github.com/tianxiexiaozhu77/syngec.

    • Customized Review Generation Integrating Multimodal Information

      2026, 37(2):784-798. DOI: 10.13328/j.cnki.jos.007465 CSTR: 32375.14.jos.007465


      Abstract: With the rapid development of merchant review websites, the volume of content on these websites has increased significantly, making it challenging for users to quickly find valuable reviews. This study introduces a new task, “multimodal customized review generation”. The task aims to generate customized reviews for specific users about products they have not yet reviewed, thus providing valuable insights into these products. To achieve this goal, this study explores a multimodal review generation framework based on a pre-trained language model. Specifically, a multimodal pre-trained language model is employed, which takes product images and user preferences as inputs. The visual and textual features are then fused to generate customized reviews. Experimental results demonstrate that the proposed model is effective in generating high-quality customized reviews.

    • Knowledge-enhanced Automatic Selection for Time Series Anomaly Detection Algorithm

      2026, 37(2):799-816. DOI: 10.13328/j.cnki.jos.007481 CSTR: 32375.14.jos.007481


      Abstract: Time series anomaly detection plays an important role in many real-world applications, such as monitoring key metrics (e.g., CPU and memory usage) in cloud-native database systems to detect system failures in a timely manner. Although many advanced time series anomaly detection algorithms have been proposed in recent years, different algorithms have been shown to excel in different application scenarios in terms of detection accuracy, and there is no universally optimal method. Therefore, automatically selecting the most suitable anomaly detection algorithm based on the data characteristics of each scenario is crucial to achieving higher detection accuracy. Existing studies typically address this problem with time series classification (TSC) techniques, training a classifier on data from historical tasks, where the input is a time series and the output is the predicted most accurate anomaly detection algorithm for that series. Although TSC-based solutions improve detection accuracy, existing standard TSC algorithms fail to fully utilize the knowledge contained in historical anomaly detection tasks. This study proposes a knowledge-enhanced time series anomaly detection framework. Specifically, in addition to training the TSC model with hard labels that indicate the best detection algorithm for each historical time series, the accuracy of all candidate algorithms evaluated on historical data is used to estimate the class distribution of the input time series. This distribution is treated as a soft label, providing the algorithm selector (i.e., the TSC model) with more knowledge about the relationships between the anomaly detection algorithms. Meanwhile, a module is designed to flexibly integrate various types of external knowledge (e.g., descriptions of the domain, characteristics of time series, and anomalies) into the TSC model. The proposed method is designed as a plugin that can be seamlessly integrated into any TSC model to enhance its performance in anomaly detection algorithm selection, regardless of the model architecture. Extensive experiments on various types of time series datasets validate the effectiveness of this approach.
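
The soft-label construction can be pictured as turning per-algorithm accuracies on a historical series into a class distribution; the softmax below is one plausible instantiation, not necessarily the paper's exact formula:

```python
import numpy as np

def soft_labels(accuracies, temperature=0.1):
    """Map candidate detectors' accuracies on one historical series to a
    soft class distribution for the TSC-based selector (schematic)."""
    a = np.asarray(accuracies, dtype=float)
    z = (a - a.max()) / temperature      # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Made-up F1 scores of four candidate detectors on one time series
print(soft_labels([0.62, 0.81, 0.79, 0.40]))
```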

    • Review Articles
    • Empowering Relational Database Systems with AI: Standardization, Technologies, and Challenges

      2026, 37(2):817-859. DOI: 10.13328/j.cnki.jos.007506 CSTR: 32375.14.jos.007506


      Abstract: The advent of the big data era has introduced massive data applications characterized by four defining attributes: volume, variety, velocity, and value. These attributes pose revolutionary challenges to conventional data acquisition methods, management strategies, and database processing capabilities. Recent breakthroughs in artificial intelligence (AI), particularly in machine learning and deep learning, have demonstrated remarkable advancements in representation learning, computational efficiency, and model interpretability, thus offering innovative solutions to these challenges. This convergence of AI and database systems has given rise to a new generation of intelligent database management systems, which integrate AI technologies across three core architectural layers: (1) natural language interfaces for user interaction, (2) automated database administration frameworks (including parameter tuning, index recommendation, database diagnostics, and workload management), and (3) machine learning-based efficient and scalable components (such as learned indexes, adaptive partitioning, query optimization, and scheduling). Furthermore, new intelligent component application programming interfaces (APIs) have lowered the integration barrier between AI and database systems. This study systematically investigates intelligent databases through a standardization-centric framework, delineating common processing paradigms across the research themes of interaction paradigms, management architectures, and kernel design. By examining standardized processes, interfaces, and collaboration mechanisms, this study uncovers the core logic enabling database self-optimization, reviews current research advancements, and provides an in-depth analysis of the technical challenges and prospects for future development.

    • Path-aggregation-based Graph Fraud Detection

      2026, 37(2):860-874. DOI: 10.13328/j.cnki.jos.007423 CSTR: 32375.14.jos.007423


      Abstract: With the development of information technology, the interactions among information networks, human society, and physical space have deepened, and the spillover of risk from the information space has become more severe. Fraud incidents have increased sharply, making fraud detection an important research field. Fraudulent behavior has brought numerous negative impacts to society and increasingly exhibits characteristics such as intelligence, industrialization, and strong concealment. Traditional expert rules and deep graph neural network algorithms are increasingly limited in addressing fraudulent activities. Current fraud detection methods often rely on local information from the nodes themselves and their neighbors, focusing on individual users, analyzing the relationship between nodes and graph topology, or using graph embedding techniques to learn node representations. Although these approaches offer certain fraud detection capabilities, they overlook the crucial role of long-range association patterns among entities and fail to explore the common patterns shared by massive numbers of fraudulent paths, limiting comprehensive fraud detection. In response to these limitations, this study proposes a graph fraud detection model based on path aggregation, called path aggregation graph neural network (PA-GNN). The model comprises variable-length path sampling, position-related unified path encoding, path interaction and aggregation, and aggregation-based fraud detection. Multiple paths originating from a node interact globally and are compared for similarity, extracting common patterns among fraudulent paths; this more comprehensively reveals the association patterns behind fraudulent behaviors and achieves fraud detection through path aggregation. Experimental results across multiple datasets covering fraud scenarios such as financial transactions, social networks, and review networks show that the proposed method significantly improves the area under the curve (AUC) and average precision (AP) metrics compared with the best baseline models. In addition, the proposed method uncovers potential common fraudulent path patterns, driving nodes to learn these important patterns and obtain more expressive representations, which offers a certain level of interpretability.
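
A minimal sketch of the first stage, variable-length path sampling from a starting node (illustrative only; the paper's sampler, encoding, and aggregation are more involved):

```python
import random

def sample_paths(adj, start, num_paths=4, max_len=5):
    """Sample variable-length random-walk paths rooted at `start`
    (schematic of a path sampling stage, not PA-GNN's exact sampler)."""
    paths = []
    for _ in range(num_paths):
        node, path = start, [start]
        for _ in range(random.randint(1, max_len - 1)):
            neighbors = adj.get(node, [])
            if not neighbors:
                break
            node = random.choice(neighbors)
            path.append(node)
        paths.append(path)
    return paths

adj = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(sample_paths(adj, 0))
```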

    • Secure and Efficient Fine-grained Statistical Analysis and Verifiable Data Aggregation Scheme Based on TEE

      2026, 37(2):875-893. DOI: 10.13328/j.cnki.jos.007426 CSTR: 32375.14.jos.007426


      Abstract: With the rapid development of the Internet of Things (IoT), a growing number of smart terminal devices collect large volumes of patient medical data to support healthcare applications, offering considerable value for medical research. However, such data typically involve sensitive patient information and may face security risks such as tampering and unauthorized access during aggregation and transmission. To address these security and privacy concerns while enabling fine-grained statistical analysis, this study proposes a secure and efficient statistical analysis and verifiable data aggregation scheme based on trusted execution environments (TEEs). The scheme improves the dual-message BGN homomorphic encryption algorithm, which encrypts both m and m², and integrates digital signatures to ensure data confidentiality and integrity. A verifiable aggregate signature algorithm is introduced to enable batch validation of encrypted data, reducing authentication overhead. By shifting the complex statistical analysis of ciphertext data into the TEE, the scheme enhances computational efficiency while reducing processing costs. Moreover, fine-grained statistical analysis is achieved through an access control mechanism in which edge servers authorize research center access. Performance evaluations indicate that the proposed scheme significantly reduces computational overhead on both the statistical analysis side and the data owner side.

    • Detection Framework of Non-invasive Attack Against Private-algorithm Cryptographic Chips

      2026, 37(2):894-914. DOI: 10.13328/j.cnki.jos.007455 CSTR: 32375.14.jos.007455


      Abstract: In recent years, cryptographic chips have developed rapidly, but they also face a significant threat from non-invasive attacks. Although both international and domestic standards provide testing methods for non-invasive attacks, these standards are formulated for public algorithms and are not applicable to private algorithms, which therefore still carry considerable security risks. This study proposes a detection framework for private-algorithm cryptographic chips, comprising three components: timing analysis tests, simple power/electromagnetic analysis tests, and differential power/electromagnetic analysis tests. For the timing analysis test, a method based on average denoising is adopted, which significantly improves the accuracy of execution time measurements. For simple power/electromagnetic analysis tests, methods based on visual observation and cross-correlation analysis are presented. Finally, for differential power analysis, TVLA-1 and TVLA-2 are employed to detect leakage from various sources and to evaluate the vulnerability of private-algorithm cryptographic chips to differential power attacks. The proposed framework serves as an effective supplement to traditional non-invasive attack detection and significantly expands its application range. To verify the effectiveness of the framework, black-box experiments are conducted on several cryptographic chips. The results demonstrate that the framework can effectively assess the resilience of private-algorithm cryptographic chips against non-invasive attacks.
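
TVLA-style tests reduce to per-sample Welch's t-statistics between two trace populations (e.g., fixed-input versus random-input), flagged against the conventional 4.5 threshold. A sketch on synthetic traces:

```python
import numpy as np

def welch_t(traces_a, traces_b):
    """Per-sample Welch's t-statistic between two trace sets; |t| > 4.5
    is the conventional TVLA leakage threshold."""
    ma, mb = traces_a.mean(axis=0), traces_b.mean(axis=0)
    va, vb = traces_a.var(axis=0, ddof=1), traces_b.var(axis=0, ddof=1)
    return (ma - mb) / np.sqrt(va / len(traces_a) + vb / len(traces_b))

rng = np.random.default_rng(0)
fixed = rng.normal(0.0, 1.0, (1000, 500))    # fixed-input power traces
rand = rng.normal(0.0, 1.0, (1000, 500))     # random-input power traces
leaky = np.abs(welch_t(fixed, rand)) > 4.5
print("leaking sample points:", int(leaky.sum()))   # ~0 for pure noise
```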

    • User-centric Secure Cloud-assisted Cross-application Data Circulation

      2026, 37(2):915-933. DOI: 10.13328/j.cnki.jos.007461 CSTR: 32375.14.jos.007461


      Abstract: This study proposes CADC, a user-centric secure cloud-assisted cross-application data circulation scheme. CADC enables convenient authentication and on-demand trusted data circulation for mobile users in a multi-App environment, helping unlock the value of data in mobile networks. Formal security analysis demonstrates that CADC can resist semi-honest cloud service providers and App service providers. Performance evaluation results indicate that CADC is highly efficient for both users and Apps.

    • Blockchain-based Semi-distributed Message Authentication and Encryption Scheme in 5G Internet of Vehicles

      2026, 37(2):934-952. DOI: 10.13328/j.cnki.jos.007490 CSTR: 32375.14.jos.007490


      Abstract: The 5G Internet of vehicles (IoV) enables high-speed data transmission by applying 5G technology to vehicular networks. However, with the rapid growth in the number of vehicles, traditional key generation by a single third party can lead to single points of failure. In addition, wireless communication faces risks such as message interception, tampering, and network disruption caused by large numbers of untrustworthy false messages. To address these issues, this study proposes a blockchain-based semi-distributed message authentication and encryption scheme. First, a semi-distributed key generation and distributed information-sharing framework based on a consortium blockchain is designed: 5G base stations provide full network coverage, regional vehicle management centers act as full nodes to keep the blockchain operating properly, and vehicles, as lightweight nodes, can only view the information stored on the blockchain. Second, to ensure message source authentication and confidentiality, a certificateless signature algorithm without bilinear pairing is designed, and an inverse hash chain is employed to generate reputation tokens for message encryption and decryption. To address untrustworthy false messages, a reputation system is introduced in which disseminating false messages lowers a vehicle’s reputation value; this mechanism constrains vehicle behavior and reduces false messages at the source. Finally, security analysis and experimental results demonstrate that the proposed scheme ensures communication security, effectively mitigates single-point failure risks through semi-distributed key acquisition, and prevents tampering, replay, and impersonation attacks. Moreover, the scheme incurs low computational and communication overhead, meets the real-time requirements of the IoV, and incurs low gas costs for executing reputation update contracts, further demonstrating its practicality and feasibility.
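
The reputation-token mechanism can be pictured with a standard inverse hash chain: commit to the last chain element, then release earlier elements one at a time, each verifiable with a single hash (a schematic, not the paper's full construction):

```python
import hashlib

def build_hash_chain(seed: bytes, n: int):
    """Precompute h_0..h_n with h_i = H(h_{i-1}); tokens are released in
    reverse order, so each reveal is checked with one hash invocation."""
    chain = [hashlib.sha256(seed).digest()]
    for _ in range(n):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain

chain = build_hash_chain(b"vehicle-secret", 100)
anchor = chain[-1]    # published commitment
token = chain[-2]     # first reputation token released
assert hashlib.sha256(token).digest() == anchor   # one-hash verification
```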

    • Salient Object Detection Method Based on Edge-enhanced Wide Decoder

      2026, 37(2):953-968. DOI: 10.13328/j.cnki.jos.007443 CSTR: 32375.14.jos.007443


      Abstract: Salient object detection is developing rapidly. However, several critical challenges remain. Most existing methods struggle with high-resolution images due to either excessive computational demands or suboptimal detection quality. In addition, traditional convolutional operations commonly used in current algorithms lack targeted enhancement, resulting in inadequate edge detail extraction and blurred object boundaries. To address these limitations, this study proposes a salient object detection method based on an edge-enhanced wide decoder, which improves edge segmentation accuracy and enhances small-scale object detection while reducing computational overhead. A hybrid feature encoder combining a residual network and a Swin Transformer is employed to lower computational overhead. Traditional convolutions are replaced with a differential convolution module, where multiple types of differential convolutions are executed in parallel to extract richer edge information. A multi-scale attention module is incorporated to compute attention across four hierarchical feature layers, enabling better focus on objects of varying sizes. In addition, a multilevel wide decoder with large convolutional kernels is utilized to conduct long-range contextual modeling of fused features, effectively reducing redundant information and further boosting detection performance. Code will be released at https://github.com/wapitier/EEWDNet.

Contact
  • Journal of Software
  • Sponsor: Institute of Software, CAS, China
  • Postal code: 100190
  • Phone: 010-62562563
  • Email: jos@iscas.ac.cn
  • Website: https://www.jos.org.cn
  • ISSN 1000-9825
  • CN 11-2560/TP
  • Domestic price: CNY 70