    2024,35(2):513-531, DOI: 10.13328/j.cnki.jos.006944
    Abstract:
    As an essential mechanism of group collaboration in software development, code comments are widely used by developers to improve the efficiency of specific development tasks. However, code comments do not directly affect software operation, and developers often ignore them, which leads to poor-quality code comments and reduces development efficiency. Quality issues in code comments hinder code understanding, cause misunderstanding, or even introduce bugs, and have therefore received widespread attention from researchers. Through a literature review, this study systematically analyzes the research of scholars worldwide on quality issues of code comments in recent years. It summarizes related studies in three aspects: evaluation dimensions of code comment quality, indicators of code comment quality, and strategies to promote code comment quality. It also points out the shortcomings and challenges of current research and offers suggestions.
    2024,35(2):532-580, DOI: 10.13328/j.cnki.jos.006953
    Abstract:
    The effectiveness of a test suite in defect detection refers to the extent to which the test suite could detect the defects hidden in the software. How to evaluate this performance of a test suite is an important issue. Coverage and mutation score are two of the most important and widely used metrics for test suite effectiveness. To quantify the defect detection capability of a test suite, researchers have devoted a large amount of research effort to this issue and have made significant progress. However, inconsistent conclusions can be observed among the existing studies, and some challenges still call for prompt resolution in the area. This study systematically summarizes the research results achieved by scholars both in China and abroad in the field of the evaluation of test suite effectiveness over the years. To start with, it expounds the problems in the research on the evaluation of test suite effectiveness. Then, it outlines and analyzes the evaluation of test suite effectiveness based on coverage and mutation score and presents the application of the evaluation of test suite effectiveness in test suite optimization. Finally, the study points out the challenges faced by this line of research and suggests the directions of future research.
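    The two metrics named in this abstract can be illustrated with a minimal sketch. The definitions below are the common textbook ones (mutation score as killed mutants over non-equivalent mutants, statement coverage as executed over executable statements), not necessarily the exact variants surveyed in the paper; the function and parameter names are hypothetical.

```python
def mutation_score(killed, total, equivalent=0):
    """Fraction of non-equivalent mutants that the test suite kills.

    Assumption: equivalent mutants are excluded from the denominator,
    which is the usual textbook definition.
    """
    scorable = total - equivalent
    if scorable <= 0:
        raise ValueError("no non-equivalent mutants to score")
    if not 0 <= killed <= scorable:
        raise ValueError("killed must lie between 0 and the scorable count")
    return killed / scorable


def statement_coverage(executed, executable):
    """Fraction of executable statements run at least once by the suite.

    Both arguments are sets of statement identifiers (e.g. line numbers).
    """
    if not executable:
        raise ValueError("no executable statements")
    return len(executed & executable) / len(executable)


if __name__ == "__main__":
    # Hypothetical numbers: 60 mutants, 10 judged equivalent, 45 killed.
    print(mutation_score(killed=45, total=60, equivalent=10))
    # Hypothetical run: lines 1-3 executed out of executable lines 1-4.
    print(statement_coverage({1, 2, 3}, {1, 2, 3, 4}))
```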
    2024,35(2):581-603, DOI: 10.13328/j.cnki.jos.006975
    Abstract:
    Open source software has been a key infrastructure of modern society, supporting software development in almost every field. Through various kinds of code reuse such as install dependency, API call, project fork, file copy, and code clone, open source software forms an intricate supply (i.e., dependency) network, which is referred to as an open source software supply chain. On the one hand, software supply chains facilitate software development and have become the foundation of the software industry. On the other hand, risks from upstream software can affect downstream software along the supply chain, leading to the ripple effect in open source software supply chains. Open source software supply chains have attracted more and more attention from both the academia and the industry. To help advance researchers’ knowledge of open source software supply chains, this study provides a definition and research framework of open source software supply chains from a holistic perspective. Then, it conducts a systematic literature review on worldwide research and summarizes the status quo of research from three aspects: structure and evolution, risk propagation and management, and dependency management. Finally, the study summarizes the challenges and opportunities of future research on open source software supply chains.
    2024,35(2):604-628, DOI: 10.13328/j.cnki.jos.006981
    Abstract:
    This study focuses on the code generation task, which aims to generate relevant code fragments from given natural language descriptions. In the process of software development, developers often encounter two scenarios. One is writing a large amount of repetitive code of low technical difficulty to implement common functionalities. The other is writing code that depends on specific task requirements, which may necessitate external resources such as documentation or other tools. Therefore, code generation has received a lot of attention from academia and industry as a way to assist developers in coding. Making machines understand users’ requirements and write programs on their own has also been one of the key concerns in the field of software engineering. The recent development of deep learning techniques, especially pre-trained models, has enabled promising performance on the code generation task. In this study, the current work on deep learning-based code generation is systematically reviewed, and the current deep learning-based code generation methods are classified into three categories: methods based on code features, methods incorporated with retrieval, and methods incorporated with post-processing. The first category refers to the methods that use deep learning algorithms for code generation based on code features, and the second and third categories improve the performance of the methods in the first category. The existing research results of each category of methods are systematically reviewed, summarized, and commented on. In addition, the study analyzes the corpora and the popular evaluation metrics used in existing code generation work. Finally, it summarizes the overall literature review and outlines future research directions worthy of attention.
    2024,35(2):629-674, DOI: 10.13328/j.cnki.jos.006983
    Abstract:
    Under the new era of “human-machine-thing” ternary integration and ubiquitous computing, the software deployment and operation environment of “open and changeable”, “diverse needs”, and “complex scenarios” have put forward more requirements and higher expectations for the governance of open-source software library ecosystems. To further promote the construction of trusted software supply chain ecosystems and create an independent and controllable technical system based on the ubiquitous computing model, this study focuses on open-source software library ecosystems. It collects 348 authoritative papers in this field in the past two decades (2001–2023) and sorts out the research work of open-source software library management ecological governance technology. The study discusses the modeling and analysis, evolution and maintenance, quality assurance, and management of open-source software supply chain ecosystems, and summarizes the research status, problems, challenges and trends.
    2024,35(2):675-710, DOI: 10.13328/j.cnki.jos.006933
    Abstract:
    Graph data, such as citation networks, social networks, and transportation networks, exist widely in the real world. Graph neural networks (GNNs) have attracted extensive attention due to their strong expressiveness and excellent performance in a variety of graph analysis applications. However, the excellent performance of GNNs depends on labeled data, which are difficult to obtain, and on complex network models with high computational costs. Knowledge distillation (KD) is introduced into GNNs to address the scarcity of labeled data and the high complexity of GNNs. KD is a method of training small constructed models (student models) with soft-label supervision information from larger models (teacher models) to yield better performance and accuracy. How to apply KD to graph data has therefore become a research challenge, but a review of graph-based KD research is still lacking. Aiming to provide a comprehensive overview of graph-based KD, this study summarizes the existing studies and fills the review gap in this field. Specifically, this study first introduces the background knowledge of graphs and KD. Then, three types of graph-based knowledge distillation methods are comprehensively summarized: graph knowledge distillation for deep neural networks (DNNs), graph knowledge distillation for GNNs, and self-KD-based graph knowledge distillation. Each type of method is further divided into knowledge distillation methods based on the output layer, the middle layer, and the constructed graph. Subsequently, the design ideas of various graph-based knowledge distillation algorithms are analyzed and compared, and the advantages and disadvantages of the algorithms are summarized with experimental results. In addition, applications of graph-based knowledge distillation in computer vision, natural language processing, recommendation systems, and other fields are listed. Finally, the development of graph-based knowledge distillation is summarized and its prospects are discussed. This study also makes the references related to graph-based knowledge distillation available on GitHub. Please refer to https://github.com/liujing1023/Graph-based-Knowledge-Distillation.
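    The soft-label supervision described in this abstract can be sketched as follows. This is the classic temperature-softened formulation commonly attributed to Hinton et al., used here as an assumption; the abstract does not specify a loss, and graph-based variants in the survey may differ. All names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the T^2 factor follows the usual convention for keeping
    gradient magnitudes comparable across temperatures.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Hypothetical logits for one node over three classes.
    print(distillation_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0]))
```

    In practice this term is usually combined with an ordinary cross-entropy loss on whatever hard labels are available.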
    2024,35(2):711-738, DOI: 10.13328/j.cnki.jos.007006
    Abstract:
    In recent years, reinforcement learning methods based on environmental interactions have achieved great success in robotic applications, providing a practical and feasible solution for optimizing the behavior control strategies of robots. However, collecting interactive samples in the real world can lead to problems such as high cost and low efficiency. Therefore, the simulation environment is widely used in the training process of robot reinforcement learning. By obtaining a large number of training samples at a low cost in the virtual simulation environment for strategy training and transferring learning strategies to the real world, the security, reliability, and real-time problems in the real robot training process can be alleviated. However, due to the difference between the simulation environment and the real environment, it is often difficult to obtain ideal performance when directly transferring the strategy trained in the simulation environment to the real robot. To solve this problem, sim-to-real transfer reinforcement learning methods are proposed to reduce the environmental gap, so as to achieve effective strategy transfer. According to the direction of information flow in the process of transfer reinforcement learning and the different objects targeted by intelligent methods, this survey first proposes a sim-to-real transfer reinforcement learning framework, based on which the existing related work is then divided into three categories: the model optimization methods focusing on the real environment, the knowledge transfer methods focusing on the simulation environment, and the iterative policy promotion methods focusing on both simulation and real environments. Then, the representative technologies and related work in each category are described. Finally, the opportunities and challenges in this field are briefly discussed.
    Available online:  February 05, 2024 , DOI: 10.13328/j.cnki.jos.007031
    Abstract:
    The utilization range of Internet of Things (IoT) devices is expanding. Model checking is an effective approach to improve the reliability and security of such devices. However, the commonly adopted model checking methods cannot well describe the cross-space movement and communication behavior common in such devices. To this end, this study proposes a modeling and verification method for the mobile and communication behavior of IoT devices to verify their spatio-temporal properties. Additionally, push/pull action and global communication mechanism are integrated into ambient calculus to propose the ambient calculus with global communication (ACGC) and provide a model checking algorithm for ACGC against the ambient logic. Then, the modeling language for mobility and communication (MLMC) is put forward to describe mobile and communication behavior of IoT devices. Additionally, a method to convert the MLMC-based description into an ACGC model is given. Furthermore, a model checking tool ACGCCk is implemented to verify whether the properties of IoT devices are satisfied. Meanwhile, some optimizations are conducted to accelerate the checking. Finally, the effectiveness of the proposed method is demonstrated by case study and experimental analysis.
    Available online:  February 05, 2024 , DOI: 10.13328/j.cnki.jos.007045
    Abstract:
    Network congestion control algorithms are the key factor in determining network transport performance. In recent years, the spreading network, the growing network bandwidth, and the increasing user requirements for network performance have brought challenges to the design of congestion control algorithms. To adapt to different network environments, many novel design ideas for congestion control algorithms have been proposed recently, which have greatly improved network performance and user experience. This study reviews innovative congestion control algorithm design ideas and classifies them into four major categories: reservation scheduling, direct measurement, machine learning-based learning, and iterative detection. It introduces the corresponding representative congestion control algorithms and compares and analyzes the advantages and disadvantages of the various congestion control ideas and methods. Finally, the study discusses future development directions of congestion control to inspire research in this field.
    Available online:  February 05, 2024 , DOI: 10.13328/j.cnki.jos.007055
    Abstract:
    Detecting aligned double joint photographic experts group (JPEG) compression is a challenging task in digital image forensics. Previous studies have proposed methods that can effectively detect aligned double JPEG compression, but these methods mostly rely on features extracted during the JPEG decompression process. If the aligned double compressed JPEG image is saved in BMP format, these methods may be difficult to apply directly. To address this issue, this study proposes a quantization step estimation method based on dual thresholds, which allows for the acquisition of quantization tables and the extraction of features. Furthermore, the study defines a minimum error based on the unique properties of JPEG compression with a quality factor of 100; removing this minimum error from the features further improves the detection performance of the proposed method. Finally, the study extracts first-order relative error features based on the convergence properties of the de-quantized JPEG coefficients, which further enhances the detection performance of the proposed method at lower quality factors. Experimental results demonstrate that the proposed method outperforms current state-of-the-art algorithms at different quality factors.
    Available online:  February 05, 2024 , DOI: 10.13328/j.cnki.jos.007062
    Abstract:
    Spoken language understanding is a key task in task-based dialogue systems, mainly composed of two sub-tasks: slot filling and intent detection. Currently, the mainstream method is to jointly model slot filling and intent detection. Although this method has achieved good results in both slot filling and intent detection, joint modeling still suffers from error propagation in the interaction between intent detection and slot filling, as well as incorrect correspondence between multi-intent information and slot information in multi-intent scenarios. In response to these problems, this study proposes a joint model for multi-intent detection and slot filling based on graph attention networks (WISM). WISM establishes a word-level one-to-one mapping between fine-grained intents and slots to correct the incorrect correspondence between multi-intent information and slots. By constructing a word-intent-semantic slot interaction graph and using a fine-grained graph attention network to establish bidirectional connections between the two tasks, it reduces error propagation during the interaction process. Experimental results on the MixSNIPS and MixATIS datasets show that, compared with the latest existing models, WISM improves semantic accuracy by 2.58% and 3.53%, respectively. The model not only improves accuracy but also verifies the one-to-one correspondence between multi-intent and semantic slots.
    Available online:  February 05, 2024 , DOI: 10.13328/j.cnki.jos.007064
    Abstract:
    A temporal graph is a type of graph where each edge is associated with a timestamp. A seasonal-bursting subgraph is a dense subgraph characterized by burstiness over multiple time periods, which can be applied to activity discovery and group relationship analysis in social networks. Unfortunately, most previous studies on subgraph mining in temporal networks ignore the seasonal or bursting features of subgraphs. To this end, this study proposes a maximal $(\omega,\theta)$-dense subgraph model to represent a seasonal-bursting subgraph in temporal networks. Specifically, the maximal $(\omega,\theta)$-dense subgraph is a subgraph that accumulates its density at the fastest speed during at least $\omega$ particular periods of length no less than $\theta$ on the temporal graph. To compute all seasonal-bursting subgraphs efficiently, the study first models the mining problem as a mixed integer programming problem, which consists of finding the densest subgraph and the maximum burstiness segment. Corresponding solutions are then given for each subproblem. The study further devises two optimization strategies that exploit key-core and dynamic programming algorithms to boost performance. Experimental results show that the proposed model is indeed able to identify many seasonal-bursting subgraphs. The efficiency, scalability, and effectiveness of the proposed algorithms are also verified on five real-life datasets.
    Available online:  February 05, 2024 , DOI: 10.13328/j.cnki.jos.007067
    Abstract:
    The code search method based on deep learning realizes the code search task by calculating the similarity of the corresponding representation of the code and the description statement. However, this manner does not consider the real probability distribution of relevance between the code and the description. To solve this problem, this study proposes a code search method based on a generative adversarial game that combines the correlation between the code and the description in the classical probability model with the feature extraction in the vector space model. Then the generative adversarial game is adopted to apply the probability distribution between the code and the description to the alternate training of the generator and discriminator. Meanwhile, the code encoder and the description encoder are optimized, and high-quality code representation and description statement representation are generated for the code search task. Finally, experimental verification is carried out on the public dataset, and the results show that the proposed method improves the Recall@10, MRR@10, and NDCG@10 metrics by 8.4%, 32.5%, and 24.3% respectively compared to the DeepCS method.
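    The three ranking metrics reported in this abstract can be sketched per query as below. These are the common binary-relevance definitions; the paper may use slightly different variants (for example, a success-rate form of Recall@k), so the functions and their names are illustrative assumptions. Reported scores are normally the mean over all queries.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Share of the relevant items that appear in the top-k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant item in the top-k, else 0."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking over the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / ideal if ideal else 0.0


if __name__ == "__main__":
    # Hypothetical query: the single correct snippet "c1" is ranked second.
    ranked = ["c3", "c1", "c7"]
    relevant = {"c1"}
    print(recall_at_k(ranked, relevant, 10))
    print(mrr_at_k(ranked, relevant, 10))
    print(ndcg_at_k(ranked, relevant, 10))
```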
    Available online:  January 31, 2024 , DOI: 10.13328/j.cnki.jos.007066
    Abstract:
    Raft is one of the most popular distributed consensus protocols. Since it was proposed in 2014, Raft and its variants have been widely used in different kinds of distributed systems. To prove the correctness of the Raft protocol, developers use the TLA+ formal specification to model and verify its design. However, due to the gap between the abstract formal specification and practical implementation, distributed systems that implement the Raft protocol can still violate the protocol design and introduce intricate bugs. This study proposes a novel testing technique based on the TLA+ formal specification to unearth bugs in Raft implementations. Specifically, the study maps the formal specification to the corresponding system implementation and then uses the specification-defined state space to guide the testing of the implementation. To evaluate the feasibility and effectiveness of the proposed approach, the study applies it to two different Raft implementations and finds three previously unknown bugs.
    Available online:  January 31, 2024 , DOI: 10.13328/j.cnki.jos.007063
    Abstract:
    With the development of Internet information technology, large-scale graphs have widely emerged in social networks, computer networks, and biological information networks. In view of the storage and performance limitations of traditional graph data management technology when dealing with large-scale graphs, distributed management technology has become a hotspot in industry and academia fields. The core decomposition is adopted to get core numbers of vertices in a graph and plays a key role in many applications, including community search, protein structure analysis, and network structure visualization. The existing distributed core decomposition algorithm applied a broadcast message delivery mechanism based on the vertex-centric mode, which may generate a large amount of redundant communication and computation overhead and lead to memory overflow when processing large-scale graphs. To address these issues, this study proposes novel distributed core decomposition algorithms based on global activation and peeling calculation frameworks, respectively. In addition, there are several strategies designed to improve algorithm performance. Based on the locality of the vertex core number, the study proposes a new message-pruning strategy and a new worker-centric computing mode, thereby improving the efficiency of our algorithms. To verify those strategies, this study deploys the proposed models and algorithms on the distributed cluster of the National Supercomputing Center in Changsha, and the effectiveness and efficiency of the proposed methods are evaluated through a large number of experiments on real and synthetic data sets. The total time performance of the algorithm is improved by 37% to 98%.
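    For readers unfamiliar with core decomposition, the peeling idea mentioned in this abstract can be sketched on a single machine as follows. This is the standard sequential peeling procedure, not the distributed algorithms proposed in the paper; it is written for clarity rather than speed (the minimum-degree lookup is linear, so the whole routine is quadratic), and the names are hypothetical.

```python
def core_numbers(adj):
    """Return the core number of every vertex of an undirected graph.

    `adj` maps each vertex to the set of its neighbours. Vertices are
    removed (peeled) in order of current minimum degree; the core number
    of a vertex is the largest minimum degree seen up to its removal.
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1  # peeling v lowers its neighbours' degrees
    return core


if __name__ == "__main__":
    # Hypothetical graph: a triangle a-b-c with a pendant vertex d on a.
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(core_numbers(graph))
```

    Here the triangle vertices end up with core number 2 and the pendant vertex with core number 1.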
    Available online:  January 31, 2024 , DOI: 10.13328/j.cnki.jos.007056
    Abstract:
    Time series forecasting models have been widely used in various domains of daily life, and the attack against these models is related to the security of data in applications. At present, adversarial attacks on time series mostly perform large-scale perturbation at the global level, which leads to the easy perception of adversarial samples. At the same time, the effectiveness of adversarial attacks decreases significantly with the magnitude shrinkage of the perturbation. Therefore, how to generate imperceptible adversarial samples while maintaining a competitive performance of attack is an urgent problem that needs to be solved in the current adversarial attack field of time series forecasting. This study first proposes a local perturbation strategy based on sliding windows to narrow the perturbation interval of the adversarial sample. Second, it employs the differential evolutionary algorithm to find the optimal attack points and combine the segmentation function to partition the perturbation interval to further reduce the perturbation range and complete the semi-white-box attack. The comparison experiments with existing adversarial attack methods on several different deep learning models show that the proposed method can generate less perceptible adversarial samples and effectively change the prediction trend of the model. The proposed method achieves sound attack results in four challenging tasks, namely stock trading, electricity consumption, sunspot observation, and temperature prediction.
    Available online:  January 31, 2024 , DOI: 10.13328/j.cnki.jos.007052
    Abstract:
    The performance of image classification algorithms is limited by the diversity of visual information and the influence of background noise. Existing works usually apply cross-modal constraints or heterogeneous feature alignment algorithms to learn visual representations with strong discrimination. However, the difference in feature distribution caused by modal heterogeneity limits the effective learning of visual representations. To address this problem, this study proposes an image classification framework (CMIF) based on cross-modal semantic information inference and fusion and introduces the semantic description of images and statistical knowledge as privileged information. The study uses the privileged information learning paradigm to guide the mapping of image features from visual space to semantic space in the training stage, and a class-aware information selection (CIS) algorithm is proposed to learn the cross-modal enhanced representation of images. In view of the heterogeneous feature differences in representation learning, the partial heterogeneous alignment (PHA) algorithm is used to achieve cross-modal alignment of visual features and semantic features extracted from privileged information. To further suppress the interference caused by visual noise in semantic space, the CIS algorithm based on graph fusion is selected to reconstruct the key information in the semantic representation, so as to form an effective supplement to the visual prediction information. Experiments on the cross-modal classification datasets VireoFood-172 and NUS-WIDE show that CMIF can learn robust semantic features of images and that, as a general framework, it achieves stable performance improvements on the convolution-based ResNet-50 and Transformer-based ViT image classification models.
    Available online:  January 31, 2024 , DOI: 10.13328/j.cnki.jos.007053
    Abstract:
    Freehand sketches can intuitively present users’ creative intention through simple lines and enable users to express their thinking process and design inspiration or produce target images or videos. With the development of deep learning methods, sketch-based visual content generation performs cross-domain feature mapping by learning the feature distribution between sketches and visual objects (images and videos), enabling the automated generation of sketches from images and the automated generation of images or videos from sketches. Compared with traditional manual creation, it effectively improves the efficiency and diversity of generation. It has become one of the most important research directions in computer vision and graphics and plays an important role in design, visual creation, and related areas. Therefore, this study presents an overview of the research progress and future development of deep learning methods for sketch-based visual content generation. The study classifies the existing work into sketch-based image generation and sketch-based video generation according to the visual objects involved and analyzes the generation models in detail in combination with specific tasks, including cross-domain generation between sketch and visual content, style transfer, and editing of visual content. Then, it summarizes and compares the commonly used datasets and points out sketch propagation methods that address insufficient sketch data, as well as evaluation methods for generation models. Furthermore, the study discusses research trends based on the challenges that sketches face in visual content generation and the future development direction of generation models.
    Available online:  January 24, 2024 , DOI: 10.13328/j.cnki.jos.007051
    Abstract:
    A heterogeneous graph is a graph with multiple types of nodes and edges, also known as a heterogeneous information network, which is often used to model systems with rich features and association patterns in the real world. Link prediction between heterogeneous nodes is a fundamental task in network analysis. In recent years, the development of heterogeneous graph neural networks (HGNNs) has greatly advanced the task of link prediction, which is usually regarded as a feature similarity analysis between nodes or a binary classification problem based on paired node features. However, when learning node feature representations, existing HGNNs usually focus only on the associations between adjacent nodes or on meta-path-based structural information. This not only makes it difficult for these HGNNs to capture the semantic information of the ring structure inherent in heterogeneous graphs but also ignores the complementarity of structural information at different levels. To solve these issues, this study proposes a cascade graph convolution network based on multi-level graph structures (CGCN-MGS), which is composed of graph neural networks based on three graph structures of different levels: neighboring, meta-path, and ring structures. CGCN-MGS can mine rich and complementary information from multi-level features and improve the ability of the learned node features to represent the semantics and structure information of nodes. Experimental results on several benchmark datasets show that CGCN-MGS achieves state-of-the-art performance on link prediction in heterogeneous graphs.
    Available online:  January 24, 2024 , DOI: 10.13328/j.cnki.jos.007080
    Abstract:
    Malware detection, including Windows malware detection and Android malware detection, is a hotspot of cyberspace security research. With the development of machine learning and deep learning, some outstanding algorithms from the fields of image recognition and natural language processing have been applied to malware detection. These algorithms have shown excellent learning performance with a large amount of data. However, some challenging problems in malware detection have not been solved effectively. For instance, conventional learning methods cannot achieve effective detection from only a few samples of novel malware. Therefore, few-shot learning (FSL) is adopted to solve few-shot malware detection (FSMD) problems. This study extracts the problem definition and the general process of FSMD from the related research. According to the principle of the method, FSMD methods are divided into methods based on data augmentation, methods based on meta-learning, and hybrid methods combining multiple technologies. Then, the study discusses the characteristics of each FSMD method. Finally, the background, technology, and application prospects of FSMD are discussed.
    Available online:  January 24, 2024 , DOI: 10.13328/j.cnki.jos.007042
    Abstract:
    In recent years, machine learning has been a research hotspot and has played an important role in various fields. However, as the amount of data continues to increase, the training time of machine learning algorithms is getting longer. Meanwhile, quantum computers demonstrate powerful computing ability. Researchers have therefore tried to use quantum computing to solve the problem of long machine learning training time, which has led to the emergence of quantum machine learning. Quantum machine learning algorithms have been proposed, including quantum principal component analysis, quantum support vector machine, and quantum deep learning. Additionally, experiments have shown that quantum machine learning algorithms have a significant acceleration effect, leading to a gradual increase in research on quantum machine learning. This study reviews research on quantum machine learning algorithms. First, the fundamental concepts of quantum computing are introduced. Then, five types of quantum machine learning algorithms are presented: quantum supervised learning, quantum unsupervised learning, quantum semi-supervised learning, quantum reinforcement learning, and quantum deep learning. Next, related applications of quantum machine learning are demonstrated together with algorithm experiments. Finally, a summary is given and prospects for future study are discussed.
    Available online: January 24, 2024, DOI: 10.13328/j.cnki.jos.007068
    Abstract:
    Currently, most published image steganalysis methods are designed for grayscale images and cannot effectively detect steganography in the color images widely used in social media. To solve this problem, this study proposes a color image steganalysis method based on central difference convolution and attention enhancement. The proposed method first designs a backbone flow consisting of three stages: preprocessing, feature extraction, and feature classification. In the preprocessing stage, the input color image is separated into color channels, and the residual images obtained by SRM filtering of each channel are concatenated. In the feature extraction stage, the study constructs three convolutional blocks based on central difference convolution to extract deeper steganalysis feature maps. In the classification stage, the study uses global covariance pooling and two fully connected layers with dropout to classify cover and stego images. Additionally, to further enhance the feature expression ability of the backbone flow at different stages, it introduces a residual spatial attention enhancement module and a channel attention enhancement module at the early and late stages of the backbone flow, respectively. Specifically, the residual spatial attention enhancement module first uses Gabor filter kernels to perform channel-separated convolution on the input image and then obtains the effective information of the residual feature map through the spatial attention mechanism. The channel attention enhancement module enhances the final feature classification ability of the model by capturing the dependencies between channels. A large number of comparative experiments have been conducted, and the results show that the proposed method significantly improves the detection performance for color image steganography and achieves the best results reported to date. In addition, the study conducts ablation experiments to verify the rationality of the proposed network architecture.
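The abstract does not spell out the operator, so the following is only a minimal sketch of central difference convolution in its commonly cited form, where the response of an ordinary convolution is combined with a term that subtracts the center pixel weighted by the kernel sum. The function name, the `theta` parameter, and the nested-list image format are assumptions made for illustration, not details taken from the paper.

```python
def central_difference_conv2d(image, kernel, theta=0.7):
    """Valid-mode 2D central difference convolution on nested lists.

    Assumed formulation: y = sum(w * x) - theta * x_center * sum(w).
    theta = 0 gives an ordinary convolution; theta = 1 keeps only
    differences between each neighbor and the center pixel.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    kernel_sum = sum(sum(row) for row in kernel)
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Ordinary (cross-correlation style) convolution response.
            vanilla = sum(kernel[a][b] * image[i + a][j + b]
                          for a in range(kh) for b in range(kw))
            # Center pixel of the current window.
            center = image[i + kh // 2][j + kw // 2]
            row.append(vanilla - theta * center * kernel_sum)
        out.append(row)
    return out
```

With `theta=1`, a flat image region produces a zero response, which illustrates why such operators emphasize the small local variations that steganalysis relies on rather than image content.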
    Available online: January 24, 2024, DOI: 10.13328/j.cnki.jos.007050
    Abstract:
    A directed acyclic graph (DAG)-based blockchain adopts a parallel topology and can significantly improve system performance compared with conventional chain-based blockchains with a serial topology. As a result, it has attracted wide attention from the industry. However, the storage model and the consensus protocol of existing DAG-based blockchains are tightly coupled and thus lack the flexibility to meet diversified application demands. Furthermore, most DAG-based blockchains lack flexibility at the consensus protocol level and are limited to probabilistic consensus protocols, which struggle to balance confirmation latency and security and are especially unfriendly to delay-sensitive applications. Therefore, this study presents an elastic DAG-based blockchain, namely ElasticDAG. The core idea is to decouple the storage model and the consensus protocol, enabling them to proceed in parallel and independently, so as to flexibly adapt to diversified applications. To improve the throughput and liveness of the system, an adaptive block confirmation strategy and an epoch-based block ordering algorithm are designed for the storage model. To reduce transaction confirmation latency, a low-latency hybrid consensus protocol for DAG blockchains is designed. Experimental results demonstrate that the ElasticDAG prototype in a WAN can achieve a throughput exceeding 11 Mb/s with a confirmation latency of tens of seconds. Compared with OHIE and Haootia, ElasticDAG can reduce confirmation latency by a factor of 17 and improve security from 91.04% to 99.999914% while maintaining the same throughput and consensus latency.
    Available online: January 24, 2024, DOI: 10.13328/j.cnki.jos.007059
    Abstract:
    As big data and computing power rapidly develop, deep learning has made significant breakthroughs and rapidly become a field with numerous practical application scenarios and active research topics. In response to the growing demand for the development of deep learning tasks, deep learning frameworks have arisen. Acting as an intermediate component between application scenarios and hardware platforms, deep learning frameworks facilitate the development of deep learning applications, enable users to efficiently construct diverse deep neural network (DNN) models, and adapt closely to various computing hardware, meeting the computational needs of different computing architectures and environments. Deep learning frameworks serve as the fundamental software of artificial intelligence, so any issue that arises within them can have severe consequences. Even a single bug in the code can trigger widespread failures within models built upon the framework, thereby posing a serious threat to the safety of deep learning systems. As the first review focusing exclusively on the testing of deep learning frameworks, this study first introduces the developmental history and basic architectures of deep learning frameworks. Subsequently, by systematically examining 55 academic papers directly related to the testing of deep learning frameworks, the study analyzes and summarizes bug characteristics, key testing technologies, and testing methods based on various input forms. The study explores how to combine key technologies to address research problems. Lastly, it summarizes the unresolved difficulties in the testing of deep learning frameworks and provides insights into promising research directions for the future. This study can offer valuable references and guidance to researchers working on deep learning framework testing, ultimately promoting the sustained development and maturity of deep learning frameworks.
    Available online: January 17, 2024, DOI: 10.13328/j.cnki.jos.007049
    Abstract:
    In the white-box attack context, an attacker can access the implementation of the cryptographic algorithm, observe its dynamic execution and internal details, and modify it arbitrarily. In 2002, Chow et al. proposed the concept of the white-box cipher and presented white-box implementations of the AES and DES algorithms using lookup table technology, which is called the CEJO framework. A white-box implementation obfuscates an existing cryptographic algorithm, protects the key in software under white-box attack, and ensures the correctness of the algorithm results. SIMON is a lightweight block cipher algorithm that is widely used in Internet of Things devices because of its strong software and hardware performance. It is of great practical significance to study the white-box implementation of this algorithm. This study presents two white-box implementations of the SIMON algorithm. The first scheme (SIMON-CEJO) uses the classical CEJO framework and protects the lookup tables with networked encodings, so as to obfuscate the key. In this scheme, the occupied memory space is 369.016 KB. The security analysis shows that the SIMON-CEJO scheme can resist the BGE attack and affine equivalence algorithm attacks, but it fails to resist differential computation analysis. The second scheme (SIMON-Masking) uses the encoding method proposed by Battistello et al. to encode the plaintext information and key information, and it uses the homomorphism of the encoding to convert the XOR operation and AND operation into modular multiplication and table lookup operations. Finally, the corresponding ciphertext is obtained by decoding. During the operation of the algorithm, a Boolean mask is added to the AND operation. The randomness of the encodings protects the real key information and improves the ability of the scheme to resist differential computation analysis and other attacks. SIMON-Masking occupies 655.81 KB of memory space, and the time complexity of the second-order differential computation analysis based on the Legendre symbol is O(n2klog2p). The comparison of the two schemes shows that the classical CEJO framework cannot effectively defend against differential computation analysis, whereas using the new encoding and adding masks are effective white-box implementation methods.
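For orientation, the XOR and AND operations that both schemes must protect come from the SIMON round function. The sketch below follows the published SIMON Feistel round with a 16-bit word size chosen for illustration; it is the plain, unprotected round, not either white-box scheme from the paper, and the helper names are hypothetical.

```python
def rotl(x, r, n=16):
    """Rotate an n-bit word left by r bits."""
    mask = (1 << n) - 1
    return ((x << r) | (x >> (n - r))) & mask


def simon_f(x, n=16):
    """SIMON nonlinear function: (S^1 x AND S^8 x) XOR S^2 x."""
    return (rotl(x, 1, n) & rotl(x, 8, n)) ^ rotl(x, 2, n)


def simon_round(left, right, round_key, n=16):
    """One Feistel round: (x, y) -> (y XOR f(x) XOR k, x)."""
    return right ^ simon_f(left, n) ^ round_key, left


def simon_round_inverse(left, right, round_key, n=16):
    """Undo one round, recovering the pair that entered simon_round."""
    return right, left ^ simon_f(right, n) ^ round_key
```

In an unprotected implementation like this one, `round_key` appears in the clear in memory. A white-box implementation replaces these steps with encoded lookup tables or encoded arithmetic so that the key cannot be read off directly.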
    Available online: January 17, 2024, DOI: 10.13328/j.cnki.jos.007058
    Abstract:
    Due to the continuous advancements in the field of deep learning, there is growing interest in extending relational databases with collaborative query processing (CQP) techniques to handle advanced analytical queries involving structured and unstructured data. State-of-the-art CQP methods employ user-defined functions (UDFs) to implement deep neural network (NN) models for processing unstructured data while utilizing relational operations for structured data. UDF-based approaches simplify query composition, allowing users to submit analytical queries with a single SQL statement. However, they require manual selection of appropriate and efficient models based on desired performance metrics during ad-hoc data analysis, posing significant challenges to users. To address this issue, this research proposes a CQP technique based on declarative inference functions (DIF), which constructs a complete CQP framework by optimizing model selection, execution strategies, and device bindings across multiple query execution paths. Leveraging the cost model and optimization rules designed in this study, the query processor is capable of estimating the cost of different query plans and automatically selecting the optimal physical query plan. Experimental results on four datasets validate the effectiveness and efficiency of the proposed DIF-based CQP approach.
    Available online: January 17, 2024, DOI: 10.13328/j.cnki.jos.007043
    Abstract:
    The graphical user interface (GUI/UI) provides a visual bridge between the application and its end users, who use the application through interactive operations. With the development of mobile applications, the GUI, which combines aesthetics and interaction design, has become more and more complex, and users are increasingly concerned about the accessibility and usability of applications. However, the complexity of the GUI also brings great challenges to its design and implementation. Due to user-defined settings on mobile devices and differences in device models and screen resolutions, UI display issues frequently occur. For example, because of software or hardware compatibility problems, display issues such as text overlap, component occlusion, and image loss often appear when interfaces are rendered on different devices. They have a negative impact on the usability and accessibility of applications, resulting in poor user experience. Unfortunately, little is known about the causes of UI display issues in mobile applications. To cope with this challenge, this study collects 6729 screenshots of applications with UI display issues from the Baidu crowdtesting platform and 1016 screenshots of applications provided by issue reports on GitHub, and it identifies nine types of UI display issues using thematic analysis. Through the analysis of 1061 UI issue reports from GitHub and the corresponding defective code, the nature and causes of UI display issues are summarized. The research finds that (1) 62.1% of the screenshots in the crowdtesting dataset exhibit UI display defects; (2) UI display issues are largely caused by a mismatch between the font scaling setting and the adaptive settings of components; (3) the layout settings of the interface can lead to display issues; and (4) if hardware acceleration is not turned on, the normal display of the interface is affected.
    Available online: January 10, 2024, DOI: 10.13328/j.cnki.jos.007039
    Abstract:
    The graph neural network (GNN) is a framework that directly characterizes graph-structured data through deep learning, and it has attracted increasing attention in recent years. However, the traditional GNN based on message passing aggregation (MP-GNN) ignores the smoothing speed of different nodes and aggregates neighbor information indiscriminately, which makes it prone to over-smoothing. Thus, this study proposes KENN, a graph kernel neural network classification method based on linear structural entropy. First, KENN adopts the graph kernel method to encode node subgraph structures, determines isomorphism among subgraphs, and then utilizes the isomorphism coefficient to define the smoothing coefficient among different neighbors. Second, it extracts graph structural information based on the low-complexity linear structural entropy to deepen and enrich the structural expression capability of the graph data. By deeply integrating linear structural entropy, graph kernels, and GNN, the proposed method can address the sparse node features of biomolecular data and the information redundancy caused by using node degree as features in social network data. It also enables the GNN to adaptively adjust its ability to characterize graph structural features and allows its expressive power to exceed the upper bound of MP-GNN (the WL test). Finally, experiments on seven public graph classification datasets verify that the proposed model outperforms other benchmark models.
    Available online: January 10, 2024, DOI: 10.13328/j.cnki.jos.007040
    Abstract:
    Named entity recognition (NER) is a fundamental task in information extraction and aims to locate the boundaries of entities in a sentence and classify them. In response to the fuzzy boundaries of nested entities based on span detection models, this study proposes a nested NER model based on span boundary perception. Firstly, it utilizes a bidirectional affine attention mechanism to capture the semantic relevance among word tokens and then generates a span semantic representation matrix. Secondly, it designs a second-order diagonal neighborhood difference operator and establishes a span semantic difference mechanism to extract semantic difference information among spans. Additionally, a span boundary perception mechanism is introduced to employ the local feature extraction ability of sliding windows to enhance the span boundary semantic differences, thereby accurately locating the entity span. The model is validated on three benchmark datasets of ACE04, ACE05, and Genia. The experimental results show that the proposed model outperforms related work in entity recognition accuracy. Additionally, the study conducts ablation experiments and case studies to verify the effectiveness of the proposed semantic difference mechanism and span boundary perception mechanism, providing new ideas and empirical evidence for further research on NER.
    Available online: January 10, 2024, DOI: 10.13328/j.cnki.jos.007037
    Abstract:
    Ubiquitous computing for human-cyber-physical integration is becoming a new requirement and trend in software development. Based on this new computing paradigm, human-cyber-physical applications further extend software technology to the effective utilization of offline resources, including physical devices and human resources. As a typical human-cyber-physical scenario, the collaboration between the device and human resources in the physical world features resource selectivity, high task frequency, and worker dynamics. Traditional resource scheduling techniques cannot meet the scheduling requirements of this task type (referred to as DHRC task). Thus, this study proposes an optimal scheduling method for collaborative tasks between device and human resources. This method includes two stages of device resource scheduling and human resource scheduling. In the device resource scheduling stage, a device resource scheduling algorithm based on NSGA-II is proposed to optimize task resource selection by comprehensively considering such factors as task distance, device load, and the worker number around the device location. In the human resource scheduling stage, a human resource scheduling algorithm based on DPSO is put forward to optimize the worker selection and corresponding path planning according to such factors as worker location and collaboration dependency. Experiments in a simulated environment show that the algorithm in the first stage is equivalent in efficiency and superior in utility to the compared algorithm (discrete particle swarm optimization algorithm). The algorithm in the second stage is superior in efficiency and utility to the compared algorithm (the genetic algorithm improved by the tournament mechanism).
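The device scheduling stage builds on NSGA-II, whose core step is ranking candidate solutions by Pareto dominance. A minimal sketch of that step follows, assuming each candidate assignment is scored by a tuple of objectives to be minimized. The objective tuple used here (task distance, device load) is a hypothetical simplification of the factors named in the abstract, and the sketch omits crowding distance, crossover, and mutation.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere.

    All objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_front(points):
    """Return the candidates that no other candidate dominates (the first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (task distance, device load) scores for four candidate devices.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
first_front = non_dominated_front(candidates)
```

Here `(3, 3)` is dominated by `(2, 2)`, so the first front keeps the other three candidates, each of which represents a different trade-off between distance and load.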
    Available online: January 10, 2024, DOI: 10.13328/j.cnki.jos.007048
    Abstract:
    Database management systems (DBMSs) are the infrastructure for efficient storage, management, and analysis of data, playing a pivotal role in modern data-intensive applications. Vulnerabilities in DBMSs pose a great threat to the security of data and the operation of applications. Fuzzing is one of the most popular dynamic vulnerability detection techniques and has been applied to analyze DBMSs, uncovering many vulnerabilities. This study analyzes the requirements and the difficulties involved in testing a DBMS and proposes a foundational framework for DBMS fuzzing. It also analyzes the challenges encountered by DBMS fuzzers and identifies the dimensions that necessitate support. It introduces typical DBMS fuzzers from the perspective of discovering different types of vulnerabilities and summarizes key techniques in DBMS fuzzing, including SQL statement synthesis, code coverage tracking, and test oracle construction. Several popular DBMS fuzzers are evaluated in terms of coverage, syntax and semantic correctness of the generated test cases, and the ability to find vulnerabilities. Finally, it presents the problems faced by current DBMS fuzzing research and practices and prospects for future research directions in DBMS fuzzing.
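To make the roles of SQL statement synthesis and test oracle construction concrete, here is a minimal sketch of a fuzzing loop against SQLite using Python's standard `sqlite3` module. The oracle shown is ternary logic partitioning, one well-known oracle for logic bugs, chosen here for illustration; whether the survey covers this particular oracle is an assumption. The table schema, predicate grammar, and function names are hypothetical.

```python
import random
import sqlite3


def tlp_holds(conn, table, predicate):
    """Ternary logic partitioning: every row satisfies exactly one of
    p, NOT p, or p IS NULL, so the three counts must add up to the total."""
    cur = conn.cursor()
    total = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    parts = 0
    for variant in (predicate, f"NOT ({predicate})", f"({predicate}) IS NULL"):
        query = f"SELECT COUNT(*) FROM {table} WHERE {variant}"
        parts += cur.execute(query).fetchone()[0]
    return total == parts


def random_predicate(rng):
    """Synthesize a tiny comparison predicate over columns a and b."""
    column = rng.choice(["a", "b"])
    op = rng.choice(["<", "<=", "=", "<>", ">", ">="])
    return f"{column} {op} {rng.randint(-3, 3)}"


def run_fuzz(iterations=200, seed=0):
    """Return the list of predicates for which the oracle was violated."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    rows = [(rng.choice([None, rng.randint(-3, 3)]),
             rng.choice([None, rng.randint(-3, 3)])) for _ in range(50)]
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    failures = []
    for _ in range(iterations):
        predicate = random_predicate(rng)
        if not tlp_holds(conn, "t", predicate):
            failures.append(predicate)
    conn.close()
    return failures
```

A non-empty result from `run_fuzz` would point to a candidate logic bug in the engine under test. Practical DBMS fuzzers add much richer statement generators and coverage feedback on top of a loop of this shape.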
    Available online: January 10, 2024, DOI: 10.13328/j.cnki.jos.007060
    Abstract:
    To address the security of users’ private keys, this study proposes a user-oriented and practical private key protection framework that combines secret sharing with the edge computing mode. Based on this framework, it designs a private key protection scheme for the SM2 public-key cryptographic system. In this scheme, a user’s SM2 private key is divided into two shares via a secret sharing scheme, and the shares are kept by the user’s device and the edge server, respectively. The public-key cryptographic task requested by Web3 applications is executed cooperatively by the user’s device and the edge server without recovering the original private key. If the user’s device or the edge server is attacked, the two parties execute a key updating protocol to update the private key shares and invalidate any share that may have been leaked. Experimental results show that the computing time of the new scheme is acceptable for common real-world devices (smartphones, laptops, etc.).
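The share-and-refresh idea can be sketched with additive two-party secret sharing. This is an assumption made for illustration: the abstract does not say whether the scheme splits the SM2 key additively or multiplicatively, and the modulus below is a stand-in prime rather than the SM2 group order. The `combine` helper exists only to check the arithmetic, since the described scheme never reconstructs the key.

```python
import secrets

# Stand-in prime modulus for illustration only; a real deployment would use
# the order of the SM2 curve group instead.
N = 2**255 - 19


def split_key(d):
    """Split private key d into two additive shares with d1 + d2 = d (mod N)."""
    d1 = secrets.randbelow(N)
    d2 = (d - d1) % N
    return d1, d2


def refresh_shares(d1, d2):
    """Re-randomize both shares without changing the key they represent.

    After a refresh, an old leaked share is useless when paired with
    the other party's new share.
    """
    r = secrets.randbelow(N)
    return (d1 + r) % N, (d2 - r) % N


def combine(d1, d2):
    """Testing helper only: the protocol itself never recombines the shares."""
    return (d1 + d2) % N
```

In this sketch a single party picks the offset `r` for brevity. An actual key updating protocol would have the device and the edge server agree on the update jointly, so that neither side learns the other's share.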
    Available online: January 03, 2024, DOI: 10.13328/j.cnki.jos.007057
    Abstract:
    Currently, sentiment analysis research is generally based on big data-driven models, which rely heavily on expensive annotation and computation. Therefore, research on sentiment analysis in low-resource scenarios is particularly urgent. However, existing research on sentiment analysis in low-resource scenarios mainly focuses on a single task, making it difficult for models to acquire knowledge from external tasks. Therefore, this study formulates the task of successive sentiment analysis in low-resource scenarios, which aims to let models learn multiple sentiment analysis tasks over time through continual learning methods. This makes full use of data from different tasks and learns sentiment information from them, thus alleviating the problem of insufficient training data for a single task. There are two core problems in successive sentiment analysis in low-resource scenarios. One is preserving sentiment information for a single task, and the other is fusing sentiment information between different tasks. To solve these two problems, this study proposes continual attention modeling for successive sentiment analysis in low-resource scenarios. First, a sentiment masked Adapter (SMA) is constructed to generate hard attention emotion masks for different tasks. This preserves sentiment information for different tasks and mitigates catastrophic forgetting. Second, dynamic sentiment attention (DSA) is proposed, which dynamically fuses features extracted by different Adapters based on the current time step and task similarity. This fuses sentiment information between different tasks. Experimental results on multiple datasets show that the proposed approach significantly outperforms the state-of-the-art benchmark approaches. Additionally, experimental analysis indicates that the proposed approach has the best sentiment information retention and fusion abilities among the compared approaches while maintaining high operational efficiency.
    Available online: January 03, 2024, DOI: 10.13328/j.cnki.jos.007046
    Abstract:
    Smart contracts are computer programs running in the contract layer of the blockchain, which can be used to manage cryptocurrencies and data on the blockchain, realize diverse business logic, and expand the application of the blockchain. A large number of assets are stored in smart contracts, which attract attackers to steal the assets and obtain economic benefits via security vulnerabilities. In recent years, with the frequent occurrence of smart contract security incidents (such as TheDAO and Parity security incidents), the security vulnerability detection technique for smart contracts has become a hot research topic. This study proposes a research framework for detecting security vulnerabilities of smart contracts and analyzes the research progress of existing vulnerability detection techniques from three aspects: vulnerability discovery and identification, vulnerability analysis and detection, and dataset and evaluation indicators. Firstly, the basic process of collecting security vulnerability information is sorted out, and the security vulnerabilities are classified into 13 types according to their basic characteristics. A classification framework for security vulnerabilities of smart contracts is proposed. Secondly, existing techniques are studied in terms of symbolic execution, fuzzing testing, machine learning, formal verification, and static analysis, and the advantages and limitations of each technique are analyzed. Thirdly, the commonly used datasets and evaluation indicators are summarized. Finally, potential research directions for security vulnerability detection of smart contracts in the future are discussed.
    Available online: January 03, 2024, DOI: 10.13328/j.cnki.jos.007044
    Abstract:
    Due to the complex features of multi-view data, multi-view outlier detection has become a very challenging research topic in outlier detection. There are three types of outliers in multi-view data, namely class outliers, attribute outliers, and class-attribute outliers. Most early multi-view outlier detection methods are based on the clustering assumption, which makes it difficult to detect outliers when the data has no clustering structure. In recent years, many multi-view outlier detection methods have replaced the clustering assumption with the multi-view consistent nearest neighbor assumption, but they are still inefficient at detecting outliers in new data. In addition, most existing multi-view outlier detection methods are unsupervised, so they are affected by outliers during model learning and do not work well on datasets with high outlier rates. To address these issues, this study proposes an intra-view reconstruction and cross-view generation network for effective multi-view outlier detection, which detects the three types of outliers and consists of two modules: intra-view reconstruction and cross-view generation. By training on normal data, the proposed method can fully capture the features of each view in the normal data and better reconstruct and generate the corresponding views. In addition, a new outlier scoring method is proposed to calculate the outlier score of each sample, so that new data can be detected efficiently. Extensive experimental results show that the proposed method significantly outperforms existing methods. To the best of the authors' knowledge, this is the first work to apply a deep model based on generative adversarial networks to multi-view outlier detection.
    Available online: December 27, 2023, DOI: 10.13328/j.cnki.jos.007032
    Abstract:
    Federated learning has attracted much attention because it can address the data island problem. However, it also faces challenges such as the risk of privacy leakage and performance degradation due to model heterogeneity under non-independent and identically distributed data. To this end, this study proposes a personalized federated learning method based on Bregman divergence and differential privacy (FedBDP). This method employs Bregman divergence to measure the differences between local and global parameters and adopts it as a regularization term in the loss function, thereby reducing model differences and improving model accuracy. Meanwhile, adaptive differential privacy technology is utilized to perturb local model parameters, and an attenuation coefficient is defined to dynamically adjust the level of differential privacy noise in each round, so that the privacy noise level is allocated reasonably and model availability is improved. Theoretical analysis shows that FedBDP satisfies convergence conditions under both strongly convex and non-convex smooth functions. Experimental results demonstrate that the FedBDP method can guarantee accuracy on the MNIST and CIFAR10 datasets while satisfying differential privacy.
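The two ingredients named in the abstract can be sketched in a few lines. The Bregman divergence below follows its standard definition, D(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. The choice of generator (half the squared norm), the geometric noise decay schedule, and all function names are assumptions made for illustration, since the abstract does not give FedBDP's exact generator or attenuation coefficient.

```python
import random


def bregman_divergence(phi, grad_phi, x, y):
    """Standard Bregman divergence D_phi(x, y) for vectors given as lists."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner


def half_sq_norm(v):
    """Assumed generator: phi(v) = 0.5 * ||v||^2."""
    return 0.5 * sum(vi * vi for vi in v)


def identity_grad(v):
    """Gradient of half_sq_norm is v itself."""
    return list(v)


def regularized_loss(task_loss, local_w, global_w, lam):
    """Local objective: task loss plus a Bregman penalty toward the global model."""
    penalty = bregman_divergence(half_sq_norm, identity_grad, local_w, global_w)
    return task_loss + lam * penalty


def noise_scale(sigma0, decay, round_idx):
    """Hypothetical attenuation: noise shrinks geometrically with the round index."""
    return sigma0 * decay ** round_idx


def perturb(weights, sigma, rng):
    """Add Gaussian noise to local parameters before they are uploaded."""
    return [w + rng.gauss(0.0, sigma) for w in weights]
```

With the half squared norm as generator, the divergence reduces to half the squared Euclidean distance between local and global parameters, so the penalty behaves like a familiar proximal term. Other generators would yield different geometries.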
    Available online: December 27, 2023, DOI: 10.13328/j.cnki.jos.007038
    Abstract:
    Penetration testing is an important means of discovering weaknesses in critical network information systems and protecting network security. Traditional penetration testing relies heavily on manual labor and places high technical demands on testers, which limits how deeply and widely it can be adopted. By introducing artificial intelligence technology into the whole penetration testing process, automated penetration testing greatly reduces the dependence on manual labor and thereby lowers the technical threshold of penetration testing. Automated penetration testing can be mainly divided into model-based and rule-based approaches, each with its own research focus. The former utilizes model algorithms to simulate hacker attacks, with attention paid to attack scene perception and attack decision-making models. The latter concentrates on how to efficiently match attack rules to attack scenarios. This study mainly analyzes the implementation principles of automated penetration testing from three aspects: attack scenario modeling, penetration testing modeling, and decision-making reasoning models. Finally, the future development directions of automated penetration testing are explored from the dimensions of attack-defense confrontation and the combined exploitation of vulnerabilities.
    Available online: December 27, 2023, DOI: 10.13328/j.cnki.jos.007036
    Abstract:
    Partitioned DM (deadline-monotonic) scheduling of sporadic real-time tasks is a classic research problem. This study proposes a partitioned scheduling algorithm PDM-FFD (partitioned deadline-monotonic first-fit decrease) with higher processor utilization for constrained-deadline sporadic tasks. In PDM-FFD, tasks are first sorted in non-decreasing order of relative deadline, then the first-fit strategy is utilized to select the processor core for each task, and each core adopts the DM scheduling policy. Finally, a tighter schedulability test is obtained by analyzing the task interference time. This study proves that the speedup factor of PDM-FFD is $3 - (3\Delta + 1)/(m + \Delta)$ and the time complexity is $O(n^2) + O(nm)$. Here, $\Delta = \sum_{\tau_j \in \tau} C_j \times u_j / D_{\max}$, where $\tau_j$ belongs to the task set $\tau$, $C_j$ is the worst-case execution time, $u_j$ is the utilization, $D_{\max}$ is the maximum relative deadline, $n$ is the number of tasks, and $m$ is the number of processor cores. The speedup factor of PDM-FFD is strictly less than $3 - 1/m$, which outperforms the existing multi-core partitioned scheduling algorithm FBB-FFD. Experiments show that PDM-FFD improves processor utilization by 18.5% compared to other available algorithms on a four-core processor. The performance of PDM-FFD improves as the number of processor cores, the task set utilization, and the number of tasks increase. Due to its high performance, PDM-FFD can be widely utilized in typical real-time systems such as resource-constrained spacecraft, autonomous vehicles, and industrial robots.
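As a worked example of the bound stated in the abstract, the sketch below evaluates $\Delta$ and the speedup factor $3 - (3\Delta + 1)/(m + \Delta)$ for a small task set. It assumes the usual definition of utilization, $u_j = C_j / T_j$ with $T_j$ the minimum inter-arrival time, which the abstract does not state explicitly. The tuple layout and function names are hypothetical, and the sketch does not implement the partitioning algorithm or its schedulability test.

```python
def delta(tasks):
    """Compute Delta = sum(C_j * u_j) / D_max for tasks given as (C, D, T) tuples.

    C is the worst-case execution time, D the relative deadline, and T the
    minimum inter-arrival time; u_j = C_j / T_j is an assumed definition.
    """
    d_max = max(d for _, d, _ in tasks)
    return sum(c * (c / t) for c, _, t in tasks) / d_max


def speedup_factor(tasks, m):
    """Evaluate the bound 3 - (3 * Delta + 1) / (m + Delta) for m cores."""
    dlt = delta(tasks)
    return 3 - (3 * dlt + 1) / (m + dlt)


# Two hypothetical tasks on a four-core processor.
example_tasks = [(1, 4, 4), (2, 8, 8)]
bound = speedup_factor(example_tasks, m=4)
```

For this task set, $\Delta = 0.09375$ and the bound is about 2.687, below $3 - 1/m = 2.75$. The comparison holds in general: $(3\Delta + 1)/(m + \Delta) > 1/m$ is equivalent to $\Delta(3m - 1) > 0$, which is true whenever $\Delta > 0$ and $m \geq 1$.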
    Available online: December 20, 2023, DOI: 10.13328/j.cnki.jos.007047
    Abstract:
    As the scale of open-source artificial intelligence (AI) systems expands, software development and maintenance become difficult. GitHub is one of the most important hosting platforms for open-source projects in the open-source community. Developers can easily participate in the development of open-source projects through pull request systems provided by GitHub. The description of pull requests can help the core teams of the project understand the content of the pull requests and the intention of the developers and promote the acceptance of the pull request. At present, a considerable proportion of developers do not provide a description for the pull request, which not only increases the workload of the core team but also is not conducive to the maintenance of the project in the future. This study proposes a method named PRSim to automatically generate descriptions for pull requests. This method extracts features including commit messages, comment updates, and code changes from pull requests, builds a syntax modification tree, and uses a tree-structured autoencoder to find other pull requests with similar code changes. Then, with the help of the description of a similar pull request, it summarizes commit messages and comment updates through an encoder-decoder network to generate the description of a new pull request. The experimental results show that the generation effect of PRSim reaches 36.47%, 27.69%, and 35.37% in terms of the F1 score of metrics Rouge-1, Rouge-2, and Rouge-L, respectively, which is 34.3%, 75.2%, and 55.3% higher than LeadCM, 16.2%, 22.9%, and 16.8% higher than Attn+PG+RL, and 23.5%, 72.0%, and 24.8% higher than PRHAN.
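The Rouge-1 and Rouge-2 F1 scores reported above measure n-gram overlap between a generated description and the reference description. A minimal sketch of that computation follows, using whitespace tokenization and no stemming. Both are simplifying assumptions, so published scores computed with a full ROUGE toolkit may differ. The example strings are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """F1 of clipped n-gram overlap between candidate and reference text."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # counts clipped to the smaller side
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


generated = "fix null pointer in parser"
reference = "fix null pointer bug in the parser"
rouge1 = rouge_n_f1(generated, reference, n=1)
rouge2 = rouge_n_f1(generated, reference, n=2)
```

In this example all five generated unigrams appear among the seven reference unigrams, so Rouge-1 precision is 1 and recall is 5/7. Only two of the four generated bigrams match, which is why Rouge-2 is typically the lowest of the reported metrics.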
    Available online: December 20, 2023, DOI: 10.13328/j.cnki.jos.007061
    Abstract:
    With the development of the intelligent information era, applications of deep neural networks in various fields of human society, especially deployments in safety-critical systems such as autonomous driving and military defense, have aroused concern in academic and industrial communities about the erroneous behaviors that deep neural networks may exhibit. Although neural network verification and neural network testing can provide qualitative or quantitative conclusions about erroneous behaviors, such post-analysis cannot prevent their occurrence. How to repair pre-trained neural networks that exhibit erroneous behavior is still a very challenging problem. To this end, deep neural network repair/patching has emerged, aiming to eliminate the unexpected predictions generated by defective neural networks and make the neural networks meet certain specification properties. So far, there are three typical neural network repair paradigms: retraining, fine-tuning without fault localization, and fine-tuning with fault localization. This study introduces the development of deep neural networks and the necessity of deep neural network repair, clarifies some similar concepts, and identifies the challenges of deep neural network repair. In addition, it investigates the existing neural network repair strategies in detail and compares the internal relationships and differences among these strategies. Moreover, the study explores and sorts out the evaluation metrics and benchmark tests commonly used for neural network repair strategies. Finally, it outlines feasible research directions that deserve attention in the future development of neural network repair strategies.
    Available online:  December 06, 2023 , DOI: 10.13328/j.cnki.jos.007033
    Abstract:
    In natural scenes, logos such as trademarks and traffic signs are susceptible to shooting angle, carrier deformation, and scale changes, which reduces logo detection accuracy. Thus, this study proposes an attention guided logo detection and recognition network (AGLDN) to jointly optimize the model robustness for multi-scale and complex deformation. First, a logo synthesis dataset is established by image collection and mask generation of logo templates, image selection of logo background, and logo image generation. Then, based on RetinaNet and FPN, multi-scale features are extracted and high-level semantic feature mapping is formed. Finally, the attention mechanism guided network is employed to focus on the logo area, and the influence of logo deformation on feature robustness is suppressed to improve logo detection and recognition. Experimental results show that the proposed method can reduce the influence of scale changes and non-rigid deformation, and improve detection accuracy.
    Available online:  December 06, 2023 , DOI: 10.13328/j.cnki.jos.007034
    Abstract:
    Not limited by the size of the state space, formal verification based on mechanized theorem proving is an important method for ensuring software correctness and avoiding serious losses from latent software bugs. The left-leaning red-black tree (LLRB) is a variant of the binary search tree whose structure carries an additional left-leaning constraint compared with the traditional red-black tree. Because of this constraint, conventional proof strategies cannot be employed during verification, so more manual intervention and effort are required. Thus, LLRB correctness verification is widely acknowledged as a challenging problem. To this end, based on the Isabelle verification framework for binary search tree algorithms, this study refines the additional-property part of the framework and provides a concrete verification scheme. The LLRB insertion and deletion operations are functionally modeled in Isabelle, with modular treatment of the LLRB invariants. Subsequently, the functional correctness is verified. This is the first mechanized verification of functional LLRB insertion and deletion algorithms in Isabelle. Compared with the existing Dafny verification of the LLRB algorithm, the number of theorems is reduced from 158 to 84, and no intermediate assertions need to be constructed, which alleviates the verification burden. Meanwhile, this study provides a reference for the functional modeling and verification of complex tree-structured algorithms.
    Available online:  December 06, 2023 , DOI: 10.13328/j.cnki.jos.007035
    Abstract:
    Detecting latent topics in social media texts is a meaningful task, but short and informal posts cause serious data sparsity. Additionally, models based on variational auto-encoders (VAEs) ignore the social relationships among users during topic inference, because a VAE assumes that each input data point is independent. As a result, the inferred latent topic variables lack correlation information and the topics are incoherent. Social network structure information can not only provide clues for aggregating contextual messages but also indicate topic correlation among users. Therefore, this study proposes a microblog topic model based on message passing and a graph prior distribution. The model encodes richer context information with a graph convolutional network (GCN) and integrates the interactive relationships of users through the graph prior distribution during VAE topic inference, so as to better capture the complex correlation among multiple data points and mine social media topic information. Experiments on three real datasets validate the effectiveness of the proposed model.
    Available online:  November 29, 2023 , DOI: 10.13328/j.cnki.jos.007003
    Abstract:
    In the field of cyber security, the mendacious domains generated by the domain generation algorithm (DGA) are called DGA domains. Similar to real domains, they are usually a random combination of characters or numbers, which makes DGA domains highly camouflaged. Hackers take advantage of the disguised nature of DGA domains to carry out cyber attacks, so as to bypass security detection. How to effectively detect DGA domains has become a research hotspot. Traditional statistical machine learning detection methods require the manual construction of domain feature sets. However, the quality of domain features constructed manually or semi-automatically varies, which affects the accuracy of detection. In view of the powerful automatic feature extraction and representation capability of deep neural networks, a DGA domain detection method based on multi-view contrastive learning (MCL4DGA) is proposed. Different from existing methods, it incorporates attentional neural networks, convolutional neural networks, and recurrent neural networks to effectively capture global, local, and bidirectional multi-view feature dependencies of domain sequences. Besides, the self-supervision signals derived by contrastive learning can enhance the expressiveness between multi-view feature learning encoders and thus improve the accuracy of detection. The effectiveness of the proposed method is verified by experimental comparison with current methods on a real dataset.
    Available online:  November 29, 2023 , DOI: 10.13328/j.cnki.jos.007005
    Abstract:
    Deep neural networks (DNNs) are now widely used in autonomous driving, medical diagnosis, speech recognition, face recognition, and other safety-critical fields. Therefore, DNN testing is critical to ensuring the quality of DNNs. However, labeling test cases to judge whether the predictions of a DNN model are correct is costly. Selecting the test cases that reveal incorrect behavior of DNN models and labeling them first can help developers debug DNN models as early as possible, thus improving the efficiency of DNN testing and ensuring the quality of DNN models. This study proposes a test case selection method based on data mutation, namely DMS. In this method, a data mutation operator is designed and implemented to generate mutation models that simulate model defects and capture the dynamic pattern with which test cases reveal bugs, so as to evaluate the bug-revealing ability of each test case. Experiments are conducted on 25 combinations of deep learning test sets and models. The results show that DMS is significantly better than the existing test case selection methods in terms of both the proportion of bug-revealing cases and the diversity of bug-revealing directions in the selected samples. Specifically, taking the original test set as the candidate set, DMS can filter out 53.85%–99.22% of all bug-revealing test cases when selecting 10% of the test cases. Moreover, when 5% of the test cases are selected, the cases selected by DMS cover almost all bug-revealing directions. Compared with the eight baseline methods, DMS finds 12.38%–71.81% more bug-revealing cases on average, which demonstrates the effectiveness of DMS in the task of test case selection.
    Available online:  November 29, 2023 , DOI: 10.13328/j.cnki.jos.007007
    Abstract:
    In current real life where data sources are diverse, and manual labeling is difficult, semi-supervised multi-view classification algorithms have important research significance in various fields. In recent years, graph neural networks-based semi-supervised multi-view classification algorithms have achieved great progress. However, most of the existing graph neural networks carry out multi-view information fusion only in the classification stage, while neglecting the multi-view information interaction between the same sample during the training stage. To solve the above issue, this study proposes a model for semi-supervised classification, named multi-view interaction graph convolutional network (MIGCN). The Transformer Encoder module is introduced to the graph convolution layer trained on different views, which aims to adaptively acquire complementary information between different views for the same sample during the training stage. More importantly, the study introduces the consistency constraint loss to make the similar relationship of the final feature expressions of different views as similar as possible. This operation can make graph convolutional neural networks during the classification stage better utilize the consistency and complementarity information between different views reasonably, and then it can further improve the robust performance of the multi-view fusion feature. Extensive experiments on several real-world multi-view datasets demonstrate that compared with the graph-based semi-supervised multi-view classification model, MIGCN can better learn the essential features of multi-view data, thereby improving the accuracy of semi-supervised multi-view classification.
    Available online:  November 22, 2023 , DOI: 10.13328/j.cnki.jos.006968
    Abstract:
    Apache Flink is one of the most popular stream computing platforms and has many applications in industry. Complex event processing (CEP) is one of the important usage scenarios of stream computation. Apache Flink defines and implements a language for complex event processing (referred to as FlinkCEP). FlinkCEP includes rich syntactic features, covering not only the usual features of filtering, connecting, and looping but also the advanced features of iterative conditions and after-match skip strategies. The semantics of FlinkCEP is complex, and no language specification defines it precisely, so it can only be understood by inspecting the implementation details. This motivates the definition of a formal semantics for FlinkCEP so that developers can understand it precisely. This study proposes an automaton model called data stream transducers (DST) for FlinkCEP, in which data variables capture the iterative conditions, data stream variables store the outputs, and transition priorities capture the after-match skip strategies. DST is leveraged to define the formal semantics of FlinkCEP and to design query evaluation algorithms based on the formal semantics. Moreover, a prototype of the CEP engine is implemented. Finally, test case sets that cover the syntactic features of FlinkCEP more comprehensively are generated and used to compare the prototype against the actual results of FlinkCEP on the Flink platform. The experimental results show that the proposed formal semantics of FlinkCEP conforms to the actual semantics of FlinkCEP in the vast majority of cases. Furthermore, the inconsistencies between the formal and the actual semantics are analyzed, and it is discovered that the Flink implementation of FlinkCEP may not handle group patterns correctly.
    Available online:  November 15, 2023 , DOI: 10.13328/j.cnki.jos.007002
    Abstract:
    Temporal knowledge graph reasoning aims to fill in missing links or facts in knowledge graphs, where each fact is associated with a specific timestamp. The dynamic variational framework based on the variational autoencoder is particularly effective for this task. By jointly modeling entities and relations using Gaussian distributions, this method not only offers high interpretability but also solves complex probability distribution problems. However, traditional variational autoencoder-based methods often suffer from overfitting during training, which limits their ability to accurately capture the semantic evolution of entities over time. To address this challenge, this study proposes a new temporal knowledge graph reasoning model based on a diffusion probability distribution approach. Specifically, the model uses a bi-directional iterative process to divide the entity semantic modeling process into multiple sub-modules. Each sub-module uses a forward noisy transformation and a backward Gaussian sampling to model a small-scale evolution process of entity semantics. Through the joint modeling of multiple sub-modules, the proposed model learns the dynamic representation of entity semantics in the metric space over time and thus obtains more accurate modeling than the variational autoencoder-based method. Compared with the variational autoencoder-based method, the model improves the MRR by 4.18% and 1.87% on the Yago11k and Wikidata12k datasets and by 1.63% and 2.48% on the ICEWS14 and ICEWS05-15 datasets, respectively.
    Available online:  November 15, 2023 , DOI: 10.13328/j.cnki.jos.006993
    Abstract:
    Text-based person retrieval is a developing downstream task of cross-modal retrieval and derives from conventional person re-identification, which plays a vital role in public safety and person search. It addresses the lack of query images in traditional person re-identification, and its main challenge is that it combines two different modalities and requires the model to learn both image content and textual semantics. To narrow the semantic gap between pedestrian images and text descriptions, traditional methods usually split image features and text features mechanically and focus only on cross-modal alignment, which ignores the potential relations between the person image and its description and leads to inaccurate cross-modal alignment. To address the above issues, this study proposes a novel relation alignment-based cross-modal person retrieval network. First, the attention mechanism is used to construct the self-attention matrix and the cross-modal attention matrix, in which an attention matrix is regarded as the distribution of response values between different feature sequences. Then, two different matrix construction methods are used to reconstruct the intra-modal attention matrix and the cross-modal attention matrix, respectively. The element-by-element reconstruction of the intra-modal attention matrix can effectively uncover potential intra-modal relationships. Moreover, by taking the cross-modal information as a bridge, the holistic reconstruction of the cross-modal attention matrix can fully uncover the potential information from different modalities and narrow the semantic gap. Finally, the model is jointly trained with a cross-modal projection matching loss and a KL divergence loss, which helps the modalities promote each other. Quantitative and qualitative results on a public text-based person search dataset (CUHK-PEDES) demonstrate that the proposed method performs favorably against state-of-the-art text-based person search methods.
    Available online:  November 15, 2023 , DOI: 10.13328/j.cnki.jos.006997
    Abstract:
    Safety-critical embedded software usually has rigorous time constraints over the runtime behaviors, raising additional requirements for enforcing security properties. To protect the information flow security of embedded software and mitigate the limitations of the existing simplex verification approaches and their potential false positives, this study first proposes a new timed noninterference property, i.e., timed SIR-NNI, based on the security requirement of a realistic scenario. Then the study presents an information flow security verification approach that unifies the verification of multiple timed noninterference properties, i.e., timed BNNI, timed BSNNI, and timed SIR-NNI. Based on the different timed noninterference requirements, the approach constructs the refined automata and test automata from the timed automata under verification. The study uses UPPAAL’s reachability analysis to implement the refinement relation check and the security verification. The verification tool, i.e., TINIVER, extracts timed automata from SysML’s sequential diagrams or C++ source code to conduct the verification procedure. The verification results of TINIVER on existing timed automata models and security properties justify the usability of the proposed approach. The security verifications on the typical flight-mode switch models of the UAV flight control systems ArduPilot and PX4 demonstrate the practicability and scalability of the proposed approach. Besides, the approach is effective in mitigating the false positives of a state-of-the-art verification approach.
    Available online:  November 15, 2023 , DOI: 10.13328/j.cnki.jos.006995
    Abstract:
    Multi-view clustering has attracted more and more attention in the fields of image processing, data mining, and machine learning. Existing multi-view clustering algorithms have two shortcomings. First, in the process of graph construction, only the pairwise relationships within the data of each view are considered to generate an affinity matrix, which lacks the characterization of neighborhood relationships. Second, existing methods separate multi-view information fusion from clustering, thereby reducing the clustering performance of the algorithm. Therefore, this study proposes a more accurate and robust joint spectral embedding multi-view clustering algorithm based on bipartite graphs. First, based on the multi-view subspace clustering idea, bipartite graphs are constructed, and similar graphs are generated. Then the spectral embedding matrix of the similar graphs is used to perform graph fusion. Second, by considering the importance of each view during the fusion process, weight constraints are applied, and an indicator matrix is introduced to obtain the final clustering result. A model is proposed to optimize the bipartite graph, the embedding matrix, and the clustering indicator matrix within a single framework. In addition, a fast optimization strategy for solving the model is provided, which decomposes the optimization problem into small subproblems and solves them efficiently through iterative steps. The proposed algorithm and existing multi-view clustering algorithms are experimentally analyzed on real datasets. Experimental results show that the proposed algorithm is more effective and robust in dealing with multi-view clustering problems than existing methods.
    Available online:  November 08, 2023 , DOI: 10.13328/j.cnki.jos.006994
    Abstract:
    With the development of mobile services’ computing and sensing abilities, spatial crowdsourcing, which is based on location information, has emerged. There are many challenges to improving the performance of task assignment, one of which is how to assign workers the tasks that they are interested in. Existing methods focus only on workers’ temporal preference and ignore the impact of spatial factors on workers’ preference; they also focus only on long-term preference, ignore workers’ short-term preference, and suffer from inaccurate predictions caused by sparse historical data. This study analyzes the task assignment problem based on long-term and short-term spatio-temporal preference. By comprehensively considering workers’ preferences from both long-term and short-term perspectives, as well as temporal and spatial dimensions, the quality of task assignment is improved in terms of assignment success rate and completion efficiency. To improve the accuracy of spatio-temporal preference prediction, the study proposes a sliced imputation-based context-aware tensor decomposition algorithm (SICTD) to reduce the proportion of missing values in preference tensors and calculates short-term spatio-temporal preference through the ST-HITS algorithm and the short-term active range of workers under spatio-temporal constraints. To maximize the total task reward and the workers’ average preference for completing tasks, the study designs a spatio-temporal preference-aware greedy algorithm and a Kuhn-Munkres (KM) algorithm to optimize the results of task assignment. Extensive experimental results on real datasets show the effectiveness of the long- and short-term spatio-temporal preference-aware task assignment framework. Compared with baselines, the RMSE prediction error of the proposed SICTD for temporal and spatial preferences is decreased by 22.55% and 24.17%, respectively. In terms of task assignment, the proposed preference-aware KM algorithm significantly outperforms the baseline algorithms, with the workers’ total reward and average preference for completing tasks increased on average by 40.86% and 22.40%, respectively.
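The difference between a greedy assignment and an optimal bipartite matching, which is what the KM algorithm computes, can be illustrated on a tiny worker-by-task score matrix. The sketch below is only an illustration under stated assumptions: the score values are invented, the function names are hypothetical, and the optimal matching is found by brute force over permutations rather than by an actual Kuhn-Munkres implementation, so it only suits very small square matrices.

```python
from itertools import permutations


def greedy_assign(score):
    """Repeatedly take the highest-scoring free (worker, task) pair."""
    pairs = sorted(
        ((score[w][t], w, t) for w in range(len(score)) for t in range(len(score[0]))),
        reverse=True,
    )
    used_workers, used_tasks, assignment = set(), set(), {}
    for _, worker, task in pairs:
        if worker not in used_workers and task not in used_tasks:
            assignment[worker] = task
            used_workers.add(worker)
            used_tasks.add(task)
    return assignment


def optimal_assign(score):
    """Brute-force stand-in for KM: best one-to-one matching on a square matrix."""
    n = len(score)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(score[w][perm[w]] for w in range(n)),
    )
    return dict(enumerate(best))


def total(score, assignment):
    return sum(score[w][t] for w, t in assignment.items())


# Hypothetical combined reward/preference scores: rows are workers, columns are tasks.
score = [[9, 8],
         [8, 1]]
print(total(score, greedy_assign(score)))   # greedy locks worker 0 onto task 0
print(total(score, optimal_assign(score)))  # the optimal matching swaps the tasks
```

On this matrix the greedy choice of the single best pair forces a poor second pair, which is the kind of loss an optimal matching avoids.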
    Available online:  November 08, 2023 , DOI: 10.13328/j.cnki.jos.006988
    Abstract:
    To address the growing threat of distributed denial of service (DDoS) attacks under the rapid popularization of IPv6, this study proposes a two-stage DDoS defense mechanism, consisting of a pre-detection stage that monitors the early signs of DDoS attacks in real time and a deep-detection stage that accurately filters DDoS traffic after an alarm. First, the IPv6 traffic format is analyzed, and the hexadecimal header fields are extracted from PCAP capture files as detection elements. Then, in the pre-detection stage, a lightweight binary convolutional neural network (BCNN) model is introduced, and a two-dimensional traffic matrix is designed as the model input, which allows the model to sensitively perceive the malicious situation caused by mixed DDoS traffic in the network as evidence of a DDoS attack. After the alarm, the deep-detection stage intervenes with a one-dimensional convolutional neural network (1DCNN) model, which takes a one-dimensional packet vector as input and distinguishes the DDoS packets in mixed traffic so that blocking policies can be issued. In the experiment, an IPv6-LAN topology is built, and pure IPv6 DDoS traffic is generated by replaying the CIC-DDoS2019 public dataset through NAT 4to6. The results show that the proposed mechanism can effectively improve response speed, detection accuracy, and traffic filtering efficiency in DDoS defense. When DDoS traffic accounts for only 6% and 10% of the total network traffic, BCNN can perceive the occurrence of DDoS with 90.9% and 96.4% accuracy, respectively, and the 1DCNN model can distinguish mixed DDoS packets with 99.4% accuracy at the same time.
    Available online:  November 08, 2023 , DOI: 10.13328/j.cnki.jos.006989
    Abstract:
    The smart contract is a decentralized application widely deployed on blockchain platforms such as Ethereum. Because of their economic attributes, vulnerabilities in smart contracts can cause huge financial losses and damage the stability of the Ethereum ecosystem. Thus, it is crucial to detect the vulnerabilities in smart contracts before they are deployed to Ethereum. The existing smart contract vulnerability detection methods (e.g., Oyente and Secure) are mostly based on heuristic algorithms. These methods are hard to reuse across application scenarios, and they are also time-consuming and have low accuracy. To improve the effectiveness of vulnerability detection, this study proposes Scruple, a smart contract timestamp vulnerability detection approach based on learning data-flow paths. It first obtains all possible propagation chains of timestamp vulnerabilities, then refines the propagation chains, uses a graph pre-training model to learn the relationships in the propagation chains, and finally uses the learned model to detect whether a smart contract has timestamp vulnerabilities. Compared with the existing detection methods, Scruple has a stronger vulnerability capture ability and generalization ability. Meanwhile, learning the propagation chains is well-directed and avoids analyzing an unnecessarily deep program hierarchy to reach the point where vulnerabilities converge. To verify the effectiveness of Scruple, this study uses distinct real-world smart contracts to compare Scruple with 13 state-of-the-art smart contract vulnerability detection methods. The experimental results show that Scruple achieves 96% accuracy, 90% recall, and 93% F1-score in detecting timestamp vulnerabilities. The average improvement of Scruple over the 13 methods on the three metrics is 59%, 46%, and 57%, respectively, which means that Scruple substantially improves the detection of timestamp vulnerabilities.
    Available online:  November 08, 2023 , DOI: 10.13328/j.cnki.jos.007001
    Abstract:
    As an important production factor, data need to be exchanged between different entities to create value. In this process, data integrity needs to be ensured; in other words, data must not be tampered with without authorization, or extremely serious consequences may follow. The existing work realizes data evidence preservation by combining distributed ledgers with data encryption and verification technology to ensure the integrity of the data to be exchanged in transmission, storage, and other related data processing phases. However, such work can hardly confirm the integrity of the data provided by the data supplier. Once the data supplier provides forged data, all subsequent integrity assurance becomes meaningless. Therefore, this study proposes a method for verifying the integrity of data services based on remote attestation. By using the trusted execution environment as the trust anchor, this method can measure and verify the integrity of the static code, execution process, and execution result of a specific data service. It also optimizes the integrity verification of a specific data service through program slicing, thus extending the scope of data integrity assurance to the point in time at which the data supplier provides the data. A series of experiments are carried out on 25 data services of three real Java information systems to validate the proposed method.
    Available online:  November 01, 2023 , DOI: 10.13328/j.cnki.jos.006986
    Abstract:
    Distributed storage systems are receiving more and more attention in mobile network scenarios. Data placement, a key technology of distributed storage, is crucial to improving the success rate of distributed data storage. However, because of unstable wireless signals and fluctuating network bandwidth in mobile environments, the traditional data placement strategies, such as the random placement strategy and the storage-aware placement strategy, have low success rates of data transmission, since neither of them takes network bandwidth into account during data placement. To solve this problem faced by mobile distributed storage systems, this study proposes a bandwidth-aware adaptive data placement strategy (BADP). The main breakthrough is that BADP adopts the group mobility model to sense the network bandwidth of nodes and takes the network bandwidth as an important factor for data placement, thus selecting nodes with good performance to achieve adaptive data placement and improve the success rate of data transmission. BADP has three design features: (1) adopting the group mobility model to sense the network bandwidth of nodes; (2) managing node information in groups to reduce communication overhead and using a heap to build a node selection tree; (3) selecting nodes with good performance through adaptive data placement to improve the success rate of data transmission. Experiments show that when the network changes dynamically, BADP improves the success rate of data transmission by at least 30.6% and 34.6% compared with the random placement strategy and the storage-aware placement strategy, respectively. At the same time, it consistently keeps communication overhead low.
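The idea of using a heap to pick well-performing nodes by bandwidth can be sketched with the standard library heap functions. This is a minimal sketch, not the BADP implementation: the node names, the bandwidth figures, and the function `select_nodes` are hypothetical, and the group mobility model that would supply the bandwidth estimates is not modeled here.

```python
import heapq


def select_nodes(bandwidth_by_node, replicas):
    """Return the `replicas` nodes with the highest sensed bandwidth.

    heapq.nlargest keeps a small heap internally, so the whole node
    table does not have to be fully sorted.
    """
    return heapq.nlargest(replicas, bandwidth_by_node, key=bandwidth_by_node.get)


# Hypothetical sensed bandwidth in Mbit/s for four mobile nodes.
sensed = {"node-a": 5.0, "node-b": 12.5, "node-c": 9.0, "node-d": 1.2}
print(select_nodes(sensed, replicas=2))
```

In a real deployment the bandwidth table would change as nodes move, so the selection would be recomputed, or the heap updated, whenever new estimates arrive.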
    Available online:  November 01, 2023 , DOI: 10.13328/j.cnki.jos.006987
    Abstract:
    Internet users must resolve domain names through DNS before accessing network applications, so DNS security is the first gateway to the normal operation of the network. If the security of DNS cannot be effectively guaranteed, attackers can make the network unusable by attacking the DNS system, no matter how well other network systems are protected. At present, malicious DNS events are still on the rise, and the development of DNS attack detection and defense technology still cannot meet practical needs. From the perspective of the recursive servers that directly serve users’ DNS requests, this study comprehensively summarizes the security problems faced in the DNS process through two classification methods, covering the security events caused by attacks or system vulnerabilities, the detection methods for these security events, and the corresponding defense and protection technologies. When summarizing the security events and the detection and defense technologies, the study analyzes the characteristics of typical methods and discusses future research directions in the field of DNS security.
    Available online:  October 25, 2023 , DOI: 10.13328/j.cnki.jos.006992
    Abstract:
    GitHub is a well-known open-source software development community that supports developers using the issue tracking system in each open-source project on GitHub to address issues. During the discussion of an issue about a defect, the developer may point out issues from other projects correlated to the defect, which are called cross-project issues, so as to provide reference information for fixing the defect. However, there are more than 200 million open-source projects and 1.2 billion issues on the GitHub platform, making it time-consuming to identify and acquire cross-project issues manually. This study presents a cross-project issue recommendation method CPIRecom for open-source software defects. This study builds a pre-selection set by filtering issues based on the number of historical issue pairs and the time interval for reporting issues. Then, the study also proposes an accurate recommendation model, which extracts textual features based on the pre-trained model of BERT, analyzes features of projects, calculates the relevant probability between defects and issues from the pre-selection set based on a random forest classifier, and obtains the recommendation list according to the ranking. This study simulates the application of CPIRecom method on GitHub platform. The mean reciprocal rank of CPIRecom method reaches 0.603, and the Recall@5 reaches 0.715 on the simulative test set.
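Mean reciprocal rank and Recall@k, the two metrics quoted above, can be computed from ranked recommendation lists as in the sketch below. The function names and toy data are hypothetical, and Recall@k is taken here to mean the fraction of queries with at least one relevant item in the top k, which is an assumption since the abstract does not define it.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant item (0 if none is found)."""
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


def recall_at_k(ranked_lists, relevant_sets, k):
    """Fraction of queries whose top-k list contains a relevant item."""
    hits = sum(
        1
        for ranking, relevant in zip(ranked_lists, relevant_sets)
        if any(item in relevant for item in ranking[:k])
    )
    return hits / len(ranked_lists)


# Two hypothetical defects, each with a ranked list of candidate issues.
rankings = [["issue-a", "issue-b", "issue-c"], ["issue-x", "issue-y", "issue-z"]]
relevant = [{"issue-b"}, {"issue-q"}]
print(mean_reciprocal_rank(rankings, relevant))
print(recall_at_k(rankings, relevant, k=2))
```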
    Available online:  October 25, 2023 , DOI: 10.13328/j.cnki.jos.007000
    Abstract:
    The fuzzy C-means (FCM) clustering algorithm has become one of the commonly used image segmentation techniques because of its low learning cost and algorithm overhead. However, the conventional FCM clustering algorithm is sensitive to noise in images. Recently, many improved FCM algorithms have been proposed to improve the noise robustness of the conventional FCM clustering algorithm, but often at the cost of losing image detail. This study presents an improved FCM clustering algorithm based on Lie group theory and applies it to image segmentation. The proposed algorithm constructs matrix Lie group features for the pixels of an image, which summarize the low-level image features of each pixel and its relationship with other pixels in the neighborhood window. In this way, the proposed method transforms the clustering problem of measuring the Euclidean distances between pixels into calculating the geodesic distances between the Lie group features of pixels on the Lie group manifold. To update the clustering centers and the fuzzy membership matrix on the Lie group manifold, the proposed method uses an adaptive fuzzy weighted objective function, which improves the generalization and stability of the algorithm. The effectiveness of the proposed method is verified by comparison with the conventional FCM and several classic improved algorithms in experiments on three types of medical images.
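For reference, the membership and center updates of the conventional FCM algorithm, the baseline that the study improves on, can be sketched as follows. The sketch works on scalar values with absolute difference as the distance; the function names are hypothetical, and the Lie group features, geodesic distance, and adaptive weighting of the proposed method are not modeled.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # A point sitting exactly on a center belongs fully to it
            # (shared equally if several centers coincide).
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            row = [u / sum(row) for u in row]
        else:
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Standard FCM center update: membership^m weighted mean of the points."""
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers


pixels = [0.0, 1.0, 3.0]  # hypothetical gray levels
u = fcm_memberships(pixels, centers=[0.0, 4.0])
print(u)
print(fcm_centers(pixels, u))
```

Alternating these two updates until the memberships stop changing gives the conventional algorithm; every membership row sums to one by construction.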
    Available online:  October 25, 2023 , DOI: 10.13328/j.cnki.jos.006966
    Abstract:
    The current authentication protocol based on username and password can hardly meet the increasing security requirements. Specifically, users choose different passwords to access different online services, which greatly increases their memory burden. In addition, password authentication has low security and is exposed to many known attacks. To solve such problems, this study proposes a user-centric two-factor authentication key agreement protocol, UC-2FAKA, based on the Pointcheval-Sanders signature. First, to prevent the leakage of authentication factors, password and biometric two-factor credentials are constructed based on the Pointcheval-Sanders signature, and the identity is authenticated to the service provider (SP) in a zero-knowledge proof manner. Second, with a user-centric single sign-on (SSO) architecture, users request identity credentials by registering with an identity provider (IDP) and use them to log in to different SPs, which prevents the IDP or the SPs from tracking or linking users. Third, the Diffie-Hellman key exchange is used to authenticate SP identities and negotiate communication keys to ensure the security of subsequent communication. Finally, a comprehensive security analysis and performance comparison of the proposed protocol are carried out. The results show that the proposed protocol can resist various known attacks and performs better in communication overhead and computational overhead.
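The Diffie-Hellman key exchange named in the third step lets two parties derive the same secret from exchanged public values. The sketch below shows only that core property with toy parameters; the modulus, the base, and the function names are assumptions for illustration, the group is far too small for real use, and the credential construction, zero-knowledge proof, and SP authentication of UC-2FAKA are not modeled.

```python
import secrets

# Toy parameters (assumption): 2**64 - 59 is prime but far too small for real use.
P = 2**64 - 59
G = 5  # base chosen for illustration only


def dh_keypair():
    """Pick a random private exponent and derive the public value G^private mod P."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def dh_shared(private, peer_public):
    """Combine one's own private exponent with the peer's public value."""
    return pow(peer_public, private, P)


user_private, user_public = dh_keypair()
sp_private, sp_public = dh_keypair()

# Both sides compute G^(user_private * sp_private) mod P without sending a private key.
print(dh_shared(user_private, sp_public) == dh_shared(sp_private, user_public))
```

Plain Diffie-Hellman on its own does not authenticate either party, which is why a protocol such as the one described has to bind the exchange to verified identities.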
    Available online:  October 25, 2023 , DOI: 10.13328/j.cnki.jos.006970
    Abstract:
    Existing hypergraph network representation methods need to analyze the full batch nodes and hyperedges to recursively extend the neighbors across layers, which brings huge computational costs and leads to lower generalization accuracy due to over-expansion. To solve this problem, this study proposes a hypergraph network representation method based on importance sampling. First, the method treats nodes and hyperedges as two sets of independent identically distributed samples that satisfy specific probability measures and interprets the structural feature interactions of the hypergraph in an integral form. Second, it designs a neighbor importance sampling rule with learnable parameters and calculates sampling probabilities based on the physical relations and features of nodes and hyperedges. A fixed number of objects are recursively acquired layer by layer to construct a smaller sampled adjacency matrix. Finally, the spatial features of the entire hypergraph are approximated using Monte Carlo methods. In addition, with the advantage of physically informed neural networks, the sampling variance that needs to be reduced is added to the hypergraph neural network as a physical constraint to obtain sampling rules with better generalization capability. Extensive experiments on multiple datasets show that the method proposed in this study can obtain more accurate hypergraph representation results with a faster convergence rate.
    Available online:  October 18, 2023 , DOI: 10.13328/j.cnki.jos.006971
    Abstract:
    Fast vulnerability root cause analysis is crucial for patching vulnerabilities and has always been a hotspot in academia and industry. The existing vulnerability root cause analysis methods based on the statistical feature analysis of a large number of test sample execution records have problems such as random noise and missing important logical correlation instructions. According to the test set measurement in this study, the proportion of random noise in the existing statistical methods reaches more than 61%. To solve the above problems, this study proposes a vulnerability root cause analysis method based on the local path graph, which extracts vulnerability-related information such as the inter-function call graph and intra-function control flow transfer graph from the execution paths. The local path graph is utilized for eliminating irrelevant instructions (i.e., noise instructions), constructing the logic relations for vulnerability root cause relevant points, and adding missing critical instructions. An automated root cause analysis system for binary software, LGBRoot, has been implemented. The effectiveness of the system has been evaluated on a dataset of 20 public CVE memory corruption vulnerabilities. The average time for single-sample root cause analysis is 12.4 seconds. The experimental data show that the system can automatically eliminate 56.2% of noise instructions, and mend as well as visualize the 20 logical structures of vulnerability root cause relevant points, speeding up the vulnerability analysis of analysts.
    Available online:  October 18, 2023 , DOI: 10.13328/j.cnki.jos.006991
    Abstract:
    Conformance checking is one of the important scenarios in the field of process mining, and its goal is to determine whether the actual running business behavior is consistent with the desired behavior and then provide a basis for business process management decisions. Traditional methods of conformance checking face the problems of too many metrics and low efficiency. In addition, the existing methods for checking the conformance between process text and process model rely heavily on expert-defined knowledge. Therefore, this study proposes a process text-oriented conformance checking method. Firstly, the study generates graph traces based on the execution semantics of the process model and obtains the structural features by the word vector model from graph traces. At the same time, Huffman trees are introduced to reduce the computational effort. Then, the word vector representation of the process text and the activities is performed. The study also uses the Siamese mechanism to improve training efficiency. Finally, all the features of the text and the model are fused, and then the consistency score between the text and the model is predicted using a fully connected layer. Experiments show that the average absolute error value of the method in this study is two percentage points lower than that of existing methods.
    Available online:  October 18, 2023 , DOI: 10.13328/j.cnki.jos.006976
    Abstract:
    Disassembly of binary code is hard but necessary for improving the security of binary software. One of the major reasons why binary disassembly is difficult is that compilers create many jump tables in the binary code for efficiency. To resolve the targets of jump tables, mainstream disassembly tools use various strategies. However, the implementation details of these strategies and their effectiveness are not well studied. To help researchers better understand the algorithm implementation and performance of disassembly tools, this study first systematically summarizes the strategies used by disassembly tools to resolve jump tables; then the study builds an automatic framework for testing jump tables, based on which a large-scale test suite on jump tables (2,410,455 jump tables) can be generated. Lastly, this study evaluates the performance of the disassembly tools in resolving jump tables on the test suite and manually analyzes the errors introduced by each strategy of the disassembly tools. In addition, benefiting from the systematic summary of the algorithm implementations of the disassembly tools, this study finds six bugs in those implementations.
    Available online:  October 18, 2023 , DOI: 10.13328/j.cnki.jos.006977
    Abstract:
    The database performance is affected by the database configuration parameters. The quality of parameter settings will directly affect the performance of the database. Therefore, the quality of the database parameter tuning method is important. However, traditional database parameter tuning methods have many limitations, such as the inability to make full use of historical parameter tuning data, wasting time and human resources, and so on. The counterfactual interpretation methods aim to change the original prediction to the expected prediction by making small modifications to the original data. The method plays a role of suggestion, and this can be used for database configuration optimization, namely, making small modifications to the database configuration to optimize the performance of the database. Therefore, this study proposes a counterfactual interpretation method for database configuration optimization. For databases with poor performance under specific load conditions, this method can modify the database configuration and generate corresponding database configuration counterfactuals to optimize database performance. This study conducts two kinds of experiments to evaluate the counterfactual interpretation method and verify the effect of optimizing the database. The experimental results show that the counterfactual interpretation methods proposed in this study are better than other typical counterfactual interpretation methods in terms of various evaluation indicators, and the generated counterfactuals can effectively improve database performance.
    Available online:  October 11, 2023 , DOI: 10.13328/j.cnki.jos.006978
    Abstract:
    There are a lot of two-party threshold schemes for SM2 digital signatures proposed in recent years, which can significantly enhance the security of private keys for SM2 digital signatures. According to different methods of key splitting, public schemes can be divided into two types: multiplicative key splitting and additive key splitting. Further, these public schemes can be subdivided into various two-party threshold schemes according to different constructions of the signature random number. This study proposes the framework of two-party threshold schemes for SM2 digital signature, which provides a safe basic calculation process of two-party threshold schemes and introduces the signature random number that can be constructed variously. With the proposed framework and various constructions of the random number, this study achieves the instantiation of the framework, obtaining a variety of two-party threshold schemes for SM2 digital signature. The instantiation includes 23 known two-party threshold schemes, as well as a variety of new schemes.
    Available online:  October 11, 2023 , DOI: 10.13328/j.cnki.jos.006990
    Abstract:
    The informationization 3.0 represented by deep mining and fusion applications of big data is starting, and the software in the traditional static environment is evolving into complex software in the human-cyber-physical ternary environment which is open and dynamic. How to realize the trusted, manageable, and controllable data interconnection on the untrusted and uncontrollable Internet is an urgent problem to be solved. The Internet of Data technical system represented by digital object architecture and identifier resolution technology provides a feasible solution for these challenges. In order to solve the problems such as low transmission efficiency, high coordination cost, and security management issues in the process of data sharing on the Internet, this study proposes identifier resolution standard specifications for human-cyber-physical ternary environments. Moreover, to meet the demands that data resources owned by different entities need to be discoverable, accessible, understandable, trustworthy, and interoperable in the human-cyber-physical ternary environment, this study designs the identifier resolution protocol and implements the identifier/resolution prototype system for human-cyber-physical ternary environments. At last, this study tests the performance of the prototype system, and the validity of the system is verified by applying it to application scenarios.
    Available online:  October 11, 2023 , DOI: 10.13328/j.cnki.jos.006982
    Abstract:
    Static analysis tools often suffer from high false positive rates of reported alarms, despite their ability to aid developers in detecting potential defects early in the software development life cycle. To improve the availability of these tools, many automated warning identification techniques have been proposed to assist developers in classifying false positive alarms. However, existing approaches mainly focus on using hand-engineered features or statement-level abstract syntax tree token sequences to represent the defective code, failing to capture semantics from the reported alarms. To overcome the limitations of traditional approaches, this study employs deep neural networks with powerful feature extraction and representation abilities to generate code semantics from control flow graph paths for warning identification. The control flow graph abstractly represents the execution process of a given program. Thus, the generated path sequences of the control flow graph can guide the deep neural networks to learn semantic information about the potential defect more accurately. In this study, the pre-trained language model is fine-tuned to encode the path sequences and capture the semantic representations for model building. Finally, the study conducts extensive experiments on eight open-source projects to verify the effectiveness of the proposed approach by comparing it with the state-of-the-art baselines.
    Available online:  October 11, 2023 , DOI: 10.13328/j.cnki.jos.006965
    Abstract:
    The major challenges traditional operating system (OS) design faces are the increasing number, diversity, and distribution scope of resources to be managed and the frequent changes in system state. However, the structures of existing OSs have become the biggest obstacle to solving the above problems as (1) tight coupling and centralization of the structure lead to poor flexibility and scalability and separate OS ecology; (2) contradiction between various capabilities, e.g., security and performance, due to the unitary isolation mechanism such as kernel-user isolation. Therefore, this study combines the hierarchical software bus (softbus) principles with isolation mechanisms to organize the OS and proposes a new OS model termed Yggdrasil. Yggdrasil decomposes an OS into component nodes connected by softbuses, whose communications are standardized to message passing via the softbus. To support the division of isolated states such as supervisor mode and different software hierarchies, Yggdrasil introduces bridge nodes for cascading and controlled communication between softbuses, and enhances the logical representation capability and scalability of OS through self-similar topology. Additionally, the simplicity and hierarchy of the softbus help to achieve decentralization. To verify the feasibility of Yggdrasil, the study builds hierarchical softbus model for OS (HiBuOS) and demonstrates the feasibility of developing a new OS based on Yggdrasil’s ideas through three specific designs: (1) designing and planning a hierarchical softbus structure according to the scale and requirements of the target operating system; (2) selecting specific isolation and communication mechanisms to instantiate bridge nodes and softbuses; (3) realizing OS services based on the hierarchical softbus style. 
    Finally, the evaluation shows that HiBuOS has notable potential and advantages to enhance system scalability, security, performance, and ecological development without significant performance loss.
    Available online:  October 11, 2023 , DOI: 10.13328/j.cnki.jos.006974
    Abstract:
    The functions are the smallest naming unit of aggregation behavior in most traditional programming languages. The readability of function names plays a vital role in programmers’ understanding of program functions and the interaction between different modules. Low-quality function names may confuse developers, increase the smell in the code, and then result in software defects caused by API misuse. Therefore, a method of function name consistency checking and recommendation based on deep learning is proposed, which is named DMName. Firstly, for the given source code of the target function, the internal context, interactive context, sibling context, and closed context are constructed respectively, and the context information tag sequence is obtained after merging them. Then the tag sequence is converted into the context representation vector sequence by using the word embedding technology FastText and input into the encoder of the seq2seq model. The copy mechanism and coverage mechanism are utilized to solve the OOV problem and the repeated decoding problem, respectively. Finally, the vector sequence of the prediction result of the target function name is output, and the consistency of the function name is predicted with the help of the two-channel CNN classifier. If the function name is inconsistent, the recommended function name can be obtained by direct mapping according to the vector space similarity matching. The experimental results show that the F1-measure of DMName in function name consistency check and recommendation reaches 82.65% and 73.31% respectively, which is 2.01% and 2.96% higher than the current optimal DeepName. Finally, the DMName is verified in the large-scale open-source project, namely lancia in GitHub. A total of 16 function name inconsistency problems are found, and reasonable name recommendations are made, which further confirms the effectiveness of DMName.
    Available online:  September 27, 2023 , DOI: 10.13328/j.cnki.jos.006967
    Abstract:
    With the rapid development of neural network technology, neural networks have been widely applied in safety-critical fields such as autonomous driving, intelligent manufacturing, and medical diagnosis. Thus, it is crucial to ensure the trustworthiness of neural networks. However, due to the vulnerability of neural networks, slight perturbation often leads to wrong results. Therefore, it is vital to use formal verification methods to ensure the safety and trustworthiness of neural networks. Current verification methods for neural networks are mainly concerned with the accuracy of the analysis, while apt to ignore operational efficiency. When verifying the safety properties of complex networks, the large-scale state space may lead to problems such as infeasibility or unsolvability. To reduce the state space of neural networks and improve the verification efficiency, this study presents a formal verification method for neural networks based on divide and conquer considering over-approximation errors. The method uses the reachability analysis technique to calculate the upper and lower bounds of nonlinear nodes and uses an improved symbolic linear relaxation method to reduce over-approximation errors during the boundary calculation of nonlinear nodes. The constraints of nodes are refined by calculating the direct and indirect effects of their over-approximation errors. Thereby, the original verification problem is split into a set of sub-problems whose mixed integer linear programming (MILP) formulation has a smaller number of constraints. The method is implemented as a tool named NNVerifier, whose properties are verified and evaluated through experiments on four ReLU-based fully-connected benchmark networks trained on three classic datasets. The experimental results show that the verification efficiency of the NNVerifier is 37.18% higher than that of the existing complete verification methods.
    Available online:  September 27, 2023 , DOI: 10.13328/j.cnki.jos.006957
    Abstract:
    As one of the ten block cipher algorithms selected for the second round of the 2018 National Cryptographic Algorithm Design Contest, Feistel-based block cipher (FBC) is an efficient and lightweight block cipher algorithm with a four-branch and two-fold Feistel structure. In this study, the FBC algorithm is abstracted as the FBC model, and the pseudorandomness and super-pseudorandomness of the model are studied. It is assumed that the FBC round functions are independent random functions, and a method to find the minimal number of FBC rounds is provided, which will keep FBC indistinguishable from a random permutation. Finally, the study comes to the conclusion that under the chosen-plaintext attack, four rounds of FBC are indistinguishable from random permutation, so the model has pseudorandomness; under the adaptive chosen-plaintext and ciphertext attack, five rounds of FBC are indistinguishable from random permutation, so the model has super-pseudorandomness.
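    As background for the round-count results above, the sketch below shows a generic balanced two-branch Feistel network and why it is a permutation for any round function. It is not the four-branch, two-fold FBC structure; the SHA-256-based round function, the 64-bit block size, and all names are assumptions chosen only to keep the example self-contained.

```python
import hashlib

MASK32 = 0xFFFFFFFF


def round_function(half, round_key):
    """Assumed stand-in for an independent random round function (32-bit output)."""
    data = half.to_bytes(4, "big") + round_key.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def encrypt(block, round_keys):
    """Apply one Feistel round per key: (L, R) -> (R, L xor F(R, k))."""
    left, right = block >> 32, block & MASK32
    for key in round_keys:
        left, right = right, left ^ round_function(right, key)
    return (left << 32) | right


def decrypt(block, round_keys):
    """Undo the rounds in reverse order; F itself never needs to be inverted."""
    left, right = block >> 32, block & MASK32
    for key in reversed(round_keys):
        left, right = right ^ round_function(left, key), left
    return (left << 32) | right


keys = [1, 2, 3, 4]  # four rounds, matching the chosen-plaintext setting above
plaintext = 0x0123456789ABCDEF
recovered = decrypt(encrypt(plaintext, keys), keys)
```

    Pseudorandomness results of the kind stated in the abstract ask how many such rounds are needed before an attacker with the stated query access can no longer tell the construction from a random permutation.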
    Available online:  September 27, 2023 , DOI: 10.13328/j.cnki.jos.006958
    Abstract:
    Few-shot learning aims at simulating the ability of human beings to quickly learn new things with only a few samples, which is of great significance for deep learning tasks when samples are limited. However, in many practical tasks with limited computing resources, the model scale may still limit a wider application of few-shot learning. This study therefore raises the practical requirement of lightweight models for few-shot learning. As a widely used auxiliary strategy in deep learning, knowledge distillation transfers knowledge between models by using additional supervised information, which has practical application in both improving model accuracy and reducing model scale. This study first verifies the effectiveness of the knowledge distillation strategy in producing lightweight models for few-shot learning. Then, according to the characteristics of few-shot learning, two new distillation methods for few-shot learning are designed: (1) distillation based on image local features; (2) distillation based on auxiliary classifiers. Experiments on the miniImageNet and TieredImageNet datasets demonstrate that the new distillation methods are significantly superior to traditional knowledge distillation in few-shot learning tasks. The source code is available from https://github.com/cjy97/FSLKD.
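    The traditional knowledge distillation that serves as the baseline here is commonly written as a KL divergence between temperature-softened teacher and student outputs. The sketch below illustrates that standard loss only; it is not either of the two new methods, and the function names and the default temperature are assumptions for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by temperature ** 2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


teacher = [5.0, 2.0, 0.5]
loss_far = distillation_loss([0.0, 0.0, 0.0], teacher)
loss_near = distillation_loss([4.5, 2.0, 0.5], teacher)
```

    A higher temperature flattens the teacher distribution, so the student also learns from the relative scores of the non-target classes.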
    Available online:  September 27, 2023 , DOI: 10.13328/j.cnki.jos.006972
    Abstract:
    Subset repair for inconsistent data is an important research problem in the field of data cleaning. Most of the existing methods are based on integrity constraint rules and adopt the principle of the minimum number of deleted tuples for subset repair. However, these methods take no account of the quality of deleted tuples, and the repair accuracy is low. Therefore, this study proposes a subset repair method combining rules and probabilities. The probability of inconsistent tuples is modeled so that the average probability of correct tuples is greater than that of wrong tuples, and the optimal subset repair with the smallest sum of the probability of deleted tuples is calculated. In addition, in order to reduce the time overhead of calculating the probability of inconsistent tuples, this study proposes an efficient error detection method to reduce the size of inconsistent tuples. Experimental results on real data and synthetic data verify that the proposed method outperforms the state-of-the-art subset repair method in terms of accuracy.
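    The difference between deleting the fewest tuples and deleting the tuples with the smallest total probability can be seen on a toy relation. The brute-force sketch below assumes a single functional dependency zip -> city and hand-picked probabilities of each tuple being correct; the data, names, and exhaustive search are assumptions for illustration and do not reproduce the probability model or the efficient detection method of the study.

```python
from itertools import combinations


def conflicts(rows):
    """Index pairs that violate the assumed functional dependency zip -> city."""
    return [(i, j) for i, j in combinations(range(len(rows)), 2)
            if rows[i]["zip"] == rows[j]["zip"]
            and rows[i]["city"] != rows[j]["city"]]


def best_subset_repair(rows, prob_correct):
    """Exhaustively pick the deletion set with the smallest probability sum."""
    pairs = conflicts(rows)
    best, best_cost = None, float("inf")
    for size in range(len(rows) + 1):
        for deleted in combinations(range(len(rows)), size):
            chosen = set(deleted)
            # A valid repair removes at least one tuple from every conflict.
            if all(i in chosen or j in chosen for i, j in pairs):
                cost = sum(prob_correct[i] for i in chosen)
                if cost < best_cost:
                    best, best_cost = chosen, cost
    return best


rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "10001", "city": "Newark"},
]
prob_correct = [0.95, 0.3, 0.3]
deleted = best_subset_repair(rows, prob_correct)
```

    A minimum-cardinality repair would delete the single "New York" tuple, whereas the probability-weighted repair deletes the two low-confidence "Newark" tuples, which is the distinction the abstract draws.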
    Available online:  September 27, 2023 , DOI: 10.13328/j.cnki.jos.006973
    Abstract:
    In recent years, software system security issues are attracting increasing attention. The security threats existing in systems can be easily exploited by attackers. Attackers usually attack systems by using various attacking techniques, such as password brute force cracking, phishing, and SQL injection. Threat modeling is a method of structurally analyzing, identifying, and processing threats. Traditional tests mainly focus on testing code defects, which take place in the late stage of software development. It is not able to well connect the results from early threat modeling and analysis for building secure software. Threat modeling tools in the industry lack the function of generating security tests. In order to tackle this problem, this study proposes a framework that is able to generate security test cases from threat models and designs and implements a tool prototype. In order to facilitate tests, this study improves the traditional attack tree model and performs compliance checks. Test scenarios can be automatically generated from the model. The test scenarios are evaluated according to the probabilities of attack nodes, and the scenarios of the threats with higher probabilities will be tested first. The defense nodes are evaluated, and the defense scheme with higher profit is selected to alleviate the threats, so as to improve the system’s security design. By setting parameters for attack nodes, test scenarios can be specified as test cases. In the early stage of software development, with the inputs of the threats identified by threat modeling, test cases can be generated through this framework and tool to guide subsequent security development and test design, which improves the integration of security technology in software design and development. The case study applies this framework and tool in test generation for very high security risks, which shows their effectiveness.
    Available online:  September 27, 2023 , DOI: 10.13328/j.cnki.jos.006998
    Abstract:
    Multimodal sentiment analysis is a task that uses subjective information from multiple modalities to analyze sentiment. Exploring how to effectively learn the interaction between modalities has always been an essential task in multimodal analysis. In recent research, it is found that the learning rate of different modalities is unbalanced, leading to the convergence of one modality while the rest of the modalities are under-fitting, which weakens the effect of multimodal collaborative decision-making. In order to combine multiple modalities more effectively and learn the multimodal sentiment features with rich expression, this study proposes a multimodal sentiment analysis method based on adaptive weight fusion. The method of adaptive weight fusion is divided into two phases. The first phase is to adaptively change the fusion weights of unimodal feature representations according to the difference of unimodal learning gradients to dynamically balance the modal learning rate. The study calls this phase balanced fusion (B-fusion). The second phase is to eliminate the impact of the fusion weights of B-fusion on task analysis, propose the modal attention to explore the contributions of modalities to the task, and dynamically allocate the fusion weight to each modality. The study calls this phase attention fusion (A-fusion). The experimental results show that the introduction of the B-fusion method into existing multimodal sentiment analysis methods can effectively improve the accuracy of sentiment analysis. The ablation experiment results show that adding the A-fusion method to B-fusion can effectively reduce the impact of B-fusion weights on the task, which is conducive to improving the analysis results of sentiment analysis. 
    Compared with the existing multimodal sentiment analysis models, the proposed method has a simpler structure, lower computational consumption, and better task accuracy, which shows that the method has high efficiency and excellent performance in multimodal sentiment analysis tasks.
    Available online:  September 27, 2023 , DOI: 10.13328/j.cnki.jos.006999
    Abstract:
    Revealing the complex relations among emotions is an important fundamental study in cognitive psychology. From the perspective of natural language processing, the key to exploring the relations among emotions lies in the embedded representation of emotional categories. Recently, there has been some interest in obtaining a category representation in the emotion space that can characterize emotion relations. However, the existing methods for emotion category representations have several drawbacks. For example, fixed dimensionality, the dimensionality of the emotion category representation, depends on the selected dataset. In order to obtain better representations for the emotion categories, this study introduces a supervised contrastive learning representation method. In the previous supervised contrastive learning, the similarity between samples depends on the similarity of the annotated labels of the samples. In order to better reflect the complex relations among different emotion categories, the study further proposes a partially similar supervised contrastive learning representation method, which suggests that samples of different emotion categories (e.g., anger and annoyance) may also be partially similar to each other. Finally, the study organizes a series of experiments to verify the ability of the proposed method and the other five benchmark methods in representing the relationship between emotion categories. The experimental results show that the proposed method achieves satisfactory results for the emotion category representations.
    Available online:  September 20, 2023 , DOI: 10.13328/j.cnki.jos.006955
    Abstract:
    The detection of the human respiration waveform in the sleep state is crucial for applications in intelligent health care as well as medical and healthcare in that different respiration waveform patterns can be examined to analyze sleep quality and monitor respiratory diseases. Traditional respiration sensing methods based on contact devices cause various inconveniences to users. In contrast, contactless sensing methods are more suitable for continuous monitoring. However, the randomness of the device deployment, sleep posture, and human motion during sleep severely restrict the application of contactless respiration sensing solutions in daily life. For this reason, the study proposes a detection method for the human respiration waveform in the sleep state based on impulse radio-ultra wide band (IR-UWB). On the basis of the periodic changes in the propagation path of the wireless pulse signal caused by the undulation of the human chest during respiration in the sleep state, the proposed method generates a fine-grained human respiration waveform and thereby achieves the real-time output of the respiration waveform and high-precision respiratory rate estimation. Specifically, to obtain the position of the human chest during respiration from the received wireless radio-frequency (RF) signals, this study proposes the indicator respiration energy ratio based on IR-UWB signals to estimate the target position. Then, it puts forward a vector projection method based on the in-phase/quadrature (I/Q) complex plane and a method of projection signal selection based on the circumferential position of the respiration vector to extract the characteristic human respiration waveform from the reflected signal. Finally, a variational encoder-decoder network is leveraged to achieve the fine-grained recovery of the respiratory waveform in the sleep state. 
    Extensive experiments and tests are conducted under different conditions, and the results show that the human respiration waveforms monitored by the proposed method in the sleep state are highly similar to the actual waveforms captured by commercial respiratory belts. The average error of the proposed method in estimating the human respiratory rate is 0.229 bpm, indicating that the method can achieve high-precision detection of the human respiration waveform in the sleep state.
    Available online:  September 20, 2023 , DOI: 10.13328/j.cnki.jos.006956
    Abstract:
    It is essential to detect out-of-distribution (OOD) training set samples for a safe and reliable machine learning system. Likelihood-based generative models are popular methods to detect OOD samples because they do not require sample labels during training. However, recent studies show that likelihoods sometimes fail to detect OOD samples, and the reasons for the failure and possible solutions are underexplored, especially for text data. Therefore, this study investigates the reasons for the failure on text from the views of the model and data: insufficient generalization of the generative model and prior probability bias of the text. To tackle the above problems, the study proposes a new OOD text detection method, namely Pobe. To address insufficient generalization of the generative model, the study increases the model generalization via KNN retrieval. Next, to address the prior probability bias of the text, the study designs a strategy that uses a pre-trained language model to calibrate the bias and mitigate the influence of probability bias on OOD detection, and demonstrates the effectiveness of the strategy according to Bayes’ theorem. Experimental results over a wide range of datasets show the effectiveness of the proposed method. Specifically, the average AUROC is over 99%, and FPR95 is below 1% under eight datasets.
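    The two metrics reported above can be computed from detector scores as sketched below. The example assumes the common convention that in-distribution samples are the positive class and that a higher score means "more in-distribution"; the function names and the toy scores are assumptions, not data from the study.

```python
import math


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample."""
    wins = 0.0
    for s_in in id_scores:
        for s_out in ood_scores:
            if s_in > s_out:
                wins += 1.0
            elif s_in == s_out:
                wins += 0.5  # ties count as half a win
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Share of OOD samples accepted when at least `tpr` of ID samples are accepted."""
    ranked = sorted(id_scores, reverse=True)
    needed = math.ceil(tpr * len(ranked))  # ID samples that must pass
    threshold = ranked[needed - 1]
    return sum(s >= threshold for s in ood_scores) / len(ood_scores)


id_scores = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
ood_scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
area = auroc(id_scores, ood_scores)        # 0.875 for these toy scores
fpr95 = fpr_at_tpr(id_scores, ood_scores)  # 0.5 for these toy scores
```

    An AUROC above 99% together with an FPR95 below 1% therefore means the two score distributions barely overlap.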
    Available online:  September 20, 2023 , DOI: 10.13328/j.cnki.jos.006838
    Abstract:
    Attendance may be recorded for private purposes that are not associated with any organization, such as keeping a personal travel log, or for business needs, in which case it is part of organizational attendance and is sometimes associated with multiple organizations. Therefore, the recording, sharing, and analysis of attendance data require elaborate management. The HAO attendance system is a lightweight and mobile attendance platform. It takes the user and organization as two starting points and is driven by HAO intelligence consisting of human intelligence (HI), artificial intelligence (AI), and organizational intelligence (OI). This study builds the knowledge graph of the HAO attendance system and puts forward the closed-loop authority management structure of the HAO attendance system, supplemented by the privacy authority management method from coarse-grained to fine-grained level to ensure refined attendance management and protect the users’ privacy, thereby promoting the intelligent transformation of a new-generation attendance system. For organizational attendance analysis, a four-element scoring method and a four-element attendance reporting method are designed to calculate employee attendance scores, generate accurate and comprehensive attendance reports, provide decision-making support for organizations, and inspire the vitality of both organizations and individuals, so as to build intelligent organizations with organizational intelligence.
    Available online:  September 13, 2023 , DOI: 10.13328/j.cnki.jos.006948
    Abstract:
    Forgetting is the biggest problem of artificial neural networks in incremental learning and is thus called “catastrophic forgetting”. In contrast, humans can continuously acquire new knowledge and retain most of the frequently used old knowledge. This continuous “incremental learning” ability of humans without extensive forgetting is related to the partitioned learning structure and memory replay ability of the human brain. To simulate this structure and ability, the study proposes an incremental learning approach of “recency bias-avoiding self-learning mask (SLM)-based partitioned incremental learning”, or ASPIL for short. ASPIL involves the two stages of regional isolation and regional integration, which are alternately iterated to accomplish continuous incremental learning. Specifically, this study proposes the “Bayesian network (BN)-based sparse regional isolation” method to isolate the new learning process from the existing knowledge and thereby avoid interference with the existing knowledge. For regional integration, SLM and dual-branch fusion (GBF) methods are proposed. The SLM method accurately extracts new knowledge and improves the adaptability of the network to new knowledge, while the GBF method integrates the old and new knowledge to achieve unified and high-precision cognition. During training, a regularization term for Margin Loss is proposed to avoid the “recency bias”, thereby keeping the old knowledge balanced against the new knowledge and avoiding a bias towards the new knowledge. To evaluate the effectiveness of the proposed method, this study also presents systematic ablation experiments performed on the standard incremental learning datasets CIFAR-100 and miniImageNet and compares the proposed method with a series of well-known state-of-the-art methods. The experimental results show that the method proposed in this study improves the memory ability of the artificial neural network and outperforms the latest well-known methods by more than 5.27% in average identification rate.
    Available online:  September 13, 2023 , DOI: 10.13328/j.cnki.jos.006949
    Abstract:
    Deep neural networks can be affected by well-designed backdoor attacks during training. Such an attack controls the model output at test time by injecting data with backdoor labels into the training set. The attacked model performs normally on a clean test set but misclassifies an input as the attack target class when it recognizes the backdoor labels. The currently available backdoor attack methods have poor invisibility, and their attack success rates still leave room for improvement. A backdoor attack method based on singular value decomposition is proposed to address the above limitations. The proposed method can be implemented in two ways. One is to directly set some singular values of the picture to zero; the obtained picture is compressed to a certain extent and can be used as an effective backdoor triggering label. The other is to inject the singular vector information of the attack target class into the left and right singular vectors of the picture, which can also achieve an effective backdoor attack. The backdoor pictures obtained in both ways are visually almost the same as the original picture. The experiments show that singular value decomposition can be effectively leveraged in backdoor attack algorithms to attack neural networks with considerably high success rates on multiple datasets.
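    The first variant described above, setting some singular values of a picture to zero, can be illustrated with a minimal NumPy sketch. The abstract does not say which singular values are zeroed; this sketch assumes the smallest ones, which matches the remark that the picture is compressed. The function name `zero_small_singular_values`, the parameter `keep`, and the random test image are hypothetical.

```python
import numpy as np

def zero_small_singular_values(image, keep):
    """Return a copy of a 2-D grayscale image in which all but the
    `keep` largest singular values have been set to zero."""
    # Thin SVD: image = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    s_trunc = s.copy()
    s_trunc[keep:] = 0.0  # assumption: the smallest singular values are dropped
    return u @ np.diag(s_trunc) @ vt

# Hypothetical 32x32 grayscale picture with values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((32, 32))
processed = zero_small_singular_values(image, keep=24)
```

    The processed picture has the same shape as the original and rank at most `keep`; how such pictures are labeled and mixed into the training set is not covered by this sketch.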
    Available online:  September 13, 2023 , DOI: 10.13328/j.cnki.jos.006959
    Abstract:
    The openness and ease of use of Python make it one of the most commonly used programming languages. The PyPI ecosystem formed around Python not only provides convenience for developers but also becomes an important target for attackers to launch vulnerability attacks. Thus, after a Python vulnerability is discovered, it is critical to assess its impact scope accurately and comprehensively. However, the current assessment methods of Python vulnerability impact scope mainly rely on dependency analysis at package granularity, which produces a large number of false positives. On the other hand, existing Python program analysis methods at function granularity have accuracy problems due to context insensitivity and produce false positives when applied to assess the impact scope of vulnerabilities. This study proposes a vulnerability impact scope assessment method for the PyPI ecosystem based on static analysis, namely PyVul++. First, it builds an index of the PyPI ecosystem, then finds the candidate packages affected by the vulnerability through vulnerable function identification, and confirms the vulnerable packages through vulnerability trigger conditions. PyVul++ realizes vulnerability impact scope assessment at function granularity, improves function-level call analysis for Python code, and outperforms other tools on the PyCG benchmark (accuracy of 86.71% and recall of 83.20%). PyVul++ is used to assess the impact scope of 10 Python CVE vulnerabilities on the PyPI ecosystem (385,855 packages); it finds more vulnerable packages and reduces false positives compared with other tools such as pip-audit. In addition, in these 10 assessment experiments, PyVul++ newly finds that 11 packages in the current PyPI ecosystem still reference unpatched vulnerable functions.
    Available online:  September 13, 2023 , DOI: 10.13328/j.cnki.jos.006964
    Abstract:
    The domain name plays an important role in cybercrimes. Existing malicious domain name detection methods not only struggle to exploit rich topology and attribute information but also require a large amount of labeled data, resulting in limited detection effects and high costs. To address this problem, this study proposes a malicious domain name detection method based on graph contrastive learning. The domain name and IP address are taken as two types of nodes in a heterogeneous graph, and the feature matrix of corresponding nodes is established according to their attributes. Three types of meta paths are constructed based on the inclusion relationship between domain names, the measure of similarity, and the correspondence between domain names and IP addresses. In the pre-training stage, a contrastive learning model based on an asymmetric encoder is applied to avoid the damage to graph structure and semantics caused by graph data augmentation and to reduce the demand for computing resources. By using the inductive graph neural network encoders HeteroSAGE and HeteroGAT, a node-centric mini-batch training strategy is adopted to explore the aggregation relationship between the target node and its neighbor nodes, which solves the problem of poor applicability of transductive graph neural networks such as GCN in dynamic scenarios. The downstream classification detection task uses logistic regression and random forest algorithms for comparison. Experimental results on publicly available data sets show that detection performance is improved by two to six percentage points compared with that of related works.
    Available online:  September 13, 2023 , DOI: 10.13328/j.cnki.jos.006945
    Abstract:
    Jacobi computation is a kind of stencil computation, which has been widely applied in the field of scientific computing. The performance optimization of Jacobi computation is a classic topic, where loop tiling is an effective optimization method. The existing loop tiling methods mainly focus on the impact of tiling on parallel communication and program locality and fail to consider other factors such as load balancing and vectorization. This study analyzes and compares several tiling methods based on multi-core computing architecture and chooses an advanced hexagonal tiling as the main method to accelerate Jacobi computation. For tile size selection, this study proposes a hexagonal tile size selection algorithm called Hexagon_TSS by comprehensively considering the impact of tiling on load balancing, vectorization efficiency, and locality. The experimental results show that, in the best case, Hexagon_TSS reduces the L1 data cache miss rate to 5.46% of that of the original serial program, and the maximum speedup reaches 24.48. The proposed method also has excellent scalability.
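    For readers unfamiliar with the term, a Jacobi stencil can be sketched as follows. This is a plain, untiled 1-D three-point Jacobi sweep, assuming fixed boundary values and equal weights; it does not implement hexagonal tiling or Hexagon_TSS, and the name `jacobi_1d` is hypothetical.

```python
import numpy as np

def jacobi_1d(u, steps):
    """Apply `steps` sweeps of a 3-point Jacobi stencil.

    Each interior point becomes the average of itself and its two
    neighbors from the previous sweep; boundary values stay fixed.
    """
    cur = np.asarray(u, dtype=float).copy()
    for _ in range(steps):
        nxt = cur.copy()
        # Every new value reads only the previous time step, which is the
        # property that tiling schemes rely on when reordering iterations.
        nxt[1:-1] = (cur[:-2] + cur[1:-1] + cur[2:]) / 3.0
        cur = nxt
    return cur

smoothed = jacobi_1d([0.0, 3.0, 0.0], steps=1)
```

    Loop tiling keeps this arithmetic unchanged and only reorders the (time, space) iterations so that data is reused while it is still in cache.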
    Available online:  September 13, 2023 , DOI: 10.13328/j.cnki.jos.006947
    Abstract:
    Software change prediction, aimed at identifying change-prone modules, can help software managers and developers allocate resources efficiently and reduce maintenance overhead. Extracting effective features from the code plays a vital role in the construction of accurate prediction models. In recent years, researchers have shifted from traditional hand-crafted features to semantic features with powerful representation capabilities for prediction. They extracted semantic features from abstract syntax tree (AST) node sequences to build models. However, existing studies have ignored the structural information in the AST and the rich semantic information in the code. How to extract the semantic features of the code is still a challenging problem. For this reason, the study proposes a change prediction method based on hybrid graph representation. To start with, the model combines AST, control flow graph (CFG), data flow graph (DFG), and other structural information to construct the program graph representation of the code. Then, it uses a graph neural network to learn the semantic features of the program graph and uses the obtained features to predict change-proneness. The model can integrate various semantic information to represent the code better. The effectiveness of the proposed method is verified by comparing it with the latest change prediction methods on various change datasets.
    Available online:  September 13, 2023 , DOI: 10.13328/j.cnki.jos.006928
    Abstract:
    Detecting out-of-distribution (OOD) samples outside the training set distribution is crucial for deploying deep neural network (DNN) classifiers in the open environment. OOD sample detection is a binary classification problem, which is to classify the input samples into the in-distribution (ID) or OOD categories. However, the detector itself can in turn be bypassed by malicious adversarial attacks. OOD samples with such malicious perturbations are called adversarial OOD samples. Building robust OOD detectors to detect adversarial OOD samples is more challenging. Existing methods usually train the DNN on adversarial OOD samples within the neighborhood of auxiliary clean OOD samples to learn separable representations that are robust to malicious perturbations. However, due to the distributional differences between the auxiliary OOD training set and the original ID training set, training on adversarial OOD samples is not effective enough to ensure the robustness of the ID boundary against adversarial perturbations. Adversarial ID samples generated from within the neighborhood of (clean) ID samples are closer to the ID boundary and are also effective in improving the adversarial robustness of the ID boundary. This study proposes a semi-supervised adversarial training approach, DiTing, to build robust OOD detectors to detect clean and adversarial OOD samples. This approach treats the adversarial ID samples as auxiliary near-OOD samples and trains on them jointly with other auxiliary clean and adversarial OOD samples to improve the robustness of OOD detection. Experiments show that DiTing has a significant advantage in detecting adversarial OOD samples generated by strong attacks while maintaining state-of-the-art performance in classifying clean ID samples and detecting clean OOD samples. Code is available at https://gitee.com/zhiyang3344/diting.
    Available online:  September 06, 2023 , DOI: 10.13328/j.cnki.jos.006963
    Abstract:
    The aspect-level sentiment classification task, which aims to determine the sentiment polarity of a given aspect, has attracted increasing attention due to its broad applications. The key to this task is to identify contextual descriptions relevant to the given aspect and predict the aspect-related sentiment orientation of the author according to the context. Statistics show that close to 30% of reviews convey a clear sentiment orientation without any explicit sentiment description of the given aspect, which is called implicit sentiment expression. Recent attention mechanism-based neural network methods have achieved great success in sentiment analysis. However, this kind of method captures only explicit aspect-related sentiment descriptions and fails to effectively explore and analyze implicit sentiment. It also often models aspect words and sentence contexts separately, which leaves the representation of aspect words without contextual semantics. To solve the above two problems, this study proposes an aspect-level sentiment classification method that integrates local aspect information and global sentence context information and improves the classification performance of the model by curriculum learning according to the different classification difficulties of implicit and explicit sentiment sentences. Experimental results show that the proposed method not only has a high accuracy in identifying the aspect-related sentiment orientation of explicit sentiment sentences but also can effectively learn the sentiment categories of implicit sentiment sentences.
    Available online:  September 06, 2023 , DOI: 10.13328/j.cnki.jos.006960
    Abstract:
    Thanks to its low storage cost and high retrieval speed, graph-based unsupervised cross-modal hash learning has attracted much attention from academic and industrial researchers and has become an indispensable tool for cross-modal retrieval. However, the high computational complexity of graph structures prevents its application in large-scale multi-modal applications. This study mainly attempts to solve two important challenges facing graph-based unsupervised cross-modal hash learning: 1) How to efficiently construct graphs in unsupervised cross-modal hash learning? 2) How to handle the discrete optimization in cross-modal hash learning? To address these two problems, this study presents anchor-based cross-modal learning and a differentiable hash layer. To be specific, the study first randomly samples some image-text pairs from the training set as anchor sets and uses the anchor sets as the agent to compute the graph matrix of each batch of data. The graph matrix is used to guide cross-modal hash learning, thus remarkably reducing the space and time cost. Second, the proposed differentiable hash layer directly adopts binary coding for computation during network forward propagation and produces gradients to update the network without continuous-value relaxation during backpropagation, thus achieving better hash encoding performance. Finally, the study introduces cross-modal ranking loss to consider the ranking results in the training process and improve the cross-modal retrieval accuracy. To verify the effectiveness of the proposed algorithm, the study compares the algorithm with 10 cross-modal hash algorithms on three general data sets.
    Available online:  September 06, 2023 , DOI: 10.13328/j.cnki.jos.006969
    Abstract:
    As an essential component of real-time system design, priority is utilized to resolve conflicts in resource sharing and to design for safety. In real-time systems that introduce priorities, each task is assigned a priority, so low-priority tasks may be preempted by high-priority tasks at runtime, which creates a preemptive scheduling problem for real-time systems. Existing research on this problem lacks a modeling and automatic verification method that can visually represent the priority of tasks and the dependencies between tasks. To this end, a preemptive priority timed automaton (PPTA) is proposed and a preemptive priority timed automata network (PPTAN) is introduced. First, the priority of a task is represented by adding priorities to the transitions of the timed automaton, and transitions are then used to correlate tasks with dependencies so that PPTA can be applied to model real-time tasks with priority. A blocking location is also added to the timed automaton, so PPTAN can be used to model the priority preemptive scheduling problem. Second, a model-based transformation method is proposed to map the PPTA to the automatic verification tool UPPAAL. Finally, by modeling an example of a multi-core multi-task real-time system and comparing it with other models, it is shown that this model is suitable not only for modeling the priority preemptive scheduling problem but also for accurately verifying and analyzing it.
    Available online:  September 06, 2023 , DOI: 10.13328/j.cnki.jos.006979
    Abstract:
    When prototypical networks are directly applied to few-shot named entity recognition (FEW-NER), the following problems arise. Non-entities do not have strong semantic relationships with each other, so constructing the prototype in the same way for both entities and non-entities makes non-entity prototypes fail to accurately represent the semantic characteristics of non-entities. In addition, using only the average entity vector as the computing method of the prototype makes it difficult to capture similar entities with different semantic features. To address these problems, this study proposes a FEW-NER method based on fine-grained prototypical networks (FNFP) to improve the annotation effect of FEW-NER. Firstly, different non-entity prototypes are constructed for different query sets to capture the key semantic features of non-entities in sentences and obtain finer-grained prototypes to improve the recognition effect of non-entities. Then, an inconsistent metric module is designed to measure the inconsistency between similar entities, and different metric functions are applied to entities and non-entities, so as to reduce the feature representation between similar samples and improve the feature representation of the prototype. Finally, a Viterbi decoder is introduced to capture the label transformation relationship and optimize the final annotation sequence. The experimental results show that the proposed method improves performance on the large-scale FEW-NER dataset, namely FEW-NERD, and the generalization ability of this method in different domain scenarios is verified on the cross-domain dataset.
    Available online:  September 06, 2023 , DOI: 10.13328/j.cnki.jos.006980
    Abstract:
    A large number of bug reports are generated during software development and maintenance, which can help developers to locate bugs. Information retrieval based bug localization (IRBL) analyzes the similarity of bug reports and source code files to locate bugs, achieving high accuracy at the file and function levels. However, because of the coarse location granularity of IRBL, finding bugs within suspicious files and function fragments still consumes a lot of labor and time. This study proposes a statement level software bug localization method based on historical bug information retrieval, STMTLocator. Firstly, it retrieves historical bug reports which are similar to the bug report of the program under test and extracts the bug statements from the historical bug reports. Then, it retrieves the suspicious files according to the text similarity between the source code files and the bug report of the program under test, and extracts the suspicious statements from the suspicious files. Finally, it calculates the similarity between the suspicious statements and the historical bug statements, and arranges them in descending order to localize bug statements. To evaluate the bug localization performance of STMTLocator, comparative experiments are conducted on the Defects4J and JIRA datasets with Top@N, MRR, and other evaluation metrics. The experimental results show that STMTLocator achieves an MRR nearly four times that of the static bug localization method BugLocator and locates 7 more bug statements for Top@1. The average time used by STMTLocator to locate a bug version is 98.37% and 63.41% lower than that of the dynamic bug localization methods Metallaxis and DStar, respectively, and STMTLocator has the significant advantage of not requiring the construction and execution of test cases.
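    The final ranking step, scoring suspicious statements against historical bug statements and sorting in descending order, can be sketched in Python. The abstract does not specify the similarity measure; this sketch assumes bag-of-tokens cosine similarity and scores each statement by its best match. All names (`tokens`, `cosine`, `rank_statements`) and the sample statements are hypothetical.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Bag of lowercase identifier-like tokens in a statement."""
    return Counter(re.findall(r"[A-Za-z_]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_statements(suspicious, historical_bug_statements):
    """Score each suspicious statement by its best match among the
    historical bug statements; return (score, statement) pairs sorted
    in descending order of score."""
    hist = [tokens(h) for h in historical_bug_statements]
    scored = [(max((cosine(tokens(s), h) for h in hist), default=0.0), s)
              for s in suspicious]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

ranking = rank_statements(
    ["return total;", "int x = list.get(i);"],
    ["String s = list.get(index);"],
)
```

    Statements at the top of `ranking` would be inspected first; a real system would likely use a richer similarity model than token overlap.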
    Available online:  August 30, 2023 , DOI: 10.13328/j.cnki.jos.006961
    Abstract:
    Fault localization collects and analyzes the runtime information of test case sets to evaluate the suspiciousness of each statement being faulty. Test case sets are constructed from data in the input domain and contain two types of test cases, i.e., passing test cases and failing ones. Since failing test cases generally account for a very small portion of the input domain, and their distribution is usually random, the number of failing test cases is much smaller than that of passing ones. Previous work has shown that the lack of failing test cases leads to a class-imbalanced problem of test case sets, which severely hampers fault localization effectiveness. To address this problem, this study proposes a model-domain data augmentation approach using generative adversarial networks for fault localization. Based on the model domain (i.e., spectrum information of fault localization) rather than the traditional input domain (i.e., program input), this approach uses the generative adversarial network to synthesize the model-domain failing test cases covering the minimum suspicious set, so as to address the class-imbalanced problem from the model domain. The experimental results show that the proposed approach significantly improves the effectiveness of 12 representative fault localization approaches.
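    To make "suspiciousness from spectrum information" concrete, the sketch below computes the Ochiai score, a widely used spectrum-based formula. The abstract does not name the 12 approaches it evaluates, so Ochiai is used here only as a familiar example; the function and parameter names are hypothetical.

```python
import math

def ochiai(failed_cover, passed_cover, total_failed):
    """Ochiai suspiciousness of one statement.

    failed_cover: number of failing tests that execute the statement
    passed_cover: number of passing tests that execute the statement
    total_failed: number of failing tests in the whole suite
    """
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0

# With a single failing test, a statement it covers that is also covered
# by three passing tests gets a moderate score.
score = ochiai(failed_cover=1, passed_cover=3, total_failed=1)
```

    Because such formulas depend heavily on the failing-test counts, a suite with very few failing tests gives them little signal, which is the class-imbalance issue the abstract targets.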
    Available online:  August 30, 2023 , DOI: 10.13328/j.cnki.jos.006962
    Abstract:
    With the rapid development of Internet information technologies, the explosive growth of online learning resources has caused the problems of “information overload” and “learning disorientation”. In the absence of expert guidance, it is difficult for users to identify their learning demands and select the appropriate content from the vast amount of learning resources. Educational domain recommendation methods have received a lot of attention from researchers in recent years because they can provide personalized recommendations of learning resources based on the historical learning behaviors of users. However, the existing educational domain recommendation methods ignore the modeling of complex relationships among knowledge points in learning demand perception and fail to consider the dynamic changes of users’ learning demands, which leads to inaccurate learning resource recommendations. To address the above problems, this study proposes a knowledge point recommendation method based on static and dynamic learning demand perception, which models users’ learning behaviors under complex knowledge association by combining static perception and dynamic perception. For static learning demand perception, this study designs an attentional graph convolutional network based on the first-course-following meta-path guidance of knowledge points, which can accurately capture users’ static learning demands at the fine-grained knowledge point level by modeling the complex constraints of the first-course-following relationship between knowledge points and eliminating the interference of other non-learning demand factors. 
For dynamic learning demand perception, the method aggregates knowledge point embeddings to characterize users’ knowledge levels at different moments by taking courses as units and then uses a recurrent neural network to encode users’ knowledge level sequences, which can effectively explore the dynamic learning demands hidden in users’ knowledge level changes. Finally, this study fuses the obtained static and dynamic learning demands, models the compatibility between static and dynamic learning demands in the same framework, and promotes the complementarity of these two learning demands to achieve fine-grained and personalized knowledge point recommendations. Experiments show that the proposed method can effectively perceive users’ learning demands, provide personalized knowledge point recommendations on two publicly available datasets, and outperform the mainstream recommendation methods in terms of various evaluation metrics.
    Available online:  August 30, 2023 , DOI: 10.13328/j.cnki.jos.006923
    Abstract:
    Kernel heap vulnerabilities are currently one of the main threats to operating system security. User-space attackers can leak or modify sensitive kernel information, disrupt kernel control flow, and even gain root privilege by triggering a vulnerability. However, due to the rapid increase in the number and complexity of vulnerabilities, it often takes a long time from when a vulnerability is first reported to when the developer issues a patch, and the kernel mitigation mechanisms currently adopted are steadily being bypassed. Therefore, this study proposes an eBPF-based dynamic mitigation framework for kernel heap vulnerabilities, so as to reduce kernel security risks during the time window before a fix is available. The framework adopts data object space randomization to assign random addresses to the data objects involved in vulnerability reports at each allocation. In addition, it takes full advantage of the dynamic and secure features of eBPF to inject space-randomized objects into the kernel during runtime, so the attacker cannot place any attack payload accurately, and the heap vulnerabilities are almost unexploitable. This study evaluates 40 real kernel heap vulnerabilities and collects 12 attacks that bypass the existing mitigation mechanisms for further analysis and tests. The results verify that the dynamic mitigation framework provides sufficient security. Performance tests show that even under severe conditions, the four types of data objects cause a performance loss of only about 1% and negligible memory loss to the system, and there is almost no additional performance loss when the number of protected objects increases. Compared with related work, the mechanism in this study has a wider scope of application and stronger security, and it does not require vulnerability patches issued by security experts. Furthermore, it can generate mitigation procedures according to vulnerability reports and has a broad application prospect.
    Available online:  August 30, 2023 , DOI: 10.13328/j.cnki.jos.006925
    Abstract:
    Regular expressions are widely used in various areas of computer science. However, due to the complex syntax and the use of a large number of meta-characters, regular expressions are quite error-prone when defined and used by developers. Testing is a practical and effective way to ensure the semantic correctness of regular expressions. The most common method is to generate a set of character strings according to the tested expression and check whether they comply with the intended language. Most of the existing test data generation focuses only on positive strings. However, empirical studies show that a majority of errors during actual development manifest as the defined language being smaller than the intended one, and such errors can only be detected by negative strings. This study investigates the generation of negative strings from regular expressions based on mutation. The study first obtains a set of mutants by injecting defects into the tested expression through mutation and then selects a negative character string in the complementary set of the language defined by the tested expression to reveal the error simulated by the corresponding mutant. In order to simulate complex defects and avoid the problem that the negative strings cannot be obtained due to the specialization of mutants, a second-order mutation mechanism is adopted. Meanwhile, optimization techniques such as redundant mutant elimination and mutation operator selection are used to reduce the mutants, so as to control the size of the finally generated test set. The experimental results show that the proposed algorithm generates a moderately sized set of negative test strings with strong error detection ability compared with the existing tools.
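    The idea of a negative string that reveals a mutant can be shown with a small brute-force sketch using Python's `re` module. It enumerates short strings over a given alphabet and keeps those rejected by the tested expression but accepted by the mutant. Exhaustive enumeration is an assumption made for illustration, not the selection algorithm of the study; the function name, the example expressions, and the alphabet are hypothetical.

```python
import re
from itertools import product

def negative_strings(original, mutant, alphabet, max_len):
    """Strings up to `max_len` that `original` rejects but `mutant` accepts.

    Each such string lies in the complement of the tested expression's
    language and distinguishes the mutant from the tested expression.
    """
    found = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if re.fullmatch(mutant, candidate) and not re.fullmatch(original, candidate):
                found.append(candidate)
    return found

# Tested expression `a+b`; the mutant `a*b` simulates an intended
# language that also allows zero leading `a` characters.
witnesses = negative_strings(r"a+b", r"a*b", alphabet="ab", max_len=3)
```

    Here the only witness of length at most 3 is `"b"`: if the developer actually intended `a*b`, checking `"b"` against the tested expression exposes that the defined language is too small.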
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006926
    Abstract:
    Hypergraphs are generalized representations of ordinary graphs, which are common in many application areas, including the Internet, bioinformatics, and social networks. The independent set problem is a fundamental research problem in the field of graph analysis. Most of the traditional independent set algorithms target ordinary graph data, and how to achieve efficient maximum independent set mining on hypergraph data is an urgent problem to be solved. To address this problem, this study proposes a definition of hypergraph independent sets. Firstly, two properties of hypergraph independent set search are analyzed, and a basic algorithm based on the greedy strategy is proposed. Then, a pruning framework for hypergraph approximate maximum independent set search is proposed, i.e., a combination of exact pruning and approximate pruning, which reduces the size of the graph by the exact pruning strategy and speeds up the search by the approximate pruning strategy. In addition, four efficient pruning strategies are proposed in this study, and a theoretical proof of each pruning strategy is presented. Finally, experiments are conducted on 10 real hypergraph data sets, and the results show that the pruning algorithm can efficiently search for hypergraph maximum independent sets that are closer to the real results.
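    A greedy baseline of the kind mentioned above might look like the following sketch. The abstract does not reproduce the study's definition of a hypergraph independent set, so the sketch assumes the strong notion (no two chosen vertices share a hyperedge) and a minimum-degree-first heuristic. The function name and the sample hypergraph are hypothetical, and no pruning strategy is implemented.

```python
def greedy_independent_set(vertices, hyperedges):
    """Greedy independent set under the assumed strong definition:
    no hyperedge contains more than one chosen vertex."""
    # Two vertices are neighbors if some hyperedge contains both.
    neighbors = {v: set() for v in vertices}
    for edge in hyperedges:
        for v in edge:
            neighbors[v].update(u for u in edge if u != v)

    remaining = set(vertices)
    chosen = []
    while remaining:
        # Pick the vertex with the fewest remaining neighbors
        # (ties broken by vertex id), then discard it and its neighbors.
        v = min(remaining, key=lambda x: (len(neighbors[x] & remaining), x))
        chosen.append(v)
        remaining -= neighbors[v] | {v}
    return chosen

edges = [{1, 2, 3}, {3, 4}, {4, 5}]
result = greedy_independent_set([1, 2, 3, 4, 5], edges)
```

    The result is always a maximal independent set under the assumed definition, though not necessarily a maximum one.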
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006927
    Abstract:
    Entity recognition is a key technology for information extraction. Compared with ordinary text, the entity recognition of Chinese medical text is often faced with a large number of nested entities. Previous methods of entity recognition often ignore the entity nesting rules unique to medical text and directly use sequence annotation methods. Therefore, a Chinese entity recognition method that incorporates entity nesting rules is proposed. In the training process, this method transforms the entity recognition task into a joint training task of entity boundary recognition and boundary head-tail relationship recognition; in the decoding process, it filters the results by combining the entity nesting rules summarized from actual medical text. In this way, the recognition results are in line with the composition law of the nested combinations of inner and outer entities in the actual text. Experiments on a public medical text entity recognition dataset show that the proposed method is significantly superior to the existing methods in terms of nested-type entity recognition performance, and the overall accuracy is increased by 0.5% compared with the state-of-the-art methods.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006929
    Abstract:
    As a privacy-preserving digital identity authentication technology, anonymous credentials not only authenticate the validity of the users’ digital identity but also protect the privacy of their identity. Anonymous credentials are widely applied in anonymous authentication, anonymous tokens, and decentralized digital identity systems. Existing anonymous credentials usually adopt the commitment-signature-proof paradigm, which requires that the adopted signature scheme should have the re-randomization property, such as CL signatures, PS signatures, and structure-preserving signatures (SPS). In practical applications, ECDSA, Schnorr, and SM2 are widely employed for digital identity authentication, but they lack the protection of user identity privacy. Therefore, it is of certain practical significance to construct anonymous credentials compatible with ECDSA, Schnorr, SM2, and other digital signatures, and protect identity privacy during the authentication. This study explores anonymous credentials based on SM2 digital signature. Pedersen commitment is utilized to commit the user attributes in the registration phase. Meanwhile, according to the structural characteristics of SM2, the signed message is H(m), and the equivalence between the Pedersen commitment message and the hash commitment message is proven. This study also employs ZKB++ technology to prove the equivalence of algebraic and non-algebraic statements. The commitment message is transformed to achieve the cross-domain proof and issue the users’ credentials based on the SM2 digital signature. In the showing phase of anonymous credentials, the zero-knowledge proof is combined to prove the possession of an SM2 signature and ensure the anonymity of credentials. This study provides the construction of an anonymous credential protocol based on SM2 digital signature and proves the security of this protocol. 
Finally, it also verifies the effectiveness and feasibility of the protocol by analyzing the computational complexity of the protocol and testing the algorithm execution efficiency.
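    The Pedersen commitment used in the registration phase can be illustrated with a toy example. The sketch below is not the protocol from the abstract: the group parameters (P, Q, G, H) are tiny values assumed purely for illustration, and the function names are hypothetical. It only shows the standard form C = G^m * H^r mod P and its additive homomorphism.

```python
# Toy Pedersen commitment in the order-11 subgroup of Z_23^*.
# All parameters are assumed illustrative values and far too small for real use.
P = 23   # group modulus
Q = 11   # prime order of the subgroup generated by G
G = 4    # generator of the order-11 subgroup
H = 9    # second generator; in practice nobody may know log_G(H)


def commit(message: int, blinding: int) -> int:
    """Return the commitment C = G^message * H^blinding mod P."""
    return (pow(G, message % Q, P) * pow(H, blinding % Q, P)) % P


def verify(commitment: int, message: int, blinding: int) -> bool:
    """Check an opening (message, blinding) against a commitment."""
    return commitment == commit(message, blinding)


c1 = commit(3, 7)
c2 = commit(5, 2)
# Multiplying commitments adds the committed messages and blinding factors.
combined = (c1 * c2) % P
```

    The blinding value hides the attribute, while the homomorphic property is what lets commitments be combined and reasoned about inside zero-knowledge proofs.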
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006930
    Abstract:
    Since the Snowden revelations, threats from backdoor attacks represented by the algorithm substitution attack (ASA) have attracted wide concern. This kind of attack tampers with the algorithms run by cryptographic protocol participants in an undetectable manner and embeds backdoors to obtain secrets. Building a cryptographic reverse firewall (CRF) for protocol participants is a well-known and feasible approach against ASA. As a widely applicable public key infrastructure, identity-based encryption (IBE) needs to be protected by appropriate CRF schemes. However, the existing work only realizes the CRF re-randomization, ignoring the security risk of sending users’ private keys directly to the third-party CRF. To address this problem, the formal definition and security model of the security properties of CRF applicable to IBE are proposed. Then, the formal definition of rerandomizable and key-malleable secure channel free IBE (RKM-SCF-IBE) and the method of transforming traditional IBE to RKM-SCF-IBE are presented. In addition, an approach to increasing anonymity is also given. Finally, a generic provably secure framework of CRF construction for IBE is proposed based on RKM-SCF-IBE, with several instantiations from classic IBE schemes in the standard model and simulation results with optimization methods. Compared with existing work, the proposed scheme is proven secure under a more complete security model with a generic approach to building CRF for IBE schemes, and it clarifies the basic principles for constructing CRF for more expressive encryption schemes.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006931
    Abstract:
    Accurately extracting two types of information, namely elements and clauses, from contract texts can effectively improve the contract review efficiency and provide facilitation services for all trading parties. However, current contract information extraction methods generally train single-task models to extract elements and clauses separately. They do not dig deep into the characteristics of contract texts and ignore the relevance among different tasks. Therefore, this study employs a deep neural network structure to study the correlation between the two tasks of element extraction and clause extraction and proposes a multitask learning method. Firstly, a primary multitask learning model is built for contract information extraction by combining the above two tasks. Then, the model is optimized, and an attention mechanism is adopted to further explore the correlation, which yields an attention-based dynamic multitask learning model. Finally, based on the above two methods, a dynamic multitask learning model with lexical knowledge is proposed for the complex semantic environment in contract texts. The experimental results show that the method can fully capture the shared features among tasks and yield better information extraction results than the single-task model. It can handle nested entities among elements and clauses in contract texts and realize the joint information extraction of contract elements and clauses. In addition, to verify the robustness of the proposed method, this study conducts experiments on public datasets in various fields, and the results show that the proposed method is superior to baseline methods.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006932
    Abstract:
    Adversarial texts are malicious samples that can cause deep learning classifiers to make errors. The adversary creates an adversarial text that can deceive the target model by adding subtle perturbations to the original text that are imperceptible to humans. The study of adversarial text generation methods can evaluate the robustness of deep neural networks and contribute to the subsequent robustness improvement of the model. Among the current adversarial text generation methods designed for Chinese text, few take the robust Chinese BERT model as the target model. For Chinese text classification tasks, this study proposes an attack method against Chinese BERT, namely Chinese BERT Tricker. This method adopts a character-level word importance scoring method, called important Chinese character positioning. Meanwhile, a word-level perturbation method for Chinese based on the masked language model with two types of strategies is designed to achieve the replacement of important words. Experimental results show that for the text classification tasks, the proposed method can significantly reduce the classification accuracy of the Chinese BERT model to less than 40% on two real datasets, and it outperforms other baseline methods on multiple attack performance metrics.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006936
    Abstract:
    As a new learning paradigm to solve the problem of label ambiguity, label distribution learning (LDL) has received much attention in recent years. To further improve the prediction performance of LDL, this study proposes an LDL method based on deep forest and heterogeneous ensemble (LDLDF), which uses the cascade structure of deep forest to simulate deep learning models with a multi-layer processing structure and combines multiple heterogeneous classifiers in the cascade layer to increase the diversity of the ensemble. Compared with other existing LDL methods, LDLDF can process information layer by layer and learn better feature representations to mine rich semantic information in data, and it has better representation learning ability and generalization ability. In addition, by considering the degradation problem of deep models, LDLDF adopts a layer feature reuse mechanism to reduce the training error of the model, which effectively utilizes the prediction ability of each layer in the deep model. Extensive experimental results show that LDLDF is superior to other methods.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006937
    Abstract:
    Object detection is widely used in various fields such as autonomous driving, industry, and medical care. Using the object detection algorithm to solve key tasks in different fields has gradually become the main method. However, the robustness of the object detection model based on deep learning is seriously insufficient under the attack of adversarial samples. It is easy to make the model prediction wrong by adding the adversarial samples constructed by small perturbations, which greatly limits the application of the object detection model in key security fields. In practical applications, the models are black-box models. Related research on black-box attacks against object detection models is relatively lacking, and there are many problems such as incomplete robustness evaluation, low attack success rate of black-box, and high resource consumption. To address the aforementioned issues, this study proposes a black-box object detection attack algorithm based on a generative adversarial network. The algorithm uses the generative network fused with an attention mechanism to output the adversarial perturbations and employs the alternative model loss and the category attention loss to optimize the generated network parameters, which can support two scenarios of target attack and vanish attack. A large number of experiments are conducted on the Pascal VOC and the MSCOCO datasets. The results demonstrate that the proposed method has a higher black-box transferable attack success rate and can perform transferable attacks between different datasets.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006941
    Abstract:
    The transport layer is a key component in the network protocol stack, which is responsible for providing end-to-end services for applications between different hosts. Existing transport layer protocols such as TCP provide users with some basic security protection mechanisms, e.g., error controls and acknowledgments, which ensure the consistency of datagrams sent and received by applications between different hosts to a certain extent. However, these security protection mechanisms of the transport layer have serious flaws. For example, the sequence number of TCP datagrams is easy to guess and infer, and the calculation of the datagram’s checksum depends on the vulnerable ones’ complement sum algorithm. As a result, the existing transport layer security mechanisms cannot guarantee the integrity and security of the datagram, which allows a remote attacker to craft a fake datagram and inject it into the target network stream, thus poisoning the target network stream. The attack against the transport layer occurs at the basic layers of the network protocol stack, which can bypass the security protection mechanisms enforced at the upper application layer and thus cause serious damage to the network infrastructure. After investigating various attacks over network protocols and the related security vulnerabilities in recent years, this study proposes a method for enhancing the security of the transport layer based on lightweight chain verification, namely LightCTL. Based on the hash verification, LightCTL enables both sides of a TCP connection to create a mutually verifiable consensus on transport layer datagrams, so as to prevent attackers or middlemen from stealing and forging sensitive information. 
As a result, LightCTL can successfully foil various attacks against the network protocol stack, including TCP connection reset attacks based on sequence number inferring, TCP hijacking attacks, SYN flooding attacks, man-in-the-middle attacks, and datagram replay attacks. Besides, LightCTL does not need to modify the protocol stack of intermediate network devices such as routers. It only needs to modify the checksum and the related parts of the end protocol stack. Therefore, LightCTL can be easily deployed and significantly improves the security of network systems.
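    The weakness of the TCP checksum mentioned above can be made concrete. The following sketch implements the 16-bit ones' complement Internet checksum (as described in RFC 1071) and shows that swapping two 16-bit words changes the payload but not the checksum. The payload string is an assumed example, and the code does not represent LightCTL itself.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"PAY 0100 TO BOB!"               # assumed example payload
words = [original[i:i + 2] for i in range(0, len(original), 2)]
words[0], words[1] = words[1], words[0]      # reorder two 16-bit words
tampered = b"".join(words)

same_checksum = internet_checksum(original) == internet_checksum(tampered)
```

    Because addition is commutative, any reordering of 16-bit words passes the check, which is one reason a plain checksum cannot stand in for a cryptographic integrity mechanism.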
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006951
    Abstract:
    Fact verification is intended to check whether a textual statement is supported by a given piece of evidence. Due to the structural dependence and implicit content of tables, the task of fact verification with tables as the evidence still faces many challenges. Existing literature has either used logical expressions to parse statements based on tabular evidence or designed table-aware neural networks to encode statement-table pairs and thereby accomplish table-based fact verification tasks. However, these approaches fail to fully utilize the implicit tabular information behind the statements, which degrades the inference performance of the model. Moreover, Chinese statements based on tabular evidence have more complex syntax and semantics, which also adds to the difficulties in model inference. For this reason, the study proposes a method of fact verification with Chinese tabular data based on the capsule heterogeneous graph attention network (CapsHAN). This method can fully understand the structure and semantics of statements. On this basis, the tabular information implied by the statements is mined and utilized to effectively improve the accuracy of table-based fact verification tasks. Specifically, a heterogeneous graph is constructed by performing syntactic dependency parsing and named entity recognition of statements. Subsequently, the graph is learned and understood by the heterogeneous graph attention network and the capsule graph neural network, and the obtained textual representation of the statements is spliced together with the textual representation of the encoded tables. Finally, the result is predicted. Further, this study also attempts to address the problem that the datasets of fact verification based on Chinese tables are scarce and thus unable to support the performance evaluation of table-based fact verification methods. 
For this purpose, the study transforms the mainstream English table-based fact verification datasets TABFACT and INFOTABS into Chinese and constructs a dataset that is based on the uniform content label (UCL) national standard and specifically tailored to the characteristics of Chinese tabular data. This dataset, namely, UCLDS, takes Wikipedia infoboxes as evidence of manually annotated natural language statements and labels them into three classes: entailed, contradictory, and neutral. UCLDS outperforms the traditional datasets TABFACT and INFOTABS in supporting both single-table and multi-table inference. The experimental results on the above three Chinese benchmark datasets show that the proposed model outperforms the baseline model invariably, demonstrating its superiority for Chinese table-based fact verification tasks.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006924
    Abstract:
    Software defect localization refers to the activity of finding program elements that are related to software failure. The existing defect localization techniques, however, can only produce localization results at the function or statement level. These coarse-grained localization results can affect the efficiency and effectiveness of manual debugging and automatic software defect repair. This study focuses on the fine-grained identification of specific code tokens that lead to software defects. The study establishes abstract syntax tree paths for code tokens and proposes a fine-grained defect localization model based on a pointer neural network to predict specific code tokens of defects and specific operation behaviors of repairing the tokens. A large number of defect patch data sets in open-source projects contain a large amount of trainable data, and the paths constructed based on abstract syntax trees can effectively capture the program’s structural information. Experimental results show that the model trained in this study can accurately predict defect code tokens and is significantly better than the baseline methods based on statistics and machine learning. In addition, in order to verify that fine-grained defect localization results can contribute to automatic defect repair, two kinds of program repair processes are designed based on the fine-grained defect localization results. The processes are implemented by using code completion tools to predict the correct token or by following heuristic rules to find appropriate code repair elements. The results show that both methods can effectively solve the overfitting problem in automatic software defect repair.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006914
    Abstract:
    The training of high-precision federated learning models consumes a large number of users’ local resources. The users who participate in the training can gain illegal profits by selling the jointly trained model without others’ permission. In order to protect the property rights of federated learning models, this study proposes a federated learning watermark based on backdoor (FLWB) by using the feature that deep learning backdoor technology maintains the accuracy of main tasks and only causes misclassification in a small number of trigger set samples. FLWB allows users who participate in the training to embed their own private watermarks in the local model and then map the private backdoor watermarks to the global model through the model aggregation in the cloud as the global watermark for federated learning. Then a stepwise training method is designed to enhance the expression effect of private backdoor watermarks in the global model so that FLWB can accommodate the private watermarks of the users without affecting the accuracy of the global model. Theoretical analysis proves the security of FLWB, and experiments verify that the global model can effectively accommodate the private watermarks of the users who participate in the training by only causing an accuracy loss of 1% of the main tasks through the stepwise training method. Finally, FLWB is tested by model compression and fine-tuning attacks. The results show that more than 80% of the watermarks can be retained when the model is compressed to 30% by FLWB, and more than 90% of the watermarks can be retained under four different fine-tuning attacks, which indicates the excellent robustness of FLWB.
    Available online:  August 16, 2023 , DOI: 10.13328/j.cnki.jos.006904
    Abstract:
    Code review is one of the best practices widely used in modern software development, which is crucial for ensuring software quality and strengthening engineering capability. Code review comments (CRCs) are one of the main and most important outputs of code reviews. CRCs are not only the reviewers’ perceptions of code quality but also the references for authors to fix code defects and improve quality. Nowadays, although a number of software organizations have developed guidelines for performing code reviews, there are still few effective methods for evaluating the quality of CRCs. To provide an explainable and automated quality evaluation of CRCs, this study conducts a series of empirical studies such as literature reviews and case analyses. Based on the results of the empirical studies, the study proposes a multi-label learning-based approach for evaluating the quality of CRCs. Experiments are carried out by using a large software enterprise-specific dataset that includes a total of 17 000 CRCs from 34 commercial projects. The results indicate that the proposed approach can effectively evaluate the quality attributes and grades of CRCs. The study also provides some modeling experiences such as CRC labeling and verification, so as to help software organizations struggling with code reviews better implement the proposed approach.
    Available online:  August 16, 2023 , DOI: 10.13328/j.cnki.jos.006939
    Abstract:
    Internet transport-layer protocols rely on the feedback information provided by the acknowledgment (ACK) mechanism to achieve functions such as congestion control and reliable transmission. According to the evolution of Internet transmission protocols, the ACK mechanisms of transmission control are reviewed. The unsolved problems among the mechanisms are discussed. Based on the elements of “type-trigger-information”, the ACK mechanism based on demand and its design principle are proposed, and the coupling relationship between the ACK mechanism and other transmission protocol submodules (e.g., congestion control, packet loss recovery, etc.) is emphatically analyzed. Subsequently, according to the design principle, the TACK mechanism, a feasible ACK mechanism based on demand, is elaborated, and relative concepts are systematically clarified. Finally, several meaningful research directions are provided according to the challenges encountered by the ACK mechanism based on demand.
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006912
    Abstract:
    Adaptor signature, also known as scriptless script, is an important cryptographic technique that can be used to solve the problems of poor scalability and low transaction throughput in blockchain applications such as cryptocurrency. An adaptor signature can be seen as an extension of a digital signature on hard relations. It ties together the authorization with witness extraction and has many advantages in blockchain applications, such as (1) low on-chain cost; (2) improved fungibility of transactions; (3) advanced functionality beyond the limitation of the blockchain’s scripting language. The SM2 signature is the Chinese national standard signature algorithm and has been widely used in various important information systems. This work designs an efficient SM2-based adaptor signature with batch proofs and gives security proofs under the random oracle model. By exploiting the structure of the SM2 signature, the scheme avoids generating the zero-knowledge proofs used in the pre-signing phase and is more efficient than existing ECDSA/SM2-based adaptor signatures. Specifically, the efficiency of pre-signature generation is increased by 4 times, and the efficiency of pre-signature verification is increased by 3 times. Then, based on the distributed SM2 signature, this work develops a distributed SM2-based adaptor signature, which can avoid the single point of failure and improve the security of the signing key. Finally, for real-world applications, this work gives a secure and efficient batch atomic swap protocol for one-to-many scenarios based on the SM2-based adaptor signature.
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006915
    Abstract:
    Driven by mature data mining technologies, the recommendation system has been able to efficiently utilize explicit and implicit information such as score data and behavior traces and then combine the information with complex and advanced deep learning technologies to achieve sound results. Meanwhile, application requirements have made the in-depth mining and utilization of basic data and the reduction of technical load research hotspots. On this basis, a lightweight recommendation model, namely LG_APIF, is proposed, which uses the graph convolutional network (GCN) method to deeply integrate information. According to behavior memory, the model employs the Ebbinghaus forgetting curve to simulate the users’ interest change process and adopts linear regression and other relatively lightweight traditional methods to mine adaptive periods and other in-depth information of items. In addition, it analyzes users’ current interest distribution and calculates the interest value of the item to obtain users’ potential interest type. It further constructs the graph structure of the user-type-item triplet and uses GCN technology after load reduction to generate the final item recommendation list. The experiments have verified the effectiveness of the proposed method. Through the comparison with eight classical models on the datasets of Last.fm, Douban, Yelp, and MovieLens, it is found that the Precision, Recall, and NDCG of the proposed method are improved, with an average improvement of 2.11% on Precision, 1.01% on Recall, and 1.48% on NDCG, respectively.
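    To illustrate how a forgetting curve can model fading interest, the sketch below uses the common exponential form R = exp(-t / S). The abstract does not state which parameterization LG_APIF uses, so the formula, the `stability` parameter, the function names, and the sample events are all assumptions made for illustration.

```python
import math


def retention(elapsed_days: float, stability: float = 5.0) -> float:
    """Exponential forgetting curve R = exp(-t / S); S is an assumed memory stability."""
    return math.exp(-elapsed_days / stability)


def interest_scores(events, now, stability=5.0):
    """Sum time-decayed weights per item type.

    events: iterable of (item_type, day_of_interaction) pairs (hypothetical format).
    """
    scores = {}
    for item_type, day in events:
        weight = retention(now - day, stability)
        scores[item_type] = scores.get(item_type, 0.0) + weight
    return scores


# Hypothetical behavior trace: one old "rock" interaction, two recent "jazz" ones.
trace = [("rock", 0), ("jazz", 9), ("jazz", 10)]
scores = interest_scores(trace, now=10)
```

    Recent interactions dominate the score, so the type with the highest decayed sum can be read as the user's current interest.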
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006921
    Abstract:
    With the development of modern information technology, people’s demand for high resolution and realistic visual perception of image display devices has increased, which has put forward higher requirements for computer software and hardware and brought many challenges to rendering technology in terms of performance and workload. Using machine learning technologies such as deep neural networks to improve the quality and performance of rendered images has become a popular research method in computer graphics, while upsampling low-resolution images through network inference to obtain clearer high-resolution images is an important way to improve image generation performance and ensure high-resolution details. The geometry buffers (G-buffers) generated by the rendering engine in the rendering process contain much semantic information, which helps the network learn scene information and features effectively and then improve the quality of upsampling results. In this study, a super-resolution method for low-resolution rendered content based on deep neural networks is designed. In addition to the color image of the current frame, the method uses high-resolution G-buffers to assist in the calculation and reconstruct the high-resolution content details. The method also leverages a new strategy to fuse the features of high-resolution buffers and low-resolution images, which implements a multi-scale fusion of different feature information in a specific fusion module. Experiments demonstrate the effectiveness of the proposed fusion strategy and module, and the proposed method shows obvious advantages, especially in maintaining high-resolution details, when compared with other image super-resolution methods.
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006922
    Abstract:
    The SMT solver is an important piece of system software. Therefore, bugs in the SMT solver may lead to the function failure of software relying on it and even bring security incidents. However, fixing bugs in the SMT solver is time-consuming because developers need to spend a lot of effort in understanding and finding the root cause of the bugs. Although many studies on software bug localization have been proposed, there is no systematic work to automatically locate bugs in the SMT solver. Therefore, this study proposes a bug localization method for the SMT solver based on multi-source spectrums, namely SMTLOC. First, for a given bug in the SMT solver, SMTLOC proposes an enumeration-based algorithm to mutate the formula that triggers the bug, generating a set of witness formulas that do not trigger the bug but have execution traces similar to that of the bug-triggering formula. Then, according to the execution trace of the witness formulas and the source code information of the SMT solver, SMTLOC develops a technique based on the coverage spectrum and historical spectrum to calculate the suspiciousness of files, thus locating the files that contain the bug. In order to evaluate the effectiveness of SMTLOC, 60 bugs in the SMT solver are collected. Experimental results show that SMTLOC is superior to the traditional spectrum bug localization method and can locate 46.67% of the bugs in TOP-5 files, and the localization effect is improved by 133.33%.
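    A coverage spectrum can be turned into file-level suspiciousness scores with a standard formula. The sketch below uses the well-known Ochiai metric as a stand-in; the abstract does not say which formula SMTLOC applies, and the file names, run names, and function names here are hypothetical.

```python
import math


def ochiai(failed_cover: int, passed_cover: int, total_failed: int) -> float:
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0


def rank_files(coverage, failing_runs):
    """Rank files by suspiciousness, most suspicious first.

    coverage: dict mapping run name -> set of files executed by that run.
    failing_runs: set of run names that trigger the bug.
    """
    files = set().union(*coverage.values())
    total_failed = len(failing_runs)
    scores = {}
    for f in files:
        ef = sum(1 for run, cov in coverage.items() if f in cov and run in failing_runs)
        ep = sum(1 for run, cov in coverage.items() if f in cov and run not in failing_runs)
        scores[f] = ochiai(ef, ep, total_failed)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical spectrum: one bug-triggering formula and two witness formulas.
coverage = {
    "bug_formula": {"rewriter.cpp", "solver.cpp"},
    "witness_1": {"solver.cpp"},
    "witness_2": {"solver.cpp", "parser.cpp"},
}
ranking = rank_files(coverage, failing_runs={"bug_formula"})
```

    Files executed only by the bug-triggering formula rank highest, which is why witness formulas with similar but non-failing traces help narrow down the suspicious files.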
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006905
    Abstract:
    Machine learning methods can be well combined with software testing to enhance the test effect, but few scholars have applied them to test data generation. In order to further improve the efficiency of test data generation, a chained model combining support vector machine (SVM) and extreme gradient boosting (XGBoost) is proposed, and multi-path test data generation is realized by a genetic algorithm based on the chained model. Firstly, this study uses certain samples to train several sub-models (i.e., SVM and XGBoost) for predicting the state of path nodes, selects the optimal sub-models based on their prediction accuracy, and links the optimal sub-models in sequence according to the order of the path nodes, so as to form a chained model, namely chained SVM and XGBoost (C-SVMXGBoost). When using the genetic algorithm to generate test cases, the study makes use of the trained chained model instead of the instrumentation method to obtain the path covered by the test data (i.e., the predicted path), finds the set of predicted paths similar to the target path, performs instrumentation verification on the paths in this set to obtain the accurate paths, and calculates fitness values. In the crossover and mutation process, excellent test cases with a large path level depth in the sample set are introduced for reuse to generate test data covering the target path. Finally, individuals with higher fitness during the evolution are saved, and C-SVMXGBoost is updated, so as to further improve the test efficiency. Experiments show that C-SVMXGBoost is more suitable for solving the path prediction problem and improving the test efficiency than other chained models. Moreover, compared with the existing classical methods, the proposed method can increase the coverage rate by up to 15%. The mean number of evolutionary generations is also reduced, and the reduction can reach 65% on large programs.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006839
    Abstract:
    As an automatic search tool, mixed integer linear programming (MILP) is widely used to search for differential, linear, integral, and other cryptographic properties of block ciphers. In this study, a new technique of constructing MILP models based on a dynamic selection strategy is proposed, which uses different constraint inequalities to describe the propagation of cryptographic properties under different conditions. Specifically, according to the different Hamming weights of the input division property, this study adopts different methods to construct MILP models of the division property propagation with linear layers. Finally, this technique is applied to search for integral distinguishers of uBlock and Saturnin algorithms. The experimental results show that the proposed technique can obtain an 8-round integral distinguisher with 32 more balance bits than the previous optimal integral distinguisher for the uBlock128 algorithm. In addition, this study gets 9- and 10-round integral distinguishers for uBlock128 and uBlock256 algorithms which are one round longer than the previous optimal integral distinguishers. For the Saturnin256 algorithm, the study finds a 9-round integral distinguisher which is one round longer than the previous optimal integral distinguisher.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006840
    Abstract:
    Hierarchical topic model is an important tool to organize topic hierarchy. Most of the existing hierarchical topic models provide tree-structured prior distributions for document topics by introducing the nCRP construction method into the topic models, but they cannot acquire a topic hierarchy with clear domain meanings, referred to as domain topic hierarchy. Meanwhile, there are not only hierarchical relationships among domain topics but also sub-topic aspect sharing relationships under different parent topics. There is no appropriate model that yields such domain topic hierarchy in the current research on topic relationships. In order to automatically and effectively mine the hierarchical and correlated relationships of domain topics from domain texts, improvements are put forward as follows. Firstly, this study improves the nCRP construction method through the topic sharing mechanism and proposes the nCRP+ hierarchical construction method to provide a tree-structured prior distribution with hierarchical topic aspect sharing for topics generated from topic models. Then the reallocated hierarchical Dirichlet processes (rHDP) are developed based on nCRP+ and HDP models, and an rHDP model is proposed. By employing the domain taxonomy, word semantics, and domain representation of topic words, the study defines domain knowledge, including the domain membership degree based on the voting mechanism, the semantic relevance between words and domain topics, and the contribution degree of hierarchical topic words. Finally, domain knowledge is used to improve the allocation processes of domain topics and topic words in the rHDP model, and rHDP with domain knowledge (rHDP_DK) model is proposed to improve the sampling process. The experimental results show that hierarchical topic models based on nCRP+ are superior to those based on nCRP (hLDA and nHDP) and neural topic model (TSNTM) in terms of evaluation metrics. 
The topic hierarchy, built by the rHDP_DK model, is characterized by clear domain topic hierarchy and explicit domain differences among related sub-topics. Furthermore, the model will provide a general automatic mining framework for domain topic hierarchy.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006841
    Abstract:
    In multi-label learning, each sample is associated with multiple labels. The key task is how to use the correlation between labels when building the model. Multi-label deep forest (MLDF) algorithm attempts to mine the correlation between labels by using layer-by-layer representation learning under the framework of deep ensemble learning and use the obtained label probability representation to improve prediction accuracy. However, on the one hand, the label probability representation is highly correlated with the label information, which will lead to its low diversity. As the depth of the deep forest increases, the performance will decline. On the other hand, the calculation of label probability requires the storage of forest structures with all layers and the application of these structures one by one in the test stage, which will cause unbearable computational and storage overhead. To solve these problems, this study proposes interaction representation-based MLDF (iMLDF). iMLDF mines the structural information in the feature space from the decision path of the forest model, extracts the feature interaction in the decision tree path by using the random interaction trees, and obtains two interaction representations of feature confidence score and label probability distribution, respectively. On the one hand, iMLDF makes full use of the feature structural information in the forest model to enrich the relevant information between labels. On the other hand, it calculates all the representations through interaction expressions so that the algorithm does not need to store all the forest structures, which greatly improves computational efficiency. The experimental results show that iMLDF algorithm achieves better prediction performance, and the computational efficiency is improved by an order of magnitude compared with MLDF for datasets with massive samples.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006842
    Abstract:
    Graph partitioning is a basic task for distributed graph computing. It is used to divide a large-scale graph into different parts and allocate them to different machines in a cluster. The quality of graph partitioning has a great impact on the performance of distributed graph computing, and graph partitioning aims to minimize edge cuts while keeping the load balanced. Nowadays, graph data usually grow dynamically, which calls for a partitioning method that can process dynamic incremental graphs while ensuring the quality of graph partitioning. Although some dynamic graph partitioning algorithms have been presented recently, they cannot process real-time dynamic changes and obtain high-quality graph partitioning results simultaneously. In this study, a dynamic incremental graph partitioning algorithm based on vertex group redistribution (ED-IDGP) is proposed to solve the problem of large-scale dynamic incremental graph partitioning. In ED-IDGP, a dynamic processor is designed to process four different unit update types in real time, and the graph partitioning quality is further improved by executing a local optimizer near the dynamic change in the partition after each unit update. In the local optimizer of ED-IDGP, a vertex group search strategy based on the improved label propagation algorithm is used to search for the vertex group, and a vertex group movement gain formula is proposed to measure the most beneficial vertex group and move it to the target partition for optimization. This study evaluates the performance and efficiency of the ED-IDGP algorithm from different perspectives and metrics on real datasets.
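The two quality criteria named in the abstract above, edge cuts and load balance, can be illustrated with a minimal sketch. It is not the ED-IDGP algorithm; the function names `edge_cut` and `load_imbalance`, the toy graph, and the choice of "largest partition size divided by ideal size" as the balance measure are assumptions made for illustration only.

```python
from collections import Counter


def edge_cut(edges, part):
    """Count edges whose two endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def load_imbalance(part, k):
    """Ratio of the largest partition size to the ideal size n / k (1.0 is perfect)."""
    sizes = Counter(part.values())
    ideal = len(part) / k
    return max(sizes.get(p, 0) for p in range(k)) / ideal


# Hypothetical toy graph: a 4-cycle with one chord, split into two partitions.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
part = {0: 0, 1: 0, 2: 1, 3: 1}

print("edge cut:", edge_cut(edges, part))
print("imbalance:", load_imbalance(part, 2))
```

A dynamic partitioner would need to keep both numbers low as vertices and edges are added or removed; the sketch only evaluates a fixed assignment.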
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006843
    Abstract:
    As a new granular computing model, partition order product space can describe and solve problems from multiple views and levels. Its problem solving space is a lattice structure composed of multiple problem solving levels, and each problem solving level is composed of multiple one-level views. How to choose the problem solving level in the partition order product space is an NP-hard problem. Therefore, this study proposes a two-stage adaptive genetic algorithm (TSAGA) to find the problem solving level. First, real encoding is used to encode the problem solving level, and then the fitness function is defined according to the classification accuracy and granularity of the problem solving level. The first stage of the algorithm is based on a classical genetic algorithm, and some excellent problem solving levels are pre-selected as part of the initial population of the second stage, so as to optimize the problem solving space. In the second stage of the algorithm, an adaptive selection operator, adaptive crossover operator, and adaptive large-mutation operator are proposed, which can dynamically change with the number of iterations of the current population evolution, so as to further select the problem solving level in the optimized problem solving space. Experimental results demonstrate the effectiveness of the proposed method.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006845
    Abstract:
    Enumerating minimal unsatisfiable subsets (MUS) is an important subproblem in the Boolean satisfiability problem. For an unsatisfiable problem, the MUS enumeration can reflect the key factors resulting in its unsatisfiability. However, enumerating MUS is extremely time-consuming, and different pruning schemes will directly affect the size of the search space and the total number of iterations, thus affecting the algorithm efficiency. This study proposes a novel enhanced pruning scheme, accelerating by critical MSS (ABC), to accelerate the MUS enumeration. According to the relationship among maximal satisfiable subsets (MSS), minimal correction sets (MCS), and MUS, the concepts of cMSS and subMUS are put forward. Additionally, four properties are summarized, including that each MUS must be a superset of the subMUS, so that the fact that MUS and MCS are hitting sets of each other can be exploited to avoid the time cost of solving hitting sets of MCS. When the subMUS is unsatisfiable, it will be the only MUS, and the algorithm will terminate in advance; otherwise, the node representing subMUS will be pruned to effectively avoid searching the non-solution space. Meanwhile, the effectiveness of the proposed ABC scheme is proven theoretically, and the scheme is applied to the state-of-the-art algorithms MARCO and MARCO-MAM, respectively. Experimental results on SAT11 MUS benchmarks show the proposed scheme can effectively prune the search space to improve the enumeration efficiency of MUS.
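The hitting-set relationship between MUS and MCS mentioned in the abstract above can be checked by brute force on a tiny formula. The sketch below is only an illustration of that duality, not the ABC scheme, MARCO, or MARCO-MAM; the helper names and the four-clause example formula are assumptions, and the exhaustive enumeration is practical only for a handful of clauses.

```python
from itertools import combinations, product


def satisfiable(clauses):
    """Brute-force SAT check; clauses are lists of non-zero ints (DIMACS-style literals)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False


def enumerate_mus_mcs(clauses):
    """Return all MUSes and MCSes as frozensets of clause indices."""
    idx = frozenset(range(len(clauses)))
    sat = {}
    for r in range(len(clauses) + 1):
        for subset in combinations(sorted(idx), r):
            sat[frozenset(subset)] = satisfiable([clauses[i] for i in subset])
    # MUS: unsatisfiable, but dropping any single clause makes it satisfiable.
    muses = [s for s, ok in sat.items() if not ok and all(sat[s - {i}] for i in s)]
    # MSS: satisfiable, but adding any single clause makes it unsatisfiable.
    msses = [s for s, ok in sat.items() if ok and all(not sat[s | {i}] for i in idx - s)]
    # MCS: complement of an MSS.
    mcses = [idx - s for s in msses]
    return muses, mcses


def is_minimal_hitting_set(candidate, family):
    """True if candidate intersects every set in family and no proper subset does."""
    def hits(c):
        return all(c & s for s in family)
    return hits(candidate) and all(not hits(candidate - {i}) for i in candidate)


# Hypothetical formula: C0 = (x1), C1 = (not x1), C2 = (not x1 or x2), C3 = (not x2)
clauses = [[1], [-1], [-1, 2], [-2]]
muses, mcses = enumerate_mus_mcs(clauses)
print("MUSes:", sorted(sorted(m) for m in muses))
print("MCSes:", sorted(sorted(m) for m in mcses))
print("duality holds:", all(is_minimal_hitting_set(m, mcses) for m in muses))
```

Practical enumerators avoid this exponential table by pruning the subset lattice, which is the part of the search that a scheme such as ABC aims to reduce.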
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006892
    Abstract:
    The committee consensus and hybrid consensus elect the committee to replace the whole nodes for block validation, which can effectively speed up consensus and improve throughput. However, malicious attacks and bribes can easily lead to committee corruption, affect consensus results, and even cause system paralysis. Although the existing work proposes the reputation mechanism to reduce the possibility of committee corruption, it has high overhead and poor reliability and cannot reduce the impact of corruption on the system. Therefore, this study proposes a dynamic blockchain consensus with pre-validation (DBCP). DBCP realizes reliable reputation evaluation of the committee through pre-validation with little overhead, which can eliminate malicious nodes from the committee in time. If serious corruption has undermined the consensus result, DBCP will transfer the authority of block validation to the whole nodes through dynamic consensus and eliminate the committee nodes that give wrong suggestions to avoid system paralysis. When the committee iterates to the high-credibility state, DBCP will hand over the authority of block validation to the committee, and the whole nodes will accept the consensus result from the committee without verifying the block to speed up the consensus. The experimental results show that the throughput of DBCP is two orders of magnitude higher than that of Bitcoin and similar to that of Byzcoin. In addition, DBCP can quickly deal with committee corruption within a block cycle, demonstrating better security than Byzcoin.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006837
    Abstract:
    Depth ambiguity is an important challenge for multi-person three-dimensional (3D) pose estimation of single-frame images, and extracting contexts from an image has great potential for alleviating depth ambiguity. Current top-down approaches usually model key point relationships based on human detection, which not only easily results in key point shifting or mismatching but also affects the reliability of absolute depth estimation using human scale factor because of a coarse-grained human bounding box with large background noise. Bottom-up approaches directly detect human key points from an image and then restore the 3D human pose one by one. However, the approaches are at a disadvantage in relative depth estimation although the scene context can be obtained explicitly. This study proposes a new two-branch network, in which human context based on key point region proposal and scene context based on 3D space are extracted by top-down and bottom-up branches, respectively. The human context extraction method with noise resistance is proposed to describe the human by modeling key point region proposal. The dynamic sparse key point relationship for pose association is modeled to eliminate weak connections and reduce noise propagation. A scene context extraction method from a bird’s-eye-view is proposed. The human position layout in 3D space is obtained by modeling the image’s depth features and mapping them to a bird’s-eye-view plane. A network fusing human and scene contexts is designed to predict absolute human depth. The experiments are carried out on public datasets, namely MuPoTS-3D and Human3.6M, and results show that compared with those by the state-of-the-art models, the relative and absolute position accuracies of 3D key points by the proposed HSC-Pose are improved by at least 2.2% and 0.5%, respectively, and the position error of mean roots of the key points is reduced by at least 4.2 mm.
    Available online:  July 26, 2023 , DOI: 10.13328/j.cnki.jos.006918
    Abstract:
    Third-party library (TPL) detection is an upstream task in the domain of Android application security analysis, and its detection accuracy has a significant impact on its downstream tasks including malware detection, repackaged application detection, and privacy leakage detection. To improve detection accuracy and efficiency, this study proposes a package structure and signature-based TPL detection method, named LibPass, by leveraging the idea of pairwise comparison. LibPass combines primary module identification, TPL candidate identification, and fine-grained detection in a streamlined way. The primary module identification aims at improving detection efficiency by distinguishing the binary code of the main program from that of the introduced TPL. On this basis, a two-stage detection method consisting of TPL candidate identification and fine-grained detection is proposed. The TPL candidate identification leverages the stability of package structure features to deal with obfuscation of applications to improve detection accuracy and identifies candidate TPLs by rapidly comparing package structure signatures to reduce the number of pairwise comparisons, so as to improve the detection efficiency. The fine-grained detection accurately identifies the TPL of a specific version by a finer-grained but more costly pairwise comparison among candidate TPLs. In order to validate the performance and the efficiency of the detection method, three benchmark datasets are built to evaluate different detection capabilities, and experiments are conducted on these datasets. The experimental results are deeply analyzed in terms of detection performance, detection efficiency, and obfuscation resistance, and it is found that LibPass has high detection accuracy and efficiency and can deal with various common obfuscation operations.
    Available online:  July 26, 2023 , DOI: 10.13328/j.cnki.jos.006919
    Abstract:
    Memory error vulnerabilities (e.g., buffer overflow) are often caused by improper use of memory copy functions. The identification of memory copy functions in binary programs is beneficial for finding memory error vulnerabilities. However, current methods for identifying memory copy functions in binary programs mainly rely on static analysis to extract functions’ features, control flow, data flow, and other information, and they suffer from high false positive and false negative rates. This study proposes a novel technique, namely CPSeeker, based on hybrid static and dynamic analysis to improve the effectiveness of identifying memory copy functions. CPSeeker combines the advantages of static analysis and dynamic analysis, collects the global static information and local execution information of functions in stages, and fuses the extracted information to identify memory copy functions in binary programs. The experimental results show that CPSeeker outperforms the state-of-the-art BootStomp, SaTC, CPYFinder, and Gemini in identifying memory copy functions, despite its increased runtime consumption, and its F1 value reaches 0.96. Furthermore, CPSeeker is not affected by the compilation environment (compiler version, compiler type, and compiler optimization level). In addition, CPSeeker performs better in actual firmware tests.
    Available online:  July 26, 2023 , DOI: 10.13328/j.cnki.jos.006920
    Abstract:
    The broad-learning-based dynamic fuzzy inference system (BL-DFIS) can automatically assemble simplified fuzzy rules and achieve high accuracy in classification tasks. However, when BL-DFIS works on large and complex datasets, it may generate too many fuzzy rules to achieve satisfactory identification accuracy, which adversely affects its interpretability. In order to circumvent such a bottleneck, a fuzzy neural network called feature-augmented random vector functional-link neural network (FA-RVFLNN) is proposed in this study to achieve excellent trade-off between classification performance and interpretability. In the proposed network, the RVFLNN with original data as input is taken as its primary structure, and BL-DFIS is taken as a performance supplement, which implies that FA-RVFLNN contains direct links to boost the performance of the whole system. The inference mechanism of the primary structure can be explained by a fuzzy logic operator (I-OR), owing to the use of Sigmoid activation functions in the enhancement nodes of this structure. Moreover, the original input data with clear meaning also help to explain the inference rules of the primary structure. With the support of direct links, FA-RVFLNN can learn more useful information through enhancement nodes, feature nodes, and fuzzy nodes. The experimental results indicate that FA-RVFLNN indeed eases the problem of rule explosion caused by excessive enhancement nodes in the primary structure and improves the interpretability of BL-DFIS therein (The average number of fuzzy rules is reduced by about 50%), and is still competitive in terms of generalization performance and network size.
    Available online:  July 26, 2023 , DOI: 10.13328/j.cnki.jos.006940
    Abstract:
    How to improve the accuracy of matching between natural language query input and highly structured programming language source code is a fundamental concern in code search. Accurate extraction of code features is one of the key challenges to improving matching accuracy. The semantics expressed by statements in codes is not only relevant to themselves but also to their contexts. The structural model of the code provides rich contextual information for understanding code functions. This study proposes a code search method based on function multigraph embedding. By using an early fusion strategy, the study fuses the data dependencies of code statements into a control flow graph and constructs a function multigraph to represent the code. The multigraph explicitly expresses the dependency relationships of indirect predecessor and successor nodes that are lacking in the control flow graph through data dependencies and enhances the contextual information of statement nodes. At the same time, in view of the edge heterogeneity of the multigraph, this study uses the relational graph convolutional network to extract the features of the code from the function multigraph. Experiments on a public dataset show that the proposed method can improve the MRR by more than 5% compared with the existing methods based on code text and structural models. The ablation experiments also show that the control flow graph contributes more to the search accuracy than the data dependence graph.
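The MRR (mean reciprocal rank) metric reported in the abstract above averages, over all queries, the reciprocal of the rank at which the correct code snippet first appears. The sketch below shows the standard computation; the function name, the query identifiers, and the snippet labels are hypothetical and are not taken from the paper's dataset.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Average of 1 / rank of the relevant item per query (0 if it is not retrieved)."""
    total = 0.0
    for query, results in ranked_results.items():
        for rank, item in enumerate(results, start=1):
            if item == relevant[query]:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


# Hypothetical search results: the correct snippet is at rank 1, 2, and 4.
ranked_results = {
    "q1": ["snippet_a", "snippet_b", "snippet_c"],
    "q2": ["snippet_x", "snippet_y", "snippet_z"],
    "q3": ["snippet_m", "snippet_n", "snippet_o", "snippet_p"],
}
relevant = {"q1": "snippet_a", "q2": "snippet_y", "q3": "snippet_p"}

mrr = mean_reciprocal_rank(ranked_results, relevant)
print(f"MRR = {mrr:.4f}")
```

Here the reciprocal ranks are 1, 1/2, and 1/4, so the MRR is their mean, 7/12.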
    Available online:  July 12, 2023 , DOI: 10.13328/j.cnki.jos.006909
    Abstract:
    With the popularity of touch devices, pen + touch inputs have become mainstream input modes for mobile office work. However, existing applications mainly take one of them as input, which limits users’ interaction space. In addition, existing pen + touch research mainly focuses on serial pen + touch cooperation and parallel processing of specific interactive tasks and does not systematically consider the parallel cooperation mechanism and intention correlation between different inputs. This study first proposes an interaction model based on pen + touch inputs and then defines pen + touch interaction primitives according to users’ behavioral habits in pen + touch cooperation, so as to extend the pen + touch interaction space. Furthermore, by using a partially observable Markov decision process (POMDP), the study develops a method of extracting pen + touch input intentions based on time sequence information, so as to incrementally extract the interaction intention of polysemantic interaction primitives. Finally, the study evaluates the advantages of pen + touch inputs through a user experiment.
    Available online:  July 12, 2023 , DOI: 10.13328/j.cnki.jos.006910
    Abstract:
    Code search is an important research topic in natural language processing and software engineering. Developing efficient code search algorithms can significantly improve code reuse and the working efficiency of software developers. The task of code search is to retrieve code fragments that meet the requirements from a massive code repository by taking a natural language description of the function of the code fragments as input. Although the sequence model-based code search method, namely DeepCS, has achieved promising results, it cannot capture the deep semantics of the code. GraphSearchNet, a code search method based on graph embedding, can alleviate this problem, but it does not perform fine-grained matching on codes and texts and ignores the global relationship between code graphs and text graphs. To address the above limitations, this study proposes a code search method based on a relational graph convolutional network, which encodes the constructed text graphs and code graphs, performs fine-grained matching on text query and code fragments at the node level, and applies neural tensor networks to capture their global relationship. Experimental results on two public datasets show that the proposed method achieves higher search accuracy than state-of-the-art baseline models, namely DeepCS and GraphSearchNet.
    Available online:  July 05, 2023 , DOI: 10.13328/j.cnki.jos.006901
    Abstract:
    Hybrid transactional/analytical processing (HTAP) database systems have gained wide acceptance among users because they fully support mixed workloads, i.e., transactions and analytical queries, in one system. Most HTAP database systems tend to maintain multiple data versions or additional replicas to accomplish online analytical processing (OLAP) without downgrading the write performance of online transactional processing (OLTP). This leads to a consistency problem between the data of TP and AP versions. Meanwhile, HTAP database systems face the core challenge of achieving efficient data sharing under resource isolation, and the data-sharing model integrates the trade-off between business requirements for performance and data freshness. To systematically explain the data-sharing model and optimization strategies of existing HTAP database systems, this study first utilizes the consistency models to define the data-sharing model and classifies the consistency models for HTAP data sharing into three categories, namely, linear consistency, sequential consistency, and session consistency, according to the differences between TP generated versions and AP query versions. After that, it takes a deep dive into the whole process of data-sharing models from three core issues, i.e., data-version number distribution, data version synchronization, and data version tracking, and provides the implementation methods of different consistency models. Furthermore, this study takes a dozen classic and popular HTAP database systems as examples for an in-depth interpretation of the implementation methods. Finally, it summarizes and analyzes the optimization strategies of version synchronization, tracking, and recycling modules involved in the data-sharing process and predicts the optimization directions of the data-sharing models. 
It is concluded that the self-adaptability of the data synchronization scope, self-tuning of the data synchronization cycle, and freshness-bound constraint control under sequential consistency are the possible means for better performance of HTAP database systems and higher freshness.
    Available online:  July 05, 2023 , DOI: 10.13328/j.cnki.jos.006812
    Abstract:
    Security bug reports (SBRs) can describe critical security vulnerabilities in software products. SBR prediction has attracted the increasing attention of researchers to eliminate security attack risks of software products. However, in actual software development scenarios, a new company or new project may need software security bug prediction, without enough marked SBRs for building SBR prediction models in practice. A simple solution is employing the migration model, which means that marked data of other projects can be adopted to build the prediction model. Inspired by two recent studies in this field, this study puts forward a cross-project SBR prediction method integrating knowledge graphs, i.e., knowledge graph of security bug report prediction (KG-SBRP), based on the idea of security keyword filtering. The text information field in SBR is combined with common weakness enumeration (CWE) and common vulnerabilities and exposures (CVE) Details to build a triple rule entity. Then the entity is utilized to build a knowledge graph of security bugs and identify SBRs by combining the entity and relationship recognition. Finally, the data is divided into training sets and test sets for model fitting and performance evaluation. The built model conducts empirical research on seven SBR datasets with different scales. The results show that compared with the current main methods FARSEC and Keyword matrix, the proposed method can increase the performance index F1-score by an average of 11% under cross-project SBR prediction scenarios. In addition, the F1-score value can also grow by an average of 30% in SBR prediction scenarios within a project.
    Available online:  July 05, 2023 , DOI: 10.13328/j.cnki.jos.006906
    Abstract:
    Software product line testing is challenging. The similarity-based testing method can improve testing coverage and fault detection rate by increasing the diversity of test suites. Due to its excellent scalability and satisfactory testing effects, the method has become one of the most important test methods for software product lines. How to generate diverse test cases and how to maintain the diversity of test suites are two key issues in this test method. To handle the above issues, this study proposes a software product line test algorithm based on diverse SAT solvers and novelty search (NS). Specifically, the algorithm simultaneously uses two types of diverse SAT solvers to generate diverse test cases. In particular, in order to improve the diversity of stochastic local search SAT solvers, the study proposes a general strategy that is based on a probability vector to generate candidate solutions. Furthermore, two archiving strategies inspired by the idea of the NS algorithm are designed and applied to maintain both the global and local diversity of the test suites. Ablation and comparison experiments on 50 real software product lines verify the effectiveness of both the diverse SAT solvers and the two archiving strategies, as well as the superiority of the proposed algorithm over other state-of-the-art algorithms.
    Available online:  July 05, 2023 , DOI: 10.13328/j.cnki.jos.006907
    Abstract:
    Business process execution language (BPEL) is an executable web service composition language. Compared with traditional programs, BPEL programs are significantly different in terms of programming models and execution modes. These new features make it challenging to locate and fix faults of BPEL programs detected during the testing process. In addition, fault fixing techniques developed for traditional software cannot be used for BPEL programs directly. This study proposes a fault fixing technique for BPEL programs based on template matching, namely BPELRepair, from the perspective of mutation analysis. In order to overcome the high computational overhead of the mutation analysis-based fault fixing technique, a set of optimization strategies are proposed from three perspectives, namely patch generation, test case selection, and termination condition. A supporting tool is developed to improve the automation and efficiency of fault fixing for BPEL programs. An empirical study is used to evaluate the effectiveness of the proposed fault fixing technique and optimization strategies. The experimental results show that the proposed technique can successfully fix about 53% of faults of BPEL programs, and the proposed optimization strategies can significantly reduce the overhead in terms of search matching, patch program verification, test case execution, and fault fixing.
    Available online:  July 04, 2023 , DOI: 10.13328/j.cnki.jos.006836
    Abstract:
    Autonomous driving software based on deep neural networks (DNNs) has become the most popular solution. Like traditional software, DNNs can also produce incorrect output or unexpected behaviors, and DNN-based autonomous driving software has caused serious accidents, which seriously threaten life and property safety. Therefore, how to effectively test DNN-based autonomous driving software has become an urgent problem. Since it is difficult to predict and understand the behavior of DNNs, traditional software testing methods are no longer applicable. Existing autonomous driving software testing methods are implemented by adding pixel-level perturbations to original images or modifying the whole image to generate test data. The generated test data are quite different from the real images, and the perturbation-based methods are difficult to understand. To solve the above problem, this study proposes a test data generation method, namely interpretability analysis-based test data generation (IATG). Firstly, it uses the interpretation method for DNNs to generate visual explanations of decisions made by autonomous driving software and chooses objects in the original images that have significant impacts on the decisions. Then, it generates test data by replacing the chosen objects with other objects with the same semantics. The generated test data are more similar to the real image, and the process is more understandable. As an important part of the autonomous driving software’s decision-making module, the steering angle prediction model is used to conduct experiments. Experimental results show that the introduction of the interpretation method effectively enhances the ability of IATG to mislead the steering angle prediction model. 
Furthermore, when the misleading angle is the same, the test data generated by IATG are more similar to the real image than DeepTest; IATG has a stronger misleading ability than semSensFuzz, and the interpretation analysis based important object selection method of IATG can effectively improve the misleading ability of semSensFuzz.
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006899
    Abstract:
    Knowledge space theory, which uses mathematical language for the knowledge evaluation and learning guide of learners, belongs to the research field of mathematical psychology. Skills and problems are the two basic elements of knowledge space, and an in-depth study of the relationship between them is the inherent requirement of knowledge state description and knowledge structure analysis. In the existing knowledge space theory, no explicit bi-directional mapping between skills and problems has been established, which makes it difficult to put forward a knowledge structure analysis model under intuitive conceptual meanings. Moreover, the partial order relationship between knowledge states has not been clearly obtained, which is not conducive to depicting the differences between knowledge states and planning the learning path of learners. In addition, the existing achievements mainly focus on the classical knowledge space, without considering the uncertainties of data in practical problems. To this end, this study introduces formal concept analysis and fuzzy sets into knowledge space theory and builds the fuzzy concept lattice models for knowledge structure analysis. Specifically, fuzzy concept lattice models of knowledge space and closure space are presented. Firstly, the fuzzy concept lattice of knowledge space is constructed, and it is proved that the extents of all concepts form a knowledge space by the upper bounds of any two concepts. The idea of granule description is introduced to define the skill-induced atomic granules of problems, whose combinations can help determine whether a combination of problems is a state in the knowledge space. On this basis, a method to obtain the fuzzy concepts in the knowledge space from the problem combinations is proposed. Secondly, the fuzzy concept lattice of closure space is established, and it is proved that the extents of all concepts form the closure space by the lower bounds of any two concepts. 
Similarly, the problem-induced atomic granules of skills are defined, and their combinations can help determine whether a skill combination is the skills required by a knowledge state in the closure space. In this way, a method to obtain the fuzzy concepts in the closure space from the skill combinations is presented. Finally, the effects of the number of problems, the number of skills, the filling factor, and the analysis scale on the sizes of knowledge space and closure space are analyzed by some experiments. The results show that the fuzzy concepts in the knowledge space are different from any existing concept and cannot be derived from other concepts. The fuzzy concepts in the closure space are attribute-oriented one-sided fuzzy concepts in essence. In the formal context of two-valued skills, there is one-to-one correspondence between the states in knowledge space and closure space, but this relationship does not hold in the formal context of fuzzy skills.
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006900
    Abstract:
    Logs are an important information carrier in a computer system, recording the states of events, and a log system is responsible for log generation, collection, and output. OpenHarmony is a new open-source, distributed operating system for smart devices in all scenarios of a fully-connected world. Prior to the work described in this study, many key subsystems of OpenHarmony, including the log system, had not been built. The open-source feature of OpenHarmony enables third-party developers to contribute core code. To address the lack of a log system in OpenHarmony, this study mainly does the following work: ① It analyzes the technical architecture, advantages, and disadvantages of today’s popular log systems. ② It clarifies the model specifications of the log system HiLog according to the interconnection feature of heterogeneous devices in OpenHarmony. ③ It designs and implements the first log system HiLog of OpenHarmony and contributes it to the OpenHarmony trunk. ④ It conducts comparative experiments on the key indicators of HiLog. The experimental data show that in terms of basic performance, the throughput of HiLog and of Log, the log system of Android, is 1500 KB/s and 700 KB/s, respectively, which indicates that HiLog achieves a 114% improvement over Log. In terms of log persistence, the packet loss of HiLog is less than 6‰ with a compression rate of 3.5% for persistency, much lower than that of Log. In addition, HiLog also has some novel practical functions such as data protection and flow control.
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006895
    Abstract:
    The morphological changes in retina boundaries are important indicators of retinal diseases, and the subtle changes can be captured by images obtained by optical coherence tomography (OCT). The retinal layer boundary segmentation based on OCT images can assist in the clinical judgment of related diseases. In OCT images, due to the diverse morphological changes in retina boundaries, the key boundary-related information, such as contexts and saliency boundaries, is crucial to the judgment and segmentation of layer boundaries. However, existing segmentation methods lack the consideration of the above information, which results in incomplete and discontinuous boundaries. To solve the above problems, this study proposes a coarse-to-fine method for the segmentation of retinal layer boundary in OCT images based on the end-to-end deep neural networks and graph search (GS), which avoids the phenomenon of “faults” common in non-end-to-end methods. In coarse segmentation, the attention global residual network (AGR-Net), an end-to-end deep neural network, is proposed to extract the above key information in a more sufficient and effective way. Specifically, a global feature module (GFM) is designed to capture the global context information of OCT images by scanning from four directions of the images. After that, the channel attention module (CAM) and GFM are sequentially combined and embedded in the backbone network to realize saliency modeling of context information of the retina and its boundaries. This effort effectively solves the problem of wrong segmentation caused by retina deformation and insufficient information extraction in OCT images. In fine segmentation, a GS algorithm is adopted to remove isolated areas or holes from the coarse segmentation results obtained by AGR-Net. 
In this way, the boundary keeps a fixed topology, and it is continuous and smooth, which further optimizes the overall segmentation results and provides a more complete reference for medical clinical diagnosis. Finally, the performance of the proposed method is evaluated from different perspectives on two public datasets, and the method is compared with the latest methods. The comparative experiments show that the proposed method outperforms the existing methods in terms of segmentation accuracy and stability.
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006893
    Abstract:
    Deep neural networks (DNNs) have made remarkable achievements in many fields, but related studies show that they are vulnerable to adversarial examples. The gradient-based attack is a popular adversarial attack and has attracted wide attention. This study investigates the relationship between gradient-based adversarial attacks and numerical methods for solving ordinary differential equations (ODEs). In addition, it proposes a new adversarial attack based on Runge-Kutta (RK) method, a numerical method for solving ODEs. According to the prediction idea in the RK method, perturbations are added to the original examples first to construct predicted examples, and then the gradients of the loss functions with respect to the original and predicted examples are linearly combined to determine the perturbations to be added for the generation of adversarial examples. Different from the existing adversarial attacks, the proposed adversarial attack employs the prediction idea of the RK method to obtain the future gradient information (i.e., the gradient of the loss function with respect to the predicted examples) and uses it to determine the adversarial perturbations to be added. The proposed attack features good extensibility and can be easily applied to all available gradient-based attacks. Extensive experiments demonstrate that in contrast to the state-of-the-art gradient-based attacks, the proposed RK-based attack boasts higher success rates and better transferability.
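The predict-then-combine idea described above can be sketched on a toy problem. This is a minimal illustration, not the paper's algorithm: the equal weights `w_orig` and `w_pred` (a Heun-style second-order choice), the sign-based step, and the quadratic `toy_loss` are all assumptions made here for the example.

```python
from typing import Callable, List

Vector = List[float]

# Hypothetical target used only by the toy loss below.
TARGET: Vector = [1.0, -1.0]


def sign(value: float) -> int:
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def toy_loss(x: Vector) -> float:
    """Quadratic loss standing in for a model's loss function."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


def toy_grad(x: Vector) -> Vector:
    """Gradient of toy_loss with respect to the input x."""
    return [xi - ti for xi, ti in zip(x, TARGET)]


def rk2_sign_step(
    x: Vector,
    grad_fn: Callable[[Vector], Vector],
    eps: float,
    w_orig: float = 0.5,
    w_pred: float = 0.5,
) -> Vector:
    """One attack step that mixes the current and the 'future' gradient."""
    g_orig = grad_fn(x)
    # Prediction: perturb the original example to build a predicted example.
    x_pred = [xi + eps * sign(gi) for xi, gi in zip(x, g_orig)]
    g_pred = grad_fn(x_pred)
    # Correction: linearly combine both gradients, then take a sign step.
    combined = [w_orig * a + w_pred * b for a, b in zip(g_orig, g_pred)]
    return [xi + eps * sign(ci) for xi, ci in zip(x, combined)]


if __name__ == "__main__":
    x0 = [0.0, 1.0]
    x_adv = rk2_sign_step(x0, toy_grad, eps=0.1)
    print(toy_loss(x0), toy_loss(x_adv))
```

With a real model, `grad_fn` would be the gradient of the classification loss with respect to the input, and the step would normally be followed by clipping to the allowed perturbation budget.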
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006829
    Abstract:
    Nowadays, deep neural networks (DNNs) have been widely used in various fields. However, research has shown that DNNs are vulnerable to attacks of adversarial examples (AEs), which seriously threaten the development and application of DNNs. Most of the existing adversarial defense methods need to sacrifice part of the original classification accuracy to obtain defense capability and strongly rely on the knowledge provided by the generated AEs, so they cannot balance the effectiveness and efficiency of defense. Therefore, based on manifold learning, this study proposes an origin hypothesis of AEs in attackable space from the feature space perspective and a trap-type ensemble adversarial defense network (Trap-Net). Trap-Net adds trap data to the training data based on the original model and uses the trap-type smoothing loss function to establish the seducing relationship between the target data and trap data, so as to generate trap-type networks. In order to address the problem that most adversarial defense methods sacrifice original classification accuracy, ensemble learning is used to ensemble multiple trap networks, so as to expand attackable target space defined by trap labels in the feature space and reduce the loss of the original classification accuracy. Finally, Trap-Net determines whether the input data are AEs by detecting whether the data hit the attackable target space. Experiments on MNIST, K-MNIST, F-MNIST, CIFAR-10, and CIFAR-100 datasets show that Trap-Net has strong defense generalization of AEs without sacrificing the classification accuracy of clean samples, and the results of experiments validate the adversarial origin hypothesis in attackable space. In the low-perturbation white-box attack scenario, Trap-Net achieves a detection rate of more than 85% for AEs. In the high-perturbation white-box attack and black-box attack scenarios, Trap-Net has a detection rate of almost 100% for AEs. 
Compared with other detection methods of AEs, Trap-Net is highly effective against white-box and black-box adversarial attacks, and it provides an efficient robustness optimization method for DNNs in adversarial environments.
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006830
    Abstract:
    Dynamic memory allocators are fundamental components of modern applications. They manage free memory and handle user memory requests. Modern general-purpose dynamic memory allocators ensure the balance of performance and memory footprint. However, in view of different memory footprints and optimization goals in application scenarios, a general-purpose memory allocator is not the optimal solution. Special-purpose memory allocators for specific application scenarios usually can better satisfy system requirements. However, they are time-consuming and error-prone to implement. Developers often use the memory allocation framework to build special-purpose dynamic memory allocators. However, the existing memory allocator framework has the problems of poor abstraction ability and insufficient composability and customizability. For this reason, this study proposes a composable and customizable dynamic memory allocator framework, namely mortise, based on function composability by reviewing the dynamic memory allocation process from the perspective of functional programming. The framework abstracts system memory allocation as a composition of hierarchical functions of several multiple decoupled memory allocations, and these functions can provide policies to ensure higher customizability and composability. Mortise is implemented by using standard C. To achieve zero performance overhead of hierarchical function composition, mortise uses the metaprogramming features offered by the C preprocessor. Developers can quickly build a memory allocator for targeted application scenarios by composing and customizing the hierarchical function of allocators. In order to prove the effectiveness of mortise, this study presents three different memory allocator instances, namely tlsfcc, hslab, and wfslab, by using mortise. 
Specifically, tlsfcc is designed for multi-core embedded application scenarios and improves parallel throughput by replacing the synchronization strategy; hslab is a core-aware slab-type allocator, which optimizes performance on heterogeneous hardware by customizing the thread cache; wfslab is a low-latency and wait-free/lock-free allocator. This study runs benchmarks to compare these allocators with several existing memory allocators. The experiments are carried out on an 8-core x86/64 platform and an 8-core heterogeneous aarch64 embedded platform. The experimental results show that tlsfcc achieves a mean speedup of 1.76 and 1.59 on the two platforms compared with the original tlsf allocator; hslab takes only 69.6% and 85.0% of the execution time of tcmalloc, which has a similar architecture; and the worst-case memory request latency of wfslab is the smallest among all memory allocators in the experiment, including the state-of-the-art lock-free memory allocators mimalloc and snmalloc.
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006831
    Abstract:
    Spoken language understanding (SLU), as a core component of task-oriented dialogue systems, aims to extract the semantic framework of user queries. In dialogue systems, the SLU component is responsible for identifying user requests and creating a semantic framework that summarizes user requests. SLU usually includes two subtasks: intent detection (ID) and slot filling (SF). ID is regarded as a semantic utterance classification problem that analyzes the semantics of utterance at the sentence level, while SF is viewed as a sequence labeling task that analyzes the semantics of utterance at the word level. Due to the close correlation between intentions and slots, mainstream works employ joint models to exploit shared knowledge across tasks. However, ID and SF are two different tasks with strong correlation, and they represent sentence-level semantic information and word-level information of utterances respectively, which means that the information of the two tasks is heterogeneous and has different granularities. This study proposes a heterogeneous interactive structure for joint ID and SF, which adequately captures the relationship between sentence-level semantic information and word-level information in heterogeneous information for two correlative tasks by adopting self-attention and graph attention networks. Different from ordinary homogeneous structures, the proposed model is a heterogeneous graph architecture containing different types of nodes and links because a heterogeneous graph involves more comprehensive information and rich semantics and can better interactively represent the information between nodes with different granularities. In addition, this study utilizes a window mechanism to accurately represent word-level embedding to better accommodate the local continuity of slot labels. Meanwhile, the study uses a pre-trained model (BERT) to analyze the effect of the proposed model using BERT. 
The experimental results of the proposed model on two public datasets show that the model achieves an accuracy of 97.98% and 99.11% on the ID task and an F1 score of 96.10% and 96.11% on the SF task, which are superior to the current mainstream methods.
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006833
    Abstract:
    In recent years, RGB-D salient object detection methods have outperformed RGB-only salient object detection models by exploiting the rich geometric structure and spatial position information in depth maps, and have thus attracted considerable attention from the academic community. However, existing RGB-D detection models still face the challenge of further improving performance. The emerging Transformer is good at modeling global information, while the convolutional neural network (CNN) is good at extracting local details. Therefore, effectively combining the advantages of CNN and Transformer to mine global and local information will help to improve the accuracy of salient object detection. For this purpose, an RGB-D salient object detection method based on cross-modal interactive fusion and global awareness is proposed in this study. The Transformer network is embedded into U-Net to better extract features by combining the global attention mechanism with local convolution. First, with the help of the U-Net encoder-decoder structure, this study efficiently extracts multi-level complementary features and decodes them step by step to generate a salient feature map. Then, the Transformer module is used to learn the global dependency between high-level features to enhance the feature representation, and a progressive upsampling fusion strategy is used to process the input and reduce the introduction of noise. Moreover, to reduce the negative impact of low-quality depth maps, the study also designs a cross-modal interactive fusion module to realize cross-modal feature fusion. Finally, experimental results on five benchmark datasets show that the proposed algorithm outperforms other recent algorithms.
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006825
    Abstract:
    A social law is a set of restrictions on the available actions of agents to establish some target properties in a multiagent system. In the strategic case, where the agents have individual rationality and private information, the social law synthesis problem should be modeled as an algorithmic mechanism design problem instead of a common optimization problem. Minimal side effect is usually a basic requirement for social laws. From the perspective of game theory, minimal side effect closely relates to the concept of maximum social welfare, and synthesizing a social law with minimal side effect can be modeled as an efficient mechanism design problem. Therefore, this study needs not only to find the efficient social laws with maximum social welfare for the given target property but also to pay the agents so as to induce incentive compatibility and individual rationality. The study first designs an efficient mechanism based on the VCG mechanism, namely VCG-SLM, and proves that it satisfies all the required formal properties. However, as the computation of VCG-SLM is an FP^NP-complete problem, the study proposes an ILP-based implementation of this mechanism (VCG-SLM-ILP), transforms the computation of allocation and payment into ILPs based on the semantics of ATL, and rigorously proves its correctness, so that mature industrial-grade integer programming solvers can be used to solve the otherwise intractable mechanism computation problems.
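For readers unfamiliar with VCG, the generic mechanism can be sketched as follows. This is a textbook illustration by brute-force enumeration with the Clarke pivot rule, not VCG-SLM itself: the candidate names, the valuations, and the direction of the transfers (here each agent pays its externality) are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Reported value of each agent for each candidate outcome (hypothetical data).
Valuations = Dict[str, Dict[str, int]]


def vcg(valuations: Valuations, candidates: List[str]) -> Tuple[str, Dict[str, int]]:
    """Pick the welfare-maximizing candidate and compute Clarke pivot payments."""

    def welfare(candidate: str, exclude: Optional[str] = None) -> int:
        # Sum of reported values, optionally leaving one agent out.
        return sum(
            values[candidate]
            for agent, values in valuations.items()
            if agent != exclude
        )

    chosen = max(candidates, key=welfare)
    payments: Dict[str, int] = {}
    for agent in valuations:
        # Best welfare the other agents could reach if this agent were absent...
        best_without = max(welfare(c, exclude=agent) for c in candidates)
        # ...minus what the others actually get under the chosen outcome.
        payments[agent] = best_without - welfare(chosen, exclude=agent)
    return chosen, payments


if __name__ == "__main__":
    reports: Valuations = {
        "a1": {"law_a": 5, "law_b": 1},
        "a2": {"law_a": 0, "law_b": 3},
        "a3": {"law_a": 0, "law_b": 2},
    }
    print(vcg(reports, ["law_a", "law_b"]))
```

Enumerating candidates is only feasible for tiny instances; the abstract's point is that the real allocation and payment computations are hard enough to warrant an ILP encoding.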
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006819
    Abstract:
    Federated learning is an effective method to solve the problem of data silos. When the server aggregates all gradients, it may compute the global gradient incorrectly due to laziness or self-interest, so it is necessary to verify the integrity of the global gradient. Existing schemes based on cryptographic algorithms incur excessive verification overhead. To solve these problems, this study proposes a rational and verifiable federated learning framework. Firstly, according to game theory, the prisoner contract and betrayal contract are designed to force the server to be honest. Secondly, the scheme uses a replication-based verification scheme to verify the integrity of the global gradient and supports offline clients. Finally, the analysis proves the correctness of the scheme, and the experiments show that, compared with the existing verification algorithms, the proposed scheme reduces the computing overhead of the client side to zero, reduces the number of communication rounds in one iteration from three to two, and has a training overhead that is inversely proportional to the offline rate of the client side.
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006896
    Abstract:
    The heterogeneous many-core architecture with an ultra-high energy efficiency ratio has become an important development trend of supercomputer architecture. However, the complexity of heterogeneous systems puts forward higher requirements for application development and optimization, and they face many technical challenges such as usability and programmability in the development process. The independently developed new-generation Sunway supercomputer is equipped with a homegrown heterogeneous many-core processor, SW26010Pro. To take full advantage of the performance of the new-generation many-core processors and support the development and optimization of emerging scientific computing applications, this study designs and implements an optimized compiler swLLVM oriented to the SW26010Pro platform. The compiler supports Athread and SDAA dual-mode heterogeneous programming models and provides multi-level storage hierarchy description and SIMD extensions for vector-like operations. In addition, it realizes control-flow vectorization, cost-based node combination, and compiler optimization for multi-level storage hierarchy according to the architecture characteristics of SW26010Pro. The experimental results show that the compiler optimization designed and implemented in this paper achieves significant performance improvements. The average speedup of control-flow vectorization and node combination and optimization is 1.23 and 1.11, respectively, and the memory access optimization achieves a maximum performance improvement of 2.49 times. Finally, a comprehensive evaluation of swLLVM is performed from multiple dimensions on the standard test set SPEC CPU2006. 
The results show that swLLVM reports an average increase of 9.04% in the performance of floating-point projects, 5.25% in overall performance, and 79.1% in compilation speed and an average decline of 0.12% in the performance of integer projects and 1.15% in the code size compared to SWGCC with the same optimization level.
    Available online:  June 07, 2023 , DOI: 10.13328/j.cnki.jos.006817
    Abstract:
    The ranking function method is the main method for the termination analysis of loops, and it indicates that loop programs can be terminated. In view of single-path linear constraint loop programs, this study presents a new method to analyze the termination of the loops. Based on the calculation of the normal space of the increasing function, this method considers the calculation of the ranking function in the original program space as that in the subspace. Experimental results show that the method can effectively verify the termination of most loop programs in the existing literature.
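As a reminder of what a ranking function has to satisfy, the sketch below tests a candidate on sampled states of a small linear loop. It is only a sanity check on finitely many points, not a termination proof and not the subspace-based synthesis method of the paper; the loop `while x >= 0 and y >= 1: x = x - y`, the sample grid, and the candidate functions are assumptions chosen for illustration.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

State = Tuple[int, int]


def guard(state: State) -> bool:
    """Loop condition of the example loop: x >= 0 and y >= 1."""
    x, y = state
    return x >= 0 and y >= 1


def update(state: State) -> State:
    """Loop body of the example loop: x = x - y, y unchanged."""
    x, y = state
    return (x - y, y)


def decreases_on_samples(
    rank: Callable[[State], int],
    samples: Iterable[State],
    delta: int = 1,
) -> bool:
    """Check the two ranking conditions on every sampled state in the guard."""
    for state in samples:
        if not guard(state):
            continue
        if rank(state) < 0:
            return False  # not bounded from below on the guard
        if rank(state) - rank(update(state)) < delta:
            return False  # does not decrease by at least delta
    return True


SAMPLES = list(product(range(-5, 6), repeat=2))

if __name__ == "__main__":
    print(decreases_on_samples(lambda s: s[0], SAMPLES))  # candidate f(x, y) = x
    print(decreases_on_samples(lambda s: s[1], SAMPLES))  # candidate f(x, y) = y
```

The candidate `f(x, y) = x` passes because each iteration lowers `x` by `y >= 1`, while `f(x, y) = y` fails because `y` never changes.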
    Available online:  June 07, 2023 , DOI: 10.13328/j.cnki.jos.006897
    Abstract:
    Multi-behavior recommendation aims to utilize interactive data from multiple behaviors of users to improve recommendation performance. Existing multi-behavior recommendation methods generally directly exploit the multi-behavior data for the shared initialized user representations and involve the mining of user preferences and modeling of relationships among different behaviors in the tasks. However, these methods ignore the data imbalance under different interactive behaviors (the amount of interactive data varies greatly among different behaviors) and the information loss caused by the adaptation to the above two tasks. User preferences refer to the interests that users exhibit in different behaviors (e.g., browsing preferences), and the relationship among behaviors indicates a potential conversion from one behavior to another behavior (e.g., the conversion from browsing to purchasing). In multi-behavior recommendation, the mining of user preferences and the modeling of relationships among different behaviors can be regarded as a two-stage task. On the basis of the above considerations, the model of two-stage learning for multi-behavior recommendation (TSL-MBR for short) is proposed, which decouples the above two tasks with a two-stage strategy. In particular, the model retains the end-to-end structure and learns the two tasks by alternating training with fixed parameters. The first stage is to model user preferences under different behaviors. In this stage, the interactive data from all behaviors (without distinction as to behavior type) are first used to model the global preferences of users to alleviate the problem of data sparsity to the greatest extent. Then, the interactive data of each behavior are used to refine the behavior-specific user preference (local preference) and thus lessen the influence of the data imbalance among different behaviors. The second stage is to model the relationships among different behaviors. 
In this stage, the mining of user preferences and modeling of relationships among different behaviors are decoupled to relieve the information loss problem caused by adaptation to the two tasks. This two-stage model significantly improves the system’s ability to predict target behaviors. Extensive experimental results show that TSL-MBR can substantially outperform the state-of-the-art baseline models, achieving 103.01% and 33.87% of relative gains on average over the best baseline on the Tmall and Beibei datasets, respectively.
    Available online:  May 24, 2023 , DOI: 10.13328/j.cnki.jos.006894
    Abstract:
    Deep learning has achieved great success in image classification, natural language processing, and speech recognition. Data augmentation can effectively increase the scale and diversity of training data, thereby improving the generalization of deep learning models. However, for a given dataset, a well-designed data augmentation strategy relies heavily on expert experience and domain knowledge and requires repeated attempts, which is time-consuming and labor-intensive. In recent years, automated data augmentation has attracted widespread attention from the academic community and the industry through the automated design of data augmentation strategies. To solve the problem that existing automated data augmentation algorithms cannot strike a good balance between prediction accuracy and search efficiency, this study proposes an efficient automated data augmentation algorithm SGES AA based on a self-guided evolution strategy. First, an effective continuous vector representation method is designed for the data augmentation strategy, and then the automated data augmentation problem is converted into a search problem of continuous strategy vectors. Second, a strategy vector search method based on the self-guided evolution strategy is presented. By introducing historical estimation gradient information to guide the sampling and updating of exploration points, it can effectively avoid the local optimal solution while improving the convergence of the search process. The results of extensive experiments on image, text, and speech datasets show that the proposed algorithm is superior to or matches the current optimal automated data augmentation methods without significantly increasing the time consumption of searches.
    Available online:  May 24, 2023 , DOI: 10.13328/j.cnki.jos.006813
    Abstract:
    Migrating from monolithic systems to microservice systems is one of the mainstream options for the industry to realize the reengineering of legacy systems, and microservice architecture refactoring based on monolithic legacy systems is the key to realizing migration. Currently, academia mainly focuses on the research on microservice identification methods, and there are many industry practices of legacy systems refactored into microservices. However, systematic approaches and efficient and robust tools are insufficient. Therefore, based on earlier research on microservices identification and model-driven development method, this study presents MSA-Lab, an integrated design platform for microservice refactoring of monolithic legacy systems based on the model-driven development approach. MSA-Lab analyzes the method call sequence in the running log of the monolithic legacy system, identifies and clusters classes and data tables for constructing abstract microservices, and generates a system architecture design model including the microservice diagram and microservice sequence diagram. The model has two core components: MSA-Generator for automatic microservice identification and design model generation and MSA-Modeller for visualization, interactive modeling, and model syntax constraint checking of microservice static structure and dynamic behavior models. This study conducts experiments in the MSA-Lab platform for effectiveness, robustness, and function transformation completeness on four open-source projects and carries out performance comparison experiments with three same-type tools. The results show that the platform has excellent effectiveness and robustness, function transform completeness for running logs, and superior performance.
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006826
    Abstract:
    Recently, with the popularity of ubiquitous computing, intelligent sensing technology has become a focus of research, and non-contact sensing based on WiFi is increasingly popular in academia and industry because of its excellent generality, low deployment cost, and great user experience. Typical WiFi-based non-contact sensing applications include gesture recognition, breath detection, intrusion detection, and behavior recognition. For real-life deployment of these applications, one of the major challenges is to avoid interference from irrelevant behaviors in irrelevant areas, so it is necessary to judge whether the target is in a specific sensing area, which means that the system should be able to determine exactly which side of the boundary line the target is on. However, existing work cannot accurately monitor a freely set boundary, which hinders the actual implementation of WiFi-based sensing applications. To solve this problem, based on the physical essence of electromagnetic wave diffraction and the Fresnel diffraction model, this study finds a signal feature, namely Rayleigh distribution in Fresnel diffraction model (RFD), that appears when the target passes through the link (the line between the WiFi receiver and transmitter antennas) and reveals the mathematical relationship between the signal feature and human activity. Then, the study realizes a boundary monitoring algorithm through line crossing detection by using the link as the boundary and considering the waveform delay caused by antenna spacing and the features of automatic gain control (AGC) when the link is blocked. On this basis, the study also implements two practical applications, that is, an intrusion detection system and a home state detection system. The intrusion detection system achieves a precision of more than 89% and a recall rate of more than 91%, while the home state detection system achieves an accuracy of more than 89%. 
While verifying the availability and robustness of the boundary monitoring algorithm, the study also shows the great potential of combining the proposed method with other WiFi-based sensing technologies and provides a direction for the actual deployment of WiFi-based sensing technologies.
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006821
    Abstract:
    Because challenges such as serious occlusions and deformations coexist, accurate and robust video segmentation has become one of the hot topics in computer vision. This study proposes a video segmentation method with absorbing Markov chains and skeleton mapping, which progressively produces accurate object contours through three phases: pre-segmentation, optimization, and improvement. In the phase of pre-segmentation, based on the Siamese (twin) network and the region proposal network, the study obtains regions of interest for objects, constructs the absorbing Markov chains of superpixels in these regions, and calculates the foreground/background labels of the superpixels. The absorbing Markov chains can perceive and propagate the object features flexibly and effectively and preliminarily segment the target object from the complex scene. In the phase of optimization, the study designs short-term and long-term spatial-temporal cue models to obtain the short-term variation and the long-term feature of the object, so as to optimize superpixel labels and reduce errors caused by similar objects and noise. In the phase of improvement, to reduce the artifacts and discontinuities of optimization results, this study proposes an automatic generation algorithm for foreground/background skeletons based on superpixel labels and positions and constructs a skeleton mapping network based on encoding and decoding, so as to learn the pixel-level object contour and finally obtain accurate video segmentation results. Extensive experiments on standard datasets show that the proposed method is superior to the existing mainstream video segmentation methods and can produce segmentation results with higher region similarity and contour accuracy.
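The role of an absorbing Markov chain in labeling can be illustrated with a tiny example. In this sketch, transient states stand in for unlabeled superpixels and the absorbing states for foreground/background seeds; the transition probabilities, the two-state chain, and the fixed-point iteration are assumptions for illustration, since the abstract does not say how the chain is built from superpixel features.

```python
from typing import List

Matrix = List[List[float]]


def absorption_probabilities(
    q: Matrix, r_foreground: List[float], iterations: int = 200
) -> List[float]:
    """Probability that each transient state is absorbed by the foreground.

    q[i][j] is the transition probability between transient states i and j,
    and r_foreground[i] is the one-step probability of jumping from transient
    state i straight into the foreground absorbing state. The result solves
    p = r_foreground + Q p, approximated here by repeated substitution.
    """
    n = len(q)
    p = [0.0] * n
    for _ in range(iterations):
        p = [
            r_foreground[i] + sum(q[i][j] * p[j] for j in range(n))
            for i in range(n)
        ]
    return p


if __name__ == "__main__":
    # State 0 moves to the foreground or to state 1 with probability 0.5 each;
    # state 1 moves to the background or to state 0 with probability 0.5 each.
    q = [[0.0, 0.5], [0.5, 0.0]]
    r_fg = [0.5, 0.0]
    probs = absorption_probabilities(q, r_fg)
    labels = ["foreground" if p >= 0.5 else "background" for p in probs]
    print(probs, labels)
```

Thresholding the absorption probability at 0.5 is one simple way to turn it into a foreground/background label; for larger chains the same quantities are usually obtained by solving the linear system directly.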
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006814
    Abstract:
    Efficient mobile charging scheduling is a key technology to build wireless rechargeable sensor networks (WRSN) which have long life cycle and sustainable operation ability. The existing charging methods based on reinforcement learning only consider the spatial dimension of mobile charging scheduling, i.e., the path planning of mobile chargers (MCs), while leaving out the temporal dimension of the problem, i.e., the adjustment of the charging duration, and thus these methods have suffered some performance limitations. This study proposes a dynamic spatiotemporal charging scheduling scheme based on deep reinforcement learning (SCSD) and establishes a deep reinforcement learning model for dynamic adjustment of charging sequence scheduling and charging duration. In view of the discrete charging sequence planning and continuous charging duration adjustment in mobile charging scheduling, the study uses DQN to optimize the charging sequence for nodes to be charged and calculates and dynamically adjusts the charging duration of the nodes. By optimizing the two dimensions of space and time respectively, the SCSD proposed in this study can effectively improve the charging performance while avoiding the power failure of nodes. Simulation experiments show that SCSD has significant performance advantages over several well-known typical charging schemes.
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006815
    Abstract:
    With the development of deep learning and steganography, deep neural networks are widely used in image steganography, especially in a new research direction, namely embedding an image message in an image. The mainstream steganography of embedding an image message in an image based on deep neural networks requires cover images and secret images to be input into a steganographic model to generate stego-images. But recent studies have demonstrated that the steganographic model only needs secret images as input, and then the output secret perturbation is added to cover images, so as to embed secret images. This novel embedding method that does not rely on cover images greatly expands the application scenarios of steganography and realizes the universality of steganography. However, this method currently only verifies the feasibility of embedding and recovering secret images, and the more important evaluation criterion for steganography, namely concealment, has not been considered and verified. This study proposes a high-capacity universal steganography generative adversarial network (USGAN) model based on an attention mechanism. By using the attention module, the USGAN encoder can adjust the perturbation intensity distribution of the pixel position on the channel dimension in the secret image, thereby reducing the influence of the secret perturbation on the cover images. In addition, in this study, the CNN-based steganalyzer is used as the target model of USGAN, and the encoder learns to generate a secret adversarial perturbation through adversarial training with the target model so that the stego-image can become an adversarial example for attacking the steganalyzer at the same time. The experimental results show that the proposed model can not only realize a universal embedding method that does not rely on cover images but also further improves the concealment of steganography.
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006816
    Abstract:
    How brains realize learning and perception is an essential question for both artificial intelligence and neuroscience communities. Since the existing artificial neural networks (ANNs) are different from the real brain in terms of structures and computing mechanisms, they cannot be directly used to explore the mechanisms of learning and dealing with perceptual tasks in the real brain. The dendritic neuron model is a computational model to model and simulate the information processing process of neuron dendrites in the brain and is closer to biological reality than ANNs. The use of the dendritic neural network model to deal with and learn perceptual tasks plays an important role in understanding the learning process in the real brain. However, current learning models based on dendritic neural networks mainly focus on simplified dendritic models and are unable to model the entire signal-processing mechanisms of dendrites. To solve this problem, this study proposes a learning model of the biophysically detailed neural network of medium spiny neurons (MSNs). The neural network can fulfill corresponding perceptual tasks through learning. Experimental results show that the proposed model can achieve high performance on the classical image classification task. In addition, the neural network shows strong robustness under noise interference. By further analyzing the network features, this study finds that the neurons in the network after learning show stimulus selectivity, which is a classical phenomenon in neuroscience. This indicates that the proposed model is biologically plausible and implies that stimulus selectivity is an essential property of the brain in fulfilling perceptual tasks through learning.
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006820
    Abstract:
    In large-scale and complex software systems, requirement analysis and generation are accomplished through a top-down process, and the construction of tracking relationships between cross-level requirements is very important for project management, development, and evolution. The loosely-coupled contribution approach of open-source systems requires each participant to easily understand the context and state of the requirements, which relies on cross-level requirement tracking. The issue description log is a common way of presenting requirements in open-source systems. It has no fixed template, and its content is diverse (including text, code, and debugging information). Furthermore, terms can be used freely, and the gap in abstraction level between cross-level requirements is large, which brings great challenges to automatic tracking. In this paper, a correlation feedback method for key feature dimensions is proposed. Through static analysis of the project's code structure, code-related terms and their correlation strength are extracted, and a code vocabulary base is constructed to alleviate the gap in abstraction level and the inconsistency of terminology between cross-level requirements. By measuring the importance of terms to the requirement description and screening key feature dimensions on this basis, the inquiry statement is optimized to effectively reduce the noise from requirement description length, content form, and other aspects. Experiments with two scenarios on three open-source systems suggest that the proposed method outperforms baseline approaches in cross-level requirement tracking and improves the F2 value to 29.01%, 7.75.1%, and 59.21% compared with the vector space model (VSM), standard Rocchio, and trace bidirectional encoder representations from transformers (BERT), respectively.
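    The abstract above reports results as F2 values without defining the metric. A minimal sketch, assuming it is the standard F-beta score with beta = 2 (which weights recall more heavily than precision); the function name `f_beta` is illustrative and not taken from the paper:

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score (assumed definition; beta=2 gives F2).

    With beta > 1, recall counts more than precision.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator
```

    Under this definition, a tracer with high recall and moderate precision scores higher than one with the two values swapped, which suits traceability tasks where missing a link is costlier than reviewing a false candidate.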
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006811
    Abstract:
    Basic linear algebra subprogram (BLAS) is one of the most basic and important math libraries. The matrix-matrix operations covered in the level-3 BLAS functions are particularly significant for a standard BLAS library and are widely employed in many large-scale scientific and engineering computing applications. Additionally, level-3 BLAS functions are computing intensive functions and play a vital role in fully exploiting the computing performance of processors. Multi-core parallel optimization technologies are studied for level-3 BLAS functions on SW26010-Pro, a domestic processor. According to the memory hierarchy of SW26010-Pro, this study designs a multi-level blocking algorithm to exploit the parallelism of matrix operations. Then, a data-sharing scheme based on remote memory access (RMA) mechanism is proposed to improve the data transmission efficiency among CPEs. Additionally, it employs triple buffering and parameter tuning to fully optimize the algorithm and hide the memory access costs of direct memory access (DMA) and the communication overhead of RMA. Besides, the study adopts two hardware pipelines and several vectorized arithmetic/memory access instructions of SW26010-Pro and improves the floating-point computing efficiency of level-3 BLAS functions by writing assembly code manually for matrix-matrix multiplication, matrix equation solving, and matrix transposition. The experimental results show that level-3 BLAS functions can significantly improve the performance on SW26010-Pro by leveraging the proposed parallel optimization. The floating-point computing efficiency of single-core level-3 BLAS is up to 92% of the peak performance, while that of multi-core level-3 BLAS is up to 88% of the peak performance.
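    The blocking idea mentioned in the abstract above can be illustrated with a single-level blocked matrix multiplication. This is only a sketch in pure Python: the block size, the function name `blocked_matmul`, and the loop order are assumptions, and it does not model the SW26010-Pro memory hierarchy, DMA, RMA, or vectorization described by the authors.

```python
def blocked_matmul(a, b, block=2):
    """Compute C = A x B by iterating over block x block tiles.

    Illustrative single-level blocking; real level-3 BLAS kernels use
    several blocking levels matched to the hardware memory hierarchy.
    """
    n, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    m = len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile rows of C
        for j0 in range(0, m, block):      # tile columns of C
            for p0 in range(0, k, block):  # tiles along the inner dimension
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ip * b[p][j]
    return c
```

    Each tile of C is updated from one tile of A and one tile of B at a time, so the working set stays small enough to fit in fast local memory when the block size is chosen accordingly.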
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006824
    Abstract:
    Remaining process time prediction is important for preventing and intervening in abnormal business operations. For predicting the remaining time, existing approaches have achieved high accuracy through deep learning techniques. However, most of these techniques involve complex model structures, and the prediction results are difficult to be explained, namely, unexplainable issues. In addition, the prediction of the remaining time usually uses the key attribute, namely activity, or selects several other attributes as the input features of the predicted model according to the domain knowledge. However, a general feature selection method is missing, which may affect both prediction accuracy and model explainability. To tackle these two challenges, this study introduces a remaining process time prediction framework based on an explainable feature-based hierarchical (EFH) model. Specifically, a feature self-selection strategy is first proposed, and the attributes that have a positive impact on the prediction task are obtained as the input features of the model through the backward feature deletion based on priority and the forward feature selection based on feature importance. Then an EFH model is proposed. The prediction results of each layer are obtained by adding different features layer by layer, so as to explain the relationship between input features and prediction results. The study also uses the light gradient boosting machine (LightGBM) and long short-term memory (LSTM) algorithms to implement the proposed approach, and the framework is general and not limited to the algorithms selected in this study. Finally, the proposed approach is compared with other methods on eight real-life event logs. The experimental results show that the proposed approach can select effective features and improve prediction accuracy. In addition, the prediction results are explained.
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006801
    Abstract:
    The Olympic heritage is the treasure of the world. The integration of technology, culture, and art is crucial to the diversified presentation and efficient dissemination of the heritage of the Beijing Winter Olympics. As an important trend form of digital museums in the information era, online exhibition halls lay a good foundation in the research on individual digital museums and interactive technologies, but so far, no systematic, intelligent, interactive, and friendly system of the Winter Olympics digital museum has been built. This study proposes an online exhibition hall construction method with interactive feedback for the Beijing 2022 Winter Olympics. By constructing an interactive exhibition hall with intelligent virtual agent, it has further explored the role of interactive feedback in disseminating intangible cultural heritage in a knowledge dissemination-based digital museum. To explore the influence of audio-visual interactive feedback on spreading Olympic spiritual culture in the exhibition hall and improve the user experience, the study conducts a user experiment with 32 participants. The results show that the constructed exhibition hall can greatly promote the dissemination of Olympic culture and spirit, and the introduction of audio-visual interactive feedback in the exhibition hall can improve users’ perceptual control, thereby improving the user experience.
    Available online:  April 26, 2023 , DOI: 10.13328/j.cnki.jos.006805
    Abstract:
    The uncertainty of tasks in mobile edge computing scenarios makes task offloading and resource allocation more complex and difficult. Therefore, a continuous offloading and resource allocation method of uncertain tasks in mobile edge computing is proposed. Firstly, a continuous offloading model of uncertain tasks in mobile edge computing is built, and the multi-batch processing technology based on duration slice partition is employed to address task uncertainty. A multi-device computing resource coordination mechanism is designed to improve the carrying capacity of computation-intensive tasks. Secondly, an adaptive strategy selection algorithm based on load balancing is put forward to avoid channel congestion and additional energy consumption caused by the over-allocation of computing resources. Finally, the uncertain task scenario model is simulated based on Poisson distribution, and experimental results show that the reduction of time slice length can reduce the total energy consumption of the system. In addition, the proposed algorithm can achieve task offloading and resource allocation more effectively and can reduce energy consumption by up to 11.8% compared with comparison algorithms.
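    The abstract above states that the uncertain task scenario is simulated with a Poisson distribution. A minimal sketch of one way to do that, assuming the number of tasks arriving in each time slice is Poisson distributed with mean `rate * slice_len`; the function names and parameters are hypothetical, and the sampler uses Knuth's multiplication method rather than anything specified in the paper:

```python
import math
import random


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def arrivals_per_slice(rate: float, slice_len: float, horizon: float, seed: int = 0):
    """Return the simulated number of task arrivals in each time slice.

    rate      -- assumed mean number of tasks per unit time
    slice_len -- length of one time slice
    horizon   -- total simulated time
    """
    rng = random.Random(seed)
    n_slices = math.ceil(horizon / slice_len)
    return [poisson_sample(rate * slice_len, rng) for _ in range(n_slices)]
```

    Shortening `slice_len` lowers the expected batch size per slice, which is the knob the abstract relates to total energy consumption; the sketch only generates arrivals and does not model offloading or energy.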
    Available online:  April 26, 2023 , DOI: 10.13328/j.cnki.jos.006807
    [Abstract] (539) [HTML] (0) [PDF 6.17 M] (1235)
    Abstract:
    Emotional dialogue technology focuses on the “emotional quotient” of conversational robots, aiming to give the robots the ability to observe, understand and express emotions as humans do. This technology can be seen as the intersection of emotional computing and dialogue technology, and can simultaneously consider the “intelligent quotient” and “emotional quotient” of conversational robots to realize spiritual companionship, emotional comfort, and psychological guidance for users. Combined with the characteristics of emotions in dialogues, this study provides a comprehensive analysis of emotional dialogue technology: 1) Three important technical points including emotion recognition, emotion management, and emotion expression in dialogue scenarios are shown, and the technology of emotional dialogues in multimodal scenarios is expanded. 2) This study presents the latest research progress on technology points related to emotional dialogues and summarizes the main challenges and possible solutions correspondingly. 3) Data resources for emotional dialogue technologies are introduced. 4) The difficulty and prospect of emotional dialogue technology are pointed out.
    Available online:  April 26, 2023 , DOI: 10.13328/j.cnki.jos.006809
    Abstract:
    In a hybrid cloud environment, enterprise business applications and data are often transferred across different cloud services. For complex and diversified cloud service environments, most hybrid cloud applications adopt access control policies made around only access subjects and adjust the policies manually, which cannot meet the fine-grained dynamic access control requirements at different stages of the data life cycle. This study proposes AHCAC, an adaptive access control method oriented to the data life cycle in a hybrid cloud environment. Firstly, the policy description idea based on key attributes is employed to unify the heterogeneous policies of the full life cycle of data under the hybrid cloud. In particular, the "stage" attribute is introduced to explicitly identify the life-cycle state of data, which is the basis for achieving fine-grained access control oriented to the data life cycle. Secondly, in view of the similarity and consistency of access control policies within the same life-cycle stage, the policy distance is defined, and a hierarchical clustering algorithm based on the policy distance is proposed to construct the corresponding data access control policy in each life-cycle stage. Finally, when the life-cycle stage of data changes, the adaptation and loading of policies of the corresponding data stages in the policy evaluation are triggered through key attribute matching, which realizes adaptive access control oriented to the data life cycle. This study also conducts experiments to verify the effectiveness and feasibility of the proposed method on OpenStack and the open-source policy evaluation engine Balana.
    Available online:  December 30, 2022 , DOI: 10.13328/j.cnki.jos.006804
    [Abstract] (1103) [HTML] (0) [PDF 5.38 M] (1691)
    Abstract:
    Stochastic configuration network (SCN), as an emerging incremental neural network model, is different from other randomized neural network methods. It can configure the parameters of hidden layer nodes through supervision mechanisms, thereby ensuring the fast convergence performance of SCN. Due to the advantages of high learning efficiency, low human intervention, and strong generalization ability, SCN has attracted a large number of national and international scholars and developed rapidly since it was proposed in 2017. In this study, SCN research is summarized from the aspects of basic theories, typical algorithm variants, application fields, and future research directions of SCN. Firstly, the algorithm principles, universal approximation capacity, and advantages of SCN are analyzed theoretically. Secondly, typical variants of SCN are studied, such as DeepSCN, 2DSCN, Robust SCN, Ensemble SCN, Distributed SCN, Parallel SCN, and Regularized SCN. Then, the applications of SCN in different fields, including hardware implementation, computer vision, medical data analysis, fault detection and diagnosis, and system modeling and prediction are introduced. Finally, the development potential of SCN in convolutional neural network architectures, semi-supervised learning, unsupervised learning, multi-view learning, fuzzy neural network, and recurrent neural network is pointed out.
    Available online:  October 18, 2017 , DOI:
    [Abstract] (2857) [HTML] (0) [PDF 525.21 K] (4901)
    Abstract:
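    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SigSoft Symposium on The Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 315-325. Original article: https://doi.org/10.1145/3106237.3106242. Readers who wish to cite this article should cite the original source.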
    Available online:  October 18, 2017 , DOI:
    [Abstract] (2805) [HTML] (0) [PDF 352.38 K] (5979)
    Abstract:
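    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SigSoft Symposium on The Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 303-314. Original article: https://doi.org/10.1145/3106237.3106239. Readers who wish to cite this article should cite the original source.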
    Available online:  September 11, 2017 , DOI:
    [Abstract] (3318) [HTML] (0) [PDF 276.42 K] (3122)
    Abstract:
    GitHub, a popular social-software-development platform, has fostered a variety of software ecosystems where projects depend on one another and practitioners interact with each other. Projects within an ecosystem often have complex inter-dependencies that impose new challenges in bug reporting and fixing. In this paper, we conduct an empirical study on cross-project correlated bugs, i.e., causally related bugs reported to different projects, focusing on two aspects: 1) how developers track the root causes across projects; and 2) how the downstream developers coordinate to deal with upstream bugs. Through manual inspection of bug reports collected from the scientific Python ecosystem and an online survey with developers, this study reveals the common practices of developers and the various factors in fixing cross-project bugs. These findings provide implications for future software bug analysis in the scope of ecosystem, as well as shed light on the requirements of issue trackers for such bugs.
    Available online:  June 21, 2017 , DOI:
    [Abstract] (3363) [HTML] (0) [PDF 169.43 K] (3154)
    Abstract:
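    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was accepted by IEEE Transactions on Software Engineering in 2017 and is awaiting publication. Original article: http://ieeexplore.ieee.org/document/7792694. Readers who wish to cite this article should cite the original source.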
    Available online:  June 13, 2017 , DOI:
    [Abstract] (4560) [HTML] (0) [PDF 174.91 K] (3569)
    Abstract:
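    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 39th International Conference on Software Engineering, Pages 27-37, Buenos Aires, Argentina, May 20-28, 2017, IEEE Press Piscataway, NJ, USA ©2017, ISBN: 978-1-5386-3868-2. Original article: http://dl.acm.org/citation.cfm?id=3097373. Readers who wish to cite this article should cite the original source.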
    Available online:  January 25, 2017 , DOI:
    [Abstract] (3444) [HTML] (0) [PDF 254.98 K] (2958)
    Abstract:
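    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 871-882. DOI: https://doi.org/10.1145/2950290.2950364. Original article: http://dl.acm.org/citation.cfm?id=2950364. Readers who wish to cite this article should cite the original source.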
    Available online:  January 18, 2017 , DOI:
    [Abstract] (3905) [HTML] (0) [PDF 472.29 K] (2963)
    Abstract:
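    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Pages 133-143, Seattle WA, USA, November 2016. Original article: http://dl.acm.org/citation.cfm?id=2950327. Readers who wish to cite this article should cite the original source.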
    Available online:  January 04, 2017 , DOI:
    [Abstract] (3664) [HTML] (0) [PDF 293.93 K] (2722)
    Abstract:
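    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE'16), 810-821, November 13-18, 2016. Original article: https://doi.org/10.1145/2950290.2950310. Readers who wish to cite this article should cite the original source.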
    Available online:  January 04, 2017 , DOI:
    [Abstract] (4019) [HTML] (0) [PDF 244.61 K] (3185)
    Abstract:
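    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published at FSE 2016. Original article: http://dl.acm.org/citation.cfm?doid=2950290.2950313. Readers who wish to cite this article should cite the original source.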
    Available online:  December 12, 2016 , DOI:
    [Abstract] (3546) [HTML] (0) [PDF 358.69 K] (3128)
    Abstract:
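    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published at the FSE'16 conference, in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Original article: http://dl.acm.org/citation.cfm?id=2950340. Readers who wish to cite this article should cite the original source.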
    Available online:  September 30, 2016 , DOI:
    Abstract:
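    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published at the ASE 2016 conference (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?id=2970366. Readers who wish to cite this article should cite the original source.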
    Available online:  September 09, 2016 , DOI:
    Abstract:
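    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. Junjie's article was published at the ASE 2016 conference (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?doid=2970276.2970300. Readers who cite this article should cite the original source.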
    Available online:  September 07, 2016 , DOI:
    Abstract:
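    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The original article was published in ASE 2016, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. Full text: http://dx.doi.org/10.1145/2970276.2970307. Important: readers who cite this article should cite the original source.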
    Available online:  August 29, 2016 , DOI:
    Abstract:
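    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. This paper was published in ACM Transactions on Software Engineering and Methodology (TOSEM, Vol. 25, No. 2, Article 13, May 2016) and was invited to the ICSE 2016 main conference as a "Journal first" presentation. Full text: http://dl.acm.org/citation.cfm?id=2876443. The authors include Zhou Minghui, Ma Xiujuan, Zhang Lu, and Mei Hong of Peking University, and Audris Mockus of the University of Tennessee. Important: readers who cite this article should cite the original source.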
  • Most downloaded full texts (overall, annual, by issue)
    Most viewed abstracts (overall, annual, by issue)

    2003,14(7):1282-1291, DOI:
    [Abstract] (36972) [HTML] (0) [PDF 832.28 K] (79314)
    Abstract:
    Sensor networks, which arise from the convergence of sensor, micro-electro-mechanical system, and network technologies, are a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecasted. Drawing on existing work, research hot spots including power-aware routing and media access control schemes are discussed and presented in detail. Finally, taking account of application requirements, several future research directions are put forward.
    2010,21(3):427-437, DOI:
    [Abstract] (32799) [HTML] (0) [PDF 308.76 K] (37857)
    Abstract:
    Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports some pioneering research on a possible genetic algorithm and its use in the automatic generation of SONGCI. In light of the characteristics of Chinese ancient poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system constructed on the basis of the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic Chinese poetry generation.
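    The selection operator named in the abstract above combines elitism with roulette-wheel selection. A minimal sketch of that general technique, assuming non-negative fitness values; the function name, parameters, and the toy fitness used below are hypothetical and do not reproduce the paper's syntactic and semantic fitness function:

```python
import random


def select_next_generation(population, fitness, n_elite=1, rng=None):
    """Keep the n_elite fittest individuals, then fill the remaining
    slots by roulette-wheel (fitness-proportional) sampling.

    Assumes fitness(ind) >= 0 for every individual.
    """
    rng = rng or random.Random()
    scores = [fitness(ind) for ind in population]
    if any(s < 0 for s in scores):
        raise ValueError("roulette selection needs non-negative fitness")
    # Elitism: copy the best individuals unchanged.
    order = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    elite = [population[i] for i in order[:n_elite]]
    # Roulette wheel: sampling probability proportional to fitness.
    n_rest = len(population) - len(elite)
    weights = scores if sum(scores) > 0 else None  # uniform if all zero
    rest = rng.choices(population, weights=weights, k=n_rest)
    return elite + rest
```

    For example, `select_next_generation(["a", "bb", "ccc", "dddd"], fitness=len)` always carries `"dddd"` into the next generation and samples the other three slots with probability proportional to string length. Elitism guards against losing the best candidate to sampling noise, while the roulette step preserves diversity.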
    2011,22(1):71-83, DOI:10.3724/SP.J.1001.2011.03958
    [Abstract] (29713) [HTML] (0) [PDF 781.42 K] (54189)
    Abstract:
    Cloud Computing is a fundamental change happening in the field of Information Technology. It represents a movement towards intensive, large-scale specialization. On the other hand, it brings not only convenience and efficiency, but also great challenges in the field of data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of Cloud Computing. This paper describes the major security requirements in Cloud Computing, key security technologies, standards, regulations, etc., and provides a Cloud Computing security framework. This paper argues that the changes in the above aspects will result in a technical revolution in the field of information security.
    2016,27(1):45-71, DOI:10.13328/j.cnki.jos.004914
    [Abstract] (28829) [HTML] (2201) [PDF 880.96 K] (30158)
    Abstract:
    Android is a modern and highly popular software platform for smartphones. According to reports, Android accounted for 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever. Apple, Microsoft, Blackberry and Firefox trailed a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2009,20(5):1337-1348, DOI:
    [Abstract] (27956) [HTML] (0) [PDF 1.06 M] (44055)
    Abstract:
    This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects. One is about the cloud infrastructure which is the building block for the up layer cloud application. The other is of course the cloud application. This paper focuses on the cloud infrastructure including the systems and current research. Some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large scale clusters which contain a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software building on top of redundant hardware instead of mere hardware. All these technologies are for the two important goals for distributed system: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to very large scale even to thousands of nodes. Availability means that the services are available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2008,19(1):48-61, DOI:
    [Abstract] (27909) [HTML] (0) [PDF 671.39 K] (60675)
    Abstract:
    This paper summarizes the current state of research and recent progress in clustering algorithms. First, some representative clustering algorithms are analyzed and summarized from several aspects, such as the ideas behind the algorithms, key technologies, advantages, and disadvantages. Second, several typical clustering algorithms and well-known data sets are selected, and simulation experiments are carried out in terms of both accuracy and running efficiency; the clustering behavior of one algorithm on different data sets is analyzed and compared with the clustering of the same data set under different algorithms. Finally, the research hotspots, difficulties, and shortcomings of data clustering, along with some pending problems, are addressed by integrating the two aspects above. This work can serve as a valuable reference for data clustering and data mining.
    2009,20(2):271-289, DOI:
    [Abstract] (26847) [HTML] (0) [PDF 675.56 K] (42305)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms before 2003, the recent advances in EMO are discussed in detail. The current research directions are summarized. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have emerged. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints for the future research of EMO are proposed.
    2005,16(1):1-7, DOI:
    [Abstract] (22042) [HTML] (0) [PDF 614.61 K] (20177)
    Abstract:
    The paper offers some reflections from the following four aspects: 1) revealing the development history of software engineering technology from the law of how things develop; 2) analyzing the construction of each abstraction layer of the virtual machine from the natural characteristics of software; 3) proposing the research content of the software engineering discipline and studying the pattern of industrialized software production from the perspective of software development; 4) exploring the development trend of software technology in light of the emergence of Internet technology.
    2004,15(3):428-442, DOI:
    [Abstract] (20487) [HTML] (0) [PDF 1009.57 K] (16324)
    Abstract:
    With the rapid development of e-business, Web-based applications have developed from localization to globalization, from B2C (business-to-customer) to B2B (business-to-business), and from a centralized fashion to a decentralized fashion. Web service is a new application model for decentralized computing, and it is also an effective mechanism for data and service integration on the web. Thus, web service has become a solution to e-business. It is important and necessary to carry out research on the new architecture of web services, on combinations with other good techniques, and on the integration of services. In this paper, a survey is presented on various aspects of web services research, from the basic concepts to the principal research problems and the underlying techniques, including data integration in web services, web service composition, semantic web service, web service discovery, web service security, the solution to web services in the P2P (Peer-to-Peer) computing environment, and the grid service. This paper also presents a summary of the current state of the art of these techniques, a discussion of future research topics, and the challenges of web services.
    2010,21(8):1834-1848, DOI:
    [Abstract] (20423) [HTML] (0) [PDF 682.96 K] (54960)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
    2005,16(5):857-868, DOI:
    [Abstract] (19705) [HTML] (0) [PDF 489.65 K] (29512)
    Abstract:
    Wireless Sensor Networks, a novel technology about acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the networks, is a challenging one, and yet extremely crucial for many applications. In this paper, the evaluation criterion of the performance and the taxonomy for wireless sensor networks self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and the directions of research in this area are introduced.
    2009,20(1):54-66, DOI:
    [Abstract] (19458) [HTML] (0) [PDF 1.41 M] (49394)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks, within which the links between nodes are very dense, but between which they are quite sparse. Network clustering algorithms, which aim to discover all natural network communities from given complex networks, are fundamentally important for both theoretical research and practical applications, and can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks including social networks, biological networks, the World Wide Web, and so on. This paper reviews the background, the motivation, the state of the art, and the main issues of existing work related to discovering network communities, and tries to draw a comprehensive and clear outline of this new and active research area. This work is hopefully beneficial to researchers from the communities of complex network analysis, data mining, intelligent Web, and bioinformatics.
    2012,23(4):962-986, DOI:10.3724/SP.J.1001.2012.04175
    [Abstract] (18560) [HTML] (0) [PDF 2.09 M] (31013)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, which may easily lead to the failure of computers or data. The large number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure and power costs. Therefore, the fault tolerance, scalability, and power consumption of a data center's distributed storage become key parts of cloud computing technology for ensuring data availability and reliability. In this paper, a survey is made of the state of the art of the key technologies in cloud computing in the following aspects: design of data center networks, organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. Firstly, many classical topologies of data center networks are introduced and compared. Secondly, current fault-tolerant storage techniques are discussed, and data replication and erasure code strategies are compared in particular. Thirdly, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
    2012,23(1):32-45, DOI:10.3724/SP.J.1001.2012.04091
    [Abstract] (18546) [HTML] (0) [PDF 408.86 K] (30043)
    Abstract:
    In many areas such as science, simulation, Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can be expanded cost-effectively should be invented to deal with big data. Relational data management techniques have a history of nearly 40 years. Now they encounter the tough obstacle of scalability, since relational techniques cannot easily handle large data. In the meantime, non-relational techniques, with MapReduce as a typical representative, emerge as a new force and expand their application from Web search to territories that used to be occupied by relational database systems. They confront relational techniques with high availability, high scalability, and massive parallel processing capability. The relational technique community, after losing the big deal of Web search, begins to learn from MapReduce. MapReduce also borrows valuable ideas from the relational technique community to improve performance. Relational techniques and MapReduce compete with each other and learn from each other; new data analysis platforms and a new data analysis ecosystem are emerging. Finally, the two camps of techniques will find their right places in the new ecosystem of big data analysis.
    2009,20(3):524-545, DOI:
    [Abstract] (17252) [HTML] (0) [PDF 1.09 M] (21680)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2009,20(1):124-137, DOI:
    [Abstract] (16806) [HTML] (0) [PDF 1.06 M] (21661)
    Abstract:
    The emergence of many intelligent devices equipped with short-range wireless communication has boosted the rapid rise of wireless ad hoc network applications. However, in many realistic application environments, nodes form a disconnected network most of the time due to node mobility, low density, lossy links, etc. The conventional communication model of mobile ad hoc networks (MANETs) requires at least one path from the source node to the destination node, which results in communication failure in these scenarios. Opportunistic networks exploit the communication opportunities arising from node movement to forward messages hop by hop, and implement communication between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has attracted great interest from researchers. This paper first introduces the concepts and theories of opportunistic networks and some typical current applications. It then elaborates on popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Some other interesting research topics, such as communication middleware, cooperation and security problems, and new applications, are stated briefly. Finally, the paper concludes and looks ahead to possible future research focuses for opportunistic networks.
    2004,15(8):1208-1219, DOI:
    [Abstract] (16344) [HTML] (0) [PDF 948.49 K] (13568)
    Abstract:
    With the explosive growth of network applications and their complexity, the threat of Internet worms to network security has become increasingly serious. Especially in the Internet environment, the variety of propagation methods and the complexity of the application environment lead to worms with a much higher outbreak frequency, much deeper latency, and wider coverage, and Internet worms have become a primary issue for malicious code researchers. In this paper, the concept and research status of Internet worms, their functional components, and their execution mechanisms are first presented; then the scanning strategies and propagation models are discussed; and finally the critical techniques of Internet worm prevention are given. Some major problems and research trends in this area are also addressed.
    2009,20(11):2965-2976, DOI:
    [Abstract] (16301) [HTML] (0) [PDF 442.42 K] (14768)
    Abstract:
    This paper studies uncertain graph data mining and especially investigates the problem of mining frequent subgraph patterns from uncertain graph data. A data model is introduced for representing uncertainties in graphs, and an expected support is employed to evaluate the significance of subgraph patterns. By using the apriori property of expected support, a depth-first search-based mining algorithm is proposed with an efficient method for computing expected supports and a technique for pruning the search space, which reduces the number of subgraph isomorphism tests needed for computing expected support from the exponential scale to the linear scale. Experimental results show that the proposed algorithm is 3 to 5 orders of magnitude faster than a naïve depth-first search algorithm, and is efficient and scalable.
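The notion of expected support can be illustrated with a small sketch. It is not the algorithm of the paper: it assumes that edges exist independently, that vertices carry unique labels (so containment reduces to edge-set inclusion and no subgraph isomorphism test is needed), and that expected support is the mean probability that the pattern occurs in a graph of the database. All names below are hypothetical.

```python
from itertools import product
from math import prod

# An uncertain graph maps an edge (a pair of unique vertex labels) to its
# existence probability. Independence of edges is an assumption of this sketch.
UncertainGraph = dict[tuple[str, str], float]


def occurrence_probability(pattern: set[tuple[str, str]], graph: UncertainGraph) -> float:
    """Probability that every edge of `pattern` exists in `graph`."""
    return prod(graph.get(edge, 0.0) for edge in pattern)


def expected_support(pattern: set[tuple[str, str]], database: list[UncertainGraph]) -> float:
    """Mean occurrence probability of `pattern` over all graphs in the database."""
    return sum(occurrence_probability(pattern, g) for g in database) / len(database)


def expected_support_brute_force(pattern: set[tuple[str, str]],
                                 database: list[UncertainGraph]) -> float:
    """Same quantity, obtained by enumerating every possible world of each graph."""
    total = 0.0
    for graph in database:
        edges = list(graph)
        for world in product([False, True], repeat=len(edges)):
            # Probability of this particular combination of present/absent edges.
            p = prod(graph[e] if present else 1.0 - graph[e]
                     for e, present in zip(edges, world))
            present_edges = {e for e, present in zip(edges, world) if present}
            if pattern <= present_edges:
                total += p
    return total / len(database)
```

Under these assumptions, adding an edge to a pattern can only multiply its occurrence probability by a factor of at most 1, so the expected support never increases. This is the kind of anti-monotone (apriori) property that allows a search to prune supergraphs of infrequent patterns.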
    2009,20(5):1226-1240, DOI:
    [Abstract] (16172) [HTML] (0) [PDF 926.82 K] (15759)
    Abstract:
    This paper introduces the concrete details of combining automated reasoning techniques with planning methods, including planning as satisfiability using propositional logic, conformant planning using modal logic and disjunctive reasoning, planning as nonmonotonic logic, and flexible planning as fuzzy description logic. After considering the experimental results of the International Planning Competition and relevant papers, it concludes that planning methods based on automated reasoning techniques are helpful and can be adopted. It also presents the challenges and possible research hotspots.
    2003,14(10):1717-1727, DOI:
    [Abstract] (16047) [HTML] (0) [PDF 839.25 K] (14216)
    Abstract:
    Sensor networks integrate sensor techniques, embedded computing techniques, distributed computing techniques, and wireless communication techniques. They can be used for testing, sensing, collecting, and processing information about monitored objects and for transferring the processed information to users. Sensor networks are a new research area of computer science and technology with broad application prospects, and both academia and industry are very interested in them. The concepts and characteristics of sensor networks and of the data in these networks are introduced, and the issues of sensor networks and of their data management are discussed. Advances in research on sensor networks and their data management are also presented.
    2009,20(2):350-362, DOI:
    [Abstract] (16037) [HTML] (0) [PDF 1.39 M] (39765)
    Abstract:
    This paper presents a comprehensive survey of recommender system research, aiming to help readers understand this field. First, the research background is introduced, including commercial application demands, academic institutes, conferences, and journals. After the recommendation problem is described both formally and informally, a comparative study is conducted on the categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are presented, and the main difficulties and future directions are summarized.
    2015,26(1):62-81, DOI:10.13328/j.cnki.jos.004701
    [Abstract] (15672) [HTML] (2525) [PDF 1.04 M] (25521)
    Abstract:
    Network abstraction gave birth to software-defined networking (SDN). SDN decouples the data plane from the control plane and simplifies network management. The paper starts with a discussion of the background of the birth and development of SDN and reviews its architecture, which includes the data layer, control layer, and application layer. The key technologies are then elaborated according to the hierarchical architecture of SDN. The characteristics of consistency, availability, and tolerance are especially analyzed. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized at the end.
    2014,25(4):839-862, DOI:10.13328/j.cnki.jos.004558
    [Abstract] (15325) [HTML] (2127) [PDF 1.32 M] (18896)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. Research and discussion on batch computing in the big data environment are comparatively sufficient. However, how to handle stream computing efficiently to meet requirements such as low latency, high throughput, and continuously reliable running, and how to build efficient stream big data computing systems, remain great challenges in big data computing research. This paper surveys the computing architecture and the key issues of stream computing in big data environments. First, it gives a brief summary of three application scenarios of stream computing: business intelligence, marketing, and public service. It also shows the distinctive features of stream computing in the big data environment, such as real-time operation, volatility, burstiness, irregularity, and infinity. A well-designed stream computing system always optimizes its system structure, data transmission, application interfaces, high availability, and so on. Subsequently, the paper offers detailed analyses and comparisons of five typical open-source stream computing systems for the big data environment. Finally, it addresses some new challenges of stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2009,20(10):2729-2743, DOI:
    [Abstract] (14338) [HTML] (0) [PDF 1.12 M] (10673)
    Abstract:
    In a multi-hop wireless sensor network (WSN), the sensors closest to the sink tend to deplete their energy faster than other sensors, which is known as an energy hole around the sink. No more data can be delivered to the sink after an energy hole appears, so a considerable amount of energy is wasted and the network lifetime ends prematurely. This paper investigates the energy hole problem and, based on an improved corona model with levels, concludes that assigning different transmission ranges to nodes in different coronas is an effective approach to achieving an energy-efficient network. It proves that finding the optimal transmission ranges for all areas is a multi-objective optimization problem (MOP), which is NP-hard. The paper proposes an ACO (ant colony optimization)-based distributed algorithm to prolong the network lifetime, which helps nodes in different areas adaptively find approximately optimal transmission ranges based on the node distribution. Furthermore, the simulation results indicate that the network lifetime under this solution approximates that obtained with the optimal list. Compared with existing algorithms, this ACO-based algorithm not only more than doubles the network lifetime, but also performs well under non-uniform node distributions.
    2012,23(1):1-20, DOI:10.3724/SP.J.1001.2012.04100
    [Abstract] (14324) [HTML] (0) [PDF 1017.73 K] (30522)
    Abstract:
    Context-aware recommender systems, which aim to further improve recommendation accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2012,23(5):1148-1166, DOI:10.3724/SP.J.1001.2012.04195
    [Abstract] (14160) [HTML] (0) [PDF 946.37 K] (16874)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
    2000,11(11):1460-1466, DOI:
    [Abstract] (14085) [HTML] (0) [PDF 520.69 K] (11133)
    Abstract:
    Intrusion detection has been a highlighted topic of network security research in recent years. In this paper, the necessity of intrusion detection is first presented, and its concepts and models are described. Then, many intrusion detection techniques and architectures are summarized. Finally, the existing problems and the future direction of this field are discussed.
    2013,24(8):1786-1803, DOI:10.3724/SP.J.1001.2013.04416
    [Abstract] (13796) [HTML] (0) [PDF 1.04 M] (16480)
    Abstract:
    Many application-specific NoSQL database systems have been developed to satisfy the new requirements of big data management. This paper surveys research on typical NoSQL databases based on the key-value data model. First, the characteristics of big data and the key technical issues in supporting big data management are introduced. Then frontier efforts and research challenges are presented, including system architecture, data model, access mode, indexing, transactions, system elasticity, load balancing, replica strategy, data consistency, flash cache, MapReduce-based data processing, and new-generation data management systems. Finally, research prospects are given.
    2002,13(7):1228-1237, DOI:
    [Abstract] (13772) [HTML] (0) [PDF 500.04 K] (13850)
    Abstract:
    Software architecture (SA) has recently been emerging as one of the primary research areas in software engineering and as one of the key technologies for developing large-scale software-intensive systems and software product line systems. The history and the major directions of SA are summarized, and a concept of SA is put forward based on analyzing and comparing several classical definitions of SA. Based on a summary of the activities related to SA, two categories of SA studies are identified, and the advances in SA research are subsequently introduced from seven aspects. Additionally, some shortcomings of the research on SA are discussed, and their causes are explained. Finally, the paper concludes with some significantly promising trends in SA research.
    2004,15(4):571-583, DOI:
    [Abstract] (13644) [HTML] (0) [PDF 1005.17 K] (9719)
    Abstract:
    For most peer-to-peer file-swapping applications, sharing is a voluntary action, and peers are not held responsible for irresponsible behavior in their bartering history. This situation indicates that trust between participants cannot be established simply with traditional trust mechanisms. A reasonable approach to trust construction comes from social network analysis, in which trust relations between individuals are established upon the recommendations of other individuals. Current P2P trust models cannot guarantee the convergence of the iteration for trust computation and take no account of model security problems such as Sybil attacks and slandering. This paper presents a novel recommendation-based global trust model and gives a distributed implementation method. Mathematical analyses and simulations show that, compared to the current global trust models, the proposed model is more robust against trust security problems and more complete in the iteration for computing peer trust.
    2006,17(7):1588-1600, DOI:
    [Abstract] (13595) [HTML] (0) [PDF 808.73 K] (14193)
    Abstract:
    Routing technology at the network layer is pivotal in the architecture of wireless sensor networks. As an active branch of routing technology, cluster-based routing protocols excel in network topology management, energy minimization, data aggregation and so on. In this paper, cluster-based routing mechanisms for wireless sensor networks are analyzed. Cluster head selection, cluster formation and data transmission are three key techniques in cluster-based routing protocols. As viewed from the three techniques, recent representative cluster-based routing protocols are presented, and their characteristics and application areas are compared. Finally, the future research issues in this area are pointed out.
    2011,22(1):115-131, DOI:10.3724/SP.J.1001.2011.03950
    [Abstract] (13572) [HTML] (0) [PDF 845.91 K] (27672)
    Abstract:
    The Internet traffic model is a key issue for network performance management, Quality of Service management, and admission control. The paper first summarizes the primary characteristics of Internet traffic, as well as the metrics of Internet traffic. It also illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the area of traffic modeling.
    2009,20(1):11-29, DOI:
    [Abstract] (13525) [HTML] (0) [PDF 787.30 K] (13884)
    Abstract:
    Constrained optimization problems (COPs) are mathematical programming problems frequently encountered in science and engineering applications. Solving COPs has become an important research area of evolutionary computation in recent years. In this paper, the state of the art of constrained optimization evolutionary algorithms (COEAs) is surveyed from two basic aspects of COEAs (i.e., constraint-handling techniques and evolutionary algorithms). In addition, this paper discusses some important issues of COEAs. More specifically, several typical algorithms are analyzed in detail. Based on the analyses, it concludes that to obtain competitive results, a proper constraint-handling technique needs to be considered in conjunction with an appropriate search algorithm. Finally, the open research issues in this field are also pointed out.
    2008,19(zk):112-120, DOI:
    [Abstract] (13478) [HTML] (0) [PDF 594.29 K] (14334)
    Abstract:
    An ad hoc network is a collection of wireless mobile nodes dynamically forming a temporary network without the use of any existing network infrastructure or centralized administration. Due to the bandwidth constraints and dynamic topology of mobile ad hoc networks, multipath routing is a very important research issue. In this paper, we present an entropy-based metric to support stability multipath on-demand routing (SMDR). The key idea of the SMDR protocol is to construct a new entropy metric and use it to select stable multiple paths, so as to reduce the number of route reconstructions and provide QoS guarantees in ad hoc networks whose topology changes continuously. Simulation results show that, with the proposed multipath routing protocol, the packet delivery ratio, end-to-end delay, and routing overhead ratio can be improved in most cases. It is a viable approach to multipath routing decisions.
    2015,26(1):26-39, DOI:10.13328/j.cnki.jos.004631
    [Abstract] (13438) [HTML] (2139) [PDF 763.52 K] (15233)
    Abstract:
    In recent years, transfer learning has attracted a vast amount of attention and research. Transfer learning is a new machine learning method that applies knowledge from related but different domains to target domains. It relaxes two basic assumptions of traditional machine learning: (1) the training data (also referred to as the source domain) and the test data (also referred to as the target domain) are independent and identically distributed (i.i.d.); (2) there are enough labeled samples to learn a good classification model. It thus aims to solve problems in which there are few or even no labeled data in the target domains. This paper surveys the research progress of transfer learning and introduces the authors' own work, especially on building transfer learning models by applying generative models at the concept level. Finally, the paper introduces applications of transfer learning, such as text classification and collaborative filtering, and suggests future research directions for transfer learning.
    2013,24(1):50-66, DOI:10.3724/SP.J.1001.2013.04276
    [Abstract] (13277) [HTML] (0) [PDF 0.00 Byte] (16583)
    Abstract:
    As an important means of application acceleration in the cloud, distributed caching technology has received considerable attention in industry and academia. This paper starts with a discussion of the combination of cloud computing and distributed caching technology, analyzing its characteristics, typical application scenarios, stages of development, standards, and several key elements that have promoted its development. To systematically understand the state of the art and the weak points of distributed caching technology, the paper builds a multi-dimensional framework, DctAF. This framework consists of 6 dimensions derived from analyzing the characteristics of cloud computing and the boundaries of caching techniques. Based on DctAF, current techniques are analyzed and summarized, and several influential products are compared. Finally, the paper describes and highlights several challenges that cache systems face and examines the current research through in-depth analysis and comparison.
    2003,14(9):1621-1628, DOI:
    [Abstract] (13082) [HTML] (0) [PDF 680.35 K] (19560)
    Abstract:
    Recommendation systems are one of the most important technologies in E-commerce. With the development of E-commerce, the numbers of users and commodities grow rapidly, resulting in extreme sparsity of user rating data. Traditional similarity measures work poorly in this situation, which dramatically decreases the quality of recommendation systems. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method uses the similarity of items to predict ratings for items that users have not rated, and then uses a new similarity measure to find the target users' neighbors. The experimental results show that this method can efficiently alleviate the extreme sparsity of user rating data and provide better recommendation results than traditional collaborative filtering algorithms.
    2008,19(8):1902-1919, DOI:
    [Abstract] (12923) [HTML] (0) [PDF 521.73 K] (13272)
    Abstract:
    Visual language techniques have exhibited more advantages in describing various software artifacts than one-dimensional textual languages during software development, ranging from the requirement analysis and design to testing and maintenance, as diagrammatic and graphical notations have been well applied in modeling system. In addition to an intuitive appearance, graph grammars provide a well-established foundation for defining visual languages with the power of precise modeling and verification on computers. This paper discusses the issues and techniques for a formal foundation of visual languages, reviews related practical graphical environments, presents a spatial graph grammar formalism, and applies the spatial graph grammar to defining behavioral semantics of UML diagrams and developing a style-driven framework for software architecture design.
    2008,19(8):1947-1964, DOI:
    [Abstract] (12921) [HTML] (0) [PDF 811.11 K] (9715)
    Abstract:
    Widespread deployment of interactive information visualization is difficult. Non-specialist users need a general development method and a toolkit that support generic data structures suited to tree, network, and multi-dimensional data, specific visualization and interaction techniques, and well-known generic information tasks. This paper presents a model-driven development method for interactive information visualization. First, an interactive information visualization interface model (IIVM) is proposed. Then, the development method for interactive information visualization based on IIVM is presented. The Daisy toolkit is introduced, which includes the Daisy model builder, the Daisy IIV generator, and a runtime framework with the Daisy library. Finally, an application example is given. Experimental results show that Daisy can provide a general solution for the development of interactive information visualization.
    2003,14(9):1635-1644, DOI:
    [Abstract] (12905) [HTML] (0) [PDF 622.06 K] (11655)
    Abstract:
    Computer forensics is the technology field that attempts to provide thorough, efficient, and secure means to investigate computer crime. Computer evidence must be authentic, accurate, complete, and convincing to juries. In this paper, the stages of computer forensics are presented, and the theories and the implementation of forensics software are described. An example of forensic practice is also given. The deficiencies of computer forensics techniques and anti-forensics are also discussed. The paper concludes that, as computer science and technology improve, forensics techniques will become more integrated and thorough.
    2002,13(10):1952-1961, DOI:
    [Abstract] (12874) [HTML] (0) [PDF 570.96 K] (11869)
    Abstract:
    The crucial technologies related to personalization are introduced in this paper, which include the representation and modification of user profile, the representation of resource, the recommendation technology, and the architecture of personalization. By comparing with some existing prototype systems, the key technologies about how to implement personalization are discussed in detail. In addition, three representative personalization systems are analyzed. At last, some research directions for personalization are presented.
    2012,23(1):82-96, DOI:10.3724/SP.J.1001.2012.04101
    [Abstract] (12772) [HTML] (0) [PDF 394.07 K] (14167)
    Abstract:
    Botnets are one of the most serious threats to the Internet. Researchers have done plenty of research and made significant progress. However, botnets keep evolving and have become more and more sophisticated. Due to the underlying security limitations of current systems and the Internet architecture, and the complexity of botnets themselves, how to effectively counter the global threat of botnets is still a very challenging issue. This paper first introduces the evolution of botnet propagation, attack, and command and control mechanisms. Then the paper summarizes recent advances in botnet defense research and categorizes them into five areas: botnet monitoring, botnet infiltration, analysis of botnet characteristics, botnet detection, and botnet disruption. The limitations of current botnet defense techniques, the evolving trends of botnets, and some possible directions for future research are also discussed.
    2010,21(2):231-247, DOI:
    [Abstract] (12674) [HTML] (0) [PDF 1.21 M] (15951)
    Abstract:
    In this paper, a framework is proposed for handling faults in service composition through analyzing fault requirements. Petri nets are used in the framework for fault detection and handling, which targets the failure of available services, component failure, and network failure. The corresponding fault models are given. Based on the models, a correctness criterion for fault handling is given to analyze the fault handling model, and its correctness is proven. Finally, CTL (computation tree logic) is used to specify the related properties and the enforcement algorithm of fault analysis. The simulation results show that this method can ensure the reliability and consistency of service composition.
    2008,19(7):1565-1580, DOI:
    [Abstract] (12566) [HTML] (0) [PDF 815.02 K] (15938)
    Abstract:
    Software defect prediction has been one of the active areas of software engineering since it emerged in the 1970s. It plays a very important role in the analysis of software quality and the balancing of software cost. This paper investigates and discusses the motivation, evolution, solutions, and challenges of software defect prediction technologies, and it also categorizes, analyzes, and compares representative prediction technologies. Some case studies of software defect distribution models are given to aid understanding.
    2017,28(1):1-16, DOI:10.13328/j.cnki.jos.005139
    [Abstract] (12441) [HTML] (2494) [PDF 1.75 M] (8729)
    Abstract:
    The knapsack problem (KP) is a well-known combinatorial optimization problem that includes 0-1 KP, bounded KP, multi-constraint KP, multiple KP, multiple-choice KP, quadratic KP, dynamic knapsack KP, discounted KP, and other types of KPs. KP can be considered a mathematical model extracted from a variety of real-world fields and therefore has wide applications. Evolutionary algorithms (EAs) are universally considered an efficient tool for solving KP approximately and quickly. This paper presents a survey on solving KP by EAs over the past ten years. It not only discusses various KP encoding mechanisms and the processing of infeasible individuals, but also provides useful guidelines for designing new EAs to solve KPs.
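As a rough illustration of what processing an infeasible individual can look like for the 0-1 KP, the sketch below shows a greedy repair operator on a binary encoding. It is one common heuristic, not necessarily one of the methods covered by the survey. It assumes strictly positive weights, and the function names are hypothetical.

```python
def greedy_repair(bits: list[int], values: list[float],
                  weights: list[float], capacity: float) -> list[int]:
    """Drop selected items with the lowest value/weight ratio until the load fits.

    Assumes every weight is strictly positive.
    """
    repaired = list(bits)
    load = sum(w for b, w in zip(repaired, weights) if b)
    # Selected items ordered from worst to best value density.
    selected = sorted((i for i, b in enumerate(repaired) if b),
                      key=lambda i: values[i] / weights[i])
    for i in selected:
        if load <= capacity:
            break
        repaired[i] = 0
        load -= weights[i]
    return repaired


def fitness(bits: list[int], values: list[float],
            weights: list[float], capacity: float) -> float:
    """Total value of a feasible individual; 0 for an infeasible one."""
    load = sum(w for b, w in zip(bits, weights) if b)
    if load > capacity:
        return 0
    return sum(v for b, v in zip(bits, values) if b)
```

An evolutionary algorithm would typically apply such a repair after crossover and mutation, so that every individual entering selection is feasible. Repair is heuristic: it restores feasibility but does not guarantee that the repaired individual is optimal.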
    2010,21(7):1620-1634, DOI:
    [Abstract] (12414) [HTML] (0) [PDF 765.23 K] (19461)
    Abstract:
    As an application of mobile ad hoc networks (MANET) on Intelligent Transportation Information System, the most important goal of vehicular ad hoc networks (VANET) is to reduce the high number of accidents and fatal consequences dramatically. One of the most important factors that would contribute to the realization of this goal is the design of effective broadcast protocols. This paper introduces the characteristics and application fields of VANET briefly. Then, it discusses the characteristics, performance, and application areas with analysis and comparison of various categories of broadcast protocols in VANET. According to the characteristic of VANET and its application requirement, the paper proposes the ideas and breakthrough direction of information broadcast model design of inter-vehicle communication.
    2010,21(5):916-929, DOI:
    [Abstract] (12259) [HTML] (0) [PDF 944.50 K] (17188)
    Abstract:
    Data deduplication technologies can be divided into two categories: a) identical data detection techniques, and b) similar data detection and encoding techniques. This paper presents a systematic survey on these two categories of data deduplication technologies and analyzes their advantages and disadvantages. Besides, since data deduplication technologies can affect the reliability and performance of storage systems, this paper also surveys various kinds of technologies proposed to cope with these two aspects of problems. Based on the analysis of the current state of research on data deduplication technologies, this paper makes several conclusions as follows: a) How to mine data characteristic information in data deduplication has not been completely solved, and how to use data characteristic information to effectively eliminate duplicate data also needs further study; b) From the perspective of storage system design, it still needs further study how to introduce proper mechanisms to overcome the reliability limitations of data deduplication techniques and reduce the additional system overheads caused by data deduplication techniques.
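Identical data detection, the first category above, can be sketched with fixed-size chunking and cryptographic fingerprints. This is a minimal illustration under assumed choices (SHA-256 fingerprints, a 4096-byte default chunk size, an in-memory dictionary as the chunk store), not a technique taken from any specific system in the survey.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096) -> tuple[dict[str, bytes], list[str]]:
    """Split `data` into fixed-size chunks and keep each distinct chunk once."""
    store: dict[str, bytes] = {}  # fingerprint -> chunk contents
    recipe: list[str] = []        # ordered fingerprints needed to rebuild the data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # identical chunks are stored only once
        recipe.append(fingerprint)
    return store, recipe


def restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Rebuild the original byte string from the chunk store and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)
```

The sketch also hints at the reliability concern raised in the abstract: every stored chunk may be referenced by many recipes, so losing a single chunk can damage all of the data that shares it.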
    2006,17(9):1848-1859, DOI:
    [Abstract] (12199) [HTML] (0) [PDF 770.40 K] (20493)
    Abstract:
    In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the fields of information retrieval and data mining. Highlighting the state-of-the-art challenges and research trends in content information processing for the Internet and other complex applications, this paper presents a survey of recent developments in text categorization based on machine learning, including models, algorithms, and evaluation. It is pointed out that nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, the scalability of algorithms, and the categorization of Web pages are the key problems in the study of text categorization. Possible solutions to these problems are also discussed. Finally, some future research directions are given.
    2008,19(10):2706-2719, DOI:
    [Abstract] (12073) [HTML] (0) [PDF 778.29 K] (11336)
    Abstract:
    Web search engines have become a very important tool for finding information efficiently in massive Web data. With the explosive growth of Web data, traditional centralized search engines find it increasingly hard to keep pace with people's growing information needs. With the rapid development of peer-to-peer (P2P) technology, the notion of P2P Web search has been proposed and has quickly become a research focus. The goal of this paper is to give a brief summary of current P2P Web search technologies in order to facilitate future research. First, some main challenges for P2P Web search are presented. Then, key techniques for building a feasible and efficient P2P Web search engine are reviewed, including system topology, data placement, query routing, index partitioning, collection selection, relevance ranking, and Web crawling. Finally, three recently proposed P2P Web search prototypes are introduced.
    2009,20(6):1393-1405, DOI:
    [Abstract] (12040) [HTML] (0) [PDF 831.86 K] (18017)
    Abstract:
    Combinatorial testing can use a small number of test cases to test systems while preserving fault detection ability. However, the test case generation problem for combinatorial testing is NP-complete. The efficiency and complexity of this testing method have attracted many researchers from the areas of combinatorics and software engineering. This paper summarizes the research on this topic in recent years, including: various combinatorial test criteria, the relations between the test generation problem and other NP-complete problems, mathematical methods for constructing test cases, computer search techniques for test generation, and fault localization techniques based on combinatorial testing.
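    As an illustration of the "small number of test cases" idea in the abstract above, here is a minimal sketch of a 2-way (pairwise) coverage check. The function name `uncovered_pairs` and the three-parameter example are hypothetical and are not taken from the surveyed paper; the sketch only checks coverage and does not generate test suites.

```python
from itertools import combinations, product

def uncovered_pairs(domains, tests):
    """Return the 2-way value combinations that no test case exercises.

    domains: one tuple of allowed values per parameter (hypothetical input format).
    tests:   list of test cases, each a tuple with one value per parameter.
    """
    missing = set()
    for i, j in combinations(range(len(domains)), 2):
        # Value pairs for parameters i and j that the suite already exercises.
        seen = {(t[i], t[j]) for t in tests}
        for a, b in product(domains[i], domains[j]):
            if (a, b) not in seen:
                missing.add((i, a, j, b))
    return missing

# Three boolean parameters: exhaustive testing needs 8 cases,
# but these 4 cases already cover every pair of values.
domains = [(0, 1), (0, 1), (0, 1)]
suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(uncovered_pairs(domains, suite))
```

    Finding the smallest such suite for arbitrary parameters is the hard part that the abstract describes as NP-complete; checking a given suite, as done here, is cheap.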
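  • Full-text download ranking (overall ranking, annual ranking, per-issue ranking)
    Abstract click ranking (overall ranking, annual ranking, per-issue ranking)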

    2003,14(7):1282-1291, DOI:
    [Abstract] (36972) [HTML] (0) [PDF 832.28 K] (79314)
    Abstract:
    Sensor networks, which arise from the convergence of sensor, micro-electro-mechanical system, and network technologies, are a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecast. Building on existing work, hot research topics, including power-aware routing and media access control schemes, are discussed in detail. Finally, taking application requirements into account, several future research directions are put forward.
    2008,19(1):48-61, DOI:
    [Abstract] (27909) [HTML] (0) [PDF 671.39 K] (60675)
    Abstract:
    This paper summarizes the current state of research and recent progress in clustering algorithms. First, some representative clustering algorithms are analyzed and summarized from several aspects, such as the underlying ideas, key techniques, advantages, and disadvantages. Second, several typical clustering algorithms and well-known data sets are selected, simulation experiments are carried out with respect to both accuracy and running efficiency, and the results are analyzed by comparing how one algorithm clusters different data sets and how different algorithms cluster the same data set. Finally, by combining these two aspects, the research hotspots, difficulties, and shortcomings of data clustering, along with some open problems, are addressed. This work can serve as a valuable reference for data clustering and data mining.
    2010,21(8):1834-1848, DOI:
    [Abstract] (20423) [HTML] (0) [PDF 682.96 K] (54960)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail: sentiment extraction, sentiment classification, and sentiment retrieval and summarization. Then, the evaluation methods and corpora for sentiment analysis are introduced. Finally, the applications of sentiment analysis are summarized. This paper aims to provide deep insight into the mainstream methods and recent progress in this field through detailed comparison and analysis.
    2011,22(1):71-83, DOI:10.3724/SP.J.1001.2011.03958
    [Abstract] (29713) [HTML] (0) [PDF 781.42 K] (54189)
    Abstract:
    Cloud computing is a fundamental change taking place in the field of information technology. It represents a movement towards intensive, large-scale specialization. However, it brings not only convenience and efficiency, but also great challenges in the field of data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the major requirements in cloud computing, key security technologies, standards, regulations, and so on, and provides a cloud computing security framework. This paper argues that the changes in the above aspects will result in a technical revolution in the field of information security.
    2009,20(1):54-66, DOI:
    [Abstract] (19458) [HTML] (0) [PDF 1.41 M] (49394)
    Abstract:
    Network community structure, in which links between nodes within a community are very dense while links between communities are quite sparse, is one of the most fundamental and important topological properties of complex networks. Network clustering algorithms, which aim to discover all natural network communities in a given complex network, are fundamentally important for both theoretical research and practical applications. They can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks, including social networks, biological networks, the World Wide Web, and so on. This paper reviews the background, the motivation, the state of the art, and the main issues of existing work on discovering network communities, and tries to draw a comprehensive and clear outline of this new and active research area. We hope this work will benefit researchers from the communities of complex network analysis, data mining, intelligent Web, and bioinformatics.
    2009,20(5):1337-1348, DOI:
    [Abstract] (27956) [HTML] (0) [PDF 1.06 M] (44055)
    Abstract:
    This paper surveys the current technologies adopted in cloud computing as well as the systems used in enterprises. Cloud computing can be viewed from two different aspects. One is the cloud infrastructure, which is the building block for upper-layer cloud applications. The other is the cloud application itself. This paper focuses on the cloud infrastructure, including the systems and current research. Some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large-scale clusters that contain a large number of cheap PC servers. Second, the applications are co-designed with the underlying infrastructure so that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software built on top of redundant hardware rather than by hardware alone. All these technologies serve two important goals of distributed systems: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to a very large scale, even to thousands of nodes. Availability means that the services remain available even when quite a number of nodes fail. From this paper, readers will gain an understanding of the current status of cloud computing as well as its future trends.
    2009,20(2):271-289, DOI:
    [Abstract] (26847) [HTML] (0) [PDF 675.56 K] (42305)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to solve multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms proposed before 2003, the recent advances in EMO are discussed in detail, and the current research directions are summarized. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have emerged. Furthermore, the essential characteristics of multi-objective optimization problems are investigated in depth. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints on the future research of EMO are proposed.
    2009,20(2):350-362, DOI:
    [Abstract] (16037) [HTML] (0) [PDF 1.39 M] (39765)
    Abstract:
    This paper presents a comprehensive survey of recommender system research, aiming to help readers understand this field. First, the research background is introduced, including commercial application demands, academic institutes, conferences, and journals. After the recommendation problem is described both formally and informally, a comparative study is conducted based on categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are presented, and the main difficulties and future directions are summarized.
    2004,15(10):1493-1504, DOI:
    [Abstract] (9046) [HTML] (0) [PDF 937.72 K] (38718)
    Abstract:
    The graphics processing unit (GPU) has been developing rapidly in recent years, at a speed exceeding Moore's law, and as a result, various applications associated with computer graphics have advanced greatly. At the same time, the high processing power, parallelism, and programmability now available on contemporary GPUs provide an ideal platform for general-purpose computation. Starting from an introduction to the development history and the architecture of the GPU, the paper describes the technical fundamentals of the GPU. Then, in the main part of the paper, the development of various applications of general-purpose computation on the GPU is introduced; among those applications, fluid dynamics, algebraic computation, database operations, and spectrum analysis are introduced in detail. The authors' experience with fluid dynamics is also reported, and the development of software tools in this area is introduced. Finally, a conclusion is drawn, and the future development and new challenges in both hardware and software in this area are discussed.
    2010,21(3):427-437, DOI:
    [Abstract] (32799) [HTML] (0) [PDF 308.76 K] (37857)
    Abstract:
    Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports some pioneering research on a possible genetic algorithm and its use in the automatic generation of SONGCI. In light of the characteristics of ancient Chinese poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette-wheel selection, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system built on the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic Chinese poetry generation.
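    To make the selection operator named in the abstract above more concrete, here is a minimal sketch of selection that combines elitism with roulette-wheel sampling. It is a generic textbook form under stated assumptions, not the paper's implementation: the function name `select_next_generation`, the `elite_count` parameter, and the toy fitness function are all hypothetical, and fitness values are assumed to be non-negative with a positive sum.

```python
import random

def select_next_generation(population, fitness, elite_count=1, rng=None):
    """Keep the fittest individuals, then fill the rest by roulette-wheel sampling.

    Assumes fitness(ind) >= 0 for every individual and a positive total fitness.
    """
    rng = rng or random.Random()
    # Elitism: the best `elite_count` individuals survive unchanged.
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:elite_count]
    # Roulette wheel: sample with replacement, probability proportional to fitness.
    weights = [fitness(ind) for ind in population]
    rest = rng.choices(population, weights=weights, k=len(population) - elite_count)
    return elites + rest

# Toy example: individuals are strings and fitness is simply their length.
toy_population = ["a", "bb", "ccc", "dddd"]
next_gen = select_next_generation(toy_population, len, elite_count=1, rng=random.Random(0))
```

    Elitism guarantees that the best candidate found so far is never lost, while the roulette wheel keeps weaker candidates in play with lower probability, which preserves diversity.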
    2014,25(9):1889-1908, DOI:10.13328/j.cnki.jos.004674
    [Abstract] (11575) [HTML] (2597) [PDF 550.98 K] (34945)
    Abstract:
    This paper first introduces the key features of big data in different processing modes and their typical application scenarios, as well as corresponding representative processing systems. It then summarizes three development trends of big data processing systems. Next, the paper gives a brief survey on system supported analytic technologies and applications (including deep learning, knowledge computing, social computing, and visualization), and summarizes the key roles of individual technologies in big data analysis and understanding. Finally, the paper lays out three grand challenges of big data processing and analysis, i.e., data complexity, computation complexity, and system complexity. Potential ways for dealing with each complexity are also discussed.
    2013,24(11):2476-2497, DOI:10.3724/SP.J.1001.2013.04486
    [Abstract] (10196) [HTML] (0) [PDF 1.14 M] (33912)
    Abstract:
    Probabilistic graphical models are powerful tools for compactly representing complex probability distributions, efficiently computing (approximate) marginal and conditional distributions, and conveniently learning parameters and hyperparameters in probabilistic models. As a result, they have been widely used in applications that require some sort of automated probabilistic reasoning, such as computer vision and natural language processing, as a formal approach to deal with uncertainty. This paper surveys the basic concepts and key results of representation, inference and learning in probabilistic graphical models, and demonstrates their uses in two important probabilistic models. It also reviews some recent advances in speeding up classic approximate inference algorithms, followed by a discussion of promising research directions.
    2012,23(4):962-986, DOI:10.3724/SP.J.1001.2012.04175
    [Abstract] (18560) [HTML] (0) [PDF 2.09 M] (31013)
    Abstract:
    Regarded as the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, which can easily lead to computer or data failures. The large number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure and power costs. Therefore, the fault tolerance, scalability, and power consumption of distributed storage for a data center become key parts of cloud computing technology in ensuring data availability and reliability. In this paper, a survey is made of the state of the art of key technologies in cloud computing in the following aspects: design of data center networks, organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. First, many classical data center network topologies are introduced and compared. Second, current fault-tolerant storage techniques are discussed, with particular comparison of data replication and erasure code strategies. Third, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
    2012,23(1):1-20, DOI:10.3724/SP.J.1001.2012.04100
    [Abstract] (14324) [HTML] (0) [PDF 1017.73 K] (30522)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2018,29(5):1471-1514, DOI:10.13328/j.cnki.jos.005519
    [Abstract] (5693) [HTML] (2946) [PDF 4.38 M] (30326)
    Abstract:
    Computer aided detection/diagnosis (CAD) can improve the accuracy of diagnosis, reduce false positives, and provide decision support for doctors. The main purpose of this paper is to analyze the latest developments in computer aided diagnosis tools. Focusing on the sites of the four most fatal cancers, this survey reviews major recent publications on CAD applications in different medical imaging areas, organized by imaging technique and disease. Furthermore, a multidimensional analysis of this research is made in terms of image data sets, algorithms, and evaluation methods. Finally, existing problems, research trends, and development directions in the field of medical image CAD systems are discussed.
    2016,27(1):45-71, DOI:10.13328/j.cnki.jos.004914
    [Abstract] (28829) [HTML] (2201) [PDF 880.96 K] (30158)
    Abstract:
    Android is a modern software platform and the most popular one for smartphones. According to one report, Android accounted for 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time. Apple, Microsoft, Blackberry, and Firefox trailed a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2012,23(1):32-45, DOI:10.3724/SP.J.1001.2012.04091
    [Abstract] (18546) [HTML] (0) [PDF 408.86 K] (30043)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed is growing rapidly. Parallel techniques that can be scaled out cost-effectively are needed to deal with big data. Relational data management techniques have a history of nearly 40 years. They now face the tough obstacle of scalability: relational techniques cannot easily handle very large data. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and have expanded their application from Web search to territories that used to be occupied by relational database systems. They confront relational techniques with high availability, high scalability, and massively parallel processing capability. The relational community, after losing the big deal of Web search, has begun to learn from MapReduce. MapReduce has also borrowed valuable ideas from the relational community to improve performance. Relational techniques and MapReduce compete with and learn from each other; a new data analysis platform and a new data analysis ecosystem are emerging. Eventually the two camps will find their proper places in the new ecosystem of big data analysis.
    2005,16(5):857-868, DOI:
    [Abstract] (19705) [HTML] (0) [PDF 489.65 K] (29512)
    Abstract:
    Wireless Sensor Networks, a novel technology about acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the networks, is a challenging one, and yet extremely crucial for many applications. In this paper, the evaluation criterion of the performance and the taxonomy for wireless sensor networks self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and the directions of research in this area are introduced.
    2011,22(1):115-131, DOI:10.3724/SP.J.1001.2011.03950
    [Abstract] (13572) [HTML] (0) [PDF 845.91 K] (27672)
    Abstract:
    The Internet traffic model is a key issue for network performance management, Quality of Service management, and admission control. The paper first summarizes the primary characteristics of Internet traffic, as well as the metrics of Internet traffic. It also illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the area of traffic modeling.
    2013,24(1):77-90, DOI:10.3724/SP.J.1001.2013.04339
    [Abstract] (11128) [HTML] (0) [PDF 0.00 Byte] (26256)
    Abstract:
    Task parallel programming model is a widely used parallel programming model on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and the supporting mechanism used in task parallel programming models and discusses issues and the latest achievements from three perspectives: Parallelism expression, data management and task scheduling. In the end, some future trends in this area are discussed.
    2021,32(2):349-369, DOI:10.13328/j.cnki.jos.006138
    [Abstract] (7367) [HTML] (5336) [PDF 2.36 M] (26019)
    Abstract:
    Few-shot learning is defined as learning models that solve problems from a small number of samples. In recent years, under the trend of training models with big data, machine learning and deep learning have achieved success in many fields. However, in many real-world application scenarios, a large amount of data or labeled data is not available for model training, and labeling a large number of unlabeled samples costs a great deal of manpower. Therefore, how to learn from a small number of samples has become a problem that needs attention. This paper systematically reviews the current approaches to few-shot learning. It introduces the corresponding models in three categories: fine-tuning based, data augmentation based, and transfer learning based. The data augmentation based approaches are subdivided into unlabeled data based, data generation based, and feature augmentation based approaches. The transfer learning based approaches are subdivided into metric learning based, meta-learning based, and graph neural network based methods. The paper then summarizes the few-shot datasets and the experimental results of the aforementioned models. Next, the paper summarizes the current situation and challenges in few-shot learning. Finally, the future technological development of few-shot learning is discussed.
    2015,26(1):62-81, DOI:10.13328/j.cnki.jos.004701
    [Abstract] (15672) [HTML] (2525) [PDF 1.04 M] (25521)
    Abstract:
    Network abstraction gave rise to software-defined networking (SDN). SDN decouples the data plane from the control plane and simplifies network management. The paper starts with a discussion of the background to the emergence and development of SDN and reviews its architecture, which includes the data layer, control layer, and application layer. Then the key technologies are elaborated according to the hierarchical architecture of SDN. The characteristics of consistency, availability, and tolerance are analyzed in particular. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized at the end.
    2017,28(4):959-992, DOI:10.13328/j.cnki.jos.005143
    [Abstract] (8940) [HTML] (3463) [PDF 3.58 M] (23663)
    Abstract:
    The development of the mobile Internet and the popularity of mobile terminals produce massive trajectory data of moving objects in the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in the city, and the changes of the atmospheric environment. However, trajectory data can also be exploited to disclose moving objects' privacy information (e.g., behaviors, hobbies, and social relationships). Accordingly, attackers can easily access moving objects' privacy information by digging into their trajectory data such as activities and check-in locations. On another research front, quantum computation presents an important theoretical direction for mining big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to handle trajectory big data could make some complex problems solvable and achieve higher efficiency. This paper reviews the key technologies of processing trajectory data. First, the concept and characteristics of trajectory data are introduced, and the pre-processing methods, including noise filtering and data compression, are summarized. Then, the trajectory indexing and querying techniques, and the current achievements of mining trajectory data, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preservation with respect to trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing frameworks and data visualization, are presented in detail. Some possible ways of applying quantum computation to trajectory data processing, as well as the implementation of some core trajectory mining algorithms by quantum computation, are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
    2011,22(6):1299-1315, DOI:10.3724/SP.J.1001.2011.03993
    [Abstract] (10923) [HTML] (0) [PDF 987.90 K] (22038)
    Abstract:
    Attribute-based encryption (ABE) schemes take attributes as the public key and associate the ciphertext and the user’s secret key with attributes, so that they can support expressive access control policies. This dramatically reduces the cost of network bandwidth and the sending node’s operations in fine-grained access control of data sharing. Therefore, ABE has broad application prospects in the area of fine-grained access control. After analyzing the basic ABE system and its two variants, Key-Policy ABE (KP-ABE) and Ciphertext-Policy ABE (CP-ABE), this study elaborates on the research problems relating to ABE systems, including access structure design for CP-ABE, attribute key revocation, key abuse, and multi-authority ABE, with an extensive comparison of their functionality and performance. Finally, this study discusses the open problems and main research directions in ABE.
    2009,20(3):524-545, DOI:
    [Abstract] (17252) [HTML] (0) [PDF 1.09 M] (21680)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of the software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2009,20(1):124-137, DOI:
    [Abstract] (16806) [HTML] (0) [PDF 1.06 M] (21661)
    Abstract:
    The emergence of large numbers of intelligent devices equipped for short-range wireless communication has fueled the rapid rise of wireless ad hoc network applications. However, in many realistic application environments, nodes form a disconnected network most of the time due to node mobility, low density, lossy links, etc. The conventional communication model of the mobile ad hoc network (MANET) requires at least one path to exist from the source to the destination node, which results in communication failure in these scenarios. Opportunistic networks use the communication opportunities arising from node movement to forward messages hop by hop, and implement communication between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has attracted great interest from researchers. This paper first introduces the concepts and theories of opportunistic networks and some current typical applications. Then it elaborates on popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Some other interesting research points, such as communication middleware, cooperation and security problems, and new applications, are stated briefly. Finally, the paper concludes and looks ahead to possible research focuses for opportunistic networks.
    2004,15(11):1583-1594, DOI:
    [Abstract] (8728) [HTML] (0) [PDF 1.57 M] (20591)
    Abstract:
    Uncertainty exists widely in the subjective and objective world. Among all kinds of uncertainty, randomness and fuzziness are the most important and fundamental. In this paper, the relationship between randomness and fuzziness is discussed. Uncertain states and their changes can be measured by entropy and hyper-entropy respectively. Taking advantage of entropy and hyper-entropy, the uncertainty of chaos, fractals, and complex networks in their various forms of evolution and differentiation is further studied. A simple and effective way is proposed to simulate uncertainty by means of knowledge representation, which provides a basis for the automation of both logical and image thinking with uncertainty. AI (artificial intelligence) with uncertainty is a new interdisciplinary field that covers computer science, physics, mathematics, brain science, psychology, cognitive science, biology, and philosophy, and results in the automation of representation, processing, and thinking for uncertain information and knowledge.
    2006,17(9):1848-1859, DOI:
    [Abstract] (12199) [HTML] (0) [PDF 770.40 K] (20493)
    Abstract:
    In recent years, there have been extensive studies of and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the state-of-the-art challenges and research trends in content information processing for the Internet and other complex applications, this paper presents a survey of recent developments in text categorization based on machine learning, including models, algorithms, and evaluation. It points out that nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization. Possible solutions to each of these problems are also discussed. Finally, some future directions of research are given.
    2005,16(1):1-7, DOI:
    [Abstract] (22042) [HTML] (0) [PDF 614.61 K] (20177)
    Abstract:
    This paper offers some reflections from the following four aspects: 1) revealing the development history of software engineering technology from the laws governing how things develop; 2) analyzing the construction of each abstraction layer of the virtual machine from the intrinsic characteristics of software; 3) proposing the research content of the software engineering discipline and studying patterns of industrialized software production from the perspective of software development; 4) exploring the development trend of software technology in light of the emergence of Internet technology.
    2014,25(1):37-50, DOI:10.13328/j.cnki.jos.004497
    [Abstract] (9554) [HTML] (2650) [PDF 929.87 K] (20129)
    Abstract:
    This paper surveys the state of the art of speech emotion recognition (SER) and presents an outlook on the trend of future SER technology. First, the survey summarizes and analyzes SER in detail from five perspectives: emotion representation models, representative emotional speech corpora, emotion-related acoustic feature extraction, SER methods, and applications. Then, based on the survey, the challenges faced by current SER research are summarized. This paper aims to provide deep insight into the mainstream methods and recent progress in this field, and presents a detailed comparison and analysis of these methods.
    2012,23(8):2058-2072, DOI:10.3724/SP.J.1001.2012.04237
    [Abstract] (9982) [HTML] (0) [PDF 800.05 K] (19888)
    Abstract:
    The distributed denial of service (DDoS) attack is a major threat to the current network. Based on the attack packet level, the study divides DDoS attacks into network-level DDoS attacks and application-level DDoS attacks. Next, the study analyzes the detection and control methods for these two kinds of DDoS attacks in detail, and it also analyzes the drawbacks of different control methods implemented at different network positions. Finally, the study analyzes the drawbacks of the current detection and control methods and the development trend of DDoS filter systems, and it presents the corresponding technological challenges.
    2018,29(10):2966-2994, DOI:10.13328/j.cnki.jos.005551
    [Abstract] (9252) [HTML] (3906) [PDF 610.06 K] (19781)
    Abstract:
    In recent years, the rapid development of Internet technology and Web applications has triggered an explosion of data on the Internet, which generates a large amount of valuable knowledge. How to organize, represent, and analyze this knowledge has attracted much attention. The knowledge graph was thus developed to organize this knowledge in a semantic and visualized manner. Knowledge reasoning over knowledge graphs has since become one of the hot research topics and plays an important role in many applications such as vertical search and intelligent question answering. The goal of knowledge reasoning over knowledge graphs is to infer new facts or identify erroneous facts according to existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graphs is more diversified, due to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graphs. Starting with the basic concept of knowledge reasoning, this paper presents a survey of recently developed methods for knowledge reasoning over knowledge graphs. Specifically, the research progress is reviewed in detail from two aspects: one-step reasoning and multi-step reasoning, each including rule-based reasoning, distributed-embedding-based reasoning, neural-network-based reasoning, and hybrid reasoning. Finally, future research directions and the outlook for knowledge reasoning over knowledge graphs are discussed.
    2020,31(7):2245-2282, DOI:10.13328/j.cnki.jos.006037
    [Abstract] (2748) [HTML] (2910) [PDF 967.02 K] (19653)
    Abstract:
    Ultrasonography is the first choice of imaging examination and preoperative evaluation for thyroid and breast cancer. However, the ultrasonic characteristics of benign and malignant nodules commonly overlap, and diagnosis relies heavily on the operator's experience rather than on quantitative and stable methods. In recent years, medical imaging analysis based on computer technology has developed rapidly, and a series of landmark breakthroughs have been made, providing effective decision support for medical imaging diagnosis. This work studies the research progress of computer vision and image recognition technologies for thyroid and breast ultrasound images. The key technologies involved in the automatic diagnosis of ultrasound images form the main line of the work. The major algorithms of recent years are summarized and analyzed, covering ultrasound image preprocessing, lesion localization and segmentation, and feature extraction and classification. Moreover, a multi-dimensional analysis is made of the algorithms, data sets, and evaluation methods. Finally, existing problems in the automatic analysis of these two kinds of ultrasound imaging are discussed, along with research trends and development directions in the field of ultrasound image analysis.
    2003,14(9):1621-1628, DOI:
    [Abstract] (13082) [HTML] (0) [PDF 680.35 K] (19560)
    Abstract:
    The recommendation system is one of the most important technologies in E-commerce. With the development of E-commerce, the numbers of users and commodities grow rapidly, resulting in extreme sparsity of user rating data. Traditional similarity measures work poorly in this situation, so the quality of recommendation systems decreases dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method uses the similarity of items to predict the ratings of items that users have not rated, and then uses a new similarity measure to find the target users' neighbors. The experimental results show that this method can effectively alleviate the extreme sparsity of user rating data and provide better recommendation results than traditional collaborative filtering algorithms.
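    The two stages described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not give the item similarity, the prediction formula, or the "new similarity measure", so plain cosine similarity and a similarity-weighted mean are substituted here, and the function names (`fill_missing`, `nearest_neighbors`) are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def item_similarity(ratings, i, j):
    """Cosine similarity of items i and j over the users who rated both.

    Assumption: the abstract does not name the item similarity; cosine
    is used purely for illustration.
    """
    pairs = [(row[i], row[j]) for row in ratings
             if row[i] is not None and row[j] is not None]
    if not pairs:
        return 0.0
    return cosine([p[0] for p in pairs], [p[1] for p in pairs])


def fill_missing(ratings):
    """Stage 1: predict each missing rating (None) as a similarity-weighted
    mean of the same user's known ratings on other items."""
    n_items = len(ratings[0])
    sim = [[item_similarity(ratings, i, j) for j in range(n_items)]
           for i in range(n_items)]
    filled = []
    for row in ratings:
        known = [(j, r) for j, r in enumerate(row) if r is not None]
        new_row = []
        for i, r in enumerate(row):
            if r is not None:
                new_row.append(float(r))
                continue
            weight = sum(sim[i][j] for j, _ in known)
            if weight > 0:
                new_row.append(sum(sim[i][j] * rj for j, rj in known) / weight)
            elif known:
                # No similar item: fall back to the user's mean rating.
                new_row.append(sum(rj for _, rj in known) / len(known))
            else:
                new_row.append(0.0)
        filled.append(new_row)
    return filled


def nearest_neighbors(ratings, user, k):
    """Stage 2: rank the other users by similarity on the filled matrix.

    Assumption: cosine stands in for the paper's new similarity measure.
    """
    filled = fill_missing(ratings)
    scores = [(cosine(filled[user], filled[v]), v)
              for v in range(len(filled)) if v != user]
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [v for _, v in scores[:k]]
```

    Because user similarity is computed on the filled matrix, two users can be compared even when they have rated few items in common, which is the point of predicting the missing ratings first.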
    2010,21(7):1620-1634, DOI:
    [Abstract] (12414) [HTML] (0) [PDF 765.23 K] (19461)
    Abstract:
    As an application of mobile ad hoc networks (MANET) on Intelligent Transportation Information System, the most important goal of vehicular ad hoc networks (VANET) is to reduce the high number of accidents and fatal consequences dramatically. One of the most important factors that would contribute to the realization of this goal is the design of effective broadcast protocols. This paper introduces the characteristics and application fields of VANET briefly. Then, it discusses the characteristics, performance, and application areas with analysis and comparison of various categories of broadcast protocols in VANET. According to the characteristic of VANET and its application requirement, the paper proposes the ideas and breakthrough direction of information broadcast model design of inter-vehicle communication.
    2013,24(2):295-316, DOI:10.3724/SP.J.1001.2013.04336
    [Abstract] (9794) [HTML] (0) [PDF 0.00 Byte] (19220)
    Abstract:
    Under the new application mode, traditional hierarchical data centers face several limitations in size, bandwidth, scalability, and cost. To meet the needs of new applications, the data center network should provide high scalability, low configuration overhead, robustness, and energy savings at low cost. First, the shortcomings of the traditional data center network architecture are summarized, and new requirements are pointed out. Second, the existing proposals are divided into two categories, i.e., server-centric and network-centric. Then, several representative architectures in these two categories are reviewed and compared in detail. Finally, the future directions of data center networks are discussed.
    2005,16(10):1743-1756, DOI:
    [Abstract] (10025) [HTML] (0) [PDF 545.62 K] (19132)
    Abstract:
    This paper presents a survey of the theory of provable security and its applications to the design and analysis of security protocols. It clarifies what provable security is, explains some basic notions involved in the theory of provable security, and illustrates the basic idea of the random oracle model. It also reviews the development of and advances in provably secure public-key encryption and digital signature schemes, in the random oracle model or the standard model, as well as the applications of provable security to the design and analysis of session-key distribution protocols and their advances.
    2014,25(4):839-862, DOI:10.13328/j.cnki.jos.004558
    [Abstract] (15325) [HTML] (2127) [PDF 1.32 M] (18896)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. Batch computing in big data environments has been studied and discussed comparatively thoroughly. However, how to handle stream computing efficiently so as to meet requirements such as low latency, high throughput, and continuous reliable operation, and how to build efficient stream big data computing systems, remain great challenges in big data computing research. This paper studies the data computing architecture and the key issues of stream computing in big data environments. First, it gives a brief summary of three application scenarios of stream computing: business intelligence, marketing, and public service. It also describes the distinctive features of stream computing in big data environments, such as real time, volatility, burstiness, irregularity, and infinity. A well-designed stream computing system is optimized in terms of system structure, data transmission, application interfaces, high availability, and so on. Subsequently, the paper offers detailed analyses and comparisons of five typical open-source stream computing systems for big data environments. Finally, it addresses some new challenges for stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2010,21(7):1605-1619, DOI:
    [Abstract] (9833) [HTML] (0) [PDF 856.25 K] (18120)
    Abstract:
    The rapid development of the Internet leads to an increase in system complexity and uncertainty. Traditional network management cannot meet the requirements and must evolve toward fusion-based Cyberspace Situational Awareness (CSA). Based on an analysis of functional shortcomings and development requirements, this paper introduces CSA as well as its origin, conception, objectives, and characteristics. First, a CSA research framework is proposed and the research history is investigated, based on which the main aspects and the existing issues of the research are analyzed. Meanwhile, assessment methods are divided into three categories: mathematical models, knowledge reasoning, and pattern recognition. Then, this paper discusses CSA from three aspects: models, knowledge representation, and assessment methods, and goes into detail about the main ideas, assessment processes, merits, and shortcomings of novel methods. Many typical methods are compared. Current application research on CSA in the fields of security, transmission, survivability, system evaluation, and so on is presented. Finally, this paper points out the development directions of CSA and offers conclusions regarding the issue system, technical system, and application system.
    2013,24(5):1078-1097, DOI:10.3724/SP.J.1001.2013.04390
    [Abstract] (11743) [HTML] (0) [PDF 1.74 M] (18031)
    Abstract:
    The control and data planes are decoupled in software-defined networking (SDN), which provides a new solution for research on new network applications and future Internet technologies. The development status of OpenFlow-based SDN technologies is surveyed in this paper. The research background of the decoupled architecture of network control and data transmission in OpenFlow networks is summarized first, and the key components and research progress, including the OpenFlow switch, the controller, and SDN technologies, are introduced. Moreover, current problems and solutions of OpenFlow-based SDN technologies are analyzed in four aspects. In light of the development status in recent years, the applications in campus networks, data centers, network management, and network security are summarized. Finally, future research trends are discussed.
    2009,20(6):1393-1405, DOI:
    [Abstract] (12040) [HTML] (0) [PDF 831.86 K] (18017)
    Abstract:
    Combinatorial testing can use a small number of test cases to test systems while preserving fault detection ability. However, the test case generation problem for combinatorial testing is NP-complete. The efficiency and complexity of this testing method have attracted many researchers from the areas of combinatorics and software engineering. This paper summarizes the research work on this topic in recent years, including various combinatorial test criteria, the relations between the test generation problem and other NP-complete problems, the mathematical methods for constructing test cases, the computer search techniques for test generation, and fault localization techniques based on combinatorial testing.
    2011,22(3):381-407, DOI:10.3724/SP.J.1001.2011.03934
    [Abstract] (10453) [HTML] (0) [PDF 614.69 K] (17687)
    Abstract:
    The popularity of the Internet and the boom of the World Wide Web foster innovative changes in software technology that give birth to a new form of software—networked software, which delivers diversified and personalized on-demand services to the public. With the ever-increasing expansion of applications and users, the scale and complexity of networked software are growing beyond the information processing capability of human beings, which brings software engineers a series of challenges to face. In order to come to a scientific understanding of this kind of ultra-large-scale artificial complex systems, a survey research on the infrastructure, application services, and social interactions of networked software is conducted from a three-dimensional perspective of cyberization, servicesation, and socialization. Interestingly enough, most of them have been found to share the same global characteristics of complex networks such as “Small World” and “Scale Free”. Next, the impact of the empirical study on software engineering research and practice and its implications for further investigations are systematically set forth. The convergence of software engineering and other disciplines will put forth new ideas and thoughts that will breed a new way of thinking and input new methodologies for the study of networked software. This convergence is also expected to achieve the innovations of theories, methods, and key technologies of software engineering to promote the rapid development of software service industry in China.
    2008,19(11):2803-2813, DOI:
    [Abstract] (9208) [HTML] (0) [PDF 319.20 K] (17677)
    Abstract:
    A semi-supervised clustering method based on the affinity propagation (AP) algorithm is proposed in this paper. AP takes as input measures of similarity between pairs of data points. Compared with existing clustering algorithms such as K-center clustering, AP is an efficient and fast clustering algorithm for large datasets. For datasets with complex cluster structures, however, it cannot produce good clustering results. The clustering performance of AP can be improved by using prior labeled data or pairwise constraints to adjust the similarity matrix. Experimental results show that the method indeed reaches its goal for complex datasets, and that it outperforms the comparison methods when there are a large number of pairwise constraints.
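    As a rough illustration of adjusting the similarity matrix with pairwise constraints before it is passed to AP, consider the sketch below. The abstract does not state the adjustment rule, so the choices here are assumptions: similarities are negative squared Euclidean distances (a common input for AP), must-link pairs are raised to the maximum similarity 0, and cannot-link pairs are pushed below the smallest observed similarity. The function names are hypothetical, and the AP clustering step itself is not shown.

```python
def negative_squared_distances(points):
    """Pairwise similarity s(i, j) = -||x_i - x_j||^2, so all values are <= 0."""
    return [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def adjust_similarity(sim, must_link, cannot_link):
    """Return a copy of `sim` with pairwise constraints applied symmetrically.

    Assumptions (not taken from the paper): `sim` holds non-positive values,
    must-link pairs get similarity 0.0, and cannot-link pairs get a penalty
    strictly below every value in the original matrix.
    """
    adjusted = [row[:] for row in sim]
    floor = min(min(row) for row in sim)
    penalty = 2 * floor - 1.0  # illustrative choice, always < floor when floor <= 0
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = 0.0
    for i, j in cannot_link:
        adjusted[i][j] = adjusted[j][i] = penalty
    return adjusted
```

    The adjusted matrix can then be given to any AP implementation that accepts a precomputed similarity matrix; the input matrix is copied so the unconstrained similarities remain available for comparison.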
    2009,20(8):2241-2254, DOI:
    [Abstract] (6757) [HTML] (0) [PDF 1.99 M] (17608)
    Abstract:
    Inspired by the idea of data fields, a community discovery algorithm based on topological potential is proposed. The basic idea is that a topological potential function is introduced to analytically model the virtual interaction among all nodes in a network and, by regarding each community as a local high-potential area, the community structure in the network can be uncovered by detecting all local high-potential areas bounded by low-potential nodes. Experiments on some real-world networks show that the algorithm requires no input parameters and can discover the intrinsic or even overlapping community structure in networks. The time complexity of the algorithm is O(m + n^(3/γ)) ~ O(n^2), where n is the number of nodes to be explored, m is the number of edges, and 2<γ<3 is a constant.
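    The peak-detection idea in this abstract can be sketched as follows. The abstract does not give the potential function, so this sketch assumes a Gaussian-kernel form, in which a node's potential is the sum of exp(-(d/sigma)^2) over the hop distances d to all reachable nodes, with unit node mass and a hand-picked `sigma`. It only locates local potential peaks as candidate community centers and does not assign the remaining nodes to communities. All names are hypothetical.

```python
from collections import deque
from math import exp


def hop_distances(adj, source):
    """Breadth-first search: hop distance from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topological_potential(adj, sigma=1.0):
    """Potential of each node under the assumed Gaussian kernel.

    Assumptions: unit mass for every node, and `sigma` fixed by hand;
    the node itself (distance 0) contributes 1 to its own potential.
    """
    potential = {}
    for u in adj:
        dist = hop_distances(adj, u)
        potential[u] = sum(exp(-(d / sigma) ** 2) for d in dist.values())
    return potential


def local_maxima(adj, potential):
    """Nodes whose potential is strictly higher than that of every neighbor."""
    return sorted(u for u in adj
                  if all(potential[u] > potential[v] for v in adj[u]))


# Two star-shaped groups (hubs 0 and 4) joined through bridge node 8.
graph = {
    0: [1, 2, 3, 8], 1: [0], 2: [0], 3: [0],
    4: [5, 6, 7, 8], 5: [4], 6: [4], 7: [4],
    8: [0, 4],
}
peaks = local_maxima(graph, topological_potential(graph))
```

    In this toy graph the two hubs are the local peaks, and the bridge node sits in the low-potential region between them, which matches the abstract's picture of communities as high-potential areas bounded by low-potential nodes.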
    2017,28(1):160-183, DOI:10.13328/j.cnki.jos.005136
    [Abstract] (8637) [HTML] (3570) [PDF 3.12 M] (17537)
    Abstract:
    Image segmentation is the process of dividing an image into a number of regions with similar properties, and it is the preprocessing step for many image processing tasks. In recent years, scholars in China and abroad have mainly focused on content-based image segmentation algorithms. Based on an extensive review of the existing literature and the latest achievements, this paper categorizes image segmentation algorithms into three types: graph-theory-based methods, pixel-clustering-based methods, and semantic segmentation methods. The basic ideas, advantages, and disadvantages of typical algorithms belonging to each category, especially the most recent image semantic segmentation algorithms based on deep neural networks, are analyzed, compared, and summarized. Furthermore, the paper introduces the datasets commonly used as benchmarks in image segmentation and the evaluation criteria for algorithms, and compares several image segmentation algorithms experimentally. Finally, some potential future research work is discussed.
    2013,24(4):825-842, DOI:10.3724/SP.J.1001.2013.04369
    [Abstract] (8363) [HTML] (0) [PDF 1.09 M] (17515)
    Abstract:
    The honeypot is a proactive defense technology introduced by the defense side to change the asymmetric situation of the network attack and defense game. By deploying honeypots, i.e., security resources without any production purpose, defenders can lure attackers into illegally exploiting them, and then capture and analyze the attack behaviors to understand the attack tools and methods and to learn the attackers' intentions and motivations. Honeypot technology has won sustained attention from the security community, has made considerable progress and found wide application, and has become one of the main technical means of Internet security threat monitoring and analysis. In this paper, the origin and evolution of honeypot technology are presented first. Next, the key mechanisms of honeypot technology are comprehensively analyzed, the development of the honeypot deployment structure is reviewed, and the latest applications of honeypot technology in Internet security threat monitoring, analysis, and prevention are summarized. Finally, the problems of honeypot technology, development trends, and further research directions are discussed.
    2018,29(1):42-68, DOI:10.13328/j.cnki.jos.005320
    [Abstract] (9563) [HTML] (2933) [PDF 2.54 M] (17450)
    Abstract:
    The Internet has penetrated into all aspects of human society and has greatly promoted social progress. At the same time, various forms of cybercrimes and network theft occur frequently, bringing great harm to our society and national security. Cyber security has become a major concern to the public and the government. As a large number of Internet functionalities and applications are implemented by software, software plays a crucial role in cyber security research and practice. In fact, almost all cyberattacks were carried out by exploiting vulnerabilities in system software or application software. It is increasingly urgent to investigate the problems of software security in the new age. This paper reviews the state of the art of malware, software vulnerabilities and software security mechanism, and analyzes the new challenges and trends that the software ecosystem is currently facing.
    2023,34(2):625-654, DOI:10.13328/j.cnki.jos.006696
    [Abstract] (2329) [HTML] (2841) [PDF 3.04 M] (17404)
    Abstract:
    Source code bug (vulnerability) detection is the process of judging whether there are unexpected behaviors in program code. It is widely used in software engineering tasks such as software testing and software maintenance, and plays a vital role in software functional assurance and application security. Traditional vulnerability detection research is based on program analysis, which usually requires strong domain knowledge and complex calculation rules and faces the problem of state explosion, resulting in limited detection performance; there is considerable room for improvement in the rates of false positives and false negatives. In recent years, the vigorous development of the open source community has accumulated massive amounts of data centered on open source code. In this context, the feature learning capabilities of deep learning can automatically learn semantically rich code representations, thereby providing a new approach to vulnerability detection. This study collects the latest high-level papers in this field and systematically summarizes and explains the current methods from two aspects: vulnerability code datasets and deep learning vulnerability detection models. Finally, it summarizes the main challenges faced by research in this field and looks ahead to possible future research focuses.
    2016,27(3):691-713, DOI:10.13328/j.cnki.jos.004948
    [Abstract] (9322) [HTML] (1685) [PDF 2.43 M] (17403)
    Abstract:
    Learning to rank (L2R) techniques try to solve sorting problems using machine learning methods, and have been well studied and widely used in various fields such as information retrieval, text mining, personalized recommendation, and biomedicine. The main task of L2R-based recommendation algorithms is integrating L2R techniques into recommendation algorithms, and studying how to organize a large number of users and item features, build more suitable user models according to user preference requirements, and improve the performance and user satisfaction of recommendation algorithms. This paper surveys L2R-based recommendation algorithms in recent years, summarizes the problem definition, compares key technologies, and analyzes evaluation metrics and their applications. In addition, the paper discusses the future development trend of L2R-based recommendation algorithms.
    2009,20(3):567-582, DOI:
    [Abstract] (8276) [HTML] (0) [PDF 780.38 K] (17387)
    Abstract:
    Research on software quality models and software quality evaluation models has always been a hot topic in the area of software quality assurance and assessment. A great deal of research in China and abroad has been devoted to building software quality models and quality assessment models, and certain accomplishments have been achieved in these areas so far. In recent years, platform building and systematization have become the trends in developing basic software based on operating systems. Therefore, the quality evaluation of the foundational software platform has become an essential issue to be solved. This article analyzes and summarizes the current state of research on software quality models and software quality assessment models, focusing on summarizing and depicting the development of quality evaluation for foundational software platforms. It also briefly discusses the future development of research on quality assessment of foundational software platforms, aiming to establish a good foundation for it.