Current Issue
    2023,34(9):3941-3965, DOI: 10.13328/j.cnki.jos.006875
    Abstract:
Artificial intelligence systems are widely used to solve various real-world challenges in unprecedented ways and have become a core driving force for the development of human society. With the rapid spread of artificial intelligence systems across all walks of life, their trustworthiness is becoming a growing concern, mainly because the notion of trustworthiness established for traditional software systems cannot fully characterize that of artificial intelligence systems. Research on the trustworthiness of artificial intelligence systems is therefore urgently needed. A large number of relevant studies already exist, each focusing on different aspects, but a holistic and systematic understanding is still lacking. This study is a tertiary study that takes existing secondary studies as its research object. It aims to reveal the state of research on quality attributes and practices related to the trustworthiness of artificial intelligence systems and to establish a more comprehensive quality attribute framework for trustworthy artificial intelligence systems. The study collects, organizes, and analyzes 34 secondary studies published up to March 2022 and identifies 21 trustworthiness-related quality attributes, as well as measurement methods and assurance practices for trustworthiness. It finds that existing research focuses mainly on security and privacy, while other quality attributes have received far less extensive and in-depth attention. Furthermore, two research directions requiring interdisciplinary collaboration deserve more attention in future work. On the one hand, an artificial intelligence system is essentially a software system, and its trustworthiness as a software system is worth collaborative research by artificial intelligence and software engineering experts. On the other hand, artificial intelligence is part of humanity's exploration of machine anthropomorphism, and how to ensure the trustworthiness of machines in the social environment at the system level, for example, how to satisfy humanistic requirements, is worth collaborative research by artificial intelligence and social science experts.
    2023,34(9):3966-3980, DOI: 10.13328/j.cnki.jos.006874
    Abstract:
In recent years, intelligent computing frameworks have been widely applied as implementation tools in artificial intelligence (AI) engineering, and their reliability is key to the effectiveness of AI applications. However, assuring the reliability of intelligent computing frameworks is challenging. On the one hand, framework code iterates quickly, which makes testing difficult. On the other hand, unlike traditional software, intelligent computing frameworks involve a large number of tensor computations, and their coding practices lack the guidance of software engineering theory. Existing research mostly employs fuzz testing to discover defects, but such methods can only accurately discover specific fault types and offer developers little guidance toward software quality during development. Therefore, this study takes popular intelligent computing frameworks (TensorFlow, Baidu PaddlePaddle, etc.) as the research object, selects multiple change features to build datasets, and performs just-in-time defect prediction on intelligent computing frameworks at the commit level. Additionally, LDA is employed to mine code and commit messages for new features, and a random forest is adopted for prediction. Results show that the average AUC-ROC is 0.77 and that semantic information slightly improves prediction performance. Finally, this study leverages SHAP, an explainable machine learning method, to analyze how each feature influences the model's predictions. The findings are as follows. (1) The influence of basic features on the model conforms to traditional software development laws. (2) Code and semantic features in commit messages are important to the model's predictions. (3) The contribution of different features to the prediction model varies considerably across systems.
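A compact sketch may help picture the pipeline above. The following Python fragment trains a random forest on synthetic change features and evaluates AUC-ROC; the feature set (change metrics plus LDA topics mined from code and commit messages) is only indicated with placeholders, and the SHAP step uses the real `shap` API but is left commented out to keep the sketch self-contained. All names and data here are illustrative, not the paper's.

```python
# A minimal sketch of commit-level just-in-time defect prediction, assuming
# synthetic change features; the paper's real features would replace the
# random data used here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Illustrative change features: churn, files touched, developer experience,
# plus LDA topic weights mined from code and commit messages.
X = rng.random((1000, 8))
y = (X[:, 0] + 0.5 * X[:, 4] + 0.2 * rng.standard_normal(1000) > 0.9).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("AUC-ROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

# SHAP-style feature attribution (requires the `shap` package):
# import shap
# explainer = shap.TreeExplainer(clf)
# shap_values = explainer.shap_values(X_te)  # per-feature contributions
```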
    2023,34(9):3981-4002, DOI: 10.13328/j.cnki.jos.006877
    Abstract:
The development of deep learning technology has driven rapid progress in autonomous driving. Although the accuracy of deep learning-based perception models keeps improving, hidden dangers related to robustness and reliability remain, so thorough testing under various scenes is needed to ensure acceptable security levels. Scene-based simulation testing is crucial in autonomous driving technology, and one key challenge is to describe and generate diversified simulation testing scenes. Scenario description languages can describe autonomous driving scenes and instantiate them in virtual environments to obtain simulation data. However, most existing scenario description languages cannot provide high-level abstractions and descriptions of the road structure of a scene. This study presents a road network property graph to represent the abstracted entities and their relations within a road network. It also introduces SceneRoad, a language specifically designed to provide concise and expressive descriptions of the road structure in a scene. SceneRoad builds a road network feature query graph from the described road structure features of a scene, so the problem of searching for road structures in the road network is abstracted as a subgraph matching problem on the property graph, which can be solved by the VF2 algorithm. Additionally, SceneRoad is incorporated as an extension into the Scenic scenario description language. With this extended language, a diverse set of static scenes is generated to build a simulation dataset, and statistical analysis confirms the wide variety of the generated scenes. The results of training and testing various perception models on both real and simulated datasets show that model performance on the two datasets is positively correlated, which indicates that evaluation on the simulated dataset aligns with performance in real scenes. This is significant for perception model evaluation and for research on improving model robustness and safety.
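Casting road-structure search as subgraph matching can be pictured with a toy sketch. The following Python fragment uses the VF2 implementation in networkx on an invented property graph; the `kind` labels and the T-junction query are illustrative assumptions, not SceneRoad's actual schema.

```python
# A minimal sketch of road-structure search as VF2 subgraph matching on a
# property graph, assuming invented node labels.
import networkx as nx
from networkx.algorithms import isomorphism

# Road network property graph: a T-junction connecting three roads.
G = nx.Graph()
G.add_node("j1", kind="junction")
for r in ("r1", "r2", "r3"):
    G.add_node(r, kind="road")
    G.add_edge("j1", r)

# Query graph built from a scene description: a junction with two roads.
Q = nx.Graph()
Q.add_node("q_j", kind="junction")
Q.add_node("q_a", kind="road")
Q.add_node("q_b", kind="road")
Q.add_edges_from([("q_j", "q_a"), ("q_j", "q_b")])

matcher = isomorphism.GraphMatcher(
    G, Q, node_match=isomorphism.categorical_node_match("kind", None))
print(matcher.subgraph_is_monomorphic())            # True: structure exists
print(next(matcher.subgraph_monomorphisms_iter()))  # one concrete mapping
```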
    2023,34(9):4003-4017, DOI: 10.13328/j.cnki.jos.006878
    Abstract:
In recent years, deep neural networks (DNNs) have made great progress in the image field. However, studies show that DNNs are susceptible to interference from adversarial examples and exhibit poor robustness. By generating adversarial examples to attack a DNN, its robustness can be evaluated, and corresponding defense methods can then be adopted to improve it. Existing adversarial example generation methods still have defects, such as insufficient sparsity of the generated perturbations and excessive perturbation magnitude. This study proposes sparse perturbation based adversarial example generation (SparseAG), which generates relatively sparse, small-magnitude perturbations for image examples. Specifically, SparseAG first selects perturbation points iteratively based on the gradient of the loss function with respect to the input image to generate an initial adversarial example. In each iteration, candidate perturbation points are considered in descending order of gradient magnitude, and the perturbation that minimizes the loss function is added to the image. Secondly, a perturbation optimization strategy is applied to the initial perturbation scheme to improve the sparsity and authenticity of the adversarial example: each perturbation is refined according to its importance so as to escape local optima, further reducing redundant perturbation points and excess perturbation magnitude. The method is evaluated on the CIFAR-10 and ImageNet datasets in both targeted and untargeted attack scenarios. The experimental results show that SparseAG achieves a 100% attack success rate across datasets and attack scenarios, and both the sparsity and the overall magnitude of its perturbations are better than those of the comparison methods.
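The gradient-guided selection of perturbation points can be sketched in a few lines. The following PyTorch fragment is a simplified, single-pass illustration rather than SparseAG itself: it perturbs only the `k` coordinates with the largest loss-gradient magnitudes; `model`, `k`, and `eps` are assumed placeholders.

```python
# A minimal sketch of gradient-guided sparse perturbation for a generic
# PyTorch classifier; SparseAG's iterative search and its later
# perturbation-optimization stage are not reproduced here.
import torch
import torch.nn.functional as F

def sparse_perturb(model, x, label, k=10, eps=0.1):
    """Perturb only the k input coordinates with the largest loss gradients."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x.unsqueeze(0)), torch.tensor([label]))
    loss.backward()
    g = x.grad.abs().flatten()
    idx = g.topk(k).indices                            # candidate perturbation points
    x_adv = x.detach().clone().flatten()
    x_adv[idx] += eps * x.grad.flatten()[idx].sign()   # step that increases the loss
    return x_adv.view_as(x).clamp(0, 1)

# Usage (assumed classifier and image tensor):
# x_adv = sparse_perturb(model, image, true_label)
```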
    2023,34(9):4018-4036, DOI: 10.13328/j.cnki.jos.006879
    Abstract:
The development of artificial intelligence (AI) technology provides strong support for AI systems based on source code processing. Compared with natural language, source code is special in its semantic space, and machine learning tasks on source code usually employ abstract syntax trees, data dependency graphs, and control flow graphs to obtain structured information and extract features. Through in-depth analysis of source code structures and flexible application of classifiers, existing studies achieve excellent results in experimental settings. However, in real application scenarios where source code structures are more complex, most AI systems for source code processing perform poorly and are difficult to deploy in industry, which prompts practitioners to question the robustness of such systems. As AI-based systems are generally data-driven black boxes, their robustness is difficult to measure directly. With emerging adversarial attack techniques, some scholars in natural language processing have designed adversarial attacks for different tasks to verify model robustness and have conducted large-scale empirical studies. To address the instability of AI systems based on source code processing in complex code scenarios, this study proposes robustness verification by Metropolis-Hastings attack method (RVMHM). Firstly, an abstract syntax tree-based code preprocessing tool extracts the variable pool of the model, and then the MHM source code attack algorithm perturbs variables to change the model's predictions. The robustness of an AI system is measured by observing changes in the robustness verification metric before and after the attack, which interferes with the interaction between data and model. Taking vulnerability prediction as a typical binary classification scenario of source code processing, this study verifies the robustness of 12 groups of AI vulnerability prediction models on three open source project datasets to illustrate the effectiveness of RVMHM in verifying the robustness of AI systems based on source code processing.
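The Metropolis-Hastings style of the attack can be conveyed with a minimal sketch, assuming a `predict_prob(code)` callable that returns the model's confidence in the correct label; the naive string-based renaming and the acceptance rule below are illustrative simplifications, not the paper's implementation.

```python
# A minimal sketch of MHM-style identifier renaming driven by a
# Metropolis-Hastings acceptance rule; all helpers are assumed.
import math
import random

def mhm_attack(code, variables, pool, predict_prob, steps=100, temp=1.0):
    """Metropolis-Hastings search over identifier renamings (simplified)."""
    current = code
    for _ in range(steps):
        var = random.choice(variables)
        new = random.choice(pool)
        proposal = current.replace(var, new)   # naive textual rename
        p_cur, p_new = predict_prob(current), predict_prob(proposal)
        # Accept with higher probability when confidence in the true label drops.
        alpha = min(1.0, math.exp((p_cur - p_new) / temp))
        if random.random() < alpha:
            current = proposal
            if p_new < 0.5:                    # prediction flipped: attack succeeded
                break
    return current
```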
    2023,34(9):4037-4055, DOI: 10.13328/j.cnki.jos.006872
    Abstract:
In recent years, deep neural networks have been widely employed in real decision-making systems. Unfairness in such systems exacerbates social inequality and harms society, so researchers have begun to study the fairness of deep learning systems extensively. However, most studies focus on group fairness and cannot guarantee fairness within groups. To this end, this study defines two individual fairness measures. The first is the individual fairness rate IFRb, based on output labels: the probability that two similar samples receive the same predicted label. The second is the individual fairness rate IFRp, based on output distributions: the probability that two similar samples receive similar predicted output distributions; the latter is a stricter notion of individual fairness. In addition, this study proposes an algorithm, IIFR, to improve the individual fairness of such models. The algorithm employs cosine similarity to measure the similarity between samples and selects similar sample pairs via an application-dependent similarity threshold. During training, the output difference of each similar sample pair is added to the objective function as an individual fairness loss term, which penalizes similar training samples with large differences in model output and thereby improves the individual fairness of the model. The experimental results show that IIFR outperforms state-of-the-art methods in improving individual fairness and maintains group fairness while doing so.
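The fairness loss term lends itself to a short sketch. The following PyTorch fragment penalizes output differences between sample pairs whose cosine similarity exceeds a threshold, in the spirit of the IIFR objective described above; the batch-wise pairing and L1 output distance are illustrative choices.

```python
# A minimal sketch of an individual-fairness loss term: pairs above a
# cosine-similarity threshold are penalized for divergent model outputs.
import torch
import torch.nn.functional as F

def individual_fairness_loss(x, logits, threshold=0.9):
    # Pairwise cosine similarity over the batch: (B, B).
    sim = F.cosine_similarity(x.unsqueeze(1), x.unsqueeze(0), dim=-1)
    mask = (sim > threshold).float()
    mask.fill_diagonal_(0)                    # ignore self-pairs
    probs = F.softmax(logits, dim=-1)
    diff = torch.cdist(probs, probs, p=1)     # pairwise output difference
    denom = mask.sum().clamp(min=1)
    return (mask * diff).sum() / denom

# Total objective, following the idea above (lam is an assumed weight):
# loss = F.cross_entropy(logits, y) + lam * individual_fairness_loss(x, logits)
```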
    2023,34(9):4056-4068, DOI: 10.13328/j.cnki.jos.006873
    Abstract:
The recent boom in artificial intelligence (AI) benefits from the open collaboration of the open source software (OSS) community. An increasing number of OSS developers contribute to AI projects by submitting pull requests (PRs). However, the quality of PRs submitted by external contributors varies, and AI project management teams have to review PRs and ask contributors to revise them if necessary. Since revisions directly affect review efficiency and PR acceptance, a better understanding of PR revisions is important. This study conducts an empirical study on a set of PRs and their revision histories collected from the TensorFlow project. It first manually analyzes a sample of commit messages and PR review comments and constructs a taxonomy of revision types. Then, according to the defined taxonomy, a large set of PR revisions is manually labeled. Based on this dataset, the study explores the frequency and order of each revision type and investigates the frequency distribution, order distribution, and correlations between different types of revisions. The empirical findings show that there are 11 different types of revisions, which fall into three categories. Evolvability revisions occur more frequently than other revision types, and functional revisions are more likely than evolvability and other revisions to occur in early PR updates. Structure-related revisions have a high chance of co-occurring with, or occurring adjacent to, other revisions, while configuration-related and rebasing revisions are more likely to appear in succession. These results can help AI open source practitioners and researchers better understand the PR revision process, and in particular can guide PR review and revision behaviors and improve the collaborative efficiency of open source communities.
Online First
Available online: September 20, 2023, DOI: 10.13328/j.cnki.jos.006838
Abstract:
Attendance tracking may serve private purposes not associated with any organization, such as keeping a personal travel log, or business needs, where it is part of organizational attendance and sometimes involves multiple organizations. The recording, sharing, and analysis of attendance data therefore require elaborate management. The HAO attendance system is a lightweight, mobile attendance platform. It takes the user and the organization as its two starting points and is driven by HAO intelligence, which consists of human intelligence (HI), artificial intelligence (AI), and organizational intelligence (OI). This study builds the knowledge graph of the HAO attendance system and puts forward its closed-loop permission management structure, supplemented by a privacy permission management method that ranges from coarse-grained to fine-grained levels, so as to ensure refined attendance management, protect users' privacy, and promote the intelligent transformation of a new generation of attendance systems. For organizational attendance analysis, a four-element scoring method and a four-element attendance reporting method are designed to calculate employee attendance scores, generate accurate and comprehensive attendance reports, provide decision-making support for organizations, and stimulate the vitality of both organizations and individuals, thereby building intelligent organizations with organizational intelligence.
Available online: September 20, 2023, DOI: 10.13328/j.cnki.jos.006955
Abstract:
Detecting the human respiration waveform during sleep is crucial for smart healthcare and medical applications, since different respiration waveform patterns can be examined to analyze sleep quality and monitor respiratory diseases. Traditional respiration sensing methods based on contact devices cause various inconveniences to users, whereas contactless sensing methods are more suitable for continuous monitoring. However, the randomness of device deployment, sleep posture, and human motion during sleep severely restricts the application of contactless respiration sensing in daily life. For this reason, the study proposes a method for detecting the human respiration waveform during sleep based on impulse radio ultra-wideband (IR-UWB). Exploiting the periodic changes that chest movement during respiration induces in the propagation path of the wireless pulse signal, the proposed method generates a fine-grained human respiration waveform, achieving real-time output of the respiration waveform and high-precision respiratory rate estimation. Specifically, to obtain the position of the human chest from the received radio-frequency (RF) signals, this study proposes a respiration energy ratio indicator based on IR-UWB signals to estimate the target position. It then puts forward a vector projection method based on the in-phase/quadrature (I/Q) complex plane and a projection signal selection method based on the circumferential position of the respiration vector to extract the characteristic human respiration waveform from the reflected signal. Finally, a variational encoder-decoder network is leveraged to achieve fine-grained recovery of the respiratory waveform during sleep. Extensive experiments under different conditions show that the respiration waveforms monitored by the proposed method during sleep are highly similar to the actual waveforms captured by commercial respiratory belts. The average error in estimating the respiratory rate is 0.229 bpm, indicating that the method achieves high-precision detection of the human respiration waveform during sleep.
Available online: September 20, 2023, DOI: 10.13328/j.cnki.jos.006956
Abstract:
Detecting samples that fall outside the training distribution, i.e., out-of-distribution (OOD) samples, is essential for a safe and reliable machine learning system. Likelihood-based generative models are popular for OOD detection because they require no sample labels during training. However, recent studies show that likelihoods sometimes fail to detect OOD samples, and the reasons for this failure and its solutions remain underexplored, especially for text data. Therefore, this study investigates the failure on text from the perspectives of the model and the data: insufficient generalization of the generative model and prior probability bias of the text. To tackle these problems, the study proposes a new OOD text detection method named Pobe. To address insufficient generalization, it increases model generalization via KNN retrieval. To address the prior probability bias, it designs a strategy that calibrates the bias and reduces its influence on OOD detection with the help of a pre-trained language model, and it demonstrates the effectiveness of the strategy via Bayes' theorem. Experimental results over a wide range of datasets show the effectiveness of the proposed method: the average AUROC exceeds 99%, and FPR95 stays below 1% across eight datasets.
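The prior-calibration idea admits a one-line score, sketched below under strong simplifications: the text's prior probability is approximated by a pre-trained language model and divided out of the generative model's likelihood, in the Bayes-motivated spirit described above. `log_p_model` and `log_p_lm` are assumed callables, and Pobe's KNN retrieval step is omitted.

```python
# A minimal sketch of likelihood scoring with prior calibration; both
# log-probability functions are assumed to be provided by trained models.
def ood_score(x, log_p_model, log_p_lm):
    """Higher score -> more likely in-distribution (ID); lower -> OOD."""
    # Bayes-motivated calibration: dividing by p(x), here approximated by a
    # pre-trained language model, cancels the text's prior probability bias.
    return log_p_model(x) - log_p_lm(x)

def detect(texts, log_p_model, log_p_lm, tau):
    """Threshold the calibrated score to label each text ID or OOD."""
    return ["ID" if ood_score(t, log_p_model, log_p_lm) > tau else "OOD"
            for t in texts]
```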
Available online: September 13, 2023, DOI: 10.13328/j.cnki.jos.006959
Abstract:
The openness and ease of use of Python make it one of the most commonly used programming languages, and the PyPI ecosystem formed around Python not only provides convenience for developers but also becomes an important target for vulnerability attacks. Thus, once a Python vulnerability is discovered, it is critical to assess its impact scope accurately and comprehensively. However, current assessment methods for the impact scope of Python vulnerabilities mainly rely on dependency analysis at package granularity, which produces a large number of false positives, while existing function-granularity Python program analysis methods suffer accuracy problems due to context insensitivity and likewise produce false positives when applied to vulnerability impact assessment. This study proposes PyVul++, a static-analysis-based method for assessing vulnerability impact scope across the PyPI ecosystem. It first builds an index of the PyPI ecosystem, then finds candidate packages affected by a vulnerability through vulnerable function identification, and finally confirms affected packages via vulnerability triggering conditions. PyVul++ achieves vulnerability impact assessment at function granularity, improves function-level call analysis for Python code, and outperforms other tools on the PyCG benchmark (accuracy of 86.71% and recall of 83.20%). Applied to assess the impact of 10 Python CVE vulnerabilities on the PyPI ecosystem (385,855 packages), PyVul++ finds more affected packages and reduces false positives compared with other tools such as pip-audit. In addition, in these 10 assessments, PyVul++ newly finds that 11 packages in the current PyPI ecosystem still reference unpatched vulnerable functions.
Available online: September 13, 2023, DOI: 10.13328/j.cnki.jos.006948
Abstract:
Forgetting is the biggest problem of artificial neural networks in incremental learning and is thus called "catastrophic forgetting". In contrast, humans can continuously acquire new knowledge while retaining most of the frequently used old knowledge. This continuous "incremental learning" ability of humans, largely free of forgetting, is related to the partitioned learning structure and memory replay ability of the human brain. To simulate this structure and ability, the study proposes an incremental learning approach of "recency bias-avoiding, self-learning mask (SLM)-based partitioned incremental learning", or ASPIL for short. ASPIL involves two alternately iterated stages, regional isolation and regional integration, which together accomplish continuous incremental learning. Specifically, this study proposes "Bayesian network (BN)-based sparse regional isolation" to isolate new learning from existing knowledge and thereby avoid interference with it. For regional integration, the SLM and dual-branch fusion (GBF) methods are proposed: SLM accurately extracts new knowledge and improves the adaptability of the network to new knowledge, while GBF integrates old and new knowledge toward unified, high-precision cognition. During training, a margin-loss regularization term is proposed to avoid "recency bias", further balancing old knowledge against the bias toward newly learned knowledge. To evaluate the effectiveness of the proposed method, systematic ablation experiments are performed on the standard incremental learning datasets CIFAR-100 and miniImageNet, and the method is compared with a series of well-known state-of-the-art methods. The experimental results show that the proposed method improves the memory ability of artificial neural networks and outperforms the latest well-known methods by more than 5.27% in average recognition rate.
Available online: September 13, 2023, DOI: 10.13328/j.cnki.jos.006949
Abstract:
Deep neural networks can be affected by well-designed backdoor attacks during training. Such an attack controls the model output at test time by injecting data carrying backdoor triggers into the training set: the attacked model performs normally on a clean test set, but inputs are misclassified as the attack target class when the backdoor trigger is recognized. Currently available backdoor attack methods have limited invisibility, and their attack success rates still leave room for improvement. A backdoor attack method based on singular value decomposition is proposed to address these limitations. The method can be implemented in two ways. One is to directly set some singular values of an image to zero; the resulting image is compressed to a certain extent and can serve as an effective backdoor trigger. The other is to inject singular vector information of the attack target class into the left and right singular vectors of the image, which also achieves an effective backdoor attack. The backdoor images obtained in both ways are visually almost indistinguishable from the originals. Experiments show that singular value decomposition can be effectively leveraged in backdoor attack algorithms, attacking neural networks with considerably high success rates on multiple datasets.
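The first variant is straightforward to sketch. The following numpy fragment zeroes the tail of an image's singular values, producing a slightly compressed image of the kind described above; the keep fraction and the random stand-in image are illustrative.

```python
# A minimal sketch of the first variant described above: zeroing part of an
# image's singular values to obtain a visually similar, slightly compressed
# image that can serve as a backdoor trigger. Grayscale for simplicity.
import numpy as np

def svd_trigger(img, keep=0.7):
    """Keep only the largest `keep` fraction of singular values."""
    U, s, Vt = np.linalg.svd(img.astype(float), full_matrices=False)
    k = max(1, int(keep * len(s)))
    s[k:] = 0.0                                   # drop the tail
    return np.clip(U @ np.diag(s) @ Vt, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, (32, 32), dtype=np.uint8)  # stand-in image
poisoned = svd_trigger(img)
print(np.abs(img.astype(int) - poisoned.astype(int)).mean())  # small change
```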
Available online: September 13, 2023, DOI: 10.13328/j.cnki.jos.006964
Abstract:
Domain names play an important role in cybercrime. Existing malicious domain name detection methods not only struggle to exploit rich topology and attribute information but also require large amounts of labeled data, which limits detection performance and raises costs. To address this problem, this study proposes a malicious domain name detection method based on graph contrastive learning. Domain names and IP addresses are taken as two types of nodes in a heterogeneous graph, and the feature matrix of the corresponding nodes is built from their attributes. Three types of meta paths are constructed based on the inclusion relationships between domain names, their similarity, and the correspondence between domain names and IP addresses. In the pre-training stage, a contrastive learning model based on an asymmetric encoder is applied, which avoids the damage that graph data augmentation operations cause to graph structure and semantics and reduces the demand for computing resources. Using the inductive graph neural network encoders HeteroSAGE and HeteroGAT with a node-centric mini-batch training strategy, the method explores the aggregation relationship between target nodes and their neighbors, which solves the poor applicability of transductive graph neural networks such as GCN in dynamic scenarios. The downstream classification task compares logistic regression and random forest algorithms. Experimental results on publicly available datasets show that detection performance improves by two to six percentage points over related work.
Available online: September 13, 2023, DOI: 10.13328/j.cnki.jos.006945
Abstract:
Jacobi computation is a kind of stencil computation that has been widely applied in the field of scientific computing. The performance optimization of Jacobi computation is a classic topic, and loop tiling is an effective optimization method. Existing loop tiling methods mainly focus on the impact of tiling on parallel communication and program locality while neglecting factors such as load balancing and vectorization. This study analyzes and compares several tiling methods on multi-core architectures and chooses an advanced hexagonal tiling as the main method to accelerate Jacobi computation. For tile size selection, the study proposes a hexagonal tile size selection algorithm, Hexagon_TSS, which comprehensively considers the influence of tiling on load balancing, vectorization efficiency, and locality. The experimental results show that in the best case the L1 data cache miss rate is reduced to 5.46% of that of the original serial computation, and the maximum speedup reaches 24.48. The proposed method also has excellent scalability.
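Loop tiling itself can be illustrated with a toy example. The following Python fragment blocks the space loop of a 1D Jacobi stencil into fixed-size tiles; it shows only the basic tiling idea, not the hexagonal time-space tiling or the Hexagon_TSS size selection studied in the paper.

```python
# A minimal sketch of loop tiling for a 1D Jacobi stencil, using simple
# rectangular space tiles; the paper's hexagonal time-space tiling is far
# more sophisticated.
def jacobi_tiled(a, steps, tile=64):
    n = len(a)
    b = a[:]
    for _ in range(steps):                 # time loop (untiled here)
        for t0 in range(1, n - 1, tile):   # iterate over space tiles
            for i in range(t0, min(t0 + tile, n - 1)):
                b[i] = 0.5 * (a[i - 1] + a[i + 1])   # 3-point stencil
        a, b = b, a                        # swap buffers
    return a

print(jacobi_tiled([0.0] * 9 + [1.0], steps=4)[5:])
```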
Available online: September 13, 2023, DOI: 10.13328/j.cnki.jos.006946
Abstract:
In the fields of autonomous driving, augmented reality, and intelligent mobile robots, visual relocalization is a crucial fundamental issue: determining position and attitude within an existing prior map from data captured in real time by visual sensors. In recent decades, visual relocalization has received extensive attention, and numerous prior map construction methods and visual relocalization methods have come to the fore. These efforts vary considerably and cover a wide scope, but technical overviews and summaries are still unavailable, so a survey of the field is valuable both theoretically and practically. This study seeks to construct a unified blueprint for visual relocalization methods and summarizes related work from the perspective of querying image data from large-scale map databases. It surveys various construction methods for map databases and different feature matching, relocalization, and pose calculation approaches, then summarizes the current mainstream datasets for visual relocalization, and finally analyzes the challenges ahead and the potential development directions of visual relocalization.
Available online: September 13, 2023, DOI: 10.13328/j.cnki.jos.006947
Abstract:
Software change prediction, which aims to identify change-prone modules, can help software managers and developers allocate resources efficiently and reduce maintenance overhead. Extracting effective features from code plays a vital role in building accurate prediction models. In recent years, researchers have shifted from traditional hand-crafted features to semantic features with powerful representation capabilities, extracting semantic features from abstract syntax tree (AST) node sequences to build prediction models. However, existing studies ignore the structural information in the AST and the rich semantic information in the code, and how to extract semantic code features remains a challenging problem. For this reason, the study proposes a change prediction method based on hybrid graph representation. The model first combines the AST, control flow graph (CFG), data flow graph (DFG), and other structural information to construct a program graph representation of the code; it then uses a graph neural network to learn semantic features of the program graph and uses the obtained features to predict change-proneness. The model can integrate various kinds of semantic information to represent code better. Comparisons with the latest change prediction methods on various change datasets verify the effectiveness of the proposed method.
Available online: September 13, 2023, DOI: 10.13328/j.cnki.jos.006928
Abstract:
Detecting out-of-distribution (OOD) samples outside the training set distribution is crucial for deploying deep neural network (DNN) classifiers in open environments. OOD sample detection is a binary classification problem that classifies input samples into in-distribution (ID) or OOD categories. However, the detector itself can in turn be bypassed by malicious adversarial attacks, and OOD samples with malicious perturbations are called adversarial OOD samples. Building robust OOD detectors that can detect adversarial OOD samples is therefore more challenging. Existing methods usually train a DNN on adversarial OOD samples generated within the neighborhood of auxiliary clean OOD samples to learn representations that are separable and robust to malicious perturbations. However, due to the distributional difference between the auxiliary OOD training set and the original ID training set, training on such adversarial OOD samples is not effective enough to ensure the robustness of the ID boundary against adversarial perturbations. Adversarial ID samples generated within the neighborhood of (clean) ID samples are closer to the ID boundary and are also effective in improving its adversarial robustness. This study proposes DiTing, a semi-supervised adversarial training approach for building robust OOD detectors that detect both clean and adversarial OOD samples. It treats adversarial ID samples as auxiliary near-OOD samples and trains on them jointly with other auxiliary clean and adversarial OOD samples to improve the robustness of OOD detection. Experiments show that DiTing has a significant advantage in detecting adversarial OOD samples generated by strong attacks while maintaining state-of-the-art performance in classifying clean ID samples and detecting clean OOD samples. Code is available at https://gitee.com/zhiyang3344/diting.
Available online: September 06, 2023, DOI: 10.13328/j.cnki.jos.006960
Abstract:
Thanks to its low storage cost and high retrieval speed, graph-based unsupervised cross-modal hashing has attracted much attention from academia and industry and has become an indispensable tool for cross-modal retrieval. However, the high computational complexity of graph structures prevents its application in large-scale multi-modal settings. This study mainly tackles two important challenges facing graph-based unsupervised cross-modal hashing: 1) how to construct graphs efficiently in unsupervised cross-modal hash learning, and 2) how to handle the discrete optimization in cross-modal hash learning. To address these two problems, the study presents anchor-based cross-modal learning and a differentiable hash layer. Specifically, the study first randomly samples some image-text pairs from the training set as an anchor set and uses the anchors as a proxy to compute the graph matrix of each batch of data; the graph matrix then guides cross-modal hash learning, remarkably reducing space and time costs. Second, the proposed differentiable hash layer directly uses binary codes for computation during forward propagation and produces gradients to update the network during backpropagation without continuous-value relaxation, which yields better hash encoding performance. Finally, the study introduces a cross-modal ranking loss so that ranking results are considered during training, improving cross-modal retrieval accuracy. To verify the effectiveness of the proposed algorithm, the study compares it with 10 cross-modal hashing algorithms on three common datasets.
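The differentiable hash layer described above matches the familiar straight-through pattern, sketched below in PyTorch: the forward pass emits exact binary codes while the backward pass lets gradients through unchanged. This illustrates the general technique under that assumption, not the paper's exact formulation.

```python
# A minimal sketch of a differentiable hash layer via the straight-through
# trick: binary codes in the forward pass, identity gradients in the backward.
import torch

class HashLayer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)          # exact {-1, +1} codes in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output            # straight-through: pass gradients as-is

features = torch.randn(4, 16, requires_grad=True)
codes = HashLayer.apply(features)     # binary codes used for the loss
codes.sum().backward()                # gradients still reach `features`
print(codes[0], features.grad is not None)
```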
Available online: September 06, 2023, DOI: 10.13328/j.cnki.jos.006963
Abstract:
The aspect-level sentiment classification task, which aims to determine the sentiment polarity toward a given aspect, has attracted increasing attention due to its broad applications. The key to this task is to identify contextual descriptions relevant to the given aspect and to predict the author's aspect-related sentiment orientation from the context. Statistically, close to 30% of reviews convey a clear sentiment orientation without any explicit sentiment description of the given aspect, which is called implicit sentiment expression. Recent attention-based neural network methods have achieved great success in sentiment analysis, but such methods capture only explicit aspect-related sentiment descriptions and fail to effectively explore and analyze implicit sentiment; moreover, they often model aspect words and sentence context separately, so the representation of aspect words lacks contextual semantics. To solve these two problems, this study proposes an aspect-level sentiment classification method that integrates local aspect information with global sentence context information and improves classification performance through curriculum learning that accounts for the different classification difficulties of implicit and explicit sentiment sentences. Experimental results show that the proposed method not only achieves high accuracy in identifying the aspect-related sentiment orientation of explicit sentiment sentences but also effectively learns the sentiment categories of implicit sentiment sentences.
Available online: September 06, 2023, DOI: 10.13328/j.cnki.jos.006969
Abstract:
As an essential element of real-time system design, priority is utilized to resolve conflicts in resource sharing and to design for safety. In real-time systems that introduce priorities, each task is assigned a priority, so low-priority tasks may be preempted by high-priority tasks at runtime, creating a preemptive scheduling problem. Existing research on this problem lacks a modeling and automatic verification method that can visually represent task priorities and inter-task dependencies. To this end, preemptive priority timed automata (PPTA) are proposed and a preemptive priority timed automata network (PPTAN) is introduced. First, the priority of a task is represented by attaching priorities to transitions of the timed automaton, and transitions are used to link tasks with dependencies, so PPTA can model real-time tasks with priorities. Blocking locations are also added to the timed automaton, so PPTAN can model the priority-based preemptive scheduling problem. Second, a model-based transformation method is proposed to map PPTA to the automatic verification tool UPPAAL. Finally, by modeling an example of a multi-core multi-task real-time system and comparing it with other models, it is shown that the proposed model is not only suitable for modeling the priority-based preemptive scheduling problem but also supports its accurate verification and analysis.
Available online: September 06, 2023, DOI: 10.13328/j.cnki.jos.006979
Abstract:
When prototypical networks are directly applied to few-shot named entity recognition (NER), the following problems arise: non-entities do not have strong semantic relationships with each other, and constructing prototypes for entities and non-entities in the same way makes non-entity prototypes fail to accurately represent the semantic characteristics of non-entities; moreover, using only the average entity vector to compute a prototype makes it difficult to capture similar entities with different semantic features. To address these problems, this study proposes a few-shot NER method based on fine-grained prototypical networks (FNFP) to improve annotation quality. Firstly, different non-entity prototypes are constructed for different query sets to capture the key semantic features of non-entities in sentences and to obtain finer-grained prototypes that improve non-entity recognition. Then, an inconsistency metric module is designed to measure the inconsistency between similar entities, and different metric functions are applied to entities and non-entities, so as to reduce the differences in feature representation between similar samples and improve the feature representation of the prototypes. Finally, a Viterbi decoder is introduced to capture label transition relationships and optimize the final annotation sequence. The experimental results show that the proposed method outperforms baseline methods on the large-scale few-shot NER dataset FEW-NERD, and its generalization ability across different domain scenarios is verified on a cross-domain dataset.
Available online: September 06, 2023, DOI: 10.13328/j.cnki.jos.006980
Abstract:
A large number of bug reports are generated during software development and maintenance, and they can help developers locate bugs. Information retrieval based bug localization (IRBL) analyzes the similarity between bug reports and source code files to locate bugs and achieves high accuracy at the file and function levels. However, because of IRBL's coarse granularity, considerable labor and time are still spent finding bugs within suspicious files and function fragments. This study proposes STMTLocator, a statement-level bug localization method based on historical bug information retrieval. Firstly, it retrieves historical bug reports similar to the bug report of the program under test and extracts the bug statements from those historical reports. Then, it retrieves suspicious files according to the textual similarity between source code files and the bug report of the program under test, and extracts suspicious statements from those files. Finally, it computes the similarity between suspicious statements and historical bug statements and ranks the suspicious statements in descending order to localize bug statements. To evaluate the localization performance of STMTLocator, comparative experiments are conducted on the Defects4J and JIRA datasets with Top@N, MRR, and other evaluation metrics. The experimental results show that STMTLocator achieves nearly four times the MRR of the static bug localization method BugLocator and locates 7 more bug statements at Top@1. The average time for STMTLocator to localize a buggy version is 98.37% and 63.41% less than that of the dynamic bug localization methods Metallaxis and DStar, respectively, and STMTLocator has the significant advantage of not requiring the construction and execution of test cases.
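The statement-ranking step can be pictured with a small sketch. The following Python fragment ranks suspicious statements by TF-IDF cosine similarity against historical bug statements; the two-statement corpus and the choice of TF-IDF are illustrative assumptions, and the file-level retrieval stage is omitted.

```python
# A minimal sketch of ranking suspicious statements by textual similarity to
# historical bug statements; corpus and statements are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

historical_bug_statements = [
    "list index out of range in loop",
    "null reference before initialization",
]
suspicious_statements = [
    "for i in range(len(items) + 1): total += items[i]",
    "print('done')",
]

vec = TfidfVectorizer()
m = vec.fit_transform(historical_bug_statements + suspicious_statements)
hist, susp = m[:2], m[2:]
scores = cosine_similarity(susp, hist).max(axis=1)   # best match per statement
for score, stmt in sorted(zip(scores, suspicious_statements), reverse=True):
    print(f"{score:.3f}  {stmt}")                    # descending suspiciousness
```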
Available online: September 06, 2023, DOI: 10.13328/j.cnki.jos.006950
    Abstract:
    Blockchain is the basis of the Internet of value. However, data and value silos arise from independent blockchain systems. Blockchain interoperability (also known as cross-chain operability) is essential for breaking inter-chain barriers and building a blockchain network. After differentiating between the blockchain interoperability in the narrow sense and that in the broad sense, this study redefines the former concept and abstracts out two primary operations: cross-chain reading and cross-chain writing. Subsequently, it summarizes three key technical problems that need to be resolved for achieving the blockchain interoperability in the narrow sense: cross-chain information transmission, cross-chain trust transfer, and cross-chain operation atomicity guarantee. Then, the study reviews the current research status of the three problems systematically and makes comparisons from multiple perspectives. Furthermore, it analyzes some representative holistic solutions from the perspective of the key technical problems. Finally, several research directions deserving of further exploration are also presented.
Available online: August 30, 2023, DOI: 10.13328/j.cnki.jos.006961
Abstract:
Fault localization collects and analyzes the runtime information of test case sets to evaluate how suspicious each statement is of being faulty. A test case set is constructed from data in the input domain and contains two types of test cases: passing ones and failing ones. Since failing test cases generally account for a very small, randomly distributed portion of the input domain, the number of failing test cases is much smaller than that of passing ones. Previous work has shown that this lack of failing test cases makes test case sets class-imbalanced, which severely hampers fault localization effectiveness. To address this problem, this study proposes a model-domain data augmentation approach for fault localization using generative adversarial networks. Working in the model domain (i.e., the spectrum information used in fault localization) rather than the traditional input domain (i.e., program inputs), the approach uses a generative adversarial network to synthesize model-domain failing test cases that cover the minimum suspicious set, thereby addressing the class imbalance from the model domain. The experimental results show that the proposed approach significantly improves the effectiveness of 12 representative fault localization approaches.
Available online: August 30, 2023, DOI: 10.13328/j.cnki.jos.006962
Abstract:
With the rapid development of Internet information technologies, the explosive growth of online learning resources has caused "information overload" and "learning disorientation". Without expert guidance, it is difficult for users to identify their learning demands and select appropriate content from a vast amount of learning resources. Recommendation methods for the educational domain have therefore received much attention in recent years, as they can provide personalized learning resource recommendations based on users' historical learning behaviors. However, existing methods neglect the complex relationships among knowledge points when perceiving learning demands and fail to consider the dynamic changes of those demands, which leads to inaccurate recommendations. To address these problems, this study proposes a knowledge point recommendation method based on static and dynamic learning demand perception, which models users' learning behaviors under complex knowledge associations by combining static and dynamic perception. For static learning demand perception, the study innovatively designs an attentional graph convolutional network guided by prerequisite-successor meta-paths between knowledge points, which captures users' static learning demands at the fine-grained knowledge point level by modeling the complex constraints of prerequisite-successor relationships while eliminating interference from factors unrelated to learning demands. For dynamic learning demand perception, the method aggregates knowledge point embeddings course by course to characterize users' knowledge levels at different moments and then encodes the sequences of users' knowledge levels with a recurrent neural network, effectively uncovering the dynamic learning demands hidden in the changes of users' knowledge levels. Finally, the study fuses the static and dynamic learning demands, modeling their compatibility in the same framework and promoting their complementarity, so as to achieve fine-grained, personalized knowledge point recommendation. Experiments on two publicly available datasets show that the proposed method effectively perceives users' learning demands, provides personalized knowledge point recommendations, and outperforms mainstream recommendation methods on various evaluation metrics.
Available online: August 30, 2023, DOI: 10.13328/j.cnki.jos.006923
Abstract:
Kernel heap vulnerabilities are currently one of the main threats to operating system security. A user-space attacker can leak or modify sensitive kernel information, disrupt kernel control flow, and even gain root privilege by triggering such a vulnerability. However, because vulnerabilities are growing rapidly in number and complexity, it often takes a long time from when a vulnerability is first reported to when the developer issues a patch, and the kernel mitigation mechanisms currently in place are routinely bypassed. Therefore, this study proposes an eBPF-based dynamic mitigation framework for kernel heap vulnerabilities, so as to reduce kernel security risks during the patching time window. The framework adopts data object space randomization to assign random addresses to the data objects involved in vulnerability reports at each allocation, and it takes full advantage of the dynamic and secure features of eBPF to inject space-randomized objects into the kernel at runtime, so an attacker cannot place attack payloads accurately, and the heap vulnerabilities become almost unexploitable. This study evaluates 40 real kernel heap vulnerabilities and collects 12 attacks that bypass existing mitigation mechanisms for further analysis and testing, verifying that the framework provides sufficient security. Performance tests show that even under severe conditions, the four types of protected data objects cause a performance loss of only about 1% and negligible memory overhead, and additional protected objects introduce almost no further loss. Compared with related work, the proposed mechanism has a wider scope of application and stronger security, requires no vulnerability patches issued by security experts, and can generate mitigation procedures directly from vulnerability reports, giving it broad application prospects.
Available online: August 30, 2023, DOI: 10.13328/j.cnki.jos.006925
Abstract:
Regular expressions are widely used in various areas of computer science. However, owing to their complex syntax and heavy use of meta-characters, regular expressions are quite error-prone to define and use. Testing is a practical and effective way to ensure their semantic correctness. The most common approach generates a set of strings from the expression under test and checks whether they belong to the intended language. Most existing test data generation methods focus only on positive strings, yet empirical studies show that a majority of errors in actual development manifest as the defined language being smaller than the intended one, and such errors can only be detected by negative strings. This study investigates mutation-based generation of negative strings from regular expressions. It first obtains a set of mutants by injecting defects into the expression under test and then, for each mutant, selects a negative string from the complement of the language defined by the expression under test, so as to reveal the error the mutant simulates. To simulate complex defects and to avoid the situation in which no negative string can be obtained because a mutant specializes the language, a second-order mutation mechanism is adopted. Meanwhile, optimization techniques such as redundant mutant elimination and mutation operator selection are used to reduce the number of mutants and thereby control the size of the generated test set. The experimental results show that, compared with existing tools, the proposed algorithm generates negative test sets of moderate size with strong error detection ability.
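The mutate-then-discriminate idea can be shown in miniature. The following Python sketch applies two toy first-order mutation operators and enumerates short strings that the expression under test rejects but some mutant accepts; the operators and alphabet are illustrative, and the second-order mutation and mutant-reduction optimizations are not reproduced.

```python
# A minimal sketch of mutation-based negative string generation with two toy
# first-order mutation operators.
import itertools
import re
import string

def mutants(pattern):
    # Illustrative mutations: widen a quantifier, widen a character class.
    yield pattern.replace("+", "*", 1)
    yield pattern.replace(r"\d", r"\w", 1)

def negative_strings(pattern, alphabet=string.ascii_lowercase + "0123456789"):
    """Strings rejected by `pattern` but accepted by some mutant."""
    tested = re.compile(pattern)
    muts = [re.compile(m) for m in set(mutants(pattern)) if m != pattern]
    for n in range(4):                     # enumerate short candidates
        for cand in map("".join, itertools.product(alphabet, repeat=n)):
            if not tested.fullmatch(cand) and any(m.fullmatch(cand) for m in muts):
                yield cand

print(list(itertools.islice(negative_strings(r"\d+"), 5)))
# e.g. '' reveals the "+ vs *" fault; 'a' reveals the "\d vs \w" fault.
```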
Available online: August 30, 2023, DOI: 10.13328/j.cnki.jos.006942
    Abstract:
Network protocol software is widely deployed, providing diversified functions in cyberspace such as communication, transmission, control, and management. In recent years, its security has drawn growing attention from academia and industry, and finding and repairing its vulnerabilities in a timely manner has become an important topic. Features such as diversified deployment methods, complex protocol interactions, and functional differences among implementations of the same protocol specification confront vulnerability mining for network protocol software with many challenges. This study first classifies vulnerability mining technologies for network protocol software and defines the scope of the existing key techniques. Secondly, it summarizes the technical progress in four aspects: network protocol description methods, mining object adaptation technology, fuzz testing technology, and program analysis-based vulnerability mining methods. Through comparative analysis, it then summarizes the technical advantages and evaluation dimensions of the different methods. Finally, the study reviews the state of the art and the challenges of network protocol software vulnerability mining and proposes five potential research directions.
Available online: August 30, 2023, DOI: 10.13328/j.cnki.jos.006944
Abstract:
As an essential mechanism for group collaboration in software development, code comments are widely used by developers to improve the efficiency of specific development tasks. However, code comments do not directly affect software operation, and developers often neglect them, which leads to poor comment quality and lowers development efficiency. Quality issues in code comments hinder code understanding, cause misunderstandings, or even introduce bugs, and they have received widespread attention from researchers. Through a literature review, this study systematically analyzes recent research on code comment quality issues and summarizes related work in three aspects: the dimensions along which code comment quality is evaluated, the indicators used to measure it, and the strategies for improving it. Finally, it points out the shortcomings and challenges of current research and offers suggestions.
Available online: August 30, 2023, DOI: 10.13328/j.cnki.jos.006953
Abstract:
The effectiveness of a test suite in defect detection refers to the extent to which the suite can detect the defects hidden in the software, and how to evaluate this capability is an important issue. Coverage and mutation score are two of the most important and widely used metrics for test suite effectiveness. To quantify the defect detection capability of test suites, researchers have devoted a great deal of effort to this issue and made significant progress, but inconsistent conclusions can be observed among existing studies, and some challenges still call for prompt resolution. This study systematically summarizes the results achieved by scholars in China and abroad in evaluating test suite effectiveness over the years. It first expounds the problems in this line of research, then outlines and analyzes the evaluation of test suite effectiveness based on coverage and on mutation score and presents its application in test suite optimization. Finally, the study points out the challenges faced by this research area and suggests directions for future work.
Available online: August 23, 2023, DOI: 10.13328/j.cnki.jos.006926
Abstract:
Hypergraphs are generalized representations of ordinary graphs and are common in many application areas, including the Internet, bioinformatics, and social networks. The independent set problem is a fundamental research problem in graph analysis. Most traditional independent set algorithms target ordinary graph data, and how to mine maximum independent sets efficiently on hypergraph data is an urgent problem. To address it, this study proposes a definition of hypergraph independent sets. It first analyzes two properties of hypergraph independent set search and then proposes a basic algorithm based on a greedy strategy. It further proposes a pruning framework for approximate maximum independent set search on hypergraphs that combines exact pruning with approximate pruning: exact pruning strategies reduce the size of the graph, while approximate pruning strategies speed up the search. In addition, four efficient pruning strategies are proposed, each with a theoretical proof. Finally, experiments on 10 real hypergraph datasets show that the pruning algorithm efficiently finds hypergraph maximum independent sets that are close to the true results.
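The greedy baseline can be sketched in a few lines, assuming the common notion that a vertex set of a hypergraph is independent when it fully contains no hyperedge; the paper defines its own notion and layers exact and approximate pruning on top of such a baseline.

```python
# A minimal greedy sketch for approximate maximum independent sets on a
# hypergraph, under the "no hyperedge fully contained" notion of independence.
def greedy_independent_set(vertices, hyperedges):
    def independent(s):
        return not any(set(e) <= s for e in hyperedges)

    chosen = set()
    # Try low-degree vertices first: they block the fewest hyperedges.
    degree = {v: sum(v in e for e in hyperedges) for v in vertices}
    for v in sorted(vertices, key=degree.get):
        if independent(chosen | {v}):
            chosen.add(v)
    return chosen

V = {1, 2, 3, 4, 5}
E = [{1, 2, 3}, {3, 4}, {4, 5}]
print(greedy_independent_set(V, E))   # {1, 2, 5}: no hyperedge fully inside
```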
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006927
    Abstract:
    Entity recognition is a key technology for information extraction. Compared with ordinary text, Chinese medical text often contains a large number of nested entities. Previous entity recognition methods often ignore the entity nesting rules unique to medical text and directly use sequence labeling methods. Therefore, a Chinese entity recognition method that incorporates entity nesting rules is proposed. This method transforms the entity recognition task into a joint training task of entity boundary recognition and boundary head-tail relationship recognition during training and, during decoding, filters the results with the entity nesting rules summarized from actual medical text, so that the recognition results conform to the composition rules of the nested combinations of inner and outer entities in real text. The method achieves good results on public medical-text entity recognition datasets: experiments show that it is significantly superior to existing methods in nested entity recognition performance, and the overall accuracy is increased by 0.5% compared with the state-of-the-art methods.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006929
    Abstract:
    As a privacy-preserving digital identity authentication technology, anonymous credentials not only authenticate the validity of users' digital identities but also protect their privacy. Anonymous credentials are widely applied in anonymous authentication, anonymous tokens, and decentralized digital identity systems. Existing anonymous credentials usually adopt the commitment-signature-proof paradigm, which requires that the adopted signature scheme have the re-randomization property, as CL signatures, PS signatures, and structure-preserving signatures (SPS) do. In practical applications, ECDSA, Schnorr, and SM2 are widely employed for digital identity authentication, but they lack protection of user identity privacy. Therefore, it is of practical significance to construct anonymous credentials compatible with ECDSA, Schnorr, SM2, and other digital signatures and to protect identity privacy during authentication. This study explores anonymous credentials based on the SM2 digital signature. A Pedersen commitment is utilized to commit to the user attributes in the registration phase. Meanwhile, since the message signed by SM2 is structurally H(m), the equivalence between the Pedersen commitment message and the hash commitment message is proven. The study also employs ZKB++ to prove the equivalence of algebraic and non-algebraic statements; the commitment message is transformed to achieve the cross-domain proof, and users' credentials are issued based on the SM2 digital signature. In the showing phase of anonymous credentials, a zero-knowledge proof of possession of an SM2 signature ensures the anonymity of credentials. This study provides the construction of an anonymous credential protocol based on the SM2 digital signature and proves its security. Finally, it verifies the effectiveness and feasibility of the protocol by analyzing its computational complexity and testing the algorithm execution efficiency.
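    To make the registration phase more concrete, here is a minimal sketch of a Pedersen commitment in a multiplicative group. The parameters p, g, and h below are toy values chosen only for illustration (the real protocol commits to attributes in a cryptographically sized group, e.g., the SM2 elliptic curve group, with h generated so that its discrete logarithm is unknown); this is a sketch of the primitive, not the paper's construction.

        import secrets

        # Toy parameters; far too weak for real use.
        p = 2**127 - 1      # a Mersenne prime modulus
        q = p - 1           # exponent range used in this sketch
        g = 3
        h = 7               # in practice h = g^x for an unknown x

        def pedersen_commit(m: int):
            """Commit to message m; returns (commitment, opening randomness)."""
            r = secrets.randbelow(q)
            return (pow(g, m, p) * pow(h, r, p)) % p, r

        def pedersen_open(c: int, m: int, r: int) -> bool:
            """Verify that (m, r) opens commitment c."""
            return c == (pow(g, m, p) * pow(h, r, p)) % p

        c, r = pedersen_commit(12345)
        assert pedersen_open(c, 12345, r)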
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006930
    Abstract:
    Since the Snowden revelations, threats from backdoor attacks represented by the algorithm substitution attack (ASA) have received wide attention. This kind of attack subverts the algorithms run by cryptographic protocol participants in an undetectable manner, embedding backdoors to obtain secrets. Building a cryptographic reverse firewall (CRF) for protocol participants is a well-known and feasible approach against ASA. Identity-based encryption (IBE), as a widely applicable public-key primitive, is especially important to protect with appropriate CRF schemes. However, the existing work only realizes CRF re-randomization and ignores the security risk of sending users' private keys directly to the third-party CRF. Given the above problem, the formal definition and security model of the security properties of CRFs applicable to IBE are proposed. Then, the formal definition of rerandomizable and key-malleable secure channel free IBE (RKM-SCF-IBE) and a method of transforming traditional IBE to RKM-SCF-IBE are presented, together with an approach to increasing anonymity. Finally, a generic provably secure framework for constructing CRFs for IBE is proposed based on RKM-SCF-IBE, with several instantiations from classic IBE schemes in the standard model and simulation results under different optimization methods. Compared with existing work, the proposed scheme is proven secure under a more complete security model, provides a generic approach to building CRFs for IBE schemes, and clarifies the basic principles for constructing CRFs for more expressive encryption schemes.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006931
    Abstract:
    Accurately extracting two types of information, namely elements and clauses, from contract texts can effectively improve contract review efficiency and provide facilitation services for all trading parties. However, current contract information extraction methods generally train single-task models to extract elements and clauses separately; they do not dig deep into the characteristics of contract texts and ignore the relevance among different tasks. Therefore, this study employs a deep neural network structure to study the correlation between the two tasks of element extraction and clause extraction and proposes a multitask learning method. Firstly, a primary multitask learning model is built for contract information extraction by combining the above two tasks. Then, the model is optimized with an attention mechanism to further explore the correlation, and an attention-based dynamic multitask learning model is built. Finally, based on the above two methods, a dynamic multitask learning model with lexical knowledge is proposed for the complex semantic environment of contract texts. The experimental results show that the method can fully capture the shared features among tasks and yield better information extraction results than the single-task model. It can handle the nested entities among elements and clauses in contract texts and realize the joint extraction of contract elements and clauses. In addition, to verify the robustness of the proposed method, this study conducts experiments on public datasets in various fields, and the results show that the proposed method is superior to the baseline methods.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006932
    Abstract:
    Adversarial texts are malicious samples that can cause deep learning classifiers to make errors. The adversary creates an adversarial text that can deceive the target model by adding subtle perturbations to the original text that are imperceptible to humans. The study of adversarial text generation methods can evaluate the robustness of deep neural networks and contribute to the subsequent robustness improvement of the model. Among the current adversarial text generation methods designed for Chinese text, few attack the robust Chinese BERT model as the target model. For Chinese text classification tasks, this study proposes an attack method against Chinese BERT, namely Chinese BERT Tricker. This method adopts a character-level word importance scoring method to locate important Chinese characters. Meanwhile, a word-level Chinese perturbation method based on the masked language model, with two types of strategies, is designed to replace important words. Experimental results show that on text classification tasks, the proposed method can significantly reduce the classification accuracy of the Chinese BERT model to less than 40% on two real datasets, and it outperforms other baseline methods on multiple attack performance measures.
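    Character-level importance scoring of this general kind can be illustrated with a deletion/masking probe: score each position by how much the target-class probability drops when that character is masked. The `model` interface and mask symbol below are hypothetical stand-ins for a classifier such as Chinese BERT, not the paper's actual implementation.

        def importance_scores(text: str, target_class: int, model) -> list:
            """Score each character by the drop in target-class probability
            when that character is replaced with a mask symbol.

            model(text) is assumed to return a list of class probabilities.
            """
            base = model(text)[target_class]
            scores = []
            for i in range(len(text)):
                masked = text[:i] + "[MASK]" + text[i + 1:]
                scores.append(base - model(masked)[target_class])
            return scores

        # The highest-scoring characters are perturbed first, e.g., with
        # replacement candidates proposed by a masked language model.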
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006933
    Abstract:
    Graph data, such as citation networks, social networks, and transportation networks, exist widely in the real world. Graph neural networks (GNNs) have attracted extensive attention due to their strong expressiveness and excellent performance in a variety of graph analysis applications. However, the excellent performance of GNNs relies on labeled data, which are difficult to obtain, and on complex network models with high computational costs. Knowledge distillation (KD) is introduced into GNNs to address the scarcity of labeled data and the high complexity of GNNs. KD trains constructed small models (student models) with the soft-label supervision information of larger models (teacher models) to yield better performance and accuracy. Therefore, how to apply KD technology to graph data has become a research challenge, yet there is still no review of graph-based KD research. Aiming to provide a comprehensive overview of graph-based KD, this study fills the review gap in this field. It first introduces the background knowledge of graphs and KD. Then, three types of graph-based knowledge distillation methods are comprehensively summarized: graph knowledge distillation for deep neural networks (DNNs), graph knowledge distillation for GNNs, and self-KD-based graph knowledge distillation. Each type is further divided into knowledge distillation methods based on the output layer, the middle layer, and the constructed graph. Subsequently, the design ideas of various graph-based knowledge distillation algorithms are analyzed and compared, and the advantages and disadvantages of the algorithms are concluded from experimental results. In addition, applications of graph-based knowledge distillation in computer vision, natural language processing, recommendation systems, and other fields are reviewed. Finally, the development of graph-based knowledge distillation is summarized and prospected. The references related to graph-based knowledge distillation are available on GitHub: https://github.com/liujing1023/Graph-based-Knowledge-Distillation.
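    The output-layer distillation that underlies many of the surveyed methods reduces to one loss: the student matches the teacher's temperature-softened soft labels while still fitting the ground truth. A minimal PyTorch-style sketch of that classic loss follows; the function name, temperature, and weighting are illustrative, not taken from any specific surveyed method.

        import torch.nn.functional as F

        def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
            """Classic output-layer knowledge distillation loss.

            alpha balances the soft-label (teacher) and hard-label terms;
            the T*T factor keeps soft-loss gradients on a comparable scale.
            """
            soft = F.kl_div(
                F.log_softmax(student_logits / T, dim=-1),
                F.softmax(teacher_logits / T, dim=-1),
                reduction="batchmean",
            ) * (T * T)
            hard = F.cross_entropy(student_logits, labels)
            return alpha * soft + (1 - alpha) * hard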
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006935
    Abstract:
    As the scale of modern software expands, software vulnerabilities bring great threats to the security and reliability of computer systems and software, causing huge losses in production and daily life. In recent years, as open source software (OSS) is widely used, the vulnerability issues of OSS have received much attention. Vulnerability awareness techniques can effectively help OSS users identify vulnerabilities at an early stage for timely defense. Different from vulnerability detection techniques for traditional software, the transparency and collaborative nature of OSS bring great challenges to vulnerability awareness. Therefore, various techniques have been proposed by scholars and developers to perceive potential vulnerabilities and risks in OSS from the code and the open source community, so as to find OSS vulnerabilities as early as possible and reduce the losses they cause. To boost the development of OSS vulnerability awareness techniques, this study conducts a systematic literature review of existing research. It selects 45 high-quality papers on open source vulnerability awareness techniques, covering code-based, open source community discussion-based, and patch-based vulnerability awareness techniques, and systematically summarizes their results. In particular, based on the most recent publications, this study proposes for the first time a categorization of techniques by the OSS vulnerability life cycle, which supplements and improves the existing taxonomy of vulnerability awareness techniques. Finally, the study discusses the challenges in the field and predicts future research directions.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006936
    Abstract:
    As a new learning paradigm to solve the problem of label ambiguity, label distribution learning (LDL) has received much attention in recent years. To further improve the prediction performance of LDL, this study proposes LDL based on deep forest and heterogeneous ensembles (LDLDF), which uses the cascade structure of deep forest to simulate deep learning models with multi-layer processing structures and combines multiple heterogeneous classifiers in each cascade layer to increase the diversity of the ensemble. Compared with other existing LDL methods, LDLDF can process information layer by layer and learn better feature representations to mine rich semantic information in data, giving it better representation learning ability and generalization ability. In addition, considering the degradation problem of deep models, LDLDF adopts a layer feature reuse mechanism to reduce the training error of the model, which effectively utilizes the prediction ability of each layer in the deep model. Extensive experimental results show that LDLDF is superior to other methods.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006937
    Abstract:
    Object detection is widely used in various fields such as autonomous driving, industry, and medical care, and object detection algorithms have gradually become the main means of solving key tasks in these fields. However, the robustness of deep learning-based object detection models is seriously insufficient under adversarial attacks: adversarial samples constructed with small perturbations can easily make the model predictions wrong, which greatly limits the application of object detection models in security-critical fields. In practical applications, the models are black boxes, yet related research on black-box attacks against object detection models is relatively lacking and suffers from problems such as incomplete robustness evaluation, low black-box attack success rates, and high resource consumption. To address these issues, this study proposes a black-box object detection attack algorithm based on a generative adversarial network. The algorithm uses a generative network fused with an attention mechanism to output the adversarial perturbations and employs the substitute model loss and the category attention loss to optimize the generative network parameters, supporting two attack scenarios: targeted attacks and vanishing attacks. A large number of experiments are conducted on the Pascal VOC and MSCOCO datasets. The results demonstrate that the proposed method has a higher black-box transferable attack success rate and can perform transferable attacks between different datasets.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006938
    Abstract:
    Network management and monitoring are crucial topics in the network field, and the technologies used to achieve them are referred to as network measurement. In particular, network heavy hitter detection is an important network measurement technique, and it is analyzed in this study. Heavy hitters are flows that exceed an established threshold in terms of occupied network resources (bandwidth or the number of packets transmitted). Detecting heavy hitters can contribute to quick anomaly detection and more efficient network operation, but its implementation is challenged by high-speed links. Heavy hitter detection methods have developed over time in two categories: traditional methods and software defined network (SDN)-based methods. This study reviews the related frameworks and algorithms, systematically summarizes their development and current status, and predicts future research directions of network heavy hitter detection.
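    As a baseline for the surveyed techniques, a heavy hitter can be found by per-flow counting against the established threshold; the exact dictionary below is only a reference point, since real high-speed implementations replace it with compact sketches (e.g., Count-Min) to fit line-rate memory budgets. The flow-key layout is a common assumption, not taken from a specific paper.

        from collections import Counter

        def heavy_hitters(packets, threshold_bytes):
            """Exact baseline: flows whose byte count exceeds the threshold.

            packets: iterable of (flow_key, size_in_bytes); flow_key is
            typically the 5-tuple (src ip, dst ip, src port, dst port, proto).
            """
            usage = Counter()
            for flow_key, size in packets:
                usage[flow_key] += size
            return {k: v for k, v in usage.items() if v > threshold_bytes}

        # On high-speed links, the Counter is replaced with a fixed-memory
        # sketch such as Count-Min, trading exactness for constant space.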
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006914
    Abstract:
    The training of high-precision federated learning models consumes a large number of users' local resources, and users who participate in the training can gain illegal profits by selling the jointly trained model without others' permission. In order to protect the property rights of federated learning models, this study proposes a federated learning watermark based on backdoors (FLWB), exploiting the fact that deep learning backdoors maintain the accuracy of the main task and cause misclassification only on a small number of trigger set samples. FLWB allows users who participate in the training to embed their own private watermarks in the local model and then map these private backdoor watermarks to the global model through model aggregation in the cloud, forming the global watermark for federated learning. A stepwise training method is then designed to enhance the expression of private backdoor watermarks in the global model so that FLWB can accommodate the private watermarks of the users without affecting the accuracy of the global model. Theoretical analysis proves the security of FLWB, and experiments verify that with the stepwise training method, the global model can effectively accommodate the private watermarks of the participating users at a cost of only 1% accuracy on the main task. Finally, FLWB is tested against model compression and fine-tuning attacks. The results show that more than 80% of the watermarks are retained when the model is compressed to 30% of its original size, and more than 90% are retained under four different fine-tuning attacks, which indicates the excellent robustness of FLWB.
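    A backdoor watermark of this kind is read out the same way a backdoor is: by measuring accuracy on the trigger set. The retention check behind numbers like "more than 80% of the watermarks retained" can be sketched as follows; `model` and the trigger set are hypothetical placeholders, and the sketch is an illustration of the readout, not the paper's evaluation code.

        def watermark_retention(model, trigger_samples, trigger_labels) -> float:
            """Fraction of trigger-set samples still classified with the
            backdoor label after a compression or fine-tuning attack."""
            hits = sum(
                1 for x, y in zip(trigger_samples, trigger_labels)
                if model(x) == y
            )
            return hits / len(trigger_samples)

        # Retention above an agreed threshold (e.g., 0.8) is taken as
        # evidence that the embedded watermark survived the attack.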
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006924
    Abstract:
    Software defect localization refers to the activity of finding program elements that are related to software failures. Existing defect localization techniques, however, can only produce localization results at the function or statement level, and such coarse-grained results affect the efficiency and effectiveness of manual debugging and automatic software defect repair. This study focuses on fine-grained identification of the specific code tokens that lead to defects. It establishes abstract syntax tree paths for code tokens and proposes a fine-grained defect localization model based on a pointer neural network to predict the specific defect-related code tokens and the specific repair operations on those tokens. Defect patch datasets from open-source projects contain abundant training data, and the paths constructed over abstract syntax trees can effectively capture the program's structural information. Experimental results show that the trained model can accurately predict defect code tokens and is significantly better than baselines based on statistics and machine learning. In addition, to verify that fine-grained defect localization results can contribute to automatic defect repair, two program repair processes are designed on top of the fine-grained localization results; they use code completion tools to predict the correct tokens or follow heuristic rules to find appropriate code repair elements. The results show that both methods can effectively mitigate the overfitting problem in automatic software defect repair.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006941
    Abstract:
    The transport layer is a key component in the network protocol stack, responsible for providing end-to-end services for applications on different hosts. Existing transport layer protocols such as TCP provide users with some basic security protection mechanisms, e.g., error control and acknowledgments, which to a certain extent ensure the consistency of datagrams sent and received by applications on different hosts. However, these transport layer security mechanisms have serious flaws. For example, the sequence numbers of TCP datagrams are easy to guess and infer, and the datagram checksum depends on the vulnerable one's-complement sum algorithm. As a result, the existing transport layer security mechanisms cannot guarantee the integrity and security of datagrams, which allows a remote attacker to craft a fake datagram and inject it into the target network stream, thus poisoning it. Attacks against the transport layer occur at the lower layers of the network protocol stack and can bypass the security mechanisms enforced at the upper application layer, causing serious damage to the network infrastructure. After investigating various attacks on network protocols and the related security vulnerabilities of recent years, this study proposes a method for enhancing the security of the transport layer based on lightweight chain verification, namely LightCTL. Based on hash verification, LightCTL enables both sides of a TCP connection to create a mutually verifiable consensus on transport layer datagrams, so as to prevent attackers or men in the middle from stealing and forging sensitive information. As a result, LightCTL can successfully foil various attacks against the network protocol stack, including TCP connection reset attacks based on sequence number inference, TCP hijacking attacks, SYN flooding attacks, man-in-the-middle attacks, and datagram replay attacks. Besides, LightCTL does not need to modify the protocol stack of intermediate network devices such as routers; it only needs to modify the checksum-related parts of the endpoint protocol stacks. Therefore, LightCTL can be easily deployed and significantly improves the security of network systems.
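    The core idea of chained verification can be sketched as follows: each datagram carries a tag derived from its payload and the previous tag, so an injected or replayed segment breaks the chain at the receiver. This sketch uses a plain SHA-256 chain keyed with a shared secret to illustrate the principle; it is not LightCTL's actual construction, and the initial value and key handling are assumptions.

        import hashlib

        def chain_tag(prev_tag: bytes, payload: bytes, secret: bytes) -> bytes:
            """Tag for the next datagram: binds the payload to the history."""
            return hashlib.sha256(prev_tag + payload + secret).digest()

        def send_stream(payloads, secret):
            tag = b"\x00" * 32                  # agreed initial value
            for p in payloads:
                tag = chain_tag(tag, p, secret)
                yield p, tag

        def verify_stream(tagged, secret):
            tag = b"\x00" * 32
            for payload, received_tag in tagged:
                tag = chain_tag(tag, payload, secret)
                if tag != received_tag:
                    return False                # forged or reordered datagram
            return True

        secret = b"shared-session-key"
        stream = list(send_stream([b"seg1", b"seg2"], secret))
        assert verify_stream(stream, secret)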
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006951
    Abstract:
    Fact verification is intended to check whether a textual statement is supported by a given piece of evidence. Due to the structural dependence and implicit content of tables, the task of fact verification with tables as evidence still faces many challenges. Existing literature has either used logical expressions to parse statements based on tabular evidence or designed table-aware neural networks to encode statement-table pairs and thereby accomplish table-based fact verification tasks. However, these approaches fail to fully utilize the implicit tabular information behind the statements, which degrades the inference performance of the model. Moreover, Chinese statements based on tabular evidence have more complex syntax and semantics, which adds to the difficulty of model inference. For this reason, the study proposes a method of fact verification with Chinese tabular data based on the capsule heterogeneous graph attention network (CapsHAN). This method can fully understand the structure and semantics of statements, and on this basis it mines and utilizes the tabular information implied by the statements to effectively improve the accuracy of table-based fact verification. Specifically, a heterogeneous graph is constructed by performing syntactic dependency parsing and named entity recognition of statements. Subsequently, the graph is learned and understood by the heterogeneous graph attention network and the capsule graph neural network, and the obtained textual representation of the statements is concatenated with the textual representation of the encoded tables before the result is predicted. Further, this study also addresses the problem that datasets for fact verification based on Chinese tables are scarce and thus unable to support performance evaluation of table-based fact verification methods. For this purpose, the study translates the mainstream English table-based fact verification datasets TABFACT and INFOTABS into Chinese and constructs a dataset that is based on the uniform content label (UCL) national standard and specifically tailored to the characteristics of Chinese tabular data. This dataset, namely UCLDS, takes Wikipedia infoboxes as evidence for manually annotated natural language statements and labels them into three classes: entailed, contradictory, and neutral. UCLDS outperforms the traditional datasets TABFACT and INFOTABS in supporting both single-table and multi-table inference. The experimental results on the above three Chinese benchmark datasets show that the proposed model consistently outperforms the baseline models, demonstrating its superiority for Chinese table-based fact verification tasks.
    Available online:  August 23, 2023 , DOI: 10.13328/j.cnki.jos.006952
    Abstract:
    The virtualization, high availability, elastic scheduling, and other characteristics of cloud infrastructure provide cloud databases with many advantages, such as out-of-the-box deployment, high reliability and availability, and a pay-as-you-go model. Cloud databases can be divided into two categories according to architecture design: cloud-hosted databases and cloud-native databases. Cloud-hosted databases, which deploy the database system in virtual machine environments on the cloud, offer the advantages of low cost, easy operation and maintenance, and high reliability. Cloud-native databases, in turn, take full advantage of the elastic scaling characteristic of cloud infrastructure: they adopt the disaggregated compute and storage architecture to achieve independent scaling of computing and storage resources and further increase the cost-performance ratio of the databases. However, the disaggregated compute and storage architecture poses new challenges to the design of database systems. This survey is an in-depth analysis of the architecture and technology of cloud-native database systems. Specifically, the architectures of cloud-native online transaction processing (OLTP) and online analytical processing (OLAP) databases are classified and analyzed according to the differences in resource disaggregation modes, and the advantages and limitations of each architecture are compared. Then, on the basis of the disaggregated compute and storage architecture, this study explores the key technologies of cloud-native databases in depth by functional module. The technologies under discussion include those of cloud-native OLTP (data organization, replica consistency, main/standby synchronization, failure recovery, and mixed workload processing) and those of cloud-native OLAP (storage management, query processing, serverless-aware compute, data protection, and machine learning optimization). At last, the study summarizes the technical challenges for existing cloud-native databases and suggests directions for future research.
    Available online:  August 16, 2023 , DOI: 10.13328/j.cnki.jos.006939
    Abstract:
    Internet transport-layer protocols rely on the feedback information provided by the acknowledgment (ACK) mechanism to achieve functions such as congestion control and reliable transmission. Following the evolution of Internet transport protocols, this study reviews the ACK mechanisms of transmission control and discusses their unsolved problems. Based on the elements of "type-trigger-information", the demand-based ACK mechanism and its design principles are proposed, and the coupling relationships between the ACK mechanism and other transport protocol submodules (e.g., congestion control and packet loss recovery) are analyzed with emphasis. Subsequently, according to the design principles, the TACK mechanism, a feasible demand-based ACK mechanism, is elaborated, and related concepts are systematically clarified. Finally, several meaningful research directions are provided according to the challenges encountered by demand-based ACK mechanisms.
    Available online:  August 16, 2023 , DOI: 10.13328/j.cnki.jos.006904
    Abstract:
    Code review is one of the best practices widely used in modern software development, which is crucial for ensuring software quality and strengthening engineering capability. Code review comments (CRCs) are one of the main and most important outputs of code reviews. CRCs are not only the reviewers’ perceptions of code quality but also the references for authors to fix code defects and improve quality. Nowadays, although a number of software organizations have developed guidelines for performing code reviews, there are still few effective methods for evaluating the quality of CRCs. To provide an explainable and automated quality evaluation of CRCs, this study conducts a series of empirical studies such as literature reviews and case analyses. Based on the results of the empirical studies, the study proposes a multi-label learning-based approach for evaluating the quality of CRCs. Experiments are carried out by using a large software enterprise-specific dataset that includes a total of 17 000 CRCs from 34 commercial projects. The results indicate that the proposed approach can effectively evaluate the quality attributes and grades of CRCs. The study also provides some modeling experiences such as CRC labeling and verification, so as to help software organizations struggling with code reviews better implement the proposed approach.
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006912
    Abstract:
    Adaptor signatures, also known as scriptless scripts, are an important cryptographic technique that can be used to solve the problems of poor scalability and low transaction throughput in blockchain applications such as cryptocurrencies. An adaptor signature can be seen as an extension of a digital signature on hard relations; it ties authorization together with witness extraction and has many advantages in blockchain applications, such as (1) low on-chain cost, (2) improved fungibility of transactions, and (3) advanced functionality beyond the limitations of the blockchain's scripting language. The SM2 signature is the Chinese national standard signature algorithm and has been widely used in various important information systems. This work designs an efficient SM2-based adaptor signature with batch proofs and gives security proofs in the random oracle model. Thanks to the structure of the SM2 signature, the scheme avoids generating the zero-knowledge proofs used in the pre-signing phase and is thus more efficient than existing ECDSA/SM2-based adaptor signatures: the efficiency of pre-signature generation is increased by 4 times, and the efficiency of pre-signature verification is increased by 3 times. Then, based on the distributed SM2 signature, this work develops a distributed SM2-based adaptor signature, which avoids single points of failure and improves the security of the signing key. Finally, for real-world applications, this work gives a secure and efficient batch atomic swap protocol for one-to-many scenarios based on the SM2-based adaptor signature.
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006913
    Abstract:
    Commonsense question answering is an essential natural language understanding task that aims to solve natural language questions automatically by using commonsense knowledge to obtain accurate answers. It has broad application prospects in areas such as virtual assistants and social chatbots and involves crucial scientific issues such as knowledge mining and representation, language understanding and computation, and answer reasoning and generation; therefore, it has received wide attention from industry and academia. This study first introduces the main datasets in commonsense question answering. Secondly, it summarizes the distinctions between different sources of commonsense knowledge in terms of construction methods, knowledge sources, and presentation forms. Meanwhile, the study focuses on the analysis and comparison of state-of-the-art commonsense question answering models, as well as the characteristic methods for fusing commonsense knowledge. In particular, based on the commonalities and characteristics of commonsense knowledge in different question answering task scenarios, this study establishes a commonsense knowledge classification system covering attribute, semantic, causal, context, abstract, and intention categories. On this basis, it conducts prospective research on the construction of commonsense knowledge datasets, the collaboration mechanism between perceptual knowledge fusion and pre-trained language models, and corresponding commonsense knowledge pre-classification techniques. Furthermore, the study reports on the performance changes of the above models in cross-dataset migration scenarios and their potential contributions to commonsense answer reasoning. On the whole, this study gives a comprehensive review of existing data and state-of-the-art technologies, as well as preliminary research on the construction of cross-dataset knowledge systems, technology migration, and generalization, so as to provide references for the further development of theories and technologies in the field.
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006902
    Abstract:
    As an important cornerstone of artificial intelligence, knowledge graphs can extract and represent a priori knowledge from massive data on the Internet, which greatly alleviates the bottleneck of poor interpretability in the cognitive decisions of intelligent systems and plays a key role in the construction and application of such systems. As the application of knowledge graph technology deepens, knowledge graph completion, which aims to solve the incompleteness of graphs, has become urgent. Link prediction, the task of predicting the missing entities and relations in a knowledge graph, is indispensable in knowledge graph construction and completion. Fully exploiting the hidden relations in a knowledge graph and using massive entities and relations for computation require converting the symbolic representations of information into numerical form, i.e., knowledge graph representation learning. Hence, link prediction-oriented knowledge graph representation learning has become a popular research topic in the field of knowledge graphs. This study systematically introduces the latest research progress of link prediction-oriented knowledge graph representation learning methods, starting from the basic concepts of link prediction and representation learning. Specifically, the research progress is discussed in detail in terms of knowledge representation forms and algorithmic modeling methods. The development of knowledge representation forms is used as a clue to introduce the mathematical modeling of link prediction tasks in the binary-relation, multi-relation, and hyper-relation knowledge representation forms. On the basis of representation learning modeling, the existing methods are refined into four types of models: translation distance models, tensor decomposition models, traditional deep learning models, and graph neural network models. The implementation methods of each type are described in detail together with representative models for solving link prediction tasks with different relational metrics. The common datasets and criteria for link prediction are then introduced, and on this basis, the link prediction effects of the four types of models under the binary-relation, multi-relation, and hyper-relation knowledge representation forms are compared and analyzed. Finally, future development trends are given in terms of model optimization, knowledge representation forms, and problem scope.
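    Among the translation distance models mentioned above, TransE is the canonical example: a triple (h, r, t) is scored by how well the relation vector translates the head embedding onto the tail embedding, f(h, r, t) = -||h + r - t||. A minimal sketch with toy embeddings (the vectors below are made up for illustration):

        import numpy as np

        def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
            """TransE plausibility: closer to 0 means more plausible."""
            return -float(np.linalg.norm(h + r - t))

        # Toy 3-dimensional embeddings for two candidate tails.
        h = np.array([0.9, 0.1, 0.0])
        r = np.array([0.1, 0.8, 0.0])
        t_good = np.array([1.0, 0.9, 0.0])
        t_bad = np.array([0.0, 0.0, 1.0])

        # Link prediction ranks candidate tails by this score.
        print(transe_score(h, r, t_good) > transe_score(h, r, t_bad))  # True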
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006915
    Abstract:
    Driven by mature data mining technologies, recommendation systems have been able to efficiently utilize explicit and implicit information such as rating data and behavior traces and combine it with complex and advanced deep learning technologies to achieve sound results. Meanwhile, application requirements drive the deeper mining and utilization of basic data and the reduction of technical requirements, which have become research hotspots. On this basis, a lightweight recommendation model, namely LG_APIF, is proposed, which uses the graph convolutional network (GCN) method to deeply integrate information. According to behavior memory, the model employs the Ebbinghaus forgetting curve to simulate the users' interest change process and adopts linear regression and other relatively lightweight traditional methods to mine the adaptive periods and other in-depth information of items. It then analyzes users' current interest distribution and calculates the interest value of each item to obtain users' potential interest types. It further constructs the graph structure of user-type-item triplets and uses GCN technology, after load reduction, to generate the final item recommendation list. Experiments verify the effectiveness of the proposed method: in comparison with eight classical models on the Last.fm, Douban, Yelp, and MovieLens datasets, Precision, Recall, and NDCG improve by an average of 2.11%, 1.01%, and 1.48%, respectively.
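    The interest-decay step can be illustrated with the classic Ebbinghaus retention formula R = exp(-t/S): older interactions contribute less to a user's current interest. The stability parameter S = 5 days, the weighting scheme, and the helper names below are illustrative assumptions, not the paper's exact formulation.

        import math

        def retention(elapsed_days: float, stability: float = 5.0) -> float:
            """Ebbinghaus forgetting curve R = exp(-t/S)."""
            return math.exp(-elapsed_days / stability)

        def interest_value(interactions):
            """Decay-weighted interest from (elapsed_days, rating) pairs."""
            return sum(retention(t) * rating for t, rating in interactions)

        # A recent interaction outweighs an old one of equal rating.
        print(interest_value([(1, 4.0)]) > interest_value([(30, 4.0)]))  # True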
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006916
    Abstract:
    Database management systems are divided into transactional (OLTP) systems and analytical (OLAP) systems according to application scenarios. With the growing demand for real-time data analysis and the increasing popularity of mixed OLTP and OLAP tasks, the industry has begun to focus on database management systems that support hybrid transactional/analytical processing (HTAP). An HTAP database system not only needs to meet the requirements of high-performance transaction processing but also supports real-time analysis for data freshness. Therefore, it poses new challenges to the design and implementation of database systems. In recent years, some prototypes and products with diverse architectures and technologies have emerged in industry and academia. This study reviews the background and development status of HTAP databases and classifies current HTAP databases from the perspective of storage and computing. On this basis, this study summarizes the key technologies used in the storage and computing of HTAP systems from bottom to top. Under this framework, the design ideas, advantages and disadvantages, and applicable scenarios of various systems are introduced. In addition, according to the evaluation benchmarks and metrics of HTAP databases, this study also analyzes the relationship between the design of various HTAP databases and their performance as well as data freshness. Finally, this study combines cloud computing, artificial intelligence, and new hardware technologies to provide ideas for future research and development of HTAP databases.
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006921
    Abstract:
    With the development of modern information technology, people's demand for high-resolution and realistic visual perception on image display devices has increased, which puts forward higher requirements for computer software and hardware and brings many challenges to rendering technology in terms of performance and workload. Using machine learning technologies such as deep neural networks to improve the quality and performance of rendered images has become a popular research direction in computer graphics, and upsampling low-resolution images through network inference to obtain clearer high-resolution images is an important way to improve image generation performance while preserving high-resolution details. The geometry buffers (G-buffers) generated by the rendering engine in the rendering process contain much semantic information, which helps the network learn scene information and features effectively and thus improves the quality of upsampling results. This study designs a deep-neural-network-based super-resolution method for rendered content in low resolution. In addition to the color image of the current frame, the method uses high-resolution G-buffers to assist in the computation and reconstruct the high-resolution content details. The method also leverages a new strategy to fuse the features of high-resolution buffers and low-resolution images, implementing a multi-scale fusion of different feature information in a dedicated fusion module. Experiments demonstrate the effectiveness of the proposed fusion strategy and module, and the proposed method shows obvious advantages over other image super-resolution methods, especially in maintaining high-resolution details.
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006922
    Abstract:
    The SMT solver is an important piece of system software, and bugs in it may lead to failures in the software relying on it and even security incidents. However, fixing bugs in an SMT solver is time-consuming because developers need to spend a lot of effort in understanding and finding their root causes. Although many studies on software bug localization have been proposed, there is no systematic work on automatically locating bugs in SMT solvers. Therefore, this study proposes a bug localization method for SMT solvers based on multi-source spectrums, namely SMTLOC. First, for a given bug in an SMT solver, SMTLOC proposes an enumeration-based algorithm that mutates the formula triggering the bug to generate a set of witness formulas, which do not trigger the bug but have execution traces similar to that of the triggering formula. Then, according to the execution traces of the witness formulas and the source code information of the SMT solver, SMTLOC develops a technique based on the coverage spectrum and the historical spectrum to calculate the suspiciousness of files, thus locating the files that contain the bug. In order to evaluate the effectiveness of SMTLOC, 60 bugs in SMT solvers are collected. Experimental results show that SMTLOC is superior to the traditional spectrum-based bug localization method and can locate 46.67% of the bugs within the top-5 ranked files, improving the localization effect by 133.33%.
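    Coverage-spectrum suspiciousness scoring of the kind SMTLOC builds on can be illustrated with the classic Ochiai formula, which scores a program element by how often failing runs cover it relative to passing runs; SMTLOC additionally mixes in a historical spectrum, which this sketch omits. Here the triggering formula plays the failing run and the witness formulas the passing runs.

        import math

        def ochiai(failed_cov: int, total_failed: int, passed_cov: int) -> float:
            """Ochiai suspiciousness of one program element (e.g., a file).

            failed_cov:   failing runs that cover the element
            total_failed: all failing runs
            passed_cov:   passing runs that cover the element
            """
            denom = math.sqrt(total_failed * (failed_cov + passed_cov))
            return failed_cov / denom if denom else 0.0

        # Files covered mostly by the failing trace rank highest.
        print(ochiai(failed_cov=1, total_failed=1, passed_cov=0))  # 1.0
        print(ochiai(failed_cov=1, total_failed=1, passed_cov=9))  # ~0.32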
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006911
    Abstract:
    How to quickly and effectively mine valuable information from massive data to better guide decision-making is an important goal of big data analysis. Visual analysis is an important big data analysis method, and it takes advantage of the characteristics of human visual perception, utilizes visualization charts to present laws contained in complex data intuitively, and supports human-centered interactive data analysis. However, the visual analysis still faces several challenges, such as the high cost of data preparation, high latency of interaction response, high threshold for visual analysis, and low efficiency of interaction modes. To address the above challenges, researchers propose a series of methods to optimize the human-computer interaction mode of visual analysis systems and improve the intelligence of the system by leveraging data management and artificial intelligence techniques. This study systematically sorts out, analyzes, and summarizes these methods and puts forward the basic concept and key technical framework of intelligent data visualization analysis. Then, under the framework, the research progress of data preparation for visual analysis, intelligent data visualization, efficient visual analysis, and intelligent visual analysis interfaces both in China and abroad is reviewed and analyzed. Finally, this study looks forward to the future development trend of intelligent data visualization analysis.
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006905
    Abstract:
    Machine learning methods can be well combined with software testing to enhance testing effectiveness, but few scholars have applied them to test data generation. In order to further improve the efficiency of test data generation, a chained model combining support vector machines (SVM) and extreme gradient boosting (XGBoost) is proposed, and multi-path test data generation is realized by a genetic algorithm based on the chained model. Firstly, this study trains several sub-models (i.e., SVM and XGBoost) on sample data to predict the states of path nodes, selects the optimal sub-models based on their prediction accuracy, and links the optimal sub-models in sequence according to the order of the path nodes, forming a chained model named chained SVM and XGBoost (C-SVMXGBoost). When the genetic algorithm generates test cases, the trained chained model, instead of instrumentation, is used to obtain the coverage paths of test data (i.e., predicted paths); the path sets whose predicted paths are similar to the target path are identified, instrumentation is applied to verify these predicted paths, and the resulting accurate paths are used to calculate fitness values. In the crossover and mutation process, excellent test cases with large path depths in the sample set are reused to generate test data covering the target path. Finally, individuals with higher fitness during evolution are saved, and C-SVMXGBoost is updated, so as to further improve testing efficiency. Experiments show that C-SVMXGBoost is more suitable for solving the path prediction problem and improving testing efficiency than other chained models. Moreover, compared with existing classical methods, the proposed method increases coverage by up to 15%, and the average number of evolutionary generations is also reduced, by up to 65% on programs of large size.
    Available online:  August 09, 2023 , DOI: 10.13328/j.cnki.jos.006834
    Abstract:
    As an important technology in the field of artificial intelligence (AI), deep neural networks are widely used in various image classification tasks. However, existing studies have shown that deep neural networks have security vulnerabilities and are susceptible to adversarial examples. At present, there is no systematic survey of adversarial example detection for images. To improve the security of deep neural networks, this study, based on existing research work, comprehensively introduces adversarial example detection methods in the field of image classification. First, the detection methods are divided into supervised and unsupervised detection by the construction method of the detector and are then classified into subclasses according to their detection principles. Finally, the study summarizes the open problems in adversarial example detection and provides suggestions and an outlook in terms of generalization and lightweight design, aiming to assist AI security research.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006839
    Abstract:
    As an automatic search tool, mixed integer linear programming (MILP) is widely used to search for differential, linear, integral, and other cryptographic properties of block ciphers. In this study, a new technique for constructing MILP models based on a dynamic selection strategy is proposed, which uses different constraint inequalities to describe the propagation of cryptographic properties under different conditions. Specifically, according to the Hamming weight of the input division property, different methods are adopted to construct MILP models of division property propagation through linear layers. Finally, this technique is applied to search for integral distinguishers of the uBlock and Saturnin algorithms. The experimental results show that the proposed technique can obtain an 8-round integral distinguisher with 32 more balanced bits than the previous best integral distinguisher for the uBlock128 algorithm. In addition, this study obtains 9- and 10-round integral distinguishers for the uBlock128 and uBlock256 algorithms, which are one round longer than the previous best integral distinguishers. For the Saturnin256 algorithm, the study finds a 9-round integral distinguisher, which is also one round longer than the previous best.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006840
    Abstract:
    Hierarchical topic models are important tools for organizing topics into hierarchies. Most existing hierarchical topic models provide tree-structured prior distributions for document topics by introducing the nCRP construction into the topic models, but they cannot acquire a topic hierarchy with clear domain meanings, referred to as a domain topic hierarchy. Meanwhile, there are not only hierarchical relationships among domain topics but also sub-topic aspect sharing relationships under different parent topics, and no appropriate model in current research on topic relationships yields such a domain topic hierarchy. In order to automatically and effectively mine the hierarchical and correlated relationships of domain topics from domain texts, this study makes the following improvements. Firstly, it improves the nCRP construction through a topic sharing mechanism and proposes the nCRP+ hierarchical construction method, which provides a tree-structured prior distribution with hierarchical topic aspect sharing for topics generated by topic models. Then, the reallocated hierarchical Dirichlet process (rHDP) is developed based on nCRP+ and HDP, and an rHDP model is proposed. By employing the domain taxonomy, word semantics, and domain representations of topic words, the study defines domain knowledge, including a voting-based domain membership degree, the semantic relevance between words and domain topics, and the contribution degree of hierarchical topic words. Finally, domain knowledge is used to improve the allocation of domain topics and topic words in the rHDP model, and the rHDP with domain knowledge (rHDP_DK) model is proposed with an improved sampling process. The experimental results show that hierarchical topic models based on nCRP+ are superior to those based on nCRP (hLDA and nHDP) and the neural topic model (TSNTM) in terms of evaluation metrics. The topic hierarchy built by the rHDP_DK model is characterized by a clear domain topic hierarchy and explicit domain differences among related sub-topics. Furthermore, the model provides a general automatic mining framework for domain topic hierarchies.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006841
    Abstract:
    In multi-label learning, each sample is associated with multiple labels, and the key task is how to use the correlation between labels when building the model. The multi-label deep forest (MLDF) algorithm attempts to mine the correlation between labels by using layer-by-layer representation learning under the framework of deep ensemble learning and uses the obtained label probability representations to improve prediction accuracy. However, on the one hand, the label probability representation is highly correlated with the label information, which leads to its low diversity, so the performance declines as the deep forest gets deeper. On the other hand, the calculation of label probabilities requires storing the forest structures of all layers and applying them one by one in the test stage, which causes unbearable computational and storage overhead. To solve these problems, this study proposes interaction representation-based MLDF (iMLDF). iMLDF mines the structural information of the feature space from the decision paths of the forest model, extracts the feature interactions in the decision tree paths by using random interaction trees, and obtains two interaction representations: feature confidence scores and label probability distributions. On the one hand, iMLDF makes full use of the feature structural information in the forest model to enrich the relevant information between labels. On the other hand, it calculates all representations through interaction expressions so that the algorithm does not need to store all the forest structures, which greatly improves computational efficiency. The experimental results show that iMLDF achieves better prediction performance and improves computational efficiency by an order of magnitude compared with MLDF for datasets with massive samples.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006842
    Abstract:
    Graph partitioning is a basic task in distributed graph computing: it divides a large-scale graph into different parts allocated to different machines in a cluster. The quality of graph partitioning has a great impact on the performance of distributed graph computing, and partitioning aims to minimize edge cuts while keeping the load balanced. Nowadays, graph data usually grow dynamically, which calls for a partitioning method that processes dynamic incremental graphs while maintaining partitioning quality. Although some dynamic graph partitioning algorithms have been presented recently, they cannot simultaneously handle real-time dynamic changes and obtain high-quality partitioning results. This study proposes a dynamic incremental graph partitioning algorithm based on vertex group redistribution (ED-IDGP) to solve the problem of large-scale dynamic incremental graph partitioning. In ED-IDGP, a dynamic processor handles four different unit update types in real time, and the partitioning quality is further improved by executing a local optimizer near the dynamic change after each unit update. The local optimizer searches for vertex groups with a strategy based on an improved label propagation algorithm and uses a proposed vertex group movement gain formula to identify the most beneficial vertex group and move it to the target partition. This study evaluates the performance and efficiency of the ED-IDGP algorithm from different perspectives and metrics on real datasets.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006843
    Abstract:
    As a new granular computing model, the partition order product space can describe and solve problems from multiple views and levels. Its problem solving space is a lattice structure composed of multiple problem solving levels, and each problem solving level is composed of multiple one-level views. How to choose the problem solving level in the partition order product space is an NP-hard problem. Therefore, this study proposes a two-stage adaptive genetic algorithm (TSAGA) to find the problem solving level. First, real-valued encoding is used to encode the problem solving levels, and the fitness function is defined according to the classification accuracy and granularity of a problem solving level. The first stage of the algorithm is based on a classical genetic algorithm, and some excellent problem solving levels are pre-selected as part of the initial population of the second stage, so as to optimize the problem solving space. In the second stage, an adaptive selection operator, an adaptive crossover operator, and an adaptive large-mutation operator are proposed, which change dynamically with the number of iterations of the current population evolution, so as to further select the problem solving level in the optimized problem solving space. Experimental results demonstrate the effectiveness of the proposed method.
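    Adaptive operators of this general kind can be sketched as rates that change with evolution progress, for example raising the mutation probability in later generations so the population can escape local optima via large mutations. The linear schedule and parameter values below are a common illustrative pattern, not the paper's exact operators.

        import random

        def adaptive_mutation_rate(generation, max_generations,
                                   base=0.01, ceiling=0.3):
            """Mutation probability grows with evolution progress."""
            progress = generation / max_generations
            return base + (ceiling - base) * progress

        def mutate(individual, generation, max_generations, low=0.0, high=1.0):
            """Per-gene mutation of a real-encoded individual."""
            rate = adaptive_mutation_rate(generation, max_generations)
            return [
                random.uniform(low, high) if random.random() < rate else gene
                for gene in individual
            ]

        # Real-encoded individual (a candidate problem solving level).
        print(mutate([0.2, 0.5, 0.9], generation=80, max_generations=100))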
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006844
    Abstract:
    The integration of machine learning and automatic reasoning is a new trend in artificial intelligence. Constraint satisfaction is a classic problem in artificial intelligence: a large number of real-world scheduling, planning, and configuration problems can be modeled as constraint satisfaction problems, and efficient solving algorithms have always been a research hotspot. In recent years, many new methods that apply machine learning to solve constraint satisfaction problems have emerged. These "learning to reason" methods open up new directions for solving constraint satisfaction problems and show great development potential, featuring better adaptability, strong scalability, and online optimization. This study divides the current "learning to reason" methods into three categories: message-passing neural network-based, sequence-to-sequence-based, and optimization-based methods. Additionally, the characteristics of the various methods and their solution effects on different problem sets are analyzed in detail. In particular, a comparative analysis of the relevant work on each type of method is conducted from multiple perspectives. Finally, constraint solving methods based on "learning to reason" are summarized and prospected.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006845
    Abstract:
    Enumerating minimal unsatisfiable subsets (MUSes) is an important subproblem of the Boolean satisfiability problem. For an unsatisfiable formula, the enumerated MUSes reveal the key factors responsible for its unsatisfiability. However, enumerating MUSes is extremely time-consuming, and different pruning schemes directly affect the size of the search space and the total number of iterations, and thus the efficiency of the algorithm. This study proposes a novel enhanced pruning scheme, accelerating by critical MSS (ABC), to speed up MUS enumeration. Based on the relationships among maximal satisfiable subsets (MSSes), minimal correction sets (MCSes), and MUSes, the concepts of cMSS and subMUS are put forward, and four properties are established, notably that every MUS must be a superset of the subMUS; the fact that MUSes and MCSes are mutual hitting sets can then be exploited to avoid the cost of computing hitting sets of MCSes. When the subMUS is unsatisfiable, it is the unique MUS and the algorithm terminates early; otherwise, the node representing the subMUS is pruned to avoid searching the non-solution space. The effectiveness of the proposed ABC scheme is proved theoretically, and the scheme is applied to the state-of-the-art algorithms MARCO and MARCO-MAM. Experimental results on the SAT11 MUS benchmarks show that the proposed scheme effectively prunes the search space and improves the efficiency of MUS enumeration.
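    One provably safe special case conveys the pruning idea: if removing a single clause makes the formula satisfiable (a singleton MCS), that clause must belong to every MUS. The Python sketch below is a simplified stand-in for the paper's cMSS-based subMUS construction; is_sat is an assumed SAT-oracle callable, and formulas are modeled as sets of clauses:

        def necessary_clauses(formula, is_sat):
            """Clauses whose removal makes the formula satisfiable; every MUS
            must contain all of them (a simplified subMUS-style lower bound)."""
            return {c for c in formula if is_sat(formula - {c})}

        def enumerate_with_pruning(formula, candidates, is_sat):
            core = necessary_clauses(formula, is_sat)
            if core and not is_sat(core):
                return [core]                  # the unique MUS: terminate early
            # Prune every candidate that misses a necessary clause; such a
            # subset can never be a MUS, so its subtree is non-solution space.
            return [s for s in candidates if core <= s]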
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006891
    Abstract:
    Network traffic encryption protects corporate data and user privacy, but it also brings new challenges to malicious traffic detection. According to how encrypted traffic is processed, encrypted malicious traffic detection technology can be divided into active and passive detection. Active detection includes detection after traffic decryption and detection based on searchable encryption technology; its research focuses on privacy protection and detection efficiency improvement, and this study mainly analyzes the application of trusted execution environments and controllable transmission protocols. Passive detection identifies encrypted malicious traffic without user perception and without performing any encryption or decryption operations; its research focuses on the selection and construction of features. This study analyzes the relevant detection methods based on three types of features, namely side-channel features, plaintext features, and raw traffic, and then summarizes the experimental evaluation conclusions of the relevant models. Finally, the feasibility of countermeasure research against encrypted malicious traffic detection is analyzed from the perspectives of obfuscating traffic characteristics, interfering with learning algorithms, and hiding relevant information.
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006892
    Abstract:
    Committee consensus and hybrid consensus elect a committee to validate blocks on behalf of all nodes, which can effectively speed up consensus and improve throughput. However, malicious attacks and bribery can easily lead to committee corruption, affect consensus results, and even paralyze the system. Although existing work introduces reputation mechanisms to reduce the possibility of committee corruption, these mechanisms have high overhead and poor reliability and cannot reduce the impact of corruption on the system. Therefore, this study proposes a dynamic blockchain consensus with pre-validation (DBCP). DBCP realizes reliable reputation evaluation of the committee through pre-validation with little overhead, which can remove malicious nodes from the committee in time. If serious corruption has undermined a consensus result, DBCP transfers the authority of block validation to all nodes through dynamic consensus and evicts the committee nodes that gave wrong suggestions, so as to avoid system paralysis. When the committee iterates to a high-credibility state, DBCP hands the authority of block validation back to the committee, and the other nodes accept the committee's consensus result without re-verifying the block, which speeds up consensus. The experimental results show that the throughput of DBCP is two orders of magnitude higher than that of Bitcoin and similar to that of Byzcoin. In addition, DBCP can handle committee corruption within one block cycle, demonstrating better security than Byzcoin.
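    The mode-switching logic can be sketched as follows; the reputation update rules, thresholds, and class interface are invented for illustration and are not DBCP's actual parameters:

        class DBCPLite:
            """Toy model: pre-validation scores committee members; consensus
            falls back to all nodes on corruption and returns to the committee
            once every remaining member is highly credible."""
            def __init__(self, committee, promote=0.9, evict_below=0.2):
                self.rep = {n: 1.0 for n in committee}
                self.committee_mode = True
                self.promote, self.evict_below = promote, evict_below

            def record_prevalidation(self, node, correct):
                self.rep[node] = min(1.0, self.rep[node] + 0.05) if correct \
                                 else self.rep[node] * 0.5
                if self.rep[node] < self.evict_below:
                    del self.rep[node]           # evict a malicious committee node

            def on_block(self, committee_result_valid):
                if self.committee_mode and not committee_result_valid:
                    self.committee_mode = False  # corruption: all nodes validate
                elif (not self.committee_mode and self.rep
                      and min(self.rep.values()) >= self.promote):
                    self.committee_mode = True   # high credibility: committee again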
    Available online:  July 28, 2023 , DOI: 10.13328/j.cnki.jos.006837
    Abstract:
    Depth ambiguity is an important challenge in multi-person three-dimensional (3D) pose estimation from single-frame images, and extracting contexts from an image has great potential for alleviating it. Current top-down approaches usually model key point relationships based on human detection; this not only easily results in key point shifting or mismatching but also, because the coarse-grained human bounding box carries large background noise, reduces the reliability of absolute depth estimation based on the human scale factor. Bottom-up approaches directly detect human key points from an image and then restore the 3D human poses one by one; although they can obtain the scene context explicitly, they are at a disadvantage in relative depth estimation. This study proposes a new two-branch network in which human context based on key point region proposals and scene context based on 3D space are extracted by a top-down branch and a bottom-up branch, respectively. A noise-resistant human context extraction method is proposed that describes the human by modeling key point region proposals and models dynamic sparse key point relationships for pose association, thereby eliminating weak connections and reducing noise propagation. A scene context extraction method from a bird's-eye view is also proposed: the human position layout in 3D space is obtained by modeling the image's depth features and mapping them to a bird's-eye-view plane. A network fusing human and scene contexts is designed to predict absolute human depth. Experiments on the public datasets MuPoTS-3D and Human3.6M show that, compared with state-of-the-art models, the proposed HSC-Pose improves the relative and absolute position accuracies of 3D key points by at least 2.2% and 0.5%, respectively, and reduces the mean root position error of the key points by at least 4.2 mm.
    Available online:  July 26, 2023 , DOI: 10.13328/j.cnki.jos.006918
    Abstract:
    Third-party library (TPL) detection is an upstream task in the domain of Android application security analysis, and its detection accuracy has a significant impact on its downstream tasks including malware detection, repackaged application detection, and privacy leakage detection. To improve detection accuracy and efficiency, this study proposes a package structure and signature-based TPL detection method, named LibPass, by leveraging the idea of pairwise comparison. LibPass combines primary module identification, TPL candidate identification, and fine-grained detection in a streamlined way. The primary module identification aims at improving detection efficiency by distinguishing the binary code of the main program from that of the introduced TPL. On this basis, a two-stage detection method consisting of TPL candidate identification and fine-grained detection is proposed. The TPL candidate identification leverages the stability of package structure features to deal with obfuscation of applications to improve detection accuracy and identifies candidate TPLs by rapidly comparing package structure signatures to reduce the number of pairwise comparisons, so as to improve the detection efficiency. The fine-grained detection accurately identifies the TPL of a specific version by a finer-grained but more costly pairwise comparison among candidate TPLs. In order to validate the performance and the efficiency of the detection method, three benchmark datasets are built to evaluate different detection capabilities, and experiments are conducted on these datasets. The experimental results are deeply analyzed in terms of detection performance, detection efficiency, and obfuscation resistance, and it is found that LibPass has high detection accuracy and efficiency and can deal with various common obfuscation operations.
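    The two-stage idea can be illustrated with a toy Python sketch: hash each module's package-tree shape into a signature (renaming-resistant by construction, since only the tree shape is hashed, not identifiers) and keep only the TPLs whose indexed signature matches some non-primary module. The signature scheme and names are hypothetical, not LibPass's actual features:

        import hashlib

        def package_signature(tree, node='root'):
            """Hash the shape of a package tree (children given as a dict
            node -> list of child nodes); identifiers never enter the hash."""
            child_sigs = sorted(package_signature(tree, c) for c in tree.get(node, []))
            payload = str(len(child_sigs)) + ''.join(child_sigs)
            return hashlib.sha1(payload.encode()).hexdigest()

        def candidate_tpls(app_module_trees, tpl_index):
            """Cheap first stage: signature equality filters the TPL corpus so
            that the costly fine-grained pairwise comparison runs on few pairs."""
            module_sigs = {package_signature(t) for t in app_module_trees}
            return [tpl for tpl, sig in tpl_index.items() if sig in module_sigs]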
    Available online:  July 26, 2023 , DOI: 10.13328/j.cnki.jos.006919
    Abstract:
    Memory error vulnerabilities (e.g., buffer overflows) are often caused by improper use of memory copy functions, so identifying memory copy functions in binary programs is beneficial for finding memory error vulnerabilities. However, current methods for identifying memory copy functions in binary programs mainly rely on static analysis to extract function features, control flow, data flow, and other information, and they suffer from high false positive and false negative rates. This study proposes a novel technique, namely CPSeeker, based on hybrid static and dynamic analysis to identify memory copy functions more effectively. CPSeeker combines the advantages of static and dynamic analysis: it collects the global static information and local execution information of functions in stages and fuses the extracted information to identify memory copy functions in binary programs. The experimental results show that CPSeeker outperforms the state-of-the-art BootStomp, SaTC, CPYFinder, and Gemini in identifying memory copy functions, despite its increased runtime consumption, and its F1 value reaches 0.96. Furthermore, CPSeeker is not affected by the build environment (compiler version, compiler type, and compiler optimization level). In addition, CPSeeker performs better in tests on real-world firmware.
    Available online:  July 26, 2023 , DOI: 10.13328/j.cnki.jos.006920
    Abstract:
    The broad-learning-based dynamic fuzzy inference system (BL-DFIS) can automatically assemble simplified fuzzy rules and achieve high accuracy in classification tasks. However, when BL-DFIS works on large and complex datasets, it may generate too many fuzzy rules to achieve satisfactory accuracy, which adversely affects its interpretability. In order to circumvent this bottleneck, a fuzzy neural network called feature-augmented random vector functional-link neural network (FA-RVFLNN) is proposed in this study to achieve an excellent trade-off between classification performance and interpretability. In the proposed network, an RVFLNN with the original data as input is taken as the primary structure, and BL-DFIS is taken as a performance supplement, which implies that FA-RVFLNN contains direct links to boost the performance of the whole system. The inference mechanism of the primary structure can be explained by a fuzzy logic operator (I-OR), owing to the use of sigmoid activation functions in the enhancement nodes of this structure. Moreover, the original input data with clear meaning also help to explain the inference rules of the primary structure. With the support of direct links, FA-RVFLNN can learn more useful information through enhancement nodes, feature nodes, and fuzzy nodes. The experimental results indicate that FA-RVFLNN indeed eases the problem of rule explosion caused by excessive enhancement nodes in the primary structure and improves the interpretability of BL-DFIS therein (the average number of fuzzy rules is reduced by about 50%), while remaining competitive in terms of generalization performance and network size.
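    A minimal numpy sketch of an RVFLNN with direct links shows how the output layer sees both the raw features and the sigmoid enhancement nodes; the initialization, dimensions, and ridge solver are generic textbook choices, not the paper's exact configuration:

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def rvflnn_forward(X, W_enh, b_enh, W_out):
            """Direct links: the readout consumes [X | sigmoid(X W + b)]."""
            Z = np.hstack([X, sigmoid(X @ W_enh + b_enh)])
            return Z @ W_out

        def fit_output(X, Y, W_enh, b_enh, lam=1e-3):
            """Closed-form ridge regression for the output weights only;
            the random enhancement weights W_enh, b_enh stay fixed."""
            Z = np.hstack([X, sigmoid(X @ W_enh + b_enh)])
            return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ Y)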
    Available online:  July 26, 2023 , DOI: 10.13328/j.cnki.jos.006940
    Abstract:
    How to improve the accuracy of matching between natural language query input and highly structured programming language source code is a fundamental concern in code search. Accurate extraction of code features is one of the key challenges to improving matching accuracy. The semantics expressed by statements in codes is not only relevant to themselves but also to their contexts. The structural model of the code provides rich contextual information for understanding code functions. This study proposes a code search method based on function multigraph embedding. By using an early fusion strategy, the study fuses the data dependencies of code statements into a control flow graph and constructs a function multigraph to represent the code. The multigraph explicitly expresses the dependency relationships of indirect predecessor and successor nodes that are lacking in the control flow graph through data dependencies and enhances the contextual information of statement nodes. At the same time, in view of the edge heterogeneity of the multigraph, this study uses the relational graph convolutional network to extract the features of the code from the function multigraph. Experiments on a public dataset show that the proposed method can improve the MRR by more than 5% compared with the existing methods based on code text and structural models. The ablation experiments also show that the control flow graph contributes more to the search accuracy than the data dependence graph.
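    A minimal Python sketch of the early-fusion step (the relation ids and edge-list format are assumptions; a relational graph convolutional network would consume typed edges of this shape):

        CFG, DDG = 0, 1    # relation ids distinguishing the heterogeneous edges

        def build_function_multigraph(cfg_edges, dd_edges):
            """Early fusion: overlay data-dependence edges on the control flow
            graph, keeping the relation type on each edge so an R-GCN can
            treat the heterogeneous edges differently."""
            return ([(u, v, CFG) for u, v in cfg_edges]
                    + [(u, v, DDG) for u, v in dd_edges])

        # Statement 1 defines a variable used by statement 3: the DDG edge
        # (1, 3) exposes an indirect predecessor the CFG alone leaves implicit.
        g = build_function_multigraph(cfg_edges=[(1, 2), (2, 3)], dd_edges=[(1, 3)])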
    Available online:  July 19, 2023 , DOI: 10.13328/j.cnki.jos.006832
    Abstract:
    The mixed cooperative-competitive multi-agent system consists of controlled target agents and uncontrolled external agents. The target agents cooperate with each other and compete with the external agents, so as to deal with the dynamic changes in the environment and the external agents and complete tasks. In order to train the target agents and make them learn the optimal policy for completing tasks, existing work proposes two kinds of solutions: (1) focusing on the cooperation between target agents, viewing the external agents as part of the environment, and leveraging multi-agent reinforcement learning to train the target agents, which cannot handle the uncertainty of or dynamic changes in the external agents' policies; (2) focusing on the competition between target agents and external agents, modeling the competition as a two-player game, and using a self-play approach to train the target agents, which is only suitable for one target agent and one external agent and is difficult to extend to a system consisting of multiple target agents and external agents. This study combines the two kinds of solutions and proposes a counterfactual regret advantage-based self-play approach. Specifically, first, based on counterfactual regret minimization and the counterfactual multi-agent policy gradient, the study designs a counterfactual regret advantage-based policy gradient approach that lets a target agent update its policy more accurately. Second, in order to deal with the dynamic changes in the external agents' policies during self-play, the study leverages imitation learning, which takes the external agents' historical decision-making trajectories as training data and imitates their policies, so as to explicitly model the external agents' behaviors. Third, based on the counterfactual regret advantage-based policy gradient and the modeling of external agents' behaviors, the study designs a self-play training approach that obtains the optimal joint policy for multiple target agents when the external agents' policies are uncertain or dynamically changing. The study also conducts a set of experiments on cooperative electromagnetic countermeasures, including three typical mixed cooperative-competitive tasks. The experimental results demonstrate that, compared with other approaches, the proposed approach improves self-play performance by at least 78%.
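    The counterfactual baseline underlying such advantage estimates can be sketched in a COMA-style form; the paper's regret-based variant refines this, and the interface below (a Q-table keyed by joint actions and per-agent policies) is purely illustrative:

        def counterfactual_advantage(q_values, policy, joint_action, i):
            """Advantage of agent i's chosen action: the joint action's value
            minus the policy-weighted value over agent i's alternative actions,
            with all other agents' actions held fixed."""
            a = tuple(joint_action)
            baseline = sum(policy[i][ai] * q_values[a[:i] + (ai,) + a[i + 1:]]
                           for ai in range(len(policy[i])))
            return q_values[a] - baseline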
    Available online:  July 12, 2023 , DOI: 10.13328/j.cnki.jos.006909
    Abstract:
    With the popularity of touch devices, pen + touch input has become a mainstream input mode for mobile office work. However, existing applications mainly take one of the two as input, which limits users' interaction space. In addition, existing pen + touch research mainly focuses on serial pen + touch cooperation and the parallel processing of specific interactive tasks and does not systematically consider the parallel cooperation mechanism and the intention correlation between different inputs. This study first proposes an interaction model based on pen + touch input and then defines pen + touch interaction primitives according to users' behavioral habits in pen + touch cooperation, so as to extend the pen + touch interaction space. Furthermore, by using a partially observable Markov decision process (POMDP), the study develops a method of extracting pen + touch input intentions based on time sequence information, which incrementally extracts the interaction intention of polysemantic interaction primitives. Finally, the study evaluates the advantages of pen + touch input through a user experiment.
    Available online:  July 12, 2023 , DOI: 10.13328/j.cnki.jos.006910
    Abstract:
    Code search is an important research topic in natural language processing and software engineering. Developing efficient code search algorithms can significantly improve the code reuse and the working efficiency of software developers. The task of code search is to retrieve code fragments that meet the requirements from the massive code repository by taking the natural language describing the function of the code fragments as input. Although the sequence model-based code search method, namely DeepCS has achieved promising results, it cannot capture the deep semantics of the code. GraphSearchNet, a code search method based on graph embedding, can alleviate this problem, but it does not perform fine-grained matching on codes and texts and ignores the global relationship between code graphs and text graphs. To address the above limitations, this study proposes a code search method based on a relational graph convolutional network, which encodes the constructed text graphs and code graphs, performs fine-grained matching on text query and code fragments at the node level, and applies neural tensor networks to capture their global relationship. Experimental results on two public datasets show that the proposed method achieves higher search accuracy than state-of-the-art baseline models, namely DeepCS and GraphSearchNet.
    Available online:  July 12, 2023 , DOI: 10.13328/j.cnki.jos.006903
    Abstract:
    One of the main challenges in judicial artificial intelligence is the identification of key case elements. Existing methods simply treat case element identification as a named entity recognition task, and much of the information they recognize is therefore irrelevant. In addition, because the global and local information in texts is not used effectively, element boundary recognition is poor. To solve these problems, this study proposes a recognition method for key case elements that integrates global and local information. Specifically, the BERT model is used as the shared input layer for judicial texts to extract text features. Then, three sub-task networks for judicial case element recognition, judicial text classification (global information), and judicial Chinese word segmentation (local information) are built on the shared layer for joint learning. Finally, the effectiveness of the method is tested on two public datasets. The results show that the F1 value of the proposed method exceeds that of the best existing method, the classification accuracy of element entities is improved, and boundary recognition errors are reduced.
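    A minimal PyTorch sketch of the three-head joint model; the encoder below is a stand-in for BERT, and the head sizes and loss weights are assumptions:

        import torch.nn as nn

        class JointJudicialModel(nn.Module):
            """Shared encoder with three task heads: case element tags
            (token-level), text class (global), and segmentation tags (local)."""
            def __init__(self, vocab, hidden=128, n_tags=9, n_cls=4, n_seg=4):
                super().__init__()
                self.encoder = nn.Embedding(vocab, hidden)   # BERT stand-in
                self.ner_head = nn.Linear(hidden, n_tags)
                self.cls_head = nn.Linear(hidden, n_cls)
                self.seg_head = nn.Linear(hidden, n_seg)

            def forward(self, ids):                          # ids: [batch, seq]
                h = self.encoder(ids)                        # [batch, seq, hidden]
                return (self.ner_head(h), self.cls_head(h.mean(dim=1)),
                        self.seg_head(h))

        def joint_loss(outputs, targets, w=(1.0, 0.5, 0.5)):
            """Weighted sum of the three cross-entropy losses for joint learning."""
            ce = nn.CrossEntropyLoss()
            ner, cls_out, seg = outputs
            y_ner, y_cls, y_seg = targets
            return (w[0] * ce(ner.flatten(0, 1), y_ner.flatten())
                    + w[1] * ce(cls_out, y_cls)
                    + w[2] * ce(seg.flatten(0, 1), y_seg.flatten()))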
    Available online:  July 12, 2023 , DOI: 10.13328/j.cnki.jos.006917
    Abstract:
    In order to perform knowledge mining and management, information systems need to process various forms of data, including stream data. Stream data have the characteristics of large data scale, fast generation speed, and strong timeliness of the knowledge contained in them. Therefore, it is very important for knowledge management of information systems to develop stream processing technology that supports real-time stream processing applications. Stream processing systems (SPSs) can be traced back to the 1990s, and they have undergone significant development since then. However, current diverse knowledge management needs and the new generation of hardware architectures have brought new challenges and opportunities for SPSs, and a series of technical research on stream processing ensues. This study introduces the basic requirements and development history of SPSs and then analyzes relevant technologies in the SPS field in terms of four aspects: programming interface, execution plan, resource scheduling, and fault tolerance. Finally, this study predicts the research directions and development trends of stream processing technology in the future.
    Available online:  July 05, 2023 , DOI: 10.13328/j.cnki.jos.006901
    Abstract:
    Hybrid transactional/analytical processing (HTAP) database systems have gained wide acceptance among users because they handle mixed workloads, i.e., transactions and analytical queries, in one system. Most HTAP database systems tend to maintain multiple data versions or additional replicas to accomplish online analytical processing (OLAP) without degrading the write performance of online transactional processing (OLTP). This leads to a consistency problem between the TP-version and AP-version data. Meanwhile, HTAP database systems face the core challenge of achieving efficient data sharing under resource isolation, and the data-sharing model embodies the trade-off between the business requirements for performance and data freshness. To systematically explain the data-sharing models and optimization strategies of existing HTAP database systems, this study first uses consistency models to define the data-sharing model and, according to the differences between TP-generated versions and AP-queried versions, classifies the consistency models for HTAP data sharing into three categories, namely linear consistency, sequential consistency, and session consistency. After that, it takes a deep dive into the whole data-sharing process through three core issues, i.e., data-version number distribution, data-version synchronization, and data-version tracking, and presents the implementation methods of the different consistency models. Furthermore, this study takes a dozen classic and popular HTAP database systems as examples for an in-depth interpretation of these implementation methods. Finally, it summarizes and analyzes the optimization strategies for the version synchronization, tracking, and recycling modules involved in the data-sharing process and predicts the optimization directions of data-sharing models. It is concluded that self-adaptive data synchronization scopes, self-tuning data synchronization cycles, and freshness-bound constraint control under sequential consistency are possible means toward higher performance and freshness in HTAP database systems.
    Available online:  July 05, 2023 , DOI: 10.13328/j.cnki.jos.006812
    Abstract:
    Security bug reports (SBRs) describe critical security vulnerabilities in software products. SBR prediction has attracted increasing attention from researchers seeking to eliminate security attack risks of software products. However, in actual software development scenarios, a new company or a new project that needs software security bug prediction may not have enough labeled SBRs to build an SBR prediction model in practice. A simple solution is to employ a cross-project transfer model, i.e., to build the prediction model with labeled data from other projects. Inspired by two recent studies in this field, this study puts forward a cross-project SBR prediction method integrating knowledge graphs, namely knowledge graph of security bug report prediction (KG-SBRP), based on the idea of security keyword filtering. The text information fields in SBRs are combined with common weakness enumeration (CWE) and common vulnerabilities and exposures (CVE) Details to build triple rules and entities. The entities are utilized to build a knowledge graph of security bugs, and SBRs are identified by combining entity and relationship recognition. Finally, the data are divided into training and test sets for model fitting and performance evaluation. The built model is evaluated empirically on seven SBR datasets of different scales. The results show that, compared with the current main methods FARSEC and Keyword matrix, the proposed method increases the F1-score by an average of 11% in cross-project SBR prediction scenarios and by an average of 30% in within-project SBR prediction scenarios.
    Available online:  July 05, 2023 , DOI: 10.13328/j.cnki.jos.006906
    Abstract:
    Software product line testing is challenging. The similarity-based testing method can improve testing coverage and fault detection rate by increasing the diversity of test suites. Due to its excellent scalability and satisfactory testing effects, the method has become one of the most important test methods for software product lines. How to generate diverse test cases and how to maintain the diversity of test suites are two key issues in this test method. To handle the above issues, this study proposes a software product line test algorithm based on diverse SAT solvers and novelty search (NS). Specifically, the algorithm simultaneously uses two types of diverse SAT solvers to generate diverse test cases. In particular, in order to improve the diversity of stochastic local search SAT solvers, the study proposes a general strategy that is based on a probability vector to generate candidate solutions. Furthermore, two archiving strategies inspired by the idea of the NS algorithm are designed and applied to maintain both the global and local diversity of the test suites. Ablation and comparison experiments on 50 real software product lines verify the effectiveness of both the diverse SAT solvers and the two archiving strategies, as well as the superiority of the proposed algorithm over other state-of-the-art algorithms.
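    Two ingredients are easy to sketch in Python: drawing a candidate configuration from the probability vector that diversifies the stochastic local search solver, and a novelty-search-style archiving rule. The Hamming distance, k, and threshold are illustrative assumptions:

        import random

        def sample_candidate(prob_vector):
            """One bit per feature, drawn independently from the probability vector."""
            return [1 if random.random() < p else 0 for p in prob_vector]

        def novelty(candidate, archive, k=3):
            """Mean Hamming distance to the k nearest archived configurations."""
            dists = sorted(sum(a != b for a, b in zip(candidate, x)) for x in archive)
            return sum(dists[:k]) / min(k, len(dists)) if dists else float('inf')

        def maybe_archive(candidate, archive, threshold=2.0):
            if novelty(candidate, archive) >= threshold:
                archive.append(candidate)    # keeps the test suite globally diverse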
    Available online:  July 05, 2023 , DOI: 10.13328/j.cnki.jos.006907
    Abstract:
    Business process execution language (BPEL) is an executable web service composition language. Compared with traditional programs, BPEL programs differ significantly in programming model and execution mode. These new features make it challenging to locate and fix faults in BPEL programs that are detected during testing, and fault fixing techniques developed for traditional software cannot be applied to BPEL programs directly. From the perspective of mutation analysis, this study proposes a template-matching-based fault fixing technique for BPEL programs, namely BPELRepair. In order to overcome the high computational overhead of mutation-analysis-based fault fixing, a set of optimization strategies is proposed from three perspectives, namely patch generation, test case selection, and termination conditions. A supporting tool is developed to improve the automation and efficiency of fault fixing for BPEL programs. An empirical study is used to evaluate the effectiveness of the proposed fault fixing technique and optimization strategies. The experimental results show that the proposed technique can successfully fix about 53% of the faults in BPEL programs, and the proposed optimization strategies can significantly reduce the overhead of search matching, patch program verification, test case execution, and fault fixing.
    Available online:  July 05, 2023 , DOI: 10.13328/j.cnki.jos.006908
    Abstract:
    Quantum computing is expected to solve many typical and difficult problems in theory. The rapid development of quantum computers in recent years is pushing the theory into practice. However, numerous errors in current hardware can cause incorrect computational results, which severely limit the ability of quantum computers to solve practical problems. Quantum computing system software lies between applications and hardware. In addition, tapping the full potential of the system software in mitigating hardware errors is crucial to realizing practical quantum computing in the near future. As a result, many research works on quantum computing system software have recently emerged. This study classifies them into three categories: compilers, runtime systems, and debuggers. Through an in-depth analysis of these works, the study sorts out the research status of quantum computing system software and reveals their important roles in mitigating hardware errors. This study also looks forward to future research directions.
    Available online:  July 04, 2023 , DOI: 10.13328/j.cnki.jos.006835
    Abstract:
    Runtime configuration brings flexibility and customizability to users in the utilization of software systems. However, its enormous scale and complex mechanisms also pose significant challenges. A large number of scholars and research institutions have probed into runtime configuration to improve the availability and adaptability of software systems in complex environments. This study develops an analytical framework of runtime configuration to provide a systematic overview of state-of-the-art research from three different stages, namely configuration analysis and comprehension, configuration defect detection and misconfiguration diagnosis, and configuration utilization. The study also summarizes the limitations and challenges faced by current research and outlines the research trend of runtime configuration, which is of guiding significance for future work.
    Available online:  July 04, 2023 , DOI: 10.13328/j.cnki.jos.006836
    Abstract:
    Autonomous driving software based on deep neural networks (DNNs) has become the most popular solution. Like traditional software, DNNs can produce incorrect outputs or unexpected behaviors, and DNN-based autonomous driving software has already caused serious accidents that threaten life and property. Therefore, how to effectively test DNN-based autonomous driving software has become an urgent problem. Since it is difficult to predict and understand the behavior of DNNs, traditional software testing methods are no longer applicable. Existing testing methods for autonomous driving software generate test data by adding pixel-level perturbations to original images or by modifying whole images; the generated test data differ considerably from real images, and the perturbation-based methods are difficult to understand. To solve this problem, this study proposes a test data generation method named interpretability analysis-based test data generation (IATG). It first uses interpretation methods for DNNs to generate visual explanations of the decisions made by the autonomous driving software and chooses objects in the original images that have significant impacts on the decisions. It then generates test data by replacing the chosen objects with other objects of the same semantics. The generated test data are more similar to real images, and the generation process is more understandable. As an important part of the autonomous driving software's decision-making module, the steering angle prediction model is used in the experiments. Experimental results show that introducing the interpretation method effectively enhances IATG's ability to mislead the steering angle prediction model. Furthermore, at the same misleading angle, the test data generated by IATG are more similar to real images than those generated by DeepTest; IATG has a stronger misleading ability than semSensFuzz, and its interpretability-analysis-based important object selection can effectively improve the misleading ability of semSensFuzz.
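    The object-selection and replacement steps can be sketched as follows; the saliency map is assumed to come from any interpretation method (for instance a Grad-CAM-like map), and the function names and mask format are hypothetical:

        import numpy as np

        def pick_important_object(saliency, object_masks):
            """Choose the object the steering model attends to most: the one
            with the highest mean saliency over its boolean pixel mask."""
            return max(object_masks, key=lambda name: saliency[object_masks[name]].mean())

        def replace_object(image, mask, same_class_patch):
            """Semantics-preserving replacement: paste an object of the same
            class over the selected region to obtain a realistic test image."""
            out = image.copy()
            out[mask] = same_class_patch[mask]
            return out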
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006898
    Abstract:
    As critical Internet infrastructure, DNS brings many privacy and security risks due to its plaintext transmission. Many encryption technologies for DNS channel transmission, such as DoH, DoT, and DoQ, are committed to preventing DNS data from leaking or tampering and ensuring the reliability of DNS message sources. Firstly, this study analyzes the privacy and security problems of plaintext DNS from six aspects, including the DNS message format, data storage and management, and system architecture and deployment, and then summarizes the existing related technologies and protocols. Secondly, the implementation principles and the application statuses of the encryption protocols for DNS channel transmission are analyzed, and the performance of each encryption protocol under different network conditions is discussed with multi-angle evaluation indicators. Meanwhile, it discusses the privacy protection effects of the encryption technologies for DNS channel transmission through the limitations of the padding mechanism, the encrypted traffic identification, and the fingerprint-based encryption activity analysis. In addition, the problems and challenges faced by encryption technologies for DNS channel transmission are summarized from the aspects of the deployment specifications, the illegal use of encryption technologies by malicious traffic and its attack on them, the contradiction between privacy and network security management, and other factors affecting privacy and security after encryption. Relevant solutions are also presented. Finally, it summarizes the highlights of future research, such as the discovery of the encrypted DNS service, server-side privacy protection, the encryption between recursive resolvers and authoritative servers, and DNS over HTTP/3.
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006899
    Abstract:
    Knowledge space theory, which uses mathematical language for the knowledge evaluation and learning guide of learners, belongs to the research field of mathematical psychology. Skills and problems are the two basic elements of knowledge space, and an in-depth study of the relationship between them is the inherent requirement of knowledge state description and knowledge structure analysis. In the existing knowledge space theory, no explicit bi-directional mapping between skills and problems has been established, which makes it difficult to put forward a knowledge structure analysis model under intuitive conceptual meanings. Moreover, the partial order relationship between knowledge states has not been clearly obtained, which is not conducive to depicting the differences between knowledge states and planning the learning path of learners. In addition, the existing achievements mainly focus on the classical knowledge space, without considering the uncertainties of data in practical problems. To this end, this study introduces formal concept analysis and fuzzy sets into knowledge space theory and builds the fuzzy concept lattice models for knowledge structure analysis. Specifically, fuzzy concept lattice models of knowledge space and closure space are presented. Firstly, the fuzzy concept lattice of knowledge space is constructed, and it is proved that the extents of all concepts form a knowledge space by the upper bounds of any two concepts. The idea of granule description is introduced to define the skill-induced atomic granules of problems, whose combinations can help determine whether a combination of problems is a state in the knowledge space. On this basis, a method to obtain the fuzzy concepts in the knowledge space from the problem combinations is proposed. Secondly, the fuzzy concept lattice of closure space is established, and it is proved that the extents of all concepts form the closure space by the lower bounds of any two concepts. Similarly, the problem-induced atomic granules of skills are defined, and their combinations can help determine whether a skill combination is the skills required by a knowledge state in the closure space. In this way, a method to obtain the fuzzy concepts in the closure space from the skill combinations is presented. Finally, the effects of the number of problems, the number of skills, the filling factor, and the analysis scale on the sizes of knowledge space and closure space are analyzed by some experiments. The results show that the fuzzy concepts in the knowledge space are different from any existing concept and cannot be derived from other concepts. The fuzzy concepts in the closure space are attribute-oriented one-sided fuzzy concepts in essence. In the formal context of two-valued skills, there is one-to-one correspondence between the states in knowledge space and closure space, but this relationship does not hold in the formal context of fuzzy skills.
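    For readers unfamiliar with the two structures being contrasted, the standard duality from knowledge space theory can be stated compactly (a background fact in LaTeX notation, not a formula taken from the paper):

        % A family K of knowledge states over item set Q is a knowledge space
        % when it contains the trivial states and is closed under union:
        \emptyset, Q \in \mathcal{K}, \qquad
        K_1, K_2 \in \mathcal{K} \;\Rightarrow\; K_1 \cup K_2 \in \mathcal{K}
        % Dually, the complements of the states form a closure space,
        % closed under intersection:
        \mathcal{L} = \{\, Q \setminus K : K \in \mathcal{K} \,\}, \qquad
        L_1, L_2 \in \mathcal{L} \;\Rightarrow\; L_1 \cap L_2 \in \mathcal{L}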
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006900
    Abstract:
    Log is an important carrier of a computer system, which records the states of events, and a log system is responsible for log generation, collection, and output. OpenHarmony is a new open-source, distributed operating system for smart devices in all scenarios of a fully-connected world. Prior to the work described in this study, many key subsystems of OpenHarmony, including the log system, had not been built. The open-source feature of OpenHarmony enables third-party developers to contribute core codes. To solve the problem of the lack of a log system of OpenHarmony, this paper mainly does the following work: ① It analyzes the technical architecture, advantages, and disadvantages of today’s popular log systems. ② It clarifies the model specifications of the log system HiLog according to the interconnection feature of heterogeneous devices in OpenHarmony. ③ It designs and implements the first log system HiLog of OpenHarmony and contributes it to the OpenHarmony trunk. ④ It conducts comparative experiments on the key indicators of HiLog. The experimental data show that in terms of basic performance, the throughput of HiLog and Log is 1500 KB/s and 700 KB/s, respectively, which indicates that HiLog has a 114% improvement over the log system of Android. In terms of log persistence, the packet loss of HiLog is less than 6‰ with a compression rate of 3.5% for persistency, much lower than that of Log. In addition, HiLog also has some novel practical functions such as data protection and flow control.
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006895
    Abstract:
    The morphological changes in retina boundaries are important indicators of retinal diseases, and the subtle changes can be captured by images obtained by optical coherence tomography (OCT). The retinal layer boundary segmentation based on OCT images can assist in the clinical judgment of related diseases. In OCT images, due to the diverse morphological changes in retina boundaries, the key boundary-related information, such as contexts and saliency boundaries, is crucial to the judgment and segmentation of layer boundaries. However, existing segmentation methods lack the consideration of the above information, which results in incomplete and discontinuous boundaries. To solve the above problems, this study proposes a coarse-to-fine method for the segmentation of retinal layer boundary in OCT images based on the end-to-end deep neural networks and graph search (GS), which avoids the phenomenon of “faults” common in non-end-to-end methods. In coarse segmentation, the attention global residual network (AGR-Net), an end-to-end deep neural network, is proposed to extract the above key information in a more sufficient and effective way. Specifically, a global feature module (GFM) is designed to capture the global context information of OCT images by scanning from four directions of the images. After that, the channel attention module (CAM) and GFM are sequentially combined and embedded in the backbone network to realize saliency modeling of context information of the retina and its boundaries. This effort effectively solves the problem of wrong segmentation caused by retina deformation and insufficient information extraction in OCT images. In fine segmentation, a GS algorithm is adopted to remove isolated areas or holes from the coarse segmentation results obtained by AGR-Net. In this way, the boundary keeps a fixed topology, and it is continuous and smooth, which further optimizes the overall segmentation results and provides a more complete reference for medical clinical diagnosis. Finally, the performance of the proposed method is evaluated from different perspectives on two public datasets, and the method is compared with the latest methods. The comparative experiments show that the proposed method outperforms the existing methods in terms of segmentation accuracy and stability.
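    The channel attention module can plausibly be realized in the common squeeze-and-excitation style; the PyTorch sketch below is one standard form and may differ from the paper's exact CAM design:

        import torch.nn as nn

        class ChannelAttention(nn.Module):
            """Squeeze (global average pool) then excite (bottleneck MLP with
            a sigmoid gate) to reweight feature channels."""
            def __init__(self, channels, reduction=8):
                super().__init__()
                self.fc = nn.Sequential(
                    nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
                    nn.Linear(channels // reduction, channels), nn.Sigmoid())

            def forward(self, x):                         # x: [batch, C, H, W]
                w = self.fc(x.mean(dim=(2, 3)))           # squeeze to [batch, C]
                return x * w.unsqueeze(-1).unsqueeze(-1)  # gate each channel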
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006829
    Abstract:
    Nowadays, deep neural networks (DNNs) have been widely used in various fields. However, research has shown that DNNs are vulnerable to attacks of adversarial examples (AEs), which seriously threaten the development and application of DNNs. Most of the existing adversarial defense methods need to sacrifice part of the original classification accuracy to obtain defense capability and strongly rely on the knowledge provided by the generated AEs, so they cannot balance the effectiveness and efficiency of defense. Therefore, based on manifold learning, this study proposes an origin hypothesis of AEs in attackable space from the feature space perspective and a trap-type ensemble adversarial defense network (Trap-Net). Trap-Net adds trap data to the training data based on the original model and uses the trap-type smoothing loss function to establish the seducing relationship between the target data and trap data, so as to generate trap-type networks. In order to address the problem that most adversarial defense methods sacrifice original classification accuracy, ensemble learning is used to ensemble multiple trap networks, so as to expand attackable target space defined by trap labels in the feature space and reduce the loss of the original classification accuracy. Finally, Trap-Net determines whether the input data are AEs by detecting whether the data hit the attackable target space. Experiments on MNIST, K-MNIST, F-MNIST, CIFAR-10, and CIFAR-100 datasets show that Trap-Net has strong defense generalization of AEs without sacrificing the classification accuracy of clean samples, and the results of experiments validate the adversarial origin hypothesis in attackable space. In the low-perturbation white-box attack scenario, Trap-Net achieves a detection rate of more than 85% for AEs. In the high-perturbation white-box attack and black-box attack scenarios, Trap-Net has a detection rate of almost 100% for AEs. Compared with other detection methods of AEs, Trap-Net is highly effective against white-box and black-box adversarial attacks, and it provides an efficient robustness optimization method for DNNs in adversarial environments.
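    The detection rule itself is simple to sketch: an input is flagged as adversarial as soon as any ensemble member maps it into the trap-label region of the feature space. The predict interface and the any-member rule below are assumptions:

        def trap_net_detect(trap_models, trap_labels, x):
            """Flag x as an adversarial example if any trained trap network
            classifies it into the attackable target space (a trap label)."""
            return any(m.predict(x) in trap_labels for m in trap_models)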
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006830
    Abstract:
    Dynamic memory allocators are fundamental components of modern applications: they manage free memory and handle user memory requests. Modern general-purpose dynamic memory allocators balance performance and memory footprint. However, given the different memory footprints and optimization goals of specific application scenarios, a general-purpose memory allocator is not the optimal solution. Special-purpose memory allocators usually satisfy system requirements better, but they are time-consuming and error-prone to implement. Developers often use memory allocation frameworks to build special-purpose dynamic memory allocators; however, existing frameworks suffer from poor abstraction ability and insufficient composability and customizability. For this reason, this study revisits the dynamic memory allocation process from the perspective of functional programming and proposes a composable and customizable dynamic memory allocator framework, namely mortise, based on function composability. The framework abstracts system memory allocation as a composition of several hierarchical, decoupled memory allocation functions, whose policies ensure high customizability and composability. Mortise is implemented in standard C. To achieve zero performance overhead for hierarchical function composition, mortise uses the metaprogramming features offered by the C preprocessor. Developers can quickly build a memory allocator for a targeted application scenario by composing and customizing the hierarchical allocator functions. In order to demonstrate the effectiveness of mortise, this study presents three memory allocator instances built with it, namely tlsfcc, hslab, and wfslab. Specifically, tlsfcc is designed for multi-core embedded application scenarios and improves parallel throughput by replacing the synchronization strategy; hslab is a core-aware slab-type allocator that optimizes performance on heterogeneous hardware by customizing the thread cache; wfslab is a low-latency wait-free/lock-free allocator. This study runs benchmarks to compare these allocators with several existing memory allocators on an 8-core x86/64 platform and an 8-core heterogeneous aarch64 embedded platform. The experimental results show that tlsfcc achieves mean speedups of 1.76 and 1.59 on the two platforms compared with the original tlsf allocator; hslab requires only 69.6% and 85.0% of the execution time of tcmalloc, which has a similar architecture; and wfslab has the smallest worst-case memory request latency among all memory allocators in the experiment, including the state-of-the-art lock-free memory allocators mimalloc and snmalloc.
    Available online:  June 28, 2023 , DOI: 10.13328/j.cnki.jos.006893
    Abstract:
    Deep neural networks (DNNs) have made remarkable achievements in many fields, but related studies show that they are vulnerable to adversarial examples. The gradient-based attack is a popular adversarial attack and has attracted wide attention. This study investigates the relationship between gradient-based adversarial attacks and numerical methods for solving ordinary differential equations (ODEs). In addition, it proposes a new adversarial attack based on Runge-Kutta (RK) method, a numerical method for solving ODEs. According to the prediction idea in the RK method, perturbations are added to the original examples first to construct predicted examples, and then the gradients of the loss functions with respect to the original and predicted examples are linearly combined to determine the perturbations to be added for the generation of adversarial examples. Different from the existing adversarial attacks, the proposed adversarial attack employs the prediction idea of the RK method to obtain the future gradient information (i.e., the gradient of the loss function with respect to the predicted examples) and uses it to determine the adversarial perturbations to be added. The proposed attack features good extensibility and can be easily applied to all available gradient-based attacks. Extensive experiments demonstrate that in contrast to the state-of-the-art gradient-based attacks, the proposed RK-based attack boasts higher success rates and better transferability.
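    A one-step PyTorch sketch of the idea (the look-ahead step size and the combination weights are assumptions; the paper's RK-based attacks generalize this construction):

        import torch

        def rk_style_fgsm(model, loss_fn, x, y, eps, alpha=0.5):
            """Take a look-ahead step along the current gradient, then combine
            the current and predicted (future) gradients before the sign step."""
            x = x.clone().detach().requires_grad_(True)
            g1 = torch.autograd.grad(loss_fn(model(x), y), x)[0]            # current gradient
            x_pred = (x + eps * g1.sign()).detach().requires_grad_(True)    # predicted example
            g2 = torch.autograd.grad(loss_fn(model(x_pred), y), x_pred)[0]  # future gradient
            g = (1 - alpha) * g1 + alpha * g2                               # linear combination
            return (x + eps * g.sign()).detach()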
    Available online:  June 16, 2023 , DOI: 10.13328/j.cnki.jos.006743
    Abstract:
    A (t, n) threshold private set intersection protocol allows the N participants, each holding a private set of size n, to learn the intersection of their sets only if the number of elements in the intersection is larger than or equal to the threshold t, without revealing any other private information. It has a wide range of applications, such as fingerprint recognition, online carpooling, and online dating websites. However, most existing research on threshold private set intersection focuses on the two-party setting, and threshold protocols for multi-party private set intersection still face many challenges. For example, the existing multi-party threshold protocols rely on public key algorithms with large overheads, such as fully homomorphic encryption, and such algorithms have not yet been implemented efficiently. To solve these problems, this study combines robust secret sharing and Bloom filters to propose two efficient multi-party threshold private set intersection protocols and, for the first time, implements the protocols in simulation. Specifically, this study designs a new construction method for Bloom filters: the shares generated by robust secret sharing are made to correspond to the elements in the participants' sets, and whether the number of elements in the intersection reaches the threshold is determined by whether a correct secret can be reconstructed from the secret shares obtained by querying the Bloom filter. In this way, the protocols effectively prevent the leakage of the intersection cardinality. The first protocol designed in this study avoids public key algorithms with large overheads. When the security parameter λ is set to 128, the set size n to 2^14, and the threshold to 0.8n, the time cost of the online phase of the protocol in the three-party scenario is 191 s. In addition, to resist the collusion of at most N – 1 adversaries in the semi-honest model, this study combines oblivious transfer with the first protocol to design a variant. Under the same conditions, the time cost of its online phase is 194 s. Finally, the security proofs show that the proposed protocols are secure in the semi-honest model.
    Available online:  June 16, 2023 , DOI: 10.13328/j.cnki.jos.006744
    Abstract:
    Deep neural networks are vulnerable to attacks from adversarial samples. For instance, in a text classification task, the model can be fooled into changing its classification result by modifying a few characters, words, or punctuation marks in the original text. Currently, studies of Chinese adversarial samples are limited in the field of natural language processing (NLP) and fail to give due consideration to the language features of Chinese. Focusing on Chinese sentiment classification scenarios and taking into account the pictographic, phonetic, and other language features of Chinese, this study proposes CWordCheater, a high-quality character-level and word-level method for generating adversarial samples that covers pronunciation, glyphs, and punctuation marks. A ConvAE network is adopted to embed Chinese visual vectors for the replacement mode of visually similar characters and to obtain the corresponding candidate pool of replacement characters. Moreover, a semantic constraint method based on the universal sentence encoder (USE) distance is proposed to avoid semantic drift in the adversarial samples. Finally, the study proposes a set of multi-dimensional evaluation methods that assess the quality of adversarial samples from the two aspects of attack effect and attack cost. Experiment results show that CWordCheater can reduce classification accuracy by at least 27.9% on multiple classification models and multiple datasets, with a lower perturbation cost in terms of vision and semantics.
    Available online:  June 16, 2023 , DOI: 10.13328/j.cnki.jos.006745
    Abstract:
    In the past decade or so, artificial intelligence-related services and applications have boomed, and they require high computing power, high bandwidth, and low latency. Edge computing is currently regarded as one of the most appropriate solutions for such applications, especially for video analysis-related ones. This study investigates multi-server multi-user heterogeneous video analysis task offloading, where users select appropriate edge servers and upload their raw video data to the servers for video analysis. It models the multi-server multi-user heterogeneous video analysis task offloading problem as a multiplayer game. The aim is to effectively handle the competition for, and sharing of, the limited network resources among the numerous users and to reach a stable network resource allocation in which no user has an incentive to change its task offloading decision unilaterally. With the optimization goal of minimizing the overall delay, this study investigates the non-distributed and distributed video analysis scenarios in turn and accordingly proposes game theory-based algorithms for potential optimal server selection and video unit allocation. Rigorous mathematical proofs show that the proposed algorithms reach a Nash equilibrium in both cases while guaranteeing a low overall delay. Finally, extensive experiments on real datasets show that the proposed methods reduce the overall delay by 26.3% on average compared with other available algorithms.
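    Reaching such a state can be illustrated with plain best-response dynamics, which converge for finite potential games of this kind; the delay callback and the initialization below are placeholders:

        def best_response_dynamics(users, servers, delay, max_rounds=100):
            """Iterate best responses until no user can unilaterally lower its
            delay; the fixed point is a Nash equilibrium of the offloading
            game. delay(user, assignment) may model congestion on servers."""
            assignment = {u: servers[0] for u in users}
            for _ in range(max_rounds):
                changed = False
                for u in users:
                    best = min(servers, key=lambda s: delay(u, {**assignment, u: s}))
                    if delay(u, {**assignment, u: best}) < delay(u, assignment):
                        assignment[u] = best
                        changed = True
                if not changed:
                    break                    # no profitable unilateral deviation
            return assignment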
    Available online:  June 16, 2023 , DOI: 10.13328/j.cnki.jos.006746
    Abstract:
    In the process of software development, software libraries are widely used as they reduce development time and costs. Consequently, modern software projects contain code from different sources, which makes the systems highly complex and diversified. Moreover, various risks come along with the use of software libraries, such as low quality or security vulnerabilities, which seriously affect the quality of software projects. By analyzing the intensity of the coupling with software libraries, this study quantifies the complexity and diversity that dependence on software libraries introduces into the client code. For this purpose, a software boundary graph (SBG) model is constructed from the method invocation relationships between the client code and the software libraries to distinguish their code boundaries. Then, a metric suite RMS for the complexity of the software library dependency graph is proposed on the basis of the SBG model to quantify the intensity of the coupling with software from different sources. In the experiment, this study mines the data of all historical versions of 10 popular software projects in the Apache open-source community and collects 7857 real-world inter-project dependency defects. With these real-world data, an empirical investigation based on hypothesis testing is conducted on the proposed complexity metric suite RMS to examine the following questions. H1: are boundary nodes with higher risk factors more likely to introduce more inter-project dependency defects? H2: are boundary nodes with higher risk factors more likely to introduce severe inter-project dependency defects? H3: to what extent do the values of the metric suite RMS affect the number and severity of the introduced inter-project dependency defects? Experimental results show that, according to the evaluation with RMS, boundary nodes with higher coupling degrees with software libraries are more likely to introduce more and severer inter-project dependency defects. Moreover, compared with traditional complexity metrics, RMS has a greater influence on the number and severity of the introduced inter-project dependency defects.
    Available online:  June 16, 2023 , DOI: 10.13328/j.cnki.jos.006747
    Abstract:
    The over-threshold multi-party private set intersection (OT-MP-PSI) protocol is a variant of the conventional PSI protocol. It allows m participants to jointly compute the over-threshold intersection, i.e., the elements held in common by at least t (t ≤ m) participants, and ensures that only a participant holding an over-threshold element learns whether that element belongs to the over-threshold intersection and nothing else. The OT-MP-PSI protocol extends the practical application scenarios of the PSI protocol. Since the existing schemes are all constructed from expensive public-key cryptography, their heavy computational burden results in long runtimes. This study designs a novel cryptographic component, the oblivious programmable pseudo-random secret-sharing (OPPR-SS) component, based on symmetric cryptography. Furthermore, a two-cloud-assisted OT-MP-PSI protocol is designed on the basis of the OPPR-SS component; it assigns the tasks of secret sharing and secret reconstruction to two untrusted cloud servers, respectively, so that they assist in completing those tasks and even participants with weak computation capability can run the protocol. The study also proves that the proposed protocol is secure in the semi-honest model. Compared with the existing OT-MP-PSI protocols, the proposed protocol achieves the best runtime and communication overhead in both the secret sharing and secret reconstruction stages. The communication complexities of the participants, the secret-sharing cloud, and the reconstructing cloud are no longer related to the threshold t; the number of communication rounds for the participants is constant, and their communication complexity is merely O(n). The computational complexities of the secret-sharing cloud and the reconstructing cloud are related only to the number of symmetric cryptographic operations.
    Available online:  June 16, 2023 , DOI: 10.13328/j.cnki.jos.006738
    Abstract:
This study proposes a measurement-device-independent (MDI) quantum secure direct communication (QSDC) protocol with an identity authentication server to solve the problems concerning identity authentication and protocol feasibility in quantum communication and further puts forward a quantum voting scheme on the basis of the proposed MDI-QSDC protocol. This scheme takes advantage of various technologies, such as MDI quantum key distribution, perfect quantum encryption, and the classical one-time pad. In this way, it not only ensures its unconditional security in theory but also prevents outside attackers from exploiting vulnerabilities in the measurement equipment in practice. Furthermore, this scheme takes weak coherent pulses in the BB84 states as quantum resources and only performs single-particle operations and Bell-state measurements. As a result, this scheme is highly feasible with present technologies. In addition, it extends the identity authentication function and enables the scrutineer to verify the integrity and correctness of voting information by adopting bit commitment. Simulation results and analysis show that the proposed scheme is correct and has unconditional security in theory, i.e., information-theoretic security. Compared with the existing quantum voting schemes, the proposed scheme is more feasible.
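Of the ingredients listed, the classical one-time pad step is easy to make concrete. The sketch below uses os.urandom purely as a stand-in for the shared key that the scheme would derive via MDI quantum key distribution:

```python
# Classical one-time pad: with a shared uniformly random key (produced by MDI
# quantum key distribution in the paper; os.urandom is a stand-in here),
# encryption and decryption are the same XOR operation.
import os

def otp(data: bytes, key: bytes) -> bytes:
    assert len(key) >= len(data)  # OTP needs a key at least as long as the data
    return bytes(b ^ k for b, k in zip(data, key))

ballot = b"candidate-42"
key = os.urandom(len(ballot))
cipher = otp(ballot, key)
assert otp(cipher, key) == ballot  # XOR is its own inverse
```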
    Available online:  June 16, 2023 , DOI: 10.13328/j.cnki.jos.006739
    Abstract:
Graphics processing unit (GPU) databases have attracted a lot of attention from the academic and industrial communities in recent years. Although quite a few prototype systems and commercial systems (including open-source systems) have been developed as next-generation database systems, whether GPU-based online analytical processing (OLAP) engines really outperform central processing unit (CPU)-based systems is still in doubt. If they do, more in-depth research should be conducted on what kinds of workload/data/query processing models are more appropriate. GPU-based OLAP engines follow two major technical roadmaps: the GPU in-memory processing mode and the GPU-accelerated mode. The former stores all the datasets in the GPU device memory to take the best advantage of GPU’s computing power and high-bandwidth memory. Its drawbacks are that the limited capacity of the GPU device memory restricts the dataset size and that memory-resident data under the sparse access mode reduce the storage efficiency of the GPU device memory. The latter only stores some datasets in the GPU device memory and accelerates computation-intensive workloads by GPU to support large datasets. The key challenges are how to choose the optimal data distribution and workload distribution models for the GPU device memory to minimize peripheral component interconnect express (PCIe) transfer overhead and maximize GPU’s computation efficiency. This study focuses on how to integrate these two technical roadmaps into one accelerated OLAP engine and proposes OLAP Accelerator, a customized OLAP framework for hybrid CPU-GPU platforms. In addition, this study designs three calculation models, namely, the CPU in-memory calculation model, the GPU in-memory calculation model, and the GPU-accelerated model for OLAP, and proposes a vectorized query processing technique for the GPU platform to optimize device memory utilization and query performance. Furthermore, the different technical roadmaps of GPU databases and the corresponding performance characteristics are explored. The experimental results show that the vectorized query processing model based on GPU in-memory processing achieves the best performance and memory efficiency: it is 3.1 and 4.2 times faster than the systems OmniSciDB and Hyper, respectively. The partition-based GPU-accelerated mode only accelerates the join workloads to balance the workloads between the CPU and GPU ends and can support larger datasets than the GPU in-memory mode can.
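The vectorized query processing the engine applies on the GPU can be illustrated on the CPU with NumPy: operators consume whole column vectors instead of interpreting rows one at a time. A toy filter-and-aggregate query, with made-up column names:

```python
# CPU analogue of vectorized columnar query processing: each operator works on
# whole column vectors rather than row at a time. Columns and query invented.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
price = rng.random(n, dtype=np.float32) * 100
quantity = rng.integers(1, 10, n)
region = rng.integers(0, 4, n)  # four sales regions

# SELECT SUM(price * quantity) WHERE region = 2  -- as two vector operations
mask = region == 2                                   # vectorized predicate
print(float(np.sum(price[mask] * quantity[mask])))   # vectorized aggregate
```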
    Available online:  June 16, 2023 , DOI: 10.13328/j.cnki.jos.006740
    Abstract:
In recent years, deep reinforcement learning has achieved impressive results in complex control tasks. However, its applicability to real-world problems has been seriously weakened by its high sensitivity to hyperparameters and the difficulty in guaranteeing convergence. Metaheuristic algorithms, as a class of black-box optimization methods simulating the objective laws of nature, can effectively avoid sensitivity to hyperparameters. Nevertheless, they still face various problems, such as the inability to adapt to huge numbers of parameters to be optimized and low sample efficiency. To address the above problems, this study proposes the twin delayed deep deterministic policy gradient based on a gravitational search algorithm (GSA-TD3). The method combines the advantages of the two types of algorithms. Specifically, it updates the policy by gradient optimization for higher sample efficiency and a faster learning speed. Moreover, it applies the population update method based on the law of gravity to the policy search process to make it more exploratory and stable. GSA-TD3 is further applied to a series of complex control tasks, and experiments show that it significantly outperforms similar state-of-the-art deep reinforcement learning methods.
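For readers unfamiliar with the gravitational search side of GSA-TD3, the sketch below shows one plain GSA generation on a toy objective; the coupling with TD3's gradient-based policy updates is not reproduced, and all constants are illustrative:

```python
# One generation of a plain gravitational search algorithm (GSA): candidates
# with better fitness get larger "mass" and pull the population toward them.
import numpy as np

rng = np.random.default_rng(0)
pop = rng.normal(size=(20, 5))      # 20 candidates, 5 parameters each
vel = np.zeros_like(pop)

fit = np.sum(pop**2, axis=1)        # sphere function, to be minimized
worst, best = fit.max(), fit.min()
mass = (worst - fit) / (worst - best + 1e-12)
mass /= mass.sum()

G = 1.0                             # gravitational "constant"
acc = np.zeros_like(pop)
for i in range(len(pop)):
    diff = pop - pop[i]
    dist = np.linalg.norm(diff, axis=1) + 1e-12
    acc[i] = np.sum(G * mass[:, None] * diff / dist[:, None], axis=0)

vel = rng.random(pop.shape) * vel + acc   # stochastic inertia + attraction
pop = pop + vel                           # next generation
```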
    Available online:  June 16, 2023 , DOI: 10.13328/j.cnki.jos.006741
    Abstract:
Blockchain-based decentralized applications have been providing robust, reliable, and durable services in multiple fields, such as encrypted digital currency, cloud storage, and the Internet of Things. However, the throughput capacity of blockchains can no longer meet the increasing performance requirements of decentralized applications. Sharding is the current mainstream performance optimization technology for blockchains. Nevertheless, the existing blockchain sharding approaches, which mainly focus on transactions among users, are not always applicable to decentralized applications dominated by smart-contract invocation transactions. To solve the above problem, this study designs and implements the consortium blockchain system BETASCO based on smart contract-oriented sharding. BETASCO provides a shard as an independent execution environment for each smart contract, routes transactions to the shards holding the target smart contracts through a contract location service based on a distributed hash table (DHT), and supports communication and collaboration across smart contracts through an asynchronous invocation mechanism. By virtualizing the nodes, BETASCO allows each node to join multiple shards so that multiple smart contracts can be executed in parallel on the same set of nodes. Experimental results show that the overall throughput capacity of BETASCO increases linearly as the number of smart contracts grows and that the throughput capacity for the execution of a single smart contract is comparable to that of Hyperledger Fabric.
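A DHT-based contract location service of this kind can be sketched with consistent hashing: each shard owns a point on a hash ring, and a contract's transactions are routed to the first shard clockwise from the hash of its address. Shard names and addresses below are illustrative only, not BETASCO's actual scheme:

```python
# Consistent-hashing sketch of a DHT-style contract location service.
import hashlib
from bisect import bisect_right

def h(key: str) -> int:
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

shards = ["shard-A", "shard-B", "shard-C", "shard-D"]
ring = sorted((h(s), s) for s in shards)
points = [p for p, _ in ring]

def locate(contract_addr: str) -> str:
    i = bisect_right(points, h(contract_addr)) % len(ring)
    return ring[i][1]

# all transactions invoking this contract land on the same shard
print(locate("0xdeadbeef"))
```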
    Available online:  June 16, 2023 , DOI: 10.13328/j.cnki.jos.006742
    Abstract:
Most cross-modal hash retrieval methods use only cosine similarity for feature matching, rely on a single calculation method, and do not take into account the impact of instance relations on performance. To address this, the study proposes a novel method based on reasoning over multiple instance relation graphs. Global and local instance relation graphs are generated by constructing similarity matrices to fully explore the fine-grained relations among instances. Similarity reasoning is then conducted on the basis of the multiple instance relation graphs: reasoning is first performed within the relation graphs of the image and text modalities, respectively; the relations within each modality are then mapped to the instance graphs for reasoning; and finally, reasoning within the instance graphs is performed. Furthermore, the neural network is trained with a step-by-step training strategy to adapt to the features of the image and text modalities. Experiments on the MIRFlickr and NUS-WIDE datasets demonstrate that the proposed method has distinct advantages in the mean average precision (mAP) metric and obtains a favorable Top-k-precision curve. This also indicates that the proposed method deeply explores instance relations and thereby significantly improves retrieval performance.
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006831
    Abstract:
Spoken language understanding (SLU), as a core component of task-oriented dialogue systems, aims to extract the semantic framework of user queries. In dialogue systems, the SLU component is responsible for identifying user requests and creating a semantic framework that summarizes them. SLU usually includes two subtasks: intent detection (ID) and slot filling (SF). ID is regarded as a semantic utterance classification problem that analyzes the semantics of the utterance at the sentence level, while SF is viewed as a sequence labeling task that analyzes the semantics of the utterance at the word level. Due to the close correlation between intents and slots, mainstream works employ joint models to exploit shared knowledge across the tasks. However, ID and SF are two different yet strongly correlated tasks: they represent sentence-level semantic information and word-level information of utterances, respectively, which means that the information of the two tasks is heterogeneous and of different granularities. This study proposes a heterogeneous interactive structure for joint ID and SF, which adequately captures the relationship between sentence-level semantic information and word-level information for the two correlated tasks by adopting self-attention and graph attention networks. Different from ordinary homogeneous structures, the proposed model is a heterogeneous graph architecture containing different types of nodes and links; a heterogeneous graph involves more comprehensive information and richer semantics and can better interactively represent the information between nodes of different granularities. In addition, this study utilizes a window mechanism to accurately represent word-level embeddings to better accommodate the local continuity of slot labels. Meanwhile, the study analyzes the effect of the proposed model when it is combined with a pre-trained model (BERT). The experimental results of the proposed model on two public datasets show that the model achieves accuracies of 97.98% and 99.11% on the ID task and F1 scores of 96.10% and 96.11% on the SF task, which are superior to those of the current mainstream methods.
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006828
    Abstract:
Software vulnerabilities are security defects of computer software systems, and they threaten the integrity, security, and reliability of modern software and application data. Manual vulnerability management is time-consuming and error-prone. Therefore, in order to better address the challenges of vulnerability management, researchers have proposed a variety of automated vulnerability management schemes, among which automated vulnerability repair has recently attracted wide attention. Automated vulnerability repair consists of three main functions, namely vulnerability cause localization, patch generation, and patch validation, and it aims to assist developers in repairing vulnerabilities. The existing work lacks a systematic classification and discussion of vulnerability repair technology. To this end, this study gives a comprehensive insight into the theory, practice, applicable scenarios, advantages, and disadvantages of existing vulnerability repair methods and technologies and presents a research review of automated vulnerability repair technologies, so as to promote the development of vulnerability repair technologies and deepen researchers’ cognition and understanding of vulnerability repair problems. The main contents of the study include: (1) sorting out and summarizing the repair methods for specific and general vulnerabilities according to different vulnerability types; (2) classifying and summarizing different repair methods based on their technical principles; (3) summarizing the main challenges of vulnerability repair; (4) looking into future development directions of vulnerability repair.
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006833
    Abstract:
In recent years, RGB-D salient object detection methods have achieved better performance than RGB-only models by virtue of the rich geometric structure and spatial position information in depth maps and have thus received extensive attention from the academic community. However, existing RGB-D detection models still face the challenge of continuous performance improvement. The emerging Transformer is good at modeling global information, while the convolutional neural network (CNN) is good at extracting local details. Therefore, effectively combining the advantages of CNNs and Transformers to mine both global and local information will help to improve the accuracy of salient object detection. For this purpose, an RGB-D salient object detection method based on cross-modal interactive fusion and global awareness is proposed in this study, in which the Transformer network is embedded into U-Net to better extract features by combining the global attention mechanism with local convolution. First, with the help of the U-Net encoder-decoder structure, this study efficiently extracts multi-level complementary features and decodes them step by step to generate a salient feature map. Then, the Transformer module is used to learn the global dependency among high-level features to enhance the feature representation, and a progressive upsampling fusion strategy is used to process the input and reduce the introduction of noise information. Moreover, to reduce the negative impact of low-quality depth maps, the study designs a cross-modal interactive fusion module to realize cross-modal feature fusion. Finally, experimental results on five benchmark datasets show that the proposed algorithm outperforms other state-of-the-art algorithms.
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006896
    Abstract:
The heterogeneous many-core architecture with an ultra-high energy efficiency ratio has become an important development trend of supercomputer architecture. However, the complexity of heterogeneous systems places higher requirements on application development and optimization, and developers face many technical challenges such as usability and programmability in the development process. The independently developed new-generation Sunway supercomputer is equipped with a homegrown heterogeneous many-core processor, SW26010Pro. To take full advantage of the performance of the new-generation many-core processors and support the development and optimization of emerging scientific computing applications, this study designs and implements swLLVM, an optimizing compiler for the SW26010Pro platform. The compiler supports the Athread and SDAA dual-mode heterogeneous programming models and provides multi-level storage hierarchy descriptions and SIMD extensions for vector-like operations. In addition, it implements control-flow vectorization, cost-based node combination, and compiler optimizations for the multi-level storage hierarchy according to the architectural characteristics of SW26010Pro. The experimental results show that the compiler optimizations designed and implemented in this study achieve significant performance improvements: the average speedups of control-flow vectorization and node combination are 1.23 and 1.11, respectively, and the memory access optimization achieves a performance improvement of up to 2.49 times. Finally, a comprehensive evaluation of swLLVM is performed from multiple dimensions on the standard test suite SPEC CPU2006. The results show that, compared with SWGCC at the same optimization level, swLLVM achieves an average increase of 9.04% in floating-point program performance, 5.25% in overall performance, and 79.1% in compilation speed, with an average decline of 0.12% in integer program performance and 1.15% in code size.
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006819
    Abstract:
Federated learning is an effective method to solve the problem of data silos. When the server aggregates all gradients, it may calculate the global gradients incorrectly out of inertia or self-interest, so it is necessary to verify the integrity of the global gradients. Existing schemes based on cryptographic algorithms incur excessive verification overhead. To solve these problems, this study proposes a rational and verifiable federated learning framework. Firstly, according to game theory, a prisoner contract and a betrayal contract are designed to force the server to be honest. Secondly, the scheme uses a replication-based verification method to verify the integrity of the global gradients and supports offline clients. Finally, analysis proves the correctness of the scheme, and experiments show that, compared with existing verification algorithms, the proposed scheme reduces the computing overhead of the clients to zero, the number of communication rounds in one iteration is optimized from three to two, and the training overhead is inversely proportional to the offline rate of the clients.
    Available online:  June 14, 2023 , DOI: 10.13328/j.cnki.jos.006825
    Abstract:
A social law is a set of restrictions on the available actions of agents to establish some target properties in a multiagent system. In the strategic case, where the agents have individual rationality and private information, the social law synthesis problem should be modeled as an algorithmic mechanism design problem instead of a common optimization problem. A minimal side effect is usually a basic requirement for social laws. From the perspective of game theory, a minimal side effect closely relates to the concept of maximum social welfare, and synthesizing a social law with a minimal side effect can be modeled as an efficient mechanism design problem. Therefore, this study needs not only to find the efficient social laws with maximum social welfare for the given target property but also to pay the agents to induce incentive compatibility and individual rationality. The study first designs an efficient mechanism based on the VCG mechanism, namely VCG-SLM, and proves that it satisfies all the required formal properties. However, as the computation of VCG-SLM is an FPNP-complete problem, the study proposes an ILP-based implementation of this mechanism (VCG-SLM-ILP), transforms the computation of allocation and payment into ILPs based on the semantics of ATL, and strictly proves its correctness, so as to effectively utilize currently mature industrial-grade integer programming solvers and solve the otherwise intractable mechanism computation problems.
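The VCG payment rule underlying VCG-SLM is standard: the welfare-maximizing outcome is chosen, and each agent pays the externality its presence imposes on the others. A toy instance with made-up agent values over two candidate social laws:

```python
# Textbook VCG: choose the outcome maximizing total welfare; agent i pays the
# welfare the others would have gained without i, minus what they get with i.
values = {  # values[agent][outcome], all numbers invented
    "a1": {"law1": 5, "law2": 3},
    "a2": {"law1": 1, "law2": 4},
    "a3": {"law1": 2, "law2": 2},
}
outcomes = ["law1", "law2"]

def best(agents):
    return max(outcomes, key=lambda o: sum(values[a][o] for a in agents))

agents = list(values)
chosen = best(agents)  # maximizes total welfare: "law2" here
for i in agents:
    others = [a for a in agents if a != i]
    welfare_without_i = sum(values[a][best(others)] for a in others)
    welfare_with_i = sum(values[a][chosen] for a in others)
    print(i, "pays", welfare_without_i - welfare_with_i)  # a2 pays 2, rest 0
```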
    Available online:  June 07, 2023 , DOI: 10.13328/j.cnki.jos.006817
    Abstract:
The ranking function method is the main method for the termination analysis of loops: the existence of a ranking function shows that a loop program terminates. For single-path loop programs with linear constraints, this study presents a new method to analyze the termination of the loops. Based on the calculation of the normal space of the increasing function, this method reduces the calculation of the ranking function in the original program space to that in a subspace. Experimental results show that the method can effectively verify the termination of most loop programs in the existing literature.
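For reference, the textbook example below shows what a linear ranking function certifies: on the loop while (x > 0) { x := x - 1; }, the function ρ(x) = x is bounded below on the guard and strictly decreases on every iteration, so the loop terminates from every initial state:

```latex
% Loop: while (x > 0) { x := x - 1; }
% rho(x) = x is a linear ranking function certifying termination:
\[
\rho(x) = x, \qquad
x > 0 \;\Rightarrow\; \rho(x) \ge 0 \quad (\text{bounded below}), \qquad
x > 0 \;\Rightarrow\; \rho(x) - \rho(x-1) = 1 > 0 \quad (\text{decreasing}).
\]
```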
    Available online:  June 07, 2023 , DOI: 10.13328/j.cnki.jos.006897
    Abstract:
    Multi-behavior recommendation aims to utilize interactive data from multiple behaviors of users to improve recommendation performance. Existing multi-behavior recommendation methods generally directly exploit the multi-behavior data for the shared initialized user representations and involve the mining of user preferences and modeling of relationships among different behaviors in the tasks. However, these methods ignore the data imbalance under different interactive behaviors (the amount of interactive data varies greatly among different behaviors) and the information loss caused by the adaptation to the above two tasks. User preferences refer to the interests that users exhibit in different behaviors (e.g., browsing preferences), and the relationship among behaviors indicates a potential conversion from one behavior to another behavior (e.g., the conversion from browsing to purchasing). In multi-behavior recommendation, the mining of user preferences and the modeling of relationships among different behaviors can be regarded as a two-stage task. On the basis of the above considerations, the model of two-stage learning for multi-behavior recommendation (TSL-MBR for short) is proposed, which decouples the above two tasks with a two-stage strategy. In particular, the model retains the end-to-end structure and learns the two tasks by alternating training with fixed parameters. The first stage is to model user preferences under different behaviors. In this stage, the interactive data from all behaviors (without distinction as to behavior type) are first used to model the global preferences of users to alleviate the problem of data sparsity to the greatest extent. Then, the interactive data of each behavior are used to refine the behavior-specific user preference (local preference) and thus lessen the influence of the data imbalance among different behaviors. The second stage is to model the relationships among different behaviors. In this stage, the mining of user preferences and modeling of relationships among different behaviors are decoupled to relieve the information loss problem caused by adaptation to the two tasks. This two-stage model significantly improves the system’s ability to predict target behaviors. Extensive experimental results show that TSL-MBR can substantially outperform the state-of-the-art baseline models, achieving 103.01% and 33.87% of relative gains on average over the best baseline on the Tmall and Beibei datasets, respectively.
    Available online:  May 31, 2023 , DOI: 10.13328/j.cnki.jos.006827
    Abstract:
Microservice architectures have been widely deployed and applied. They can greatly improve the efficiency of software system development, reduce the cost of system update and maintenance, and enhance the extensibility of software systems. However, microservices are characterized by frequent changes and heterogeneous fusion, which result in frequent faults, fast fault propagation, and wide impact. Meanwhile, the complex call dependencies or logical dependencies between microservices make it difficult to locate and diagnose faults timely and accurately, which poses a challenge to the intelligent operation and maintenance of microservice architecture systems. Service dependency discovery technology identifies and infers the call dependencies or logical dependencies between services from data collected during system running and constructs a service dependency graph, which helps to discover and locate faults and diagnose their causes timely and accurately at runtime and supports intelligent operation and maintenance requirements such as resource scheduling and change management. This study first analyzes the problem of service dependency discovery in microservice systems and then summarizes the state of the art of service dependency discovery from the perspective of three types of runtime data: monitoring data, system log data, and trace data. Next, the study discusses the application of service dependency discovery technology to intelligent operation and maintenance in terms of fault cause localization, resource scheduling, and change management based on the service dependency graph. Finally, the study discusses how service dependency discovery technology can accurately discover call dependencies or logical dependencies and how the service dependency graph can be used for change management, and it points out future research directions.
    Available online:  May 24, 2023 , DOI: 10.13328/j.cnki.jos.006894
    Abstract:
Deep learning has achieved great success in image classification, natural language processing, and speech recognition. Data augmentation can effectively increase the scale and diversity of training data, thereby improving the generalization of deep learning models. However, for a given dataset, designing a good data augmentation strategy relies heavily on expert experience and domain knowledge and requires repeated attempts, which is time-consuming and labor-intensive. In recent years, automated data augmentation, which designs data augmentation strategies automatically, has attracted widespread attention from academia and industry. To solve the problem that existing automated data augmentation algorithms cannot strike a good balance between prediction accuracy and search efficiency, this study proposes an efficient automated data augmentation algorithm, SGES AA, based on a self-guided evolution strategy. First, an effective continuous vector representation is designed for data augmentation strategies, and the automated data augmentation problem is thereby converted into a search problem over continuous strategy vectors. Second, a strategy vector search method based on the self-guided evolution strategy is presented. By introducing historical estimated gradient information to guide the sampling and updating of exploration points, it can effectively avoid local optima while improving the convergence of the search process. The results of extensive experiments on image, text, and speech datasets show that the proposed algorithm is superior to or matches the current optimal automated data augmentation methods without significantly increasing the time consumption of the search.
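A minimal sketch of the self-guidance idea, with every hyperparameter invented for illustration: an evolution strategy keeps a running estimate of past gradients and biases its sampling toward that direction. The actual SGES AA sampling scheme is more elaborate, and score() here merely stands in for the validation accuracy of an augmentation strategy:

```python
# Evolution strategy with "self-guided" sampling: exploration noise is mixed
# with a running estimate of past gradients before evaluating perturbations.
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=10)          # continuous strategy vector
g_hist = np.zeros_like(theta)        # running gradient estimate
sigma, lr, alpha = 0.1, 0.02, 0.5    # noise scale, step size, guidance weight

def score(v):
    return -np.sum((v - 1.0) ** 2)   # toy objective, maximized at all-ones

for step in range(300):
    eps = rng.normal(size=(16, theta.size))
    if np.linalg.norm(g_hist) > 0:   # bias sampling toward past gradients
        eps = (1 - alpha) * eps + alpha * g_hist / np.linalg.norm(g_hist)
    rewards = np.array([score(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad = (rewards[:, None] * eps).mean(axis=0) / sigma
    g_hist = 0.9 * g_hist + 0.1 * grad
    theta += lr * grad

print(score(theta))                  # climbs toward 0 as theta approaches 1
```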
    Available online:  May 24, 2023 , DOI: 10.13328/j.cnki.jos.006813
    Abstract:
Migrating from monolithic systems to microservice systems is one of the mainstream options for the industry to reengineer legacy systems, and microservice architecture refactoring based on monolithic legacy systems is the key to such migration. Currently, academia mainly focuses on microservice identification methods, and the industry has accumulated many practices of refactoring legacy systems into microservices; however, systematic approaches and efficient, robust tools are still insufficient. Therefore, based on earlier research on microservice identification and model-driven development, this study presents MSA-Lab, an integrated design platform for the microservice refactoring of monolithic legacy systems based on the model-driven development approach. MSA-Lab analyzes the method call sequences in the running logs of a monolithic legacy system, identifies and clusters classes and data tables to construct abstract microservices, and generates a system architecture design model consisting of a microservice diagram and microservice sequence diagrams. The platform has two core components: MSA-Generator, for automatic microservice identification and design model generation, and MSA-Modeller, for the visualization, interactive modeling, and model syntax constraint checking of microservice static structure and dynamic behavior models. This study conducts experiments on the MSA-Lab platform for effectiveness, robustness, and function transformation completeness on four open-source projects and carries out performance comparison experiments with three tools of the same type. The results show that the platform features excellent effectiveness, robustness, and function transformation completeness for running logs, as well as superior performance.
    Available online:  May 24, 2023 , DOI: 10.13328/j.cnki.jos.006823
    Abstract:
Reducing safe but generic and repeated replies is a challenging problem for open-domain multi-turn dialogue models. However, the existing open-domain dialogue models often ignore the guiding role of dialogue objectives and the question of how to introduce and select more accurate knowledge information from the dialogue history and the dialogue objectives. In view of these phenomena, this study proposes a multi-turn dialogue model based on knowledge enhancement. Firstly, the model replaces the notional words in the dialogue history with semaphores and domain words, so as to eliminate ambiguity and enrich the dialogue text representation. Then, the knowledge-enhanced dialogue history and the expanded triplet world knowledge are effectively integrated into the knowledge management and knowledge copy modules, so as to fuse the information of knowledge, vocabularies, dialogue history, and dialogue objectives and generate diverse responses. Experimental results and visualization on two international benchmark open-domain Chinese dialogue corpora verify the effectiveness of the proposed model in both automatic evaluation and human judgment.
    Available online:  May 24, 2023 , DOI: 10.13328/j.cnki.jos.006822
    Abstract:
    Data replication is an important way to improve the availability of distributed databases. By placing multiple database replicas in different regions, the response speed of local reading and writing operations can be increased. Furthermore, increasing the number of replicas can improve the linear scalability of the read throughput. In view of these advantages, a number of multi-replica distributed database systems have emerged in recent years, including some mainstream systems from the industry such as Google Spanner, CockroachDB, TiDB, and OceanBase, as well as some excellent systems from academia such as Calvin, Aria, and Berkeley Anna. However, these multi-replica databases bring a series of challenges such as consistency maintenance, cross-node transactions, and transaction isolation while providing many benefits. This study summarizes the existing replication architecture, consistency maintenance strategy, cross-node transaction concurrency control, and other technologies. It also analyzes the differences and similarities between several representative multi-replica database systems in terms of distributed transaction processing. Finally, the study builds a cross-region distributed cluster environment on Alibaba Cloud and conducts multiple experiments to study the distributed transaction processing performance of these several representative systems.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006727
    Abstract:
This study presents existing and optimized implementations of batched lower-upper (LU) matrix decomposition and batched inversion algorithms on the graphics processing unit (GPU). For batched LU decomposition, the study analyzes the number of reads and writes to the global memory when the Left-looking, Right-looking, and other commonly used blocked LU decomposition algorithms are implemented on the GPU. The blocked Left-looking algorithm, which requires less memory access, is selected owing to the characteristics of the GPU architecture. For pivoting during LU decomposition, a parallel binary tree search algorithm suitable for the GPU architecture is adopted. In addition, to reduce the performance impact of the row interchange process caused by pivoting, this study proposes two optimization techniques, namely, Warp-based packet row interchange and row interchange delay. For batched inversion after LU decomposition, this study investigates the correction method employed in the matrix inversion process. When batched inversion is implemented on the GPU, a blocked matrix inversion algorithm with delayed correction is adopted to reduce access to the global memory during correction. Furthermore, to speed up data reading and writing, the study adopts the optimizations of using more registers and shared memory and of performing column interchange to reduce memory access. In addition, a method of dynamic GPU resource allocation at runtime is proposed to avoid idle threads and the waste of shared memory and other GPU resources; compared with static one-time resource allocation, the dynamic method improves the performance of the algorithm significantly. Finally, 10000 random matrices with sizes between 33 and 190 are tested on the TITAN V GPU, with single-precision complex, double-precision complex, single-precision real, and double-precision real data types. The floating-point arithmetic performance of the batched LU decomposition algorithm implemented in this study reaches about 2 TFLOPS, 1.2 TFLOPS, 1 TFLOPS, and 0.67 TFLOPS, respectively. This algorithm achieves the highest speedups of about 9×, 8×, 12×, and 13×, respectively, compared with the implementation in CUBLAS, and speedups of about 1.2×–2.5×, 1.2×–3.2×, 1.1×–3×, and 1.1×–2.7×, respectively, compared with the implementation in MAGMA. The floating-point arithmetic performance of the proposed batched inversion algorithm reaches about 4 TFLOPS, 2 TFLOPS, 2.2 TFLOPS, and 1.2 TFLOPS, respectively. This algorithm achieves the highest speedups of about 5×, 4×, 7×, and 7×, respectively, compared with the implementation in CUBLAS, and speedups of about 2×–3×, 2×–3×, 2.8×–3.4×, and 1.6×–2×, respectively, compared with the implementation in MAGMA.
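The left-looking blocking structure chosen for its lower memory traffic can be sketched in NumPy as follows; pivoting is omitted for clarity, whereas the paper adds a parallel binary tree pivot search and optimized row interchanges on the GPU:

```python
# Blocked left-looking LU without pivoting: updates to each block column are
# delayed until that column is visited. L is unit lower triangular, U upper
# triangular, both stored in place in A.
import numpy as np
from scipy.linalg import solve_triangular

def lu_left_looking(A, nb=32):
    n = A.shape[0]
    for j in range(0, n, nb):
        jb = min(j + nb, n)
        J = slice(j, jb)
        if j > 0:
            # finish the U block above the panel: L00 @ U[:j, J] = A[:j, J]
            A[:j, J] = solve_triangular(A[:j, :j], A[:j, J],
                                        lower=True, unit_diagonal=True)
            # delayed ("left-looking") update with all previous factors
            A[j:, J] -= A[j:, :j] @ A[:j, J]
        for k in range(j, jb):  # unblocked LU of the current panel
            A[k+1:, k] /= A[k, k]
            A[k+1:, k+1:jb] -= np.outer(A[k+1:, k], A[k, k+1:jb])
    return A

n = 128
A = np.random.rand(n, n) + n * np.eye(n)  # diagonally dominant: no pivoting
LU = lu_left_looking(A.copy())
L = np.tril(LU, -1) + np.eye(n)
U = np.triu(LU)
assert np.allclose(L @ U, A)
```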
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006728
    Abstract:
    In recent years, the research on influence maximization (IM) for social networks has attracted extensive attention from the scientific community due to the emergence of major application issues, such as information dissemination on the Internet and the blocking of COVID-19’s transmission chain. IM aims to identify a set of optimal influence seed nodes that would maximize the influence of information dissemination according to the propagation model for a specific application issue. The existing IM algorithms mainly focus on unidirectional-link influence propagation models and simulate IM issues as issues of optimizing the selection of discrete influence seed node combinations. However, they have a high computational time complexity and cannot be applied to solve IM issues for signed networks with large-scale conflicting relationships. To solve the above problems, this study starts by building a positive-negative influence propagation model and an IM optimization model readily applicable to signed networks. Then, the issue of selecting discrete seed node combinations is transformed into one of continuous network weight optimization for easier optimization by introducing a deep Q network composed of neural networks to select seed node sets. Finally, this study devises an IM algorithm based on evolutionary deep reinforcement learning for signed networks (SEDRL-IM). SEDRL-IM views the individuals in the evolutionary algorithm as strategies and combines the gradient-free global search of the evolutionary algorithm with the local search characteristics of reinforcement learning. In this way, it achieves the effective search for the optimal solution to the weight optimization issue of the Deep Q Network and further obtains the set of optimal influence seed nodes. Experiments are conducted on the benchmark signed network and real-world social network datasets. The extensive results show that the proposed SEDRL-IM algorithm is superior to the classical benchmark algorithms in both the influence propagation range and the solution efficiency.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006729
    Abstract:
    Social media topic detection aims to mine latent topic information from large-scale short posts. It is a challenging task as posts are short in form and informal in expression and user interactions in social media are complex and diverse. Previous studies only consider the textual content of posts or simultaneously model social contexts in homogeneous situations, ignoring the heterogeneity of social networks. However, different types of user interactions, such as forwarding and commenting, could suggest different behavior patterns and interest preferences and reflect different attention to the topic and understanding of the topic. In addition, different users have different influences on the development and evolution of the same topic. Specifically, compared with ordinary users, the leading authoritative users in a community play a more important role in topic inference. For the above reasons, this study proposes a novel multi-view topic model (MVTM) to infer more complete and coherent topics by encoding heterogeneous social contexts in the microblog conversation network. For this purpose, an attributed multiplex heterogeneous conversation network is built according to the interaction relationships among users and decomposed into multiple views with different interaction semantics. Then, the embedded representation of specific views is obtained by leveraging neighbor-level and interaction-level attention mechanisms, with due consideration given to different types of interactions and the importance of different users. Finally, a multi-view neural variational inference method is designed to capture the deep correlations among different views and adaptively balance their consistency and independence, thereby obtaining more coherent topics. Experiments are conducted on a Sina Weibo dataset covering three months, and the results reveal the effectiveness of the proposed method.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006730
    Abstract:
Multiple-choice reading comprehension typically adopts a two-stage pipeline framework of evidence extraction and answer prediction, where the effect of answer prediction depends highly on evidence sentence extraction. Traditional evidence extraction methods mostly rely on phrase matching or supervise evidence extraction with noisy labels, and the resultant unsatisfactory accuracy significantly reduces the performance of answer prediction. To address the above problem, this study proposes a multiple-choice reading comprehension method based on multi-view graph encoding in a joint learning framework. The correlations among the sentences in the text and between such sentences and the question sentences are fully explored from multiple views to effectively model evidence sentences and their relationships. Moreover, the evidence extraction and answer prediction tasks are jointly trained so that the strong correlations of the evidence with the answers can be exploited for joint learning, thereby improving the performance of both evidence extraction and answer prediction. Specifically, this method encodes texts, questions, and candidate answers jointly with the multi-view graph encoding module. The relationships among the texts, questions, and candidate answers are captured from the three views of statistical characteristics, relative distance, and deep semantics, thereby yielding question-and-answer-aware text encoding features. Then, a joint learning module combining evidence extraction with answer prediction is built to strengthen the relationships of the evidence with the answers through joint training. The evidence extraction submodule selects evidence sentences and selectively fuses the results with the text encoding features; the fusion results are then used by the answer prediction submodule to complete the answer prediction. Experimental results on the multiple-choice reading comprehension datasets ReCO and RACE demonstrate that the proposed method attains a higher ability to select evidence sentences from texts and ultimately achieves higher answer prediction accuracy. In addition, joint learning combining evidence extraction with answer prediction significantly alleviates the error accumulation problem induced by the traditional pipeline framework.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006731
    Abstract:
Event planning on event-based social networks (EBSNs) has been attracting research efforts for decades. The key insight of the event planning problem is to assign a group of users to a set of events such that a pre-defined objective function is maximized, subject to a set of constraints. In real applications, factors such as event conflicts, event capacities, user capacities, social preferences between users, and event preferences need to be considered in event planning; this work summarizes these factors as conflict, capacity, and preference, denoted by CCP for short. Existing works do not consider CCP when computing event plans. Hence, this study proposes the CCP-based event planning problem on event-based social networks, so that more reasonable event plans can be obtained. Since the CCP-based event planning problem is proposed for the first time, none of the existing methods can be directly applied to solve it. Compared with previous event planning problems that only consider part of the CCP factors, the challenges of solving the CCP-based event planning problem include the higher complexity of the new problem and the larger number of constraints involved. Hence, this study proposes several algorithms to solve the CCP-based event planning problem efficiently and effectively. Extensive experiments are conducted, and the results prove the effectiveness and efficiency of the proposed algorithms.
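To make the constraint structure concrete, the greedy baseline below assigns user-event pairs in descending preference order while respecting event capacities, user capacities, and pairwise event conflicts. The paper's algorithms are more sophisticated, and all data here are made up:

```python
# Greedy sketch of CCP-constrained event planning: highest-preference pairs
# first, skipping any assignment that violates a capacity or conflict.
pref = {("u1", "e1"): 0.9, ("u1", "e2"): 0.7,
        ("u2", "e1"): 0.8, ("u2", "e2"): 0.6,
        ("u3", "e2"): 0.9}
event_cap = {"e1": 1, "e2": 2}     # capacity of each event
user_cap = {"u1": 1, "u2": 2, "u3": 1}  # max events per user
conflicts = {("e1", "e2")}         # a user cannot attend both

assigned = {u: set() for u in user_cap}
load = {e: 0 for e in event_cap}

def conflict_free(u, e):
    return all((e, e2) not in conflicts and (e2, e) not in conflicts
               for e2 in assigned[u])

for (u, e), _ in sorted(pref.items(), key=lambda kv: -kv[1]):
    if (load[e] < event_cap[e] and len(assigned[u]) < user_cap[u]
            and conflict_free(u, e)):
        assigned[u].add(e)
        load[e] += 1

print(assigned)  # {'u1': {'e1'}, 'u2': {'e2'}, 'u3': {'e2'}}
```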
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006733
    Abstract:
WiFi is one of the most important communication modes at present, and indoor localization systems based on WiFi signals are highly promising for widespread deployment and application in daily life. The latest research shows that such a system can achieve submeter-level localization accuracy when it utilizes the channel state information (CSI) obtained during WiFi communication for target localization. However, the localization accuracy in experimental scenarios depends on many factors, such as the locations of the test points, the layout of the WiFi devices, and that of the antennas. Moreover, deployed WiFi localization systems often fail to provide the desired accuracy since performance prediction methods for WiFi CSI localization are still unavailable. For the above reasons, this study develops a performance prediction model for WiFi CSI localization that applies to diverse scenarios. Specifically, the study defines the error infinitesimal function between a pair of antennas on the basis of the basic physical CSI localization model. The error infinitesimal matrix and the corresponding heat map of localization performance are generated by analyzing the localization space. Then, multi-antenna fusion and multi-device fusion methods are adopted to extend the model beyond a single antenna pair, thereby constructing a general performance prediction model for CSI localization. Finally, the study proposes integrating the above heat map with scenario maps so that actual scenario maps are duly considered and a customized performance prediction solution can be provided for a given scenario. In addition to the theoretical analysis, this study verifies the effectiveness of the proposed performance prediction model with experimental data in two scenarios. The experimental results show that the actual localization accuracy is consistent with the proposed theoretical model in variation trend, and the model improves the localization accuracy by 32%–37%.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006734
    Abstract:
Time series segmentation is an important research direction in the field of data mining. At present, time series segmentation techniques based on the matrix profile (MP) have received increasing attention from researchers and have achieved great research results. However, these techniques and their derivative algorithms also have their own shortcomings. For one thing, when the MP-based fast low-cost semantic segmentation algorithm is employed for the time series segmentation of a given activity state and nearest neighbors are connected by arcs, similar subsequences may be mismatched when arcs cross non-target activity states. For another, the existing segmentation point extraction algorithm uses a window of a given length when extracting segmentation points; in this case, the segmentation points obtained are highly likely to exhibit large deviations from the real values, which reduces the accuracy. To address the above problems, this study proposes a time series segmentation algorithm that limits arc crossing, namely limit arc curve cross-FLOSS (LAC-FLOSS). This algorithm adds weights to arcs to obtain weighted arcs and solves the subsequence mismatch problem caused by arcs crossing states by setting a matching distance threshold. In addition, an improved segmentation point extraction algorithm, namely the improved extract regimes (IER) algorithm, is proposed. This algorithm extracts the extremes from the troughs according to the shape properties of the sequence of corrected arc crossings (CAC), thereby avoiding the problem that segmentation points are obtained at non-inflection points when windows are used directly. Comparative experiments are conducted on the public datasets datasets_seg and MobiAct, and the results verify the feasibility and effectiveness of the above two solutions.
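The corrected arc crossings (CAC) curve at the core of this family of methods can be sketched as follows: each position's score counts the nearest-neighbor arcs spanning it, normalized by an idealized random-arc baseline, and valleys suggest regime boundaries. The arc weighting and matching distance threshold of LAC-FLOSS are omitted:

```python
# FLOSS-style corrected arc crossings: few arcs span a true regime boundary.
import numpy as np

def cac(nn_index):
    n = len(nn_index)
    crossings = np.zeros(n)
    for i, j in enumerate(nn_index):
        lo, hi = min(i, j), max(i, j)
        crossings[lo:hi] += 1.0            # arc i <-> j spans (lo, hi)
    x = np.arange(n)
    expected = 2.0 * x * (n - x) / n       # random-arc baseline
    expected[expected == 0] = 1.0
    return np.minimum(crossings / expected, 1.0)

# toy series with two regimes: each index's nearest neighbor stays in its half
rng = np.random.default_rng(0)
nn = np.r_[rng.integers(0, 50, 50), rng.integers(50, 100, 50)]
curve = cac(nn)
interior = curve[5:-5]                     # FLOSS-style edge exclusion
print(5 + int(interior.argmin()))          # valley adjacent to the boundary at 50
```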
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006735
    Abstract:
    Amid the in-depth integration of information technology and education, the booming online education has become the new normal of the education informatization process and has generated massive amounts of education data. However, online education also faces high dropout rates, low course completion rates, insufficient supervision, and other problems. How to mine and analyze the massive education data is the key to solving these problems. A learning community is a learning organization with learners as its core element, and it emphasizes the interactive communication, resource sharing, and collaborative learning among the learners in the learning process so that common learning tasks or goals can be completed. This study reviews, analyzes, and discusses the prospect of the research on learning communities in the online education environment. Firstly, the background and importance of learning communities in the online education environment are outlined. Secondly, the definitions of a learning community in different disciplines are presented. Thirdly, the construction methods for three types of learning communities, namely, homogeneous, heterogeneous, and hybrid learning communities, are summarized. Fourthly, the management mechanism for learning communities is discussed from the three aspects of sharing, collaboration, and incentive. Last but not least, directions for future research on learning communities are suggested.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006736
    Abstract:
Video description technology aims to automatically generate textual descriptions with rich content for videos, and it has attracted extensive research interest in recent years. An accurate and elaborate method of video description generation not only should achieve a global understanding of the video but also depends heavily on the local spatial and time-series features of specific salient objects. How to model a better video feature representation has always been an important but difficult part of video description tasks. In addition, most of the existing work regards a sentence as a chain structure and views a video description task as a process of generating a sequence of words, ignoring the semantic structure of the sentence. Consequently, the currently available algorithms are unable to handle and optimize complex sentence descriptions or avoid the logical errors commonly seen in long generated sentences. To tackle the problems mentioned above, this study proposes a novel generation method for interpretable video descriptions guided by language structure. Due consideration is given to both local object information and the semantic structure of the sentence by designing an attention-based structured tubelet localization mechanism. When it is incorporated with the parse tree constructed from sentences, the proposed method can adaptively attend to corresponding spatial-temporal features with textual contents and further improve the performance of video description generation. Experimental results on mainstream benchmark datasets of video description tasks, i.e., Microsoft research video captioning corpus (MSVD) and Microsoft research video to text (MSR-VTT), show that the proposed approach achieves state-of-the-art performance on most of the evaluation metrics.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006737
    Abstract:
Due to continuous breakthroughs in information and communication technologies, information access has become convenient on the one hand; on the other hand, private information is now easier to leak than before. Combining intelligent applications with secure multiparty computation (SMC) technology is expected to solve privacy protection problems. Although SMC has solved many different privacy protection problems so far, many problems remain to be settled, and research results on the SMC of the range and the sum of extremums are currently seldom reported. As common statistical tools, the range and the sum of extremums are widely used in practice; therefore, their secure computation is of great research significance. This study proposes a new encoding method and uses it to solve two types of SMC problems: the secure computation of the range and that of the sum of extremums. The new encoding method is combined with the Lifted ElGamal threshold cryptosystem to design a secure range computation protocol for distributed private datasets in the scenario where multiple parties participate and each party holds one data item. Then, the new encoding method is slightly modified for the secure computation of the sum of extremums in the same scenario. On this basis, the study further modifies the new encoding method and combines it with the Paillier cryptosystem to design a protocol for the secure computation of the range and the sum of extremums on distributed private datasets in the scenario where two parties participate and each party holds multiple data items. Furthermore, this study proves that the proposed protocols are secure in the semi-honest model with the simulation paradigm. Finally, the complexities of these protocols are tested by simulation experiments. The results of the efficiency analysis and experiments show that the proposed protocols are simple and efficient and can be widely used in practical applications as important building blocks for solving many other SMC problems.
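The abstract does not spell out the new encoding. A classic trick in this problem family, shown below in the clear for illustration only, encodes each private value as a 0/1 vector over a public data range and aggregates componentwise (a real protocol would aggregate under additively homomorphic encryption such as Lifted ElGamal or Paillier), after which the maximum, minimum, and range can be read off the aggregate:

```python
# Illustrative unary encoding for max/min/range over an assumed public
# data domain bound RANGE; this is not necessarily the paper's encoding.
RANGE = 16

def encode(v):
    return [1 if i <= v else 0 for i in range(RANGE)]  # unary "<= v" vector

private_values = [3, 11, 7]  # one value per party
agg = [sum(col) for col in zip(*map(encode, private_values))]

# agg[i] = number of parties whose value is at least i
maximum = max(i for i, c in enumerate(agg) if c > 0)
minimum = max(i for i, c in enumerate(agg) if c == len(private_values))
print(maximum, minimum, maximum - minimum)  # 11 3 8
```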
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006826
    Abstract:
Recently, with the popularity of ubiquitous computing, intelligent sensing technology has become a focus of researchers, and non-contact sensing based on WiFi is increasingly popular in academia and industry because of its excellent generality, low deployment cost, and great user experience. Typical WiFi-based non-contact sensing work includes gesture recognition, breath detection, intrusion detection, behavior recognition, etc. For the real-life deployment of these systems, one of the major challenges is to avoid the interference of irrelevant behaviors in irrelevant areas, so it is necessary to judge whether the target is in a specific sensing area or not, which means that the system should be able to determine exactly which side of a boundary line the target is on. However, the existing work offers no way to accurately monitor a freely set boundary, which hinders the actual implementation of WiFi-based sensing applications. To solve this problem, based on the physical essence of electromagnetic wave diffraction and the Fresnel diffraction model, this study identifies a signal feature, namely the Rayleigh distribution in the Fresnel diffraction model (RFD), which appears when a target passes through the link (the line between the WiFi transmitter and receiver antennas), and reveals the mathematical relationship between the signal feature and human activity. The study then realizes a boundary monitoring algorithm through line crossing detection by using the link as the boundary and considering the waveform delay caused by antenna spacing and the features of automatic gain control (AGC) when the link is blocked. On this basis, the study also implements two practical applications, namely, an intrusion detection system and a home state detection system. The intrusion detection system achieves a precision of more than 89% and a recall rate of more than 91%, while the home state detection system achieves an accuracy of more than 89%. While verifying the availability and robustness of the boundary monitoring algorithm, the study also shows the great potential of combining the proposed method with other WiFi-based sensing technologies and provides a direction for the actual deployment of WiFi-based sensing technologies.
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006821
    Abstract:
As challenges such as serious occlusions and deformations coexist, accurate and robust video segmentation has become one of the hot topics in computer vision. This study proposes a video segmentation method with absorbing Markov chains and skeleton mapping, which progressively produces accurate object contours through pre-segmentation, optimization, and improvement phases. In the pre-segmentation phase, based on a twin network and a region proposal network, the study obtains regions of interest for objects, constructs absorbing Markov chains over the superpixels in these regions, and calculates the foreground/background labels of the superpixels. The absorbing Markov chains can perceive and propagate object features flexibly and effectively and preliminarily segment the target object from the complex scene. In the optimization phase, the study designs short-term and long-term spatial-temporal cue models to obtain the short-term variations and long-term features of the object, so as to optimize superpixel labels and reduce the errors caused by similar objects and noise. In the improvement phase, to reduce the artifacts and discontinuities of the optimization results, the study proposes an automatic generation algorithm for the foreground/background skeleton based on superpixel labels and positions and constructs a skeleton mapping network based on encoding and decoding, so as to learn pixel-level object contours and finally obtain accurate video segmentation results. Extensive experiments on standard datasets show that the proposed method is superior to existing mainstream video segmentation methods and can produce segmentation results with higher region similarity and contour accuracy.
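The absorbing-Markov-chain computation behind the pre-segmentation phase is compact: with the transition matrix in canonical block form [[Q, R], [0, I]], the fundamental matrix N = inv(I - Q) gives expected visit counts among transient states, and B = N @ R gives absorption probabilities. A two-transient-state toy example (the superpixel graph construction from the twin and region proposal networks is omitted):

```python
# Canonical-form absorbing Markov chain: P = [[Q, R], [0, I]].
import numpy as np

Q = np.array([[0.0, 0.5],   # transient -> transient (superpixel to superpixel)
              [0.4, 0.0]])
R = np.array([[0.5, 0.0],   # transient -> absorbing (e.g., background seeds)
              [0.3, 0.3]])

N = np.linalg.inv(np.eye(2) - Q)  # fundamental matrix: expected visit counts
B = N @ R                   # row i: where a walk from superpixel i is absorbed
print(B)
print(B.sum(axis=1))        # each row sums to 1: every walk is absorbed
```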
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006814
    Abstract:
Efficient mobile charging scheduling is a key technology for building wireless rechargeable sensor networks (WRSNs) with a long life cycle and sustainable operation capability. Existing charging methods based on reinforcement learning only consider the spatial dimension of mobile charging scheduling, i.e., the path planning of mobile chargers (MCs), while leaving out the temporal dimension of the problem, i.e., the adjustment of the charging duration; thus, these methods suffer some performance limitations. This study proposes a dynamic spatiotemporal charging scheduling scheme based on deep reinforcement learning (SCSD) and establishes a deep reinforcement learning model for the dynamic adjustment of the charging sequence and charging duration. In view of the discrete charging sequence planning and continuous charging duration adjustment in mobile charging scheduling, the study uses a deep Q-network (DQN) to optimize the charging sequence of the nodes to be charged and calculates and dynamically adjusts the charging duration of the nodes. By optimizing the two dimensions of space and time respectively, the proposed SCSD can effectively improve charging performance while avoiding node power failures. Simulation experiments show that SCSD has significant performance advantages over several well-known typical charging schemes.
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006810
    Abstract:
As trusted decentralized applications, smart contracts attract widespread attention, whereas their security vulnerabilities threaten their reliability. To this end, researchers have employed various advanced technologies (such as fuzz testing, machine learning, and formal verification) to study vulnerability detection for smart contracts and have achieved sound results. This study collects 84 related papers published up to July 2021 to systematically sort out and analyze existing vulnerability detection technologies of smart contracts. First of all, vulnerability detection technologies are categorized according to their core methodologies. These technologies are analyzed from the aspects of implementation methods, vulnerability categories, and experimental data. Additionally, the differences between domestic and international research in these aspects are compared. Finally, after summarizing the existing technologies, the study discusses the challenges of vulnerability detection technologies and potential research directions.
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006811
    Abstract:
Basic linear algebra subprograms (BLAS) form one of the most basic and important math libraries. The matrix-matrix operations covered by the level-3 BLAS functions are particularly significant for a standard BLAS library and are widely employed in many large-scale scientific and engineering computing applications. Additionally, level-3 BLAS functions are computing-intensive and play a vital role in fully exploiting the computing performance of processors. This study investigates multi-core parallel optimization technologies for level-3 BLAS functions on SW26010-Pro, a domestic processor. According to the memory hierarchy of SW26010-Pro, the study designs a multi-level blocking algorithm to exploit the parallelism of matrix operations. Then, a data-sharing scheme based on the remote memory access (RMA) mechanism is proposed to improve the data transmission efficiency among CPEs. Additionally, the study employs triple buffering and parameter tuning to fully optimize the algorithm and hide the memory access costs of direct memory access (DMA) and the communication overhead of RMA. Besides, the study adopts two hardware pipelines and several vectorized arithmetic/memory access instructions of SW26010-Pro and improves the floating-point computing efficiency of level-3 BLAS functions by writing assembly code manually for matrix-matrix multiplication, matrix equation solving, and matrix transposition. The experimental results show that the proposed parallel optimization significantly improves the performance of level-3 BLAS functions on SW26010-Pro. The floating-point computing efficiency of single-core level-3 BLAS reaches up to 92% of the peak performance, while that of multi-core level-3 BLAS reaches up to 88% of the peak performance.
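The multi-level blocking idea carries over to any memory hierarchy. Below is a single-level cache-blocking sketch of GEMM in NumPy; the SW26010-Pro implementation described above additionally tunes block sizes per memory level and hand-writes the inner kernel in assembly, which this sketch leaves to NumPy.

```python
import numpy as np

def blocked_gemm(A, B, mc=64, nc=64, kc=64):
    """Cache-blocking sketch of a level-3 BLAS GEMM (C = A @ B).

    Each (mc x kc) block of A and (kc x nc) block of B is reused for a
    whole (mc x nc) block of C, which is what makes blocking pay off
    on a real memory hierarchy.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(0, m, mc):           # block over rows of C
        for j in range(0, n, nc):       # block over columns of C
            for p in range(0, k, kc):   # block over the reduction dim
                C[i:i+mc, j:j+nc] += A[i:i+mc, p:p+kc] @ B[p:p+kc, j:j+nc]
    return C

A, B = np.random.rand(200, 300), np.random.rand(300, 150)
assert np.allclose(blocked_gemm(A, B), A @ B)
```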
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006815
    Abstract:
With the development of deep learning and steganography, deep neural networks are widely used in image steganography, especially in a new research direction, namely embedding an image message in an image. The mainstream deep-neural-network-based steganography of this kind requires both cover images and secret images to be input into a steganographic model to generate stego-images. However, recent studies have demonstrated that the steganographic model only needs secret images as input, and the output secret perturbation is then added to cover images to embed secret images. This novel embedding method that does not rely on cover images greatly expands the application scenarios of steganography and realizes universal steganography. However, this method has so far only been verified for the feasibility of embedding and recovering secret images, and the more important evaluation criterion for steganography, namely concealment, has not been considered and verified. This study proposes a high-capacity universal steganography generative adversarial network (USGAN) model based on an attention mechanism. By using the attention module, the USGAN encoder can adjust the perturbation intensity distribution of the pixel positions on the channel dimension in the secret image, thereby reducing the influence of the secret perturbation on the cover images. In addition, the CNN-based steganalyzer is used as the target model of USGAN, and the encoder learns to generate a secret adversarial perturbation through adversarial training with the target model so that the stego-image can simultaneously serve as an adversarial example attacking the steganalyzer. The experimental results show that the proposed model can not only realize a universal embedding method that does not rely on cover images but also further improve the concealment of steganography.
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006816
    Abstract:
How brains realize learning and perception is an essential question for both the artificial intelligence and neuroscience communities. Since the existing artificial neural networks (ANNs) differ from the real brain in terms of structures and computing mechanisms, they cannot be directly used to explore the mechanisms of learning and dealing with perceptual tasks in the real brain. The dendritic neuron model is a computational model that models and simulates the information processing process of neuron dendrites in the brain and is closer to biological reality than ANNs. Using dendritic neural network models to handle and learn perceptual tasks plays an important role in understanding the learning process in the real brain. However, current learning models based on dendritic neural networks mainly focus on simplified dendritic models and are unable to model the entire signal-processing mechanisms of dendrites. To solve this problem, this study proposes a learning model of the biophysically detailed neural network of medium spiny neurons (MSNs). The neural network can fulfill corresponding perceptual tasks through learning. Experimental results show that the proposed model can achieve high performance on the classical image classification task. In addition, the neural network shows strong robustness under noise interference. By further analyzing the network features, this study finds that the neurons in the network after learning show stimulus selectivity, which is a classical phenomenon in neuroscience. This indicates that the proposed model is biologically plausible and implies that stimulus selectivity is an essential property of the brain in fulfilling perceptual tasks through learning.
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006820
    Abstract:
In large-scale and complex software systems, requirement analysis and generation are accomplished through a top-down process, and the construction of tracking relationships between cross-level requirements is very important for project management, development, and evolution. The loosely-coupled contribution approach of open-source systems requires each participant to easily understand the context and state of the requirements, which relies on cross-level requirement tracking. The issue description log is a common way of presenting requirements in open-source systems. It has no fixed template, and its content is diverse (including text, code, and debugging information). Furthermore, terms can be used freely, and the gap in abstraction level between cross-level requirements is large, which brings great challenges to automatic tracking. This paper proposes a correlation feedback method based on key feature dimensions. Through static analysis of the project's code structure, code-related terms and their correlation strength are extracted, and a code vocabulary base is constructed to alleviate the gap in abstraction level and the inconsistency of terminology between cross-level requirements. By measuring the importance of terms to the requirement description and screening key feature dimensions on this basis, the query statement is optimized to effectively reduce the noise caused by requirement description length, content form, and other aspects. Experiments with two scenarios on three open-source systems suggest that the proposed method outperforms baseline approaches in cross-level requirement tracking, improving the F2 value by 29.01%, 7.75%, and 59.21% compared with the vector space model (VSM), standard Rocchio, and trace bidirectional encoder representations from transformers (BERT), respectively.
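Standard Rocchio, one of the baselines above, refines a query vector with relevance feedback; here is a minimal sketch over TF-IDF-style vectors. The parameter values are the commonly used defaults, not values from the paper.

```python
import numpy as np

def rocchio(query_vec, rel_docs, nonrel_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Standard Rocchio relevance feedback.

    Moves the query toward the centroid of relevant documents and
    away from the centroid of non-relevant ones in term-weight space.
    """
    q = alpha * np.asarray(query_vec, dtype=float)
    if len(rel_docs):
        q += beta * np.mean(np.asarray(rel_docs, dtype=float), axis=0)
    if len(nonrel_docs):
        q -= gamma * np.mean(np.asarray(nonrel_docs, dtype=float), axis=0)
    return np.maximum(q, 0.0)   # negative term weights are usually clipped

refined = rocchio([1.0, 0.0, 0.5],
                  rel_docs=[[0.9, 0.1, 0.4]],
                  nonrel_docs=[[0.0, 0.8, 0.1]])
```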
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006824
    Abstract:
Remaining process time prediction is important for preventing and intervening in abnormal business operations. Existing approaches have achieved high accuracy in remaining time prediction through deep learning techniques. However, most of these techniques involve complex model structures, and the prediction results are difficult to explain, namely, the unexplainability issue. In addition, the prediction of the remaining time usually uses the key attribute, namely activity, or selects several other attributes as the input features of the prediction model according to domain knowledge. However, a general feature selection method is missing, which may affect both prediction accuracy and model explainability. To tackle these two challenges, this study introduces a remaining process time prediction framework based on an explainable feature-based hierarchical (EFH) model. Specifically, a feature self-selection strategy is first proposed, and the attributes that have a positive impact on the prediction task are obtained as the input features of the model through backward feature deletion based on priority and forward feature selection based on feature importance. Then an EFH model is proposed. The prediction results of each layer are obtained by adding different features layer by layer, so as to explain the relationship between input features and prediction results. The study also uses the light gradient boosting machine (LightGBM) and long short-term memory (LSTM) algorithms to implement the proposed approach, and the framework is general and not limited to the algorithms selected in this study. Finally, the proposed approach is compared with other methods on eight real-life event logs. The experimental results show that the proposed approach can select effective features and improve prediction accuracy. In addition, the prediction results are explainable.
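A sketch of the forward half of such a feature self-selection strategy, assuming a LightGBM regressor and cross-validated mean absolute error as the acceptance test; the paper's exact priority ordering, importance measure, and backward-deletion step are not reproduced here.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.model_selection import cross_val_score

def forward_select(X, y, max_features=8):
    """Greedy forward feature selection for remaining-time prediction.

    Adds one feature at a time and keeps it only if cross-validated
    MAE improves; returns the indices of the selected features.
    """
    selected, remaining, best = [], list(range(X.shape[1])), -np.inf
    while remaining and len(selected) < max_features:
        trials = []
        for f in remaining:
            score = cross_val_score(
                LGBMRegressor(n_estimators=50),
                X[:, selected + [f]], y, cv=3,
                scoring="neg_mean_absolute_error").mean()
            trials.append((score, f))
        score, f = max(trials)
        if score <= best:
            break                       # no remaining feature helps: stop
        best = score
        selected.append(f)
        remaining.remove(f)
    return selected
```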
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006801
    Abstract:
The Olympic heritage is a treasure of the world. The integration of technology, culture, and art is crucial to the diversified presentation and efficient dissemination of the heritage of the Beijing Winter Olympics. As an important form of digital museums in the information era, online exhibition halls lay a good foundation for research on individual digital museums and interactive technologies, but so far, no systematic, intelligent, interactive, and friendly Winter Olympics digital museum system has been built. This study proposes an online exhibition hall construction method with interactive feedback for the Beijing 2022 Winter Olympics. By constructing an interactive exhibition hall with an intelligent virtual agent, it further explores the role of interactive feedback in disseminating intangible cultural heritage in a knowledge dissemination-based digital museum. To explore the influence of audio-visual interactive feedback on spreading Olympic spiritual culture in the exhibition hall and improve the user experience, the study conducts a user experiment with 32 participants. The results show that the constructed exhibition hall can greatly promote the dissemination of Olympic culture and spirit, and the introduction of audio-visual interactive feedback in the exhibition hall can improve users' perceived control, thereby improving the user experience.
    Available online:  April 27, 2023 , DOI: 10.13328/j.cnki.jos.006685
    Abstract:
Symmetric searchable encryption (SSE) can retrieve encrypted data without disclosing user privacy and has been widely studied and applied in cloud storage. However, in SSE schemes, semi-honest or dishonest servers may tamper with the data in files and return untrusted files to users, so it is necessary to verify these files. Most existing verifiable SSE schemes are verified by the users locally, and malicious users may forge verification results, which cannot ensure verification fairness. To this end, this study proposes a blockchain-based verifiable dynamic symmetric searchable encryption scheme (VDSSE). VDSSE employs symmetric encryption to achieve forward security in dynamic updating, and on this basis, the blockchain is utilized to verify the search results. During the verification, a new verification tag, Vtag, is proposed. The accumulation of Vtags is leveraged to compress the verification information, reduce the storage cost of verification information on the blockchain, and effectively support the dynamic verification of SSE schemes. Finally, experimental evaluation and security analysis are conducted on VDSSE to verify the feasibility and security of the scheme.
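The paper's Vtag construction is not spelled out in the abstract; the sketch below only illustrates the general idea of compressing per-update verification tags into one constant-size on-chain accumulator. XOR-folding of hashes is an assumption for illustration, not VDSSE's actual scheme.

```python
import hashlib

def accumulate_vtag(acc: bytes, keyword: str, file_id: str) -> bytes:
    """Fold one update's tag into a constant-size accumulator.

    Each dynamic update contributes a per-entry tag; XOR-folding means
    the chain only ever stores one digest per keyword instead of one
    tag per update.
    """
    tag = hashlib.sha256(f"{keyword}|{file_id}".encode()).digest()
    return bytes(a ^ b for a, b in zip(acc, tag))

acc = bytes(32)                          # empty accumulator
for fid in ["f1", "f7", "f9"]:
    acc = accumulate_vtag(acc, "invoice", fid)
# The server would later prove search results against `acc` on-chain.
```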
    Available online:  April 27, 2023 , DOI: 10.13328/j.cnki.jos.006689
    Abstract:
    In rich-resource scenarios, using similarity translation as the target prototype sequence can improve the performance of neural machine translation. However, in low-resource scenarios, due to the lack of parallel corpus resources, the prototype sequence cannot be matched, or the sequence quality is poor. To address this problem, this study proposes a low-resource neural machine translation approach with multi-strategy prototype generation, and the approach includes two phases. (1) Keyword matching and distributed representation matching are combined to retrieve prototype sequences, and the pseudo prototype generation approach is leveraged to generate available prototype sequences during retrieval failures. (2) The conventional encoder-decoder framework is improved for the effective employment of prototype sequences. The encoder side utilizes additional encoders to receive prototype sequences. The decoder side, while employing a gating mechanism to control information flow, adopts improved loss functions to reduce the negative impact of low-quality prototype sequences on the model. The experimental results on multiple datasets show that the proposed method can effectively improve the translation performance compared with the baseline models.
    Available online:  April 27, 2023 , DOI: 10.13328/j.cnki.jos.006720
    Abstract:
Sparse triangular solve (SpTRSV) is a vital operation in preconditioners. In particular, in scientific computing programs that iteratively solve partial differential equation systems, structured SpTRSV is a common type of problem and often a performance bottleneck. The commercial mathematical libraries tailored to the graphics processing unit (GPU) platform, represented by CUSPARSE, parallelize SpTRSV operations by level-scheduling methods. However, this method is weakened by time-consuming preprocessing and serious GPU thread idling when it is employed to deal with structured SpTRSV problems. This study proposes a parallel algorithm tailored to structured SpTRSV problems. The proposed algorithm leverages the special non-zero element distribution pattern of structured SpTRSV problems during task allocation to skip the preprocessing and analysis of the non-zero element structure of the input problem. Furthermore, the element-wise operation strategy used in the existing level-scheduling methods is modified. As a result, the problem of GPU thread idling is effectively alleviated, and the memory access latency of some non-zero elements in the matrix is concealed. This study also adopts a state variable compression technique according to the task allocation characteristics of the proposed algorithm, significantly improving the cache hit rate of the algorithm in state variable operations. Additionally, several hardware features of the GPU, including predicated execution, are investigated to comprehensively optimize the algorithm implementation. The proposed algorithm is tested on an NVIDIA V100 GPU, achieving an average 2.71× speedup over CUSPARSE and a peak effective memory-access bandwidth of 225.2 GB/s. The modified element-wise operation strategy, combined with a series of other optimization measures for GPU hardware, attains a prominent optimization effect by yielding a nearly 115% increase in the effective memory-access bandwidth of the proposed algorithm.
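For contrast with the structure-aware algorithm, the sketch below shows the level-scheduling preprocessing that CUSPARSE-style solvers perform on a general sparse lower-triangular matrix, followed by a sequential stand-in for the solve. Rows within one level are independent, which is exactly what a GPU would process in parallel; the preprocessing cost of computing the levels is what the structured algorithm above manages to skip.

```python
import numpy as np
from scipy.sparse import csr_matrix

def level_schedule(L):
    """Assign each row of a sparse lower-triangular matrix to a level.

    A row depends on the rows of its off-diagonal columns; rows within
    one level are mutually independent and can be solved in parallel.
    """
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    for i in range(n):
        cols = L.indices[L.indptr[i]:L.indptr[i + 1]]
        deps = level[cols[cols < i]]
        level[i] = deps.max() + 1 if deps.size else 0
    return level

def sptrsv(L, b):
    """Forward substitution ordered by level (sequential stand-in)."""
    x = np.zeros(len(b))
    level = level_schedule(L)
    for lv in range(level.max() + 1):
        for i in np.where(level == lv)[0]:   # GPU: one thread per row
            cols = L.indices[L.indptr[i]:L.indptr[i + 1]]
            vals = L.data[L.indptr[i]:L.indptr[i + 1]]
            off = cols < i
            x[i] = (b[i] - vals[off] @ x[cols[off]]) / L[i, i]
    return x

L = csr_matrix(np.tril(np.random.rand(6, 6)) + np.eye(6))
b = np.random.rand(6)
assert np.allclose(sptrsv(L, b), np.linalg.solve(L.toarray(), b))
```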
    Available online:  April 27, 2023 , DOI: 10.13328/j.cnki.jos.006678
    Abstract:
As a software system is a complex artifact, the interaction between classes exerts a potential impact on software quality, with the cascading propagation effect of software defects as a typical case. How to accurately predict the reasonable relationships between classes in a software system and optimize the design structure is still an open problem in software quality assurance. From the perspective of software networks, this study comprehensively considers the interactions between classes in a software system (class external graph, CEG) and those between the internal methods of each class (class internal graph, CIG). The software system is abstracted into a software network with a graph-of-graphs structure, and a class interaction prediction method based on the graph-of-graphs convolutional network is proposed. Firstly, the initial characteristics of class nodes are obtained through the convolution of each CIG. Then the representation vectors of class nodes are updated through the convolution of the CEG, and finally, the evaluation values between class nodes are calculated for interaction prediction. The experimental results on six Java open-source projects show that the graph-of-graphs structure is helpful to improve the representation of the software system structure. The average growth rates of the area under the curve (AUC) and average precision (AP) of the proposed method exceed 5.5% compared with those of the conventional network embedding methods. In addition, the average growth rates of AUC and AP exceed 9.36% and 5.22%, respectively, compared with those of the two peer methods.
    Available online:  April 26, 2023 , DOI: 10.13328/j.cnki.jos.006805
    Abstract:
The uncertainty of tasks in mobile edge computing scenarios makes task offloading and resource allocation more complex and difficult. Therefore, a continuous offloading and resource allocation method for uncertain tasks in mobile edge computing is proposed. Firstly, a continuous offloading model of uncertain tasks in mobile edge computing is built, and a multi-batch processing technology based on time slice partitioning is employed to address task uncertainty. A multi-device computing resource coordination mechanism is designed to improve the carrying capacity for computation-intensive tasks. Secondly, an adaptive strategy selection algorithm based on load balancing is put forward to avoid channel congestion and additional energy consumption caused by the over-allocation of computing resources. Finally, the uncertain task scenario model is simulated based on the Poisson distribution, and experimental results show that reducing the time slice length can reduce the total energy consumption of the system. In addition, the proposed algorithm can achieve task offloading and resource allocation more effectively and can reduce energy consumption by up to 11.8% compared with baseline algorithms.
    Available online:  April 26, 2023 , DOI: 10.13328/j.cnki.jos.006807
    Abstract:
Emotional dialogue technology focuses on the "emotional quotient" of conversational robots, aiming to give robots the ability to observe, understand, and express emotions as humans do. This technology can be seen as the intersection of affective computing and dialogue technology; it considers both the "intelligence quotient" and "emotional quotient" of conversational robots to realize spiritual companionship, emotional comfort, and psychological guidance for users. Combined with the characteristics of emotions in dialogues, this study provides a comprehensive analysis of emotional dialogue technology: 1) Three important technical points, including emotion recognition, emotion management, and emotion expression in dialogue scenarios, are presented, and the technology of emotional dialogues in multimodal scenarios is expanded on. 2) The study presents the latest research progress on technical points related to emotional dialogues and summarizes the main challenges and possible solutions correspondingly. 3) Data resources for emotional dialogue technologies are introduced. 4) The difficulties and prospects of emotional dialogue technology are pointed out.
    Available online:  April 26, 2023 , DOI: 10.13328/j.cnki.jos.006809
    Abstract:
In a hybrid cloud environment, enterprise business applications and data are often transferred across different cloud services. For complex and diversified cloud service environments, most hybrid cloud applications adopt access control policies made around only access subjects and adjust the policies manually, which cannot meet the fine-grained dynamic access control requirements at different stages of the data life cycle. This study proposes AHCAC, an adaptive access control method oriented to the data life cycle in a hybrid cloud environment. Firstly, the policy description idea based on key attributes is employed to unify the heterogeneous policies of the full life cycle of data under the hybrid cloud. In particular, the "stage" attribute is introduced to explicitly identify the life-cycle state of data, which is the basis for achieving fine-grained access control oriented to the data life cycle. Secondly, in view of the similarity and consistency of access control policies within the same life-cycle stage, the policy distance is defined, and a hierarchical clustering algorithm based on the policy distance is proposed to construct the corresponding data access control policy for each life-cycle stage. Finally, when the life-cycle stage of data changes, the adaptation and loading of the policies of the corresponding data stage in policy evaluation are triggered through key attribute matching, which realizes adaptive access control oriented to the data life cycle. This study also conducts experiments to verify the effectiveness and feasibility of the proposed method on OpenStack and the open-source policy evaluation engine Balana.
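A toy rendering of the policy-distance clustering step: the attribute-disagreement distance below is a made-up stand-in (the paper defines its own policy distance), and SciPy's hierarchical clustering then groups policies that plausibly belong to the same life-cycle stage.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def policy_distance(p, q):
    """Fraction of key attributes on which two policies disagree."""
    keys = set(p) | set(q)
    return sum(p.get(k) != q.get(k) for k in keys) / len(keys)

# Hypothetical key-attribute policies; note the explicit "stage" attribute.
policies = [
    {"stage": "create", "role": "owner",  "op": "write"},
    {"stage": "create", "role": "owner",  "op": "read"},
    {"stage": "share",  "role": "viewer", "op": "read"},
]
n = len(policies)
D = np.array([[policy_distance(policies[i], policies[j])
               for j in range(n)] for i in range(n)])
# Average-linkage hierarchical clustering over the policy distances.
clusters = fcluster(linkage(squareform(D), method="average"),
                    t=0.5, criterion="distance")
```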
    Available online:  April 19, 2023 , DOI: 10.13328/j.cnki.jos.006760
    Abstract:
In recent years, the localization and tracking of moving targets have been widely used in scenes including indoor navigation, smart homes, security monitoring, and smart medical services. Radio frequency (RF)-based contactless localization and tracking have attracted extensive attention from researchers. Among them, commercial impulse radio ultra-wideband (IR-UWB)-based technology can achieve target localization and tracking at low cost and power consumption and has strong development potential. However, most of the existing studies have the following problems: 1) Limited tracking scenes. Modeling and processing methods are only for outdoor or relatively empty indoor scenes under ideal conditions. 2) Limited movement states of targets and unduly idealized modeling. 3) Low tracking accuracy caused by fake moving targets. To solve these problems, this study proposes a moving target tracking method using IR-UWB on the basis of understanding the composition of the received signal spectrum in multipath scenes. First, the dynamic components of the originally received signal spectrum are extracted. Then, the Gaussian blur-based multipath elimination and distance extraction algorithm is employed to eliminate multipath interference, which only retains the primary reflection information directly related to the moving target and therefore accurately obtains the distance variation curve of the target. Subsequently, a multi-view fusion algorithm is proposed to fuse the distance information of the devices from different views to achieve accurate localization and tracking of a single freely moving target. In addition, a real-time moving target tracking system based on low-cost commercial IR-UWB radar is established. The experimental results in a real indoor home scene show that the error between the estimated center position of the human body and the real motion trajectory is always within 20 cm. Moreover, the system remains robust even if influencing factors such as the experimental environment, experimenter, activity speed, and equipment height are altered.
    Available online:  April 19, 2023 , DOI: 10.13328/j.cnki.jos.006761
    Abstract:
SPN construction is the most widely used overall construction of block ciphers at present, which is adopted by block ciphers such as AES and ARIA. The security analysis of SPN ciphers is a research hotspot in cryptanalysis. The application of subspace trail cryptanalysis to typical two-dimensional SPN ciphers and typical three-dimensional SPN ciphers yields the corresponding subspace trails and general properties based on these trails, respectively. These properties are independent of the secret key and the detailed definitions of the S-box and MixColumns matrix. They can be specifically described as follows: For a typical two-dimensional SPN cipher whose state can be formalized into a two-dimensional array of n×m, the number of different ciphertext pairs belonging to the same coset of the mixed subspace in the ciphertexts obtained by five rounds of encryption of all plaintexts belonging to the same coset of the quasi-diagonal subspace must be a multiple of 2^(n-1). For a typical three-dimensional SPN cipher whose state can be formalized into a three-dimensional array of l×n×m, the number of different ciphertext pairs belonging to the same coset of the mixed subspace in the ciphertexts obtained by seven rounds of encryption of all plaintexts belonging to the same coset of the quasi-diagonal subspace must be a multiple of 2^(nl-1). In addition, this study not only proves these properties but also verifies them experimentally on the internal permutation of PHOTON and small-scale variants of the Rijndael, 3D, and Saturnin algorithms. The experimental results are completely consistent with these properties.
    Available online:  April 19, 2023 , DOI: 10.13328/j.cnki.jos.006797
    Abstract:
In edge computing scenarios, some tasks to be performed are offloaded to the edge server, which can reduce the load of mobile devices, enhance the performance of mobile applications, and lower the cost of mobile devices. For delay-sensitive tasks, it is critical to ensure that they are completed within their deadlines. However, the limited resources of edge servers mean that when data transmission and task processing requests from multiple devices arrive at the same time, some tasks have to wait in a queue before they are scheduled. As a result, the long waiting time may cause time-out failures, which also makes it impossible to balance the performance goals of several devices. Therefore, this study optimizes the task scheduling order on the edge server based on computation offloading. Firstly, task scheduling is modeled as a long-term optimization problem, and an online learning method based on a combinatorial multi-armed bandit is employed to dynamically adjust the scheduling order of the server. Secondly, the dynamically changing order of task execution leads to different levels of performance enhancement for task offloading, which influences the validity of offloading decisions. The deep Q-learning method with perturbed rewards is adopted to determine the execution sites for tasks to improve the robustness of offloading strategies. Simulation results show that the proposed strategy can balance multiple user objectives and lower the system cost simultaneously.
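The queue-reordering decision can be phrased as a bandit over scheduling choices. Below is a minimal UCB scoring sketch; the paper's combinatorial bandit and its reward design are richer than this, so treat the formula and parameters as illustrative.

```python
import numpy as np

def ucb_order(mean_reward, counts, t, c=2.0):
    """Rank queued tasks by upper-confidence-bound score.

    Tasks whose scheduling position paid off historically (high mean)
    or is under-explored (low count) are tried first; this is the
    classic exploration/exploitation trade-off of UCB.
    """
    counts = np.maximum(counts, 1e-9)
    ucb = mean_reward + np.sqrt(c * np.log(t + 1) / counts)
    return np.argsort(-ucb)             # schedule in decreasing UCB order

# Toy: 4 queued tasks with running statistics from earlier rounds.
order = ucb_order(np.array([0.6, 0.2, 0.5, 0.4]),
                  np.array([10, 2, 8, 1]), t=21)
```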
    Available online:  April 19, 2023 , DOI: 10.13328/j.cnki.jos.006799
    Abstract:
Several methods have been proposed to address complex questions in knowledge base question answering (KBQA). However, the complex semantic composition and the possible absence of inference paths lead to poor reasoning performance on complex questions. To this end, this study proposes the CGL-KBQA method based on the global and local features of knowledge graphs. The method employs the knowledge embedding technique to extract the topological structure and semantic features of knowledge graphs as the global features of candidate entity nodes and models complex questions as a composite triple classification task based on the entity representation and question composition. At the same time, the core inference paths generated by graphs during the search process are utilized as local features, which are then combined with the semantic similarity of questions to construct different dimensional features of the candidate entities and finally form a hybrid feature scorer. Since the final inference paths may be missing, this study also designs a cluster module with unsupervised multi-clustering methods to select final answer clusters directly according to the feature representation of candidate entities, thereby making reasoning under incomplete KGs possible. Experimental results show that the proposed method performs well on two common KBQA datasets, especially when the KG is incomplete.
    Available online:  April 13, 2023 , DOI: 10.13328/j.cnki.jos.006798
    Abstract:
In recent years, with the rapid development of blockchain, the types of cryptocurrencies and anonymous transactions have become increasingly diversified. How to make optimal decisions on transaction types in the cryptocurrency market is a major concern of users. The users' decision-making goal is to minimize transaction costs and maximize privacy while ensuring that transactions are packaged. The cryptocurrency trading market is complex, and cryptocurrency technologies differ greatly from each other. Existing studies focus on the Bitcoin market, and few of them discuss other anonymous currency markets such as Zcash or users' anonymity demands. Therefore, this study proposes a game-based general cryptocurrency trading market model and explores the trading market and users' decisions on transaction types and costs by combining the anonymity needs of users and employing game theory. Taking Zcash, the most representative cryptocurrency with optional anonymity, as an example, it analyzes the trading market in combination with the CoinJoin transaction, simulates the trading process in which users and miners find the optimal strategy, and discusses the impact of block size, discount factors, and the number of users on the trading market and user behaviors. Additionally, the model is simulated in a variety of market types for an in-depth discussion of the experimental results. Taking a three-type trading market as an example, in the context of vicious fee competition in the trading market, when pl_num = 75, θ = 0.4, s_t = 100, and s_z = 400, all users are inclined to choose CoinJoin in the early transaction stage (first 500 rounds). In the middle and late stages of the market (rounds 1500–2000), 97% of users with a privacy sensitivity below 0.7 tend to choose CoinJoin, while 73% of users with a privacy sensitivity above 0.7 tend to choose shielded transactions. CoinJoin transactions and block sizes above 400 can alleviate the vicious competition of transaction fees to some extent. The proposed model can help researchers understand the game in different cryptocurrency trading markets, analyze user trading behavior, and reveal market operation rules.
    Available online:  April 13, 2023 , DOI: 10.13328/j.cnki.jos.006780
    Abstract:
Core network slicing achieves flexible networking by combining virtualized network functions (VNFs). However, the failure of any VNF due to software and hardware faults will cause an interruption of the slice service. Since network slices share resources, a specific isolation mechanism is required to meet slice robustness demands. Most of the existing availability guarantee mechanisms focus on random VNF failures, and those involving external attacks rarely consider the special isolation requirements of network slices. To realize slice availability guarantees under isolation mechanisms, this study proposes a method to guarantee network slice availability based on multi-level isolation. First, an availability guarantee model with core network resource awareness is built to meet the isolation requirements while consuming the fewest backup resources. Then, an isolation level assessment model is proposed to evaluate the isolation level of VNFs. Finally, a multi-level isolated backup algorithm (MLIBA) is proposed to solve the availability guarantee problem. In addition, a calculation method based on equivalent backup instances is put forward to address the #P-complete problem of availability calculation for shared backups. Simulation results show that the proposed availability calculation method has high accuracy, and the introduction of multi-level isolation can double the robustness of slices. The comparison with existing studies shows that under the same isolation constraints and availability targets, the proposed method can reduce resource consumption by 20%–70% and increase the proportion of effective resources by 5%–30%.
    Available online:  April 13, 2023 , DOI: 10.13328/j.cnki.jos.006806
    Abstract:
By transferring knowledge from a source domain to a target domain with similar tasks, domain adaptation aims to help the latter learn better. When the label set of the target domain is a subset of the source domain labels, the domain adaptation in this type of scenario is called partial domain adaptation (PDA). Although PDA is more general than ordinary domain adaptation, it is more challenging, and related studies are few, especially systematic reviews. To fill this gap, this study conducts a comprehensive review, analysis, and summary of existing PDA methods and provides an overview of and reference for research in the relevant community. Firstly, the background, concepts, and application fields of PDA are summarized. Secondly, according to their modeling characteristics, existing PDA methods are divided into two categories: promoting positive transfer and alleviating negative transfer, and this study reviews and analyzes them respectively. Then, the commonly used experimental benchmark datasets are categorized and summarized. Finally, the problems in existing PDA studies are analyzed to point out possible future development directions.
    Available online:  April 13, 2023 , DOI: 10.13328/j.cnki.jos.006803
    Abstract:
As a new technology that combines reversible data hiding and fragile watermarking, image reversible authentication (RA) can not only realize fragile authentication of images but also recover the original carrier image without distortion while extracting the authentication code. Thus, it is of great significance for authenticating the originality and integrity of images. Existing reversible authentication methods have low authentication accuracy and cannot effectively protect images with complex textures or regions of complex texture within images. To this end, this study proposes a new reversible authentication method. Firstly, images to be authenticated are divided into blocks, and the obtained sub-blocks are classified as differential blocks (DB) or shifting blocks (SB) according to their embedding capacity. Different reversible embedding methods are employed to embed the authentication codes into the different types of blocks. The method also adopts a hierarchical embedding strategy to increase embedding capacity and improve the authentication effect for each sub-block. On the authentication side, tamper detection and localization can be realized by the authentication code extracted from each sub-block. In addition, this method can be combined with morphological dilation and erosion to refine tamper detection marks and further improve the detection accuracy. Experimental results show that the proposed method can protect images with both smooth and complex textures under the same authentication accuracy and can realize independent authentication and restoration of almost all sub-blocks, which gives it widespread applicability.
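Classical difference expansion is one concrete reversible embedding method of the kind such schemes apply per block; a one-bit sketch for a pixel pair follows (overflow handling and the paper's DB/SB block logic are omitted).

```python
def de_embed(a, b, bit):
    """Tian's difference expansion: hide one bit in a pixel pair."""
    avg, diff = (a + b) // 2, a - b
    diff = 2 * diff + bit               # expand the difference, append bit
    return avg + (diff + 1) // 2, avg - diff // 2

def de_extract(a, b):
    """Recover the bit and the original pair without loss."""
    avg, diff = (a + b) // 2, a - b
    bit, orig = diff & 1, diff >> 1
    return bit, (avg + (orig + 1) // 2, avg - orig // 2)

pair = de_embed(120, 117, 1)            # -> (122, 115)
bit, original = de_extract(*pair)
assert bit == 1 and original == (120, 117)   # fully reversible
```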
    Available online:  April 04, 2023 , DOI: 10.13328/j.cnki.jos.006674
    Abstract:
Code review is an important mechanism in the distributed development of modern software. In code review, providing the context information of the current changes can help code reviewers quickly understand the evolution of the source code concerned, thereby enhancing the efficiency and quality of code review. Existing studies have provided some commit history tracking methods and corresponding tools, but these methods cannot further extract auxiliary information relevant to code review from historical data. Therefore, this study proposes a novel code change tracking approach for code review named C2Tracker. Given a fine-grained code change at the method (function) level, C2Tracker can automatically track the historical commits related to the code change. Furthermore, the frequently co-occurring changed code elements and relevant code changes are mined to help reviewers understand the current code changes and make decisions. Experimental verification is conducted on ten well-known open-source projects. The results show that the accuracies of C2Tracker in tracking historical commits, mining frequently co-occurring code elements, and tracking related code change fragments are 97%, 95%, and 97%, respectively. Compared with existing review methods, C2Tracker greatly improves code review efficiency and quality in specific cases, and reviewers acknowledge that it can play a significant role in improving the efficiency and quality of most review cases.
    Available online:  April 04, 2023 , DOI: 10.13328/j.cnki.jos.006675
    Abstract:
Weakly supervised object localization aims to train object locators with only image-level labels instead of accurate location annotations. Some existing methods can only identify the most discriminative region of the target object and are incapable of covering the complete object, or can easily be misled by irrelevant background information, thereby leading to inaccurate object locations. Therefore, this study proposes a weakly supervised object localization algorithm based on an attention mechanism and category hierarchy. The proposed method extracts a more complete object area by performing mean segmentation on the attention map of the convolutional neural network. In addition, the category hierarchy network is utilized to weaken the attention caused by background areas, which achieves more accurate object localization results. Extensive experimental results on multiple public datasets show that the proposed method yields better localization results than other weakly supervised object localization methods under various evaluation metrics.
    Available online:  April 04, 2023 , DOI: 10.13328/j.cnki.jos.006676
    Abstract:
Reference counts are widely employed in large-scale low-level systems, including the Linux kernel, to manage shared resources, and they should be kept consistent with the number of objects referring to the resources. Otherwise, improper-update bugs of reference counts may arise, causing resources to never be released or to be released too early. To detect improper updates of reference counts, existing static detection methods need to know which functions increase reference counts and which decrease them. However, manually collecting such prior knowledge of reference counts is too time-consuming and may be incomplete. Though mining-based methods can reduce the dependency on prior knowledge, it is difficult for them to effectively detect path-sensitive bugs of improper reference count updates. To this end, this study proposes a method named RTDMiner that deeply integrates data mining into static analysis to detect improper updates of reference counts. First, according to the general principles of reference counting, the data mining technique is leveraged to identify functions that raise or reduce reference counts. Then, a path-sensitive static analysis method is employed to detect defective paths that increase reference counts without decreasing them. To reduce false positives, the study adopts the data mining technique to identify exceptional patterns during detection. The experimental results on the Linux kernel demonstrate that the proposed method can automatically identify functions that increase or decrease reference counts with a precision of nearly 90%. Moreover, 24 out of the top 50 suspicious bugs detected by RTDMiner have been confirmed to be real bugs by kernel maintainers.
    Available online:  April 04, 2023 , DOI: 10.13328/j.cnki.jos.006669
    Abstract:
The programmable data plane (PDP), which allows offloading and accelerating network applications, creates revolutionary development opportunities for such applications. It also promotes the innovation and evolution of the network by supporting the rapid implementation and deployment of new protocols and services and has thus been a research hotspot in the networking field in recent years. With its general computing architecture and rich on-chip resources and extension interfaces, the field-programmable gate array (FPGA) provides a variety of implementations of the PDP for a wide range of application scenarios. It also offers the possibility to explore a more general PDP abstraction. Therefore, the FPGA-based PDP (F-PDP) has drawn wide attention from the academic and industrial communities. In this study, F-PDP abstraction is described by category. Then, the research progress of key technologies for building network applications with F-PDP is outlined, and programmable network devices based on F-PDP are presented. After that, application research based on F-PDP in recent years is reviewed in detail from three aspects: improving network performance, building network measurement frameworks, and deploying network security applications. Finally, possible future research trends of F-PDP are discussed.
    Available online:  April 04, 2023 , DOI: 10.13328/j.cnki.jos.006670
    Abstract:
Bayesian network (BN), as a primary framework for representing and inferring uncertain knowledge, is widely used in social networks, knowledge graphs, medical diagnosis, etc. The central computing task of BN-based analysis, diagnosis, and decision support in specific fields consists of multiple probabilistic inferences. However, traditional inference methods are doomed to high time complexity on the same BN, since the intermediate results of probability calculations cannot be shared and reused among different inferences. Therefore, to improve the overall efficiency of multiple inferences on the same BN, this study proposes a BN embedding method and the corresponding probabilistic inference method. First, by incorporating the idea of graph embedding, the study proposes a BN embedding method based on the autoencoder and attention mechanism, which transforms the BN into a pointwise mutual information matrix so as to preserve the directed acyclic graph and the conditional probability parameters simultaneously. Specifically, each coding layer of the autoencoder generates node embeddings by using the correlation between a node and its neighbors (parent and child nodes) to preserve the probabilistic dependencies. Then, a method for probabilistic inference that measures the joint probability by the distance between embedding vectors is proposed. Experimental results show that the proposed method outperforms other state-of-the-art methods in efficiency while achieving accurate results of probabilistic inferences.
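A generic pointwise-mutual-information computation of the kind the embedding step consumes is sketched below; how the underlying co-occurrence or strength matrix is derived from the BN's structure and CPTs is specific to the paper and not reproduced here.

```python
import numpy as np

def pmi_matrix(cooc):
    """Pointwise mutual information from a co-occurrence count matrix.

    PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ); the positive part
    (PPMI) is the variant most embedding methods feed to an encoder.
    """
    cooc = np.asarray(cooc, dtype=float)
    p_ij = cooc / cooc.sum()
    p_i = p_ij.sum(axis=1, keepdims=True)
    p_j = p_ij.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    pmi[~np.isfinite(pmi)] = 0.0        # zero counts contribute nothing
    return np.maximum(pmi, 0.0)         # PPMI

ppmi = pmi_matrix([[10, 2, 0],
                   [2, 8, 1],
                   [0, 1, 6]])
```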
    Available online:  March 29, 2023 , DOI: 10.13328/j.cnki.jos.006748
    Abstract:
    The authenticated data structure (ADS) solves the problem of untrusted servers in outsourced data storage scenarios as users can verify the correctness and integrity of the query results returned by untrusted servers through the ADS. Nevertheless, the security of data owners is difficult to guarantee, and attackers can tamper with the ADS stored by data owners to impede the integrity and correctness verification of query results. Data owners can store the ADS on the blockchain to solve the above problem by leveraging the immutable nature of the blockchain. However, the existing ADS implementation schemes have high maintenance costs on the blockchain and most of them only support the verifiable query of static data. At present, an efficient ADS tailored to the blockchain is still to be designed. By analyzing the gas consumption mechanism of smart contracts and the gas consumption of the ADS based on the traditional Merkle hash tree (MHT), this study proposes SMT, a new ADS, which achieves efficient and verifiable query of streaming data and has a lower gas consumption on the blockchain. Finally, the study verifies the efficiency of SMT both theoretically and experimentally and proves the security of SMT through security analysis.
    Available online:  March 15, 2023 , DOI: 10.13328/j.cnki.jos.006802
    Abstract:
Multi-label text classification methods based on deep learning lack multi-granularity learning of text information and the utilization of constraint relations between labels. To solve these problems, this study proposes a multi-label text classification method that enhances multi-granularity information relations. First, this method embeds text and labels in the same space by joint embedding and employs the BERT pre-trained model to obtain the implicit vector feature representations of text and labels. Then, three multi-granularity information relation enhancing modules are constructed: a document-level information shallow label attention (DISLA) classification module, a word-level information deep label attention (WIDLA) classification module, and a label constraint relation matching auxiliary module. The first two modules carry out multi-granularity learning from the shared feature representation: shallow interactive learning between document-level text information and label information, and deep interactive learning between word-level text information and label information. The auxiliary module improves classification performance by learning the relations between labels. Finally, comparison with current mainstream multi-label text classification algorithms on three representative datasets shows that the proposed method achieves the best performance on the main indicators of Micro-F1, Macro-F1, nDCG@k, and P@k.
    Available online:  March 08, 2023 , DOI: 10.13328/j.cnki.jos.006756
    Abstract:
Density peaks clustering (DPC) is a density-based clustering algorithm that can intuitively determine the number of clusters, identify clusters of any shape, and automatically detect and exclude abnormal points. However, DPC still has some shortcomings: it only considers the global distribution, so the clustering performance is poor for datasets with large density differences between clusters; in addition, the point allocation strategy of DPC is likely to cause a domino effect. Hence, this study proposes a DPC algorithm based on representative points and K-nearest neighbors (KNN), namely RKNN-DPC. First, the KNN density is constructed, and representative points are introduced to describe the global distribution of samples and define a new local density. Then, the KNN information of samples is used to design a weighted KNN allocation strategy to relieve the domino effect. Finally, comparative experiments are conducted with five clustering algorithms on artificial and real datasets. The experimental results show that RKNN-DPC can more accurately identify cluster centers and obtain better clustering results.
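A compact illustration of the two ingredients named above: a KNN-based local density and DPC's density-distance rule for picking centers. The exact weighting and representative-point construction in RKNN-DPC differ; this is only the generic skeleton.

```python
import numpy as np

def knn_density(X, k=5):
    """KNN-based local density: closer neighbors mean higher density.

    Unlike DPC's single global cutoff distance, a per-sample KNN
    radius treats sparse and dense clusters on an equal footing.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn_d = np.sort(d, axis=1)[:, 1:k + 1]   # skip self-distance
    return np.exp(-knn_d.mean(axis=1))

def pick_centers(X, rho, n_centers=2):
    """DPC decision rule: centers have high density (rho) and are far
    from any point of higher density (delta)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    delta = np.zeros(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i, higher].min() if len(higher) else d[i].max()
    return np.argsort(-rho * delta)[:n_centers]

X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
centers = pick_centers(X, knn_density(X))
```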
    Available online:  March 02, 2023 , DOI: 10.13328/j.cnki.jos.006759
    Abstract:
Graphical passwords mitigate the burden of memorizing traditional textual passwords and simplify the process of entering passwords, and they have been widely applied to user authentication on mobile devices in recent years. However, existing graphical password authentication schemes face critical threats. First, graphical passwords are vulnerable to shoulder-surfing attacks: users' graphical passwords may be leaked if attackers capture their login information by observation or cameras. More seriously, these schemes are subject to credential leakage attacks: since the server stores authentication credentials related to users' graphical passwords to verify their identities, attackers who obtain these credentials can perform offline password guessing attacks to retrieve users' graphical passwords. To solve the above problems, this study proposes a secure graphical password authentication scheme, dubbed GADL. GADL embeds random challenge values into users' graphical passwords to resist shoulder-surfing attacks, so attackers cannot obtain users' passwords even if they capture their login information. To address credential database leakage at the server, GADL utilizes a deterministic threshold blind signature technique to protect users' graphical passwords, in which multiple key servers assist users in credential generation, ensuring that attackers cannot perform offline guessing attacks to obtain any knowledge of users' passwords even if they obtain the credentials. The security analysis given in this study proves that GADL is resistant to the aforementioned attacks. In addition, a comprehensive performance evaluation demonstrates that GADL incurs low computation, storage, and communication overhead and can thus be easily deployed on mobile devices.
    Available online:  March 02, 2023 , DOI: 10.13328/j.cnki.jos.006763
    Abstract:
The emergence of the dynamic link library (DLL) provides great convenience for developers and improves the interaction between the operating system (OS) and applications. However, the potential security problems of DLLs cannot be ignored. Determining how to mine DLL-hijacking vulnerabilities during the running of Windows installers is important for ensuring the security of Windows OS. In this paper, the attribute features of numerous installers are collected and extracted, and a double-layer bi-directional long short-term memory (BiLSTM) neural network is applied for machine learning from the perspectives of the installers, the modes in which installers invoke DLLs, and the DLL files themselves. The multi-dimensional features of the vulnerability dataset are extracted, and unknown DLL-hijacking vulnerabilities are mined. In the experiments, DLL-hijacking vulnerabilities are effectively detected from Windows installers, and 10 unknown vulnerabilities are discovered and assigned CNVD identifiers. In addition, the effectiveness and integrity of this method are further verified by comparison with other vulnerability analyzers.
    Available online:  March 02, 2023 , DOI: 10.13328/j.cnki.jos.006764
    Abstract:
Entity resolution arises widely in data tasks such as data quality control, information retrieval, and data integration. Traditional entity resolution methods mainly focus on relational data, but with the development of big data technology and the proliferation of data in different modalities such as text and images, application demands for cross-modal data have emerged. Hence, cross-modal entity resolution has become a fundamental problem in big data processing and analysis. This study reviews the research development of cross-modal entity resolution and introduces its definition and evaluation indexes. Then, with the construction of inter-modal relationships and the maintenance of intra-modal relationships as the main thread, existing research results are surveyed. In addition, widely used methods are tested on different open datasets, and their differences and the reasons behind them are analyzed. Finally, the problems in present research are summarized, on the basis of which future research trends are given.
    Available online:  March 02, 2023 , DOI: 10.13328/j.cnki.jos.006779
    Abstract:
With the development of cloud computing and service architectures such as software as a service (SaaS) and function as a service (FaaS), data centers, as service providers, constantly face resource management challenges: the quality of service (QoS) should be guaranteed while the resource cost is controlled. Therefore, a method to accurately measure computing power consumption has become a key research issue for improving resource utilization and keeping the load pressure within an acceptable range. Owing to mature virtualization technologies and developing parallel technologies, the traditional estimation metric, CPU utilization, fails to account for interference caused by resource competition, thus losing accuracy. Since hyper-threading (HT) processors are now the mainstay of data centers, estimating the computing power of HT processors has become an urgent problem. To address this estimation challenge, this study proposes the APU method, which estimates the computing power consumption of HT processors based on an understanding of the HT running mechanism and the modeling of thread behavior. Considering that users with different authorities can access different system levels, two implementation schemes are put forward: one based on hardware support and the other based on the operating system (OS). The proposed method adopts CPU utilization as the only input, without demands for other dimensions. Additionally, it reduces the development and deployment costs of new monitoring tools since no special hardware architecture support is required, thereby making the method universal and easy to apply. SPEC benchmarks further prove the effectiveness of the method: the estimation errors on three benchmarks are reduced from 20%, 50%, and 20% to less than 5%. To further demonstrate its applicability, the APU method is applied to ByteDance clusters to show its effects in case studies.
    Available online:  February 22, 2023 , DOI: 10.13328/j.cnki.jos.006757
    Abstract:
Mixed precision has made many advances in deep learning and in precision tuning and optimization. Extensive research shows that mixed precision optimization for stencil computation is challenging. Moreover, the achievements of the polyhedral model in the field of automatic parallelization indicate that the model provides a good mathematical abstraction for loop nesting, on the basis of which loop transformations can be performed. This study designs and implements an automatic mixed precision optimizer for stencil computation on the basis of polyhedral compilation technology. By performing iteration domain partitioning, data flow analysis, and scheduling tree transformation on the intermediate representation layers, this study implements the source-to-source automatic generation of mixed precision code for stencil computation for the first time. The experiments demonstrate that the code after automatic mixed precision optimization can give full play to its parallelism potential and improve program performance by reducing precision redundancy. With high-precision computing as the benchmark, the maximum speedup is 1.76, and the geometric mean speedup is 1.15 on the x86 architecture; on the new-generation Sunway architecture, the maximum speedup is 1.64, and the geometric mean speedup is 1.20.
    Available online:  February 22, 2023 , DOI: 10.13328/j.cnki.jos.006758
    [Abstract] (773) [HTML] (0) [PDF 7.38 M] (1235)
    Abstract:
    With the increasingly powerful performance of neural network models, they are widely used to solve various computer-related tasks and show excellent capabilities. However, a clear understanding of the operation mechanism of neural network models is lacking. Therefore, this study reviews and summarizes the current research on the interpretability of neural networks. A detailed discussion is rendered on the definition, necessity, classification, and evaluation of research on model interpretability. With the emphasis on the focus of interpretable algorithms, a new classification method for the interpretable algorithms of neural networks is proposed, which provides a novel perspective for the understanding of neural networks. According to the proposed method, this study sorts out the current interpretable methods for convolutional neural networks and comparatively analyzes the characteristics of interpretable algorithms falling within different categories. Moreover, it introduces the evaluation principles and methods of common interpretable algorithms and expounds on the research directions and applications of interpretable neural networks. Finally, the problems confronted in this regard are discussed, and possible solutions to these problems are given.
    Available online:  February 15, 2023 , DOI: 10.13328/j.cnki.jos.006765
    [Abstract] (622) [HTML] (0) [PDF 4.99 M] (4189)
    Abstract:
    In recent years, software construction, operation, and evolution have encountered many new requirements, such as the need for efficient switching or configuration of development and testing environments, application isolation, reduced resource consumption, and more efficient testing and deployment. These requirements pose great challenges to developers in developing and maintaining software. Container technology has the potential to release developers from the heavy workload of development and maintenance. Of particular note, Docker, as the de facto industrial standard for containers, has recently become a popular research topic in the academic community. To help researchers understand the status and trends of research on Docker containers, this study conducts a systematic literature review of 75 high-quality papers in this field. First, quantitative methods are used to investigate the basic status of research on Docker containers, including research quantity, research quality, research areas, and research methods. Second, the first classification framework for research on Docker containers is presented, and the current studies are systematically classified and reviewed along the dimensions of core, platform, and support. Finally, the development trends of Docker container technology are discussed, and seven future research directions are summarized.
    Available online:  February 15, 2023 , DOI: 10.13328/j.cnki.jos.006750
    Abstract:
    Entity recognition is a key task of information extraction. With the development of information extraction technology, researchers have turned from the recognition of simple entities to that of complex ones. Complex entities usually have no explicit features and are more complicated in syntactic construction and part of speech, which makes their recognition a great challenge. In addition, existing models widely use span-based methods to identify nested entities, and as a result they often suffer from ambiguity in the detection of entity boundaries, which degrades recognition performance. In response to these challenges, this paper proposes GIA-2DPE, an entity recognition model based on prior semantic knowledge and type embedding. The model uses keyword sequences of entity categories as prior semantic knowledge to improve the cognition of entities, utilizes type embedding to capture potential features of different entity types, and then combines the prior knowledge with entity-type features through a gated interactive attention mechanism to assist in the recognition of complex entities. Moreover, the model uses 2D probability encoding to predict entity boundaries and combines boundary features with contextual features to enhance boundary detection and thereby improve the performance of nested entity recognition. Extensive experiments are conducted on seven English datasets and two Chinese datasets. The results show that GIA-2DPE outperforms state-of-the-art models and achieves a 10.4% F1 boost over the baseline in entity recognition on the ScienceIE dataset.
    Available online:  February 15, 2023 , DOI: 10.13328/j.cnki.jos.006754
    Abstract:
    Tag-aware recommendation algorithms use tagged data to enhance recommendation models' understanding of user preferences and item attributes, and they have attracted extensive attention in the field. Most existing methods, however, neglect the diversity of user concerns, item attributes, and tag semantics, which interferes with the correlation inference among the three and thus affects the recommendation results. Therefore, this paper introduces the disentangled graph collaborative filtering (DGCF) method into the tag-aware recommendation task and proposes a DGCF-based explainable tag-aware recommendation (DETRec) method. DETRec disentangles the perspectives of users, items, and tags to provide explainable recommendation references. Specifically, it utilizes a correlation graph construction module to model the user-item-tag correlations. Then, it employs a neighborhood routing mechanism and a message propagation mechanism to disentangle the nodes into attribute sub-graphs that describe the nodal correlations under different attributes. Finally, it generates recommendation references on the basis of these attribute sub-graphs. This study implements two instantiations of DETRec: 1) DETRec based on a single graph (DETRec-S), which describes all correlations of user, item, and tag nodes in a single graph, and 2) DETRec based on multiple graphs (DETRec-M), which utilizes three bipartite graphs to describe the user-item, item-tag, and user-tag correlations separately. Extensive experiments on three public datasets demonstrate that both instantiations are significantly superior to the baseline models and generate references corresponding to the recommendation results. Hence, DETRec is an effective explainable tag-aware recommendation algorithm.
    Available online:  February 15, 2023 , DOI: 10.13328/j.cnki.jos.006762
    Abstract:
    The extended Berkeley packet filter (eBPF) mechanism in the Linux kernel can safely load user-provided untrusted programs into the kernel. In the eBPF mechanism, the verifier checks these programs and ensures that they will not cause the kernel to crash or access the kernel address space maliciously. In recent years, the eBPF mechanism has developed rapidly, and its verifier has become more complex as more and more new features are added. This study observes two problems of the complex eBPF verifier. One is the "false negative" problem: there are many bugs in the complex security check logic of the verifier, and attackers can leverage these bugs to design malicious eBPF programs that pass the verifier and attack the kernel. The other is the "false positive" problem: since the verifier adopts a static check method and lacks runtime information, it can only perform conservative checks. This may cause originally safe programs to fail the check and restricts eBPF to limited semantics, which brings difficulties to the development of eBPF programs. Further analysis shows that the static simulation execution check mechanism in the eBPF verifier features a massive codebase, high complexity, and conservative analysis, which are the main causes of the security vulnerabilities and false positives. Therefore, this study proposes to replace the static simulation execution check mechanism in the eBPF verifier with a lightweight dynamic check method. The bugs and conservative checks that originally arose from simulation execution then no longer exist, and hence the above "false negative" and "false positive" problems can be eliminated. Specifically, the eBPF program is run in a kernel sandbox, which dynamically checks the memory accesses of the program at runtime to prevent it from accessing kernel memory illegally. For an efficient implementation of the lightweight kernel sandbox, the Intel protection keys for supervisor (PKS), a new hardware feature, is used to perform zero-overhead memory access checks, and an efficient interaction method between the kernel and the eBPF program in the sandbox is presented. The evaluation results show that this approach can eliminate the memory security vulnerabilities of the eBPF verifier (this type of vulnerability has accounted for more than 60% of all eBPF verifier vulnerabilities since 2020). Moreover, in the scenario of high-throughput network packet processing, the performance overhead brought by the lightweight kernel sandbox is lower than 3%.
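    The contrast between conservative static checking and exact dynamic checking can be made concrete with a toy example (illustrative only; the real verifier and PKS sandbox operate on machine-level state, not Python objects):

        def static_check(max_index_bound: int, buf_len: int) -> bool:
            # A static checker sees only an upper bound on the index, so it must
            # reject any program whose worst-case index may overflow, even if
            # that index never occurs at runtime (a "false positive").
            return max_index_bound < buf_len

        def dynamic_read(buf: list, i: int):
            # A dynamic sandbox checks the concrete index at runtime and traps
            # exactly the unsafe accesses, nothing more.
            if not 0 <= i < len(buf):
                raise MemoryError("illegal access trapped")
            return buf[i]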
    Available online:  January 18, 2023 , DOI: 10.13328/j.cnki.jos.006663
    Abstract:
    Distributed hash tables (DHTs) are widely used in distributed storage because of their efficient data addressing. Nevertheless, traditional DHT-based storage has to place data on specified nodes to achieve efficient data addressing, which severely restricts the application scope of the DHT technology. Take heterogeneous storage networks as an example: the storage space, bandwidth, and stability of nodes vary greatly, and choosing appropriate storage nodes according to data features and the performance differences among the nodes can greatly improve data access efficiency. However, the tight coupling between data and storage location disqualifies traditional DHT-based storage from being applied to heterogeneous storage networks. Therefore, this study proposes the vRoute algorithm to decouple data identifiers from storage locations in DHT-based storage. By building a distributed data index based on Bloom filters, vRoute allows data to be stored on any node of the network without reducing the efficiency of data addressing. It is implemented by extending the Kademlia algorithm, and its validity is verified theoretically. Finally, simulation experiments show that vRoute achieves a data addressing efficiency close to that of the traditional DHT algorithm with low storage and network overhead.
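    A minimal sketch of the indexing idea (parameters and helper names are illustrative, not taken from the vRoute implementation): each storage node advertises a Bloom filter of the data identifiers it holds, so data can live on any node while lookups only visit nodes whose filters report a possible hit.

        import hashlib

        class BloomFilter:
            def __init__(self, m: int = 1024, k: int = 3):
                self.m, self.k, self.bits = m, k, 0

            def _positions(self, key: str):
                for i in range(self.k):
                    digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
                    yield int.from_bytes(digest[:8], "big") % self.m

            def add(self, key: str):
                for pos in self._positions(key):
                    self.bits |= 1 << pos

            def might_contain(self, key: str) -> bool:
                return all(self.bits >> pos & 1 for pos in self._positions(key))

        # False positives are possible but false negatives are not, so a lookup
        # never misses the node that actually stores the data.
        node_index = {"node-A": BloomFilter(), "node-B": BloomFilter()}
        node_index["node-A"].add("object-42")
        candidates = [n for n, f in node_index.items() if f.might_contain("object-42")]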
    Available online:  January 18, 2023 , DOI: 10.13328/j.cnki.jos.006649
    Abstract:
    A published study investigates the use of Turing reduction to solve the ε-NN problem: given a query point q, a point set P, and an approximation factor ε, return an approximate nearest neighbor of q in P with an approximation ratio of at most 1+ε. That study proposes a Turing reduction algorithm with O(log n) query time complexity, where the query time is the number of times the oracle is invoked; this O(log n) query time is the lowest among all existing algorithms. However, the disadvantage of the algorithm is the factor of O((d/ε)^d) in its preprocessing time complexity and space complexity: when the number of dimensions d is high or the approximation factor ε is small, this factor becomes unacceptable. Therefore, this study revises the reduction algorithm and analyzes its expected time and space complexity when the input point set follows a Poisson point process. As a result, the expected preprocessing time complexity is reduced to O(n log n) and the expected space complexity to O(n log n), while the expected query time complexity remains O(log n). In this sense, the future work raised in the published study is completed.
    Available online:  January 13, 2023 , DOI: 10.13328/j.cnki.jos.006818
    Abstract:
    With the development of Internet of Things (IoT) technology, IoT devices are widely applied in many areas of production and life. However, IoT devices also bring severe challenges to equipment asset management and security management. First, due to the diversity of IoT device types and access modes, it is often difficult for network administrators to know the types and operating status of the IoT devices in their networks. Second, IoT devices are becoming the focus of cyber attacks because their limited computing and storage resources make it difficult to deploy traditional defense measures on them. Therefore, it is important to identify the IoT devices in the network and detect anomalies based on the identification results so as to ensure their normal operation. In recent years, academia has carried out extensive research on these issues. This study systematically reviews the work related to IoT device identification and anomaly detection. In terms of device identification, existing research can be divided into passive identification methods and active identification methods according to whether packets are sent into the network; the passive methods are further investigated according to the identification method, identification granularity, and application scenario, and the active methods according to the identification method, identification granularity, and detection granularity. In terms of anomaly detection, existing work can be divided into detection methods based on machine learning algorithms and rule-matching methods based on behavioral specifications. On this basis, the challenges in IoT device identification and anomaly detection are summarized, and future development directions are proposed.
    Available online:  January 04, 2023 , DOI: 10.13328/j.cnki.jos.006665
    Abstract:
    As the scale of business data increases, distributed online analytical processing (OLAP) is widely performed in business intelligence (BI), enterprise resource planning (ERP), user behavior analysis, and other application scenarios to support large-scale data analysis. Moreover, distributed OLAP overcomes the limitations of single-machine storage and stores data in memory to improve the performance of OLAP. However, after the in-memory distributed OLAP eliminates disk input/output (I/O), the join operation becomes one of its new performance bottlenecks. As a common practice in OLAP, the join operation involves a huge amount of data accessing and computation operations. By analyzing existing methods for the join operation, this study presents a graph structure indexing method that can accelerate the join operation and a new join method called LinkJoin based on it. Graph structure indexing stores the in-memory position of data in the form of a graph structure according to the join relationship specified by users. The join method based on graph structure indexing reduces the amount of data accessing and computation operations with a low complexity equivalent to that of the hash join. This study expands the state-of-the-art open-source in-memory OLAP system called MonetDB from a single-machine system to a distributed one and designs and implements a join method based on graph structure indexing on it. A series of designs and optimizations are also conducted in the aspects of graph indexing structure, columnar storage, and distributed execution engine to improve the distributed OLAP performance of the system. The test results show that in the TPC-H benchmark tests, the join operation based on graph structure indexing improves the performance on queries with join operations by 1.64 times on average and 4.1 times at most. For the join operation part of these queries, it enhances the performance by 9.8–22.1 times.
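    A minimal sketch of the indexing idea (not MonetDB or LinkJoin internals): when the join relationship is specified in advance, each foreign-key value can be resolved once, at load time, to the in-memory position of its matching row, so the per-probe hashing of a hash join disappears at query time.

        orders_custkey = [3, 1, 3, 2]            # fact column (foreign key)
        customers_key = [1, 2, 3]                # dimension column (primary key)
        customers_name = ["ann", "bob", "eve"]

        # Built once when data is loaded: the "graph" edges from fact rows
        # to dimension row positions.
        pos_of_key = {k: i for i, k in enumerate(customers_key)}
        link = [pos_of_key[k] for k in orders_custkey]

        # Query time: pure positional lookups, no hashing per probe.
        joined = [(k, customers_name[p]) for k, p in zip(orders_custkey, link)]
        print(joined)   # [(3, 'eve'), (1, 'ann'), (3, 'eve'), (2, 'bob')]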
    Available online:  December 30, 2022 , DOI: 10.13328/j.cnki.jos.006777
    Abstract:
    Smoothed particle hydrodynamics (SPH) is a key technology for fluid simulation. With the growing demand for applications of SPH fluid simulation in production practice, many relevant studies have emerged in recent years, which improve the visual realism, efficiency, and stability of simulations with respect to physical properties including fluid incompressibility, viscosity, and surface tension. Additionally, some researchers focus on high-quality simulation in complex scenarios and on unified simulation frameworks for multiple scenarios and materials, thereby enhancing the applicability of SPH fluid simulation technology. This study discusses and summarizes related research on SPH fluid simulation from the above aspects and offers an outlook on the technology.
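    For context, the kernel-weighted density estimate at the heart of all SPH variants (standard in the literature, not specific to any single surveyed method) is

        \rho_i = \sum_j m_j \, W(\lVert \mathbf{x}_i - \mathbf{x}_j \rVert, h)

    where m_j is the mass of neighbor particle j, W is a smoothing kernel, and h is the smoothing length; the pressure, viscosity, and surface tension models discussed above are built from similar kernel-weighted sums.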
    Available online:  December 30, 2022 , DOI: 10.13328/j.cnki.jos.006724
    [Abstract] (381) [HTML] (0) [PDF 5.81 M] (1085)
    Abstract:
    The task of knowledge tracing is to trace the changes in students' knowledge state and predict their future learning performance according to their historical learning records. In recent years, knowledge tracing models based on attention mechanisms have proved markedly superior to traditional knowledge tracing models in both flexibility and prediction performance. However, most existing deep models only take into account exercises involving a single concept and cannot directly deal with exercises involving multiple concepts, which are nevertheless abundant in intelligent education systems. In addition, how to improve interpretability is one of the key challenges facing deep knowledge tracing models. To solve these problems, this study proposes a deep knowledge tracing model based on the fused embedding of multiple concepts, which considers the relationships among the concepts in multi-concept exercises. Furthermore, the study puts forward two novel embedding methods for multiple concepts and combines educational psychology models with forgetting factors to improve prediction performance and interpretability. Experiments on large-scale real datasets reveal the superiority of the proposed model over existing models in prediction performance and verify the effectiveness of each module of the model.
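    The paper proposes two specific embedding methods for multiple concepts; the attention-weighted average below is only a stand-in to make the fusion step concrete (the names and the weighting scheme are illustrative, not the paper's):

        import numpy as np

        def fuse_concepts(concept_vecs: np.ndarray) -> np.ndarray:
            """concept_vecs: (num_concepts, dim) embeddings of the concepts an
            exercise involves; returns one (dim,) exercise embedding."""
            query = concept_vecs.mean(axis=0)
            scores = concept_vecs @ query              # relevance of each concept
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()                   # softmax attention weights
            return weights @ concept_vecs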
    Available online:  December 30, 2022 , DOI: 10.13328/j.cnki.jos.006800
    [Abstract] (456) [HTML] (0) [PDF 7.89 M] (1090)
    Abstract:
    With the rapid growth and widening application of deep learning (DL), the scale of DL training continues to expand, and insufficient memory has become one of the major bottlenecks threatening DL availability. The memory swapping mechanism is the key mechanism for alleviating the memory problem of DL training. It leverages the time-varying memory requirements of DL training and moves data between the memory of the computing accelerator and external storage on demand, so that DL training tasks can run with an instantaneous memory requirement instead of an accumulated one. This study surveys the memory swapping mechanism for DL training from the perspective of time-varying memory requirements. Key studies on operator feature-based swap-out mechanisms, data dependency-based swap-in mechanisms, and efficiency-driven joint swap-in and swap-out decisions are summarized. Finally, the development prospects of this technology are pointed out.
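    A minimal synchronous sketch of the swap-out/swap-in primitive that such mechanisms build on, in PyTorch (real systems overlap these copies with computation and decide what and when to swap; this example only shows the data movement):

        import torch

        act = torch.randn(4096, 4096, device="cuda")   # produced by the forward pass

        # Swap out: copy to pinned host memory, then free the device tensor.
        cpu_buf = torch.empty(act.shape, dtype=act.dtype, pin_memory=True)
        cpu_buf.copy_(act, non_blocking=True)          # can overlap later kernels
        torch.cuda.synchronize()
        del act                                        # release accelerator memory

        # ... later, just before the backward pass needs the activation again:
        act = cpu_buf.to("cuda", non_blocking=True)    # swap in on demand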
    Available online:  December 30, 2022 , DOI: 10.13328/j.cnki.jos.006804
    [Abstract] (733) [HTML] (0) [PDF 5.38 M] (1169)
    Abstract:
    The stochastic configuration network (SCN), as an emerging incremental neural network model, differs from other randomized neural network methods in that it configures the parameters of hidden-layer nodes through a supervisory mechanism, thereby ensuring fast convergence. Owing to its high learning efficiency, low human intervention, and strong generalization ability, SCN has attracted extensive attention from scholars in China and abroad and has developed rapidly since it was proposed in 2017. In this study, SCN research is summarized in terms of basic theories, typical algorithm variants, application fields, and future research directions. First, the algorithm principles, universal approximation capacity, and advantages of SCN are analyzed theoretically. Second, typical variants of SCN are reviewed, such as DeepSCN, 2DSCN, Robust SCN, Ensemble SCN, Distributed SCN, Parallel SCN, and Regularized SCN. Then, the applications of SCN in different fields are introduced, including hardware implementation, computer vision, medical data analysis, fault detection and diagnosis, and system modeling and prediction. Finally, the development potential of SCN is pointed out for convolutional neural network architectures, semi-supervised learning, unsupervised learning, multi-view learning, fuzzy neural networks, and recurrent neural networks.
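    A simplified sketch of SCN's incremental construction (the exact supervisory inequality and parameter schedules of the 2017 paper are omitted; scoring candidates by residual correlation is our simplification):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.uniform(-1, 1, (200, 3))
        y = np.sin(X).sum(axis=1)                       # toy regression target

        resid, H_cols = y.copy(), []
        for step in range(25):
            W = rng.uniform(-1, 1, (3, 50))             # 50 random candidate nodes
            b = rng.uniform(-1, 1, 50)
            G = np.tanh(X @ W + b)
            score = (G.T @ resid) ** 2 / (G * G).sum(axis=0)  # supervision score
            H_cols.append(G[:, score.argmax()])         # admit the best candidate
            H = np.column_stack(H_cols)
            beta = np.linalg.lstsq(H, y, rcond=None)[0] # refit output weights
            resid = y - H @ beta
        print("train MSE:", (resid ** 2).mean())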
    Available online:  December 28, 2022 , DOI: 10.13328/j.cnki.jos.006662
    Abstract:
    Recent studies on social recommendation have focused on the joint modeling of explicit and implicit relations in social networks and overlooked the special phenomenon that high-order implicit relations are not equally important to every user: their importance to users with plenty of neighbors differs greatly from their importance to users with few neighbors. In addition, due to the randomness of social relation construction, explicit relations are not always available. This study proposes a novel adaptive high-order implicit relations modeling (AHIRM) method, whose model consists of three components. Specifically, unreliable relations are first filtered and potential reliable relations are identified, which mitigates the adverse effects of unreliable relations and alleviates the data sparsity issue. Then, an adaptive random walk algorithm is designed to capture neighbors at different orders for each user according to normalized node centrality, construct high-order implicit relations among users, and ultimately reconstruct the social network. Finally, a graph convolutional network (GCN) is employed to aggregate information from neighbor nodes, thereby updating user embeddings to model the high-order implicit relations and further alleviate data sparsity. The influence of social structure and personal preference are both considered during modeling, and the process of social influence propagation is simulated and retained. Comparative verification of the proposed model against existing algorithms is conducted on the LastFM, Douban, and Gowalla datasets, and the results verify the effectiveness and rationality of the proposed AHIRM model.
    Available online:  December 08, 2022 , DOI: 10.13328/j.cnki.jos.006654
    Abstract:
    Community is an important attribute of information networks. Community search, as an important content of information network analysis, aims to find a set of nodes that meet the conditions specified by the user. As heterogeneous information networks contain more comprehensive and richer structural and semantic information, community search in such networks has received extensive attention in recent years. However, the existing community search methods for heterogeneous information networks cannot be directly applied when the search conditions are complex. For this reason, this study defines community search under complex conditions and proposes search algorithms considering asymmetric meta-paths, constrained meta-paths, and prohibited node constraints. These three algorithms respectively use the meta-path completion strategy, the strategy of adjusting batch search with labeling, and the way of dividing complex search conditions to search communities. Moreover, two optimization algorithms respectively based on the pruning strategy and the approximate strategy are designed to improve the efficiency of the search algorithm with prohibited node constraints. A large number of experiments are performed on real datasets, and the experimental results verify the effectiveness and efficiency of the proposed algorithms.
    Available online:  October 26, 2022 , DOI: 10.13328/j.cnki.jos.006753
    [Abstract] (829) [HTML] (0) [PDF 1.88 M] (1602)
    Abstract:
    As a complement and extension of the terrestrial network, the satellite network helps accelerate the bridging of the digital divide between regions and can expand the coverage and service range of the terrestrial network. However, the satellite network features a highly dynamic topology, long transmission delays, and limited on-board computing and storage capacity. Hence, various technical challenges, including routing scalability and transmission stability, are encountered in organically integrating the satellite network with the terrestrial network to construct a global space-ground integrated network (SGIN). Considering these research challenges, this paper describes the international and domestic research progress on SGIN in terms of network architecture, routing, transmission, multicast-based content delivery, etc., and then discusses the research trends.
    Available online:  October 26, 2022 , DOI: 10.13328/j.cnki.jos.006755
    Abstract:
    The reliability and availability of distributed systems are critically threatened by crash recovery bugs caused by incorrect crash recovery mechanisms and their implementations. The detection of crash recovery bugs can be extremely challenging since these bugs manifest themselves only when a node crashes under special timing conditions. This study presents Deminer, a novel approach to automatically detecting crash recovery bugs in distributed systems. Observations of large-scale distributed systems show that node crashes that interrupt the execution of related I/O write operations, which store a piece of data (i.e., common data) in different places, e.g., different storage paths or nodes, are more likely to trigger crash recovery bugs. Therefore, Deminer detects crash recovery bugs by automatically identifying and injecting such error-prone node crashes under the guidance of common-data usage. Deminer first tracks the usage of critical data in a correct run. It then identifies I/O write operation pairs that use common data and predicts error-prone injection points for a node crash on the basis of the execution trace. Finally, Deminer tests the predicted injection points and checks failure symptoms to expose and confirm crash recovery bugs. A prototype of Deminer is implemented and evaluated on the latest versions of four widely used distributed systems, i.e., ZooKeeper, HBase, YARN, and HDFS. The experimental results show that Deminer is effective in finding crash recovery bugs: it has detected six crash recovery bugs.
    Available online:  October 26, 2022 , DOI: 10.13328/j.cnki.jos.006752
    Abstract:
    Most traditional information hiding methods embed secret data by modifying cover data, which inevitably leaves traces of modification in the cover and thus makes it difficult to resist detection by existing steganalysis algorithms. Consequently, the technique of coverless information hiding has emerged, which hides secret data without modifying the cover data. To improve the hiding capacity and robustness of coverless information hiding, this study proposes a constructive data hiding method based on texture synthesis and recognition with image style transfer. First, natural images and texture images of different categories are used to construct a content image database and a textural style image database, respectively, and a mapping dictionary of binary codes is established according to the categories of natural images in the content image database. Second, a labeled texture image database is constructed and fed into a convolutional neural network as the training dataset, and a texture image recognition model is obtained by iterative training so that secret data can later be extracted from stego images at the receiving end. During secret data hiding, natural images are selected from the content image database according to the secret data fragments to be embedded and are synthesized into a stego mosaic image. Then, a texture image is randomly selected from the textural style image database, and a stego texture image is generated from the selected texture image and the stego mosaic image through style transfer, thus achieving secret data hiding. During secret data extraction, the trained texture image recognition model accurately identifies the original categories of the natural images corresponding to the stego texture image, and the secret data are finally extracted by reference to the mapping dictionary. The experimental results demonstrate that the proposed method achieves stego texture images with a satisfactory visual effect and a high hiding capacity and shows strong robustness against attacks such as JPEG compression and Gaussian noise.
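    The mapping-dictionary step can be made concrete with a minimal sketch (the fragment length and image categories are placeholders; in the real scheme the categories are recovered from the stego texture image by the trained CNN recognizer):

        BITS_PER_FRAGMENT = 2
        dictionary = {"00": "cat", "01": "car", "10": "tree", "11": "boat"}
        inverse = {v: k for k, v in dictionary.items()}

        def hide(secret_bits: str) -> list:
            """Map each bit fragment to a natural-image category to mosaic."""
            frags = [secret_bits[i:i + BITS_PER_FRAGMENT]
                     for i in range(0, len(secret_bits), BITS_PER_FRAGMENT)]
            return [dictionary[f] for f in frags]

        def extract(categories: list) -> str:
            """Invert the dictionary once the categories have been recognized."""
            return "".join(inverse[c] for c in categories)

        assert extract(hide("0110")) == "0110"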
    Available online:  October 26, 2022 , DOI: 10.13328/j.cnki.jos.006749
    [Abstract] (654) [HTML] (0) [PDF 4.17 M] (1246)
    Abstract:
    Code change is a key behavior in software evolution, and its quality has a large impact on software quality. Modeling and representing code changes is the basis of many software engineering tasks, such as just-in-time defect prediction and recovery of software product traceability. Representation learning technologies for code changes have attracted extensive attention and have been applied to diverse tasks in recent years. This type of technology aims to learn to represent the semantic information in code changes as low-dimensional dense real-valued vectors, that is, to learn distributed representations of code changes. Compared with conventional manually designed code change features, such technologies offer the advantages of automatic learning, end-to-end training, and accurate representation. However, the field still faces challenges such as the difficulty of utilizing structural information and the absence of benchmark datasets. This study surveys and summarizes the recent progress of studies and applications of representation learning technologies for code changes in four parts. (1) It presents the general framework of code change representation learning and its applications. (2) It then reviews the currently available representation learning technologies for code changes and summarizes their respective advantages and disadvantages. (3) It further summarizes and classifies the downstream applications of such technologies. (4) Finally, it discusses the challenges and potential opportunities ahead of representation learning technologies for code changes and suggests directions for the future development of this type of technology.
    Available online:  September 20, 2022 , DOI: 10.13328/j.cnki.jos.006664
    Abstract:
    Smart contracts running on a blockchain can hardly be modified after deployment, and their invocation and execution rely on a consensus procedure. Consequently, existing debugging methods that require modifying the smart contract code or interrupting its execution cannot be directly applied to smart contracts. Since the running of a smart contract consists of the ordered execution of blockchain transactions, tracing the execution of the transactions is an effective approach to making the smart contract more debuggable. The major goal of tracing blockchain transaction execution is to unveil how a blockchain transaction produces its result during execution. The execution of a blockchain transaction relies on the internal state of the blockchain, and this state is determined by the execution results of previous transactions, which gives rise to transitive dependencies. Such dependencies and the characteristics of the execution environment provided by the blockchain pose three main challenges to tracing: how to obtain enough information for tracing from the production environment in which the smart contract is deployed, how to obtain the dependencies among blockchain transactions, and how to ensure consistency between the tracing result and the real online execution. This study proposes a tracing method for blockchain transaction execution based on recording and replay. By building a recording and replay mechanism into the contract container, the proposed method records state reading and writing operations during transaction execution without modifying the contract code or interrupting the running of the smart contract. A transaction dependency analysis method based on state reading and writing is proposed to support the on-demand retracing of previous transactions linked by dependencies. Moreover, a verification mechanism for the recorded reading and writing operations is designed to ensure that the consistency between the replayed execution and the real online execution can be verified. The method can trace the execution of blockchain transactions that call smart contracts and can thus be used in the debugging of smart contracts; when losses are caused by the failure of smart contracts, the tracing results can serve as evidence. Experiments compare the performance of storing the recorded reading and writing operations on chain and off chain, and a case study reveals the advantages and effectiveness of the proposed method in tracing blockchain transaction execution.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006666
    [Abstract] (883) [HTML] (0) [PDF 6.45 M] (2131)
    Abstract:
    During software development, developers use third-party libraries extensively to achieve code reuse. Owing to the dependencies among third-party libraries, incompatibilities among them lead to errors during the installation, loading, or calling of these libraries and ultimately result in system anomalies. Such a problem is called a dependency conflict (DC, also referred to as conflict dependency or CD) issue of third-party libraries. The root cause of such issues is that the loaded third-party libraries fail to cover the required features (e.g., methods) referenced by the software. DC issues often occur during the download and installation of third-party libraries, project compilation, and running, and they are difficult to locate. Fixing DC issues requires developers to know accurately the differences among the versions of the third-party libraries they use, and the complex dependencies among the libraries increase the difficulty of this work. To identify DC issues in software before it runs and to deal with the system anomalies caused by those issues at runtime, researchers around the world have conducted various studies on such issues. This study presents a systematic review of this research topic from four aspects: the empirical analysis of third-party library usage, the cause analysis of DC issues, and the detection methods and common fixing methods for such issues. Finally, the potential research opportunities in this field are discussed.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006671
    [Abstract] (1189) [HTML] (0) [PDF 5.32 M] (2022)
    Abstract:
    Inverse reinforcement learning (IRL), also known as inverse optimal control (IOC), is an important research method of reinforcement learning and imitation learning. IRL solves for a reward function from expert samples, and the optimal strategy is then solved to imitate the expert strategy. In recent years, IRL has yielded fruitful achievements in imitation learning, with widespread application in vehicle navigation, path recommendation, and robotic optimal control. First, this study presents the theoretical basis of IRL. Then, from the perspective of reward function construction, IRL algorithms based on linear and nonlinear reward functions are analyzed, including maximum margin IRL, maximum entropy IRL, maximum entropy deep IRL, and generative adversarial imitation learning. In addition, frontier research directions of IRL are reviewed to compare and analyze representative algorithms, including IRL with incomplete expert demonstrations, multi-agent IRL, IRL with sub-optimal expert demonstrations, and guiding IRL. Finally, the primary challenges of IRL and future developments of its theoretical and application significance are summarized.
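    For reference, maximum entropy IRL, one of the algorithms listed above, models the expert as generating a trajectory \tau with probability exponential in its reward and fits the reward parameters \theta by maximizing the likelihood of the demonstrations:

        P(\tau \mid \theta) = \frac{\exp(R_\theta(\tau))}{\sum_{\tau'} \exp(R_\theta(\tau'))}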
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006672
    Abstract:
    The basic concept of the multiobjective evolutionary algorithm based on decomposition (MOEA/D) is to transform a multiobjective optimization problem into a set of subproblems (single-objective or multiobjective) and optimize them collaboratively. Since MOEA/D was proposed in 2007, it has attracted extensive attention from Chinese and international scholars and become one of the most representative multiobjective evolutionary algorithms. This study summarizes the research progress on MOEA/D over the past thirteen years, including algorithmic improvements of MOEA/D, research on MOEA/D for many-objective optimization and constrained optimization, and applications of MOEA/D to practical problems. Several representative improved variants of MOEA/D are then compared through experiments. Finally, the study presents several potential research topics for MOEA/D in the future.
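    As a concrete example of the decomposition idea, the Tchebycheff approach from the original MOEA/D paper converts a problem with objectives f_1, ..., f_m into N scalar subproblems, one per weight vector \lambda, with z^* the ideal point:

        \min_{x \in \Omega} \; g^{te}(x \mid \lambda, z^{*}) = \max_{1 \le i \le m} \lambda_i \, \lvert f_i(x) - z_i^{*} \rvert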
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006677
    [Abstract] (830) [HTML] (0) [PDF 7.50 M] (1411)
    Abstract:
    Distributed ledger (DL), as a distributed data management architecture, maintains data records (the ledgers) across distributed nodes on the basis of consensus mechanisms and protocols. It can comprehensively record all information about data ownership, transmission, and trading chains in distributed ledgers. Additionally, data can be neither tampered with nor denied throughout the life cycle of data production and transactions, providing an endorsement for the confirmation, protection, and audit of data rights. Blockchain is a typical implementation of DL systems. With emerging digital economy applications such as digital currency and data asset trading, DL technologies are receiving increasingly widespread attention. However, system performance is one of the key technical bottlenecks for the large-scale application of DL systems, and ledger performance optimization has become a focus of academia and industry. This study investigates the methods, technologies, and typical solutions of DL performance optimization from the four perspectives of system architecture, ledger data structure, consensus mechanism, and message communication.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006679
    [Abstract] (898) [HTML] (0) [PDF 5.88 M] (1455)
    Abstract:
    Deep learning (DL) systems have powerful learning and reasoning capabilities and are widely employed in many fields, including unmanned vehicles, speech recognition, and intelligent robotics. Due to dataset limitations and the dependence on manually labeled data, DL systems are prone to unexpected behaviors. Accordingly, the quality of DL systems has received widespread attention in recent years, especially in safety-critical fields. Fuzz testing, with its strong fault-detecting ability, has been applied to test DL systems and has become a research hotspot. This study summarizes existing fuzz testing for DL systems in the aspects of test case generation (including seed queue construction, seed selection, and seed mutation), test result determination, and coverage analysis. Additionally, commonly used datasets and metrics are introduced. Finally, the study discusses prospects for the future development of this field.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006680
    Abstract:
    In recent years, deep learning technology has made remarkable progress in many computer vision tasks, and more and more researchers have tried to apply it to medical image processing, such as the segmentation of anatomical structures in high-throughput medical images (CT, MRI), which can improve the efficiency of image reading for doctors. Deep learning models for medical image processing need a large amount of labeled data for training, and the data from a single medical institution often cannot meet this requirement. Moreover, due to differences in medical equipment and acquisition protocols, the data from different medical institutions are largely heterogeneous, which makes it difficult to obtain reliable results on the data of one medical institution with a model trained on data from other institutions. In addition, the distribution of different medical data over patients' disease stages is uneven, which further reduces the reliability of models. Technologies including domain adaptation and multi-site learning have emerged to reduce the impact of data heterogeneity and improve the generalization ability of models. As a research hotspot in transfer learning, domain adaptation is intended to transfer knowledge learned from the source domain to data in an unlabeled target domain. Multi-site learning and federated learning with non-independent and identically distributed (non-IID) data aim to improve the robustness of models by learning a common representation over multiple datasets. This study investigates, analyzes, and summarizes recent work on domain adaptation, multi-site learning, and federated learning with non-IID datasets, providing references for related research.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006681
    [Abstract] (410) [HTML] (0) [PDF 4.87 M] (1184)
    Abstract:
    A heterogeneous information network is a representation of heterogeneous data, and how to integrate the complex semantic information of heterogeneous data is one of the challenges faced by recommendation systems. This study constructs a high-order embedding learning framework for heterogeneous information networks based on weak ties, which are characterized by their semantic information and information transmission abilities. The framework includes three modules: initial information embedding, high-order information embedding aggregation, and recommendation prediction. The initial information embedding module first adopts a best trust path selection algorithm to avoid the information loss caused by sampling a fixed number of neighbors in a fully relational heterogeneous information network; it then adopts newly defined importance measure factors of multi-task shared characteristics based on multi-head attention to filter the semantic information of each node and, combined with the interaction structure, effectively characterizes the network nodes. The high-order information embedding aggregation module realizes high-order information expression by integrating weak ties with the good knowledge representation ability of network embedding, utilizing the hierarchical propagation mechanism of heterogeneous information networks to aggregate the characteristics of sampled nodes into the nodes to be predicted. The recommendation prediction module employs the influence recommendation method with high-order information to complete the recommendation. The framework is characterized by rich embedded nodes, the fusion of shared attributes, and implicit interaction information. Finally, experiments verify that UI-HEHo can effectively improve the accuracy of rating prediction as well as the pertinence, novelty, and diversity of recommendation generation. Especially in application scenarios with sparse data, UI-HEHo yields good recommendation effects.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006684
    [Abstract] (732) [HTML] (0) [PDF 6.40 M] (1346)
    Abstract:
    Distributed systems play an important role in computing environments, and consensus protocols are employed to guarantee consistency among nodes. Design errors in consensus protocols might cause failures in system operation and bring catastrophic consequences to humans and the environment; therefore, it is important to prove the correctness of consensus protocols. Formal verification can strictly prove the correctness of target properties in designed models, which makes it suitable for verifying consensus protocols. However, the expanding scale of distributed systems brings more complicated issues and challenges to the formal verification of consensus protocols. How to formally verify consensus protocol designs and how to increase the scale of verification are significant research issues in this area. This study investigates the current research on employing formal methods to verify consensus protocols, summarizes the key modeling methods and verification technologies, and proposes future research directions in this field.
    Available online:  October 18, 2017 , DOI:
    [Abstract] (2795) [HTML] (0) [PDF 525.21 K] (4623)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 315-325. Original text: https://doi.org/10.1145/3106237.3106242. Readers who wish to cite this article should cite the original publication.
    Available online:  October 18, 2017 , DOI:
    [Abstract] (2768) [HTML] (0) [PDF 352.38 K] (5757)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 303-314. Original text: https://doi.org/10.1145/3106237.3106239. Readers who wish to cite this article should cite the original publication.
    Available online:  September 11, 2017 , DOI:
    [Abstract] (3244) [HTML] (0) [PDF 276.42 K] (2808)
    Abstract:
    GitHub, a popular social-software-development platform, has fostered a variety of software ecosystems where projects depend on one another and practitioners interact with each other. Projects within an ecosystem often have complex inter-dependencies that impose new challenges in bug reporting and fixing. In this paper, we conduct an empirical study on cross-project correlated bugs, i.e., causally related bugs reported to different projects, focusing on two aspects: 1) how developers track the root causes across projects; and 2) how the downstream developers coordinate to deal with upstream bugs. Through manual inspection of bug reports collected from the scientific Python ecosystem and an online survey with developers, this study reveals the common practices of developers and the various factors in fixing cross-project bugs. These findings provide implications for future software bug analysis in the scope of ecosystem, as well as shed light on the requirements of issue trackers for such bugs.
    Available online:  June 21, 2017 , DOI:
    [Abstract] (3289) [HTML] (0) [PDF 169.43 K] (2903)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article has been accepted for publication in IEEE Transactions on Software Engineering, 2017. Original text: http://ieeexplore.ieee.org/document/7792694. Readers who wish to cite this article should cite the original publication.
    Available online:  June 13, 2017 , DOI:
    [Abstract] (4493) [HTML] (0) [PDF 174.91 K] (3336)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 39th International Conference on Software Engineering, pp. 27-37, Buenos Aires, Argentina, May 20-28, 2017, IEEE Press, Piscataway, NJ, USA, ISBN: 978-1-5386-3868-2. Original text: http://dl.acm.org/citation.cfm?id=3097373. Readers who wish to cite this article should cite the original publication.
    Available online:  January 25, 2017 , DOI:
    [Abstract] (3368) [HTML] (0) [PDF 254.98 K] (2678)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016), ACM, New York, NY, USA, pp. 871-882. DOI: https://doi.org/10.1145/2950290.2950364. Original text: http://dl.acm.org/citation.cfm?id=2950364. Readers who wish to cite this article should cite the original publication.
    Available online:  January 18, 2017 , DOI:
    [Abstract] (3849) [HTML] (0) [PDF 472.29 K] (2663)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 133-143, Seattle, WA, USA, November 2016. Original text: http://dl.acm.org/citation.cfm?id=2950327. Readers who wish to cite this article should cite the original publication.
    Available online:  January 04, 2017 , DOI:
    [Abstract] (3601) [HTML] (0) [PDF 293.93 K] (2469)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE'16), pp. 810-821, November 13-18, 2016. Original text: https://doi.org/10.1145/2950290.2950310. Readers who wish to cite this article should cite the original publication.
    Available online:  January 04, 2017 , DOI:
    [Abstract] (3950) [HTML] (0) [PDF 244.61 K] (2805)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published at FSE 2016. Original text: http://dl.acm.org/citation.cfm?doid=2950290.2950313. Readers who wish to cite this article should cite the original publication.
    Available online:  December 12, 2016 , DOI:
    [Abstract] (3483) [HTML] (0) [PDF 358.69 K] (2816)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published at FSE'16, in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Original text: http://dl.acm.org/citation.cfm?id=2950340. Readers who wish to cite this article should cite the original publication.
    Available online:  September 30, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published at the ASE 2016 conference (http://ase2016.org/). Original text: http://dl.acm.org/citation.cfm?id=2970366. Readers who wish to cite this article should cite the original publication.
    Available online:  September 09, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article by Junjie was published at the ASE 2016 conference (http://ase2016.org/). Original text: http://dl.acm.org/citation.cfm?doid=2970276.2970300. Readers who wish to cite this article should cite the original publication.
    Available online:  September 07, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The original article was published in ASE 2016: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. Full text: http://dx.doi.org/10.1145/2970276.2970307. Important note: readers who wish to cite this article should cite the original publication.
    Available online:  August 29, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published in ACM Transactions on Software Engineering and Methodology (TOSEM, Vol. 25, No. 2, Article 13, May 2016) and was invited as a "Journal first" presentation at the ICSE 2016 main conference. Full text: http://dl.acm.org/citation.cfm?id=2876443. The authors are Zhou Minghui, Ma Xiujuan, Zhang Lu, and Mei Hong of Peking University and Audris Mockus of the University of Tennessee. Important note: readers who wish to cite this article should cite the original publication.
  • Full-text download ranking (overall / by year / by issue)
    Abstract click ranking (overall / by year / by issue)

    2003,14(7):1282-1291, DOI:
    [Abstract] (36714) [HTML] (0) [PDF 832.28 K] (78037)
    Abstract:
    Sensor networks, formed by the convergence of sensor, micro-electro-mechanical system (MEMS), and network technologies, are a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced, and some valuable applications are explained and forecast. Combined with existing work, the hot spots, including power-aware routing and media access control schemes, are discussed and presented in detail. Finally, taking account of application requirements, several future research directions are put forward.
    2010,21(3):427-437, DOI:
    [Abstract] (32533) [HTML] (0) [PDF 308.76 K] (37135)
    Abstract:
    The automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports some pioneering research on a possible genetic algorithm and its automatic generation of SONGCI, a form of classical Chinese poetry. In light of the characteristics of ancient Chinese poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system constructed on the basis of the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic Chinese poetry generation.
    2011,22(1):71-83, DOI:10.3724/SP.J.1001.2011.03958
    [Abstract] (29412) [HTML] (0) [PDF 781.42 K] (52868)
    Abstract:
    Cloud computing is the fundamental change happening in the field of information technology; it represents a movement toward intensive, large-scale specialization. On the other hand, it brings about not only convenience and efficiency but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the major requirements in cloud computing, key security technologies, standards, regulations, etc., and provides a cloud computing security framework. The paper argues that changes in these aspects will result in a technical revolution in the field of information security.
    2016,27(1):45-71, DOI:10.13328/j.cnki.jos.004914
    [Abstract] (28566) [HTML] (1732) [PDF 880.96 K] (29120)
    Abstract:
    Android is a modern and highly popular software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever; Apple, Microsoft, Blackberry, and Firefox trailed a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering the Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2009,20(5):1337-1348, DOI:
    [Abstract] (27654) [HTML] (0) [PDF 1.06 M] (43343)
    Abstract:
    This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects: one is the cloud infrastructure, which is the building block for upper-layer cloud applications; the other is, of course, the cloud applications themselves. This paper focuses on the cloud infrastructure, including the systems and current research, and some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large-scale clusters that contain a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure so that computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software built on top of redundant hardware rather than by hardware alone. All these technologies serve the two important goals of distributed systems: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to a very large scale, even to thousands of nodes; availability means that the services remain available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2008,19(1):48-61, DOI:
    [Abstract] (27579) [HTML] (0) [PDF 671.39 K] (59768)
    Abstract:
    The research status of and recent progress in clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and summarized from several aspects, such as algorithm ideas, key techniques, and advantages and disadvantages. Second, several typical clustering algorithms and well-known datasets are selected, and simulation experiments are conducted from the two perspectives of accuracy and running efficiency; the clustering performance of each algorithm on different datasets is analyzed by comparing the clustering results of the same dataset under different algorithms. Finally, by integrating the information from these two aspects, the research hotspots, difficulties, and shortcomings of data clustering as well as some open problems are addressed. This work provides a valuable reference for data clustering and data mining.
    2009,20(2):271-289, DOI:
    [Abstract] (26617) [HTML] (0) [PDF 675.56 K] (41496)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms before 2003, this paper discusses recent advances in EMO in detail and draws conclusions about the current research directions. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have emerged. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints for the future research of EMO are proposed.
    2005,16(1):1-7, DOI:
    [Abstract] (21777) [HTML] (0) [PDF 614.61 K] (19470)
    Abstract:
    This paper presents some thoughts on software engineering from the following four aspects: 1) from the law of the development of things, it reveals the development history of software engineering technology; 2) from the natural characteristics of software, it analyzes the construction of every abstraction layer of the virtual machine; 3) from the viewpoint of software development, it proposes the research content of the software engineering discipline and studies the pattern of industrialized software production; 4) based on the appearance of Internet technology, it explores the development trends of software technology.
    2004,15(3):428-442, DOI:
    [Abstract] (20340) [HTML] (0) [PDF 1009.57 K] (15624)
    Abstract:
    With the rapid development of e-business, Web applications have evolved from local to global, from B2C (business-to-consumer) to B2B (business-to-business), and from centralized to decentralized. The Web service is a new application model for decentralized computing and an effective mechanism for data and service integration on the Web; it has therefore become a natural solution for e-business. It is important and necessary to carry out research on new Web service architectures, on combinations with other proven techniques, and on service integration. This paper surveys various aspects of Web services research, from basic concepts to the principal research problems and underlying techniques, including data integration in Web services, Web service composition, semantic Web services, Web service discovery, Web service security, Web services in the P2P (peer-to-peer) computing environment, and grid services. It also summarizes the state of the art of these techniques and discusses future research topics and open challenges.
    2010,21(8):1834-1848, DOI:
    [Abstract] (20104) [HTML] (0) [PDF 682.96 K] (53931)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail: sentiment extraction, sentiment classification, and sentiment retrieval and summarization. Then, the evaluation methods and corpora for sentiment analysis are introduced. Finally, the applications of sentiment analysis are summarized. The paper aims to give a deep insight into the mainstream methods and recent progress in this field, with detailed comparison and analysis.
    2005,16(5):857-868, DOI:
    [Abstract] (19607) [HTML] (0) [PDF 489.65 K] (28762)
    Abstract:
    Wireless sensor networks, a novel technology for acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the network, is challenging and yet extremely crucial for many applications. This paper describes the performance evaluation criteria and a taxonomy of self-localization systems and algorithms for wireless sensor networks, discusses the principles and characteristics of recent representative localization approaches, and outlines directions for future research in this area.
    2009,20(1):54-66, DOI:
    [Abstract] (19238) [HTML] (0) [PDF 1.41 M] (48537)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks: within a community the links between nodes are very dense, while between communities they are quite sparse. Network clustering algorithms, which aim to discover all natural communities in a given complex network, are fundamentally important for both theoretical research and practical applications; they can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks, including social networks, biological networks, and the World Wide Web. This paper reviews the background, motivation, state of the art, and main issues of existing work on discovering network communities, and tries to draw a comprehensive and clear outline of this new and active research area. The work is intended to benefit researchers in complex network analysis, data mining, the intelligent Web, and bioinformatics.
    2012,23(4):962-986, DOI:10.3724/SP.J.1001.2012.04175
    [Abstract] (18386) [HTML] (0) [PDF 2.09 M] (29815)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, so failures of computers or data are common. The sheer number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware and power costs. Fault tolerance, scalability, and power consumption of the distributed storage of a data center have therefore become key issues in cloud computing for ensuring data availability and reliability. This paper surveys the state of the art of the key technologies in cloud computing in the following aspects: the design of data center networks, the organization and placement of data, strategies to improve fault tolerance, and methods to save storage space and energy. First, several classical data center network topologies are introduced and compared. Second, current fault-tolerant storage techniques are discussed, with a particular comparison of data replication and erasure-code strategies. Third, the main current energy-saving technologies are analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
    2012,23(1):32-45, DOI:10.3724/SP.J.1001.2012.04091
    [Abstract] (18312) [HTML] (0) [PDF 408.86 K] (29207)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed is growing rapidly. Parallel techniques that can scale out cost-effectively are needed to deal with such big data. Relational data management technology has a history of nearly 40 years, but it now faces the tough obstacle of scalability: relational techniques cannot handle very large data easily. Meanwhile, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and expanded from Web search into territories that used to be occupied by relational database systems, confronting relational technology with high availability, high scalability, and massive parallel processing capability. The relational community, after losing the Web search market, has begun to learn from MapReduce, and MapReduce in turn borrows valuable ideas from relational technology to improve performance. As the two camps compete with and learn from each other, a new data analysis platform and ecosystem are emerging, in which both will eventually find their proper places.
    2009,20(3):524-545, DOI:
    [Abstract] (17163) [HTML] (0) [PDF 1.09 M] (20969)
    Abstract:
    Nowadays it is widely accepted that the quality of software highly depends on the process carried out in an organization. As part of the effort to support software process engineering activities, research on software process modeling and analysis aims to provide an effective means to represent and analyze a process and, by doing so, to enhance understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process, so enforcing the process model can directly contribute to improving software quality. In this paper, a systematic review is carried out to survey recent developments in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as evidence. The review aims to promote a better understanding of the literature by answering three questions: 1) What paradigms are existing methods based on? 2) What purposes does existing research serve? 3) What new trends are reflected in current research? After the systematic review, we present our software process modeling method, based on a multi-dimensional and integrated methodology, which is intended to address several core issues facing the community.
    2009,20(1):124-137, DOI:
    [Abstract] (16642) [HTML] (0) [PDF 1.06 M] (20976)
    Abstract:
    The appearance of plenty of intelligent devices equipped with short-range wireless communication has boosted the rapid rise of wireless ad hoc network applications. However, in many realistic environments, nodes form a disconnected network most of the time owing to node mobility, low density, lossy links, and so on. The conventional communication model of the mobile ad hoc network (MANET) requires at least one path to exist from source to destination, which leads to communication failure in such scenarios. Opportunistic networks instead exploit the communication opportunities arising from node movement to forward messages hop by hop, implementing communication between nodes under the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has attracted great interest from researchers. This paper first introduces the concepts and theory of opportunistic networks and some current typical applications. It then elaborates the popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Other interesting research points, such as communication middleware, cooperation and security problems, and new applications, are stated briefly. Finally, the paper concludes and looks forward to possible research foci for opportunistic networks in the future.
    2009,20(11):2965-2976, DOI:
    [Abstract] (16202) [HTML] (0) [PDF 442.42 K] (14186)
    Abstract:
    This paper studies uncertain graph data mining, and in particular the problem of mining frequent subgraph patterns from uncertain graph data. A data model is introduced for representing uncertainty in graphs, and an expected support is employed to evaluate the significance of subgraph patterns. By using the apriori property of expected support, a depth-first-search-based mining algorithm is proposed with an efficient method for computing expected supports and a technique for pruning the search space, which reduces the number of subgraph isomorphism tests needed to compute expected support from an exponential to a linear scale. Experimental results show that the proposed algorithm is 3 to 5 orders of magnitude faster than a naive depth-first search algorithm, and is efficient and scalable.
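    A hedged illustration of the expected-support notion used above (standard possible-world notation, assumed here rather than quoted from the paper): an uncertain graph induces a probability distribution over its possible worlds, and the significance of a pattern g is its occurrence averaged over those worlds,

        \[
          \operatorname{esup}(g) \;=\; \sum_{G \,\sqsubseteq\, \mathcal{G}} \Pr[G] \cdot \mathbb{1}\!\left[\, g \text{ is subgraph-isomorphic to } G \,\right].
        \]

    The apriori property mentioned in the abstract is the anti-monotonicity \(\operatorname{esup}(g') \ge \operatorname{esup}(g)\) for every subgraph \(g'\) of \(g\), which is what justifies depth-first pruning of the pattern search space.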
    2004,15(8):1208-1219, DOI:
    [Abstract] (16198) [HTML] (0) [PDF 948.49 K] (12760)
    Abstract:
    With the explosive growth of network applications and complexity, the threat of Internet worms to network security is becoming increasingly serious. Especially in the Internet environment, the variety of propagation vectors and the complexity of application environments give worms a much higher outbreak frequency, deeper latency, and wider coverage, and Internet worms have become a primary concern of malicious-code researchers. In this paper, the concept and research status of Internet worms, their functional components, and their execution mechanisms are first presented; then scanning strategies and propagation models are discussed; and finally the critical techniques of Internet worm prevention are given. Major open problems and research trends in this area are also addressed.
    2009,20(5):1226-1240, DOI:
    [Abstract] (16043) [HTML] (0) [PDF 926.82 K] (15201)
    Abstract:
    This paper describes in concrete detail how automated reasoning techniques are combined with planning methods, including planning as satisfiability using propositional logic, conformant planning using modal logic and disjunctive reasoning, planning as nonmonotonic logic, and flexible planning as fuzzy description logic. Considering the experimental results of the International Planning Competition and the relevant literature, it concludes that planning methods based on automated reasoning techniques are helpful and worth adopting. It also identifies the challenges and likely research hotspots.
    2003,14(10):1717-1727, DOI:
    [Abstract] (15879) [HTML] (0) [PDF 839.25 K] (13511)
    Abstract:
    Sensor networks are an integration of sensor technology, embedded computing, distributed computing, and wireless communication technology. They can be used for testing, sensing, collecting, and processing information about monitored objects and for transferring the processed information to users. The sensor network is a new research area of computer science and technology with a wide application future, and both academia and industry are very interested in it. This paper introduces the concepts and characteristics of sensor networks and of the data in such networks, and discusses the issues of sensor networks and of sensor network data management. Advances in research on sensor networks and their data management are also presented.
    2009,20(2):350-362, DOI:
    [Abstract] (15814) [HTML] (0) [PDF 1.39 M] (38667)
    Abstract:
    This paper makes a comprehensive survey of recommender system research in order to help readers understand this field. First, the research background is introduced, including commercial application demands and the relevant academic institutions, conferences, and journals. After describing the recommendation problem both formally and informally, a comparative study is conducted over the categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are presented, and the main difficulties and future directions are summarized.
    2015,26(1):62-81, DOI:10.13328/j.cnki.jos.004701
    [Abstract] (15447) [HTML] (1681) [PDF 1.04 M] (24250)
    Abstract:
    Network abstraction brought about the birth of software-defined networking (SDN). SDN decouples the data plane from the control plane and simplifies network management. The paper starts with a discussion of the background of the emergence and development of SDN, outlining its architecture of data layer, control layer, and application layer. The key technologies are then elaborated according to this hierarchical architecture, with particular analysis of consistency, availability, and fault tolerance. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized at the end.
    2014,25(4):839-862, DOI:10.13328/j.cnki.jos.004558
    [Abstract] (15159) [HTML] (1746) [PDF 1.32 M] (18225)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. Research and discussion on batch computing in big data environments is comparatively mature, but how to efficiently handle stream computing, meeting requirements such as low latency, high throughput, and continuously reliable operation, and how to build efficient stream computing systems for big data remain great challenges in big data research. This paper studies the system architecture and key issues of stream computing in big data environments. First, it gives a brief summary of three application scenarios of stream computing, in business intelligence, marketing, and public service, and identifies the distinctive features of stream computing in big data environments, such as real-time, volatile, bursty, irregular, and unbounded data. A well-designed stream computing system must be optimized in system structure, data transmission, application interfaces, high availability, and so on. The paper then offers detailed analyses and comparisons of five typical open-source stream computing systems. Finally, it addresses some new challenges for stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2009,20(10):2729-2743, DOI:
    [Abstract] (14252) [HTML] (0) [PDF 1.12 M] (10225)
    Abstract:
    In a multi-hop wireless sensor network (WSN), the sensors closest to the sink tend to deplete their energy faster than other sensors, which is known as an energy hole around the sink. No more data can be delivered to the sink after an energy hole appears, while a considerable amount of energy is wasted and the network lifetime ends prematurely. This paper investigates the energy hole problem and, based on an improved corona model with levels, concludes that assigning different transmission ranges to nodes in different coronas is an effective approach to an energy-efficient network. It proves that finding the optimal transmission ranges for all coronas is a multi-objective optimization problem (MOP), which is NP-hard. The paper proposes an ACO (ant colony optimization)-based distributed algorithm to prolong network lifetime, which helps nodes in different areas adaptively find approximately optimal transmission ranges based on the node distribution. Simulation results indicate that the network lifetime under this solution approximates that of the optimal assignment. Compared with existing algorithms, the ACO-based algorithm not only extends network lifetime by more than a factor of two, but also performs well under non-uniform node distributions.
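    A minimal sketch of the ACO decision step described above, with the pheromone and heuristic structures assumed for illustration (the paper's distributed protocol defines its own update rules and heuristic based on node distribution):

        import random

        def choose_range(ranges, pheromone, heuristic, alpha=1.0, beta=2.0):
            # Standard ACO rule: pick range r with probability proportional
            # to pheromone[r]**alpha * heuristic[r]**beta.
            weights = [pheromone[r] ** alpha * heuristic[r] ** beta for r in ranges]
            pick, acc = random.uniform(0, sum(weights)), 0.0
            for rng, w in zip(ranges, weights):
                acc += w
                if acc >= pick:
                    return rng
            return ranges[-1]

        def update_pheromone(pheromone, chosen, gain, rho=0.1):
            # Evaporate everywhere, then deposit on the chosen range in
            # proportion to the observed lifetime improvement.
            for k in pheromone:
                pheromone[k] *= 1.0 - rho
            pheromone[chosen] += gain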
    2012,23(1):1-20, DOI:10.3724/SP.J.1001.2012.04100
    [Abstract] (14083) [HTML] (0) [PDF 1017.73 K] (29438)
    Abstract:
    Context-aware recommender systems, which aim to further improve recommendation accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the recommender systems domain. This paper presents an overview of the field from a process-oriented perspective, covering system frameworks, key techniques, main models, evaluation, and typical applications. Prospects for future development and suggestions for possible extensions are also discussed.
    2012,23(5):1148-1166, DOI:10.3724/SP.J.1001.2012.04195
    [Abstract] (14047) [HTML] (0) [PDF 946.37 K] (16374)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence, and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, including the data model, architecture, consistency, programming model, data security, performance optimization, benchmarking, and so on. Finally, some future trends in this area are discussed.
    2000,11(11):1460-1466, DOI:
    [Abstract] (13951) [HTML] (0) [PDF 520.69 K] (10544)
    Abstract:
    Intrusion detection is a highlighted topic of network security research in recent years. In this paper, first the necessity of intrusion detection is presented, and its concepts and models are described. Then, many intrusion detection techniques and architectures are summarized. Finally, the existing problems and the future direction in this field are discussed.
    2013,24(8):1786-1803, DOI:10.3724/SP.J.1001.2013.04416
    [Abstract] (13654) [HTML] (0) [PDF 1.04 M] (15639)
    Abstract:
    Many application-oriented NoSQL database systems have been developed to satisfy the new requirements of big data management. This paper surveys research on typical NoSQL databases based on the key-value data model. First, the characteristics of big data and the key technical issues in supporting big data management are introduced. Then, frontier efforts and research challenges are presented, including system architecture, data model, access mode, indexing, transactions, system elasticity, load balancing, replica strategy, data consistency, flash caching, MapReduce-based data processing, and new-generation data management systems. Finally, research prospects are given.
    2004,15(4):571-583, DOI:
    [Abstract] (13538) [HTML] (0) [PDF 1005.17 K] (9177)
    Abstract:
    For most peer-to-peer file-swapping applications, sharing is voluntary, and peers are not held responsible for their irresponsible bartering history. This means that trust between participants cannot be established simply by traditional trust mechanisms. A reasonable approach to trust construction comes from social network analysis, in which trust relations between individuals are built upon the recommendations of other individuals. Current P2P trust models neither guarantee the convergence of the iteration for trust computation nor consider security problems of the model itself, such as sybil attacks and slandering. This paper presents a novel recommendation-based global trust model and gives a distributed implementation method. Mathematical analyses and simulations show that, compared with current global trust models, the proposed model is more robust against attacks on trust security, and its iteration for computing peer trust is guaranteed to converge.
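    As a rough illustration of iterative global trust computation in recommendation-based models (a sketch in the spirit of this family of models; the matrix C, its normalization, and the convergence test are assumptions, not the paper's exact formulation):

        import numpy as np

        def global_trust(C, iters=50, tol=1e-9):
            # Iterate t <- C^T t over a row-normalized local-trust matrix C
            # until the fixed point (the global trust vector) is reached.
            n = C.shape[0]
            t = np.full(n, 1.0 / n)
            for _ in range(iters):
                t_next = C.T @ t
                t_next /= t_next.sum()        # keep it a distribution
                if np.abs(t_next - t).sum() < tol:
                    break
                t = t_next
            return t

        # Three peers; peer 2 is strongly recommended by the other two.
        C = np.array([[0.0, 0.2, 0.8],
                      [0.3, 0.0, 0.7],
                      [0.5, 0.5, 0.0]])
        print(global_trust(C))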
    2002,13(7):1228-1237, DOI:
    [Abstract] (13512) [HTML] (0) [PDF 500.04 K] (13182)
    Abstract:
    Software architecture (SA) has recently emerged as one of the primary research areas in software engineering and one of the key technologies for developing large-scale software-intensive systems and software product lines. This paper summarizes the history and major directions of SA and formulates the concept of SA by analyzing and comparing several classical definitions. Summarizing the activities involved in SA, two categories of SA research are identified, and the advances are introduced from seven aspects. Additionally, some weaknesses of current SA research are discussed and their causes explained. Finally, some significantly promising research directions for SA are outlined.
    2006,17(7):1588-1600, DOI:
    [Abstract] (13463) [HTML] (0) [PDF 808.73 K] (13592)
    Abstract:
    Routing technology at the network layer is pivotal in the architecture of wireless sensor networks. As an active branch of routing technology, cluster-based routing protocols excel in network topology management, energy minimization, data aggregation, and so on. In this paper, cluster-based routing mechanisms for wireless sensor networks are analyzed. Cluster head selection, cluster formation, and data transmission are identified as the three key techniques of cluster-based routing protocols, and recent representative protocols are presented from the viewpoint of these three techniques, with their characteristics and application areas compared. Finally, future research issues in this area are pointed out.
    2011,22(1):115-131, DOI:10.3724/SP.J.1001.2011.03950
    [Abstract] (13431) [HTML] (0) [PDF 845.91 K] (27023)
    Abstract:
    The Internet traffic model is the key issue for network performance management, quality-of-service management, and admission control. This paper first summarizes the primary characteristics of Internet traffic and the metrics used to describe it, and illustrates the significance and classification of traffic modeling. Next, it chronologically categorizes the research activities on traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in traffic modeling.
    2008,19(zk):112-120, DOI:
    [Abstract] (13410) [HTML] (0) [PDF 594.29 K] (13872)
    Abstract:
    An ad hoc network is a collection of wireless mobile nodes that dynamically form a temporary network without any existing network infrastructure or centralized administration. Because of the bandwidth constraints and dynamic topology of mobile ad hoc networks, multipath-supported routing is a very important research issue. In this paper, we present an entropy-based metric to support stable multipath on-demand routing (SMDR). The key idea of the SMDR protocol is to construct a new entropy metric and use it to select stable multiple paths, thereby reducing the number of route reconstructions and providing QoS guarantees in ad hoc networks whose topology changes continuously. Simulation results show that, with the proposed multipath routing protocol, packet delivery ratio, end-to-end delay, and routing overhead ratio are improved in most cases. It is a viable approach to multipath routing decisions.
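    A hypothetical illustration of how an entropy metric over link-stability estimates might rank candidate paths (the inputs and normalization here are assumptions for the sketch; the SMDR metric is defined differently in detail):

        import math

        def path_entropy(link_probs):
            # Shannon entropy (bits) of a path's link-stability distribution.
            # link_probs: positive availability estimates for the links on one
            # path. Lower entropy = more predictable, hence "more stable".
            total = sum(link_probs)
            dist = [p / total for p in link_probs]
            return -sum(p * math.log2(p) for p in dist if p > 0)

        def select_stable_paths(candidate_paths, k=2):
            # Keep the k candidate paths whose behavior is most predictable.
            return sorted(candidate_paths, key=path_entropy)[:k]

        print(select_stable_paths([[0.9, 0.9, 0.8], [0.9, 0.2, 0.5]], k=1))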
    2009,20(1):11-29, DOI:
    [Abstract] (13402) [HTML] (0) [PDF 787.30 K] (13437)
    Abstract:
    Constrained optimization problems (COPs) are mathematical programming problems frequently encountered in science and engineering applications, and solving COPs has become an important research area of evolutionary computation in recent years. In this paper, the state of the art of constrained optimization evolutionary algorithms (COEAs) is surveyed from the two basic aspects of COEAs, namely constraint-handling techniques and evolutionary algorithms. In addition, some important issues of COEAs are discussed and several typical algorithms are analyzed in detail. Based on these analyses, it is concluded that to obtain competitive results, a proper constraint-handling technique must be combined with an appropriate search algorithm. Finally, open research issues in this field are pointed out.
    2015,26(1):26-39, DOI:10.13328/j.cnki.jos.004631
    [Abstract] (13211) [HTML] (1740) [PDF 763.52 K] (14031)
    Abstract:
    In recent years, transfer learning has attracted a vast amount of attention and research. Transfer learning is a machine learning method that applies knowledge from related but different domains to a target domain. It relaxes the two basic assumptions of traditional machine learning: (1) that the training data (the source domain) and the test data (the target domain) satisfy the independent and identically distributed (i.i.d.) condition, and (2) that enough labeled samples are available to learn a good classification model. It thereby aims to solve problems in which labeled data in the target domain are scarce or entirely absent. This paper surveys the research progress of transfer learning and introduces the authors' own work, especially on building transfer learning models by applying generative models at the concept level. Finally, the paper introduces applications of transfer learning, such as text classification and collaborative filtering, and suggests future research directions.
    2013,24(1):50-66, DOI:10.3724/SP.J.1001.2013.04276
    [Abstract] (13163) [HTML] (0) [PDF 0.00 Byte] (15863)
    Abstract:
    As an important acceleration technique in the cloud, distributed caching has received considerable attention in industry and academia. This paper starts with a discussion of the combination of cloud computing and distributed caching technology, analyzing its characteristics, typical application scenarios, stages of development, standards, and the key elements that have promoted its development. To systematically understand the progress and weak points of distributed caching technology, the paper builds a multi-dimensional framework, DctAF, constituted of six dimensions derived from the characteristics of cloud computing and the boundary of caching techniques. Based on DctAF, current techniques are analyzed and summarized, and several influential products are compared. Finally, the paper describes and highlights several challenges that cache systems face and examines current research through in-depth analysis and comparison.
    2003,14(9):1621-1628, DOI:
    [Abstract] (12947) [HTML] (0) [PDF 680.35 K] (18631)
    Abstract:
    The recommendation system is one of the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities have grown rapidly, resulting in extremely sparse user rating data. Traditional similarity measures perform poorly in this situation, dramatically degrading the quality of recommendation. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. The method first predicts the ratings of items a user has not rated from item-to-item similarity, and then uses a new similarity measure to find the target user's neighbors. Experimental results show that this method can effectively alleviate the extreme sparsity of user rating data and provides better recommendations than traditional collaborative filtering algorithms.
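    A minimal sketch of the item-rating-prediction step described above, assuming cosine item-item similarity over a user-item matrix (an illustrative baseline, not the paper's exact algorithm or its new similarity measure):

        import numpy as np

        def predict_missing(R, eps=1e-9):
            # Item-item cosine similarity over the columns of the
            # user-item rating matrix R (0 marks "not rated").
            norms = np.linalg.norm(R, axis=0) + eps
            sim = (R.T @ R) / np.outer(norms, norms)
            np.fill_diagonal(sim, 0.0)
            filled = R.astype(float).copy()
            for u, i in zip(*np.nonzero(R == 0)):
                rated = np.nonzero(R[u])[0]        # items user u has rated
                w = sim[i, rated]
                if w.sum() > eps:                  # similarity-weighted average
                    filled[u, i] = w @ R[u, rated] / w.sum()
            return filled

        R = np.array([[5, 3, 0],
                      [4, 0, 4],
                      [1, 1, 5]])
        print(predict_missing(R))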
    2008,19(8):1902-1919, DOI:
    [Abstract] (12843) [HTML] (0) [PDF 521.73 K] (12860)
    Abstract:
    Visual language techniques have exhibited more advantages than one-dimensional textual languages in describing various software artifacts during software development, from requirements analysis and design to testing and maintenance, as diagrammatic and graphical notations are well suited to system modeling. Beyond an intuitive appearance, graph grammars provide a well-established foundation for defining visual languages, with the power of precise modeling and verification on computers. This paper discusses the issues and techniques involved in a formal foundation for visual languages, reviews related practical graphical environments, presents a spatial graph grammar formalism, and applies the spatial graph grammar to defining the behavioral semantics of UML diagrams and to developing a style-driven framework for software architecture design.
    2008,19(8):1947-1964, DOI:
    [Abstract] (12825) [HTML] (0) [PDF 811.11 K] (9278)
    Abstract:
    Widespread deployment of interactive information visualization is difficult. Non-specialist users need a general development method and a toolkit supporting the generic data structures suited to tree, network, and multi-dimensional data, specialized visualization and interaction techniques, and well-known generic information tasks. This paper presents a model-driven development method for interactive information visualization. First, an interactive information visualization interface model (IIVM) is proposed. Then, the development method based on the IIVM is presented, and the Daisy toolkit is introduced, which includes the Daisy model builder, the Daisy IIV generator, and a runtime framework with the Daisy library. Finally, an application example is given. Experimental results show that Daisy can provide a general solution for developing interactive information visualization.
    2003,14(9):1635-1644, DOI:
    [Abstract] (12768) [HTML] (0) [PDF 622.06 K] (11137)
    Abstract:
    Computer forensics is the technology field that attempts to provide thorough, efficient, and secure means to investigate computer crime. Computer evidence must be authentic, accurate, complete, and convincing to juries. In this paper, the stages of computer forensics are presented, and the theory and realization of forensics software are described. An example of forensic practice is also given. The deficiencies of current computer forensics techniques, as well as anti-forensics, are discussed. The conclusion is that as computer science and technology advance, forensics techniques will become more integrated and thorough.
    2002,13(10):1952-1961, DOI:
    [Abstract] (12734) [HTML] (0) [PDF 570.96 K] (11033)
    Abstract:
    The crucial technologies related to personalization are introduced in this paper, including the representation and modification of user profiles, the representation of resources, recommendation technology, and the architecture of personalization systems. By comparison with existing prototype systems, the key technologies for implementing personalization are discussed in detail, and three representative personalization systems are analyzed. Finally, some research directions for personalization are presented.
    2010,21(2):231-247, DOI:
    [Abstract] (12605) [HTML] (0) [PDF 1.21 M] (15504)
    Abstract:
    In this paper, a framework is proposed for handling faults in service composition through the analysis of fault requirements. Petri nets are used in the framework for fault detection and handling, targeting the failure of available services, component failures, and network failures, and the corresponding fault models are given. Based on these models, a correctness criterion for fault handling is defined to analyze the fault-handling model, and its correctness is proven. Finally, CTL (computation tree logic) is used to specify the related properties and the enforcement algorithm of fault analysis. Simulation results show that this method can ensure the reliability and consistency of service composition.
    2012,23(1):82-96, DOI:10.3724/SP.J.1001.2012.04101
    [Abstract] (12591) [HTML] (0) [PDF 394.07 K] (13557)
    Abstract:
    Botnets are one of the most serious threats to the Internet. Researchers have conducted extensive studies and made significant progress, but botnets keep evolving and have become more and more sophisticated. Due to the underlying security limitations of current systems and the Internet architecture, as well as the complexity of botnets themselves, how to effectively counter the global threat of botnets remains a very challenging issue. This paper first introduces the evolution of botnets' propagation, attack, command, and control mechanisms. It then summarizes recent advances in botnet defense research in five areas: botnet monitoring, botnet infiltration, analysis of botnet characteristics, botnet detection, and botnet disruption. The limitations of current botnet defense techniques, the evolving trends of botnets, and possible directions for future research are also discussed.
    2008,19(7):1565-1580, DOI:
    [Abstract] (12371) [HTML] (0) [PDF 815.02 K] (15215)
    Abstract:
    Software defect prediction has been an active area of software engineering since it emerged in the 1970s. It plays a very important role in analyzing software quality and balancing software cost. This paper investigates and discusses the motivation, evolution, solutions, and challenges of software defect prediction technologies, and categorizes, analyzes, and compares representative prediction techniques. Some case studies of software defect distribution models are given to aid understanding.
    2010,21(7):1620-1634, DOI:
    [Abstract] (12329) [HTML] (0) [PDF 765.23 K] (18954)
    Abstract:
    As an application of mobile ad hoc networks (MANET) to intelligent transportation information systems, the most important goal of vehicular ad hoc networks (VANET) is to dramatically reduce the number of accidents and their fatal consequences. One of the most important factors contributing to this goal is the design of effective broadcast protocols. This paper briefly introduces the characteristics and application fields of VANET, then analyzes and compares the characteristics, performance, and application areas of the various categories of broadcast protocols in VANET. Based on the characteristics of VANET and its application requirements, the paper proposes ideas and promising directions for the design of information broadcast models for inter-vehicle communication.
    2017,28(1):1-16, DOI:10.13328/j.cnki.jos.005139
    [Abstract] (12307) [HTML] (1916) [PDF 1.75 M] (7941)
    Abstract:
    The knapsack problem (KP) is a well-known combinatorial optimization problem that includes the 0-1 KP, bounded KP, multi-constraint KP, multiple KP, multiple-choice KP, quadratic KP, dynamic KP, discounted KP, and other variants. KP can be considered a mathematical model abstracted from a variety of real-world settings and therefore has wide applications. Evolutionary algorithms (EAs) are universally considered an efficient tool for solving KP approximately and quickly. This paper presents a survey of the past ten years of work on solving KP with EAs, discussing various KP encoding mechanisms and the handling of infeasible individuals, and providing useful guidelines for designing new EAs for KPs.
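    One common way EAs handle an infeasible 0-1 KP individual is greedy repair; the sketch below shows that idea under assumed inputs (an illustrative technique from the EA folklore, not the survey's prescribed method):

        import random

        def repair(bits, weights, values, capacity):
            # Drop chosen items with the worst value/weight ratio
            # until the individual's total weight fits the capacity.
            chosen = [i for i, b in enumerate(bits) if b]
            chosen.sort(key=lambda i: values[i] / weights[i])  # worst first
            load = sum(weights[i] for i in chosen)
            while load > capacity and chosen:
                worst = chosen.pop(0)
                bits[worst] = 0
                load -= weights[worst]
            return bits

        # Example: a random individual repaired to feasibility.
        w, v, C = [4, 3, 2, 5], [10, 4, 7, 8], 7
        ind = [random.randint(0, 1) for _ in w]
        print(repair(ind, w, v, C))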
    2010,21(5):916-929, DOI:
    [Abstract] (12106) [HTML] (0) [PDF 944.50 K] (16577)
    Abstract:
    Data deduplication technologies can be divided into two categories: a) identical-data detection techniques, and b) similar-data detection and encoding techniques. This paper presents a systematic survey of these two categories and analyzes their advantages and disadvantages. Because data deduplication can affect the reliability and performance of storage systems, the paper also surveys the technologies proposed to cope with these two problems. Based on this analysis of the current state of research, the paper draws the following conclusions: a) how to mine data-characteristic information for deduplication has not been completely solved, and how to use such information to effectively eliminate duplicate data also needs further study; b) from the perspective of storage system design, further study is needed on how to introduce proper mechanisms that overcome the reliability limitations of deduplication and reduce the additional system overhead it causes.
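    A toy sketch of category a), identical-data detection, using fixed-size chunking and hash fingerprints (an assumption-laden simplification; production systems typically use content-defined chunking and collision-resistant fingerprint management):

        import hashlib

        def dedup_chunks(data, chunk_size=4096):
            # A chunk's content is stored once; later occurrences
            # become references to the same fingerprint.
            store, refs = {}, []
            for off in range(0, len(data), chunk_size):
                chunk = data[off:off + chunk_size]
                fp = hashlib.sha1(chunk).hexdigest()
                if fp not in store:          # first time we see this content
                    store[fp] = chunk
                refs.append(fp)              # logical stream = fingerprint list
            return store, refs

        blob = b"abcd" * 4096
        store, refs = dedup_chunks(blob)
        print(len(refs), "chunks,", len(store), "unique")  # 4 chunks, 1 unique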
    2006,17(9):1848-1859, DOI:
    [Abstract] (12023) [HTML] (0) [PDF 770.40 K] (19917)
    Abstract:
    In recent years, there have been extensive studies and rapid progress in automatic text categorization, one of the hotspots and key techniques in information retrieval and data mining. Highlighting the challenging issues and research trends in content-based information processing for the Internet and other complex applications, this paper presents a survey of the state of the art in machine-learning-based text categorization, covering models, algorithms, and evaluation. It points out that nonlinearity, skewed data distributions, the labeling bottleneck, hierarchical categorization, the scalability of algorithms, and the categorization of Web pages are the key problems in text categorization research, and possible solutions to each are discussed. Finally, some future directions of research are given.
    2008,19(10):2706-2719, DOI:
    [Abstract] (11991) [HTML] (0) [PDF 778.29 K] (10933)
    Abstract:
    The Web search engine has become a very important tool for finding information efficiently in the massive data of the Web. With the explosive growth of Web data, traditional centralized search engines struggle to keep pace with people's growing information needs. With the rapid development of peer-to-peer (P2P) technology, the notion of P2P Web search has been proposed and has quickly become a research focus. This paper gives a brief summary of current P2P Web search technologies to facilitate future research. First, the main challenges of P2P Web search are presented. Then, the key techniques for building a feasible and efficient P2P Web search engine are reviewed, including system topology, data placement, query routing, index partitioning, collection selection, relevance ranking, and Web crawling. Finally, three recently proposed P2P Web search prototypes are introduced.
    2004,15(12):1751-1763, DOI:
    [Abstract] (11943) [HTML] (0) [PDF 928.33 K] (7348)
    Abstract:
    This paper presents research on the children's Turing test (CTT). The main difference between our test program and others is its knowledge-based character, supported by a massive commonsense knowledge base. The motivation, design, techniques, experimental results, and platform (including a knowledge engine and a conversation engine) of the CTT are described. Finally, some concluding thoughts on the CTT and AI are given.
  • Full-Text Download Ranking (overall | annual | per-issue)
    Abstract Click Ranking (overall | annual | per-issue)

    2003,14(7):1282-1291, DOI:
    [Abstract] (36714) [HTML] (0) [PDF 832.28 K] (78037)
    Abstract:
    The sensor network, formed by the convergence of sensor, micro-electro-mechanical system (MEMS), and network technologies, is a novel technology for acquiring and processing information. In this paper, the architecture of the wireless sensor network is briefly introduced, and some valuable applications are explained and forecast. Combining this with existing work, hot research topics, including power-aware routing and medium access control schemes, are discussed and presented in detail. Finally, taking application requirements into account, several future research directions are put forward.
    2008,19(1):48-61, DOI:
    [Abstract] (27579) [HTML] (0) [PDF 671.39 K] (59768)
    Abstract:
    This paper summarizes the state of research and recent progress in clustering algorithms. First, representative clustering algorithms are analyzed and categorized from several aspects, such as their underlying ideas, key techniques, and strengths and weaknesses. Second, several typical clustering algorithms and well-known data sets are selected, and simulation experiments are conducted to compare both accuracy and running efficiency: the behavior of each algorithm across different data sets is analyzed, as are the clusterings of the same data set produced by different algorithms. Finally, combining these two lines of analysis, the paper addresses the research hotspots, difficulties, and shortcomings of data clustering, along with some open problems. This work can serve as a valuable reference for research on data clustering and data mining.
    2010,21(8):1834-1848, DOI:
    [Abstract] (20104) [HTML] (0) [PDF 682.96 K] (53931)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail: sentiment extraction, sentiment classification, and sentiment retrieval and summarization. Then, the evaluation methods and corpora for sentiment analysis are introduced. Finally, the applications of sentiment analysis are summarized. The paper aims to give a deep insight into the mainstream methods and recent progress in this field, with detailed comparison and analysis.
    2011,22(1):71-83, DOI:10.3724/SP.J.1001.2011.03958
    [Abstract] (29412) [HTML] (0) [PDF 781.42 K] (52868)
    Abstract:
    Cloud computing is a fundamental change taking place in the field of information technology, representing a move toward intensive, large-scale specialization. At the same time, it brings not only convenience and efficiency but also great challenges in data security and privacy protection. Security is currently regarded as one of the greatest problems in the development of cloud computing. This paper describes the main requirements of cloud computing security, the key technologies, and the standards and regulations, and provides a cloud computing security framework. It argues that changes in these aspects will result in a technical revolution in the field of information security.
    2009,20(1):54-66, DOI:
    [Abstract] (19238) [HTML] (0) [PDF 1.41 M] (48537)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks: within a community the links between nodes are very dense, while between communities they are quite sparse. Network clustering algorithms, which aim to discover all natural communities in a given complex network, are fundamentally important for both theoretical research and practical applications; they can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks, including social networks, biological networks, and the World Wide Web. This paper reviews the background, motivation, state of the art, and main issues of existing work on discovering network communities, and tries to draw a comprehensive and clear outline of this new and active research area. The work is intended to benefit researchers in complex network analysis, data mining, the intelligent Web, and bioinformatics.
    2009,20(5):1337-1348, DOI:
    [Abstract] (27654) [HTML] (0) [PDF 1.06 M] (43343)
    Abstract:
    This paper surveys the technologies currently adopted in cloud computing and the systems deployed in enterprises. Cloud computing can be viewed from two aspects: the cloud infrastructure, which is the building block for upper-layer cloud applications, and the cloud applications themselves. This paper focuses on the cloud infrastructure, covering both deployed systems and current research; some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on large-scale clusters of cheap PC servers. Second, the applications are co-designed with the underlying infrastructure so that computing resources can be utilized to the fullest. Third, the reliability of the whole system is achieved by software running on top of redundant hardware rather than by hardware alone. All these technologies serve the two key goals of distributed systems: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to a very large scale, even to thousands of nodes; availability means that services remain available even when a considerable number of nodes fail. From this paper, readers can grasp the current status of cloud computing as well as its future trends.
    2009,20(2):271-289, DOI:
    [Abstract] (26617) [HTML] (0) [PDF 675.56 K] (41496)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to solve multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms developed before 2003, this paper discusses recent advances in EMO in detail and identifies the current research directions. On the one hand, new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, to deal with many-objective optimization problems, many new dominance schemes differing from traditional Pareto dominance have emerged. Furthermore, the essential characteristics of multi-objective optimization problems are investigated in depth. The paper also gives an experimental comparison of several representative algorithms and proposes several viewpoints on future EMO research.
    2009,20(2):350-362, DOI:
    [Abstract] (15814) [HTML] (0) [PDF 1.39 M] (38667)
    Abstract:
    This paper makes a comprehensive survey of recommender system research in order to help readers understand this field. First, the research background is introduced, including commercial application demands and the relevant academic institutions, conferences, and journals. After describing the recommendation problem both formally and informally, a comparative study is conducted over the categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are presented, and the main difficulties and future directions are summarized.
    2004,15(10):1493-1504, DOI:
    [Abstract] (8941) [HTML] (0) [PDF 937.72 K] (38084)
    Abstract:
    The graphics processing unit (GPU) has been developing rapidly in recent years, at a speed exceeding Moore's law, and as a result various applications associated with computer graphics have advanced greatly. At the same time, the high processing power, parallelism, and programmability available on contemporary GPUs provide an ideal platform for general-purpose computation. Starting from the development history and architecture of the GPU, this paper describes its technical fundamentals. The main part of the paper then introduces the development of various applications of general-purpose computation on the GPU, with fluid dynamics, algebraic computation, database operations, and spectrum analysis treated in detail. Our own experience with fluid dynamics is also given, and the development of software tools in this area is introduced. Finally, a conclusion is drawn, and future developments and new challenges for both hardware and software in this field are discussed.
    2010,21(3):427-437, DOI:
    [Abstract] (32533) [HTML] (0) [PDF 308.76 K] (37135)
    Abstract:
    Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports pioneering research on a genetic algorithm for the automatic generation of SONGCI (a classical Chinese lyric form). In light of the characteristics of Chinese classical poetry, the paper designs a coding method based on level and oblique tones, a fitness function weighted by syntactic and semantic criteria, a selection operator combining elitism and roulette-wheel selection, a partially mapped crossover operator, and a heuristic mutation operator. Tests show that a system built on this computing model is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic generation of Chinese poetry.
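    A minimal sketch of the elitism-plus-roulette selection named above (the population/fitness representation is assumed for illustration; the paper's operators are tailored to SONGCI verse):

        import random

        def select(population, fitness, elite=1):
            # Assumes positive fitness scores. The top `elite` individuals
            # survive unchanged; the rest of the next generation is drawn
            # by roulette wheel, i.e. with probability proportional to fitness.
            ranked = sorted(zip(population, fitness), key=lambda x: -x[1])
            next_gen = [ind for ind, _ in ranked[:elite]]
            total = sum(fitness)
            while len(next_gen) < len(population):
                pick, acc = random.uniform(0, total), 0.0
                for ind, f in zip(population, fitness):   # spin the wheel
                    acc += f
                    if acc >= pick:
                        next_gen.append(ind)
                        break
            return next_gen

        print(select(["verse-a", "verse-b", "verse-c"], [0.9, 0.5, 0.1]))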
    2014,25(9):1889-1908, DOI:10.13328/j.cnki.jos.004674
    [Abstract] (11400) [HTML] (2131) [PDF 550.98 K] (33156)
    Abstract:
    This paper first introduces the key features of big data in different processing modes and their typical application scenarios, together with representative processing systems, and summarizes three development trends of big data processing systems. Next, it gives a brief survey of system-supported analytic technologies and applications (including deep learning, knowledge computing, social computing, and visualization), and summarizes the key role of each technology in big data analysis and understanding. Finally, the paper lays out three grand challenges of big data processing and analysis: data complexity, computation complexity, and system complexity. Potential ways of dealing with each complexity are also discussed.
    2013,24(11):2476-2497, DOI:10.3724/SP.J.1001.2013.04486
    [Abstract] (9891) [HTML] (0) [PDF 1.14 M] (33098)
    Abstract:
    Probabilistic graphical models are powerful tools for compactly representing complex probability distributions, efficiently computing (approximate) marginal and conditional distributions, and conveniently learning parameters and hyperparameters in probabilistic models. As a result, they have been widely used in applications that require some sort of automated probabilistic reasoning, such as computer vision and natural language processing, as a formal approach to deal with uncertainty. This paper surveys the basic concepts and key results of representation, inference and learning in probabilistic graphical models, and demonstrates their uses in two important probabilistic models. It also reviews some recent advances in speeding up classic approximate inference algorithms, followed by a discussion of promising research directions.
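    As a concrete anchor for the representation and inference tasks surveyed above (standard textbook notation, not notation taken from this particular paper), a graphical model factorizes a joint distribution over the cliques of its graph:

        \[
          p(x_1,\dots,x_n) \;=\; \frac{1}{Z}\prod_{c \in \mathcal{C}} \psi_c(x_c),
          \qquad
          Z \;=\; \sum_{x}\prod_{c \in \mathcal{C}} \psi_c(x_c).
        \]

    Inference then amounts to computing quantities such as the marginal \(p(x_i) = \sum_{x \setminus x_i} p(x)\); message-passing algorithms exploit the factorization so that this sum need not range over the full exponential state space.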
    2012,23(4):962-986, DOI:10.3724/SP.J.1001.2012.04175
    [Abstract] (18386) [HTML] (0) [PDF 2.09 M] (29815)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, so failures of computers or data are common. The sheer number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware and power costs. Fault tolerance, scalability, and power consumption of the distributed storage of a data center have therefore become key issues in cloud computing for ensuring data availability and reliability. This paper surveys the state of the art of the key technologies in cloud computing in the following aspects: the design of data center networks, the organization and placement of data, strategies to improve fault tolerance, and methods to save storage space and energy. First, several classical data center network topologies are introduced and compared. Second, current fault-tolerant storage techniques are discussed, with a particular comparison of data replication and erasure-code strategies. Third, the main current energy-saving technologies are analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
    2012,23(1):1-20, DOI:10.3724/SP.J.1001.2012.04100
    [Abstract] (14083) [HTML] (0) [PDF 1017.73 K] (29438)
    Abstract:
    Context-aware recommender systems, which aim to further improve recommendation accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the recommender systems domain. This paper presents an overview of the field from a process-oriented perspective, covering system frameworks, key techniques, main models, evaluation, and typical applications. Prospects for future development and suggestions for possible extensions are also discussed.
    2012,23(1):32-45, DOI:10.3724/SP.J.1001.2012.04091
    [Abstract] (18312) [HTML] (0) [PDF 408.86 K] (29207)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed is growing rapidly. Parallel techniques that can scale out cost-effectively are needed to deal with such big data. Relational data management technology has a history of nearly 40 years, but it now faces the tough obstacle of scalability: relational techniques cannot handle very large data easily. Meanwhile, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and expanded from Web search into territories that used to be occupied by relational database systems, confronting relational technology with high availability, high scalability, and massive parallel processing capability. The relational community, after losing the Web search market, has begun to learn from MapReduce, and MapReduce in turn borrows valuable ideas from relational technology to improve performance. As the two camps compete with and learn from each other, a new data analysis platform and ecosystem are emerging, in which both will eventually find their proper places.
    2016,27(1):45-71, DOI:10.13328/j.cnki.jos.004914
    [Abstract] (28566) [HTML] (1732) [PDF 880.96 K] (29120)
    Abstract:
    Android is a modern and popular software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever, with Apple, Microsoft, Blackberry, and Firefox trailing a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2018,29(5):1471-1514, DOI:10.13328/j.cnki.jos.005519
    [Abstract] (5426) [HTML] (2308) [PDF 4.38 M] (29005)
    Abstract:
    Computer-aided detection/diagnosis (CAD) can improve the accuracy of diagnosis, reduce false positives, and provide decision support for doctors. The main purpose of this paper is to analyze the latest developments in computer-aided diagnosis tools. Focusing on the incidence sites of the four most fatal cancers, major recent publications on CAD applications in different medical imaging areas are reviewed according to imaging technique and disease. Furthermore, a multidimensional analysis is made of the research in terms of image data sets, algorithms, and evaluation methods. Finally, existing problems, research trends, and development directions in the field of medical image CAD systems are discussed.
    2005,16(5):857-868, DOI:
    [Abstract] (19607) [HTML] (0) [PDF 489.65 K] (28762)
    Abstract:
    Wireless sensor networks, a novel technology for acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the network, is challenging yet extremely crucial for many applications. In this paper, the evaluation criteria for performance and a taxonomy of wireless sensor network self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed, and directions for research in this area are introduced.
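    One of the classic range-based localization principles the survey covers is trilateration from distances to anchor nodes. The sketch below is our own illustration of that idea via linearized least squares; the anchor coordinates and ranges are made-up example values.

```python
# Estimate a node's position from distances to anchors (trilateration).
import numpy as np

def trilaterate(anchors: np.ndarray, dists: np.ndarray) -> np.ndarray:
    """Solve for (x, y) given anchor positions (n x 2) and ranges (n,)."""
    x_n, y_n = anchors[-1]
    # Subtract the last anchor's circle equation to linearize the system.
    A = 2 * (anchors[:-1] - anchors[-1])
    b = (dists[-1] ** 2 - dists[:-1] ** 2
         + np.sum(anchors[:-1] ** 2, axis=1) - x_n ** 2 - y_n ** 2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

if __name__ == "__main__":
    anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
    true_pos = np.array([3.0, 4.0])
    dists = np.linalg.norm(anchors - true_pos, axis=1)  # noise-free ranges
    print(trilaterate(anchors, dists))  # ~[3. 4.]
```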
    2011,22(1):115-131, DOI:10.3724/SP.J.1001.2011.03950
    [Abstract] (13431) [HTML] (0) [PDF 845.91 K] (27023)
    Abstract:
    The Internet traffic model is a key issue for network performance management, quality of service management, and admission control. This paper first summarizes the primary characteristics and metrics of Internet traffic, and illustrates the significance and classification of traffic modeling. Next, it chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
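    The contrast between the Poisson and self-similar phases can be seen numerically: Poisson traffic smooths out under time aggregation, while self-similar traffic stays bursty. The short sketch below is our own illustration with arbitrary parameters.

```python
# Variance of aggregated Poisson traffic decays ~1/m with aggregation level m;
# self-similar traffic with Hurst parameter H decays only ~m^(2H-2).
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(lam=10.0, size=2 ** 14)  # per-slot packet counts

for m in (1, 4, 16, 64):
    # Aggregate the series into blocks of m slots and take block means.
    agg = counts[: len(counts) // m * m].reshape(-1, m).mean(axis=1)
    print(f"aggregation m={m:3d}: variance {agg.var():.3f}")
```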
    2013,24(1):77-90, DOI:10.3724/SP.J.1001.2013.04339
    [Abstract] (11018) [HTML] (0) [PDF 0.00 Byte] (25530)
    Abstract:
    The task parallel programming model is a widely used parallel programming model on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and supporting mechanisms used in task parallel programming models and discusses issues and the latest achievements from three perspectives: parallelism expression, data management, and task scheduling. In the end, some future trends in this area are discussed.
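    As a rough illustration of the two perspectives of parallelism expression and task scheduling, the sketch below (our own, using Python's standard thread pool rather than any runtime discussed in the paper) submits independent tasks and consumes results as a dynamic scheduler completes them.

```python
# A minimal task-parallel sketch: express independent tasks and let a pool of
# workers schedule them dynamically.
from concurrent.futures import ThreadPoolExecutor, as_completed

def task(n: int) -> int:
    """A stand-in unit of work; real tasks would do meaningful computation."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Parallelism expression: submit tasks; the runtime maps them to workers.
        futures = {pool.submit(task, n): n for n in range(1, 9)}
        # Task scheduling: results are consumed as tasks finish, not in order.
        for fut in as_completed(futures):
            print(f"task({futures[fut]}) = {fut.result()}")
```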
    2015,26(1):62-81, DOI:10.13328/j.cnki.jos.004701
    [Abstract] (15447) [HTML] (1681) [PDF 1.04 M] (24250)
    Abstract:
    Network abstraction brings about the naissance of software-defined networking (SDN). SDN decouples the data plane from the control plane and simplifies network management. This paper starts with a discussion of the background of the naissance and development of SDN and outlines its architecture, which comprises the data layer, control layer, and application layer. Key technologies are then elaborated according to this hierarchical architecture, with the characteristics of consistency, availability, and tolerance analyzed in particular. Moreover, the latest achievements for typical application scenarios are introduced. Future work is summarized at the end.
    2021,32(2):349-369, DOI:10.13328/j.cnki.jos.006138
    [Abstract] (6752) [HTML] (3475) [PDF 2.36 M] (22937)
    Abstract:
    Few-shot learning aims to learn models that solve problems from only a few samples. In recent years, under the trend of training models with big data, machine learning and deep learning have achieved success in many fields. However, in many real-world application scenarios there is not a large amount of data or labeled data for model training, and labeling a large number of unlabeled samples costs a lot of manpower. Therefore, how to learn from a small number of samples has become a problem demanding attention. This paper systematically surveys current approaches to few-shot learning. It introduces the corresponding models in three categories: fine-tuning based, data augmentation based, and transfer learning based. The data augmentation based approaches are subdivided into those based on unlabeled data, data generation, and feature augmentation; the transfer learning based approaches are subdivided into metric learning based, meta-learning based, and graph neural network based methods. The paper then summarizes the few-shot datasets and the experimental results of the aforementioned models, reviews the current situation and challenges of few-shot learning, and finally looks ahead to its future technological development.
    2017,28(4):959-992, DOI:10.13328/j.cnki.jos.005143
    [Abstract] (8712) [HTML] (2484) [PDF 3.58 M] (22563)
    Abstract:
    The development of the mobile Internet and the popularity of mobile terminals produce massive trajectory data of moving objects in the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in cities, and the changes of the atmospheric environment. However, trajectory data can also be exploited to disclose moving objects' private information (e.g., behaviors, hobbies, and social relationships), and attackers can easily access such information by digging into trajectory data such as activities and check-in locations. On another front of research, quantum computation presents an important theoretical direction for mining big data due to its scalable and powerful storage and computing capacity; applying quantum computing approaches to trajectory big data could make some complex problems solvable and achieve higher efficiency. This paper reviews the key technologies of processing trajectory data. First, the concept and characteristics of trajectory data are introduced, and pre-processing methods, including noise filtering and data compression, are summarized. Then, trajectory indexing and querying techniques and the current achievements of trajectory data mining, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preservation for trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing frameworks and data visualization, are presented in detail, and some possible ways of applying quantum computation to trajectory data processing, as well as quantum implementations of some core trajectory mining algorithms, are described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
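    Trajectory compression, one of the pre-processing steps mentioned above, is often done with line simplification. The sketch below is our own illustration using the well-known Douglas-Peucker algorithm on 2D points; the trajectory and tolerance are made up.

```python
# Douglas-Peucker trajectory simplification: keep the point farthest from the
# chord between the endpoints and recurse while it exceeds the tolerance.
import math

def point_segment_dist(p, a, b):
    """Distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tol):
    if len(points) < 3:
        return points
    dists = [point_segment_dist(p, points[0], points[-1]) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] <= tol:           # everything close to the chord
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], tol)
    right = douglas_peucker(points[idx:], tol)
    return left[:-1] + right            # avoid duplicating the split point

if __name__ == "__main__":
    traj = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
    print(douglas_peucker(traj, tol=0.5))
```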
    2011,22(6):1299-1315, DOI:10.3724/SP.J.1001.2011.03993
    [Abstract] (10654) [HTML] (0) [PDF 987.90 K] (21231)
    Abstract:
    Attribute-based encryption (ABE) takes attributes as the public key and associates the ciphertext and the user's secret key with attributes, so that it can support expressive access control policies. This dramatically reduces the network bandwidth and the sender's computation cost in fine-grained access control of shared data, giving ABE broad application prospects in this area. After analyzing the basic ABE system and its two variants, key-policy ABE (KP-ABE) and ciphertext-policy ABE (CP-ABE), this study elaborates the research problems of ABE systems, including access structure design for CP-ABE, attribute key revocation, key abuse, and multi-authority ABE, with an extensive comparison of their functionality and performance. Finally, the study discusses the problems that remain to be solved and the main research directions for ABE.
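    In CP-ABE the ciphertext carries an access structure over attributes, and decryption succeeds only if the user's attribute set satisfies it. The toy sketch below illustrates only this access-structure logic with a threshold tree of our own design; it deliberately omits the underlying pairing-based cryptography.

```python
# Evaluate a threshold access tree over attribute sets (illustration only).

def satisfies(policy, attributes: set) -> bool:
    """policy is either an attribute string or (threshold, [subpolicies])."""
    if isinstance(policy, str):
        return policy in attributes
    k, children = policy
    # A threshold node is satisfied if at least k children are satisfied.
    return sum(satisfies(c, attributes) for c in children) >= k

if __name__ == "__main__":
    # Policy: ("doctor" AND "cardiology") OR "hospital-admin"
    policy = (1, [(2, ["doctor", "cardiology"]), "hospital-admin"])
    print(satisfies(policy, {"doctor", "cardiology"}))  # True
    print(satisfies(policy, {"doctor", "radiology"}))   # False
```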
    2009,20(1):124-137, DOI:
    [Abstract] (16642) [HTML] (0) [PDF 1.06 M] (20976)
    Abstract:
    The appearance of plenty of intelligent devices equipped for short-range wireless communications boosts the fast rise of wireless ad hoc network applications. However, in many realistic application environments, nodes form a disconnected network most of the time due to node mobility, low density, lossy links, etc. The conventional communication model of mobile ad hoc networks (MANET) requires at least one path from the source to the destination node, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages hop by hop, implementing communication between nodes with a "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has captured great interest from researchers. This paper first introduces the concepts and theories of opportunistic networks and some current typical applications. It then elaborates popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Other interesting research points, such as communication middleware, cooperation and security problems, and new applications, are stated briefly. Finally, the paper concludes and looks forward to possible research focuses for opportunistic networks in the future.
    2009,20(3):524-545, DOI:
    [Abstract] (17163) [HTML] (0) [PDF 1.09 M] (20969)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of the software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2006,17(9):1848-1859, DOI:
    [Abstract] (12023) [HTML] (0) [PDF 770.40 K] (19917)
    Abstract:
    In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the state-of-the-art challenging issues and research trends for content information processing on the Internet and in other complex applications, this paper presents a survey of up-to-date developments in text categorization based on machine learning, including models, algorithms, and evaluation. It points out that problems such as nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization, and possible solutions to these problems are discussed respectively. Finally, some future directions of research are given.
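    A minimal machine-learning text categorization pipeline in the spirit of this survey combines bag-of-words features with a linear classifier. The sketch below is our own illustration (it requires scikit-learn); the tiny corpus and labels are made up purely for demonstration.

```python
# TF-IDF features + linear SVM: a minimal text categorization pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "stock market shares fall", "bank raises interest rates",
    "team wins the championship", "player scores in the final",
]
train_labels = ["finance", "finance", "sports", "sports"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)
print(model.predict(["interest rates and the stock market"]))  # ['finance']
```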
    2004,15(11):1583-1594, DOI:
    [Abstract] (8476) [HTML] (0) [PDF 1.57 M] (19796)
    Abstract:
    Uncertainty exists widely in the subjective and objective world. Among all kinds of uncertainty, randomness and fuzziness are the most important and fundamental. In this paper, the relationship between randomness and fuzziness is discussed. Uncertain states and their changes can be measured by entropy and hyper-entropy, respectively. Taking advantage of entropy and hyper-entropy, the uncertainty of chaos, fractals, and complex networks arising from their various evolutions and differentiations is further studied. A simple and effective way is proposed to simulate uncertainty by means of knowledge representation, which provides a basis for the automation of both logical and image thinking with uncertainty. AI (artificial intelligence) with uncertainty is a new cross-discipline covering computer science, physics, mathematics, brain science, psychology, cognitive science, biology, and philosophy, and it results in the automation of representing, processing, and thinking about uncertain information and knowledge.
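    One widely cited construction that combines randomness (entropy) and fuzziness (hyper-entropy) in this line of work is the forward normal cloud generator. The sketch below follows that standard construction; the concrete parameter values are our own illustrative choices, not taken from the paper.

```python
# Forward normal cloud generator: entropy En measures a concept's uncertainty,
# hyper-entropy He the uncertainty of that entropy.
import numpy as np

def normal_cloud(ex: float, en: float, he: float, n: int, seed: int = 0):
    """Generate n cloud drops (x, membership) for a concept (Ex, En, He)."""
    rng = np.random.default_rng(seed)
    en_prime = rng.normal(en, he, size=n)                # randomized entropy
    x = rng.normal(ex, np.abs(en_prime))                 # drop positions
    mu = np.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))  # membership degrees
    return x, mu

if __name__ == "__main__":
    drops, memberships = normal_cloud(ex=25.0, en=3.0, he=0.3, n=5)
    for d, m in zip(drops, memberships):
        print(f"x = {d:6.2f}, membership = {m:.3f}")
```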
    2005,16(1):1-7, DOI:
    [Abstract] (21777) [HTML] (0) [PDF 614.61 K] (19470)
    Abstract:
    This paper offers some reflections on four aspects: 1) from the law of the development of things, revealing the development history of software engineering technology; 2) from the natural characteristics of software, analyzing the construction of each abstraction layer of the virtual machine; 3) from the viewpoint of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the emergence of Internet technology, exploring the development trend of software technology.
    2012,23(8):2058-2072, DOI:10.3724/SP.J.1001.2012.04237
    [Abstract] (9861) [HTML] (0) [PDF 800.05 K] (19380)
    Abstract:
    Distributed denial-of-service (DDoS) attacks are a major threat to the current network. Based on the level of the attack packets, this study divides DDoS attacks into network-level and application-level DDoS attacks. It then analyzes the detection and control methods for these two kinds of attacks in detail, together with the drawbacks of different control methods deployed at different network positions. Finally, the study analyzes the shortcomings of current detection and control methods, discusses the development trend of DDoS filtering systems, and identifies the corresponding technological challenges.
    2014,25(1):37-50, DOI:10.13328/j.cnki.jos.004497
    [Abstract] (9417) [HTML] (1797) [PDF 929.87 K] (19186)
    Abstract:
    This paper surveys the state of the art of speech emotion recognition (SER) and presents an outlook on the trends of future SER technology. First, the survey summarizes and analyzes SER in detail from five perspectives, including emotion representation models, representative emotional speech corpora, extraction of emotion-related acoustic features, SER methods, and applications. Then, based on the survey, the challenges faced by current SER research are summarized. This paper aims to take a deep look into the mainstream methods and recent progress in this field, and presents a detailed comparison and analysis of these methods.
    2010,21(7):1620-1634, DOI:
    [Abstract] (12329) [HTML] (0) [PDF 765.23 K] (18954)
    Abstract:
    As an application of mobile ad hoc networks (MANET) in intelligent transportation information systems, the most important goal of vehicular ad hoc networks (VANET) is to dramatically reduce the high number of accidents and their fatal consequences. One of the most important factors contributing to this goal is the design of effective broadcast protocols. This paper briefly introduces the characteristics and application fields of VANET. It then discusses the characteristics, performance, and application areas of various categories of broadcast protocols in VANET, with analysis and comparison. According to the characteristics of VANET and its application requirements, the paper proposes design ideas and breakthrough directions for inter-vehicle broadcast models.
    2018,29(10):2966-2994, DOI:10.13328/j.cnki.jos.005551
    [Abstract] (8673) [HTML] (3072) [PDF 610.06 K] (18669)
    Abstract:
    In recent years, the rapid development of Internet technology and Web applications has triggered an explosion of various data on the Internet, which embodies a large amount of valuable knowledge. How to organize, represent, and analyze this knowledge has attracted much attention, and the knowledge graph was developed to organize it in a semantic and visual manner. Knowledge reasoning over knowledge graphs has thus become one of the hot research topics and plays an important role in many applications such as vertical search and intelligent question answering. The goal of knowledge reasoning over a knowledge graph is to infer new facts or identify erroneous facts according to existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graphs is more diversified, owing to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graphs. Starting with the basic concept of knowledge reasoning, this paper surveys recently developed methods for knowledge reasoning over knowledge graphs. Specifically, the research progress is reviewed in detail from two aspects, one-step reasoning and multi-step reasoning, each including rule based, distributed embedding based, neural network based, and hybrid reasoning. Finally, future research directions and an outlook for knowledge reasoning over knowledge graphs are discussed.
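    Distributed-embedding reasoning, one of the categories named above, scores candidate facts geometrically. As an illustration we sketch the classic TransE scoring rule, which wants head + relation ≈ tail; the toy vectors below are hand-made, whereas real systems learn them from the graph.

```python
# TransE-style triple scoring: lower ||h + r - t|| means more plausible.
import numpy as np

entity = {
    "Beijing": np.array([1.0, 0.0]),
    "China":   np.array([1.0, 1.0]),
    "Paris":   np.array([0.0, 0.0]),
}
relation = {"capital_of": np.array([0.0, 1.0])}

def transe_score(h: str, r: str, t: str) -> float:
    """Distance of the translated head from the tail embedding."""
    return float(np.linalg.norm(entity[h] + relation[r] - entity[t]))

print(transe_score("Beijing", "capital_of", "China"))  # 0.0 -> plausible
print(transe_score("Paris", "capital_of", "China"))    # 1.0 -> less plausible
```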
    2003,14(9):1621-1628, DOI:
    [Abstract] (12947) [HTML] (0) [PDF 680.35 K] (18631)
    Abstract:
    Recommendation systems are among the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extremely sparse user rating data. Traditional similarity measures work poorly in this situation, degrading the quality of recommendations dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method predicts the ratings of items that users have not rated by using item similarity, and then applies a new similarity measure to find the target user's neighbors. The experimental results show that this method can efficiently alleviate the extreme sparsity of user rating data and provides better recommendation results than traditional collaborative filtering algorithms.
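    A minimal sketch of the general idea, predicting a user's missing rating from item-item similarity, is shown below. It is our own illustration, not the paper's exact algorithm or similarity measure, and the tiny rating matrix is made up.

```python
# Item-based rating prediction: fill a missing rating with a similarity-
# weighted mean over the user's rated items.
import numpy as np

R = np.array([  # rows: users, cols: items, 0 = unrated
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 5, 4],
], dtype=float)

def item_similarity(R):
    """Cosine similarity between item rating columns (0 treated as missing)."""
    norms = np.linalg.norm(R, axis=0)
    sim = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)
    return sim

def predict(R, user, item, k=2):
    """Predict R[user, item] from the k most similar items the user rated."""
    sim = item_similarity(R)
    rated = np.nonzero(R[user])[0]
    top = rated[np.argsort(sim[item, rated])[-k:]]
    weights = sim[item, top]
    return float(weights @ R[user, top] / weights.sum())

print(f"predicted rating: {predict(R, user=1, item=2):.2f}")
```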
    2005,16(10):1743-1756, DOI:
    [Abstract] (9807) [HTML] (0) [PDF 545.62 K] (18455)
    Abstract:
    This paper presents a survey of the theory of provable security and its applications to the design and analysis of security protocols. It clarifies what provable security is, explains some basic notions involved in the theory, and illustrates the basic idea of the random oracle model. It also reviews the development and advances of provably secure public-key encryption and digital signature schemes, in the random oracle model or the standard model, as well as the applications of provable security to the design and analysis of session-key distribution protocols and their advances.
    2013,24(2):295-316, DOI:10.3724/SP.J.1001.2013.04336
    [Abstract] (9714) [HTML] (0) [PDF 0.00 Byte] (18364)
    Abstract:
    Under new application modes, traditional hierarchical data centers face several limitations in size, bandwidth, scalability, and cost. To meet the needs of new applications, a data center network should fulfill requirements such as high scalability, low configuration overhead, robustness, and energy saving at low cost. First, the shortcomings of the traditional data center network architecture are summarized and new requirements are pointed out. Secondly, the existing proposals are divided into two categories: server-centric and network-centric. Then, several representative architectures of these two categories are reviewed and compared in detail. Finally, future directions of data center networks are discussed.
    2020,31(7):2245-2282, DOI:10.13328/j.cnki.jos.006037
    [Abstract] (2613) [HTML] (2222) [PDF 967.02 K] (18291)
    Abstract:
    Ultrasonography is the first choice of imaging examination and preoperative evaluation for thyroid and breast cancer. However, the ultrasonic characteristics of benign and malignant nodules commonly overlap, and diagnosis heavily relies on the operator's experience rather than quantitative and stable methods. In recent years, medical image analysis based on computer technology has developed rapidly, and a series of landmark breakthroughs have been made, providing effective decision support for medical imaging diagnosis. This work studies the research progress of computer vision and image recognition technologies for thyroid and breast ultrasound images. The key technologies involved in the automatic diagnosis of ultrasound images form the main line of the work, and the major algorithms of recent years, such as ultrasound image preprocessing, lesion localization and segmentation, and feature extraction and classification, are summarized and analyzed. Moreover, a multi-dimensional analysis is made of the algorithms, data sets, and evaluation methods. Finally, existing problems related to the automatic analysis of these two kinds of ultrasound images are discussed, and research trends and development directions in the field of ultrasound image analysis are pointed out.
    2014,25(4):839-862, DOI:10.13328/j.cnki.jos.004558
    [Abstract] (15159) [HTML] (1746) [PDF 1.32 M] (18225)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. Research and discussion on batch computing in big data environments are comparatively sufficient, but how to efficiently handle stream computing so as to meet requirements such as low latency, high throughput, and continuously reliable running, and how to build efficient stream big data computing systems, remain great challenges in big data computing research. This paper surveys the computing architecture and key issues of stream computing in big data environments. Firstly, it gives a brief summary of three application scenarios of stream computing in business intelligence, marketing, and public service, and shows the distinctive features of stream computing in big data environments, such as real-time, volatility, burstiness, irregularity, and infinity; a well-designed stream computing system is always optimized in system structure, data transmission, application interfaces, high availability, and so on. Subsequently, the paper offers detailed analyses and comparisons of five typical open-source stream computing systems for big data. Finally, it specifically addresses some new challenges of stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2013,24(5):1078-1097, DOI:10.3724/SP.J.1001.2013.04390
    [Abstract] (11577) [HTML] (0) [PDF 1.74 M] (17901)
    Abstract:
    The control and data planes are decoupled in software-defined networking, which provides a new solution for research on new network applications and future Internet technologies. This paper surveys the development status of OpenFlow-based SDN technologies. The research background of the decoupled architecture of network control and data transmission in OpenFlow networks is summarized first, and the key components and research progress, including the OpenFlow switch, the controller, and SDN technologies, are introduced. Moreover, current problems and solutions of OpenFlow-based SDN technologies are analyzed from four aspects. Combined with the development status of recent years, applications in campus networks, data centers, network management, and network security are summarized. Finally, future research trends are discussed.
    2010,21(7):1605-1619, DOI:
    [Abstract] (9761) [HTML] (0) [PDF 856.25 K] (17496)
    Abstract:
    The rapid development of the Internet leads to an increase in system complexity and uncertainty. Traditional network management cannot meet these requirements and shall evolve into fusion-based cyberspace situational awareness (CSA). Based on an analysis of functional shortages and development requirements, this paper introduces CSA along with its origin, concepts, objectives, and characteristics. Firstly, a CSA research framework is proposed and the research history is investigated, based on which the main aspects and existing issues of the research are analyzed; assessment methods are divided into three categories: mathematical models, knowledge reasoning, and pattern recognition. The paper then discusses CSA from three aspects, models, knowledge representation, and assessment methods, going into detail about the main ideas, assessment processes, merits, and shortcomings of novel methods and comparing many typical methods. Current applied research on CSA in the fields of security, transmission, survivability, system evaluation, and so on is presented. Finally, the paper points out the development directions of CSA and offers conclusions from the perspectives of the issue system, technical system, and application system.
    2009,20(6):1393-1405, DOI:
    [Abstract] (11860) [HTML] (0) [PDF 831.86 K] (17421)
    Abstract:
    Combinatorial testing can use a small number of test cases to test systems while preserving fault detection ability. However, the test case generation problem for combinatorial testing is NP-complete. The efficiency and complexity of this testing method have attracted many researchers from the areas of combinatorics and software engineering. This paper summarizes recent research on this topic, including various combinatorial test criteria, the relations between the test generation problem and other NP-complete problems, mathematical methods for constructing test cases, computer search techniques for test generation, and fault localization techniques based on combinatorial testing.
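    Among the computer search techniques mentioned, a simple baseline is greedy pairwise (2-way) covering array generation: repeatedly pick the candidate test that covers the most not-yet-covered parameter-value pairs. The sketch below is our own illustration with made-up parameters; real tools use more refined heuristics.

```python
# Greedy pairwise test suite generation over a small parameter model.
from itertools import combinations, product

params = {"os": ["linux", "windows"], "db": ["mysql", "pg", "sqlite"], "net": ["ipv4", "ipv6"]}
names = list(params)

def pairs_of(test):
    """All parameter-value pairs covered by one full test case."""
    return {((names[i], test[i]), (names[j], test[j]))
            for i, j in combinations(range(len(names)), 2)}

uncovered = set().union(*(pairs_of(t) for t in product(*params.values())))
suite = []
while uncovered:
    # Pick the candidate test covering the most still-uncovered pairs.
    best = max(product(*params.values()), key=lambda t: len(pairs_of(t) & uncovered))
    suite.append(best)
    uncovered -= pairs_of(best)

print(f"{len(suite)} tests instead of {len(list(product(*params.values())))}:")
for t in suite:
    print(dict(zip(names, t)))
```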
    2008,19(11):2803-2813, DOI:
    [Abstract] (9051) [HTML] (0) [PDF 319.20 K] (17183)
    Abstract:
    A semi-supervised clustering method based on the affinity propagation (AP) algorithm is proposed in this paper. AP takes as input measures of similarity between pairs of data points. Compared with existing clustering algorithms such as K-centers clustering, AP is an efficient and fast clustering algorithm for large datasets, but it cannot produce good clustering results for datasets with complex cluster structures. The clustering performance of AP can be improved by using a priori known labeled data or pairwise constraints to adjust the similarity matrix. Experimental results show that this method indeed reaches its goal on complex datasets, and it outperforms comparative methods when a large number of pairwise constraints are available.
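    The constraint-adjusted-similarity idea can be sketched directly: raise the similarity of must-link pairs and lower that of cannot-link pairs before running AP. The snippet below is our own illustration (requiring scikit-learn), not the paper's exact adjustment rule; the data and constraints are made up.

```python
# Semi-supervised AP: adjust the similarity matrix with pairwise constraints.
import numpy as np
from sklearn.cluster import AffinityPropagation

X = np.array([[0, 0], [0.2, 0.1], [0.1, 0.3], [5, 5], [5.2, 5.1], [2.5, 2.5]])
S = -np.square(X[:, None, :] - X[None, :, :]).sum(-1)  # negative squared distance

must_link, cannot_link = [(5, 0)], [(5, 3)]
for i, j in must_link:    # pull constrained pairs together
    S[i, j] = S[j, i] = 0.0
for i, j in cannot_link:  # push constrained pairs apart
    S[i, j] = S[j, i] = S.min() * 10

ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
print(ap.labels_)  # point 5 is nudged toward the cluster of points 0-2
```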
    2009,20(8):2241-2254, DOI:
    [Abstract] (6601) [HTML] (0) [PDF 1.99 M] (17114)
    Abstract:
    Inspired by the idea of data fields, a community discovery algorithm based on topological potential is proposed. The basic idea is that a topological potential function is introduced to analytically model the virtual interaction among all nodes in a network and, by regarding each community as a local high-potential area, the community structure in the network can be uncovered by detecting all local high-potential areas bounded by low-potential nodes. Experiments on some real-world networks show that the algorithm requires no input parameters and can discover the intrinsic and even overlapping community structures in networks. The time complexity of the algorithm is O(m+n^(3/γ))~O(n^2), where n is the number of nodes to be explored, m is the number of edges, and 2<γ<3 is a constant.
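    A common data-field form of the potential sums Gaussian-decayed influence from every other node, phi(v) = sum_j m_j * exp(-(d_vj / sigma)^2), so local maxima hint at community centers. The sketch below is our own illustration with unit node mass m_j = 1 and an arbitrary sigma and graph (requiring networkx), not the paper's exact formulation.

```python
# Topological potential of each node from shortest-path distances.
import math
import networkx as nx

def topological_potential(G, sigma=1.0):
    dist = dict(nx.all_pairs_shortest_path_length(G))
    return {v: sum(math.exp(-((d / sigma) ** 2))   # unit mass per node
                   for u, d in dist[v].items() if u != v)
            for v in G}

if __name__ == "__main__":
    G = nx.barbell_graph(5, 1)  # two dense cliques joined by a bridge node
    phi = topological_potential(G)
    for v, p in sorted(phi.items(), key=lambda kv: -kv[1]):
        print(f"node {v}: potential {p:.3f}")
```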
    2009,20(3):567-582, DOI:
    [Abstract] (8156) [HTML] (0) [PDF 780.38 K] (16736)
    Abstract:
    Research on software quality models and software quality evaluation models has always been a hot topic in the area of software quality assurance and assessment. A great deal of domestic and foreign research has been done on building software quality models and quality assessment models, and certain accomplishments have been achieved in these areas. In recent years, platform building and systematization have become the trends in developing foundational software based on operating systems; therefore, quality evaluation of the foundational software platform becomes an essential issue to be solved. This article analyzes and summarizes the current development of research on software quality models and software quality assessment models, focusing on depicting the development of quality evaluation of foundational software platforms. It also briefly discusses the future development of research on quality assessment of foundational software platforms, trying to establish a good foundation for it.
    2017,28(1):160-183, DOI:10.13328/j.cnki.jos.005136
    [Abstract] (8464) [HTML] (2834) [PDF 3.12 M] (16705)
    Abstract:
    Image segmentation is the process of dividing an image into a number of regions with similar properties, and it is a preprocessing step for many image processing tasks. In recent years, domestic and foreign scholars have mainly focused on content-based image segmentation algorithms. Based on extensive research on the existing literature and the latest achievements, this paper categorizes image segmentation algorithms into three types: graph theory based methods, pixel clustering based methods, and semantic segmentation methods. The basic ideas, advantages, and disadvantages of typical algorithms in each category, especially the most recent image semantic segmentation algorithms based on deep neural networks, are analyzed, compared, and summarized. Furthermore, the paper introduces the datasets commonly used as benchmarks in image segmentation and the evaluation criteria for algorithms, and compares several image segmentation algorithms experimentally. Finally, some potential future research directions are discussed.
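    Pixel-clustering segmentation in its simplest form runs k-means on per-pixel color vectors and reshapes the labels back into an image. The sketch below is our own illustration (requiring numpy and scikit-learn); the random stand-in image and the choice k=4 are arbitrary.

```python
# K-means on pixel colors as a toy pixel-clustering segmentation.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))            # stand-in for a real RGB image
pixels = image.reshape(-1, 3)              # one 3-D color vector per pixel

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pixels)
segments = labels.reshape(image.shape[:2]) # each pixel tagged with a region id
print(np.bincount(labels))                 # pixels per segment
```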
    2011,22(3):381-407, DOI:10.3724/SP.J.1001.2011.03934
    [Abstract] (10363) [HTML] (0) [PDF 614.69 K] (16693)
    Abstract:
    The popularity of the Internet and the boom of the World Wide Web foster innovative changes in software technology that give birth to a new form of software, networked software, which delivers diversified and personalized on-demand services to the public. With the ever-increasing expansion of applications and users, the scale and complexity of networked software are growing beyond the information processing capability of human beings, which brings software engineers a series of challenges. In order to reach a scientific understanding of this kind of ultra-large-scale artificial complex system, a survey of the infrastructure, application services, and social interactions of networked software is conducted from the three-dimensional perspective of cyberization, servicization, and socialization. Interestingly, most of them have been found to share the global characteristics of complex networks such as "small world" and "scale free". Next, the impact of this empirical study on software engineering research and practice and its implications for further investigation are systematically set forth. The convergence of software engineering and other disciplines will put forth new ideas and thoughts, breeding a new way of thinking and new methodologies for the study of networked software. This convergence is also expected to achieve innovations in the theories, methods, and key technologies of software engineering and to promote the rapid development of the software service industry in China.
    2013,24(4):825-842, DOI:10.3724/SP.J.1001.2013.04369
    [Abstract] (8143) [HTML] (0) [PDF 1.09 M] (16627)
    Abstract:
    Honeypot is a proactive defense technology, introduced by the defense side to change the asymmetric situation of the attack-defense game. Through the deployment of honeypots, i.e., security resources without any production purpose, the defenders can deceive attackers into illegally using the honeypots, and can capture and analyze the attack behaviors to understand the attack tools and methods and to learn the attackers' intentions and motivations. Honeypot technology has received the sustained attention of the security community, made considerable progress, and been widely applied, becoming one of the main technical means of Internet security threat monitoring and analysis. In this paper, the origin and evolution of honeypot technology are presented first. Next, its key mechanisms are comprehensively analyzed, the development of honeypot deployment structures is reviewed, and the latest applications of honeypot technology in Internet security threat monitoring, analysis, and prevention are summarized. Finally, the problems of honeypot technology, its development trends, and further research directions are discussed.
    2009,20(8):2199-2213, DOI:
    [Abstract] (10152) [HTML] (0) [PDF 2.05 M] (16615)
    Abstract:
    This paper analyzes previous studies of applying P2P technology in the mobile Internet. It first introduces P2P technology and the concept of the mobile Internet, and presents the challenges and service patterns of P2P technology in the mobile Internet. Second, the architectures of P2P technology in the mobile Internet are described in terms of centralized architecture, super-node architecture, and ad hoc architecture, respectively. Furthermore, resource location algorithms and cross-layer optimizations are introduced based on two different terminal access patterns, with detailed analyses of the different key technologies and their disadvantages. Finally, this paper outlines future research directions.
    2010,21(5):916-929, DOI:
    [Abstract] (12106) [HTML] (0) [PDF 944.50 K] (16577)
    Abstract:
    Data deduplication technologies can be divided into two categories: a) identical data detection techniques, and b) similar data detection and encoding techniques. This paper presents a systematic survey on these two categories of data deduplication technologies and analyzes their advantages and disadvantages. Besides, since data deduplication technologies can affect the reliability and performance of storage systems, this paper also surveys various kinds of technologies proposed to cope with these two aspects of problems. Based on the analysis of the current state of research on data deduplication technologies, this paper makes several conclusions as follows: a) How to mine data characteristic information in data deduplication has not been completely solved, and how to use data characteristic information to effectively eliminate duplicate data also needs further study; b) From the perspective of storage system design, it still needs further study how to introduce proper mechanisms to overcome the reliability limitations of data deduplication techniques and reduce the additional system overheads caused by data deduplication techniques.
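    Identical-data detection, the first category above, in its simplest form splits a stream into chunks, fingerprints each with a cryptographic hash, and stores a chunk only the first time its fingerprint appears. The sketch below is our own illustration with fixed-size chunking and an arbitrary chunk size; production systems typically use content-defined chunking.

```python
# Hash-based identical-chunk deduplication of a byte stream.
import hashlib

def dedup(data: bytes, chunk_size: int = 8):
    store = {}          # fingerprint -> chunk (the deduplicated store)
    recipe = []         # fingerprints needed to reconstruct the stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)   # duplicate chunks are stored only once
        recipe.append(fp)
    return store, recipe

if __name__ == "__main__":
    data = b"abcdefgh" * 3 + b"12345678"   # three duplicate chunks + one unique
    store, recipe = dedup(data)
    print(f"{len(recipe)} chunks referenced, {len(store)} stored")
    assert b"".join(store[fp] for fp in recipe) == data  # lossless rebuild
```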
    2018,29(1):42-68, DOI:10.13328/j.cnki.jos.005320
    [Abstract] (9276) [HTML] (2267) [PDF 2.54 M] (16495)
    Abstract:
    The Internet has penetrated into all aspects of human society and has greatly promoted social progress. At the same time, various forms of cybercrimes and network theft occur frequently, bringing great harm to our society and national security. Cyber security has become a major concern to the public and the government. As a large number of Internet functionalities and applications are implemented by software, software plays a crucial role in cyber security research and practice. In fact, almost all cyberattacks were carried out by exploiting vulnerabilities in system software or application software. It is increasingly urgent to investigate the problems of software security in the new age. This paper reviews the state of the art of malware, software vulnerabilities and software security mechanism, and analyzes the new challenges and trends that the software ecosystem is currently facing.