    2022,33(9):3139-3151, DOI: 10.13328/j.cnki.jos.006622
    Abstract:
    Text-based image editing is popular in multimedia and is of great application value, which is also a challenging task as the source image is edited on the basis of a given text, and there is a large cross-modal difference between the image and text. The existing methods can hardly achieve effective direct control and correction of the editing process, but image editing is user preference-oriented, and some editing modules can be bypassed or enhanced by controllability improvement to obtain the results of user preference. Therefore, this study proposes a novel autoencoder-based image editing model according to text descriptions. In this model, an autoencoder is first introduced in stacked generative adversarial networks (SGANs) to provide convenient and direct interactive configuration and editing interfaces. The autoencoder can transform high-dimension feature space between multiple layers into color space and directly correct the intermediate editing results under the color space. Then, a symmetrical detail correction module is constructed to enhance the detail of the edited image and improve controllability, which takes the source image and the edited image as symmetrical exchangeable input to correct the previously input edited image by the fusion of text features. Experiments on the MS-COCO and CUB200 datasets demonstrate that the proposed model can effectively and automatically edit images on the basis of linguistic descriptions while providing user-friendly and convenient corrections to the editing.
    2022,33(9):3152-3164, DOI: 10.13328/j.cnki.jos.006620
    Abstract:
    Zero-shot sketch-based image retrieval uses sketches of unseen classes as query samples for retrieving images of those classes. This task is thus faced with two challenges: the modal gap between a sketch and the image and inconsistencies between seen and unseen classes. Previous approaches tried to eliminate the modal gap by projecting the sketch and the image into a common space and bridge the semantic inconsistencies between seen and unseen classes with semantic embeddings (e.g., word vectors and word similarity). This study proposes a cross-modal self-distillation approach to investigate generalizable features from the perspective of knowledge distillation without the involvement of semantic embeddings in training. Specifically, the knowledge of the pre-trained image recognition network is transferred to the student network through traditional knowledge distillation. Then, according to the cross-modal correlation between a sketch and the image, cross-modal self-distillation indirectly transfers the above knowledge to the recognition of the sketch modality to enhance the discriminative and generalizable features of sketch features. To further promote the integration and propagation of the knowledge within the sketch modality, this study proposes sketch self-distillation. By learning discriminative and generalizable features from the data, the student network eliminates the modal gap and semantic inconsistencies. Extensive experiments conducted on three benchmark datasets, namely Sketchy, TU-Berlin, and QuickDraw, demonstrate the superiority of the proposed cross-modal self-distillation approach to the state-of-the-art ones.
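The cross-modal transfer step above builds on standard soft-label knowledge distillation. The sketch below shows only that generic ingredient (a temperature-softened KL loss between a pre-trained image-branch teacher and a sketch-branch student); the paper's specific cross-modal and sketch self-distillation terms are not reproduced, and all names and tensor shapes are hypothetical.

```python
# Illustrative sketch only: generic soft-label distillation of the kind the abstract
# builds on, not the paper's exact cross-modal formulation.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      T: float = 4.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # T^2 keeps gradient magnitudes comparable to the hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# Usage: the teacher processes images, the student processes sketches of the same
# classes, so image-domain knowledge is transferred to the sketch modality.
student_logits = torch.randn(8, 100)   # sketch branch outputs (hypothetical shapes)
teacher_logits = torch.randn(8, 100)   # pre-trained image branch outputs
loss = distillation_loss(student_logits, teacher_logits)
```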
    2022,33(9):3165-3179, DOI: 10.13328/j.cnki.jos.006624
    Abstract:
    The encoder-decoder network based on U-Net and its variants have achieved excellent performance in semantic segmentation of medical images. However, some spatial details are lost during feature extraction, which affects the accuracy of segmentation, and the generalization ability and robustness of these models are unsatisfactory. Therefore, this study proposes a deep convolutional encoder-decoder network with saliency guidance and uncertainty supervision to solve the semantic segmentation problem in multimodal medical images. In this method, the initially generated saliency map and the uncertainty probability map are used as the supervised information to optimize the parameters of the semantic segmentation network. Specifically, the saliency map is generated by the saliency detection network to preliminarily locate the target region in an image, and on this basis, the set of pixel points with uncertain classification is calculated to generate the uncertainty probability map. Then, the two maps are sent into the multi-scale feature fusion network together with the original image to guide the network to focus on the learning of the features in the target region and to enhance the representational capacity of regions with uncertain classification and complex boundaries. In this way, the segmentation performance of the network can be improved. The experimental results reveal that the proposed method can capture more semantic information and outperforms existing semantic segmentation methods in semantic segmentation of multimodal medical images, with strong generalization capability and robustness.
    2022,33(9):3180-3194, DOI: 10.13328/j.cnki.jos.006619
    Abstract:
    A TV logo represents important semantic information of videos. However, its detection and recognition are faced with many problems, including varied categories, complex structures, limited areas, low information content, and severe background disturbance. To improve the generalization ability of the detection model, this study proposes synthesizing TV logo data to construct a training dataset by superimposing TV logo images on background images. Further, a two-stage scalable logo detection and recognition (SLDR) method is put forward, which uses the batch-hard metric learning method to rapidly train the matching model and determine the category of TV logos. In addition, the detection targets can be expanded to unknown categories due to the separation mechanism of detection and recognition in SLDR. The experimental results reveal that synthetic data can effectively improve the generalization ability and detection precision of models, and the SLDR method can achieve comparable precision with the end-to-end model without updating the detection model.
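The data synthesis described above amounts to compositing logo images onto background frames. A minimal sketch of that idea, assuming RGBA logo files and hypothetical scaling and placement parameters (the paper's exact augmentation pipeline is not specified here):

```python
# Minimal sketch of the data-synthesis idea: paste a (transparent) TV logo onto a
# background frame near a corner to create a training sample with a ground-truth box.
from PIL import Image
import random

def synthesize_sample(background_path: str, logo_path: str) -> tuple[Image.Image, tuple]:
    bg = Image.open(background_path).convert("RGB")
    logo = Image.open(logo_path).convert("RGBA")
    # Randomly rescale the logo relative to the background width (hypothetical range).
    scale = random.uniform(0.05, 0.15)
    w = int(bg.width * scale)
    h = int(logo.height * w / logo.width)
    logo = logo.resize((w, h))
    # TV logos usually sit near a corner; pick one at random.
    margin = 10
    corners = [(margin, margin), (bg.width - w - margin, margin),
               (margin, bg.height - h - margin), (bg.width - w - margin, bg.height - h - margin)]
    x, y = random.choice(corners)
    bg.paste(logo, (x, y), mask=logo)       # alpha-composite the logo
    bbox = (x, y, x + w, y + h)             # ground-truth box for the detector
    return bg, bbox
```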
    2022,33(9):3195-3209, DOI: 10.13328/j.cnki.jos.006621
    Abstract:
    Video summarization is an indispensable and critical task in computer vision, the goal of which is to generate a concise and complete video summary by selecting the most informative part of a video. A generated video summary is a set of representative video frames (such as video keyframes) or a short video formed by stitching key video segments in time sequence. Although the study on video summarization has made considerable progress, the existing methods have the problems of deficient temporal information and incomplete feature representation, which can easily affect the correctness and completeness of a video summary. To solve the problems, this study proposes a model based on a spatiotemporal transform network, which includes three modules, i.e., the embedding layer, the feature transformation and fusion layer, and the output layer. Specifically, the embedding layer can simultaneously embed spatial and temporal features, and the feature transformation and fusion layer can realize the transformation and fusion of multi-modal features; finally, the output layer generates the video summary by segment prediction and key shot selection. The spatial and temporal features are embedded separately to fix the problem of deficient temporal information in existing models, and the transformation and fusion of multi-modal features can solve the problem of incomplete feature representation. Sufficient experiments and analyses on two benchmark datasets are conducted, and the results verify the effectiveness of the proposed model.
    2022,33(9):3210-3222, DOI: 10.13328/j.cnki.jos.006623
    Abstract:
Image captioning is of great theoretical significance and application value and has attracted wide attention in computer vision and natural language processing. The existing attention mechanism-based image captioning methods integrate the current word and visual cues at the same moment to generate the target word, but they neglect visual relevance and contextual information, which results in a difference between the generated caption and the ground truth. To address this problem, this paper presents the visual relevance and context dual attention (VRCDA) method. The visual relevance attention incorporates the attention vector of the previous moment into the traditional visual attention to ensure visual relevance, and the context attention is used to obtain more complete semantic information from the global context for better use of the context. In this way, the final image caption is generated via visual relevance and context information. The experiments on the MSCOCO and Flickr30k benchmark datasets demonstrate that VRCDA can effectively describe the image semantics, and compared with several state-of-the-art methods of image captioning, VRCDA yields superior performance in all evaluation metrics.
    2022,33(9):3223-3235, DOI: 10.13328/j.cnki.jos.006386
    Abstract:
The graphical bandit is an important model for sequential decision making under uncertainty and has been applied in various real-world scenarios such as social networks, electronic commerce, and recommendation systems. Existing work on graphical bandits only investigates how to identify the best arm rapidly so as to minimize the cumulative regret while ignoring the privacy protection issue arising in many real-world applications. To overcome this deficiency, a differentially private algorithm, termed graph-based arm elimination with differential privacy (GAP), is proposed for graphical bandits. On the one hand, GAP updates the arm selection strategy based on the empirical mean rewards of arms in an epoch manner. The empirical mean rewards are perturbed by Laplace noise, which makes it hard for malicious attackers to infer the rewards of arms from the output of the algorithm and thus protects privacy. On the other hand, in each epoch, GAP carefully constructs an independent set of the feedback graph and only explores arms in this independent set, which effectively utilizes the information in the graph feedback. It is proved that GAP is differentially private and that its regret bound matches the theoretical lower bound. Experimental results on synthetic datasets demonstrate that GAP can effectively protect privacy and achieve cumulative regret comparable to that of existing non-private graphical bandit algorithms.
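To make the two mechanisms in the abstract concrete, here is a heavily simplified skeleton of epoch-based arm elimination with Laplace-perturbed empirical means; it omits GAP's graph-feedback independent-set construction, and the epoch lengths and confidence radius are hypothetical, so it is an illustration rather than the published algorithm.

```python
# Simplified skeleton: epoch-wise elimination of clearly suboptimal arms using
# Laplace-perturbed empirical means. Constants are hypothetical, not GAP's.
import math
import random

def private_elimination(arms, pull, epochs=10, pulls_per_epoch=100, epsilon=1.0):
    active = list(arms)
    for _ in range(epochs):
        means = {}
        for a in active:
            rewards = [pull(a) for _ in range(pulls_per_epoch)]
            mean = sum(rewards) / len(rewards)
            # Laplace noise on the released mean makes the statistic differentially private.
            noise = random.expovariate(1.0) - random.expovariate(1.0)  # Laplace(1) sample
            means[a] = mean + noise / (epsilon * pulls_per_epoch)
        best = max(means.values())
        radius = math.sqrt(math.log(2 * len(arms) * epochs) / pulls_per_epoch)
        # Keep only arms whose noisy mean is within the confidence radius of the best.
        active = [a for a in active if means[a] >= best - 2 * radius]
    return active
```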
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006436
    Abstract:
Control-flow hijacking attacks exploit memory corruption vulnerabilities to grab control of a program and then hijack the program to execute malicious code, which poses a great threat to system security. To prevent control-flow hijacking attacks, researchers have presented a series of defense methods. Control-flow integrity is a runtime defense method that prevents illegal transfers of process control flow, ensuring that the control flow always stays within the range required by the program. In recent years, more and more research has been devoted to related problems of control-flow integrity, such as new control-flow integrity schemes and new evaluation methods for such schemes. This study explains the basic principles of control-flow integrity and then classifies existing control-flow integrity schemes. The existing evaluation methods and evaluation indicators of control-flow integrity schemes are introduced as well. Finally, thoughts on potential future work on control-flow integrity are summarized, which will hopefully provide an outlook on future research directions.
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006438
    Abstract:
Self-paced learning (SPL) is a learning regime inspired by the learning process of humans and animals that gradually incorporates samples into the training set from easy to complex by assigning a weight to each training sample. SPL incorporates a self-paced regularizer into the objective function to control the learning process. At present, there are various forms of self-paced regularizers, and different regularizers may lead to distinct learning performance. The mixture weighting regularizer has the characteristics of both hard weighting and soft weighting and is therefore widely used in many SPL-based applications. However, the current mixture weighting method only considers logarithmic soft weighting, which is relatively simple. In addition, compared with soft weighting or hard weighting, more parameters are introduced in the mixture weighting scheme. In this study, an adaptive mixture weighting self-paced regularizer is proposed to overcome the above issues. On the one hand, the representation form of the weights can be adjusted adaptively during the learning process; on the other hand, the self-paced parameters introduced by mixture weighting can be adapted according to the characteristics of the sample loss distribution, so as to be fully free of empirically adjusted parameters. The experimental results on action recognition and multimedia event detection show that the proposed method is able to adjust the weighting form and parameters adaptively.
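For reference, the two classical weighting rules that mixture schemes combine are the hard regularizer and the linear soft regularizer; their closed-form weights are sketched below. This is background material, not the adaptive mixture regularizer proposed in the paper.

```python
# Two classical self-paced weighting rules (background, not the paper's scheme):
# hard weighting gives binary weights, while the linear soft regularizer
# f(v; lam) = lam * (v**2 / 2 - v) gives weights that decay with the loss.
import numpy as np

def hard_weights(losses: np.ndarray, lam: float) -> np.ndarray:
    # v* = 1 if loss < lam else 0
    return (losses < lam).astype(float)

def linear_soft_weights(losses: np.ndarray, lam: float) -> np.ndarray:
    # Minimizing v*loss + lam*(v^2/2 - v) over v in [0, 1] gives v* = max(0, 1 - loss/lam).
    return np.clip(1.0 - losses / lam, 0.0, 1.0)

losses = np.array([0.1, 0.5, 1.2, 3.0])
print(hard_weights(losses, lam=1.0))         # [1. 1. 0. 0.]
print(linear_soft_weights(losses, lam=1.0))  # [0.9 0.5 0.  0. ]
```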
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006439
    Abstract:
As population aging becomes a serious problem, more attention is paid to the safety of the elderly when they are at home alone. In order to provide early warning, alarm, and reporting of dangerous behaviors, several domestic and foreign research institutions are studying the intelligent monitoring of the daily activities of the elderly from the robot's view. To promote the industrialization of these technologies, this work mainly studies how to automatically recognize the daily activities of the elderly, such as "drinking water", "washing hands", "reading a book", and "reading a newspaper". Through the investigation of daily activity videos of the elderly, it is found that the semantics of these activities are obviously fine-grained. For example, the semantics of "drinking water" and "taking medicine" are highly similar, and only a small number of video frames can accurately reflect their category semantics. To effectively address this problem of elderly behavior recognition, this work proposes a new multimodal multi-granularity graph convolutional network (MM-GCN), which applies graph convolution over four modalities, i.e., the skeleton ("point"), bone ("line"), frame ("frame"), and proposal ("segment"), to model the activities of the elderly and capture the semantics under the four granularities of "point-line-frame-proposal". Finally, experiments are conducted to validate the activity recognition performance of the proposed method on ETRI-Activity3D (110000+ videos, 50+ classes), the largest daily activity dataset for the elderly. Compared with the state-of-the-art methods, the proposed MM-GCN achieves the highest recognition accuracy. In addition, to verify the robustness of MM-GCN on normal human action recognition tasks, experiments are also carried out on the NTU RGB+D benchmark, and the results show that MM-GCN is comparable to the SOTA methods.
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006440
    Abstract:
Password hardening encryption (PHE) is an emerging primitive in recent years. It can resist offline attacks brought by keyword guessing from the server by adding a third party with crypto services that joins the decryption process. This primitive enhances the password authentication protocol and adds encryption functionality. This paper presents an active attack by the server on the first scheme that introduced this primitive. The attack combines ideas from a cutting-edge threat called the algorithm substitution attack, which is undetectable and makes the server capable of launching offline attacks. This result shows that the original PHE scheme cannot resist attacks from a malicious server. This paper then summarizes the properties that a scheme resistant to algorithm substitution attacks should have. After that, this paper presents a PHE scheme that can resist such attacks from a malicious server, together with simulation results. Finally, this paper concludes the results and gives some expectations for future systematic research on interactive protocols under algorithm substitution attacks.
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006536
    Abstract:
Cross-modal hashing can greatly improve the efficiency of cross-modal retrieval by mapping data of different modalities into more compact hash codes. Nevertheless, existing cross-modal hashing methods usually use a binary similarity matrix, which cannot accurately describe the semantic similarity relationships between samples and suffers from the squared complexity problem. In order to better mine the semantic similarity relationships of data, this study presents a label enhancement based discrete cross-modal hashing method (LEDCH). It first leverages the prior knowledge of transfer learning to generate the label distribution of samples, then constructs a stronger similarity matrix through the label distribution, and generates the hash codes by an efficient discrete optimization algorithm with a small quantization error. Finally, experimental results on two benchmark datasets validate the effectiveness of the proposed method on cross-modal retrieval tasks.
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006545
    Abstract:
Recently, with the continuously increasing realism requirements of movies, games, and virtual reality applications, the real-time rendering of translucent materials such as human organs and milk has become more and more important. For most current subsurface scattering calculation methods, it is difficult to estimate the scattering range correctly. To tackle this estimation issue, a new subsurface scattering calculation formula is proposed to accurately represent the maximum scattering distance. First, brute-force Monte Carlo photon tracking is simulated to obtain the reflectance profile. Second, the selected polynomial model is used to fit the reflectance profile so as to calculate the precise maximum scattering range at the shading point. Furthermore, a new importance sampling scheme is proposed to reduce the number of Monte Carlo samples, thereby increasing the computational efficiency. In addition, the required parameters are provided only by the reflectance at the shading points and the mean free path of the material, so the rendering effect can be adjusted flexibly. Experimental results show that the proposed model avoids the previous erroneous estimation of the scattering range and produces more accurate rendering results for regions of the material with complex reflectivity. Meanwhile, the rendering rate meets real-time requirements.
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006541
    Abstract:
The asymmetric flows generated by widely deployed address translation technology bring challenges to the design of load balancing systems. To solve the problem of insufficient use of multi-core processors and network card hardware capabilities by software load balancers, an asymmetric flow load balancing method based on flow characteristics is proposed. Firstly, a packet dispatching algorithm is proposed to dispatch packets to the expected CPU core via hardware. Then, an elephant flow detection algorithm is constructed by analyzing the temporal and spatial characteristics of packet sequences. Finally, based on the detection results, a load balancing offloading method is proposed. The experimental results show that the asymmetric flow load balancing method can correctly handle asymmetric flows while increasing the average throughput by 14.5%.
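A minimal sketch of threshold-based elephant-flow detection (per-flow byte counting over a time window) is given below; the paper's detector additionally exploits temporal and spatial packet-sequence characteristics, and the threshold and window values here are hypothetical.

```python
# Simplified illustration: count bytes per 5-tuple over a measurement window and flag
# flows that exceed a byte threshold as elephant flows.
from collections import defaultdict

class ElephantFlowDetector:
    def __init__(self, byte_threshold: int = 10_000_000, window: float = 1.0):
        self.byte_threshold = byte_threshold
        self.window = window
        self.window_start = 0.0
        self.byte_count = defaultdict(int)

    def observe(self, flow_key: tuple, packet_len: int, now: float) -> bool:
        """Returns True if this flow is currently classified as an elephant flow."""
        if now - self.window_start > self.window:
            self.byte_count.clear()            # start a new measurement window
            self.window_start = now
        self.byte_count[flow_key] += packet_len
        return self.byte_count[flow_key] >= self.byte_threshold

detector = ElephantFlowDetector()
key = ("10.0.0.1", "10.0.0.2", 443, 51724, "TCP")   # 5-tuple of a flow
is_elephant = detector.observe(key, packet_len=1500, now=0.01)
```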
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006432
    Abstract:
In order to resolve the dilemma that the particle swarm optimization (PSO) algorithm cannot well balance exploration and exploitation, a density peak-based multi-subpopulation particle swarm optimization algorithm with a dimensional reset strategy (DPMPSO) is proposed. In the proposed DPMPSO, the idea of relative distance originating from density peak clustering is first adopted and combined with the fitness value of particles to divide the whole swarm into two subpopulations: the top subpopulation and the bottom subpopulation. Secondly, a learning strategy focusing on local search is designed for the top subpopulation, and a learning strategy paying more attention to global search is designed for the bottom subpopulation, which can well balance exploration and exploitation. Finally, particles that fall into local optima are reset dimension by dimension through crossover with the global optimum, which can not only effectively avoid premature convergence but also significantly reduce invalid iterations. The experimental results on 10 benchmark problems and the CEC2017 optimization problems demonstrate that DPMPSO performs significantly better than some representative PSO variants and other optimization algorithms.
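The dimensional reset can be pictured as a per-dimension crossover between a stagnant particle and the global best. A minimal sketch under that reading, with a hypothetical crossover rate and without the stagnation test or the two-subpopulation learning strategies:

```python
# Sketch of the dimensional reset idea: a particle judged to be trapped in a local
# optimum exchanges a random subset of its dimensions with the global best solution.
import numpy as np

def dimensional_reset(position: np.ndarray, gbest: np.ndarray, cr: float = 0.5, rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    mask = rng.random(position.shape) < cr      # dimensions to take from the global best
    return np.where(mask, gbest, position)      # dimension-wise crossover

x = np.array([0.2, -1.3, 4.0, 0.7])
gbest = np.array([0.0, 0.1, 0.3, 0.5])
print(dimensional_reset(x, gbest))
```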
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006433
    Abstract:
With the free supervised signals/labels created by pretext tasks, self-supervised learning (SSL) can learn effective representations from unlabeled data, which has been verified in various downstream tasks. Existing pretext tasks usually first perform explicit linear or nonlinear transformations on the original view data, thus forming multiple augmented views, and then learn the representation by predicting the corresponding transformations or maximizing the consistency among the above views. It is found that such self-supervised augmentations (i.e., the augmentations of the data itself and the self-supervised labels) benefit the learning of not only the unsupervised pretext tasks but also the supervised classification task. Nevertheless, few works focus on this at present; existing works either take the pretext tasks as auxiliaries of the downstream classification task and adopt multi-task learning, or jointly model the downstream task labels and self-supervised labels in a multi-label learning way. Actually, there are inherent differences between downstream and pretext tasks (e.g., semantics, task difficulty, etc.), which inevitably result in competition between them and bring risks to the learning of downstream tasks. To address this issue, this study proposes a simple yet effective SSL multi-view learning framework (SSL-MV), which avoids the learning interference of self-supervised labels on downstream labels by performing the same learning as the downstream tasks on the augmented data views. More interestingly, with multi-view learning, the proposed framework naturally owns the integrated inference ability, which significantly improves the performance of downstream supervised classification tasks. Extensive experiments on benchmark datasets demonstrate the effectiveness of SSL-MV.
    Available online:  September 23, 2022 , DOI: 10.13328/j.cnki.jos.006530
    Abstract:
    This study proposes a convolutional neural network (CNN) based Transformer to solve the panoptic segmentation task. The method draws on the inherent advantages of the CNN in image feature learning and avoids increase in the amount of calculation when the Transformer is transplanted into the vision task. The CNN-based Transformer is attributed to the two basic structures of the projector performing the feature domain transformation and the extractor responsible for the feature extraction. The effective combination of the projector and the extractor forms the framework of the CNN-based Transformer. Specifically, the projector is implemented by a lattice convolution that models the spatial relationship of the image by designing and optimizing the convolution filter configuration. The extractor is performed by a chain network that improves feature extraction capabilities by chain block stacking. Considering the framework and the substantial function of panoptic segmentation, the CNN-based Transformer is successfully applied to solve the panoptic segmentation task. The experimental results on the MS COCO and Cityscapes datasets demonstrate that the proposed method has excellent performance.
    Available online:  September 23, 2022 , DOI: 10.13328/j.cnki.jos.006532
    Abstract:
As an effective technique for obtaining black-box state machine models of software systems, model learning (a.k.a. automata learning) can be divided into active and passive learning. Based on given input and output alphabets, the minimal complete state machine of the target system can be obtained in polynomial time through active interaction with the black-box system. However, the equivalence query algorithm is still a major obstacle to the development and application of active automata learning tools. This study discusses the influence of counterexamples on learning algorithms based on the discrimination tree, defines comparison rules for hypotheses, and proposes two principles for constructing test cases. According to these principles, the Wp-method equivalence query algorithm is improved to produce better hypotheses and effectively reduce the number of queries and symbols. Based on LearnLib, three kinds of automata are used as experimental objects to verify the effectiveness of the principles and the improved algorithm.
    Available online:  September 23, 2022 , DOI: 10.13328/j.cnki.jos.006543
    Abstract:
The security of traditional cryptographic algorithms is based on the black-box attack model. In this attack model, the attacker can only obtain the input and output of the cryptographic algorithm, but not its internal details. In recent years, the concept of the white-box attack model has been proposed. In the white-box attack model, attackers can not only obtain the input and output of the cryptographic algorithm but also directly observe or change its internal data. In order to ensure the security of existing cryptographic algorithms in white-box attack environments, redesigning existing cryptographic algorithms through white-box cryptography without changing their functionality is called white-box implementation of existing cryptographic algorithms. Studying the design and analysis of white-box implementation schemes is of great significance for solving the issue of digital rights management. In recent years, a kind of side-channel analysis method for white-box implementation schemes has emerged. This kind of analysis method only needs to know a few internal details of a white-box implementation scheme to extract the key; therefore, it poses a practical threat to existing white-box implementation schemes. It is of great practical significance to analyze existing white-box implementation schemes to ensure their security. The typical representative of this kind of analysis method is differential computation analysis (DCA), based on the principle of differential power analysis. This study analyzes the Bai-Wu white-box SM4 scheme based on DCA. Based on research results on the statistical characteristics of n-order uniform random invertible matrices over GF(2), an improved DCA (IDCA) is proposed, which can significantly improve the analysis efficiency on the premise of an almost constant success rate. The results also show that the Bai-Wu white-box SM4 scheme cannot guarantee security in the face of DCA and must therefore be further improved to meet the security requirements of practical scenarios.
    Available online:  September 20, 2022 , DOI: 10.13328/j.cnki.jos.006658
    Abstract:
With the emergence of data islands and growing attention to personal privacy protection, the application mode of centralized learning is restricted. Federated learning, as a distributed machine learning framework, can accomplish model training without leaking users' data and has attracted great attention since it appeared. With the promotion of federated learning applications, its security and privacy have also begun to be questioned. This paper offers a systematic summary and analysis of the research achievements of domestic and foreign researchers on the security and privacy of federated learning models in recent years. Firstly, this paper introduces the background of federated learning, clarifies its definition and workflow, and analyzes its existing vulnerabilities. Secondly, the security threats and privacy risks of federated learning are systematically analyzed and compared, and the existing protection methods are summarized. Finally, this paper discusses future challenges and work in this research area.
    Available online:  September 20, 2022 , DOI: 10.13328/j.cnki.jos.006664
    Abstract:
The smart contract running on the blockchain can hardly be modified after deployment, and its execution relies on a consensus procedure. Thus, existing debugging methods that require modifying the smart contract or interrupting the execution process cannot be applied to smart contracts. Since the running of a smart contract is composed of the ordered execution of transactions, the ability to trace the execution of transactions can make the smart contract more debuggable. The goal of tracing blockchain transaction execution is to unveil how a blockchain transaction produces its result during execution. The execution of a transaction relies on the internal state of the blockchain system, and the state is determined by previously executed transactions, which results in transitive dependency. Such dependency and the execution environment that the blockchain provides bring challenges for tracing. There are three main challenges: how to obtain enough information for tracing from the production environment in which the smart contract is deployed, how to obtain the dependency between transactions, and how to ensure the consistency between the tracing result and the real online execution. In this paper, a tracing method for blockchain transaction execution based on recording and replay is proposed. By building the recording and replay mechanism into the smart contract container, the reading and writing operations on state can be recorded without modifying the smart contract, and the running of the smart contract is not interrupted. A transaction dependency analysis method based on state reading and writing operations is proposed to support retracing previous transactions linked by dependency on demand. Moreover, a verification mechanism for the reading and writing operation records is proposed, so that the consistency between the replayed execution and the real online execution can be verified. The tracing method can trace the execution of blockchain transactions that call smart contracts and can be used in the debugging of smart contracts. When a loss is caused by the failure of a smart contract, the tracing result can be used as evidence. Experiments compare the performance of storing the recorded reading and writing operations on chain and off chain. The advantages and effectiveness of this method are revealed via a case study.
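Conceptually, the recording mechanism wraps the state store seen by the contract so every read and write is logged, and replay re-applies the log while checking the recorded reads. A minimal sketch with a hypothetical state-store interface (not an actual blockchain API):

```python
# Conceptual sketch of record-and-replay: log every state read/write issued during a
# transaction, then replay the log deterministically and use read records as checks.
class RecordingStateStore:
    def __init__(self, backend: dict, log: list):
        self.backend = backend
        self.log = log                      # ordered read/write records for one transaction

    def get(self, key):
        value = self.backend.get(key)
        self.log.append(("read", key, value))
        return value

    def put(self, key, value):
        self.log.append(("write", key, value))
        self.backend[key] = value

def replay(log, initial_state: dict) -> dict:
    """Re-applies the logged operations; read records double as consistency checks."""
    state = dict(initial_state)
    for op, key, value in log:
        if op == "read":
            assert state.get(key) == value, f"divergence at read of {key}"
        else:
            state[key] = value
    return state
```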
    Available online:  September 20, 2022 , DOI: 10.13328/j.cnki.jos.006531
    Abstract:
Security against chosen-ciphertext attack (CCA) effectively captures active attacks in reality. The existing cryptosystems secure against chosen-ciphertext attack are mainly designed abroad, and there is a lack of CCA-secure cryptosystems designed domestically. Although there are several generic transformation approaches to achieve CCA security, the price to pay is the growth of both computational overhead and communication overhead. In this paper, based on SM9, we propose a new identity-based broadcast encryption scheme that is secure against chosen-ciphertext attack. The scheme construction is derived from the SM9 encryption algorithm. The private key size and ciphertext size are constant and independent of the number of receivers chosen in the data encryption phase. Precisely, the private key consists of one element, and the ciphertext is composed of only three elements. If the GDDHE assumption holds, we prove that the proposed scheme is selectively secure under chosen-ciphertext attack in the random oracle model. To achieve CCA security, we embed a dummy identity in the ciphertext generation, which can be used to answer decryption queries successfully. Analysis shows that the proposed scheme is comparable to existing efficient identity-based broadcast encryption schemes in terms of computational efficiency and storage efficiency.
    Available online:  September 20, 2022 , DOI: 10.13328/j.cnki.jos.006683
    Abstract:
The ability to describe local geometric shapes is very important for the representation of irregular point clouds. However, existing networks still have difficulty capturing accurate local shape information effectively. In this article, we simulate the depthwise separable convolution computation in point clouds and propose a new type of convolution, namely dynamic cover convolution (DC-Conv), to aggregate local features. The core of DC-Conv is the space cover operator (SCOP), which constructs anisotropic spatial geometry in a local area to cover the local feature space and enhance the compactness of local features. DC-Conv achieves the capture of local shapes by dynamically combining multiple SCOPs in the local neighborhood. Among them, the attention coefficients of the SCOPs are adaptively learned from the point positions in a data-driven way. Experiments on the 3D point cloud shape recognition benchmark datasets ModelNet40, ModelNet10, and ScanObjectNN show that this method can effectively improve the performance of 3D point cloud shape recognition and its robustness to sparse point clouds, even in the case of a single scale. Finally, we also provide sufficient ablation experiments to verify the effectiveness of the method. The open-source code is published at https://github.com/changshuowang/DC-CNN.
    Available online:  September 16, 2022 , DOI: 10.13328/j.cnki.jos.006396
    Abstract:
A verifiable timed signature (VTS) scheme allows one to time-lock a signature on a known message for a given amount of time T, such that after performing a sequential computation for time T anyone can extract the signature from the time-lock. Verifiability ensures that anyone can publicly check whether a time-lock contains a valid signature on the message without solving it first, and that the signature can be obtained by solving it for time T. This study first proposes the notion of verifiable attribute-based timed signatures (VABTS) and further gives an instantiation, VABTS. The instantiated VABTS scheme can not only simultaneously support identity privacy preservation, dynamic user revocation, traceability, and timing, but also solve the key escrow problem in attribute-based schemes. In addition, VABTS has many applications. This study lists two application scenarios of VABTS: building a privacy-preserving payment channel network for the permissioned blockchain and realizing fair privacy-preserving multi-party computation. Finally, formal security analysis and performance evaluation show that the instantiated VABTS scheme is secure and efficient.
    Available online:  September 16, 2022 , DOI: 10.13328/j.cnki.jos.006422
    Abstract:
Compared with traditional software, deep learning software has a different structure. Even if a large amount of test data is used for testing the deep learning software, the adequacy of testing is still hard to evaluate, and many unknown defects may remain. The deep forest is an emerging deep learning model that overcomes many shortcomings of deep neural networks: for example, deep neural networks require a large amount of training data, a high-performance computing platform, and many hyperparameters. However, there is no research on testing deep forests. Based on the structural characteristics of deep forests, this study proposes a set of testing coverage criteria, including random forest node coverage (RFNC), random forest leaf coverage (RFLC), cascade forest class coverage (CFCC), and cascade forest output coverage (CFOC). Moreover, DeepRanger, a coverage-oriented test data generation method based on a genetic algorithm, is proposed to automatically generate new test data and effectively improve the model coverage of the test data. Experiments are carried out on the MNIST dataset and gcForest, an open-source deep forest project. The experimental results show that the four proposed coverage criteria can effectively evaluate the adequacy of the test data set for the deep forest model. In addition, compared with a genetic algorithm based on random selection, DeepRanger, which is guided by coverage information, achieves higher testing coverage of the deep forest model under test.
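As an illustration of the node-coverage idea (an analogue of RFNC, shown here on a scikit-learn random forest rather than gcForest's cascade), coverage can be measured as the fraction of tree nodes visited by at least one test input:

```python
# Illustrative analogue of random forest node coverage: the fraction of tree nodes
# that are visited by at least one sample of the test set.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

X, y = load_digits(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X[:1000], y[:1000])

def node_coverage(forest: RandomForestClassifier, X_test: np.ndarray) -> float:
    indicator, _ = forest.decision_path(X_test)        # (n_samples, total_nodes) sparse matrix
    visited = np.asarray(indicator.sum(axis=0)).ravel() > 0
    return visited.sum() / visited.size

print(f"node coverage of the test set: {node_coverage(forest, X[1000:]):.2%}")
```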
    Available online:  September 16, 2022 , DOI: 10.13328/j.cnki.jos.006424
    Abstract:
In order to alleviate urban traffic congestion and avoid traffic accidents, route selection in urban road networks has been a hot research topic. With the development of edge computing and intelligent vehicle terminal technology, vehicles in urban road networks are transitioning from self-organizing networks to the Internet of vehicles (IoV) paradigm, which shifts route selection from computation based on static historical traffic data to computation based on real-time traffic information. In current research on route selection in urban road networks, many scholars focus on how to improve travel efficiency, reduce travel time, etc. Nevertheless, these studies do not consider the possible risk on the selected route. Based on the above issues, this study constructs a real-time road risk assessment model based on edge computing (R3A-EC) for the first time and proposes a real-time route selection method based on risk assessment (R2S-RA). The R3A-EC model makes full use of the low latency and high reliability of edge computing to assess the risk on urban roads in real time and uses minimum-risk Bayes decision making to determine whether a risk exists. Finally, based on the real-time risk assessment model, route selection in the urban road network is optimized to realize a real-time, dynamic, and low-risk route selection method. Compared with the traditional shortest-path method (Dijkstra), the VANET-based shortest-time method, the MEC-based dynamic path planning algorithm, and the bidirectional A* shortest-path optimization algorithm, the proposed R2S-RA method can better choose the optimal route that takes both road risk and travel time into account, so as to reduce the occurrence of traffic congestion and traffic accidents.
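The minimum-risk Bayes decision rule mentioned above selects the action with the smallest expected loss under the posterior class probabilities. A small sketch with hypothetical loss values:

```python
# Generic minimum-risk Bayes decision rule: choose the action whose expected loss
# under the posterior class probabilities is smallest.
import numpy as np

def min_risk_decision(posterior: np.ndarray, loss: np.ndarray) -> int:
    """posterior: P(class j | observation); loss[i, j]: cost of action i when class j is true."""
    conditional_risk = loss @ posterior          # R(a_i | x) = sum_j loss[i, j] * P(c_j | x)
    return int(np.argmin(conditional_risk))

# Classes: 0 = "road is safe", 1 = "road is risky"; actions: 0 = keep route, 1 = avoid road.
posterior = np.array([0.7, 0.3])
loss = np.array([[0.0, 10.0],     # keeping the route is costly if the road is in fact risky
                 [1.0,  0.0]])    # avoiding a safe road only costs a small detour
print(min_risk_decision(posterior, loss))   # -> 1: expected risks are 3.0 vs 0.7, so avoid
```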
    Available online:  September 16, 2022 , DOI: 10.13328/j.cnki.jos.006427
    Abstract:
Aiming at the problems of random packet loss and rapid energy consumption in actual wireless sensor networks, a reliable data collection method is designed based on the characteristics of sensor networks and the advantages of compressed sensing. Firstly, the network is clustered. In the data acquisition phase, a measurement matrix based on the actual link state is designed, and a sparse basis suitable for the sensor data is constructed. In the data transmission phase, the data are transmitted from the cluster heads to the aggregation node: the best-worst ant system is adopted to evaluate link quality, and multi-path transmission based on link quality is then carried out. Finally, the data reconstruction task is offloaded to the edge nodes. The experimental results show that, in scenarios with random packet loss in the network, the proposed data collection method achieves better data transmission reliability and network energy consumption than other methods.
    Available online:  September 16, 2022 , DOI: 10.13328/j.cnki.jos.006428
    Abstract:
Sentiment analysis has various application scenarios in software engineering (SE), such as detecting developers' emotions in commit messages and identifying developers' opinions on Q&A forums. Nevertheless, commonly used out-of-the-box sentiment analysis tools cannot obtain reliable results in SE tasks, and the misunderstanding of technical knowledge has been demonstrated to be the main reason. Researchers have therefore started to customize SE-specific methods in supervised or distantly supervised ways. To assess the performance of these methods, researchers use SE-related annotated datasets to evaluate them in a within-dataset setting, that is, they train and test each method using data from the same dataset. However, an annotated dataset for an SE-specific sentiment analysis task is not always available. Moreover, building a manually annotated dataset is time-consuming and not always feasible. An alternative is to use datasets extracted from the same platform for similar tasks or datasets extracted from other SE platforms. To verify the feasibility of these practices, existing methods need to be evaluated in within-platform and cross-platform settings, which refer to training and testing each method using data from the same platform but not the same dataset, and training and testing each classifier using data from different platforms, respectively. This study comprehensively evaluates existing SE-customized sentiment analysis methods in within-dataset, within-platform, and cross-platform settings. The experimental results provide actionable insights for both researchers and practitioners.
    Available online:  September 09, 2022 , DOI: 10.13328/j.cnki.jos.006525
    Abstract:
With machine learning widely applied in the natural language processing (NLP) domain in recent years, the security of NLP tasks has received growing concern. Existing studies have found that small modifications to examples might lead to wrong machine learning predictions, which is also called an adversarial attack. Textual adversarial attacks can effectively reveal the vulnerability of NLP models for improvement. Nevertheless, existing textual adversarial attack methods all focus on designing complex adversarial example generation strategies with limited improvement in success rate, and their highly invasive modifications degrade textual quality. Thus, a simple and effective method with high adversarial example quality is in demand. To solve this problem, the sememe-level sentence dilution algorithm (SSDA) and the dilution pool construction algorithm (DPCA) are proposed from a new perspective of improving the process of adversarial attacks. SSDA is a new process that can be freely embedded into the classical adversarial attack workflow. SSDA first uses dilution pools constructed by DPCA to dilute the original examples and then generates adversarial examples from those diluted examples. It can not only improve the success rate of any adversarial attack method without any limitation on datasets or victim models but also obtain higher adversarial example quality compared with the original method. Experiments with different datasets, dilution pools, victim models, and textual adversarial attack methods verify the improvement of SSDA in success rate and prove that dilution pools constructed by DPCA can further enhance the dilution ability of SSDA. The experimental results demonstrate that SSDA reveals more vulnerabilities of models than classical methods, and DPCA helps SSDA improve the success rate with higher adversarial example quality.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006486
    Abstract:
Network representation learning is regarded as a key technology for improving the efficiency of information network analysis. It maps network nodes to low-dimensional vectors in a latent space and efficiently preserves the structure and characteristics of the network in these vectors. In recent years, many studies have focused intensively on exploring network topology and node features and have achieved good results on many network analysis tasks. In fact, besides these two kinds of information, the accompanying information widely existing in networks reflects many complex relationships and plays an important role in the construction and evolution of networks. To improve the efficiency of network representation learning, a novel model integrating accompanying information, named NRLIAI, is proposed in this paper, in which variational auto-encoders (VAE) are employed to propagate and process information. In the encoder, network topology and node features are aggregated and mapped by graph convolutional operators. In the decoder, the network is reconstructed, and accompanying information is fused to guide the process of network representation learning. The model solves the problem that existing methods have difficulty utilizing accompanying information effectively. Additionally, it has generative ability, which enables it to reduce overfitting in the learning process. On several real-world network datasets, extensive comparative experiments are conducted through node classification and link prediction tasks. The experimental results show that the proposed model outperforms other competitive network representation learning methods.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006687
    Abstract:
Heterogeneous information networks (HINs) are directed graphs including multi-typed objects (vertices) and links (edges), which can express rich semantic information and complex structural information. The problem of cohesive subgraph queries in HINs, i.e., given a query vertex q, finding the cohesive subgraphs containing q, has become an important problem and has been widely applied in various areas such as event planning, biological analysis, and product recommendation. Yet existing methods mainly have two deficiencies: (1) cohesive subgraphs based on relational constraints and motif cliques contain multiple types of vertices, which makes it hard to handle scenarios that focus on a specific type of vertices; (2) although meta-path-based methods can query cohesive subgraphs with a specific type of vertices, they ignore the meta-path-based connectivity between the vertices in the subgraphs. Therefore, we first propose a novel connectivity based on meta-path-based edge-disjoint paths in HINs, i.e., path-connectivity. Then, we propose the k-path connected component (k-PCC) based on path-connectivity, which requires the path-connectivity of the subgraph to be at least k. Next, we further propose the steiner maximum path-connected component (SMPCC), which is the k-PCC containing q with the maximum path-connectivity. Finally, we design an efficient graph decomposition-based k-PCC discovery algorithm and, based on this, propose an optimized SMPCC query algorithm. Extensive experiments on five real and synthetic HINs demonstrate the effectiveness and efficiency of the proposed approaches.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006692
    Abstract:
    With the rapid development of the Internet and the penetration of big data mining and applications into all walks of life, how to share and use massive data securely and efficiently has become a new hot research issue. Secure multi-party computation is one of the key technologies to solve this problem. It allows a group of participants to interact, compute a function together and get the output without revealing private inputs. Oblivious transfer is a privacy-protected two-party communication protocol in which a sender holds two messages to be sent, and a receiver selects one to receive, but after that, the sender knows nothing about which message the receiver gets, and the receiver cannot get any information about the unselected message. Oblivious transfer has become one of the key modules of secure multi-party computation, and its efficiency optimization can effectively promote the application of secure multi-party computation, especially for special two-party secure computation protocols such as private set intersection. This paper summarizes the classification of oblivious transfer and several common variants, and respectively describes the construction and research progress of the oblivious transfer protocol based on public key cryptography and oblivious transfer extension, which leads to the importance of the efficiency optimization research of oblivious transfer. At the same time, this paper comprehensively reviews the research progress of efficiency optimization of oblivious transfer and oblivious transfer extension from the perspectives of semi-honest adversary and malicious adversary. On the other hand, in practical application, this paper systematically summarizes the optimization technologies used in the engineering implementation of oblivious transfer and oblivious transfer extension protocols. Finally, this paper points out the main problems and future works of oblivious transfer and oblivious transfer extension protocols.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006694
    Abstract:
Blockchain, as one of the underlying key technologies of digital currency, has received extensive attention with the rapid development of digital currency. Due to the decentralization, tamper resistance, traceability, and other properties of blockchain, more and more enterprise and individual users now choose blockchain technology to transmit and record data. On the one hand, the openness and transparency of the blockchain can fully guarantee the availability of data, but on the other hand, they bring high risks to users' privacy. In order to balance the confidentiality and availability of data, homomorphic encryption is usually employed in blockchain security solutions. However, in practice, the security strength of a deployed homomorphic encryption scheme is likely to need to change over time. Considering the complex diversity and distributed characteristics of blockchain application scenarios, once a homomorphic encryption scheme is deployed, the corresponding workload will be very heavy when its security strength needs to be adjusted. To make things worse, in blockchain practice, regulation requirements in many cases (especially for the data published and transmitted by certain group members) demand a trusted third party (TTP), such as a regulator, that is able to decrypt all the corresponding ciphertexts on the chain. If a traditional homomorphic encryption scheme is deployed, the TTP needs to store all users' secret keys, which introduces many practical problems for the key management and storage of the TTP. According to the current application scenarios and security requirements of blockchain, we propose an additively homomorphic encryption scheme whose security is based on the decisional k-Lin assumption. The scheme can be proved IND-CCA1 secure in the standard model and has the following three advantages: (i) fine-grained adjustment of the security strength of the proposed scheme can be achieved by adjusting the parameter k; (ii) it is a double decryption scheme (i.e., it has two kinds of secret keys, one held by a certain user and the other kept by the TTP, so the TTP can use its key to decrypt all the ciphertexts encrypted by the users under their own public keys); (iii) it can easily degenerate into an IND-CPA secure homomorphic encryption scheme, such that the resulting scheme, with a shorter public/secret key pair and shorter ciphertexts, is also an additively homomorphic double decryption scheme.
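For readers unfamiliar with additive homomorphism, the textbook Paillier cryptosystem below illustrates the property that Enc(m1)·Enc(m2) mod n² decrypts to m1+m2; it is only an analogue for exposition and is not the k-Lin-based double-decryption scheme proposed in the paper.

```python
# Textbook Paillier encryption as an illustrative analogue of an additively homomorphic
# scheme (not the scheme proposed above).
import math
import random

p, q = 1789, 1867                      # toy primes; real deployments use ~1024-bit primes
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)
L = lambda u: (u - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m: int) -> int:
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return (L(pow(c, lam, n2)) * mu) % n

c1, c2 = encrypt(123), encrypt(456)
assert decrypt((c1 * c2) % n2) == 579   # ciphertext product decrypts to the plaintext sum
```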
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006697
    Abstract:
    App reviews are considered as a communication channel between users and developers to perceive user satisfaction. Users usually describe buggy features (i.e., User Actions) and App abnormal behaviors (i.e., Abnormal Behaviors) in forms of key phrases (e.g., "send a video" and "crash"), which could be buried with other trivial information (e.g., complaints) in the review texts. A fine-grained view about this information could facilitate the developers' understanding of feature requests or bug reports from users, and improve the quality of Apps. Existing pattern-based approaches to extract target phrases can only summarize the high-level topics/aspects of reviews, and suffer from low performance due to insufficient semantic understanding of reviews. This paper proposes a semantic-aware and fine-grained App Review bug mining approach (Arab) to extract User Actions and Abnormal Behaviors, and mine the correlations between them. We design a novel neural network model for extracting fine-grained target phrases, which combines textual descriptions and review attributes to better represent the semantics of reviews. Arab also clusters the extracted phrases based on their semantic relations and provides a visualization of the correlations between User Actions and Abnormal Behaviors. We evaluate Arab on 3,426 reviews from six Apps, and the results confirm the effectiveness of Arab in phrase extraction. We further conduct a case study with Arab on 301,415 reviews of 15 popular Apps to explore its potential application and examine its usefulness on large-scale data.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006700
    Abstract:
    Inspired by the human visual attention mechanism, salient object detection (SOD) aims to detect the most attractive and interesting object or region in a given scene. In recent years, with the development and popularization of depth cameras, depth map has been successfully applied to various computer vision tasks, at the same time, which also provides new ideas for the salient object detection task. The introduction of depth map not only enables the computer to simulate the human visual system more comprehensively, but also provides new solutions for the detection of some difficult scenes, such as low contrast and complex backgrounds by utilizing the structure information and location information of the depth map. In view of the rapid development of RGB-D SOD task in the era of deep learning, this paper aims to sort out and summarize the existing related research outputs from the perspective of key scientific problem solutions, and conduct the quantitative analysis and qualitative comparison of different methods on the commonly used RGB-D SOD datasets. Finally, we summarize the challenges and prospects for the future development trends.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006702
    Abstract:
It is difficult to solve many-objective optimization problems (MaOPs) effectively with traditional multi-objective evolutionary algorithms (MOEAs) based on the Pareto dominance relation. This study proposes a dominance relation by combining the double distances of the PBI utility function without introducing extra parameters. Secondly, a diversity maintenance method based on the double distances is also defined, which not only considers the double distances of an individual but also adaptively adjusts the weight of diversity according to the number of objectives of the MaOP, so as to better balance the convergence and diversity of the solution set in many-objective space. Finally, the proposed dominance relation and diversity maintenance method are embedded into the framework of NSGA-II, and a many-objective evolutionary algorithm based on double distances (MaOEA/d2) is designed. MaOEA/d2 is compared with five other representative many-objective evolutionary algorithms on the DTLZ and WFG benchmark functions with 5, 10, 15, and 20 objectives in terms of the IGD and HV indicators. The empirical results show that MaOEA/d2 can obtain better convergence and diversity. Therefore, the proposed MaOEA/d2 is a promising many-objective evolutionary algorithm.
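For reference, the double distances of the PBI utility function referred to above are commonly defined as follows (z* is the ideal point and w a weight/reference vector); d1 measures convergence along the reference direction and d2 the perpendicular, diversity-related distance. The paper's exact variant may differ.

```latex
% Standard PBI decomposition into two distances, as used in decomposition-based MOEAs.
d_1(x) = \frac{\bigl| (F(x) - z^{*})^{\mathsf{T}} w \bigr|}{\lVert w \rVert}, \qquad
d_2(x) = \Bigl\lVert F(x) - \Bigl( z^{*} + d_1(x)\,\frac{w}{\lVert w \rVert} \Bigr) \Bigr\rVert
```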
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006703
    Abstract:
Multi-label learning is a very important machine learning paradigm. Traditional multi-label learning methods are designed in a supervised or semi-supervised manner; generally, they require accurate labeling of all or part of the data into multiple categories. In many practical applications, it is difficult to obtain label information when the number of labels is large, which greatly restricts the promotion and application of multi-label learning. In contrast, label correlation, as a common form of weak supervision, has lower requirements for labeling information. How to use label correlation for multi-label learning is an important but understudied problem. In this paper, we propose a method named weakly supervised multi-label learning using prior label correlation information (WSMLLC). The model restates sample similarity by using label correlation and can effectively obtain the label indicator matrix; it constrains the data projection matrix with the prior information and refines the indicator matrix by introducing regression terms. Compared with existing methods, the outstanding advantage of the WSMLLC model is that it can realize label assignment of multi-label samples when only label correlation priors are provided. Experimental results show that WSMLLC has obvious advantages over current advanced multi-label learning methods in the case of complete loss of the label matrix.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006704
    Abstract:
Evolutionary multitasking optimization focuses on population-based search and solving multiple tasks simultaneously via genetic transfer between tasks. It is considered the third optimization paradigm after single-objective optimization and multiobjective optimization, and has become a hot research topic in the field of computational intelligence in recent years. The evolutionary multitasking optimization algorithm simulates the biocultural phenomena of assortative mating and vertical cultural transmission in nature, and improves the convergence of multiple optimization tasks through inter-task and intra-task knowledge transfer. This paper gives a systematic review of the research progress in evolutionary multitasking in recent years. First, the concept of evolutionary multitasking optimization is introduced and its five related definitions are given; the problem is also explained from the perspective of knowledge transfer optimization. Second, the basic framework of the evolutionary multitasking optimization algorithm is introduced in detail, along with improvements to this framework and other algorithms implemented on top of it. Finally, applications of the algorithm in academia and engineering are summarized. At the end of the paper, we point out the existing challenges in the field of evolutionary multitasking optimization and outline the future development of this direction.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006705
    Abstract:
Graph partitioning is the primary step of large-scale distributed graph processing and plays a fundamental role in the storage, query, processing, and mining of graph applications. Since graph data in the real world are usually dynamic, research on dynamic graph partitioning is a hot topic. This paper systematically introduces current algorithms for dynamic graph partitioning, including streaming graph partitioning, incremental graph partitioning, and graph repartitioning algorithms. First, the paper introduces three different partitioning strategies, two different sources of graph dynamism, and the dynamic graph partitioning problem. Then, three kinds of streaming graph partitioning algorithms are introduced, namely hash-based algorithms, neighbor distribution-based algorithms, and other novel algorithms. Second, two kinds of incremental graph partitioning algorithms, single-element incremental partitioning and batch incremental partitioning, are introduced. Third, repartitioning algorithms oriented to graph structure and repartitioning algorithms oriented to graph computation are introduced, respectively. Finally, based on the analysis and comparison of existing methods, the main challenges of dynamic graph partitioning are summarized and the corresponding research problems are proposed.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006706
    Abstract:
Deep hierarchical reinforcement learning (DHRL) is an important research field in deep reinforcement learning (DRL). It focuses on sparse rewards, sequential decision-making, and weak transfer ability, problems that are difficult for classic DRL to solve. DHRL decomposes complex problems and constructs a multi-layered structure for DRL strategies based on hierarchical thinking. By using temporal abstraction, DHRL combines lower-level actions to learn semantic higher-level actions. In recent years, DHRL has made breakthroughs in many domains and has shown strong performance; it has been applied to visual navigation, natural language processing, recommendation systems, and video description generation in the real world. In this paper, we first introduce the theoretical basis of hierarchical reinforcement learning (HRL). Second, we describe the key technologies of DHRL, including hierarchical abstraction techniques and common experimental environments. Third, taking the option-based deep hierarchical reinforcement learning framework (O-DHRL) and the subgoal-based deep hierarchical reinforcement learning framework (G-DHRL) as the main research objects, we analyze and compare the research status and development trends of various algorithms in detail. In addition, a number of real-world DHRL applications are discussed. Finally, we summarize DHRL and discuss its prospects.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006707
    Abstract:
Given a natural language sentence as the query, the task of video moment retrieval aims to localize the most relevant video moment in a long untrimmed video. Given the rich visual, textual, and audio information contained in the video, how to fully understand the visual information provided by the video, how to utilize the text information provided by the query sentence to enhance the generalization and robustness of the model, and how to align and interact cross-modal information are crucial challenges of video moment retrieval. In this paper, we systematically sort out the work in the field of video moment retrieval and divide it into ranking-based methods and localization-based methods. Among them, ranking-based methods can be further divided into methods that preset candidate clips and methods that generate candidate clips with guidance; localization-based methods can be divided into one-time localization methods and iterative localization methods. We also summarize the datasets and evaluation metrics of this field and review the latest advances. Finally, we introduce related extension tasks, e.g., moment localization from a video corpus, and conclude the survey with a discussion of promising trends.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006709
    Abstract:
Aspect term extraction is a natural language processing task that automatically recognizes and extracts aspect terms from free-form text. The article first reviews the basic task of aspect term extraction, its authoritative datasets, and general evaluation specifications. On this basis, the article gives a comprehensive review of the state-of-the-art techniques for the task, including traditional extraction techniques based on statistical strategies and feature engineering, and neural extraction techniques using deep learning. In particular, taking the nature of language expression as a starting point and considering the limitations of existing techniques, the article elaborates on the technical difficulties and future development prospects of aspect term extraction.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006711
    Abstract:
Most engineering optimization problems can be formulated as constrained optimization problems. Evolutionary algorithms have been widely used for constrained optimization problems in recent years due to their good performance. However, constraints make the solution space of a problem discrete, shrunken, and changeable, which brings great challenges to evolutionary algorithms for solving constrained optimization problems. Evolutionary algorithms integrating constraint handling techniques have therefore become a research hotspot. In addition, as research has deepened in recent years, constraint handling techniques have been widely developed for complex engineering application problems involving, for example, multiple objectives, high dimensionality, and equality constraints. In this paper, evolutionary optimization for complex constrained optimization problems is divided, according to the source of the complexity, into evolutionary optimization algorithms for complex objectives and evolutionary algorithms for complex constraint scenarios. The challenges posed to constraint handling techniques by the complexity of practical engineering applications and the latest research progress are discussed. Finally, future research trends and challenges are summarized.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006712
    Abstract:
As considerable amounts of mobility data have been accumulated, next point-of-interest (POI) recommendation has become one of the important tasks in location-based social networks. Existing approaches for next POI recommendation mainly focus on capturing local dynamic preferences from users' recent check-in records but ignore the global static information in historical mobility data. As a result, they prevent further mining of users' preferences and limit recommendation accuracy. To this end, we propose a global and local feature fusion based approach for next POI recommendation (GLNR). GLNR can model user dynamic behavior by taking advantage of the sequential dependencies between check-ins and the underlying relationships between entities contained in the global static information. We introduce two novel types of global static information, i.e., User-POI association paths and POI-POI association paths, to learn users' global static preferences and the global dependency between successive check-ins. Specifically, we construct a heterogeneous information network based on interaction data and geographical information. To capture global static features, we design a relevance-guided path sampling strategy and a hierarchical attention based representation learning method. Moreover, we update the representations of POIs in the user's check-in sequence based on the two types of global static features. A position and time interval aware self-attention mechanism is further utilized to model the sequential dependency between multiple check-ins. We then predict the check-in probability and recommend a set of next POIs for the target user. Finally, we conduct extensive experiments on two real-world datasets to evaluate the performance of the proposed model GLNR. Experimental results validate the superiority of GLNR in improving recommendation accuracy. Besides, our case study indicates that the explicit paths in the global static information help GLNR provide interpretable recommendations.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006713
    Abstract:
Hybrid Transactional Analytical Processing (HTAP) relies on a single system to process mixed workloads of transactions and analytical queries simultaneously. It not only eliminates the Extract-Transform-Load (ETL) process but also enables real-time data analysis. However, in order to process mixed OLTP and OLAP workloads, such systems must balance the trade-off between workload isolation and data freshness, mainly because of the interference between highly concurrent, short-lived OLTP workloads and bandwidth-intensive, long-running OLAP workloads. Most existing HTAP databases leverage the best of row stores and column stores to support HTAP. As different HTAP applications have different requirements, HTAP databases adopt disparate storage strategies and processing techniques. In this paper, we offer a comprehensive survey of HTAP databases. We introduce a taxonomy of state-of-the-art HTAP databases according to their storage strategies and architectures, and then summarize and compare their pros and cons. Different from previous works that focus on single-model and Spark-based loosely coupled HTAP systems, we focus on real-time HTAP databases with a row-column dual store. Moreover, we take a deep dive into their key techniques regarding data organization, data synchronization, query optimization, and resource scheduling. We also introduce existing HTAP benchmarks. Finally, we discuss the research challenges and open problems for HTAP.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006714
    Abstract:
In recent years, a large number of software defect prediction models have been proposed. Once a new defect prediction model is proposed, it is often compared with previous models to evaluate its effectiveness. However, there is no consensus on how to compare a newly proposed defect prediction model with previous ones. Different studies often adopt different settings for comparison, which may lead to misleading conclusions and consequently to missed opportunities to improve the effectiveness of defect prediction. This paper systematically reviews the comparative experiments on software defect prediction models conducted by domestic and foreign scholars in recent years. First, we introduce the comparison of defect prediction models. Then, the research progress is summarized from the perspectives of defect datasets, dataset splits, baseline models, performance indicators, and classification thresholds used in the comparisons. Finally, we summarize the opportunities and challenges in comparative experiments on defect prediction models and outline future research directions.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006715
    Abstract:
Knowledge tracing is an important tool for modeling learners' knowledge mastery in education platforms such as online learning platforms and intelligent tutoring systems. It can estimate the current knowledge state of learners in a timely manner according to their interactions with exercises. The estimated results can be used to predict learners' future performance and help design personalized learning paths. In the past two decades, researchers have proposed many knowledge tracing models based on theories in statistics and cognitive science. With the openness and application of educational big data, models based on deep neural networks (referred to as "deep knowledge tracing models") have gradually replaced traditional models due to their simple theoretical foundations and superior predictive performance, and have become a new research hotspot in the field of knowledge tracing. According to the architectures of the neural networks used, the algorithmic details of recent representative deep knowledge tracing models are illustrated, and a comprehensive performance evaluation of the models on five publicly available datasets is conducted. Finally, some use cases and future research directions of deep knowledge tracing are discussed.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006716
    Abstract:
As the amount of data generated by the Industrial Internet grows, more and more companies choose to outsource the storage of their Industrial Internet data to cloud servers to save storage costs. To prevent the outsourced data from being tampered with or deleted, companies need to audit the data integrity regularly. This paper proposes a public auditing scheme for Industrial Internet data based on smart contracts. In particular, we design a series of game-theory based smart contracts that can efficiently mitigate malicious behavior by participants, including the third-party auditor and the cloud server. Compared with existing collusion-resistant public auditing schemes, our scheme does not rely on complex cryptographic tools to resist such malicious behavior, and is thus more efficient and better suited to Industrial Internet applications where huge amounts of data need to be updated frequently. Specifically, the game-based contract designed in this paper, as a stand-alone component, can be effectively combined with existing public auditing schemes to yield new schemes with better security and no loss of efficiency. Finally, we conduct a series of tests on our contract in a local environment and on Ropsten, the common Ethereum test chain. The results show that the designed contract is cheap to run and adaptable to the operating environment, has little impact on the efficiency of the original integrity auditing scheme, and is more efficient than other integrity schemes that resist malicious auditor behavior.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006717
    Abstract:
Software development is changing. Since the Internet allows far-flung development teams to collaboratively create software, open-source software supply chains are becoming more complex and sophisticated. This work defines the new open-source software supply chain model and presents a detailed survey of the security issues in the new open-source software supply chain architecture. Various emerging technologies, such as blockchain, machine learning (ML), and continuous fuzzing, are discussed as solutions to vulnerabilities in the open-source software supply chain. While many researchers and organizations have already proposed new technologies and principles to handle the security issues in this area, proper and more effective solutions remain distant. There are new challenges and opportunities in securing the open-source software supply chain, which are also highlighted in this work.
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006725
    Abstract:
Participating media are ubiquitous in nature and are also major elements in many rendering applications such as special effects, digital games, and simulation systems. Physically based simulation and reproduction of their appearance can significantly boost the realism and immersion of 3D virtual scenes. However, both the underlying structures of participating media and the light propagation within them are very complex. Therefore, rendering participating media remains a difficult task and a hot topic in computer graphics. In order to simplify the treatment in rendering and lower the computational cost, classical methods for participating media rendering are based on two assumptions: independent scattering and local continuity. These two assumptions are also the building blocks of the classical radiative transfer equation (RTE). However, most participating media in nature do not satisfy these two assumptions, which results in a noticeable discrepancy between rendered images and real images. In recent years, these two assumptions have been relaxed by incorporating more physically accurate methods to model participating media, thus significantly improving the physical realism of participating media rendering. This survey analyzes and discusses existing non-classical participating media rendering techniques from two aspects: correlated media rendering and discrete media rendering. We focus on the differences between classical and non-classical participating media rendering, and describe the principles, advantages, and limitations of each method. Finally, we provide some future directions around non-classical participating media rendering that are worth delving into. We hope this survey can inspire researchers to study non-classical participating media rendering by addressing some critical issues, and that it can serve as guidance for engineers in industry to improve their renderers by considering non-classical participating media rendering.
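For reference, the classical radiative transfer equation built on these two assumptions can be written as follows (a standard textbook form; the survey itself may use different notation).

```latex
% Classical RTE: change of radiance L at point x in direction omega
(\boldsymbol{\omega} \cdot \nabla) L(\mathbf{x}, \boldsymbol{\omega})
  = -\sigma_t(\mathbf{x})\, L(\mathbf{x}, \boldsymbol{\omega})
  + \sigma_s(\mathbf{x}) \int_{\mathcal{S}^2}
      p(\boldsymbol{\omega}, \boldsymbol{\omega}')\,
      L(\mathbf{x}, \boldsymbol{\omega}')\, \mathrm{d}\boldsymbol{\omega}'
  + Q(\mathbf{x}, \boldsymbol{\omega})
```

Here sigma_t and sigma_s are the extinction and scattering coefficients, p is the phase function, and Q is the emission term; relaxing independent scattering or local continuity modifies or replaces this equation.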
    Available online:  July 22, 2022 , DOI: 10.13328/j.cnki.jos.006732
    Abstract:
With the increasing scale and structural diversity of data, how to use the heterogeneous co-processors in modern servers to provide a real-time and reliable parallel runtime environment for large-scale data processing has become a research hotspot in the high-performance computing and database fields. Modern servers equipped with multiple co-processors (GPUs) have become the preferred high-performance platform for analyzing large-scale and irregular graph data. Graph computing systems and algorithms designed for multi-GPU server architectures (such as breadth-first traversal and shortest path algorithms) have significantly outperformed their multi-core CPU counterparts. However, inter-GPU data transmission in existing graph computing systems is limited by PCI-E bandwidth and latency, so performance does not grow linearly with the number of GPU devices, and serious delay jitter arises that cannot satisfy the high scalability requirements of large-scale graph parallel computing systems. A series of benchmark experiments reveals that existing systems have the following two types of defects: 1) the hardware architecture of data links between modern GPU devices is being updated rapidly (e.g., NVLink-V1 and NVLink-V2), with greatly improved bandwidth and latency, yet existing systems still rely on PCI-E for data communication and cannot make full use of modern GPU link resources (including link topology, connectivity, and routing); 2) when dealing with irregular graph data, such systems often adopt a single inter-device data movement strategy, which incurs a large amount of unnecessary data synchronization between GPU devices over the PCI-E bus and results in excessive waiting overhead in local computation. Therefore, it is urgent to make full use of the various communication links between modern GPUs to design a highly scalable graph computing system. To achieve high scalability, a fine-grained communication mechanism based on hybrid awareness is proposed to enhance the scalability of multi-GPU graph computing systems. It detects the architecture links in advance, uses modular data links and communication strategies for different graph-structured data, and selects the optimal data exchange method for large-scale graph data (structural data and application data). Based on the above optimization strategies, this paper designs a multi-GPU graph parallel computing system named ChattyGraph. By optimizing data buffering and multi-GPU collaborative computing with OpenMP and NCCL, ChattyGraph can adaptively and efficiently support various graph parallel computing applications and algorithms on multi-GPU HPC platforms. Experiments with several real-world graph datasets on an 8-GPU NVIDIA DGX server show that ChattyGraph significantly improves graph computing efficiency and scalability and outperforms other advanced competitors, including WS-VR and Groute: the average computing efficiency is improved by 1.2-1.5X and the average speedup by 2-3X.
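The abstract mentions NCCL-based multi-GPU collaboration but gives no implementation detail; the following minimal PyTorch sketch (an assumption for illustration, not ChattyGraph's code) shows how per-GPU shortest-path distance arrays could be merged with an NCCL all-reduce.

```python
# Minimal multi-GPU synchronization sketch (not ChattyGraph itself):
# each process owns one GPU and a local copy of the distance array;
# an NCCL all-reduce with MIN merges relaxations done on different GPUs.
import torch
import torch.distributed as dist

def sync_distances(local_dist: torch.Tensor) -> torch.Tensor:
    """Element-wise minimum of the distance array across all GPUs."""
    dist.all_reduce(local_dist, op=dist.ReduceOp.MIN)
    return local_dist

if __name__ == "__main__":
    # Launched with: torchrun --nproc_per_node=<num_gpus> this_script.py
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    n_vertices = 1024
    distances = torch.full((n_vertices,), float("inf"), device="cuda")
    distances[0] = 0.0  # source vertex
    # ... each rank relaxes the edges of its own graph partition here ...
    distances = sync_distances(distances)
    dist.destroy_process_group()
```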
    Available online:  July 15, 2022 , DOI: 10.13328/j.cnki.jos.006412
    Abstract:
Question matching is an important task in question answering systems. Current methods usually use neural networks to model the semantic matching degree of two sentences. However, in the legal field, questions often suffer from sparse textual representations, professional legal terms, and insufficient legal knowledge contained in the sentences. Therefore, general-domain deep learning text matching models are not effective for the legal question matching task. To help the model better understand the meaning of legal questions and model legal-domain knowledge, this study first constructs a legal-domain knowledge base and then proposes a question matching model that integrates legal knowledge (such as legal words and statutes). Specifically, a legal dictionary covering five categories of legal disputes has been constructed, including contract disputes, divorce, traffic accidents, labor injury, and debt and creditor's rights, and relevant legal articles have been collected to build the legal-domain knowledge base. In question matching, the knowledge base is first searched for the legal words and statutes corresponding to the question pair, and then the relationships among the question, legal words, and statutes are modeled simultaneously through a cross-attention model to achieve more accurate question matching. Experiments on multiple legal categories show that the proposed method can effectively improve question matching performance.
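The cross-attention used to relate the question to legal words and statutes is standard scaled dot-product attention; a minimal sketch is given below (illustrative only; the function and tensor names are hypothetical).

```python
import torch
import torch.nn.functional as F

def cross_attention(query_repr, knowledge_repr):
    """Scaled dot-product cross-attention: the question attends to
    knowledge items (legal words / statutes) and returns fused vectors.
    query_repr:     (len_q, d) token representations of the question
    knowledge_repr: (len_k, d) representations of legal words or statutes
    """
    d = query_repr.size(-1)
    scores = query_repr @ knowledge_repr.T / d ** 0.5   # (len_q, len_k)
    weights = F.softmax(scores, dim=-1)                 # attention weights
    return weights @ knowledge_repr                     # (len_q, d)

fused = cross_attention(torch.randn(12, 128), torch.randn(5, 128))
```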
    Available online:  July 15, 2022 , DOI: 10.13328/j.cnki.jos.006413
    Abstract:
Speech translation aims to translate speech in one language into speech or text in another language. Compared with pipeline systems, end-to-end speech translation models have the advantages of low latency, less error propagation, and small storage, so they have attracted much attention. However, an end-to-end model not only needs to process long speech sequences and extract acoustic information, but also needs to learn the alignment between the source speech and the target text, which makes modeling difficult and leads to poor performance. This study proposes an end-to-end speech translation model with cross-modal information fusion, which deeply combines a text-based machine translation model with the speech translation model. To handle the length inconsistency between speech and text, a redundancy filter is proposed to remove redundant acoustic information, making the length of the filtered acoustic representation consistent with that of the corresponding text. To learn the alignment relationship, a parameter sharing method is applied to embed the whole machine translation model into the speech translation model with multi-task training. Experimental results on public speech translation datasets show that the proposed method significantly improves model performance.
    Available online:  July 15, 2022 , DOI: 10.13328/j.cnki.jos.006414
    Abstract:
In recent years, research on big-data processing has had to deal with ever larger data scales and higher data complexity. The frequent use of high-dimensional data poses challenges in applications, such as efficient querying and fast database access. Hence, it is critical to design an effective high-dimensional index to increase query throughput and decrease the memory footprint. Kraska et al. proposed the learned index, which has proved superior on real-world low-dimensional datasets. With the wide adoption of machine learning and deep learning in database management systems, more and more researchers aim to build learned indexes on high-dimensional datasets so as to improve query efficiency. However, current solutions fail to effectively utilize the distribution information of the data and sometimes incur high overhead for initializing complex deep learning models. In this work, an improved high-dimensional learned index (IHDL index) is proposed based on the division of the data space and dimension reduction. Specifically, the index utilizes multiple linear models on the dataset and decreases the initialization overhead while maintaining high query accuracy. Experiments on a synthetic dataset and the OSM dataset verify its superiority in terms of initialization overhead, query throughput, and memory footprint.
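For readers unfamiliar with learned indexes, a one-dimensional linear learned index can be sketched as follows; this is a toy illustration of the general idea, not the IHDL structure.

```python
import bisect
import numpy as np

class LinearLearnedIndex:
    """A toy 1-D learned index: fit key -> position with least squares,
    then search only a bounded window around the predicted position."""

    def __init__(self, keys):
        self.keys = np.sort(np.asarray(keys, dtype=np.float64))
        positions = np.arange(len(self.keys))
        self.slope, self.intercept = np.polyfit(self.keys, positions, deg=1)
        preds = np.rint(self.slope * self.keys + self.intercept)
        self.max_err = int(np.max(np.abs(preds - positions)))  # worst-case error

    def lookup(self, key):
        pred = int(round(self.slope * key + self.intercept))
        lo = max(0, pred - self.max_err)
        hi = min(len(self.keys), pred + self.max_err + 1)
        window = self.keys[lo:hi].tolist()
        i = lo + bisect.bisect_left(window, key)
        return i if i < len(self.keys) and self.keys[i] == key else None

# Example: positions of 1,000 random keys predicted by one linear model.
idx = LinearLearnedIndex(np.random.uniform(0, 1e6, size=1000))
print(idx.lookup(idx.keys[123]))   # -> 123
```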
    Available online:  July 15, 2022 , DOI: 10.13328/j.cnki.jos.006416
    Abstract:
With the development and openness of connected vehicles, the planning component of the intelligent signal system (I-SIG system) faces serious security threats from network attacks. Prior work has revealed that a fixed-frequency data spoofing attack on this planning weakness can cause heavy traffic congestion. However, knowledge about security detection, warning, and defense is still very limited, and no work provides a full time-series quantification and analysis of the congestion situation for attack frequencies ranging from high to low. Targeting the open source I-SIG system and its COP planning algorithm, this study proposes a unified framework to quantify and analyze the congestion situation under spoofing attacks at multiple frequencies from high to low. First, a third-order space-time tensor space is constructed. Based on tensor computation, a function-dependent integrated analysis approach is implemented, in which max-min analysis, stationarity analysis, and correlation analysis are developed. Experiments on the traffic simulation platform VISSIM show the effectiveness of the quantification and analysis and demonstrate that the results are meaningful.
    Available online:  July 15, 2022 , DOI: 10.13328/j.cnki.jos.006417
    Abstract:
Mobile edge computing (MEC) is an efficient technology that enables end users to achieve high bandwidth and low latency by offloading computationally intensive tasks from mobile devices to edge servers. Computation offloading in the mobile edge computing environment plays an important role in reducing user load and enhancing terminal computing capabilities. This study considers service caching and proposes a cloud-edge-end collaborative computation offloading framework into which D2D communication and opportunistic networks are introduced. Based on the established model, the offloading decision problem is transformed into a mixed integer nonlinear programming problem, and an iterative mechanism is formulated for the non-cooperative game among mobile users, taking wireless characteristics into account, to jointly determine the computation offloading plan. It is theoretically proved that the multi-user computation offloading game under this framework is an exact potential game (EPG) and that the offloading decisions converge to the optimal-benefit strategy over the entire network. Taking into account the computing resources of the server, the amount of data of offloading tasks, and the delay requirements of tasks, an improved optimal user association matching algorithm is proposed based on Gale-Shapley matching theory. Finally, simulation results show that the proposed offloading decision algorithm converges faster and outperforms other benchmark algorithms in terms of energy efficiency.
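The paper's algorithm is not spelled out in the abstract; as an illustration of how a potential-game offloading equilibrium can be reached, the sketch below runs generic best-response dynamics over binary offloading decisions (all parameters are hypothetical, and this is not the authors' algorithm).

```python
import random

# Toy parameters (hypothetical): task sizes, CPU speeds, and uplink rates.
tasks     = [300.0, 500.0, 400.0]   # computation demand of each user
local_cpu = [100.0, 150.0, 120.0]   # local CPU speed of each user
edge_cpu  = 1000.0                  # edge server CPU speed, shared by offloaders
uplink    = [200.0, 250.0, 220.0]   # effective uplink rate of each user

def local_delay(i):
    return tasks[i] / local_cpu[i]

def edge_delay(i, decisions):
    sharers = max(sum(decisions), 1)            # offloaders share the edge CPU
    return tasks[i] / uplink[i] + tasks[i] / (edge_cpu / sharers)

def best_response_offloading(n_users, max_rounds=100):
    """Asynchronous best-response dynamics for a binary offloading game.
    For an exact potential game such updates converge to a Nash equilibrium."""
    decisions = [0] * n_users                   # 0 = execute locally, 1 = offload
    for _ in range(max_rounds):
        changed = False
        for i in random.sample(range(n_users), n_users):
            trial = decisions.copy()
            trial[i] = 1
            best = 1 if edge_delay(i, trial) < local_delay(i) else 0
            if best != decisions[i]:
                decisions[i], changed = best, True
        if not changed:                         # no user wants to deviate
            break
    return decisions

print(best_response_offloading(len(tasks)))
```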
    Available online:  July 07, 2022 , DOI: 10.13328/j.cnki.jos.006403
    Abstract:
Deep neural networks have been widely used in fields such as autonomous driving and smart healthcare. Like traditional software, deep neural networks inevitably contain defects, and wrong decisions may cause serious consequences. Therefore, the quality assurance of deep neural networks has received extensive attention. However, deep neural networks differ greatly from traditional software: traditional software quality assurance methods cannot be applied directly, and targeted quality assurance methods need to be designed. Software fault localization is one of the important methods to ensure software quality. Spectrum-based fault localization has achieved good results for traditional software but cannot be directly applied to deep neural networks. In this study, building on traditional software fault localization methods, a spectrum-based fault localization approach named Deep-SBFL is proposed for deep neural networks. The approach first collects the neuron outputs and the prediction results of the deep neural network as the spectrum. The spectrum is then further processed into contribution information, which quantifies the contribution of each neuron to the prediction results. Finally, a suspiciousness formula for defect localization in deep neural networks is proposed; based on the contribution information, the suspiciousness scores of neurons are calculated and ranked to find the most likely defective neurons. To verify the effectiveness of the method, EInspect@n (the number of defects successfully located by inspecting the first n positions of the sorted list) and EXAM (the percentage of elements that must be checked before finding defective elements) were evaluated on a deep neural network trained on the MNIST dataset. Experimental results show that this approach can effectively locate different types of defects in deep neural networks.
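The abstract does not give the suspiciousness formula itself; the sketch below applies the classic Ochiai formula, a common spectrum-based choice, to neuron activation spectra to illustrate the general idea (not necessarily Deep-SBFL's formula).

```python
import math

def ochiai_suspiciousness(act_fail, act_pass, total_fail):
    """Classic Ochiai formula applied to a neuron activation spectrum.
    act_fail:   number of failing (mispredicted) inputs that activated the neuron
    act_pass:   number of passing inputs that activated the neuron
    total_fail: total number of failing inputs
    """
    denom = math.sqrt(total_fail * (act_fail + act_pass))
    return act_fail / denom if denom > 0 else 0.0

def rank_neurons(spectra, total_fail):
    """spectra: dict neuron_id -> (act_fail, act_pass).
    Returns neurons sorted from most to least suspicious."""
    scores = {n: ochiai_suspiciousness(af, ap, total_fail)
              for n, (af, ap) in spectra.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example: neuron 7 is activated mostly by failing inputs -> ranked first.
print(rank_neurons({3: (1, 40), 7: (9, 2)}, total_fail=10))
```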
    Available online:  July 07, 2022 , DOI: 10.13328/j.cnki.jos.006404
    Abstract:
Improving the efficiency of frequent itemset mining on big data is currently a hot research topic. With the continuous growth of data volume, the computing costs of traditional frequent itemset generation algorithms remain high. Therefore, this study proposes a fast frequent itemset mining algorithm based on Spark (Fmafibs for short). Taking advantage of bit-wise operations, a novel pattern growth strategy is designed. First, the algorithm converts itemsets into bit strings (BitStrings) and exploits bit-wise operations to generate candidate itemsets. Second, to improve the processing efficiency for long BitStrings, a vertical grouping strategy is designed: candidate itemsets are obtained by joining the frequent itemsets of different groups within the same transaction, and they are then aggregated and filtered to obtain the final frequent itemsets. Fmafibs is implemented in the Spark environment. Experimental results on benchmark datasets show that the proposed method is correct and significantly improves mining efficiency.
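The core bit-wise idea can be illustrated with a small single-machine sketch (not the Spark implementation): each item is encoded as a bit vector over transactions, and the support of a candidate itemset is the popcount of the bitwise AND.

```python
from itertools import combinations

def to_bitstrings(transactions, items):
    """Encode each item as a bit vector over transactions:
    bit t is set iff transaction t contains the item."""
    bits = {}
    for item in items:
        b = 0
        for t, trans in enumerate(transactions):
            if item in trans:
                b |= 1 << t
        bits[item] = b
    return bits

def frequent_pairs(transactions, min_support):
    """Mine frequent 2-itemsets by ANDing the bit vectors of their items."""
    items = sorted({i for t in transactions for i in t})
    bits = to_bitstrings(transactions, items)
    result = {}
    for a, b in combinations(items, 2):
        support = bin(bits[a] & bits[b]).count("1")   # bitwise AND = co-occurrence
        if support >= min_support:
            result[(a, b)] = support
    return result

print(frequent_pairs([{"a", "b", "c"}, {"a", "b"}, {"b", "c"}], min_support=2))
# {('a', 'b'): 2, ('b', 'c'): 2}
```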
    Available online:  July 07, 2022 , DOI: 10.13328/j.cnki.jos.006405
    Abstract:
Sparsity has always been a primary challenge for recommendation systems, and information fusion recommendation can alleviate this problem by exploiting user preferences through comments, ratings, and trust information, so as to generate corresponding recommendations for target users. Fully learning user and item information is the key to building a successful recommendation system. Different users have different preferences for various items, and users' interest preferences and social circles change dynamically. A recommendation method combining deep learning and information fusion is proposed to address the sparsity problem. In particular, a new deep learning model, an information fusion recommendation model combining attention CNN and GNN (ACGIF for short), is constructed. First, an attention mechanism is added to the CNN to process comment information and learn personalized representations of users and items from it. The model learns comment representations based on comment encoding and learns user/item representations in the comments through user/item encoding, adding a personalized attention mechanism to filter comments of different importance. Then, the rating and trust information is processed through the GNN. For each user, the diffusion process begins with an initial embedding, combining the relevant features and the free user latent vectors that capture latent behavioral preferences. A layered influence propagation structure is designed to simulate how the user's latent embedding evolves as the social diffusion process continues. Finally, the preference vectors obtained from the first two parts are weighted and fused to obtain the user's final preference vector for the item. The MAE and RMSE of the recommendation results are employed as the experimental evaluation indicators on four public datasets. Experimental results show that the proposed model achieves better recommendation performance and running time than seven existing typical recommendation models.
    Available online:  July 07, 2022 , DOI: 10.13328/j.cnki.jos.006406
    Abstract:
Generating coherent topic descriptions from the user comments of case-related topics plays a significant role in quickly understanding case-related news and can be regarded as a multi-document summarization task based on user comments. However, these comments contain a lot of noise, the crucial information for generating summaries is scattered across different comments, and sequence-to-sequence models tend to generate irrelevant and incorrect summaries. Based on these observations, this paper presents a case-related topic summarization method based on a topic interaction graph, which reconstructs the user comments into a topic interaction graph. The motivation is that the graph can express the correlations between different user comments, which is useful for filtering out the key information in them. Specifically, the case elements are first extracted from the user comments, and then the topic interaction graph is constructed, which takes the case elements as nodes and uses the sentences containing these case elements as the nodes' contents; a graph transformer network is then introduced to produce the representation of the graph. Finally, the summary is generated by a standard Transformer-based decoder. Experimental results on the collected case-related topic summarization corpus show that the proposed method effectively selects useful content and can generate coherent and factual topic summaries.
    Available online:  July 07, 2022 , DOI: 10.13328/j.cnki.jos.006408
    Abstract:
The identification of opinion targets in microblogs is the basis for analyzing network public opinion involved in legal cases. At present, opinion target identification methods based on topic representation need to preset a fixed number of topics, and their final results rely on manual inference. To solve these problems, this study proposes a weakly supervised method that uses only a small number of labeled comments to automatically identify the opinion targets in microblogs. The specific implementation is as follows. First, the comments are encoded and reconstructed twice based on a variational dual topic representation network to obtain rich topic features. Second, a small number of labeled comments are used to guide the topic representation network to automatically identify the opinion targets. Finally, the reconstruction loss of the dual topic representation and the classification loss of opinion target identification are jointly optimized through a joint training strategy, so as to automatically classify comments by opinion target and mine target terms. Experiments are carried out on two datasets of case-related microblogs. The results show that the proposed model outperforms several baseline models in opinion target classification, topic coherence, and diversity of target terms.
    Available online:  June 15, 2022 , DOI: 10.13328/j.cnki.jos.006397
    Abstract:
Private set intersection (PSI) is a hot topic in privacy-preserving computation. It allows two parties to compute the intersection of their sets without revealing any additional information beyond the resulting intersection. Prior PSI protocols mostly consider the two-party scenario and have the potential limitation of requiring expensive hardware. In addition, a weak client with low computational capability cannot outsource the computation to a semi-trusted cloud while preserving data privacy. This study designs a new oblivious two-party distributed pseudorandom function (Otd-PRF), which allows semi-trusted cloud servers to participate in the equality test without any leakage of set information. Based on Otd-PRF, a cloud-aided PSI protocol is designed that can delegate the major computation to the semi-trusted cloud. A formal security analysis is provided in the semi-honest model, and the protocol is extended to support the computation of the private set intersection cardinality. Compared with related work, the proposed protocol is superior in computation and communication complexity, both of which are linear in the size of the client's set. The performance analysis shows that the protocol is friendlier to clients with constrained devices in the semi-honest model.
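The Otd-PRF construction is not detailed in the abstract; as background only, the sketch below shows the classic Diffie-Hellman-based PSI, a different and well-known two-party technique, with toy parameters that must not be used in practice.

```python
# Classic DH-based PSI sketch (illustrative background only; this is NOT the
# Otd-PRF cloud-aided protocol of the paper). Both parties blind each encoded
# element with their secret exponent; double-blinded values match iff the
# underlying elements are equal. Toy parameters: do not use in production.
import hashlib
import secrets

P = 2**127 - 1            # a prime modulus (toy size)
G = 5                     # group generator (assumed)

def encode(item: str) -> int:
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return pow(G, h % (P - 1), P)       # map item into the group

def blind(values, secret):
    return [pow(v, secret, P) for v in values]

def psi(set_a, set_b):
    a, b = secrets.randbelow(P - 2) + 1, secrets.randbelow(P - 2) + 1
    # Each party blinds its own encoded set and sends it to the other,
    # who applies the second blinding; exponentiation order commutes.
    a_double = set(blind(blind([encode(x) for x in set_a], a), b))
    b_double = set(blind(blind([encode(x) for x in set_b], b), a))
    common = a_double & b_double
    return [x for x in set_a if pow(encode(x), a * b % (P - 1), P) in common]

print(psi({"alice", "bob", "carol"}, {"bob", "dave"}))   # -> ['bob']
```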
    Available online:  June 15, 2022 , DOI: 10.13328/j.cnki.jos.006399
    Abstract:
Dense depth maps are essential in areas such as autonomous driving and robotics, but today's depth sensors can only produce sparse depth measurements, so depth completion is necessary. Among all auxiliary modalities, RGB images are commonly used and easily obtained. Many current methods use RGB and sparse depth information for depth completion. However, most of them simply use channel concatenation or element-wise addition to fuse the information of the two modalities, without considering the confidence of each modality in different scenarios. This paper proposes a dynamic gated fusion module, which is guided by the sparsity distribution of the input depth and by the features of both RGB and sparse depth, and thus fuses the two modal features more effectively by generating dynamic weights. An efficient feature extraction structure is also designed according to the data characteristics of each modality. Comprehensive experiments show the effectiveness of each module. The proposed network uses a lightweight model to achieve advanced results on two challenging public datasets, KITTI depth completion and NYU Depth V2, which shows that the method achieves a good balance between performance and speed.
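A minimal sketch of gated fusion of two modal feature maps is shown below (a generic pattern with assumed layer choices; the paper's module additionally conditions on the sparsity distribution of the input depth).

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuses RGB and depth feature maps with per-pixel dynamic weights
    instead of plain concatenation or element-wise addition."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),          # per-pixel, per-channel weights in (0, 1)
        )

    def forward(self, rgb_feat, depth_feat):
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        return w * rgb_feat + (1.0 - w) * depth_feat

# Example: fuse two 64-channel feature maps.
fusion = GatedFusion(64)
out = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```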
    Available online:  June 15, 2022 , DOI: 10.13328/j.cnki.jos.006400
    Abstract:
High-dimensional data are widely used in the real world. However, high-dimensional data usually contain plenty of redundant and noisy information, which accounts for the poor performance of many traditional clustering algorithms on such data. In practice, it is found that the cluster structure of high-dimensional data is often embedded in a lower-dimensional subspace. Therefore, dimension reduction has become a key technology for mining high-dimensional data. Among the many dimension reduction methods, graph-based methods have become a research hotspot. However, most graph-based dimension reduction algorithms suffer from the following two problems: (1) they need to compute or learn adjacency graphs, which incurs high computational complexity; (2) the ultimate purpose of the dimension reduction is not considered during the reduction process. To address these problems, a fast unsupervised dimension reduction algorithm based on maximum entropy (MEDR) is proposed, which combines linear projection with the maximum entropy clustering model to find, through an effective iterative optimization algorithm, the potential optimal cluster structure of high-dimensional data embedded in a low-dimensional subspace. The MEDR algorithm does not need an adjacency graph as input in advance, and its time complexity is linear in the size of the input data. Extensive experimental results on real datasets show that, compared with traditional dimensionality reduction methods, MEDR finds a better projection matrix for projecting high-dimensional data into a low-dimensional subspace, so that the projected data are more amenable to clustering analysis.
    Available online:  June 15, 2022 , DOI: 10.13328/j.cnki.jos.006401
    Abstract:
To reduce the labor cost of bug localization, researchers have proposed various automated information retrieval based bug localization models (IRBL), including models leveraging traditional features and deep learning based features. When evaluating the effectiveness of IRBL models, most existing studies neglect the following problems: the software version mismatch between bug reports and the corresponding source code files in the testing data, and/or the data leakage caused by the chronological order of bug reports when training and testing the models. This study investigates the performance of existing models in realistic experimental settings and analyzes the impact of version mismatch and data leakage on the real performance of each model. First, six traditional information retrieval-based models (BugLocator, BRTracer, BLUiR, AmaLgam, BLIA, and Locus) and one novel deep learning model (CodeBERT) are selected as the research objects. Then, an empirical analysis is conducted on eight open-source projects under five different experimental settings. The experimental results demonstrate that the effectiveness of directly applying CodeBERT to bug localization is not as good as expected, since its accuracy depends on the version and source code size of the test project. Second, the results also show that, compared with the traditional version-mismatched experimental setting, the traditional information retrieval-based models under the version-matched setting can achieve improvements of up to 47.2% and 46.0% in terms of MAP and MRR, respectively. Meanwhile, the effectiveness of the CodeBERT model is also affected by both data leakage and version mismatch. This means that the effectiveness of traditional information retrieval-based bug localization is underestimated, while the application of deep learning based CodeBERT to bug localization still needs more exploration.
    Available online:  June 06, 2022 , DOI: 10.13328/j.cnki.jos.006642
    Abstract:
The computation offloading problem in multi-access edge computing (MEC) has become one of the hot topics in current research. Existing computation offloading schemes only consider the cloud-edge-end structure but do not consider the attributes of public and private clouds. In this paper, a novel computation offloading scheme is proposed that considers the relationship between the public cloud and the private cloud in edge computing and uses the public cloud as a supplement to private cloud resources, thereby alleviating the insufficient computing power caused by the limitation of private cloud resources. A two-layer Stackelberg game is established to solve the computation offloading problem: the optimal strategy of each player is obtained, and the existence and uniqueness of the Nash equilibrium solution of the two-layer game are proved. Simulation results and analysis show that the computation offloading scheme based on the two-layer Stackelberg game is feasible, more efficient, and better suited to edge computing environments than a scheme based on a single-layer Stackelberg game.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006645
    Abstract:
Event extraction aims to automatically extract event information that users are interested in from unstructured natural language text and to express it in a structured form. Event extraction is an important direction in natural language processing and understanding and has high application value in fields such as government public affairs management, finance, and biomedicine. According to the degree of dependence on manually labeled data, current deep learning based event extraction methods are mainly divided into two categories: supervised learning and distantly supervised learning. This article provides a comprehensive overview of current deep learning based event extraction techniques. Focusing on supervised methods based on CNN, RNN, GAN, and GCN as well as distantly supervised methods, research in recent years is systematically summarized. Besides, the performance of different deep learning models is compared and analyzed in detail. Finally, the challenges facing event extraction are analyzed and research trends are forecast.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006650
    Abstract:
Discourse structure analysis aims to understand the overall structure of a document and the semantic relationships between its various parts. As a research hotspot of natural language processing, it has developed rapidly in recent years. This paper first summarizes the mainstream discourse structure analysis theories for English and Chinese and then introduces research on the popular English and Chinese discourse corpora together with the relevant computational models. On this basis, this paper surveys the current landscape of discourse structure analysis in Chinese and English and constructs a research framework for discourse structure analysis. Moreover, the application of discourse structure to downstream tasks is briefly introduced. Finally, this paper points out the issues and challenges in current Chinese discourse structure analysis to provide guidance and help for future research.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006657
    Abstract:
Lattice-based cryptanalysis, which uses algorithms for solving hard lattice problems to analyze the security of public key cryptosystems, has become one of the powerful mathematical tools for studying the security of RSA and its variants. The key to this method is the construction of the lattice basis. In 2006, Jochemsz and May proposed a general lattice basis construction strategy. However, the general strategy does not make full use of the algebraic structure of RSA algorithms. In recent years, lattice-based cryptanalysis of RSA has mostly focused on the algebraic structure of specific algorithms and used special lattice construction techniques to achieve better results. In this paper, we first introduce the lattice-based cryptanalysis method and the general lattice basis construction strategy, and abstract several common lattice construction techniques from previous works. Second, we survey the main progress on lattice-based cryptanalysis of standard RSA, covering factoring with known bits, small secret exponent attacks, and partial key exposure attacks. Then, we summarize the special algebraic structures of several commonly used RSA variants and the lattice construction techniques applicable to them. Finally, we classify and summarize the lattice-based cryptanalysis works on RSA, and discuss the prospects and future work of lattice-based cryptanalysis of RSA.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006666
    Abstract:
During software development, developers make extensive use of third-party libraries to relieve themselves of heavy burdens instead of reinventing common functions. There are dependencies between different third-party libraries, and incompatibilities between versions can lead to errors when installing, loading, or invoking third-party libraries, resulting in system exceptions. Such a problem is called a Dependency Conflict (DC, also referred to as Conflict Dependency or CD) issue. The root cause of this issue is that the third-party libraries fail to cover required features (e.g., methods). DC issues often occur at a project's build time or runtime and are difficult to locate. Repairing DC issues requires developers to understand the differences among the versions of the third-party libraries they use, and the complex relationships between the third-party libraries increase the difficulty of repair. In order to find DC issues before the software runs, and to respond to and deal with system anomalies caused by DC issues at runtime, researchers have carried out various studies on these issues. This paper conducts a systematic review of this research topic from four aspects: the usage analysis of third-party libraries, the root causes of DC issues, the detection methods for DC issues, and common fixing strategies. Finally, potential future research opportunities are discussed, and references are provided for researchers in this field.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006671
    Abstract:
Inverse reinforcement learning (IRL), also known as inverse optimal control (IOC), is a subfield of imitation learning and reinforcement learning. To learn expert behavior, IRL methods infer a reward function from expert demonstrations and then adopt a reinforcement learning algorithm to find the desired behavior. In recent years, IRL methods have received a lot of attention and have been successfully used to solve a variety of tasks, such as vehicle navigation, trajectory planning, and robotic optimal control. First, the fundamental theories, including the formal definition of IRL, are presented. Then, we introduce the research progress of IRL methods, including algorithms based on linear and non-linear reward functions, such as maximum margin approaches and maximum entropy approaches. In addition, from the frontier research directions of inverse reinforcement learning, we introduce and analyze representative algorithms, including IRL with incomplete expert demonstrations, multi-agent IRL, IRL with sub-optimal expert demonstrations, and guided IRL. Finally, we summarize the primary challenges and future developments of inverse reinforcement learning methods.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006672
    Abstract:
The basic idea of the multiobjective evolutionary algorithm based on decomposition (MOEA/D) is to transform a multiobjective optimization problem into a set of subproblems (single-objective or multiobjective). Since MOEA/D was proposed in 2007, it has attracted extensive attention from scholars all over the world and has become one of the most representative multiobjective evolutionary algorithms. This paper summarizes the research progress on MOEA/D over the past thirteen years, including: (1) improvements to MOEA/D; (2) research on MOEA/D for many-objective optimization problems and constrained optimization problems; (3) applications of MOEA/D to real-world problems. The paper then experimentally compares the performance of several representative improved variants of MOEA/D. Finally, several potential future research topics for MOEA/D are presented.
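As background, a commonly used decomposition in MOEA/D is the Tchebycheff scalarization, which defines each subproblem as minimizing the value sketched below (a standard formulation, not tied to any specific variant surveyed here).

```python
def tchebycheff(f_values, weights, z_star):
    """Tchebycheff scalarization used by MOEA/D to define one subproblem:
    g(x | w, z*) = max_i w_i * |f_i(x) - z*_i|.
    f_values: objective values of solution x
    weights:  weight vector of the subproblem
    z_star:   ideal point (best value seen for each objective)"""
    return max(w * abs(f - z) for f, w, z in zip(f_values, weights, z_star))

# A solution is better for this subproblem if its scalarized value is smaller.
print(tchebycheff([0.4, 0.7], weights=[0.5, 0.5], z_star=[0.0, 0.0]))  # 0.35
```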
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006677
    Abstract:
Distributed Ledger (DL) is usually considered a distributed data management architecture. It maintains data records (the ledgers) across distributed nodes based on consensus mechanisms and protocols. A DL system can trace all information on data ownership, usage, and trading chains throughout the lifecycle of data production and transactions, and protects the data from illegal use through tamper resistance and non-repudiation. It can thus provide endorsement for data rights confirmation, protection, and auditing. Blockchain is a typical implementation of a DL system. With the emergence of digital economy applications such as digital currency and data asset trading, DL technologies are gaining more and more attention. However, system performance is one of the key technical bottlenecks for the large-scale application of DL systems, and performance optimization has become a research focus of academia and industry. This paper investigates the methods, technologies, and typical solutions of DL performance optimization from the following four perspectives: DL system architecture, ledger data structure, consensus mechanism, and message communication.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006679
    Abstract:
Deep learning (DL) systems have powerful learning and reasoning capabilities and are widely used in many fields, such as unmanned vehicles, speech processing, and intelligent robotics. Due to limited datasets and the dependence on manually labeled data, the erroneous behaviors of DL systems are often hard to detect. Accordingly, the quality of DL systems has received widespread attention, especially in safety-critical fields. Since fuzzing has shown efficient fault-detecting ability in traditional programs, employing fuzzing to test DL systems has become a hot research topic in recent years. In this study, we present a systematic review of fuzzing for DL systems, focusing on test case generation (including seed queue construction, seed selection, and seed mutation), test result determination, and coverage analysis, and then introduce commonly used datasets and metrics. We also discuss issues and opportunities for future research in this field.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006680
    Abstract:
    In recent years, deep learning technology has made remarkable progress in many computer vision tasks, and more and more researchers have tried to apply it to medical image processing, such as the segmentation of anatomical structures in high-throughput medical images (CT, MRI), which can improve the efficiency of image reading for doctors. For specific deep learning tasks in medical applications, the training of deep neural networks requires a large amount of labeled data. In the medical field, however, it is extremely hard to obtain large amounts of data, even unlabeled data, from a single medical institution. Moreover, due to differences in medical equipment and acquisition protocols, the data from different medical institutions differ considerably. This large heterogeneity of data makes it difficult for a model trained with data from other medical institutions to obtain reliable results on the data of a certain institution. In addition, the distribution of disease stages in a dataset is often very uneven, which may also reduce the reliability of the model. In order to reduce the impact of data heterogeneity and improve the generalization ability of models, domain adaptation and multi-site learning have gradually come into use. Domain adaptation, a research hotspot in transfer learning, is intended to transfer knowledge learned from the source domain to unlabeled target domain data, while federated learning on non-independent and identically distributed (non-iid) data aims to improve the robustness of the model by learning a common representation on multiple datasets. This paper investigates, analyzes, and summarizes domain adaptation, multi-site learning, federated learning on non-iid data, and related datasets in recent years, and provides references for related research.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006681
    Abstract:
    A heterogeneous information network is a representation form of heterogeneous data, and how to integrate the complex semantic information of such data is one of the challenges faced by recommendation systems. A high-order embedding learning framework for heterogeneous information networks based on weak ties is constructed by exploiting the rich semantic information and powerful information transmission capability of weak ties. It mainly includes three modules: initial information embedding, high-order information embedding aggregation, and recommendation prediction. The initial information embedding module first adopts the best trust path selection algorithm to avoid the information loss caused by sampling a fixed number of neighbors in a full-relational heterogeneous information network. Then network nodes are effectively characterized by filtering the semantic information of each node with a newly defined multi-task shared feature importance measurement factor based on multi-head attention, combined with the interaction structure. The high-order information embedding aggregation module realizes the expression of high-order information by integrating the representational ability of weak ties with network embedding, and the hierarchical propagation mechanism of the heterogeneous information network is utilized to aggregate the features of sampled nodes into the nodes to be predicted. The recommendation prediction module uses the influence recommendation method based on high-order information to complete the recommendation. The UI-HEHo framework has the characteristics of rich embedded node types, fusion of shared attributes, and implicit interaction information. Finally, experiments verify that UI-HEHo can effectively improve the accuracy of rating prediction, as well as the pertinence, novelty, and diversity of recommendation generation. Especially in application scenarios with sparse data, UI-HEHo achieves a good recommendation effect.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006520
    Abstract:
    In view of the fact that syntactic relationships are not fully utilized and argument roles are missing in event extraction, an event extraction method based on a dual attention mechanism (EEDAM) is proposed to improve the accuracy and recall rate of event extraction. Firstly, sentence coding is based on four embedded vectors, and dependency relations are introduced to construct a dependency relation graph, so that the deep neural network can make full use of syntactic relations. Then, through a graph transformation attention network, new dependency arcs and aggregated node information are generated to capture long-range dependencies and potential interactions; a weighted attention network is integrated to capture the key semantic information in sentences, and sentence-level event arguments are extracted to improve the prediction ability of the model. Finally, key sentence detection and similarity ranking are used to fill in document-level arguments. The experimental results show that the event extraction method based on the dual attention mechanism improves the accuracy rate, recall rate, and F1-score by 17.82%, 4.61%, and 9.80%, respectively, compared with the optimal baseline joint multiple Chinese event extractor (JMCEE) on the ACE2005 dataset. On the dataset of dam safety operation records, the accuracy, recall rate, and F1-score are 18.08%, 4.41%, and 9.93% higher than the optimal baseline JMCEE, respectively.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006419
    Abstract:
    With the development of cloud computing, more and more multimedia data is stored in the cloud. For security needs, it is often necessary to encrypt images before uploading them to the cloud for storage or computing. To verify the integrity of image information and the authenticity of its content without knowing the plaintext of the encrypted image, an image hash algorithm based on Paillier homomorphic encryption is proposed. The algorithm consists of three parts: the image owner encrypts the image, the cloud server generates a ciphertext image hash, and the receiver generates a plaintext image hash. Specifically, the image owner encrypts the image and uploads the encrypted image to the cloud server. The cloud server uses the Paillier cryptosystem to compute the DCT and Watson human visual features in the encrypted domain, and uses a key-controlled pseudo-random matrix to increase the randomness of the ciphertext hash, thereby improving the security of the hash. The receiver decrypts and analyzes the received ciphertext hash to obtain the plaintext image hash. Experimental results show that the proposed algorithm has ideal performance in terms of robustness, uniqueness, and security.
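    The scheme above relies on the additive homomorphism of the Paillier cryptosystem: multiplying two ciphertexts modulo n^2 yields an encryption of the sum of the plaintexts, which is what lets a server evaluate linear operations such as the DCT on encrypted pixels. Below is a textbook-style sketch with toy parameters, for illustration only and not the paper's implementation; real deployments use large random primes.

        import math, random

        def paillier_demo():
            # Toy parameters; real systems use large random primes.
            p, q = 1789, 1861
            n, n2 = p * q, (p * q) ** 2
            lam = math.lcm(p - 1, q - 1)
            g = n + 1
            mu = pow(lam, -1, n)              # valid because g = n + 1

            def enc(m):
                r = random.randrange(2, n)
                while math.gcd(r, n) != 1:    # r must be coprime to n
                    r = random.randrange(2, n)
                return (pow(g, m, n2) * pow(r, n, n2)) % n2

            def dec(c):
                u = pow(c, lam, n2)
                return ((u - 1) // n * mu) % n

            c1, c2 = enc(123), enc(456)
            assert dec((c1 * c2) % n2) == (123 + 456) % n   # additive homomorphism
            assert dec(pow(c1, 7, n2)) == (123 * 7) % n     # scalar multiplication
            return "ok"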
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006307
    Abstract:
    One of the crucial tasks in the field of natural language processing (NLP) is identifying suitable idioms according to context. The available research treats the Chinese idiom cloze task as a textual similarity task. Although current pre-trained language models play an important role in textual similarity, they also have apparent defects: when used as a feature extractor, a pre-trained language model ignores the mutual information between sentences, while as a text matcher it requires high computational cost and long running time. In addition, the matching between context and candidate idioms is asymmetric, which influences the effect of the pre-trained language model as a text matcher. To solve these two problems, this paper is motivated by the idea of parameter sharing and proposes a TALBERT-blank network. Idiom selection is transformed by TALBERT-blank from a context-based asymmetric matching process into a blank-based symmetric matching process. The pre-trained language model acts as both a feature extractor and a text matcher, and sentence vectors are utilized for latent semantic matching. This greatly reduces the number of parameters and the memory consumption, and improves the speed of training and inference while maintaining accuracy, producing a lightweight and efficient model. The experimental results on the CHID dataset show that, compared with the ALBERT text matcher, the calculation time is shortened by 54.35 percent while maintaining accuracy, and the compressed model achieves an even greater reduction.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006354
    Abstract:
    Learned indexes are capable of predicting the accurate location of data in storage by learning the data distribution. These indexes can significantly reduce storage consumption while providing efficient query processing. Existing learned indexes are mostly optimized for read-only queries but are inadequate in supporting insertions and updates. To address the challenges faced by learned indexes, this paper proposes a workload-adaptive learned index named ALERT. Generally, ALERT employs a radix tree to manage variable-length segments, where each segment contains a linear interpolation model with a maximum error bound. Meanwhile, ALERT utilizes an insertion memory buffer to reduce the cost of updates. Following the database-cracking approach, the paper proposes adaptive index maintenance during the run-time processing of point queries and range queries; the maintenance is implemented by performing workload-aware dynamic re-organization of the insertion buffer. Experimental results confirm that, compared with the state-of-the-art learned index, ALERT achieves competitive results, reducing the average index construction time by 81%, the average memory utilization by 75%, and the average insertion latency by 50%, while maintaining competitive read performance. Owing to its effective workload-aware optimization, the average query latency of ALERT is also reduced by 15%.
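    The per-segment lookup idea described above can be sketched as a linear model plus a maximum error bound: the model predicts a position, and a bounded local search corrects it. The class below is an illustrative simplification under these assumptions, not ALERT's actual code.

        from bisect import bisect_left

        class Segment:
            """Linear interpolation model over a sorted key array, with a max error bound."""
            def __init__(self, keys):
                self.keys = keys
                first, last = keys[0], keys[-1]
                self.slope = (len(keys) - 1) / (last - first) if last != first else 0.0
                self.intercept = -self.slope * first
                # Largest error the linear model makes on the indexed keys.
                self.err = max(abs(self._predict(k) - i) for i, k in enumerate(keys))

            def _predict(self, key):
                return int(round(self.slope * key + self.intercept))

            def lookup(self, key):
                pos = self._predict(key)
                lo = max(0, pos - self.err)
                hi = min(len(self.keys), pos + self.err + 1)
                i = lo + bisect_left(self.keys[lo:hi], key)  # search only inside the error bound
                return i if i < len(self.keys) and self.keys[i] == key else None

        seg = Segment([3, 7, 19, 25, 31, 48, 60])
        assert seg.lookup(25) == 3 and seg.lookup(26) is None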
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006684
    Abstract:
    Distributed systems play an important role in the computing environment, and consensus protocols are used to guarantee consistency between nodes. A design error in a consensus protocol may cause the system to fail and may bring catastrophic consequences to humans or the environment. Therefore, proving the correctness of consensus protocols is very important. Formal verification can strictly prove the correctness of target properties in a design model, which makes it suitable for verifying consensus protocols. However, as the scale of distributed systems increases, verifying their correctness becomes more complicated, which brings more challenges to formal verification techniques. Which methods to use to formally verify the design of consensus protocols and how to increase the verification scale are important research issues in this area. This paper surveys the current research on using formal methods to verify consensus protocols, summarizes the key techniques and main methods, and proposes future research directions in this field.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006686
    Abstract:
    NL2SQL refers to a technology that automatically converts a query expressed in natural language into a structured SQL expression, which can be parsed and executed by the DBMS. NL2SQL can provide ordinary users with a natural interactive interface for database query access, thereby realizing question answering atop database systems. NL2SQL for complex queries is now a research hotspot in the database community. The most prevalent approach uses the sequence-to-sequence (Seq2seq) encoder and decoder to convert complex natural language to SQL. However, most of the existing work focuses on the English language, and this approach is not ready to address the special colloquial expressions in Chinese queries. In addition, the existing work cannot correctly output query clauses containing complex calculation expressions. To solve the above problems, this paper proposes to use a tree model instead of the sequence representation. The proposed approach disassembles complex queries from top to down into a multi-way tree, where the tree nodes represent the elements of SQL, and uses a depth-first search to predict and generate SQL statements. Our approach achieved the championship and 1st runner-up in two official tests of the DuSQL Chinese NL2SQL competition. The experimental results confirm the effectiveness of the proposed approach.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006535
    Abstract:
    Heterogeneous information networks can be used to model many applications in the real world, and their representation learning has received extensive attention from scholars. Most representation learning methods extract structural and semantic information based on meta-paths, and their effectiveness in network analysis has been proved. However, these methods ignore node-internal information and the different degrees of importance of meta-path instances; besides, they can capture only local node information. Thus, this paper proposes a heterogeneous network representation learning method fusing mutual information and multiple meta-paths. First, a meta-path internal encoding method called relational rotation encoding is used, which captures the structural and semantic information of the heterogeneous information network according to adjacent nodes and meta-path context nodes, and an attention mechanism is used to model the importance of each meta-path instance. Then, an unsupervised heterogeneous network representation learning method fusing mutual information maximization and multiple meta-paths is proposed, where mutual information can capture both global and local information. Finally, experiments are conducted on two real datasets. Compared with current mainstream algorithms as well as some semi-supervised algorithms, the results show that the proposed method has better performance on node classification and clustering.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006634
    Abstract:
    Interference among wireless signals hinders concurrent transmissions and thus decreases the throughput of wireless networks. It is well known that link scheduling is an effective way to improve throughput and decrease transmission delay in wireless networks. The SINR (signal to interference plus noise ratio) model accurately describes the inherent characteristics of wireless signal propagation. Therefore, this paper proposes an online distributed link scheduling (OLD_LS) algorithm with a constant approximation factor under the SINR model. Here, online means that nodes can join and leave the wireless network; nodes joining or leaving the network arbitrarily reflects the dynamic characteristics of wireless networks. OLD_LS partitions the network region into hexagons and localizes the SINR model, which is a global interference model. A leader election (LE) subroutine for dynamic networks is also proposed. It is shown that if the dynamic rate of nodes is less than 1/ε, LE elects a leader with high probability and the time complexity is O(log n + log R), where ε is a constant satisfying ε ≤ 5(1 − 2^{1−α/2})/6, with α being the path loss exponent, n the number of senders, and R the longest link length. To the best of our knowledge, the algorithm proposed in this paper is the first online distributed link scheduling algorithm for dynamic wireless networks.
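    For reference, under the standard path-loss model assumed by such work, the SINR of a link is the received signal power divided by noise plus the interference from all concurrent senders, and a transmission succeeds when this ratio exceeds a hardware-dependent threshold β. A small illustrative sketch with hypothetical parameters:

        def sinr(receiver, sender, other_senders, power=1.0, alpha=3.0, noise=1e-9):
            """Signal-to-interference-plus-noise ratio under the path-loss model."""
            def dist(a, b):
                return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

            signal = power * dist(sender, receiver) ** (-alpha)
            interference = sum(power * dist(s, receiver) ** (-alpha) for s in other_senders)
            return signal / (noise + interference)

        # A link (sender, receiver) can be scheduled concurrently with other_senders
        # when sinr(...) >= beta for the hardware-dependent threshold beta.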
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006635
    Abstract:
    Network measurement is the basis of research on network performance monitoring, traffic management, and fault diagnosis. In-band network telemetry has become a hot issue in current network measurement research due to its real-time capability, accuracy, and scalability. With the emergence and development of programmable data planes, many practical in-band network telemetry solutions have been proposed thanks to their rich information feedback and flexible function deployment. First, we analyze the principles and deployment challenges of the typical in-band network telemetry solutions INT and AM-PM. Second, according to the optimization measures and extensions of in-band network telemetry, we analyze the characteristics of the optimization mechanisms from the aspects of the data collection process and multi-task orchestration, and analyze the feasibility of technology extension in wireless networks, optical networks, and hybrid networks. Third, we compare and analyze the latest applications in in-network performance sensing, network-level telemetry systems, traffic scheduling, and fault diagnosis. Finally, we summarize the research work on in-band network telemetry and highlight future research directions.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006636
    Abstract:
    This paper proposes new classical key recovery attacks against Feistel, Misty, and Type-1/2 generalized Feistel schemes. The new key recovery attacks are constructed by combining the birthday attack with the periodic property exploited by Simon's algorithm, which differs from previous classical attacks. Using Simon's quantum algorithm, an adversary can recover the periodic value in polynomial time in the quantum setting, whereas in the classical setting the birthday bound is required to recover candidate values for the period. By combining the periodic property with the birthday attack, our chosen-ciphertext key recovery attack can recover the key of a 5-round Feistel-F in O(2^{3n/4}) time with O(2^{n/4}) chosen plaintexts and ciphertexts. The memory complexity of the above attack is O(2^{n/4}). Compared with Isobe's result, our new result not only covers one more round but also requires lower memory complexity. For the Feistel-FK structure, we construct a 7-round key recovery attack. In addition, we apply the above approach to construct key recovery attacks against Misty schemes and the Type-1/2 generalized Feistel schemes. In detail, this paper not only proposes key recovery attacks against the 5-round Misty L-F and Misty R-F, but also shows key recovery attacks against the 6-round Misty L-KF/FK and Misty R-KF/FK, respectively. In addition, this paper constructs a d^2-round key recovery attack for the d-branch Type-1 generalized Feistel scheme. Furthermore, when d ≥ 6 and d is even, we propose a better key recovery attack for the d-branch Type-2 generalized Feistel scheme than previous work.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006637
    Abstract:
    Code comment generation has been an important research task in the field of software engineering in the past few years. Some existing work has achieved impressive results on open source datasets containing a large number of <code snippet, comment> pairs. However, in the practice of software enterprises, the code to be commented usually belongs to a software project. Different from the code snippets in open source datasets, the code in a software project varies in length and granularity, and developers need to know not only how to write comments but also where to add them, namely the commenting decision. In this paper, we propose CoComment, a software project-oriented code comment generation approach. This approach automatically extracts domain-specific concepts from software documents, then propagates and expands these concepts by code parsing and text matching. On this basis, automatic code commenting decisions are made by locating code lines or segments related to these concepts, and corresponding natural language comments are generated by fusing concepts and context. We conduct comparative experiments on three software projects containing more than 46,000 manually annotated code comments. The experimental results demonstrate that our approach makes code commenting decisions accurately and generates more helpful comments than existing work, effectively addressing the problem of automatic code commenting for software projects.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006638
    Abstract:
    With the rapid development of the Internet of Things (IoT) and cloud computing, the portable health clinic (PHC) has been realized and widely used in telemedicine. Owing to the significant advantages of 5G communication, China has actively promoted the construction of intelligent medicine and built a multi-functional telemedicine information service platform. The realization of telemedicine is inseparable from the support of remote data sharing systems. At present, the PHC data sharing system uses a network architecture combining the IoT and cloud computing, but its privacy and security issues are rarely studied. This paper focuses on security and privacy when sharing data in the PHC system. We realize the secure upload of IoT data, normalization of personalized ciphertext, dynamic multi-user fine-grained access control, efficient decryption operations, and formal security verification. This paper first improves the classical proxy re-encryption and attribute-based encryption algorithms and proposes an IPRE-TO-FAME combined encryption mechanism suitable for the network architecture with IoT and cloud computing. Addressing the challenge of key updates caused by a large number of distributed IoT terminals, this paper borrows the idea of proxy re-encryption (PRE) to realize key updates based on a unilateral transformation without changing the IoT devices' keys. At the same time, as the setting in this paper differs from that of conventional PRE, the re-encryption entity can be regarded as fully trusted; this paper improves the conventional PRE algorithm and implements an efficient IPRE (improved PRE) algorithm. Thirdly, the classic FAME (fast attribute-based message encryption) mechanism is improved to realize dynamic multi-user fine-grained access control, making it convenient for users to access data anytime and anywhere with portable intelligent devices. Security proofs, theoretical analysis, and experimental results show that the scheme proposed in this paper is secure and practical, and it is an effective solution to the problem of secure PHC data sharing.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006615
    Abstract:
    Asynchronous programs use asynchronous non-blocking calls to achieve program concurrency and are widely used in parallel and distributed systems. The complexity of verifying asynchronous programs is very high, for both safety and liveness properties. This paper proposes a program model of asynchronous programs and defines two problems on them: the equivalence problem and the reachability problem. By reducing 3-CNF-SAT to these two problems, and then reducing them to the reachability problem of communication-free Petri nets, we prove that both problems are NP-complete. A case study shows that these two problems can solve a series of program verification problems on asynchronous programs.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006616
    Abstract:
    The reliable functioning of safety-critical IT systems depends heavily on the correct execution of program code. Deductive program verification can be performed to provide a high level of correctness guarantees for computer programs. There is a plethora of different programming languages in use, and new languages oriented toward high-reliability scenarios are still being invented. It can be difficult to devise for each such language a full-fledged logical system supporting the verification of programs, and to prove the soundness and completeness of the logical system with respect to the formal semantics of the language. Language-independent verification techniques offer sound verification procedures parameterized over the formal semantics of programming languages; specializing the verification procedure with the formal semantics of a concrete programming language directly gives rise to a verification procedure for that language. In this article, we propose a language-independent verification technique based on big-step operational semantics. The technique features a unified procedure for sound reasoning about program structures that potentially cause unbounded behavior, such as iteration and recursion. In particular, we employ a functional formalization of big-step semantics to support the explicit representation of the computation performed by the sub-structures of a program. This representation enables the exploitation of the auxiliary information provided for these sub-structures in the unified reasoning process. We prove the soundness and relative completeness of the proposed technique, evaluate it using verification examples in imperative and functional programming languages, and mechanize all the formal results and verification examples in the Coq proof assistant. The development provides a basis for implementing a language-independent program verifier based on big-step operational semantics in a proof assistant.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006617
    Abstract:
    With the development of the Internet, we usher in the 5th generation of mobile communication technology (5G). The 5G authentication and key agreement (5G-AKA) protocol is proposed mainly to achieve two-way authentication between users and service networks. However, recent research suggests that it may be subject to information deciphering and message replay attacks, and we found that some current variants of 5G-AKA cannot satisfy unlinkability. Therefore, in response to the above shortcomings, we propose an improvement called SM-AKA. SM-AKA consists of two parallel sub-protocols designed in a novel way: through clever mode switching, the lighter sub-protocol (GUTI submodule) is adopted frequently, while the other sub-protocol (SUPI submodule) deals with abnormalities arising during authentication. This mechanism not only realizes efficient authentication but also improves the stability of the protocol. The freshness of variables is also effectively maintained, which prevents message replay, and strict encryption and decryption methods further improve the security of the protocol. Finally, we carry out a complete evaluation of SM-AKA. Through formal modeling, attack assumptions, and Tamarin derivation, we prove that the scheme can achieve the authentication and privacy goals, and the theoretical analysis also shows the correctness of the protocol design.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006423
    Abstract:
    Compared with witness encryption, offline witness encryption is more widely applicable in practice because of its high efficiency obtained by transferring the heavy computation to the setup phase. However, most current offline witness encryption schemes only satisfy selective security, that is, the adversary must commit to a pair of challenge messages (m0, m1) and an instance x before obtaining the public parameters. Chvojka et al. proposed an offline witness encryption construction that achieves semi-adaptive security by introducing puncturable encryption. Semi-adaptive security permits the adversary to choose challenge messages adaptively, but the instance of the considered NP language used to create the challenge ciphertext must still be fixed before the adversary gets the public parameters (ppe, ppd); therefore, they left constructing offline witness encryption schemes with fully adaptive security as an open problem. This study proposes the first offline witness encryption scheme that achieves fully adaptive security. The setup algorithm outputs public parameters (ppe, ppd), where the encryption key ppe contains two public keys, a common reference string, and a commitment, and the decryption key ppd is an obfuscated circuit. This algorithm needs to be run only once, and the parameters can be used for arbitrarily many encryptions. The encryption algorithm outputs a Naor-Yung ciphertext using a key encapsulation mechanism and a non-interactive witness indistinguishable proof system. The problem of having to output the challenge plaintext in advance in the proof of selective security is solved by selecting the encapsulation key in advance. In addition, the proposed scheme can be turned directly into a functional offline witness encryption scheme, realizing the reuse of the decryption key for a function f by embedding f into the decryption key in the key generation phase.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006594
    Abstract:
    Providing safe, reliable, and efficient decisions is a challenging issue in the field of autonomous driving. At present, with the vigorous development of the autonomous driving industry, various behavioral decision-making methods have been proposed. However, autonomous driving decisions are influenced by uncertainties in the environment, and the decisions themselves also require effectiveness and high security; current methods can hardly cover all of these issues. Therefore, we propose an autonomous driving decision-making approach with a RoboSim model based on the Bayesian network. Semantic relationship information in driving scenarios is modeled by a domain ontology, and an LSTM model for intention prediction of dynamic entities in scenarios is combined with it to provide driving scenario information for the Bayesian network. Using the decisions inferred by the Bayesian network, we abstract a specific RoboSim model for autonomous driving behavior decision-making, which is platform-independent and can simulate the execution cycle of decisions. In addition, the RoboSim model can be transformed into other formal verification models; in this paper, we use the model checking tool UPPAAL for verification and analysis to ensure the safety of the decision-making model. Combined with the case of a lane-change overtaking scenario, the feasibility of the Bayesian network and RoboSim model construction method for autonomous driving behavior decision-making is illustrated, which lays a foundation for providing a safe and efficient autonomous driving decision-making approach.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006492
    Abstract:
    For a given set of moving objects, a continuous k nearest neighbor (CkNN) query q over moving objects is to quickly identify and monitor the k nearest objects as the objects and the query point evolve. In real life, many location-based applications in transportation, social networks, e-commerce, and other fields involve the basic problem of processing CkNN queries over moving objects. Most existing work on CkNN queries needs to determine a query range containing the k nearest neighbors through multiple iterations, while each iteration has to identify the number of objects in the current query range, which dominates the query cost. To address this issue, this work proposes a dual index called GGI, which consists of a grid index and a Gaussian mixture function that simulates the varying distribution of objects. The bottom layer of GGI employs a grid index to maintain moving objects, and the upper layer constructs a Gaussian mixture model to simulate the distribution of moving objects in two-dimensional space. Based on GGI, an incremental search algorithm called IS-CKNN is proposed to process CkNN queries. This algorithm directly determines a query region that contains at least k neighbors of q based on the Gaussian mixture model, which greatly reduces the number of iterations. When the objects and the query point evolve, an efficient incremental query strategy is further proposed, which maximizes the use of existing query results and reduces the computation of the current query. Finally, extensive experiments on one real dataset and two synthetic datasets confirm the superiority of our proposal.
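    The step that GGI accelerates, estimating how many objects fall inside a candidate query region, can be sketched with an axis-aligned two-dimensional Gaussian mixture; in practice the mixture parameters would be fitted from the grid layer, and the function below is only an illustrative approximation under that assumption.

        from math import erf, sqrt

        def normal_cdf(x, mean, std):
            return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))

        def estimate_count(region, mixture, total_objects):
            """Estimate the number of objects inside an axis-aligned rectangle.

            region:  (x_lo, x_hi, y_lo, y_hi)
            mixture: list of (weight, mean_x, std_x, mean_y, std_y) components
            """
            x_lo, x_hi, y_lo, y_hi = region
            prob = 0.0
            for w, mx, sx, my, sy in mixture:
                px = normal_cdf(x_hi, mx, sx) - normal_cdf(x_lo, mx, sx)
                py = normal_cdf(y_hi, my, sy) - normal_cdf(y_lo, my, sy)
                prob += w * px * py
            return total_objects * prob

        # The query region is enlarged until the estimated count reaches k,
        # avoiding repeated exact counting over the grid in every iteration.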
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006614
    Abstract:
    Determinization of a nondeterministic automaton is to construct a deterministic automaton that recognizes the same language, which is one of the fundamental notions in automata theory. Determinization of ω-automata serves as a natural basic step in the decision procedures of SnS, CTL*, μ-calculus, etc. Meanwhile, it is also key to solving infinite games. Therefore, it is of great significance to study the determinization of ω-automata. We focus on a kind of ω-automata called Streett automata. Nondeterministic Streett automata can be transformed into equivalent deterministic Rabin or parity automata. In our previous work, we obtained optimal and asymptotically optimal determinization algorithms, respectively. To evaluate the theoretical results of the proposed algorithms and show the determinization procedure visually, it is necessary to develop a tool supporting Streett determinization. In this paper, we first introduce four different Streett determinization constructions, namely μ-Safra trees, H-Safra trees, compact Streett Safra trees, and LIR-H-Safra trees. With H-Safra trees, which are optimal, and μ-Safra trees, deterministic Rabin transition automata are obtained; deterministic parity transition automata are constructed via the other two structures, where LIR-H-Safra trees are asymptotically optimal. Further, based on the open source software GOAL (Graphical Tool for Omega-Automata and Logics), we implement a tool for Streett determinization named NS2DR&PT. Besides, a benchmark is constructed by randomly generating 100 Streett automata. Experiments with these determinization constructions on the benchmark in NS2DR&PT show that the results are consistent with the theoretical analyses of state complexity, and the efficiency of the different algorithms is also compared and analyzed.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006595
    Abstract:
    ARM's Armv8.1-M architecture and the Arm Helium technology of the M-profile vector extension have been declared to increase the machine-learning performance of the Arm Cortex-M processor by up to 15 times. With the rapid development of the Internet of Things, the correct execution of microprocessors is important. Since the development of chip simulators and on-chip programs relies on the official reference manual, it is also important to ensure the manual's correctness. This paper introduces the correctness verification of the vectorized machine-learning instructions in the official reference manual of the Armv8.1-M architecture. We automatically extracted the operation pseudo-code of the vectorized machine-learning instructions and then formalized it as semantic rules. With the executable framework provided by K Framework, the formalized semantic rules can be executed and tested against benchmarks.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006612
    Abstract:
    The security issues of trusted execution environments (TEEs) have always been of concern to researchers at home and abroad. Memory tag technology used in TEEs helps to achieve finer-grained memory isolation and access control mechanisms. Nevertheless, prior works often rely on testing or empirical analysis to show their effectiveness, which lacks strong assurance of functional correctness and security properties. This paper proposes a general formal model framework for memory-tag-based access control and presents a security analysis method for access control based on model checking. First, a general framework for the access control model of memory-tag-based TEEs is constructed with formal methods, and the access control entities are formally defined; the defined rules include access control rules and tag update rules. Then the abstract machines of the framework are incrementally designed and implemented in the formal language B, and these abstract machines formalize the basic properties through invariant constraints. Next, a TEE implementation called TIMBER-V is used as an application case: the TIMBER-V access control model is constructed by instantiating the abstract machines, the security properties are formally specified, and the functional correctness and security of the instantiated model are verified based on model checking. Finally, this paper simulates specific attack scenarios, and these attacks are successfully detected. The evaluation results show the effectiveness of the proposed security analysis method.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006538
    Abstract:
    Existing malware similarity measurement methods cannot cope with code obfuscation technology and lack the ability to model the complex relationships between malware samples. To solve these problems, this study proposes a malware similarity measurement method based on a multiplex heterogeneous graph, called API relation graph enhanced multiple heterogeneous proxembed (RG-MHPE). The method first uses the dynamic and static features of malware to construct the multiplex heterogeneous graph and then proposes an enhanced proximity embedding method based on relational paths to solve the problem that proximity embedding cannot be applied to similarity measurement on multiplex heterogeneous graphs. In addition, this study extracts knowledge from the API documents on the MSDN website, builds an API relation graph, and learns the similarity between Windows APIs, which effectively slows down the aging of similarity measurement models. Finally, the experimental results show that RG-MHPE performs best in terms of both similarity measurement and model anti-aging ability.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006552
    Abstract:
    With the rapid development of emerging technologies, domain software puts forward new requirements for development efficiency. Datalog, as a declarative programming language with concise syntax and good semantics, can help developers reason about and solve complex problems rapidly. However, when solving real-world problems, the existing single-machine Datalog engines are often limited by memory capacity and lack scalability. In order to solve the above problems, this paper designs and implements a Datalog engine based on out-of-core computation. First, a series of out-of-core operators are designed, and the Datalog program is converted into a C++ program with these operators. Then, a hash-based partition strategy and a minimum replacement scheduling strategy based on search tree pruning are designed; the corresponding partition files are scheduled and computed, and the final results are generated. Based on this method, the prototype tool DDL (disk-based Datalog engine) is implemented, and widely used real-world Datalog programs are selected to conduct experiments on both synthetic and real-world datasets. The experimental results show that DDL has good performance and high scalability.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006618
    Abstract:
    Data races are common defects in multi-threaded programs. Traditional data race analysis methods find it difficult to achieve both high recall and high precision, and their detection reports make it difficult to locate the root cause of a defect. Considering that Petri nets have the advantages of accurate behavior description and rich analysis tools in the modeling and analysis of concurrent systems, a new data race detection method based on Petri net unfolding is proposed. First, by analyzing a single program execution trace, a Petri net model of the program is mined; although mined from only one trace, it implies multiple different traces of the program, which can reduce the false negative rate of traditional dynamic methods while ensuring performance. After that, an unfolding-based detection method for potential data races is proposed, which significantly improves efficiency compared with static methods and can clearly show the triggering path of a data race defect. Finally, for the potential data races detected in the previous stage, a scheduling scheme is designed to replay the defect based on the CalFuzzer platform, which eliminates false positives and ensures the authenticity of detection results. The corresponding prototype system is developed, and the effectiveness of the proposed method is verified on open program instances.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006593
    Abstract:
    In recent years, deep reinforcement learning has been widely used in sequential decision making. The approach works well in many applications, especially in scenarios with high-dimensional input and large state spaces. However, these deep reinforcement learning methods have some limitations, such as a lack of interpretability, inefficient initial training, and cold start. In this paper, we propose a framework combining explicit knowledge reasoning with deep reinforcement learning to alleviate the above problems. The framework leverages high-level prior knowledge in the deep learning process via explicit knowledge representation, resulting in improved training efficiency and interpretability. The explicit knowledge is categorized into two kinds, namely acceleration knowledge and safety knowledge. The former intervenes in training, especially at the early stage, to speed up the learning process, while the latter keeps the agent away from catastrophic actions to keep it safe. Experiments in several domains against several baselines show that the proposed framework significantly improves training efficiency and interpretability, and the improvement is general across different reinforcement learning algorithms and different scenarios.
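    One common way to realize the safety knowledge described above is to mask out actions that violate explicit rules before the agent acts; the rule predicate below is a hypothetical stand-in for the framework's symbolic knowledge, not its actual interface.

        import numpy as np

        def safe_action(q_values, state, violates_rule):
            """Pick the best action whose (state, action) pair passes the safety rules.

            q_values:      array of action values from the learned policy/value network.
            violates_rule: predicate encoding explicit safety knowledge (assumed given).
            """
            masked = np.array(q_values, dtype=float)
            for a in range(len(masked)):
                if violates_rule(state, a):
                    masked[a] = -np.inf          # catastrophic actions are never selected
            if np.all(np.isinf(masked)):
                return int(np.argmax(q_values))  # fall back if the rules block everything
            return int(np.argmax(masked))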
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006558
    Abstract:
    Feature requests are enhancements for existing features or requests for new features proposed by end users on open forums; they reflect users' wishes and represent users' needs. Feature requests play a vital role in improving user satisfaction and product competitiveness and have become an important source of software requirements. However, feature requests differ from traditional requirements in source, content, and form, so applying feature requests to software development necessarily differs from applying traditional requirements. At present, there is much research on feature requests covering different topics, e.g., classification, prioritization, and quality management. With the continuous increase of related research, the need for a survey of feature request analysis and processing has grown. In this paper, we investigate 121 academic research papers on how to analyze and process feature requests in the software development process. We organize the existing research from the perspective of applying feature requests to the software development process, summarize the research topics on feature requests, and investigate the research progress. Besides, we map the feature request research topics to traditional requirements engineering processes, analyze the existing research methods, and point out research gaps. Finally, in order to provide guidance for future research, a perspective on future work in this research area is discussed.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006592
    Abstract:
    In recent years, artificial intelligence has been advancing rapidly. Artificial intelligence systems have penetrated our lives and become an indispensable part of them. However, artificial intelligence systems require a large amount of data to train models, and data disturbances affect their results. Moreover, as business forms change and system scales become more complex, the trustworthiness of artificial intelligence systems has been receiving more and more attention. Firstly, based on a summary of the trustworthiness attributes proposed by various organizations and scholars, we introduce nine trustworthiness attributes of artificial intelligence. Next, we present the existing measurement methods for data, model, and result trustworthiness of AI systems and propose an artificial intelligence trustworthiness evidence collection method. Then, we discuss the trustworthiness measurement model of AI systems. Combining existing attribute-based software trustworthiness measurement methods with blockchain technology, we propose an artificial intelligence system trustworthiness measurement framework, including the decomposition of trustworthiness attributes and the evidence acquisition method, the federation trustworthiness measurement model, and the blockchain-based artificial intelligence trustworthiness measurement structure. Finally, we analyze the opportunities and challenges of trustworthiness measurement technology for artificial intelligence systems.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006539
    Abstract:
    Database can provide efficient storage and access for massive data. However, it is nontrivial for non-experts to command database query language like SQL, which is essential for querying databases. Hence, querying databases using natural language (i.e., text-to-SQL) has received extensive attention in recent years. This paper provides a holistic view of text-to-SQL technologies and elaborates on current advancements. It first introduces the background of the research and describes the research problem. Then the paper focuses on the current text-to-SQL technologies, including pipeline-based methods, statistical-learning-based methods, as well as techniques developed for multi-turn text-to-SQL task. The paper goes further to discuss the field of semantic parsing to which text-to-SQL belongs. Afterward, it introduces the benchmarks and evaluation metrics that are widely used in the research field. Moreover, it compares and analyzes the state-of-the-art models from multiple perspectives. Finally, the paper summarizes the potential challenges for text-to-SQL task, and gives some suggestions for future research.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006540
    Abstract:
    In today's era of the Internet of Things, embedded systems are becoming important components for accessing the cloud and are frequently used in security- and privacy-sensitive applications and devices. However, the underlying software (a.k.a. firmware) often suffers from a wide range of security vulnerabilities. The complexity and heterogeneity of the underlying hardware platforms, the differences between hardware and software implementations, the specificity and limited documentation, together with the limited running environment, make many mature dynamic testing tools for desktop systems hard or even impossible to adapt to embedded device/firmware environments directly. In recent years, researchers have made great progress in detecting well-known vulnerabilities in embedded device firmware based on binary code similarity analysis. Focusing on the key technical challenges of binary code similarity analysis, we systematically studied the existing binary code similarity analysis technologies and comprehensively analyzed and compared their general process, technical characteristics, and evaluation criteria. Then we analyzed and summarized the application of these technologies in the field of embedded device firmware vulnerability search. Finally, we present some technical challenges in this field and propose some open future research directions for related researchers.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006544
    Abstract:
    Text style transfer is one of the hot issues in the field of natural language processing in recent years. It aims to transfer specific styles or attributes of a text (such as emotion, tense, and gender) through editing or generation while retaining the text content. The purpose of this article is to sort out the existing methods in order to advance this research field. First, the problem of text style transfer is defined and its challenges are given. Then, the existing methods are classified and reviewed, focusing on TST methods based on unsupervised learning, which are further divided into implicit methods and explicit methods; the implementation mechanisms, advantages, limitations, and performance of each method are also analyzed. Subsequently, the performance of several representative methods on automatic evaluation indicators such as transfer accuracy, content retention, and perplexity is compared through experiments. Finally, research on text style transfer is summarized and future directions are discussed.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006528
    [Abstract] (623) [HTML] (0) [PDF 2.25 M] (1052)
    Abstract:
    Blockchains such as Ethereum serially execute the smart contract transactions in a block, which strictly guarantees the consistency of the blockchain state between nodes after execution but has become a serious bottleneck restricting the throughput of these blockchains. Therefore, using parallel methods to optimize the execution of smart contract transactions has gradually become a focus of industry and academia. This paper summarizes the research progress on parallel execution methods of smart contracts in blockchains and proposes a research framework. From the perspective of the phases of parallel execution, the framework condenses four parallel execution models of smart contracts, namely the parallel execution model based on static analysis, the parallel execution model based on dynamic analysis, the parallel execution model between nodes, and the divide-and-conquer parallel execution model, and describes the typical parallel execution methods under each model. Finally, this paper discusses the factors affecting parallel execution, such as the transaction dependency graph and concurrency control strategies, and proposes future research directions.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006522
    [Abstract] (1774) [HTML] (0) [PDF 1.07 M] (2323)
    Abstract:
    Reasoning over knowledge graphs aims to infer new facts based on known ones, so as to make the graphs as complete as possible. In recent years, distributed embedding-based reasoning methods have achieved great success on this task. However, due to their black-box nature, these methods cannot provide interpretability for a specific prediction. Therefore, there has been a growing interest in how to design user-understandable and user-trustworthy reasoning models. Starting from the basic concept of interpretability, this paper systematically surveys the recently developed methods for interpretable reasoning on knowledge graphs. Specifically, it introduces the research progress of ante-hoc and post-hoc interpretable reasoning models. According to the scope of interpretability, ante-hoc interpretable models can be further divided into local-interpretable and global-interpretable models. For post-hoc interpretable reasoning models, this paper reviews representative reasoning methods and introduces two post-hoc interpretation methods in detail. Next, it summarizes the applications of explainable knowledge reasoning in fields such as finance and healthcare. Then, this paper summarizes the current situation of explainable knowledge learning. Finally, the future technological development of interpretable reasoning models is discussed.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006524
    [Abstract] (1455) [HTML] (0) [PDF 1.86 M] (1280)
    Abstract:
    Time-sensitive networking (TSN) is an important research area for upgrading the infrastructure of the industrial Internet of Things. Deterministic transmission in TSN comprises the key technologies, mainly including time-triggered scheduling in the control plane, mixed-criticality transmission, and deterministic delay analysis, that support the deterministic real-time transmission requirements of industrial control. This paper surveys related work on deterministic transmission technologies in TSN in recent years and systematically organizes and summarizes it. First, this paper presents the transmission models of the different kinds of flows in TSN. Second, based on these models, it presents, on the one hand, the time-triggered scheduling model, its research status, and the existing challenges on the control plane; on the other hand, it presents the architecture of TSN switches, the strategies of mixed-criticality transmission, their disadvantages, and the corresponding improvement approaches. Third, this paper models the transmission delay of the whole TSN based on network calculus and presents the delay analysis methods, their research status, and possible improvement directions. Finally, this paper summarizes the challenges and research prospects of deterministic transmission technologies in TSN.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006377
    Abstract:
    As a typical form of the serverless architecture, the function as a service (FaaS) architecture abstracts business logic into fine-grained functions and provides automatic operation and maintenance capabilities such as auto-scaling, which can greatly reduce operation and maintenance costs. Some highly concurrent, highly available, and highly flexible services (such as payment and red packet services) in many online service systems have been migrated to FaaS platforms, but a large number of traditional monolithic applications still find it difficult to take advantage of the FaaS architecture. In order to solve this problem, a FaaS migration approach for monolithic applications based on dynamic and static analysis is proposed in this paper. This approach identifies and strips the implementation code and dependencies of a specified monolithic application API by combining dynamic and static analysis, and then completes the code refactoring according to a function template. Aiming at the cold-start problem of functions in high-concurrency scenarios, this approach uses the master-slave multithreaded Reactor model based on IO multiplexing to optimize the function template and improve the concurrent processing capability of a single function instance. Based on this approach, we implemented Codext, a prototype tool for the Java language, and carried out experimental verification on OpenFaaS, an open source serverless platform, with four open source monolithic applications.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006350
    Abstract:
    With the popularization of digital information technology, reversible data hiding in encrypted images (RDHEI) has gradually become a research hotspot of privacy protection in cloud storage. As a technology that can embed additional information in the encrypted domain, extract the embedded information correctly, and recover the original image losslessly, RDHEI has received wide attention from researchers. To embed sufficient additional information in the encrypted image, a high-capacity RDHEI method using adaptive encoding is proposed in this paper. Firstly, the occurrence frequencies of the different prediction errors of the original image are calculated and the corresponding adaptive Huffman coding is generated. Then, the original image is encrypted with a stream cipher, and the encrypted pixels are marked with different Huffman codewords according to their prediction errors. Finally, additional information is embedded in the room reserved by the marked pixels through bit substitution. The experimental results show that the proposed algorithm can extract the embedded information correctly and recover the original image losslessly. Compared with similar algorithms, the proposed algorithm makes full use of the characteristics of the image itself and greatly improves the embedding rate. On the UCID, BOSSBase, and BOWS-2 datasets, the average embedding rate of the proposed algorithm reaches 3.162 bpp, 3.917 bpp, and 3.775 bpp, which is 0.263 bpp, 0.292 bpp, and 0.280 bpp higher than the state-of-the-art algorithm, respectively.
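    The adaptive coding step can be illustrated by building a standard Huffman code over the prediction-error histogram, so that frequent errors receive short codewords and leave more room for embedding; the histogram values below are made up for demonstration and are not taken from the paper.

        import heapq
        from collections import Counter

        def huffman_codes(error_histogram):
            """Build Huffman codewords for prediction-error values from their frequencies."""
            heap = [(freq, i, {value: ""}) for i, (value, freq) in enumerate(error_histogram.items())]
            heapq.heapify(heap)
            counter = len(heap)
            while len(heap) > 1:
                f1, _, c1 = heapq.heappop(heap)
                f2, _, c2 = heapq.heappop(heap)
                merged = {v: "0" + code for v, code in c1.items()}
                merged.update({v: "1" + code for v, code in c2.items()})
                heapq.heappush(heap, (f1 + f2, counter, merged))
                counter += 1
            return heap[0][2]

        # Example: errors concentrated around 0 receive the shortest codewords.
        hist = Counter({0: 5000, 1: 2100, -1: 2000, 2: 600, -2: 550, 3: 90})
        codes = huffman_codes(hist)
        assert len(codes[0]) <= min(len(codes[2]), len(codes[3]))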
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006368
    Abstract:
    Video click-through rate (CTR) prediction is one of the important tasks in video recommendation. Based on click-through prediction, recommendation systems can adjust the order of the recommended video sequence to improve recommendation performance. In recent years, with the explosive growth of videos, the video cold-start problem has become more and more serious. To address this problem, we propose a novel video CTR prediction model that utilizes both video content features and context features to improve prediction; we also propose simulated training of the cold-start scenario and a neighbor-based new-video replacement method to enhance the model's CTR prediction ability for new videos. The proposed model can predict CTR for both old and new videos. Experiments on two real-world video CTR datasets (Track_1_series and Track_2_movies) show the effectiveness of the proposed method. Specifically, the proposed model using both video content and contextual information improves CTR prediction performance for old videos and outperforms the existing models on both datasets. Additionally, for new videos, a baseline model that does not consider the cold-start problem achieves an AUC score of about 0.57, whereas the proposed model gives much better AUC scores of 0.645 and 0.615 on Track_1_series and Track_2_movies, respectively, showing better robustness to the cold-start problem.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006521
    Abstract:
    A recommender system is an information filtering system that helps users filter out large amounts of irrelevant information and obtain the information or items they need by estimating their interests and preferences. Mainstream traditional recommendation systems mainly use offline historical user data to continuously train and optimize offline models and then recommend items to online users. This leads to three main problems: the unreliable estimation of user preferences based on sparse and noisy historical data, the ignorance of online contextual factors that affect user behavior, and the unreliable assumption that users are aware of their preferences by default. Since a dialogue system focuses on the user's real-time feedback and captures the user's current interaction intentions, conversational recommendation combines the interactive form of the dialogue system with the recommendation task and has become an effective means to address these traditional recommendation problems. Through online interaction, conversational recommendation can guide and capture users' current preferences and interests and provide timely feedback and updates. Thanks to the widespread use of voice assistants and chatbot technologies, as well as the mature application of technologies such as reinforcement learning and knowledge graphs in recommendation strategies, more and more researchers have paid attention to conversational recommendation systems in the past few years. This survey sorts out the overall framework of conversational recommendation systems, classifies the datasets used in conversational recommendation algorithms, and discusses the relevant metrics for evaluating the effect of conversational recommendation. Focusing on the interaction strategies and recommendation logic behind conversational recommendation, this survey summarizes the research achievements of domestic and foreign researchers in recent years, and finally summarizes and looks ahead to future work on conversational recommendation.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006500
    Abstract:
    During software development and maintenance, bug fixers usually refer to bug reports submitted by end users or developers/testers to locate and fix a bug. In this sense, the quality of a bug report largely determines whether the bug fixer can quickly and precisely locate the bug and fix it. Researchers have done much work on characterizing, modeling, and improving the quality of bug reports. This paper offers a systematic survey of existing work on bug report quality, with an attempt to understand the current state of research in this area and to open new avenues for future research. Firstly, we summarize the quality problems of bug reports reported by existing studies, such as missing key information and errors in information items. Then, we present a series of works on automatically modeling bug report quality. After that, we introduce approaches that aim to improve bug report quality. Finally, we discuss the challenges and potential opportunities for research on bug report quality.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006502
    [Abstract] (552) [HTML] (0) [PDF 1.10 M] (1476)
    Abstract:
    Nowadays, big data processing frameworks such as Hadoop and Spark have been widely used for data processing and analysis in industry and academia. These frameworks adopt a distributed architecture and are generally developed in object-oriented languages like Java and Scala. They take the Java Virtual Machine (JVM) as the runtime environment on cluster nodes to execute computing tasks, relying on the JVM's automatic memory management mechanism to allocate and reclaim data objects. However, current JVMs are not designed for big data processing frameworks, leading to problems such as long garbage collection (GC) time and the high cost of data serialization and deserialization. As reported by users and researchers, GC time can exceed 50% of the overall application execution time in some cases. Therefore, JVM memory management has become a performance bottleneck of big data processing frameworks. This paper makes a systematic review of recent JVM optimization research for big data processing frameworks. Our contributions include: (1) we summarize the root causes of the performance degradation of big data applications executed in the JVM; (2) we summarize the existing JVM optimization techniques for big data processing frameworks, classify these methods into categories, and compare and analyze the advantages and disadvantages of each, including their optimization effects, application scopes, and burdens on users; (3) we propose several future JVM optimization directions that will help improve the performance of big data processing frameworks.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006503
    Abstract:
    Object-oriented software metrics are important for understanding and guaranteeing the quality of object-oriented software. By comparing object-oriented software metrics with their thresholds, we can simply and intuitively evaluate whether a bug is likely to be present. The methods for deriving metric thresholds mainly include unsupervised learning methods based on the distribution of metric data and supervised learning methods based on the relationship between metrics and defect-proneness. The two types of methods have their own advantages and disadvantages: unsupervised methods do not require label information to derive thresholds and are easy to implement, but the resulting thresholds often perform poorly in defect prediction; supervised methods improve defect prediction performance through machine learning algorithms, but they need label information, which is not easy to obtain, and the techniques linking metrics to defect-proneness are complex. In recent years, researchers of both types of methods have continued to explore and have made great progress, yet deriving thresholds of object-oriented software metrics remains challenging. This paper offers a systematic survey of recent research achievements in deriving metric thresholds. First, we introduce the research problem of object-oriented software metric threshold derivation. Then, we describe the main current research work in detail from two aspects: unsupervised and supervised learning methods. After that, we discuss related techniques. Finally, we summarize the opportunities and challenges in this field and outline future research directions.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006510
    Abstract:
    With the increasing scale and complexity of computer networks, it is difficult for network administrators to ensure that the network intent has been correctly realized, and incorrect network configurations will affect the security and availability of the network. Inspired by the successful application of formal methods in hardware and software verification, researchers have applied formal methods to networks, forming a new research field, network verification, which aims to use rigorous mathematical methods to prove the correctness of the network. Network verification has become a hot research topic in the field of networking and security, and its research results have been successfully applied in real networks. From the three research directions of data plane verification, control plane verification, and stateful network verification, this paper systematically summarizes existing research results in the field of network verification and analyzes the research hotspots and related solutions, aiming to organize this field and provide systematic references and future work prospects for researchers.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006513
    Abstract:
    Anonymous networks aim to protect users' communication privacy in open network environments. Since Chaum proposed Mix-net, related work has been progressing for decades. Nowadays, many anonymous networks based on Mix-net, DC-net, or PIR have been developed for various application scenarios and threat models by integrating multiple design elements. Beginning with anonymity concepts, this paper introduces the overall development of the anonymous network area. Representative works and their design choices are classified and articulated. This paper systematically analyzes the characteristics of anonymous networks from the aspects of anonymity, latency, and bandwidth overhead.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006515
    Abstract:
    This study proposes a feature extraction algorithm based on principal component analysis (PCA) with an anisotropic Gaussian kernel penalty, which differs from traditional kernel PCA algorithms. In non-linear dimensionality reduction, traditional kernel PCA algorithms ignore the nondimensionalization of raw data. Meanwhile, previous kernel functions are mainly controlled by a single kernel width parameter shared by all dimensions, which cannot precisely reflect the significance of the features of each dimension, resulting in low accuracy of the dimensionality reduction process. To address these issues, an averaging algorithm is first proposed for the nondimensionalization of raw data, which shows sound performance in improving the variance contribution rate of the original data. Then, an anisotropic Gaussian kernel function is introduced in which each dimension has its own kernel width parameter, so that the importance of each dimension's features can be reflected. In addition, a feature penalty function for kernel PCA is formulated based on the anisotropic Gaussian kernel function to represent the raw data with fewer features and to reflect the importance of each principal component. Furthermore, gradient descent is used to update the kernel widths of the feature penalty function and to control the iterative process of the feature extraction algorithm. To verify the effectiveness of the proposed algorithm, several algorithms are compared on the UCI public datasets and the KDDCUP99 dataset. The experimental results show that the proposed feature extraction algorithm based on the anisotropic Gaussian kernel penalty is on average 4.49% better than previous PCA algorithms on the UCI public datasets and on average 8% better on the KDDCUP99 dataset.
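    For reference, the anisotropic Gaussian kernel mentioned above assigns a separate width parameter to each dimension (standard form shown below for illustration; the paper's specific penalty term is not reproduced):
```latex
% Anisotropic Gaussian kernel: one width parameter \sigma_d per dimension,
% so dimensions with small \sigma_d contribute more strongly to the similarity.
\[
  k(\mathbf{x}, \mathbf{y})
  = \exp\!\left( -\sum_{d=1}^{D} \frac{(x_d - y_d)^2}{2\sigma_d^2} \right),
\]
% whereas the usual isotropic kernel uses a single shared width \sigma:
\[
  k(\mathbf{x}, \mathbf{y})
  = \exp\!\left( -\frac{\|\mathbf{x}-\mathbf{y}\|^2}{2\sigma^2} \right).
\]
```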
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006518
    [Abstract] (827) [HTML] (0) [PDF 1.43 M] (1068)
    Abstract:
    Recently, with the rapid development of information technology, emerging technologies represented by artificial intelligence have been widely applied in education, triggering profound changes in the concept and mode of learning. Online learning transcends the limitations of time and space, providing more possibilities for learners to learn anytime and anywhere. However, the separation of teachers and students in time and space in online learning prevents teachers from following students' learning processes, which limits the quality of teaching and learning. Diversified learning targets and massive learning resources also raise new problems, such as how to quickly accomplish learning targets, reduce learning costs, and reasonably allocate learning resources, and these problems have become obstacles to the development of individuals and society. The traditional one-size-fits-all educational model can no longer meet these needs; a more efficient and scientific personalized education model is required to help learners maximize their learning targets at minimal learning cost. Based on these considerations, what is needed is a new adaptive learning system that can automatically and efficiently identify learners' personalized characteristics, efficiently organize and allocate learning resources, and plan a globally personalized learning path. In this paper, we systematically review and analyze current research on personalized learning path recommendation and examine the different research perspectives from a multidisciplinary viewpoint. Then, we summarize the algorithms most widely applied in current research. Finally, we highlight the main shortcomings of current research, to which more attention should be paid.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006499
    Abstract:
    Accurately predicting the 1p/19q status is of great significance for formulating treatment plans and evaluating the prognosis of gliomas. Although some works can accurately predict the 1p/19q status based on magnetic resonance images and machine learning methods, they require the tumor contour to be delineated in advance, which cannot satisfy the needs of computer-aided diagnosis. To deal with this issue, this work proposes a novel deep multi-scale invariant features-based network (DMIF-Net) for predicting the 1p/19q status in gliomas. Firstly, it uses a wavelet-scattering network to extract multi-scale and multi-orientation invariant features and a deep split-and-aggregation network to extract semantic features. Then, it reduces the feature dimensions with a multi-scale pooling module and fuses the features by concatenation. Finally, taking only the bounding box of the tumor region as input, it predicts the 1p/19q status accurately. The experimental results show that, without requiring accurate delineation of the tumor region, the AUC of DMIF-Net reaches 0.92 (95% CI = [0.91, 0.94]). Compared with the best deep learning model, its AUC, sensitivity, and specificity increase by 4.1%, 4.6%, and 3.4%, respectively; compared with the state-of-the-art glioma models, its AUC and accuracy increase by 4.9% and 5.5%, respectively. Moreover, ablation experiments demonstrate that the proposed multi-scale invariant feature extraction module effectively improves 1p/19q prediction performance, verifying that combining semantic and multi-scale invariant features can significantly increase prediction accuracy for the 1p/19q status without knowing the tumor region boundaries, thereby providing an auxiliary means for formulating personalized treatment plans for low-grade glioma.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006488
    [Abstract] (614) [HTML] (0) [PDF 1.58 M] (1042)
    Abstract:
    In recent years, with the continuous development of computer vision, semantic segmentation and shape completion of 3D scenes have received more and more attention from academia and industry. Among them, semantic scene completion is an emerging research topic in this field, which aims to simultaneously predict the spatial layout and semantic labels of a 3D scene and has developed rapidly in recent years. In this paper, we classify and summarize the RGB-D image-based methods proposed in this field in recent years. These methods are divided into two categories according to whether deep learning is used, namely traditional methods and deep learning-based methods. Among them, the deep learning-based methods are further divided into two categories according to the input data type: methods based on a single depth image and methods based on RGB-D images. Based on this classification and overview of existing methods, we collate the relevant datasets used for the semantic scene completion task and analyze the experimental results. Finally, we summarize the challenges and development prospects of this field.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006485
    [Abstract] (2656) [HTML] (0) [PDF 841.81 K] (2916)
    Abstract:
    Reinforcement learning is a technique that discovers optimal strategies in a trial-and-error way and has become a general method for solving environmental interaction problems. However, as a machine learning method, reinforcement learning faces the explainability problem common to machine learning. This problem limits the application of reinforcement learning in safety-sensitive fields such as medicine, the military, and transportation, and leads to a lack of universally applicable solutions for environment simulation and task generalization. Although many works have been devoted to overcoming this weakness, the academic community still lacks a consistent understanding of explainable reinforcement learning. In this paper, we explore the basic problems of explainable reinforcement learning and review existing work. To begin with, we explore its parent problem, explainable artificial intelligence, and summarize its existing definitions. Next, we construct an interpretability theoretical framework to describe the problems common to explainable reinforcement learning and explainable artificial intelligence, covering intelligent algorithms versus mechanical algorithms, interpretation itself, the factors that affect interpretability, and the intuitiveness of explanations. Then, three problems unique to explainable reinforcement learning, namely environment interpretation, task interpretation, and strategy interpretation, are defined based on the characteristics of reinforcement learning. After that, the latest research on explainable reinforcement learning is reviewed and the existing methods are systematically classified. Finally, we discuss future research directions.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006343
    Abstract:
    Since ordinary city road maps do not cover road restriction information for trucks and lack hot-spot labeling, they cannot satisfy the large-batch, long-distance road transportation requirements of bulk commodity transportation. To reduce frequent transportation accidents and low logistics efficiency and to further improve truck drivers' travel experience, it is urgent to study methods of building customized logistics maps for bulk commodity transportation by combining the type of goods transported, the type of truck, and the driver's route selection preferences. With the widespread application of the mobile Internet and the Internet of Vehicles, the spatio-temporal data generated by bulk commodity transportation is growing rapidly. Together with other logistics operational data, it constitutes logistics big data, which provides a solid data foundation for logistics map building. This study first comprehensively reviews the state-of-the-art work on map building using trajectory data. Then, to tackle the limitations of existing digital map building methods in the field of bulk commodity transportation, a data-driven logistics map building framework based on multi-source logistics data is put forward. The research focuses on: (1) multi-constraint logistics map construction based on users' prior knowledge; (2) incremental logistics map updating driven by dynamic spatio-temporal data. The logistics map will become AI infrastructure for a new generation of logistics technology suited to bulk commodity transportation. The results of this study provide rich practical content for the technical innovation of logistics map building and offer new solutions for reducing logistics costs and improving efficiency, which have important theoretical significance and application value.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006435
    Abstract:
    Anycast uses BGP to achieve best-path selection by assigning the same IP address to multiple terminal nodes. In recent years, as anycast technology has become more and more common, it has been widely used in DNS and CDN services. This study first introduces anycast technology in an all-round way, then discusses the current problems of anycast technology and summarizes them into three categories: anycast inference is imperfect, anycast performance cannot be guaranteed, and anycast load balancing is difficult to control. In response to these problems, the latest research progress is described. Finally, the remaining problems and directions for improvement are summarized to provide useful references for researchers in related fields.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006437
    Abstract:
    As a distributed storage solution with high performance and high scalability, key-value storage systems have been widely adopted in recent years, with examples including Redis, MongoDB, and Cassandra. The multi-replica mechanism widely used in distributed storage systems improves system throughput and reliability, but it also adds the extra overhead of system coordination and replica consistency. For cross-region distributed systems, the long-distance replication coordination overhead may even become the performance bottleneck, reducing system availability and throughput. This study proposes Elsa, a coordination-free multi-master key-value storage system designed for cross-region architectures. On the basis of ensuring high performance and high scalability, Elsa adopts conflict-free replicated data types (CRDTs) to guarantee strong eventual consistency among replicas without coordination, reducing the coordination overhead between system nodes. In this study, a cross-region distributed environment spanning 4 data centers and 8 nodes is set up on the Aliyun platform, and a large-scale distributed performance comparison experiment is carried out. The experimental results show that, in the cross-region distributed environment, the throughput of Elsa has obvious advantages under highly concurrent contention loads, reaching up to 7.37 times that of the MongoDB cluster and 1.62 times that of the Cassandra cluster.
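    Coordination-free convergence in Elsa rests on CRDTs. The following sketch of a last-writer-wins (LWW) register, a generic state-based CRDT rather than Elsa's actual data types, shows why replicas can accept writes independently and still converge once their states are merged.
```python
import time
import uuid

class LWWRegister:
    """Last-writer-wins register: a simple state-based CRDT.
    Each replica accepts writes locally; merge() is commutative, associative,
    and idempotent, so replicas converge regardless of delivery order."""

    def __init__(self, replica_id=None):
        self.replica_id = replica_id or str(uuid.uuid4())
        self.value = None
        self.timestamp = (0.0, self.replica_id)  # (wall clock, replica id) as tie-breaker

    def set(self, value):
        self.timestamp = (time.time(), self.replica_id)
        self.value = value

    def merge(self, other):
        # Keep the write with the larger (timestamp, replica_id) pair.
        if other.timestamp > self.timestamp:
            self.value, self.timestamp = other.value, other.timestamp

# Two replicas accept writes concurrently, then exchange state and converge
# without any cross-region coordination on the write path.
a, b = LWWRegister("dc-hangzhou"), LWWRegister("dc-beijing")
a.set("v1")
b.set("v2")
a.merge(b)
b.merge(a)
assert a.value == b.value
```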
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006431
    Abstract:
    Code smells are low-quality code snippets that are in urgent need of refactoring. Code smell is a research hotspot in software engineering, with many related research topics, a large time span, and rich research results. To sort out the relevant research approaches and results, analyze the research hotspots, and predict future research directions, this study systematically analyzes and classifies 339 papers related to code smells published between 1990 and June 2020. The development trend of code smell research is analyzed statistically, the mainstream and hot spots of related research are quantitatively revealed, the key code smells of concern to academia are identified, and the differences in concerns between industry and academia are studied.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006425
    Abstract:
    In the process of software testing, the expected output of the program under test is an important factor in judging whether the program is defective. The metamorphic testing technique uses properties of the program under test to check its outputs, thus effectively alleviating the problem that the expected output is difficult to construct. In recent years, metamorphic testing has blossomed in the field of software testing; many researchers have optimized techniques related to metamorphic testing and applied them to various fields to effectively improve software quality. This study summarizes and analyzes research on metamorphic testing from three aspects: theoretical knowledge, improvement strategies, and application areas, with emphasis on the research results of the past five years, and discusses possible research directions when metamorphic testing is applied to parallel programs. First, the basic concepts of metamorphic testing and the metamorphic testing process are described; next, following its steps, optimization techniques for metamorphic testing are summarized from four perspectives: metamorphic relations, test case generation, test execution, and metamorphic testing tools; then, the application fields of metamorphic testing are listed; finally, based on existing research results, the problems faced by metamorphic testing in parallel program testing are discussed, and possible solutions are suggested for further research.
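    As a concrete illustration of the core idea summarized above (a textbook example, not taken from the surveyed papers): even without an oracle for the exact value of sin(x), the identity sin(x) = sin(π − x) can serve as a metamorphic relation for checking an implementation.
```python
import math
import random

def metamorphic_test_sin(implementation, trials=1000, tol=1e-9):
    """Check the metamorphic relation sin(x) == sin(pi - x) on random inputs.
    No oracle for the exact value of sin(x) is needed."""
    for _ in range(trials):
        x = random.uniform(-10, 10)
        source = implementation(x)               # source test case
        follow_up = implementation(math.pi - x)  # follow-up test case
        if abs(source - follow_up) > tol:
            return False, x                      # relation violated: likely defect
    return True, None

ok, counterexample = metamorphic_test_sin(math.sin)
print(ok, counterexample)
```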
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006420
    [Abstract] (630) [HTML] (0) [PDF 1.80 M] (1516)
    Abstract:
    Emotion is the external expression of affect and influences cognition, perception, and decision-making in people's daily life. As one of the basic problems in realizing overall computer intelligence, emotion recognition has been studied in depth and widely applied in affective computing and human-computer interaction. Compared with facial expressions, speech, and other physiological signals, using EEG to recognize emotion is attracting more attention for its higher temporal resolution, lower cost, better identification accuracy, and higher reliability. In recent years, more deep learning architectures have been applied to this task and have achieved better performance than traditional machine learning methods. Deep learning for EEG-based emotion recognition is one of the current research focuses, and many challenges remain to be overcome. Considering that there is little review literature to refer to, this study investigates the application of deep learning to EEG-based emotion recognition. Specifically, input formulation, deep learning architectures, experimental settings, and results are surveyed. Besides, articles that evaluate their models on the widely used DEAP and SEED datasets are carefully screened, and qualitative and quantitative analyses and comparisons are performed from different aspects. Finally, the work is summarized and the prospects of future work are given.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006402
    [Abstract] (771) [HTML] (0) [PDF 6.49 M] (1514)
    Abstract:
    Blockchain is a distributed ledger maintained by a series of network nodes. It has the following security attributes: unforgeability, decentralization, trustlessness, provable security based on cryptography, and non-repudiation. This paper summarizes the related security services, including data confidentiality, data integrity, authentication, data privacy, and assured data erasure. The paper first introduces the concepts of blockchain and public key cryptography. For the above-mentioned five security services, the security threats faced by users in actual scenarios and the corresponding solutions are analyzed, the drawbacks of traditional implementations are discussed, and blockchain-based countermeasures are introduced. Finally, the values and challenges associated with blockchain are discussed as well.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006358
    Abstract:
    In recent years, deep learning has shown excellent performance in image steganalysis. At present, most image steganalysis models based on deep learning are special-purpose models that only apply to a specific steganographic algorithm. To detect stego images produced by other steganographic algorithms with such a model, a large number of stego images encoded by those algorithms must be collected as a dataset to retrain the model. However, in practical steganalysis tasks, it is difficult to obtain a large number of such stego images, and training a universal steganalysis model with very few stego image samples is a great challenge. Inspired by research results in the field of few-shot learning, we propose a universal steganalysis method based on a transductive propagation network. First, the feature extraction network is improved based on the existing few-shot learning classification framework, and a multi-scale feature fusion network is designed, so that the few-shot classification model can extract more steganalysis features for classification tasks based on weak information such as secret noise residue. Second, to solve the problem that a steganalysis model based on few-shot learning is difficult to converge, an initial model with prior knowledge is obtained by pre-training. Then, steganalysis models based on few-shot learning are trained in the frequency domain and the spatial domain, respectively; the results of self-tests and cross-tests show that the average detection accuracy is above 80%. Furthermore, these models are retrained by means of dataset enhancement, which improves the detection accuracy to more than 87%. Finally, the proposed model is compared with existing steganalysis models in the frequency and spatial domains; the results show that, under the few-shot experimental setup, its detection accuracy is slightly below those of SRNet and ZhuNet in the spatial domain and exceeds that of the best existing steganalysis model in the frequency domain. The experimental results show that the proposed few-shot learning-based method is effective and robust for detecting unknown steganographic algorithms.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006351
    Abstract:
    How to detect emergencies in social media data streams is a popular research topic in natural language processing. However, current methods for extracting emergencies suffer from low accuracy and low efficiency. To solve these problems, this paper proposes an emergency detection method based on word correlation characteristics, which can quickly detect emergencies from social network data streams so that relevant decision makers can take timely and effective measures and the negative impact of emergencies can be reduced as much as possible to maintain social stability. First of all, through noise filtering and emotion filtering, microblog texts with negative emotions are obtained. Then, based on time information, the Weibo data is divided into time slices; the word frequency, user influence, and word frequency growth rate of each word in each time window are calculated, and burst words are extracted with a burst calculation method. According to the word2vec model, similar words are merged, and the feature similarity of the burst words is used to form a burst word relationship graph. Finally, a multi-attribute spectral clustering algorithm is used to optimally divide the word relationship graph, abnormal words are tracked as the time window slides, and emergencies are judged through the structural changes caused by sudden changes of words in the sub-graphs. The experimental results show that the proposed method performs well on real-time blog post data streams. Compared with existing methods, it can meet the needs of emergency detection: it accurately detects not only the detailed information of sub-events but also the relevant information of events.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006321
    Abstract:
    The regional network border describes the topological border nodes in cyberspace between countries and regions in the real world. By combining active and passive measurement techniques, this paper proposes a dual-stage method, RNB, for discovering regional network border nodes. The first stage discovers candidate sets of regional network border nodes by using directed topology measurement and multi-source geolocation; the second stage accurately identifies border nodes from the candidate sets by using multi-source information weighted geolocation and dual-PING geolocation. The experiment took mainland China as the target region and discovered 1,644 border nodes. Compared with the CAIDA dataset, our results include 37% exclusively discovered border nodes at only 2.5% of the measurement cost. The accuracy rate is 99.3% under manual verification and 75% under verification by an ISP operator.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006421
    [Abstract] (892) [HTML] (0) [PDF 3.42 M] (1424)
    Abstract:
    To ensure network-wide consensus and tamper-proofing of the transaction ledger, miner nodes are required to possess strong computing and storage resources in traditional blockchain technology, which greatly limits resource-constrained devices from joining blockchain systems. In recent years, blockchain technology has expanded into many fields, such as finance, health care, the Internet of Things, and supply chains. However, these application scenarios contain a large number of devices with weak computing power and low storage capacity, which brings great challenges to the application of blockchain. Therefore, lightweight blockchain technology is emerging. This study summarizes related work on lightweight blockchains from the two aspects of lightweight computing and lightweight storage, and compares and analyzes their advantages and disadvantages. Finally, the future development of lightweight blockchain systems is discussed.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006429
    [Abstract] (2401) [HTML] (0) [PDF 3.26 M] (3019)
    Abstract:
    A knowledge graph (KG) is a technology that uses a graph model to describe knowledge and model the relationships between things. Knowledge graph embedding (KGE) is a widely adopted knowledge representation method whose main idea is to embed the entities and relations of a knowledge graph into a continuous vector space, so as to simplify operations while preserving the intrinsic structure of the KG. It can benefit a variety of downstream tasks, such as KG completion and relation extraction. Firstly, the existing knowledge graph embedding technologies are comprehensively reviewed, including not only techniques that embed the facts observed in the KG, but also dynamic KG embedding methods that add a time dimension, as well as KG embedding technologies that integrate multi-source information. The relevant models are analyzed, compared, and summarized from the perspectives of entity embedding, relation embedding, and scoring functions. Then, typical applications of KG embedding in downstream tasks are briefly introduced, including question answering systems, recommendation systems, and relation extraction. Finally, the challenges of knowledge graph embedding are expounded and future research directions are discussed.
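    As one concrete example of the scoring-function perspective mentioned above, the classical translational model TransE scores a triple (h, r, t) by how well the relation vector translates the head embedding onto the tail embedding:
```latex
% TransE scoring function: a triple (h, r, t) is plausible when the tail
% embedding is close to head + relation.
\[
  f_r(h, t) = -\,\| \mathbf{h} + \mathbf{r} - \mathbf{t} \|_{1/2},
\]
% and training minimizes a margin-based ranking loss over corrupted triples:
\[
  \mathcal{L} = \sum_{(h,r,t)\in S}\ \sum_{(h',r,t')\in S'}
    \big[\, \gamma + \|\mathbf{h}+\mathbf{r}-\mathbf{t}\| - \|\mathbf{h}'+\mathbf{r}-\mathbf{t}'\| \,\big]_{+}.
\]
```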
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006501
    Abstract:
    To protect the execution environment of security-sensitive programs on computing devices, researchers have proposed trusted execution environment (TEE) technology, which isolates hardware and software to provide security-sensitive programs with an execution environment that is isolated from the rich computing environment. Side-channel attacks have evolved from traditionally requiring expensive equipment to now inferring confidential information purely in software from access patterns observed through microarchitectural state. The TEE architecture only provides an isolation mechanism and cannot resist this type of emerging software side-channel attack. This paper thoroughly investigates the software side-channel attacks and corresponding countermeasures of the three TEE architectures ARM TrustZone, Intel SGX, and AMD SEV, and discusses the development trends of their attack and defense mechanisms. First, we introduce the basic principles of ARM TrustZone, Intel SGX, and AMD SEV, and elaborate on the definition and classification of software cache side-channel attacks as well as practical attack methods and steps. Second, from the perspective of processor instruction execution, we propose a TEE attack surface classification method, use it to classify TEE software side-channel attacks, and explain attacks that combine software side-channel attacks with other attacks. Third, we discuss the threat model of TEE software side-channel attacks in detail. Finally, we comprehensively summarize the industry's countermeasures against TEE software side-channel attacks and discuss future research trends from the two aspects of attack and defense.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006415
    Abstract:
    Deep learning has made great achievements in various fields such as computer vision, natural language processing, and speech recognition. Compared with traditional machine learning algorithms, deep models have higher accuracy on many tasks. However, because deep learning models are end-to-end, highly non-linear, and complex, their interpretability is not as good as that of traditional machine learning algorithms, which brings certain obstacles to the application of deep learning in real life. Studying the interpretability of deep models is therefore of great significance, and in recent years many scholars have proposed different algorithms for this problem. For image classification tasks, this study divides interpretability algorithms into global and local interpretability algorithms. From the perspective of interpretation granularity, global interpretability algorithms are further divided into model-level and neuron-level algorithms, and local interpretability algorithms are divided into pixel-level, concept-level, and image-level feature algorithms. Based on this framework, this study summarizes common deep model interpretability algorithms and related evaluation indicators, and discusses the current challenges and future research directions of deep model interpretability research. It is believed that research on the interpretability and theoretical foundations of deep models is a necessary way to open the black box of deep models, and interpretability algorithms have huge potential to help solve other problems of deep models, such as fairness and generalization.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006384
    Abstract:
    In the era of big data, there are more and more analysis scenarios driven by large-scale data. How to quickly and efficiently extract information for analysis and decision-making from massive data poses great challenges to database systems. At the same time, the real-time requirements of modern business analysis and decision-making demand that database systems handle both ACID transactions and complex analytical queries. However, traditional data partition granularity is too coarse to adapt to the dynamic changes of complex analytical workloads, and traditional data layouts are single-form and cannot cope with the growing number of hybrid transactional/analytical application scenarios. To solve these problems, intelligent data partitioning and layout has become one of the current research hotspots. It extracts effective workload characteristics through data mining, machine learning, and other techniques, designs appropriate partitioning strategies to avoid scanning large amounts of irrelevant data, and guides layout structure design to adapt to different types of workloads. This paper first introduces the background of data partitioning and layout techniques, then elaborates on the research motivation, development trends, and key technologies of intelligent data partitioning and layout. Finally, the research prospects of intelligent data partitioning and layout are summarized.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006385
    Abstract:
    Spoken language understanding is one of the hot research topics in the field of natural language processing and is applied in many fields such as personal assistants, intelligent customer service, human-computer dialogue, and medical treatment. Spoken language understanding refers to converting the natural language input by the user into a semantic representation, which mainly involves two sub-tasks: intent recognition and slot filling. At this stage, deep joint modeling of intent recognition and slot filling has become mainstream and has achieved good results. Summarizing and analyzing deep learning-based joint modeling algorithms for spoken language understanding is therefore of great significance. This paper first introduces work related to the application of deep learning to spoken language understanding, then analyzes existing research from the perspective of the relationship between intent recognition and slot filling, and compares and summarizes the experimental results of different models. Finally, the challenges that future research may face are discussed.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006390
    Abstract:
    Human pose estimation is a basic and challenging task in the field of computer vision and is the basis for many computer vision tasks, such as action recognition and action detection. With the development of deep learning methods, deep learning-based human pose estimation algorithms have shown excellent results. In this paper, we divide pose estimation methods into three categories: single-person pose estimation, top-down multi-person pose estimation, and bottom-up multi-person pose estimation. We introduce the development of 2D human pose estimation algorithms in recent years, discuss the current challenges of 2D human pose estimation, and finally give an outlook on the future development of human pose estimation.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006391
    Abstract:
    Deep reinforcement learning combines the representation ability of deep learning with the decision-making ability of reinforcement learning and has aroused great research interest due to its remarkable effect in complex control tasks. This paper classifies model-free deep reinforcement learning methods into Q-value function methods and policy gradient methods according to whether the Bellman equation is used, and introduces the two kinds of methods from the aspects of model structure, optimization process, and evaluation. Regarding the low sample efficiency problem in deep reinforcement learning, this paper illustrates, from the perspective of model structure, that the overestimation problem in Q-value function methods and the unbiased sampling constraint in policy gradient methods are the main factors affecting sample efficiency. Then, from the perspectives of enhancing exploration efficiency and improving sample exploitation, this paper summarizes various feasible optimization methods according to recent research hotspots and trends, analyzes their advantages and existing problems, and compares them according to their scope of application and optimization effect. Finally, this paper proposes enhancing the generality of optimization methods, exploring the migration of optimization mechanisms between the two kinds of methods, and improving theoretical completeness as future research directions.
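    The distinction drawn above, whether or not the Bellman equation is used, can be made concrete with two textbook formulas (shown for illustration, not tied to any particular surveyed algorithm):
```latex
% Q-value function methods bootstrap from the Bellman optimality equation,
% e.g. the one-step target used by DQN-style algorithms:
\[
  y_t = r_t + \gamma \max_{a'} Q_{\theta^-}(s_{t+1}, a'),
  \qquad
  \mathcal{L}(\theta) = \big( y_t - Q_\theta(s_t, a_t) \big)^2 .
\]
% Policy gradient methods instead optimize the expected return directly:
\[
  \nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\big[\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{A}_t \,\big].
\]
```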
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006395
    Abstract:
    How to utilize multi-source and heterogeneous spatio-temporal data to achieve accurate trajectory prediction that reflects the movement characteristics of moving objects is a core issue in the field of trajectory prediction. Most existing trajectory prediction models predict long sequential trajectory patterns according to the characteristics of historical trajectories, or integrate the current locations of moving objects into spatio-temporal semantic scenarios to predict trajectories based on historical trajectories. This survey summarizes the commonly used trajectory prediction models and algorithms across different research fields. Firstly, the state-of-the-art work on multiple-motion-pattern trajectory prediction and the basic models of trajectory prediction are described. Secondly, prediction models of different categories are summarized, including mathematical statistics, machine learning, and filtering algorithms, together with the representative methods in these fields. Thirdly, context awareness techniques are introduced: the definitions of context awareness given by scholars from different research fields are described; key technical points, such as models of context awareness computing, context acquisition, and context reasoning, are presented; and the categories, filtering, storage, and fusion of context information and their implementation methods are analyzed. The technical roadmap of context-aware multiple-motion-pattern trajectory prediction of moving objects and the working mechanism of each task are introduced in detail. This survey also presents real-world application scenarios of context awareness techniques, such as location recommendation and point-of-interest recommendation, and discusses the advantages and disadvantages of context awareness techniques in these applications by comparing them with traditional algorithms. New methods for pedestrian trajectory prediction based on context awareness and long short-term memory (LSTM) techniques are introduced in detail. Lastly, the current problems and future trends of trajectory prediction and context awareness are summarized.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006409
    [Abstract] (1094) [HTML] (0) [PDF 7.50 M] (1045)
    Abstract:
    With the rapid development of neural networks and other technologies, artificial intelligence has been widely applied in safety-critical or mission-critical systems, such as autopilot systems, disease diagnosis systems, and malware detection systems. Due to the lack of a comprehensive and in-depth understanding of artificial intelligence software systems, errors with serious consequences occur frequently. Characterizing the functional and non-functional attributes of artificial intelligence software systems helps deepen the understanding of such systems and assure their quality. A large number of researchers are devoted to the study of functional attributes, but increasing attention is being paid to the non-functional attributes of artificial intelligence software systems. This paper surveys 138 papers in related fields, systematically combs the existing research results from the aspects of attribute necessity, attribute definition, attribute examples, and common quality assurance methods, and summarizes the research work on non-functional attributes of artificial intelligence software systems. A summary and relationship analysis of these non-functional attributes are presented, and the open-source tools that can be used in this research area are surveyed. Finally, thoughts on potential future research directions and challenges of non-functional attributes of artificial intelligence software systems are summarized, which will hopefully provide references for researchers interested in related directions.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006410
    Abstract:
    Cyber-physical systems (CPS) play an increasingly important role in social life. The on-demand choreography of CPS resources is based on the software definition of those resources, and the definition of software interfaces depends on a full description of the capabilities of CPS resources. At present, the CPS field lacks both a knowledge base that can describe resources and their capabilities and an effective way to construct such a knowledge base. Based on textual descriptions of CPS resources, this study proposes constructing a CPS resource capability knowledge graph and designs a bottom-up automatic construction method. Given CPS resources, the method first extracts textual descriptions of the resources' capabilities from code and texts and generates a normalized expression of capability phrases based on a predefined representation pattern. Then, capability phrases are divided, aggregated, and abstracted based on the key components of the verb-object structure to generate a hierarchical abstract description of the capabilities of different categories of resources. Finally, the CPS knowledge graph is constructed. Based on the Home Assistant platform, this study constructs a knowledge graph containing 32 resource categories and 957 resource capabilities. In the construction experiment, the results of manual construction and of automatic construction using the proposed method are compared and analyzed along different dimensions. The experimental results show that the proposed method is a feasible way to automatically construct a CPS resource capability knowledge graph; it helps reduce the workload of manual construction, supplements the description of resource services and capabilities in the CPS field, and improves knowledge completeness.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006411
    [Abstract] (1139) [HTML] (0) [PDF 5.52 M] (1578)
    Abstract:
    With the vigorous development of big data, cloud computing, and related areas, it has become a worldwide trend for the public to attach importance to data security and privacy. Different parties are reluctant to share data in order to protect their own interests and privacy, which leads to data silos. Federated learning enables multiple parties to build a common, robust model without exchanging their data samples, thus addressing critical issues such as data fragmentation and data isolation. However, more and more studies have shown that the federated learning algorithm first proposed by Google cannot resist sophisticated privacy attacks. Therefore, how to strengthen privacy protection and protect users' data privacy in the federated learning scenario is an important issue. This paper offers a systematic survey of research on privacy attacks and protection in federated learning in recent years. First, the definition, characteristics, and classification of federated learning are introduced. Then the adversarial model of privacy threats in federated learning is analyzed, and typical privacy attack works are classified with respect to the adversary's objectives. Next, several mainstream privacy-preserving technologies are introduced and their advantages and disadvantages in practical applications are pointed out. Furthermore, existing achievements on protection against privacy attacks are summarized and six privacy-preserving schemes are elaborated. Finally, future challenges of privacy preservation in federated learning are concluded and promising research directions are discussed.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006407
    Abstract:
    Separation logic is an extension of the classical Hoare logic for reasoning about pointers and dynamic data structures and has been extensively used in the formal analysis and verification of fundamental software, including operating system kernels. Automated constraint solving is one of the key means of automating separation-logic-based verification of such programs. The verification of programs manipulating dynamic data structures usually involves both shape properties, e.g., singly or doubly linked lists and trees, and data constraints, e.g., sortedness and the invariance of data sets/multisets. This paper introduces COMPSPEN, a separation logic solver capable of simultaneously reasoning about the shape properties and data constraints of linear dynamic data structures. First, the theoretical foundations of COMPSPEN are introduced, including the definition of the separation logic fragment SLIDdata as well as the decision procedures for the satisfiability and entailment problems of SLIDdata. Then, the implementation and architecture of the COMPSPEN tool are presented. Finally, the experimental results for COMPSPEN are reported: 600 test cases are collected and the performance of COMPSPEN is compared with state-of-the-art separation logic solvers, including ASTERIX, S2S, Songbird, and SPEN. The experimental results show that COMPSPEN is the only tool capable of solving separation logic formulas involving set data constraints and that, overall, it can efficiently solve the satisfiability problem of separation logic formulas involving both shape properties and linear arithmetic data constraints on linear dynamic data structures, as well as the entailment problem.
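    To illustrate the kind of formulas such a solver handles, the inductive predicate below combines a shape property (a list segment) with a data constraint (sortedness). It is written in the usual separation-logic style purely for illustration and is not quoted from the SLIDdata definition:
```latex
% Sorted list segment from E to F carrying values bounded by v and w:
% the base case and the unfolding case combine spatial atoms
% (\mapsto, separating conjunction *) with data constraints (\le).
\[
  \mathit{slseg}(E, v; F, w) \;::=\;
     \big(\mathrm{emp} \wedge E = F \wedge v = w\big)
  \;\vee\;
     \exists X, u.\; E \mapsto (u, X) \,*\, \mathit{slseg}(X, u; F, w) \;\wedge\; v \le u .
\]
```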
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006380
    Abstract:
    As one of the most widely deployed payment channel networks (PCN), the Lightning Network (LN) has attracted much attention since it was proposed in 2016. The Lightning Network is a Layer-2 technology addressing the scalability problem of Bitcoin. In LN, participants only need to submit Layer-1 transactions on the blockchain to open and close a payment channel, and they can issue multiple transactions off-chain. This working mechanism avoids waiting for every transaction to be verified and simultaneously saves transaction fees. However, since the Lightning Network has been in practical use for only a short time, previous works were based on small volumes of rapidly changing data and lack timeliness. To fill this gap and gain a comprehensive understanding of the topology of the Lightning Network and its evolving trend, this paper characterizes both the static and dynamic features of LN by leveraging graph analysis on up-to-date data collected through July 2020. We perform a clustering analysis of the nodes and present conclusions and insights derived from the clustering results. Moreover, we conduct an additional study of the charging mechanism in LN by comparing on-chain and off-chain transaction fees.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006330
    Abstract:
    VM consolidation for cloud data centers is one of the hottest research topics in cloud computing. It is challenging to minimize energy consumption while ensuring the QoS of the hosts in cloud data centers, which is essentially an NP-hard multi-objective optimization problem. This paper proposes an energy-efficient hybrid swarm intelligence VM consolidation method (HSI-VMC) for heterogeneous cloud environments to address this problem, which includes a peak-efficiency-based static threshold overloaded host detection strategy (PEBST), a migration-ratio-based reallocate-VM selection strategy (MRB), a target host selection strategy, a hybrid discrete heuristic differential evolutionary particle swarm optimization VM placement algorithm (HDH-DEPSO), and a load-average-based underloaded host processing strategy (AVG). Specifically, the combination of PEBST, MRB, and AVG detects the overloaded and underloaded hosts and selects appropriate VMs for migration to reduce SLA violations and VM migrations. In addition, HDH-DEPSO combines the advantages of DE and PSO to search for the best VM placement solution, which can effectively reduce the cluster's real-time power. A series of experiments based on real cloud workload datasets (PlanetLab, Mix, and Gan) shows that HSI-VMC can sharply reduce energy consumption while accommodating multiple QoS metrics, and outperforms several existing mainstream energy-aware VM consolidation approaches.
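    As a hedged sketch of what a peak-efficiency-based static threshold might look like (in the spirit of PEBST, not the authors' implementation), the snippet below picks the utilization level with the best work-per-watt ratio from a made-up host power profile and flags hosts above it as overloaded.

        def peak_efficiency_threshold(power_at_util, work_at_util):
            """power_at_util/work_at_util map CPU utilization (0..1) to watts / useful work.
            Returns the utilization with the best work-per-watt ratio."""
            return max(power_at_util, key=lambda u: work_at_util[u] / power_at_util[u])

        # Hypothetical power/throughput profile of one host type
        power = {0.2: 120, 0.4: 150, 0.6: 185, 0.8: 240, 1.0: 320}
        work  = {0.2: 20,  0.4: 40,  0.6: 60,  0.8: 80,  1.0: 100}
        threshold = peak_efficiency_threshold(power, work)

        def is_overloaded(cpu_util, threshold=threshold):
            return cpu_util > threshold

        print("threshold:", threshold, "| host at 0.9 overloaded:", is_overloaded(0.9))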
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006331
    Abstract:
    Directed grey-box fuzzing measures the effectiveness of seeds in driving execution toward the target. In addition to the closeness between the triggered execution and the target code lines, the ability to explore diversified execution paths is also important to avoid local optima. Current directed grey-box fuzzing methods measure this capability by coverage counting over the whole program, but only a part of the program affects the computation of the target state. If a new seed brings only target-irrelevant state changes, it cannot enhance the queue for state exploration; worse, it may distract the fuzzer and waste time on exploring target-irrelevant code logic. To solve this problem, this paper presents a valid-coverage-guided directed grey-box fuzzing method. We use static program slicing to locate the code region that can affect the target state and detect interesting seeds that bring new coverage differences within this region. By enlarging the energy of these seeds and reducing that of the others (i.e., adjusting the power schedule), the fuzzer is guided to focus on seeds that help explore the different control flows the target depends on, mitigating the interference of redundant seeds. Our experiments on the provided benchmark show that this strategy brings a significant performance improvement for AFLGo.
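    A hypothetical, simplified sketch of such a slice-aware power schedule is shown below; the Seed class, the edge names, and the energy constants are illustrative only and are not AFLGo's actual data structures or API.

        from dataclasses import dataclass

        @dataclass
        class Seed:
            name: str
            trace: set          # edges covered by one execution of this seed

        def assign_energy(seed, slice_edges, seen_slice_edges, base_energy=16, boost=8):
            """Boost seeds that hit target-relevant slice edges no earlier seed covered."""
            relevant = seed.trace & slice_edges            # edges the seed hits inside the slice
            new_relevant = relevant - seen_slice_edges     # slice edges not yet covered globally
            seen_slice_edges |= new_relevant               # remember them for later seeds
            if new_relevant:
                return base_energy * boost                 # interesting: enlarge its energy
            return max(1, base_energy // 4)                # redundant: shrink its energy

        slice_edges = {"e3", "e4", "e5"}                   # edges in the target-relevant slice
        seen = set()
        for s in [Seed("s1", {"e1", "e3"}), Seed("s2", {"e1", "e2"}), Seed("s3", {"e3", "e4"})]:
            print(s.name, "energy =", assign_energy(s, slice_edges, seen))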
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006356
    Abstract:
    The DataFlow model integrates batch processing and stream processing for big data computing. However, existing cluster resource scheduling frameworks for big data computing are oriented either to stream processing or to batch processing, and are not suitable for batch and stream processing jobs sharing cluster resources. In addition, when GPUs are used for big data analysis, resource usage efficiency is reduced due to the lack of effective CPU-GPU resource decoupling methods. Based on an analysis of existing cluster scheduling frameworks, a hybrid resource scheduling framework called HRM is designed and implemented that can perceive batch/stream processing applications. Based on a shared-state architecture, HRM uses a combination of optimistic and pessimistic blocking protocols to meet the different resource requirements of stream processing and batch processing jobs. On computing nodes, it provides flexible binding of CPU-GPU resources and adopts a queue-stacking technique, which not only meets the real-time requirements of stream processing jobs but also reduces feedback delays and enables the sharing of GPU resources. In simulations of large-scale job scheduling, the scheduling delay of HRM is only about 75% of that of a centralized scheduling framework; in tests with real workloads, CPU resource utilization is increased by more than 25% when batch and stream processing jobs share the cluster; and with the fine-grained job scheduling method, GPU utilization is more than doubled while job completion time is reduced by about 50%.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006362
    Abstract:
    Data-intensive jobs comprise large numbers of tasks, and using GPU devices to improve task performance is currently the main approach. However, when trying to share GPU resources fairly among data-intensive jobs while reducing the cost of network data transmission, existing methods do not comprehensively consider the tension between resource fairness and data transmission cost. This paper analyzes the characteristics of GPU cluster resource scheduling and proposes a scheduling algorithm based on minimum cost and maximum task parallelism, which resolves the contradiction between fair allocation of GPU resources and high data transmission cost. The scheduling process is divided into two stages: in the first stage, each job produces its own optimal plan according to its data transmission cost, and in the second stage, the resource allocator merges the plans of all jobs. The paper first presents the overall framework, in which the resource allocator works globally after each job submits its own optimal plan. It then gives the network bandwidth estimation strategy and the method for computing the data transmission cost of a task, followed by the basic algorithm for fair allocation of resources based on the number of GPUs. Finally, it proposes the minimum-cost, maximum-task scheduling algorithm and describes its non-preemptive strategy, preemptive strategy, and resource fairness strategy. Six data-intensive computing jobs are designed to evaluate the proposed algorithm, and the experiments verify that the scheduling algorithm achieves about 90% resource fairness while keeping the parallel running time of jobs close to minimal.
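    To illustrate the kind of GPU-count-based fair allocation such a baseline implies, here is a toy max-min style water-filling sketch (the job names and demands are hypothetical; this is not the paper's algorithm).

        def fair_gpu_allocation(demands, total_gpus):
            """demands: {job: requested_gpus}. Returns {job: granted_gpus} via integer water-filling."""
            grant = {job: 0 for job in demands}
            remaining = total_gpus
            active = {job for job, d in demands.items() if d > 0}
            while remaining > 0 and active:
                share = max(1, remaining // len(active))        # equal share for this round
                for job in sorted(active):
                    give = min(share, demands[job] - grant[job], remaining)
                    grant[job] += give
                    remaining -= give
                    if grant[job] >= demands[job]:              # demand satisfied, drop from the round
                        active.discard(job)
                    if remaining == 0:
                        break
            return grant

        print(fair_gpu_allocation({"jobA": 4, "jobB": 2, "jobC": 8}, 10))  # {'jobA': 4, 'jobB': 2, 'jobC': 4}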
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006365
    [Abstract] (1463) [HTML] (0) [PDF 1.02 M] (1367)
    Abstract:
    With the deep penetration of information technology into various fields, the real world produces massive amounts of data, which can help data-driven machine learning algorithms obtain valuable knowledge. Meanwhile, high dimensionality, excessive redundancy, and strong noise are inherent characteristics of such diverse and complex data. In order to eliminate redundancy, discover data structure, and improve data quality, prototype learning has been developed. By finding a prototype set from the target set, the data in the sample space can be reduced, which in turn improves the efficiency and effectiveness of machine learning algorithms; its feasibility has been proven in many applications. Thus, prototype learning has recently been one of the hot and key research topics in machine learning. This paper introduces the research background and application value of prototype learning, and provides an overview of the characteristics of related prototype learning methods, the quality evaluation of prototypes, and typical applications. Then, the research progress of prototype learning with respect to supervision mode and model design is presented: the former covers unsupervised, semi-supervised, and fully supervised modes, while the latter compares four kinds of prototype learning methods based on similarity, determinantal point processes, data reconstruction, and low-rank approximation, respectively. Finally, the paper looks forward to the future development of prototype learning.
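    As a minimal, illustrative sketch of the general idea of selecting a small prototype set from a larger sample set, the snippet below uses a greedy farthest-point (k-center style) heuristic; it is a generic stand-in, not one of the specific methods surveyed in the paper.

        import numpy as np

        def select_prototypes(X, k):
            """Greedy k-center selection: each new prototype is the sample farthest
            from the prototypes chosen so far."""
            prototypes = [0]                                    # start from the first sample
            for _ in range(k - 1):
                # distance of every point to its nearest already-chosen prototype
                d = np.min(np.linalg.norm(X[:, None, :] - X[prototypes][None, :, :], axis=-1), axis=1)
                prototypes.append(int(np.argmax(d)))
            return prototypes

        X = np.random.RandomState(0).randn(200, 5)              # synthetic data
        print("selected prototype indices:", select_prototypes(X, 5))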
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006369
    [Abstract] (1173) [HTML] (0) [PDF 915.30 K] (1101)
    Abstract:
    Code snippets in open-source and enterprise software projects and those posted on various software development websites are important software development resources. However, developers' code search needs often reflect high-level intentions and topics, which are difficult to satisfy with information-retrieval-based code search techniques. It is thus highly desirable that code snippets be accompanied by semantic tags reflecting their high-level intentions and topics, to facilitate code search and understanding. Existing tag generation technologies are mainly oriented to text content or rely on historical data, and cannot meet the needs of large-scale code semantic annotation or assist code search and understanding. To address this problem, this paper proposes a software knowledge graph based approach, called KGCodeTagger, that automatically generates semantic tags for code snippets. KGCodeTagger constructs a software knowledge graph from concepts and relations extracted from API documentation and software development Q&A text, and uses the knowledge graph as the basis of code semantic tag generation. Given a code snippet, KGCodeTagger identifies and extracts API invocations and concept mentions, and then links them to the corresponding concepts in the software knowledge graph. On this basis, the approach identifies other concepts related to the linked concepts as candidates and selects semantic tags from the relevant concepts based on diversity and representativeness. We evaluate the knowledge graph construction steps of KGCodeTagger and the quality of the generated code tags. The results show that KGCodeTagger can produce a high-quality and meaningful software knowledge graph and code semantic tags that help developers quickly understand the intention of the code.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006372
    Abstract:
    Sampling is a fundamental class of computational problems. Generating random samples from a solution space according to a certain probability distribution has numerous important applications in approximate counting, probabilistic inference, statistical learning, etc. In the big data era, distributed sampling has attracted considerably more attention. In recent years, a line of research has systematically studied the theory of distributed sampling. This paper surveys important results on distributed sampling, including distributed sampling algorithms with theoretically provable guarantees, the computational complexity of sampling in distributed computing models, and the mutual relation between sampling and inference in distributed computing models.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006373
    Abstract:
    A god class is a class that carries too many tasks and responsibilities. The common feature of god classes is that they contain a large number of attributes and methods and have multiple dependencies with other classes in the system. The god class is a typical code smell, which has a negative impact on the development and maintenance of software. In recent years, many studies have been devoted to discovering or refactoring god classes; however, the detection ability of existing methods is not strong, and their precision is not high enough. This paper proposes a god class detection approach based on a graph model and the isolation forest algorithm, which consists of two stages: graph structure information analysis and intra-class measurement evaluation. In the graph structure analysis stage, the inter-class method call graph and the intra-class structure graph are built, and the isolation forest algorithm is used to narrow the detection range of god classes. In the intra-class measurement evaluation stage, the scale and architecture of the project are taken into account, and the average value of god-class-related metrics in the project is used as the benchmark. An experiment is designed to determine the scale factors, and the product of the average value and the scale factors is used as the detection threshold to obtain the god class detection result. Experimental results on a standard code smell dataset show that the proposed method improves precision and F1 by 25.82% and 33.39%, respectively, over existing god class detection methods, while maintaining a high level of recall.
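    A minimal sketch of applying an isolation forest to per-class metrics to flag god-class candidates is shown below (the metric values and the contamination setting are made up; this is not the paper's pipeline).

        from sklearn.ensemble import IsolationForest
        import numpy as np

        # rows: classes, columns: [number_of_methods, number_of_attributes, fan_out]
        class_metrics = np.array([
            [12, 8, 5], [9, 6, 4], [11, 7, 6], [10, 9, 5],
            [85, 40, 31],                                  # an unusually large class
        ])
        model = IsolationForest(contamination=0.2, random_state=0).fit(class_metrics)
        labels = model.predict(class_metrics)              # -1 marks outliers, i.e. god-class candidates
        candidates = np.where(labels == -1)[0]
        print("god-class candidates at indices:", candidates)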
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006359
    Abstract:
    In recent years, the areal density of traditional HDDs has approached its limit. To extend the capacity of disk drives, several new storage techniques have been proposed, among which Shingled Magnetic Recording (SMR) is the first to reach the market. However, the shingled track structure of SMR disks suffers from serious write amplification and declining performance when processing random write requests. Furthermore, constructing RAID5 on SMR drives worsens the write amplification (WA), because RAID5 updates parity frequently and thus produces many random writes. In this paper, based on the structure of current SMR disks, we observe that the first track of each band can be overwritten without affecting other tracks, because the wide write head can be shifted slightly to cover both the first track and the guard region. In other words, the first track of each band can be called the free track, because it can be overwritten freely without causing any write amplification. Therefore, we propose a new free-track-based RAID system (FT-RAID) built on SMR drives, to fully exploit the potential of the overwrite-free region in SMR disks. FT-RAID consists of two key techniques, FT-Mapping and FT-Buffer. FT-Mapping is an SMR-friendly data mapping scheme in RAID, which maps the frequently updated parity blocks to the free tracks; FT-Buffer adopts an SMR-friendly two-layer cache structure, in which the upper level supports in-place updates for hot blocks and the lower level supplies higher capacity for the write buffer. Both are designed to mitigate performance degradation by reducing SMR write amplification, leading to an 80.4% lower WA ratio than CMR-based RAID5 under practical enterprise I/O workloads.
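    The core idea of FT-Mapping can be sketched as a simple address-mapping rule: parity goes to each band's overwrite-free first track, data to the remaining shingled tracks. The snippet below is a hypothetical toy illustration (band geometry and block numbering are made up, not the paper's layout).

        TRACKS_PER_BAND = 8
        BLOCKS_PER_TRACK = 64

        def map_block(block_id, is_parity):
            """Return (band, track, offset) for a logical block."""
            if is_parity:
                band = block_id // BLOCKS_PER_TRACK
                return band, 0, block_id % BLOCKS_PER_TRACK           # track 0 = the free track
            blocks_per_band_data = (TRACKS_PER_BAND - 1) * BLOCKS_PER_TRACK
            band = block_id // blocks_per_band_data
            within = block_id % blocks_per_band_data
            return band, 1 + within // BLOCKS_PER_TRACK, within % BLOCKS_PER_TRACK

        print(map_block(130, is_parity=True))    # parity lands on a band's free track
        print(map_block(130, is_parity=False))   # data stays on the shingled tracks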
    Available online:  April 21, 2021 , DOI: 10.13328/j.cnki.jos.006339
    Abstract:
    Information retrieval (IR) based bug localization uses cross-language semantic similarity to build a retrieval model that locates faulty source code from a bug report. However, traditional IR-based bug localization treats the code as plain text and uses only the lexical semantic information of the source code, which leads to low accuracy in fine-grained bug localization due to the lack of candidate code semantics, and the usefulness of the results needs to be improved. By analyzing the relationship between code changes and bug introduction in the program evolution scenario, this paper proposes a fine-grained bug localization method based on extended source code information: the explicit semantic information of the code vocabulary and the implicit information of code execution are used to enrich the source code semantics and realize fine-grained bug localization. Around the candidate locations, the semantic context is used to enrich the code representation, and the structural semantics of the intermediate language of code execution is used to make fine-grained code distinguishable. Meanwhile, natural language semantics guides the generation of the code representation through an attention mechanism, and the semantic mapping between fine-grained code and natural language is established, yielding the fine-grained bug localization method FlowLocator. Experimental results show that, compared with classical IR-based bug localization methods, the localization accuracy of this method is significantly improved in terms of Top-N rank, mean average precision (MAP), and mean reciprocal rank (MRR).
    Available online:  April 21, 2021 , DOI: 10.13328/j.cnki.jos.006340
    Abstract:
    Recent research on multi-turn dialogue generation has focused on RNN- or Transformer-based encoder-decoder architectures. However, most of these models ignore the influence of dialogue structure on dialogue generation. To address this problem, this paper proposes to use a graph neural network to model dialogue structure information, thus effectively describing the complex logic within a dialogue. We propose a text-similarity-based relation structure, a turn-switching-based relation structure, and a speaker-based relation structure for dialogue generation, and employ a graph neural network to realize information transmission and iteration over the dialogue context. Extensive experiments on the DailyDialog dataset show that the proposed model consistently outperforms other baseline models on multiple metrics, which indicates that the proposed graph neural network model can effectively describe various correlation structures in dialogue, thus contributing to high-quality dialogue response generation.
    Available online:  April 21, 2021 , DOI: 10.13328/j.cnki.jos.006324
    Abstract:
    In the field of software engineering, code completion is one of the most useful technologies in integrated development environments (IDEs). It improves the efficiency of software development and has become an important technology for accelerating modern software development. Predicting class names, method names, keywords, and so on through code completion improves, to a certain extent, code conformance to specifications and reduces programmers' workload. In recent years, artificial intelligence techniques have been applied to code completion: smart code completion trains a network on source code to learn code characteristics from the corpus and makes recommendations and predictions based on the contextual code features around the location to be completed. Most existing code feature representations are based on program grammar and do not reflect the semantic information of the program, and the network structures currently used still cannot handle long-distance dependencies when facing long code sequences. Therefore, this paper presents a method that characterizes code with program control dependencies and grammar information, and treats code completion as an abstract syntax tree (AST) node prediction problem based on a temporal convolutional network (TCN), which enables the network model to capture longer-range dependencies. This method has been shown to be about 2.8% more accurate than existing methods.
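    A minimal sketch of a TCN-style predictor for the next AST node type, under assumed tensor shapes and a toy vocabulary, is shown below; it is an illustration of the causal dilated-convolution idea, not the authors' model.

        import torch
        import torch.nn as nn

        class TinyTCN(nn.Module):
            def __init__(self, vocab_size, dim=64, levels=3, kernel=3):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, dim)
                layers = []
                for i in range(levels):
                    dilation = 2 ** i
                    # left-only padding keeps the convolution causal
                    layers += [nn.ConstantPad1d(((kernel - 1) * dilation, 0), 0.0),
                               nn.Conv1d(dim, dim, kernel, dilation=dilation), nn.ReLU()]
                self.tcn = nn.Sequential(*layers)
                self.out = nn.Linear(dim, vocab_size)

            def forward(self, node_ids):                       # node_ids: [batch, seq_len]
                x = self.embed(node_ids).transpose(1, 2)       # -> [batch, dim, seq_len]
                h = self.tcn(x).transpose(1, 2)                # -> [batch, seq_len, dim]
                return self.out(h)                             # logits for the next AST node type

        model = TinyTCN(vocab_size=100)
        logits = model(torch.randint(0, 100, (2, 20)))
        print(logits.shape)                                    # torch.Size([2, 20, 100])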
    Available online:  April 21, 2021 , DOI: 10.13328/j.cnki.jos.006296
    Abstract:
    In a multi-stage secret sharing scheme, the participants of an authorized set at each level of the access structures can jointly reconstruct the corresponding secret. In reality, however, an adversary who corrupts an unauthorized set can obtain some or even all of the share information of the uncorrupted participants through memory attacks, thereby illegally obtaining some or even all of the shared secrets. In the face of such memory leakage, existing multi-stage secret sharing schemes are no longer secure. Motivated by this, this paper first presents a formal computational security model of indistinguishability against chosen secret attack for multi-stage secret sharing. Then, using the combination of a physical unclonable function and a fuzzy extractor, a verifiable memory-leakage-resistant multi-stage secret sharing scheme for general access structures is constructed based on minimal linear codes. Furthermore, in the presence of a memory attacker, the scheme is proven computationally secure in the random oracle model. Finally, the proposed scheme is compared with existing schemes in terms of properties and computational complexity.
    Available online:  February 07, 2021 , DOI: 10.13328/j.cnki.jos.006305
    Abstract:
    Identifying the boundaries of Chinese named entities is difficult because there are no separators in Chinese text. Furthermore, the lack of well-annotated NER data makes Chinese NER more challenging in vertical domains, such as the clinical and financial domains. To address these issues, this paper proposes a novel cross-domain Chinese NER model that dynamically transfers entity span information (TES-NER). The cross-domain shared entity span information is transferred from the general domain (source domain), which has sufficient corpora, to the vertical-domain (target domain) Chinese named entity recognition model through a dynamic fusion layer based on a gate mechanism, where the entity span information represents the scope of the Chinese named entities. Specifically, TES-NER first introduces a cross-domain shared entity span recognition module based on a BiLSTM layer and a fully connected neural network (FCN), which identifies the cross-domain shared entity span information that determines the boundaries of Chinese named entities. Then, a Chinese named entity recognition module is constructed to identify domain-specific Chinese named entities by applying an independent bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF). Finally, a dynamic fusion layer is designed to dynamically determine how much of the cross-domain shared entity span information extracted from the entity span recognition module is transferred to the domain-specific named entity recognition model through the gate mechanism. The general-domain (source domain) dataset is a news-domain dataset (MSRA) with a sufficient labeled corpus, while the vertical-domain (target domain) datasets consist of three datasets: a mixed domain (OntoNotes 5.0), a financial domain (Resume), and a medical domain (CCKS 2017). Among them, the mixed-domain dataset (OntoNotes 5.0) is a corpus integrating six different vertical domains. The F1 scores of the proposed model are 2.18%, 1.68%, and 0.99% higher, respectively, than those of the BiLSTM-CRF baseline.
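    The gate-based dynamic fusion idea can be sketched in a few lines of PyTorch; tensor shapes and the module name below are assumptions for illustration, not the authors' code.

        import torch
        import torch.nn as nn

        class DynamicFusionGate(nn.Module):
            """Decides, per position, how much cross-domain entity-span information
            to mix into the target-domain representation."""
            def __init__(self, hidden_size):
                super().__init__()
                self.gate = nn.Linear(2 * hidden_size, hidden_size)

            def forward(self, target_repr, span_repr):
                # target_repr, span_repr: [batch, seq_len, hidden]
                g = torch.sigmoid(self.gate(torch.cat([target_repr, span_repr], dim=-1)))
                return g * span_repr + (1 - g) * target_repr   # gated mixture of the two sources

        fuse = DynamicFusionGate(hidden_size=128)
        h_target = torch.randn(2, 10, 128)
        h_span = torch.randn(2, 10, 128)
        print(fuse(h_target, h_span).shape)                     # torch.Size([2, 10, 128])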
    Available online:  October 18, 2017 , DOI:
    [Abstract] (2597) [HTML] (0) [PDF 525.21 K] (4023)
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 315-325. Original article: https://doi.org/10.1145/3106237.3106242. Readers citing this work should cite the original publication.
    Available online:  October 18, 2017 , DOI:
    [Abstract] (2572) [HTML] (0) [PDF 352.38 K] (5220)
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 303-314. Original article: https://doi.org/10.1145/3106237.3106239. Readers citing this work should cite the original publication.
    Available online:  September 11, 2017 , DOI:
    [Abstract] (2990) [HTML] (0) [PDF 276.42 K] (2189)
    Abstract:
    GitHub, a popular social-software-development platform, has fostered a variety of software ecosystems where projects depend on one another and practitioners interact with each other. Projects within an ecosystem often have complex inter-dependencies that impose new challenges in bug reporting and fixing. In this paper, we conduct an empirical study on cross-project correlated bugs, i.e., causally related bugs reported to different projects, focusing on two aspects: 1) how developers track the root causes across projects; and 2) how the downstream developers coordinate to deal with upstream bugs. Through manual inspection of bug reports collected from the scientific Python ecosystem and an online survey with developers, this study reveals the common practices of developers and the various factors in fixing cross-project bugs. These findings provide implications for future software bug analysis in the scope of ecosystem, as well as shed light on the requirements of issue trackers for such bugs.
    Available online:  June 21, 2017 , DOI:
    [Abstract] (3079) [HTML] (0) [PDF 169.43 K] (2337)
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. The paper has been accepted for publication in IEEE Transactions on Software Engineering, 2017. Original article: http://ieeexplore.ieee.org/document/7792694. Readers citing this work should cite the original publication.
    Available online:  June 13, 2017 , DOI:
    [Abstract] (4284) [HTML] (0) [PDF 174.91 K] (2767)
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 39th International Conference on Software Engineering, pp. 27-37, Buenos Aires, Argentina, May 20-28, 2017, IEEE Press, Piscataway, NJ, USA, ©2017, ISBN: 978-1-5386-3868-2. Original article: http://dl.acm.org/citation.cfm?id=3097373. Readers citing this work should cite the original publication.
    Available online:  January 25, 2017 , DOI:
    [Abstract] (3151) [HTML] (0) [PDF 254.98 K] (2124)
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016), ACM, New York, NY, USA, pp. 871-882, DOI: https://doi.org/10.1145/2950290.2950364. Original article: http://dl.acm.org/citation.cfm?id=2950364. Readers citing this work should cite the original publication.
    Available online:  January 18, 2017 , DOI:
    [Abstract] (3576) [HTML] (0) [PDF 472.29 K] (2092)
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 133-143, Seattle, WA, USA, November 2016. Original article: http://dl.acm.org/citation.cfm?id=2950327. Readers citing this work should cite the original publication.
    Available online:  January 04, 2017 , DOI:
    [Abstract] (3391) [HTML] (0) [PDF 293.93 K] (1914)
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE'16), pp. 810-821, November 13-18, 2016. Original article: https://doi.org/10.1145/2950290.2950310. Readers citing this work should cite the original publication.
    Available online:  January 04, 2017 , DOI:
    [Abstract] (3708) [HTML] (0) [PDF 244.61 K] (2180)
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at FSE 2016. Original article: http://dl.acm.org/citation.cfm?doid=2950290.2950313. Readers citing this work should cite the original publication.
    Available online:  December 12, 2016 , DOI:
    [Abstract] (3206) [HTML] (0) [PDF 358.69 K] (2217)
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at FSE'16, in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Original article: http://dl.acm.org/citation.cfm?id=2950340. Readers citing this work should cite the original publication.
    Available online:  September 30, 2016 , DOI:
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at ASE 2016 (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?id=2970366. Readers citing this work should cite the original publication.
    Available online:  September 09, 2016 , DOI:
    Abstract:
    Recommended by Prof. Bai Ying of the CCF Technical Committee on Software Engineering. Junjie's paper was published at ASE 2016 (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?doid=2970276.2970300. Readers citing this work should cite the original publication.
    Available online:  September 07, 2016 , DOI:
    Abstract:
    Recommended by Prof. Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published in ASE 2016, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. Full text: http://dx.doi.org/10.1145/2970276.2970307. Important note: readers citing this work should cite the original publication.
    Available online:  August 29, 2016 , DOI:
    Abstract:
    Recommended by Prof. Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published in ACM Transactions on Software Engineering and Methodology (TOSEM, Vol. 25, No. 2, Article 13, May 2016) and was invited as a "Journal First" presentation at the ICSE 2016 main conference. Full text: http://dl.acm.org/citation.cfm?id=2876443. The authors include Minghui Zhou, Xiujuan Ma, Lu Zhang, and Hong Mei of Peking University, and Audris Mockus of the University of Tennessee. Important note: readers citing this work should cite the original publication.
  • Most Downloaded Full Texts (overall / annual / per issue)
    Most Clicked Abstracts (overall / annual / per issue)

  • Article Search
    Search by issue
    Select AllDeselectExport
    Display Method:
    2003,14(7):1282-1291, DOI:
    [Abstract] (35898) [HTML] (0) [PDF 832.28 K] (76068)
    Abstract:
    A sensor network, formed by the convergence of sensor, micro-electro-mechanical system, and network technologies, is a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecast. Combining these with existing work, hot research topics including power-aware routing and medium access control schemes are discussed in detail. Finally, taking application requirements into account, several future research directions are put forward.
    2010,21(3):427-437, DOI:
    [Abstract] (31564) [HTML] (0) [PDF 308.76 K] (35649)
    Abstract:
    Automatic generation of poetry has always been considered a hard problem in natural language generation. This paper reports pioneering research on a genetic algorithm for the automatic generation of SONGCI. In light of the characteristics of Chinese ancient poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette wheel selection, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system constructed on the basis of the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic Chinese poetry generation.
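    For readers unfamiliar with the evolutionary machinery the abstract refers to, here is a bare-bones, generic genetic-algorithm skeleton with roulette selection, elitism, one-point crossover, and random mutation; the individuals and the fitness function are toy placeholders, not the paper's poetry encoding or its weighted fitness.

        import random

        def roulette_select(population, fitnesses):
            total = sum(fitnesses)
            pick = random.uniform(0, total)
            acc = 0.0
            for individual, fit in zip(population, fitnesses):
                acc += fit
                if acc >= pick:
                    return individual
            return population[-1]

        def evolve(population, fitness_fn, generations=50, mutation_rate=0.05):
            for _ in range(generations):
                fitnesses = [fitness_fn(ind) for ind in population]
                elite = max(population, key=fitness_fn)                  # elitism: keep the best
                offspring = [elite]
                while len(offspring) < len(population):
                    a = roulette_select(population, fitnesses)
                    b = roulette_select(population, fitnesses)
                    cut = random.randint(1, len(a) - 1)
                    child = a[:cut] + b[cut:]                            # one-point crossover
                    child = [g if random.random() > mutation_rate else random.random() for g in child]
                    offspring.append(child)
                population = offspring
            return max(population, key=fitness_fn)

        toy = [[random.random() for _ in range(8)] for _ in range(30)]   # toy genomes of 8 real genes
        print(evolve(toy, fitness_fn=sum))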
    2011,22(1):71-83, DOI:10.3724/SP.J.1001.2011.03958
    [Abstract] (28512) [HTML] (0) [PDF 781.42 K] (50405)
    Abstract:
    Cloud computing represents a fundamental change in the field of information technology and a movement towards intensive, large-scale specialization. On the other hand, it brings about not only convenience and efficiency, but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest obstacles to the development of cloud computing. This paper describes the major requirements in cloud computing, key security technologies, standards, regulations, etc., and provides a cloud computing security framework. This paper argues that changes in the above aspects will result in a technical revolution in the field of information security.
    2016,27(1):45-71, DOI:10.13328/j.cnki.jos.004914
    [Abstract] (27852) [HTML] (748) [PDF 880.96 K] (27480)
    Abstract:
    Android is a modern and the most popular software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever, with Apple, Microsoft, Blackberry, and Firefox trailing a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2009,20(5):1337-1348, DOI:
    [Abstract] (26963) [HTML] (0) [PDF 1.06 M] (42029)
    Abstract:
    This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects. One is the cloud infrastructure, which is the building block for the upper-layer cloud applications; the other is the cloud application itself. This paper focuses on the cloud infrastructure, including the systems and current research. Some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large-scale clusters which contain a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure so that computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software built on top of redundant hardware instead of mere hardware. All these technologies serve the two important goals of distributed systems: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to a very large scale, even to thousands of nodes. Availability means that the services remain available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2008,19(1):48-61, DOI:
    [Abstract] (26637) [HTML] (0) [PDF 671.39 K] (58226)
    Abstract:
    The research status and recent progress of clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and categorized from several aspects, such as algorithmic ideas, key techniques, and advantages and disadvantages. Then, several typical clustering algorithms and well-known datasets are selected, and simulation experiments are carried out in terms of both accuracy and running efficiency; the behavior of one algorithm on different datasets is analyzed and compared with that of different algorithms on the same dataset. Finally, by integrating the above two kinds of information, the research hotspots, difficulties, and shortcomings of data clustering as well as some open problems are addressed. This work provides a valuable reference for data clustering and data mining.
    2009,20(2):271-289, DOI:
    [Abstract] (25959) [HTML] (0) [PDF 675.56 K] (39990)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms before 2003, the recent advances in EMO are discussed in detail and the current research directions are summarized. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have come forth. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints for future research on EMO are proposed.
    2005,16(1):1-7, DOI:
    [Abstract] (21025) [HTML] (0) [PDF 614.61 K] (18184)
    Abstract:
    This paper offers some reflections from the following four aspects: 1) from the law of the development of things, revealing the development history of software engineering technology; 2) from the natural characteristics of software, analyzing the construction of each abstraction layer of the virtual machine; 3) from the perspective of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the emergence of Internet technology, exploring the development trend of software technology.
    2004,15(3):428-442, DOI:
    [Abstract] (19906) [HTML] (0) [PDF 1009.57 K] (14375)
    Abstract:
    With the rapid development of e-business, web applications have developed from localization to globalization, from B2C (business-to-customer) to B2B (business-to-business), and from a centralized fashion to a decentralized fashion. Web services are a new application model for decentralized computing and an effective mechanism for data and service integration on the web; thus, web services have become a solution for e-business. It is important and necessary to carry out research on new architectures for web services, on combinations with other techniques, and on service integration. This paper surveys various aspects of web services research, from basic concepts to the principal research problems and underlying techniques, including data integration in web services, web service composition, semantic web services, web service discovery, web service security, solutions for web services in the P2P (peer-to-peer) computing environment, grid services, etc. This paper also summarizes the state of the art of these techniques and discusses future research topics and the challenges facing web services.
    2005,16(5):857-868, DOI:
    [Abstract] (19213) [HTML] (0) [PDF 489.65 K] (27318)
    Abstract:
    Wireless Sensor Networks, a novel technology about acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the networks, is a challenging one, and yet extremely crucial for many applications. In this paper, the evaluation criterion of the performance and the taxonomy for wireless sensor networks self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and the directions of research in this area are introduced.
    2010,21(8):1834-1848, DOI:
    [Abstract] (18924) [HTML] (0) [PDF 682.96 K] (51342)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
    2009,20(1):54-66, DOI:
    [Abstract] (18380) [HTML] (0) [PDF 1.41 M] (46930)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks, within which the links between nodes are very dense, but between which they are quite sparse. Network clustering algorithms which aim to discover all natural network communities from given complex networks are fundamentally important for both theoretical researches and practical applications, and can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks including social networks, biological networks, World Wide Webs and so on. This paper reviews the background, the motivation, the state of arts as well as the main issues of existing works related to discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to the researchers from the communities of complex network analysis, data mining, intelligent Web and bioinformatics.
    2012,23(1):32-45, DOI:10.3724/SP.J.1001.2012.04091
    [Abstract] (17825) [HTML] (0) [PDF 408.86 K] (27972)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can be expanded cost-effectively should be invented to deal with big data. Relational data management techniques have gone through a history of nearly 40 years and now encounter the tough obstacle of scalability: relational techniques cannot handle large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, emerge as a new force and expand their application from Web search to territories that used to be occupied by relational database systems. They confront relational techniques with high availability, high scalability, and massive parallel processing capability. The relational technique community, after losing the big deal of Web search, has begun to learn from MapReduce, and MapReduce in turn borrows valuable ideas from the relational community to improve performance. Relational techniques and MapReduce compete with and learn from each other; a new data analysis platform and a new data analysis eco-system are emerging. Eventually, the two camps of techniques will find their right places in the new eco-system of big data analysis.
    2012,23(4):962-986, DOI:10.3724/SP.J.1001.2012.04175
    [Abstract] (17705) [HTML] (0) [PDF 2.09 M] (28278)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, which may easily lead to failures of computers or data. The large number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure cost and power cost. Therefore, the fault tolerance, scalability, and power consumption of the distributed storage of a data center have become key parts of cloud computing technology for ensuring data availability and reliability. This paper surveys the state of the art of the key technologies in cloud computing from the following aspects: design of data center networks, organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. First, many classical data center network topologies are introduced and compared. Second, current fault-tolerant storage techniques are discussed, and data replication and erasure code strategies are especially compared. Third, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
    2009,20(3):524-545, DOI:
    [Abstract] (16803) [HTML] (0) [PDF 1.09 M] (19723)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of the software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2009,20(1):124-137, DOI:
    [Abstract] (15938) [HTML] (0) [PDF 1.06 M] (19828)
    Abstract:
    The appearance of plenty of intelligent devices equipped for short-range wireless communications boosts the fast rise of wireless ad hoc networks application. However, in many realistic application environments, nodes form a disconnected network for most of the time due to nodal mobility, low density, lossy link, etc. Conventional communication model of mobile ad hoc network (MANET) requires at least one path existing from source to destination nodes, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, and implement communications between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, captures great interests from researchers. This paper first introduces the conceptions and theories of opportunistic networks and some current typical applications. Then it elaborates the popular research problems including opportunistic forwarding mechanism, mobility model and opportunistic data dissemination and retrieval. Some other interesting research points such as communication middleware, cooperation and security problem and new applications are stated briefly. Finally, the paper concludes and looks forward to the possible research focuses for opportunistic networks in the future.
    2009,20(11):2965-2976, DOI:
    [Abstract] (15808) [HTML] (0) [PDF 442.42 K] (12862)
    Abstract:
    This paper studies uncertain graph data mining and especially investigates the problem of mining frequent subgraph patterns from uncertain graph data. A data model is introduced for representing uncertainties in graphs, and an expected support is employed to evaluate the significance of subgraph patterns. By using the apriori property of expected support, a depth-first search-based mining algorithm is proposed with an efficient method for computing expected supports and a technique for pruning search space, which reduces the number of subgraph isomorphism testings needed by computing expected support from the exponential scale to the linear scale. Experimental results show that the proposed algorithm is 3 to 5 orders of magnitude faster than a naïve depth-first search algorithm, and is efficient and scalable.
    2004,15(8):1208-1219, DOI:
    [Abstract] (15744) [HTML] (0) [PDF 948.49 K] (11715)
    Abstract:
    With the explosive growth of network applications and their complexity, the threat of Internet worms to network security is becoming increasingly serious. Especially in the Internet environment, the variety of propagation paths and the complexity of application environments cause worms to break out more frequently, hide more deeply, and cover a wider range, and Internet worms have become a primary issue faced by malicious code researchers. In this paper, the concept and research status of Internet worms, their function components, and their execution mechanism are first presented; then the scanning strategies and propagation models are discussed; finally, the critical techniques of Internet worm prevention are given. Some major problems and research trends in this area are also addressed.
    2009,20(5):1226-1240, DOI:
    [Abstract] (15545) [HTML] (0) [PDF 926.82 K] (14147)
    Abstract:
    This paper introduces the concrete details of combining automated reasoning techniques with planning methods, including planning as satisfiability using propositional logic, conformant planning using modal logic and disjunctive reasoning, planning as nonmonotonic logic, and flexible planning as fuzzy description logic. After considering the experimental results of the International Planning Competition and relevant papers, it concludes that planning methods based on automated reasoning techniques are helpful and can be adopted. It also discusses the remaining challenges and possible research hotspots.
    2003,14(10):1717-1727, DOI:
    [Abstract] (15483) [HTML] (0) [PDF 839.25 K] (12417)
    Abstract:
    Sensor networks are an integration of sensor techniques, embedded computing techniques, distributed computation techniques, and wireless communication techniques. They can be used for testing, sensing, collecting, and processing information about monitored objects and transferring the processed information to users. Sensor networks are a new research area of computer science and technology with broad application prospects, in which both academia and industry are very interested. The concepts and characteristics of sensor networks and the data in these networks are introduced, the issues of sensor networks and the data management of sensor networks are discussed, and the advances in research on sensor networks and their data management are presented.
    2009,20(2):350-362, DOI:
    [Abstract] (15257) [HTML] (0) [PDF 1.39 M] (36785)
    Abstract:
    This paper makes a comprehensive survey of the recommender system research aiming to facilitate readers to understand this field. First the research background is introduced, including commercial application demands, academic institutes, conferences and journals. After formally and informally describing the recommendation problem, a comparison study is conducted based on categorized algorithms. In addition, the commonly adopted benchmarked datasets and evaluation methods are exhibited and most difficulties and future directions are concluded.
    2015,26(1):62-81, DOI:10.13328/j.cnki.jos.004701
    [Abstract] (14902) [HTML] (631) [PDF 1.04 M] (22449)
    Abstract:
    Network abstraction has brought about the birth of software-defined networking (SDN). SDN decouples the data plane and the control plane, and simplifies network management. The paper starts with a discussion of the background of the emergence and development of SDN, and outlines its architecture, which includes the data layer, control layer, and application layer. Then the key technologies are elaborated according to the hierarchical architecture of SDN, and the characteristics of consistency, availability, and tolerance are especially analyzed. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized at the end.
    2014,25(4):839-862, DOI:10.13328/j.cnki.jos.004558
    [Abstract] (14710) [HTML] (727) [PDF 1.32 M] (16500)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. The research and discussions on batch computing in big data environment are comparatively sufficient. But how to efficiently deal with stream computing to meet many requirements, such as low latency, high throughput and continuously reliable running, and how to build efficient stream big data computing systems, are great challenges in the big data computing research. This paper provides a research of the data computing architecture and the key issues in stream computing in big data environments. Firstly, the research gives a brief summary of three application scenarios of stream computing in business intelligence, marketing and public service. It also shows distinctive features of the stream computing in big data environment, such as real time, volatility, burstiness, irregularity and infinity. A well-designed stream computing system always optimizes in system structure, data transmission, application interfaces, high-availability, and so on. Subsequently, the research offers detailed analyses and comparisons of five typical and open-source stream computing systems in big data environment. Finally, the research specifically addresses some new challenges of the stream big data systems, such as scalability, fault tolerance, consistency, load balancing and throughput.
    2009,20(10):2729-2743, DOI:
    [Abstract] (13875) [HTML] (0) [PDF 1.12 M] (9334)
    Abstract:
    In a multi-hop wireless sensor network (WSN), the sensors closest to the sink tend to deplete their energy faster than other sensors, which leads to an energy hole around the sink. No more data can be delivered to the sink after an energy hole appears, so a considerable amount of energy is wasted and the network lifetime ends prematurely. This paper investigates the energy hole problem and, based on an improved corona model with levels, concludes that assigning different transmission ranges to nodes in different coronas is an effective approach to achieving an energy-efficient network. It proves that finding the optimal transmission ranges for all coronas is a multi-objective optimization problem (MOP), which is NP-hard. The paper proposes an ACO (ant colony optimization) based distributed algorithm to prolong the network lifetime, which helps nodes in different areas adaptively find approximately optimal transmission ranges based on the node distribution. Furthermore, the simulation results indicate that the network lifetime under this solution approximates that obtained using the optimal list. Compared with existing algorithms, this ACO-based algorithm not only extends the network lifetime by more than a factor of two, but also performs well under non-uniform node distributions.
    2012,23(5):1148-1166, DOI:10.3724/SP.J.1001.2012.04195
    [Abstract] (13647) [HTML] (0) [PDF 946.37 K] (15311)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
    2012,23(1):1-20, DOI:10.3724/SP.J.1001.2012.04100
    [Abstract] (13501) [HTML] (0) [PDF 1017.73 K] (27904)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2000,11(11):1460-1466, DOI:
    [Abstract] (13445) [HTML] (0) [PDF 520.69 K] (9527)
    Abstract:
    Intrusion detection is a highlighted topic of network security research in recent years. In this paper, first the necessity of intrusion detection is presented, and its concepts and models are described. Then, many intrusion detection techniques and architectures are summarized. Finally, the existing problems and the future direction in this field are discussed.
    2008,19(zk):112-120, DOI:
    [Abstract] (13182) [HTML] (0) [PDF 594.29 K] (12996)
    Abstract:
    An ad hoc network is a collection of wireless mobile nodes dynamically forming a temporary network without the use of any existing network infrastructure or centralized administration. Due to bandwidth constraint and dynamic topology of mobile ad hoc networks, multipath supported routing is a very important research issue. In this paper, we present an entropy-based metric to support stability multipath on-demand routing (SMDR). The key idea of SMDR protocol is to construct the new metric-entropy and select the stability multipath with the help of entropy metric to reduce the number of route reconstruction so as to provide QoS guarantee in the ad hoc network whose topology changes continuously. Simulation results show that, with the proposed multipath routing protocol, packet delivery ratio, end-to-end delay, and routing overhead ratio can be improved in most of cases. It is an available approach to multipath routing decision.
    2004,15(4):571-583, DOI:
    [Abstract] (13179) [HTML] (0) [PDF 1005.17 K] (8326)
    Abstract:
    For most peer-to-peer file-swapping applications, sharing is a voluntary action, and peers are not held accountable for their irresponsible bartering history. This situation indicates that trust between participants cannot be established simply on the traditional trust mechanism. A reasonable trust construction approach comes from social network analysis, in which trust relations between individuals are set up upon recommendations of other individuals. Current P2P trust models cannot guarantee the convergence of the iteration for trust computation, and take no account of the security of the model itself, such as sybil attacks and slandering. This paper presents a novel recommendation-based global trust model and gives a distributed implementation method. Mathematical analyses and simulations show that, compared with current global trust models, the proposed model is more robust against trust security problems and more complete in the iteration for computing peer trust.
    2013,24(8):1786-1803, DOI:10.3724/SP.J.1001.2013.04416
    [Abstract] (13142) [HTML] (0) [PDF 1.04 M] (14404)
    Abstract:
    Many application-specific NoSQL database systems have been developed to satisfy the new requirements of big data management. This paper surveys research on typical NoSQL databases based on the key-value data model. First, the characteristics of big data and the key technical issues in supporting big data management are introduced. Then, frontier efforts and research challenges are reviewed, including system architecture, data model, access mode, indexing, transactions, system elasticity, load balancing, replica strategy, data consistency, flash cache, MapReduce-based data processing, and new-generation data management systems. Finally, research prospects are given.
    2006,17(7):1588-1600, DOI:
    [Abstract] (12944) [HTML] (0) [PDF 808.73 K] (12632)
    Abstract:
    Routing technology at the network layer is pivotal in the architecture of wireless sensor networks. As an active branch of routing technology, cluster-based routing protocols excel in network topology management, energy minimization, data aggregation, and so on. In this paper, cluster-based routing mechanisms for wireless sensor networks are analyzed. Cluster head selection, cluster formation, and data transmission are the three key techniques in cluster-based routing protocols. From the viewpoint of these three techniques, recent representative cluster-based routing protocols are presented, and their characteristics and application areas are compared. Finally, future research issues in this area are pointed out.
    2011,22(1):115-131, DOI:10.3724/SP.J.1001.2011.03950
    [Abstract] (12932) [HTML] (0) [PDF 845.91 K] (25541)
    Abstract:
    The Internet traffic model is a key issue for network performance management, quality of service management, and admission control. The paper first summarizes the primary characteristics and metrics of Internet traffic, and illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
    2009,20(1):11-29, DOI:
    [Abstract] (12871) [HTML] (0) [PDF 787.30 K] (12219)
    Abstract:
    Constrained optimization problems (COPs) are mathematical programming problems frequently encountered in science and engineering applications. Solving COPs has become an important research area of evolutionary computation in recent years. In this paper, the state of the art of constrained optimization evolutionary algorithms (COEAs) is surveyed from the two basic aspects of COEAs (i.e., constraint-handling techniques and evolutionary algorithms). In addition, this paper discusses some important issues of COEAs, and several typical algorithms are analyzed in detail. Based on the analyses, it is concluded that, to obtain competitive results, a proper constraint-handling technique needs to be considered in conjunction with an appropriate search algorithm. Finally, the open research issues in this field are also pointed out.
    2002,13(7):1228-1237, DOI:
    [Abstract] (12863) [HTML] (0) [PDF 500.04 K] (12054)
    Abstract:
    Software architecture (SA) has emerged as one of the primary research areas in software engineering and one of the key technologies for developing large-scale software-intensive systems and software product lines. The history and major directions of SA research are summarized, and a concept of SA is put forward based on analyzing and comparing several classical definitions. By summing up SA-related activities, two categories of SA study are identified, and advances in SA research are then introduced from seven aspects. Additionally, some shortcomings of current SA research are discussed and their causes are explained. Finally, some significantly promising directions for SA research are concluded.
    2015,26(1):26-39, DOI:10.13328/j.cnki.jos.004631
    [Abstract] (12744) [HTML] (525) [PDF 763.52 K] (11868)
    Abstract:
    In recent years, transfer learning has attracted a vast amount of attention and research. Transfer learning is a machine learning method that applies knowledge from related but different domains to target domains. It relaxes two basic assumptions of traditional machine learning: (1) that the training data (the source domain) and the test data (the target domain) are independent and identically distributed (i.i.d.); and (2) that there are enough labeled samples to learn a good classification model. It thus aims to solve problems in which the target domain has few or even no labeled data. This paper surveys the research progress of transfer learning and introduces the authors' own work, especially on building transfer learning models by applying generative models at the concept level. Finally, the paper introduces applications of transfer learning, such as text classification and collaborative filtering, and suggests future research directions.
    2013,24(1):50-66, DOI:10.3724/SP.J.1001.2013.04276
    [Abstract] (12656) [HTML] (0) [PDF 0.00 Byte] (14704)
    Abstract:
    As an important means of acceleration in the cloud, distributed caching technology has received considerable attention in industry and academia. This paper starts with a discussion of the combination of cloud computing and distributed caching technology, giving an analysis of its characteristics, typical application scenarios, stages of development, standards, and the several key elements that have promoted its development. In order to systematically understand the state-of-the-art progress and weak points of distributed caching technology, the paper builds a multi-dimensional framework, DctAF, which consists of six dimensions derived from the characteristics of cloud computing and the boundary of caching techniques. Based on DctAF, current techniques are analyzed and summarized, and several influential products are compared. Finally, the paper describes and highlights several challenges that cache systems face and examines current research through in-depth analysis and comparison.
    2008,19(8):1902-1919, DOI:
    [Abstract] (12541) [HTML] (0) [PDF 521.73 K] (12011)
    Abstract:
    Visual language techniques have exhibited more advantages in describing various software artifacts than one-dimensional textual languages during software development, ranging from requirement analysis and design to testing and maintenance, as diagrammatic and graphical notations have been widely applied in modeling systems. In addition to an intuitive appearance, graph grammars provide a well-established foundation for defining visual languages with the power of precise modeling and verification on computers. This paper discusses the issues and techniques for a formal foundation of visual languages, reviews related practical graphical environments, presents a spatial graph grammar formalism, and applies the spatial graph grammar to defining the behavioral semantics of UML diagrams and to developing a style-driven framework for software architecture design.
    2003,14(9):1621-1628, DOI:
    [Abstract] (12380) [HTML] (0) [PDF 680.35 K] (17107)
    Abstract:
    Recommendation systems are among the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extreme sparsity of user rating data. Traditional similarity measures perform poorly in this situation, causing the quality of recommendations to degrade dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. The method first predicts the ratings of items that users have not rated by using item similarity, and then applies a new similarity measure to find the target user's neighbors. Experimental results show that this method can effectively alleviate the extreme sparsity of user rating data and provide better recommendation results than traditional collaborative filtering algorithms.
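    The sketch below illustrates the two-stage idea described in the abstract (predict unrated cells from item similarity, then use the densified matrix to find neighbors), assuming a simple cosine similarity between item columns as a stand-in for the paper's own similarity measure; the specific weighting scheme is an assumption.
        import numpy as np

        def item_similarity(R):
            # Cosine similarity between item columns of a user-item rating matrix
            # (zeros stand for "unrated"; a simplification of the paper's measure).
            norms = np.linalg.norm(R, axis=0)
            norms[norms == 0] = 1.0
            return (R.T @ R) / np.outer(norms, norms)

        def fill_missing(R):
            # Predict each unrated (0) cell from the user's ratings of similar items;
            # the densified matrix can then be used to find a target user's neighbors.
            S = item_similarity(R)
            filled = R.astype(float)
            for u in range(R.shape[0]):
                for i in range(R.shape[1]):
                    if R[u, i] == 0:
                        rated = np.nonzero(R[u])[0]
                        if rated.size:
                            w = S[i, rated]
                            denom = np.abs(w).sum()
                            filled[u, i] = (w @ R[u, rated]) / denom if denom else 0.0
            return filled

        R = np.array([[5, 0, 3, 0], [4, 2, 0, 1], [0, 5, 4, 0]])
        print(fill_missing(R))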
    2008,19(8):1947-1964, DOI:
    [Abstract] (12303) [HTML] (0) [PDF 811.11 K] (8325)
    Abstract:
    Wide-spread deployment of interactive information visualization is difficult. Non-specialist users need a general development method and a toolkit that support the generic data structures suited to tree, network, and multi-dimensional data, special visualization and interaction techniques, and well-known generic information tasks. This paper presents a model-driven development method for interactive information visualization. First, an interactive information visualization interface model (IIVM) is proposed. Then, the development method for interactive information visualization based on the IIVM is presented. The Daisy toolkit is introduced, which includes the Daisy model builder, the Daisy IIV generator, and a runtime framework with the Daisy library. Finally, an application example is given. Experimental results show that Daisy can provide a general solution for developing interactive information visualization.
    2010,21(2):231-247, DOI:
    [Abstract] (12291) [HTML] (0) [PDF 1.21 M] (14562)
    Abstract:
    In this paper, a framework is proposed for handling faults in service composition by analyzing fault requirements. Petri nets are used in the framework for fault detection and handling, focusing on the failure of available services, component failure, and network failure, and the corresponding fault models are given. Based on these models, a correctness criterion of fault handling is given to analyze the fault handling model, and its correctness is proven. Finally, CTL (computational tree logic) is used to specify the related properties and the enforcement algorithm of fault analysis. Simulation results show that this method can ensure the reliability and consistency of service composition.
    2002,13(10):1952-1961, DOI:
    [Abstract] (12286) [HTML] (0) [PDF 570.96 K] (10005)
    Abstract:
    The crucial technologies related to personalization are introduced in this paper, including the representation and modification of user profiles, the representation of resources, recommendation technologies, and the architecture of personalization systems. By comparison with some existing prototype systems, the key technologies for implementing personalization are discussed in detail. In addition, three representative personalization systems are analyzed. Finally, some research directions for personalization are presented.
    2003,14(9):1635-1644, DOI:
    [Abstract] (12154) [HTML] (0) [PDF 622.06 K] (10268)
    Abstract:
    Computer forensics is the technology field that attempts to provide thorough, efficient, and secure means to investigate computer crime. Computer evidence must be authentic, accurate, complete, and convincing to juries. In this paper, the stages of computer forensics are presented, and the theories and realization of forensics software are described. An example of forensic practice is also given, and the deficiencies of computer forensics techniques and anti-forensics are discussed. It is concluded that, with the improvement of computer science and technology, forensics techniques will become more integrated and thorough.
    2012,23(1):82-96, DOI:10.3724/SP.J.1001.2012.04101
    [Abstract] (12036) [HTML] (0) [PDF 394.07 K] (12263)
    Abstract:
    Botnets are one of the most serious threats to the Internet. Researchers have done plenty of research on them and made significant progress. However, botnets keep evolving and have become more and more sophisticated. Due to the underlying security limitations of current systems and the Internet architecture, and the complexity of botnets themselves, how to effectively counter the global threat of botnets is still a very challenging issue. This paper first introduces the evolution of botnets' propagation, attack, command, and control mechanisms. It then summarizes recent advances in botnet defense research and categorizes them into five areas: botnet monitoring, botnet infiltration, analysis of botnet characteristics, botnet detection, and botnet disruption. The limitations of current botnet defense techniques, the evolving trend of botnets, and some possible directions for future research are also discussed.
    2010,21(7):1620-1634, DOI:
    [Abstract] (11999) [HTML] (0) [PDF 765.23 K] (18022)
    Abstract:
    As an application of mobile ad hoc networks (MANETs) in intelligent transportation information systems, the most important goal of vehicular ad hoc networks (VANETs) is to dramatically reduce the high number of accidents and their fatal consequences. One of the most important factors contributing to this goal is the design of effective broadcast protocols. This paper briefly introduces the characteristics and application fields of VANETs. It then discusses the characteristics, performance, and application areas of various categories of broadcast protocols in VANETs, with analysis and comparison. According to the characteristics of VANETs and their application requirements, the paper proposes ideas and promising directions for the design of information broadcast models for inter-vehicle communication.
    2017,28(1):1-16, DOI:10.13328/j.cnki.jos.005139
    [Abstract] (11959) [HTML] (529) [PDF 1.75 M] (6696)
    Abstract:
    The knapsack problem (KP) is a well-known combinatorial optimization problem with many variants, including the 0-1 KP, bounded KP, multi-constraint KP, multiple KP, multiple-choice KP, quadratic KP, dynamic KP, and discounted KP. The KP can be considered a mathematical model extracted from a variety of real-world fields and therefore has wide applications. Evolutionary algorithms (EAs) are widely considered an efficient tool for solving KPs approximately and quickly. This paper presents a survey on solving KPs with EAs over the past ten years. It not only discusses various KP encoding mechanisms and the handling of infeasible individuals, but also provides useful guidelines for designing new EAs to solve KPs.
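    To make the two survey themes concrete (bit-string encoding and infeasible-individual handling), here is a minimal 0-1 knapsack EA sketch that uses greedy repair of overweight individuals. Repair is only one of several handling strategies the surveyed EAs use (others include penalty functions), and the operators and parameter values here are illustrative assumptions.
        import random

        values  = [10, 7, 12, 5, 9]
        weights = [ 4, 3,  6, 2, 5]
        capacity = 10

        def repair(bits):
            # Greedy repair: drop the worst value/weight item until feasible.
            while sum(w for b, w in zip(bits, weights) if b) > capacity:
                chosen = [i for i, b in enumerate(bits) if b]
                worst = min(chosen, key=lambda i: values[i] / weights[i])
                bits[worst] = 0
            return bits

        def fitness(bits):
            return sum(v for b, v in zip(bits, values) if b)

        def evolve(pop_size=20, generations=50, p_mut=0.1):
            pop = [repair([random.randint(0, 1) for _ in values]) for _ in range(pop_size)]
            for _ in range(generations):
                p1, p2 = random.sample(pop, 2)
                cut = random.randrange(1, len(values))          # one-point crossover
                child = p1[:cut] + p2[cut:]
                child = [1 - b if random.random() < p_mut else b for b in child]
                child = repair(child)                           # keep the child feasible
                worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
                if fitness(child) > fitness(pop[worst]):
                    pop[worst] = child                          # steady-state replacement
            return max(pop, key=fitness)

        best = evolve()
        print(best, fitness(best))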
    2008,19(7):1565-1580, DOI:
    [Abstract] (11684) [HTML] (0) [PDF 815.02 K] (14019)
    Abstract:
    Software defect prediction has been an active part of software engineering research since it was developed in the 1970s. It plays a very important role in analyzing software quality and balancing software cost. This paper investigates and discusses the motivation, evolution, solutions, and challenges of software defect prediction technologies, and it also categorizes, analyzes, and compares representative prediction technologies. Some case studies of software defect distribution models are given to aid understanding.
    2004,15(12):1751-1763, DOI:
    [Abstract] (11660) [HTML] (0) [PDF 928.33 K] (6550)
    Abstract:
    This paper presents research work on the children's Turing test (CTT). The main difference between our test program and other ones is its knowledge-based character, which is supported by a massive commonsense knowledge base. The motivation, design, techniques, experimental results, and platform (including a knowledge engine and a conversation engine) of the CTT are described in this paper. Finally, some concluding thoughts about the CTT and AI are given.
    2010,21(5):916-929, DOI:
    [Abstract] (11650) [HTML] (0) [PDF 944.50 K] (15445)
    Abstract:
    Data deduplication technologies can be divided into two categories: a) identical data detection techniques, and b) similar data detection and encoding techniques. This paper presents a systematic survey of these two categories of data deduplication technologies and analyzes their advantages and disadvantages. Besides, since data deduplication technologies can affect the reliability and performance of storage systems, this paper also surveys the technologies proposed to cope with these two problems. Based on the analysis of the current state of research on data deduplication technologies, this paper draws the following conclusions: a) how to mine data characteristic information in data deduplication has not been completely solved, and how to use data characteristic information to effectively eliminate duplicate data also needs further study; b) from the perspective of storage system design, further study is needed on how to introduce proper mechanisms to overcome the reliability limitations of data deduplication techniques and to reduce the additional system overheads they cause.
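    As a toy illustration of the first category (identical data detection), the sketch below splits data into fixed-size chunks and stores only one copy per fingerprint; real deduplication systems typically also use content-defined chunking and the similar-data detection/encoding techniques the survey covers, so this is only a minimal example.
        import hashlib

        def dedup_store(data, chunk_size=4096):
            # Fixed-size chunking + SHA-1 fingerprints: identical chunks are stored once,
            # and the file is described by a "recipe" of fingerprints.
            store, recipe = {}, []
            for off in range(0, len(data), chunk_size):
                chunk = data[off:off + chunk_size]
                fp = hashlib.sha1(chunk).hexdigest()
                store.setdefault(fp, chunk)   # keep a single copy per unique chunk
                recipe.append(fp)             # reconstruction order for the original data
            return store, recipe

        data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
        store, recipe = dedup_store(data)
        print(len(recipe), "chunks referenced,", len(store), "unique chunks stored")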
    2008,19(10):2706-2719, DOI:
    [Abstract] (11645) [HTML] (0) [PDF 778.29 K] (10072)
    Abstract:
    Web search engines have become very important tools for finding information efficiently in the massive Web data. With the explosive growth of Web data, traditional centralized search engines find it increasingly hard to keep up with people's growing information needs. With the rapid development of peer-to-peer (P2P) technology, the notion of P2P Web search has been proposed and has quickly become a research focus. The goal of this paper is to give a brief summary of current P2P Web search technologies in order to facilitate future research. First, some main challenges for P2P Web search are presented. Then, key techniques for building a feasible and efficient P2P Web search engine are reviewed, including system topology, data placement, query routing, index partitioning, collection selection, relevance ranking, and Web crawling. Finally, three recently proposed P2P Web search prototypes are introduced.
    1999,10(11):1206-1211, DOI:
    [Abstract] (11491) [HTML] (0) [PDF 392.66 K] (5344)
    Abstract:
    In this paper, the authors discuss two important issues in rough set research: attribute reduction and value reduction. A new attribute reduction approach that can obtain the optimal attribute reduct is presented, based on the discernibility matrix and logic computation, and a multivariate decision tree can also be obtained with this method. Some improvements to a widely used value reduction method are also achieved. In this way, the complexity of the acquired rule knowledge can be reduced effectively.
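    For readers unfamiliar with the discernibility matrix mentioned above, the sketch below builds one for a tiny decision table: each entry lists the condition attributes that distinguish two objects with different decisions, and a reduct must hit every non-empty entry. The logic-computation step that extracts the optimal reduct is omitted; the data and function names are illustrative only.
        from itertools import combinations

        def discernibility_matrix(table, decision):
            # table: list of attribute-value tuples; decision: list of class labels.
            # Entry (i, j) holds the attributes that discern objects i and j when
            # their decisions differ; empty entries are skipped.
            n, m = len(table), len(table[0])
            matrix = {}
            for i, j in combinations(range(n), 2):
                if decision[i] != decision[j]:
                    matrix[(i, j)] = {a for a in range(m) if table[i][a] != table[j][a]}
            return matrix

        objects  = [(1, 0, 1), (1, 1, 0), (0, 1, 1)]
        decision = ['yes', 'no', 'yes']
        print(discernibility_matrix(objects, decision))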
  • Most Downloaded (overall ranking / yearly ranking / per-issue ranking)
    Abstract Click Rank (overall ranking / yearly ranking / per-issue ranking)

    2003,14(7):1282-1291, DOI:
    [Abstract] (35898) [HTML] (0) [PDF 832.28 K] (76068)
    Abstract:
    Sensor networks, formed by the convergence of sensor, micro-electro-mechanical system, and network technologies, are a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecast. Combining these with existing work, the hot research topics, including power-aware routing and medium access control schemes, are discussed and presented in detail. Finally, taking account of application requirements, several future research directions are put forward.
    2008,19(1):48-61, DOI:
    [Abstract] (26637) [HTML] (0) [PDF 671.39 K] (58226)
    Abstract:
    The state of research and recent progress in clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and categorized from several aspects, such as algorithmic ideas, key techniques, and advantages and disadvantages. Second, several typical clustering algorithms and well-known data sets are selected, and simulation experiments are conducted with respect to both accuracy and running efficiency; the behavior of each algorithm on different data sets is analyzed by comparing the clustering results of different algorithms on the same data sets. Finally, by integrating the two aspects above, the research hotspots, difficulties, and shortcomings of data clustering, as well as some open problems, are addressed. This work can serve as a valuable reference for data clustering and data mining.
    2010,21(8):1834-1848, DOI:
    [Abstract] (18924) [HTML] (0) [PDF 682.96 K] (51342)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, and sentiment retrieval and summarization. Then, the evaluation methods and corpora for sentiment analysis are introduced. Finally, the applications of sentiment analysis are summarized. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparisons and analysis.
    2011,22(1):71-83, DOI:10.3724/SP.J.1001.2011.03958
    [Abstract] (28512) [HTML] (0) [PDF 781.42 K] (50405)
    Abstract:
    Cloud computing is a fundamental change happening in the field of information technology and represents a movement towards intensive, large-scale specialization. On the other hand, it brings not only convenience and efficiency, but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the security requirements of cloud computing, key security technologies, standards, regulations, and so on, and provides a cloud computing security framework. This paper argues that changes in the above aspects will result in a technical revolution in the field of information security.
    2009,20(1):54-66, DOI:
    [Abstract] (18380) [HTML] (0) [PDF 1.41 M] (46930)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks: links within communities are very dense, while links between communities are quite sparse. Network clustering algorithms, which aim to discover all natural communities from a given complex network, are fundamentally important for both theoretical research and practical applications; they can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks, including social networks, biological networks, the World Wide Web, and so on. This paper reviews the background, motivation, state of the art, and main issues of existing work on discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to researchers in complex network analysis, data mining, intelligent Web, and bioinformatics.
    2009,20(5):1337-1348, DOI:
    [Abstract] (26963) [HTML] (0) [PDF 1.06 M] (42029)
    Abstract:
    This paper surveys the technologies currently adopted in cloud computing as well as the systems used in enterprises. Cloud computing can be viewed from two different aspects: one is the cloud infrastructure, which is the building block for the upper-layer cloud applications; the other is the cloud applications themselves. This paper focuses on the cloud infrastructure, including existing systems and current research, and some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large-scale clusters containing a large number of cheap PC servers. Second, the applications are co-designed with the underlying infrastructure so that computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software built on top of redundant hardware rather than by hardware alone. All these technologies serve the two important goals of distributed systems: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to a very large scale, even to thousands of nodes; availability means that the services remain available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2009,20(2):271-289, DOI:
    [Abstract] (25959) [HTML] (0) [PDF 675.56 K] (39990)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing EMO algorithms developed before 2003, the recent advances in EMO are discussed in detail, and the current research directions are summarized. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have emerged. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints on the future research of EMO are proposed.
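    Since the abstract contrasts new dominance schemes with traditional Pareto dominance, the following short sketch shows the standard Pareto dominance check (for minimization) and how it yields the non-dominated subset of a set of objective vectors; it is a textbook definition, not any specific EMO algorithm from the survey.
        def dominates(a, b):
            # a dominates b (minimization) if a is no worse in every objective
            # and strictly better in at least one.
            return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

        def nondominated(points):
            # Return the non-dominated subset of a list of objective vectors.
            return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

        pts = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
        print(nondominated(pts))   # (3.0, 4.0) is dominated by (2.0, 3.0)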
    2004,15(10):1493-1504, DOI:
    [Abstract] (8512) [HTML] (0) [PDF 937.72 K] (36981)
    Abstract:
    The graphics processing unit (GPU) has been developing rapidly in recent years, at a speed exceeding Moore's law, and as a result, various applications associated with computer graphics have advanced greatly. At the same time, the high processing power, parallelism, and programmability available on contemporary GPUs provide an ideal platform for general-purpose computation. Starting from an introduction to the development history and the architecture of the GPU, the technical fundamentals of the GPU are described in this paper. Then, in the main part of the paper, the development of various applications of general-purpose computation on the GPU is introduced, and among those applications, fluid dynamics, algebraic computation, database operations, and spectrum analysis are introduced in detail. Our experience with work on fluid dynamics is also given, and the development of software tools in this area is introduced. Finally, a conclusion is made, and future developments and new challenges in both hardware and software in this field are discussed.
    2009,20(2):350-362, DOI:
    [Abstract] (15257) [HTML] (0) [PDF 1.39 M] (36785)
    Abstract:
    This paper makes a comprehensive survey of recommender system research, aiming to help readers understand this field. First, the research background is introduced, including commercial application demands, academic institutes, conferences, and journals. After formally and informally describing the recommendation problem, a comparison study is conducted on the categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are exhibited, and the main difficulties and future directions are summarized.
    2010,21(3):427-437, DOI:
    [Abstract] (31564) [HTML] (0) [PDF 308.76 K] (35649)
    Abstract:
    Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports some pioneering research on a possible genetic algorithm and its automatic generation of SONGCI (a form of classical Chinese poetry). In light of the characteristics of ancient Chinese poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette wheel, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system constructed on the basis of the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic generation of Chinese poetry.
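    Two of the operators named in the abstract, roulette-wheel selection and partially mapped crossover (PMX), have standard textbook forms, sketched below in Python. PMX is defined on permutation encodings, so representing an individual as a permutation of candidate line indices is an assumption here; the paper's actual tonal-pattern coding, fitness function, and mutation operator are not reproduced.
        import random

        def roulette_select(population, fitness):
            # Pick an individual with probability proportional to its (non-negative) fitness.
            total = sum(fitness)
            r = random.uniform(0, total)
            acc = 0.0
            for ind, f in zip(population, fitness):
                acc += f
                if acc >= r:
                    return ind
            return population[-1]

        def pmx(p1, p2):
            # Partially mapped crossover on two permutations: copy a slice from p1,
            # fill the rest from p2, and resolve conflicts through the mapped slice.
            n = len(p1)
            a, b = sorted(random.sample(range(n), 2))
            child = [None] * n
            child[a:b] = p1[a:b]
            mapping = {p1[i]: p2[i] for i in range(a, b)}
            for i in list(range(0, a)) + list(range(b, n)):
                gene = p2[i]
                while gene in child[a:b]:
                    gene = mapping[gene]
                child[i] = gene
            return child

        # Toy usage on hypothetical "line index" permutations.
        pop = [[0, 1, 2, 3, 4, 5], [5, 3, 1, 0, 4, 2], [2, 4, 0, 5, 1, 3]]
        fit = [3.0, 1.5, 2.0]
        print(pmx(roulette_select(pop, fit), roulette_select(pop, fit)))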
    2013,24(11):2476-2497, DOI:10.3724/SP.J.1001.2013.04486
    [Abstract] (9382) [HTML] (0) [PDF 1.14 M] (31760)
    Abstract:
    Probabilistic graphical models are powerful tools for compactly representing complex probability distributions, efficiently computing (approximate) marginal and conditional distributions, and conveniently learning parameters and hyperparameters in probabilistic models. As a result, they have been widely used in applications that require some sort of automated probabilistic reasoning, such as computer vision and natural language processing, as a formal approach to deal with uncertainty. This paper surveys the basic concepts and key results of representation, inference and learning in probabilistic graphical models, and demonstrates their uses in two important probabilistic models. It also reviews some recent advances in speeding up classic approximate inference algorithms, followed by a discussion of promising research directions.
    2014,25(9):1889-1908, DOI:10.13328/j.cnki.jos.004674
    [Abstract] (10835) [HTML] (614) [PDF 550.98 K] (30954)
    Abstract:
    This paper first introduces the key features of big data in different processing modes and their typical application scenarios, as well as corresponding representative processing systems. It then summarizes three development trends of big data processing systems. Next, the paper gives a brief survey on system supported analytic technologies and applications (including deep learning, knowledge computing, social computing, and visualization), and summarizes the key roles of individual technologies in big data analysis and understanding. Finally, the paper lays out three grand challenges of big data processing and analysis, i.e., data complexity, computation complexity, and system complexity. Potential ways for dealing with each complexity are also discussed.
    2012,23(4):962-986, DOI:10.3724/SP.J.1001.2012.04175
    [Abstract] (17705) [HTML] (0) [PDF 2.09 M] (28278)
    Abstract:
    Considered as the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, which may easily lead to failures of computers or data. The large number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure costs and power costs. Therefore, fault tolerance, scalability, and power consumption of the distributed storage of a data center have become key issues in cloud computing technology, in order to ensure data availability and reliability. This paper surveys the state of the art of the key technologies of cloud computing in the following aspects: design of the data center network, organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. First, many kinds of classical data center network topologies are introduced and compared. Second, current fault-tolerant storage techniques are discussed, and data replication and erasure coding strategies are especially compared. Third, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
    2012,23(1):32-45, DOI:10.3724/SP.J.1001.2012.04091
    [Abstract] (17825) [HTML] (0) [PDF 408.86 K] (27972)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can be expanded cost-effectively should be invented to deal with big data. Relational data management techniques have gone through a history of nearly 40 years. Now they encounter the tough obstacle of scalability: relational techniques cannot handle large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and expanded their applications from Web search to territories that used to be occupied by relational database systems. They confront relational techniques with high availability, high scalability, and massive parallel processing capability. The relational technique community, after losing the big deal of Web search, has begun to learn from MapReduce, and MapReduce has also borrowed valuable ideas from the relational community to improve performance. Relational techniques and MapReduce compete with and learn from each other; new data analysis platforms and a new data analysis eco-system are emerging. Eventually, the two camps of techniques will find their proper places in the new eco-system of big data analysis.
    2012,23(1):1-20, DOI:10.3724/SP.J.1001.2012.04100
    [Abstract] (13501) [HTML] (0) [PDF 1017.73 K] (27904)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2016,27(1):45-71, DOI:10.13328/j.cnki.jos.004914
    [Abstract] (27852) [HTML] (748) [PDF 880.96 K] (27480)
    Abstract:
    Android is a modern and currently the most popular software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever, with Apple, Microsoft, Blackberry, and Firefox trailing a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2005,16(5):857-868, DOI:
    [Abstract] (19213) [HTML] (0) [PDF 489.65 K] (27318)
    Abstract:
    Wireless Sensor Networks, a novel technology about acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the networks, is a challenging one, and yet extremely crucial for many applications. In this paper, the evaluation criterion of the performance and the taxonomy for wireless sensor networks self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and the directions of research in this area are introduced.
    2018,29(5):1471-1514, DOI:10.13328/j.cnki.jos.005519
    [Abstract] (4710) [HTML] (722) [PDF 4.38 M] (26246)
    Abstract:
    Computer-aided detection/diagnosis (CAD) can improve the accuracy of diagnosis, reduce false positives, and provide decision support for doctors. The main purpose of this paper is to analyze the latest developments in computer-aided diagnosis tools. Focusing on the incidence sites of the four most fatal cancers, major recent publications on CAD applications in different medical imaging areas are reviewed in this survey according to different imaging techniques and diseases. Furthermore, a multidimensional analysis is made of the research in terms of image data sets, algorithms, and evaluation methods. Finally, existing problems, research trends, and development directions in the field of medical image CAD systems are discussed.
    2011,22(1):115-131, DOI:10.3724/SP.J.1001.2011.03950
    [Abstract] (12932) [HTML] (0) [PDF 845.91 K] (25541)
    Abstract:
    The Internet traffic model is a key issue for network performance management, quality of service management, and admission control. The paper first summarizes the primary characteristics and metrics of Internet traffic, and illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
    2013,24(1):77-90, DOI:10.3724/SP.J.1001.2013.04339
    [Abstract] (10590) [HTML] (0) [PDF 0.00 Byte] (24417)
    Abstract:
    Task parallel programming model is a widely used parallel programming model on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and the supporting mechanism used in task parallel programming models and discusses issues and the latest achievements from three perspectives: Parallelism expression, data management and task scheduling. In the end, some future trends in this area are discussed.
    2015,26(1):62-81, DOI:10.13328/j.cnki.jos.004701
    [Abstract] (14902) [HTML] (631) [PDF 1.04 M] (22449)
    Abstract:
    Network abstraction brings about the birth of software-defined networking (SDN). SDN decouples the data plane and the control plane, and simplifies network management. The paper starts with a discussion of the background of the birth and development of SDN and outlines its architecture, which includes the data layer, control layer, and application layer. Then the key technologies are elaborated according to the hierarchical architecture of SDN, and the characteristics of consistency, availability, and tolerance are especially analyzed. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized in the end.
    2017,28(4):959-992, DOI:10.13328/j.cnki.jos.005143
    [Abstract] (8188) [HTML] (534) [PDF 3.58 M] (20410)
    Abstract:
    The development of mobile internet and the popularity of mobile terminals produce massive trajectory data of moving objects under the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in the city and the changes of atmospheric environment. However, trajectory data also can be exploited to disclose moving objects' privacy information (e.g., behaviors, hobbies and social relationships). Accordingly, attackers can easily access moving objects' privacy information by digging into their trajectory data such as activities and check-in locations. In another front of research, quantum computation presents an important theoretical direction to mine big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to handle trajectory big data could make some complex problem solvable and achieve higher efficiency. This paper reviews the key technologies of processing trajectory data. First the concept and characteristics of trajectory data is introduced, and the pre-processing methods, including noise filtering and data compression, are summarized. Then, the trajectory indexing and querying techniques, and the current achievements of mining trajectory data, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preserving with respect to trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing framework and data visualization, are presented in detail. Some possible ways of applying quantum computation into trajectory data processing, as well as the implementation of some core trajectory mining algorithms by quantum computation are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
    2011,22(6):1299-1315, DOI:10.3724/SP.J.1001.2011.03993
    [Abstract] (9729) [HTML] (0) [PDF 987.90 K] (19871)
    Abstract:
    Attribute-Based encryption (ABE) scheme takes attributes as the public key and associates the ciphertext and user’s secret key with attributes, so that it can support expressive access control policies. This dramatically reduces the cost of network bandwidth and sending node’s operation in fine-grained access control of data sharing. Therefore, ABE has a broad prospect of application in the area of fine-grained access control. After analyzing the basic ABE system and its two variants, Key-Policy ABE (KP-ABE) and Ciphertext-Policy ABE (CP-ABE), this study elaborates the research problems relating to ABE systems, including access structure design for CP-ABE, attribute key revocation, key abuse and multi-authorities ABE with an extensive comparison of their functionality and performance. Finally, this study discusses the need-to-be solved problems and main research directions in ABE.
    2009,20(1):124-137, DOI:
    [Abstract] (15938) [HTML] (0) [PDF 1.06 M] (19828)
    Abstract:
    The appearance of plenty of intelligent devices equipped for short-range wireless communications boosts the fast rise of wireless ad hoc networks application. However, in many realistic application environments, nodes form a disconnected network for most of the time due to nodal mobility, low density, lossy link, etc. Conventional communication model of mobile ad hoc network (MANET) requires at least one path existing from source to destination nodes, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, and implement communications between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, captures great interests from researchers. This paper first introduces the conceptions and theories of opportunistic networks and some current typical applications. Then it elaborates the popular research problems including opportunistic forwarding mechanism, mobility model and opportunistic data dissemination and retrieval. Some other interesting research points such as communication middleware, cooperation and security problem and new applications are stated briefly. Finally, the paper concludes and looks forward to the possible research focuses for opportunistic networks in the future.
    2009,20(3):524-545, DOI:
    [Abstract] (16803) [HTML] (0) [PDF 1.09 M] (19723)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2006,17(9):1848-1859, DOI:
    [Abstract] (11349) [HTML] (0) [PDF 770.40 K] (18820)
    Abstract:
    In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the state-of-the-art challenging issues and research trends for content information processing on the Internet and in other complex applications, this paper presents a survey of the up-to-date development of text categorization based on machine learning, including models, algorithms, and evaluation. It is pointed out that problems such as nonlinearity, skewed data distributions, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization. Possible solutions to these problems are also discussed. Finally, some future directions of research are given.
    2005,16(1):1-7, DOI:
    [Abstract] (21025) [HTML] (0) [PDF 614.61 K] (18184)
    Abstract:
    This paper offers some reflections on the following four aspects: 1) from the law of the development of things, revealing the development history of software engineering technology; 2) from the viewpoint of the natural characteristics of software, analyzing the construction of each abstraction layer of the virtual machine; 3) from the viewpoint of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the emergence of Internet technology, exploring the development trend of software technology.
    2012,23(8):2058-2072, DOI:10.3724/SP.J.1001.2012.04237
    [Abstract] (9368) [HTML] (0) [PDF 800.05 K] (18092)
    Abstract:
    The Distributed denial of service (DDoS) attack is a major threat to the current network. Based on the attack packet level, the study divides DDoS attacks into network-level DDoS attacks and application-level DDoS attacks. Next, the study analyzes the detection and control methods of these two kinds of DDoS attacks in detail, and it also analyzes the drawbacks of different control methods implemented in different network positions. Finally, the study analyzes the drawbacks of the current detection and control methods, the development trend of the DDoS filter system, and corresponding technological challenges are also proposed.
    2004,15(11):1583-1594, DOI:
    [Abstract] (7574) [HTML] (0) [PDF 1.57 M] (18056)
    Abstract:
    Uncertainty exists widely in the subjective and objective world. Among all kinds of uncertainty, randomness and fuzziness are the most important and fundamental. In this paper, the relationship between randomness and fuzziness is discussed. Uncertain states and their changes can be measured by entropy and hyper-entropy respectively. Taking advantage of entropy and hyper-entropy, the uncertainty of chaos, fractals, and complex networks in their various forms of evolution and differentiation is further studied. A simple and effective way is proposed to simulate uncertainty by means of knowledge representation, which provides a basis for the automation of both logical and image thinking with uncertainty. AI (artificial intelligence) with uncertainty is a new cross-discipline, which covers computer science, physics, mathematics, brain science, psychology, cognitive science, biology, and philosophy, and results in the automation of representation, processing, and thinking for uncertain information and knowledge.
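    The abstract does not spell out how entropy and hyper-entropy are used operationally; the sketch below is therefore only an assumed illustration borrowed from the cloud-model literature, where a concept is described by an expectation Ex, an entropy En (uncertainty of the concept), and a hyper-entropy He (uncertainty of En itself), and "cloud drops" are generated by perturbing En with He. It is not presented as this paper's method.
        import math, random

        def normal_cloud(Ex, En, He, n=5):
            # Forward normal cloud generator (assumed formulation): each drop x is
            # drawn with an entropy value En_i itself perturbed by the hyper-entropy He.
            drops = []
            for _ in range(n):
                En_i = random.gauss(En, He)           # uncertainty about the uncertainty
                x = random.gauss(Ex, abs(En_i))       # generate a cloud drop
                mu = math.exp(-(x - Ex) ** 2 / (2 * En_i ** 2)) if En_i else 1.0
                drops.append((x, mu))                 # drop value and its membership degree
            return drops

        print(normal_cloud(0.0, 1.0, 0.1))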
    2010,21(7):1620-1634, DOI:
    [Abstract] (11999) [HTML] (0) [PDF 765.23 K] (18022)
    Abstract:
    As an application of mobile ad hoc networks (MANETs) in intelligent transportation information systems, the most important goal of vehicular ad hoc networks (VANETs) is to dramatically reduce the high number of accidents and their fatal consequences. One of the most important factors contributing to this goal is the design of effective broadcast protocols. This paper briefly introduces the characteristics and application fields of VANETs. It then discusses the characteristics, performance, and application areas of various categories of broadcast protocols in VANETs, with analysis and comparison. According to the characteristics of VANETs and their application requirements, the paper proposes ideas and promising directions for the design of information broadcast models for inter-vehicle communication.
    2013,24(5):1078-1097, DOI:10.3724/SP.J.1001.2013.04390
    [Abstract] (10988) [HTML] (0) [PDF 1.74 M] (17804)
    Abstract:
    The control and data planes are decoupled in software-defined networking (SDN), which provides a new solution for research on new network applications and future Internet technologies. The development status of OpenFlow-based SDN technologies is surveyed in this paper. The research background of the decoupled architecture of network control and data transmission in OpenFlow networks is summarized first, and the key components and research progress, including the OpenFlow switch, the controller, and SDN technologies, are introduced. Moreover, current problems and solutions of OpenFlow-based SDN technologies are analyzed in four aspects. Combined with the development status in recent years, applications in campus networks, data centers, network management, and network security are summarized. Finally, future research trends are discussed.
    2014,25(1):37-50, DOI:10.13328/j.cnki.jos.004497
    [Abstract] (8921) [HTML] (578) [PDF 929.87 K] (17780)
    Abstract:
    This paper surveys the state of the art of speech emotion recognition (SER), and presents an outlook on the trend of future SER technology. First, the survey summarizes and analyzes SER in detail from five perspectives, including emotion representation models, representative emotional speech corpora, emotion-related acoustic features extraction, SER methods and applications. Then, based on the survey, the challenges faced by current SER research are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, and presents detailed comparison and analysis between these methods.
    2005,16(10):1743-1756, DOI:
    [Abstract] (9234) [HTML] (0) [PDF 545.62 K] (17149)
    Abstract:
    This paper presents a survey on the theory of provable security and its applications to the design and analysis of security protocols. It clarifies what the provable security is, explains some basic notions involved in the theory of provable security and illustrates the basic idea of random oracle model. It also reviews the development and advances of provably secure public-key encryption and digital signature schemes, in the random oracle model or the standard model, as well as the applications of provable security to the design and analysis of session-key distribution protocols and their advances.
    2003,14(9):1621-1628, DOI:
    [Abstract] (12380) [HTML] (0) [PDF 680.35 K] (17107)
    Abstract:
    Recommendation systems are among the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extreme sparsity of user rating data. Traditional similarity measures perform poorly in this situation, causing the quality of recommendations to degrade dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. The method first predicts the ratings of items that users have not rated by using item similarity, and then applies a new similarity measure to find the target user's neighbors. Experimental results show that this method can effectively alleviate the extreme sparsity of user rating data and provide better recommendation results than traditional collaborative filtering algorithms.
    2018,29(10):2966-2994, DOI:10.13328/j.cnki.jos.005551
    [Abstract] (7484) [HTML] (901) [PDF 610.06 K] (16687)
    Abstract:
    In recent years, the rapid development of Internet technology and Web applications has triggered an explosion of various data on the Internet, which generates a large amount of valuable knowledge. How to organize, represent, and analyze this knowledge has attracted much attention. Knowledge graphs were thus developed to organize this knowledge in a semantic and visualized manner. Knowledge reasoning over knowledge graphs has then become one of the hot research topics and plays an important role in many applications such as vertical search and intelligent question answering. The goal of knowledge reasoning over knowledge graphs is to infer new facts or identify erroneous facts according to existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graphs is more diversified, due to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graphs. Starting with the basic concept of knowledge reasoning, this paper presents a survey of the recently developed methods for knowledge reasoning over knowledge graphs. Specifically, the research progress is reviewed in detail from two aspects: one-step reasoning and multi-step reasoning, each including rule-based reasoning, distributed-embedding-based reasoning, neural-network-based reasoning, and hybrid reasoning. Finally, future research directions and an outlook for knowledge reasoning over knowledge graphs are discussed.
    2014,25(4):839-862, DOI:10.13328/j.cnki.jos.004558
    [Abstract] (14710) [HTML] (727) [PDF 1.32 M] (16500)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. The research and discussions on batch computing in big data environment are comparatively sufficient. But how to efficiently deal with stream computing to meet many requirements, such as low latency, high throughput and continuously reliable running, and how to build efficient stream big data computing systems, are great challenges in the big data computing research. This paper provides a research of the data computing architecture and the key issues in stream computing in big data environments. Firstly, the research gives a brief summary of three application scenarios of stream computing in business intelligence, marketing and public service. It also shows distinctive features of the stream computing in big data environment, such as real time, volatility, burstiness, irregularity and infinity. A well-designed stream computing system always optimizes in system structure, data transmission, application interfaces, high-availability, and so on. Subsequently, the research offers detailed analyses and comparisons of five typical and open-source stream computing systems in big data environment. Finally, the research specifically addresses some new challenges of the stream big data systems, such as scalability, fault tolerance, consistency, load balancing and throughput.
    2013,24(2):295-316, DOI:10.3724/SP.J.1001.2013.04336
    [Abstract] (9308) [HTML] (0) [PDF 0.00 Byte] (16495)
    Abstract:
    Under the new application mode, the traditional hierarchy data centers face several limitations in size, bandwidth, scalability, and cost. In order to meet the needs of new applications, data center network should fulfill the requirements with low-cost, such as high scalability, low configuration overhead, robustness and energy-saving. First, the shortcomings of the traditional data center network architecture are summarized, and new requirements are pointed out. Secondly, the existing proposals are divided into two categories, i.e. server-centric and network-centric. Then, several representative architectures of these two categories are overviewed and compared in detail. Finally, the future directions of data center network are discussed.
    2010,21(7):1605-1619, DOI:
    [Abstract] (9367) [HTML] (0) [PDF 856.25 K] (16395)
    Abstract:
    The rapid development of the Internet has led to an increase in system complexity and uncertainty. Traditional network management cannot meet these requirements and should evolve toward fusion-based cyberspace situational awareness (CSA). Based on an analysis of functional shortcomings and development requirements, this paper introduces CSA together with its origin, concept, objectives, and characteristics. First, a CSA research framework is proposed and the research history is reviewed, based on which the main aspects and existing issues of the research are analyzed. Assessment methods are divided into three categories: mathematical models, knowledge reasoning, and pattern recognition. The paper then discusses CSA from three aspects, namely models, knowledge representation, and assessment methods, details the main ideas, assessment processes, merits, and shortcomings of recent methods, and compares many typical methods. Current applied research on CSA in the fields of security, transmission, survivability, system evaluation, and so on is presented. Finally, the paper points out the development directions of CSA and draws conclusions regarding the problem system, technical system, and application system.
    2009,20(6):1393-1405, DOI:
    [Abstract] (11135) [HTML] (0) [PDF 831.86 K] (16218)
    Abstract:
    Combinatorial testing can test systems with a small number of test cases while preserving fault detection ability. However, the test case generation problem for combinatorial testing is NP-complete. The efficiency and complexity of this testing method have therefore attracted many researchers from the areas of combinatorics and software engineering. This paper summarizes the research work on this topic in recent years, including various combinatorial test criteria, the relations between the test generation problem and other NP-complete problems, mathematical methods for constructing test cases, computer search techniques for test generation, and fault localization techniques based on combinatorial testing.
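    As a small worked illustration of the combinatorial idea (not an example from the surveyed paper), the snippet below checks that four test cases cover every pairwise value combination of three binary factors, whereas exhaustive testing would need eight cases.

        # Pairwise (2-way) combinatorial testing: 4 test cases cover every value
        # pair of 3 binary factors, versus 8 exhaustive combinations. The covering
        # array below is a textbook example, not one from the surveyed paper.
        from itertools import combinations, product

        tests = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

        def covers_all_pairs(tests, num_factors=3, values=(0, 1)):
            for i, j in combinations(range(num_factors), 2):
                needed = set(product(values, repeat=2))
                seen = {(t[i], t[j]) for t in tests}
                if needed - seen:
                    return False
            return True

        print(covers_all_pairs(tests))  # True: all 12 factor-value pairs are covered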
    2008,19(11):2803-2813, DOI:
    [Abstract] (8574) [HTML] (0) [PDF 319.20 K] (16099)
    Abstract:
    A semi-supervised clustering method based on the affinity propagation (AP) algorithm is proposed in this paper. AP takes as input measures of similarity between pairs of data points. Compared with existing clustering algorithms such as K-centers clustering, AP is an efficient and fast clustering algorithm for large datasets, but for datasets with complex cluster structures it cannot produce good clustering results. The clustering performance of AP can be improved by using a priori known labeled data or pairwise constraints to adjust the similarity matrix. Experimental results show that this method indeed reaches its goal on complex datasets and outperforms the comparative methods when a large number of pairwise constraints are available.
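    A minimal sketch of the similarity-adjustment idea follows, assuming scikit-learn's AffinityPropagation with a precomputed similarity matrix; the dataset, constraint pairs, and adjustment values are illustrative and do not reproduce the paper's exact procedure.

        # Semi-supervised AP sketch: pairwise constraints adjust the similarity
        # matrix before clustering. Must-link pairs are made maximally similar,
        # cannot-link pairs strongly dissimilar; the values chosen are illustrative.
        import numpy as np
        from sklearn.cluster import AffinityPropagation

        rng = np.random.RandomState(0)
        X = np.vstack([rng.normal(loc, 0.3, size=(20, 2)) for loc in ([0.0, 0.0], [3.0, 3.0])])
        S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # negative squared distance

        must_link = [(0, 1)]        # e.g. two points known to share a label
        cannot_link = [(0, 25)]     # e.g. two points known to differ in label
        for i, j in must_link:
            S[i, j] = S[j, i] = 0.0           # maximal similarity
        for i, j in cannot_link:
            S[i, j] = S[j, i] = 2 * S.min()   # strongly dissimilar

        labels = AffinityPropagation(affinity="precomputed", random_state=0).fit(S).labels_
        print(labels)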
    2009,20(8):2241-2254, DOI:
    [Abstract] (6172) [HTML] (0) [PDF 1.99 M] (16067)
    Abstract:
    Inspired by the idea of data fields, a community discovery algorithm based on topological potential is proposed. The basic idea is that a topological potential function is introduced to analytically model the virtual interaction among all nodes in a network; by regarding each community as a local high-potential area, the community structure of the network can be uncovered by detecting all local high-potential areas separated by low-potential nodes. Experiments on several real-world networks show that the algorithm requires no input parameters and can discover the intrinsic and even overlapping community structure in networks. The time complexity of the algorithm is O(m+n^(3/γ))~O(n^2), where n is the number of nodes to be explored, m is the number of edges, and 2<γ<3 is a constant.
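    The abstract does not give the exact potential function, so the sketch below assumes the common data-field form, a Gaussian decay over hop distances with unit node mass and an illustrative sigma, to show how a per-node topological potential might be computed.

        # Topological-potential sketch: each node's potential is a Gaussian-decay
        # sum over hop distances to all other nodes. The precise function and
        # sigma used in the paper are not stated in the abstract; this is the
        # standard data-field form, assumed here for illustration.
        import math
        import networkx as nx

        G = nx.karate_club_graph()
        sigma = 1.0

        def topological_potential(G, sigma):
            potential = {}
            for v in G:
                dists = nx.single_source_shortest_path_length(G, v)
                potential[v] = sum(math.exp(-(d / sigma) ** 2) for u, d in dists.items() if u != v)
            return potential

        phi = topological_potential(G, sigma)
        # Nodes with locally maximal potential act as community cores; low-potential
        # nodes mark the margins between communities.
        print(sorted(phi, key=phi.get, reverse=True)[:5])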
    2020,31(7):2245-2282, DOI:10.13328/j.cnki.jos.006037
    [Abstract] (2278) [HTML] (524) [PDF 967.02 K] (15990)
    Abstract:
    Ultrasonography is the first choice of imaging examination and preoperative evaluation for thyroid and breast cancer. However, the ultrasonic characteristics of benign and malignant nodules commonly overlap, and diagnosis relies heavily on the operator's experience rather than on quantitative and stable methods. In recent years, medical imaging analysis based on computer technology has developed rapidly and achieved a series of landmark breakthroughs, providing effective decision support for medical imaging diagnosis. This work studies the research progress of computer vision and image recognition technologies for thyroid and breast ultrasound images, taking the key technologies involved in the automatic diagnosis of ultrasound images as its main line. The major algorithms of recent years, such as ultrasound image preprocessing, lesion localization and segmentation, and feature extraction and classification, are summarized and analyzed, and a multi-dimensional analysis is made of the algorithms, datasets, and evaluation methods. Finally, existing problems in the automatic analysis of these two kinds of ultrasound imaging are discussed, and research trends and development directions in the field of ultrasound image analysis are outlined.
    2021,32(2):349-369, DOI:10.13328/j.cnki.jos.006138
    [Abstract] (5424) [HTML] (724) [PDF 727.22 K] (15816)
    Abstract:
    Few-shot learning aims to learn models that solve problems from only a small number of samples. In recent years, under the trend of training models with big data, machine learning and deep learning have achieved success in many fields. However, in many real-world application scenarios there is no large amount of data or labeled data for model training, and labeling a large number of unlabeled samples costs a great deal of manpower. Therefore, how to learn from a small number of samples has become a problem that demands attention. This paper systematically reviews current approaches to few-shot learning. It introduces the corresponding models in three categories: fine-tuning based, data augmentation based, and transfer learning based. The data augmentation based approaches are further subdivided into unlabeled data based, data generation based, and feature augmentation based approaches, and the transfer learning based approaches into metric learning based, meta-learning based, and graph neural network based methods. The paper then summarizes the few-shot datasets and the experimental results of the aforementioned models, summarizes the current situation and challenges of few-shot learning, and finally looks ahead to its future technological development.
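    As one concrete flavor of the metric-learning-based category above, here is a minimal nearest-prototype sketch; the identity "embedding" is a placeholder for a trained encoder, and the toy episode is illustrative only.

        # Metric-learning-based few-shot classification sketch: each class
        # prototype is the mean embedding of its support examples, and a query is
        # assigned to the nearest prototype (as in prototypical-network-style
        # methods). The identity embedding stands in for a trained network.
        import numpy as np

        def embed(x):
            return x  # placeholder; in practice a trained encoder network

        def classify(query, support_x, support_y):
            classes = np.unique(support_y)
            prototypes = np.stack([embed(support_x[support_y == c]).mean(axis=0) for c in classes])
            dists = np.linalg.norm(embed(query)[None, :] - prototypes, axis=1)
            return classes[np.argmin(dists)]

        # 2-way 3-shot toy episode
        support_x = np.array([[0.1, 0.0], [0.0, 0.2], [0.2, 0.1],
                              [2.9, 3.0], [3.1, 2.8], [3.0, 3.2]])
        support_y = np.array([0, 0, 0, 1, 1, 1])
        print(classify(np.array([2.8, 3.1]), support_x, support_y))  # -> 1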
    2009,20(8):2199-2213, DOI:
    [Abstract] (9892) [HTML] (0) [PDF 2.05 M] (15690)
    Abstract:
    This paper analyzes previous studies on applying P2P technology in the mobile Internet. It first introduces P2P technology and the concept of the mobile Internet, and presents the challenges and service patterns of P2P technology in the mobile Internet. Second, the architectures of P2P technology in the mobile Internet are described in terms of centralized, super-node, and ad hoc architectures, respectively. Furthermore, resource location algorithms and cross-layer optimizations are introduced for two different terminal access patterns, with detailed analyses of the key technologies and their disadvantages. Finally, the paper outlines future research directions.
    2009,20(3):567-582, DOI:
    [Abstract] (7788) [HTML] (0) [PDF 780.38 K] (15494)
    Abstract:
    Research on software quality models and software quality evaluation models has always been a hot topic in the area of software quality assurance and assessment. A great deal of domestic and foreign research has been done on building software quality models and quality assessment models, and certain accomplishments have been achieved in these areas. In recent years, platform building and systematization have become the trends in developing basic software based on operating systems. Therefore, the quality evaluation of foundational software platforms has become an essential issue to be solved. This article analyzes and summarizes the current development of research on software quality models and software quality assessment models, focusing on the development of quality evaluation for foundational software platforms. It also briefly discusses the future of research on quality assessment of foundational software platforms, trying to establish a good foundation for it.
    2010,21(5):916-929, DOI:
    [Abstract] (11650) [HTML] (0) [PDF 944.50 K] (15445)
    Abstract:
    Data deduplication technologies can be divided into two categories: a) identical data detection techniques, and b) similar data detection and encoding techniques. This paper presents a systematic survey of these two categories of data deduplication technologies and analyzes their advantages and disadvantages. Since data deduplication technologies can affect the reliability and performance of storage systems, this paper also surveys the technologies proposed to cope with these two problems. Based on the analysis of the current state of research on data deduplication, this paper draws the following conclusions: a) how to mine data characteristic information during deduplication has not been completely solved, and how to use such information to effectively eliminate duplicate data also needs further study; b) from the perspective of storage system design, further study is needed on how to introduce proper mechanisms to overcome the reliability limitations of data deduplication techniques and to reduce the additional system overheads they cause.
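    A minimal sketch of identical-data detection follows, assuming fixed-size chunking and SHA-256 fingerprints (real systems often use content-defined chunking instead); it only illustrates how duplicate chunks are referenced rather than stored again.

        # Identical-data detection sketch: split data into fixed-size chunks,
        # fingerprint each chunk with SHA-256, and store a chunk only when its
        # fingerprint has not been seen before.
        import hashlib

        def deduplicate(data: bytes, chunk_size: int = 4096):
            store = {}    # fingerprint -> chunk payload actually stored
            recipe = []   # sequence of fingerprints needed to rebuild the data
            for offset in range(0, len(data), chunk_size):
                chunk = data[offset:offset + chunk_size]
                fp = hashlib.sha256(chunk).hexdigest()
                store.setdefault(fp, chunk)
                recipe.append(fp)
            return store, recipe

        data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096   # repeated content
        store, recipe = deduplicate(data)
        print(len(recipe), "chunks referenced,", len(store), "chunks actually stored")  # 4, 2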
    2012,23(5):1148-1166, DOI:10.3724/SP.J.1001.2012.04195
    [Abstract] (13647) [HTML] (0) [PDF 946.37 K] (15311)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
    2007,18(1):146-156, DOI:
    [Abstract] (9334) [HTML] (0) [PDF 728.16 K] (15258)
    Abstract:
    A new surrogate placement strategy, CCSP (capacity-constrained surrogate placement), is proposed to enhance the performance of content distribution networks (CDNs). CCSP aims to place surrogates in a manner that minimizes communication cost while maximizing system throughput. Unlike existing work on the resource allocation problem in communication networks, CCSP considers load distribution and processing capacity constraints on surrogates by modeling the underlying request-routing mechanism, thus guaranteeing that a CDN achieves minimum network resource consumption, maximum system throughput, and better load balancing among surrogates. An efficient greedy algorithm is developed for a simplified version of the CCSP problem in tree networks, and its efficiency is systematically analyzed through experimental simulations.
    2016,27(3):691-713, DOI:10.13328/j.cnki.jos.004948
    [Abstract] (8574) [HTML] (493) [PDF 2.43 M] (14967)
    Abstract:
    Learning to rank (L2R) techniques solve ranking problems using machine learning methods and have been well studied and widely used in fields such as information retrieval, text mining, personalized recommendation, and biomedicine. The main task of L2R-based recommendation algorithms is to integrate L2R techniques into recommendation algorithms and to study how to organize the large numbers of users and item features, build user models better suited to user preferences and requirements, and improve the performance and user satisfaction of recommendation algorithms. This paper surveys L2R-based recommendation algorithms of recent years, summarizes the problem definition, compares the key technologies, and analyzes the evaluation metrics and their applications. In addition, it discusses future development trends of L2R-based recommendation algorithms.
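    As one representative pairwise L2R method in a recommendation setting, the following is a minimal BPR-style sketch on matrix-factorization scores; the interaction data and hyperparameters are illustrative and are not drawn from the surveyed algorithms.

        # Pairwise L2R sketch in the BPR style: for a user u, an observed item i
        # should score higher than a sampled unobserved item j; factors are
        # updated by stochastic gradient steps on -log sigmoid(score_i - score_j).
        import numpy as np

        rng = np.random.default_rng(0)
        n_users, n_items, dim, lr, reg = 5, 10, 8, 0.05, 0.01
        U = rng.normal(scale=0.1, size=(n_users, dim))   # user latent factors
        V = rng.normal(scale=0.1, size=(n_items, dim))   # item latent factors
        observed = {0: {1, 3}, 1: {2}, 2: {4, 5}, 3: {0}, 4: {7, 9}}  # user -> interacted items

        for _ in range(2000):
            u = int(rng.integers(n_users))
            i = int(rng.choice(list(observed[u])))       # positive (observed) item
            j = int(rng.integers(n_items))
            while j in observed[u]:                      # sample an unobserved item
                j = int(rng.integers(n_items))
            u_f, i_f, j_f = U[u].copy(), V[i].copy(), V[j].copy()
            x = u_f @ (i_f - j_f)                        # pairwise score difference
            g = 1.0 / (1.0 + np.exp(x))                  # weight from d/dx of -log sigmoid(x)
            U[u] += lr * (g * (i_f - j_f) - reg * u_f)
            V[i] += lr * (g * u_f - reg * i_f)
            V[j] += lr * (-g * u_f - reg * j_f)

        print(np.argsort(-(U[0] @ V.T))[:3])  # top-3 ranked items for user 0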
    2009,20(6):1425-1443, DOI:
    [Abstract] (9627) [HTML] (0) [PDF 1.09 M] (14965)
    Abstract:
    The software fault injection testing (SFIT) technique has been developed for thirty years and is one of the most active areas of software testing research. As a non-traditional testing technique, it plays a very important role in enhancing software quality, eliminating software failures, and improving the software development process. A detailed review of the research on SFIT is presented, based on a survey and classification of current SFIT techniques. Some important testing frameworks and tools that are effective at present are also discussed, along with a brief description of the testing system CSTS (Component Security Testing System). Based on this investigation of SFIT, the open issues and challenges of SFIT are pointed out and future development trends are proposed.
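    A minimal sketch of interface-level fault injection, one common SFIT style, is shown below: a dependency is wrapped so that selected calls raise an injected fault, exercising the caller's error handling. The wrapper, fault model, and names are illustrative assumptions and do not describe the CSTS tool from the paper.

        # Interface-level fault injection sketch: wrap a dependency so chosen
        # calls raise an injected exception, then check that the component under
        # test recovers. The fault model (raise IOError on call #2) is illustrative.
        import functools

        def inject_fault(func, failing_calls, exc=IOError("injected fault")):
            calls = {"n": 0}
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                calls["n"] += 1
                if calls["n"] in failing_calls:
                    raise exc
                return func(*args, **kwargs)
            return wrapper

        def read_config(path):
            return {"path": path}            # stands in for real I/O

        def component_under_test(reader):
            try:
                return reader("app.cfg")
            except IOError:
                return {"path": None}        # the recovery behavior we want to test

        faulty_reader = inject_fault(read_config, failing_calls={2})
        print(component_under_test(faulty_reader))  # normal call
        print(component_under_test(faulty_reader))  # fault injected, fallback path taken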