    2023,34(6):2509-2525, DOI: 10.13328/j.cnki.jos.006846
    Abstract:
Researchers use key classes as starting points for software understanding and maintenance. If these key classes contain defects, they may pose a significant security risk to the software. Therefore, identifying key classes can improve the reliability and stability of the software. Most existing methods are non-trainable solutions that score each node according to a fixed calculation rule and thus cannot fully utilize the structural information available in the software network. To solve these problems, a supervised deep learning method is proposed based on graph neural network technology. First, the project is built as a software network, and the network embedding method Node2Vec is used to learn the node representations, which are then mapped into scores through a simple dense network. Next, the aggregation function of the graph neural networks (GNNs) is improved to aggregate importance scores instead of node embeddings, and the direction and weight information between nodes is also considered when aggregating the scores of neighbor nodes. Finally, the nodes are ranked in descending order according to the predicted scores output by the model. To evaluate the effectiveness of the proposed method, it is applied to eight Java open-source software systems. The experimental results show that the proposed method performs better than benchmark methods: among the top 10 key candidates, it achieves 6.4% higher recall and 3.5% higher precision than the state of the art.
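As a rough illustration of the ranking pipeline described above (not the authors' implementation), the sketch below assumes node embeddings have already been learned with Node2Vec; it maps each embedding to a score with a toy dense layer and then aggregates scores from weighted, directed neighbors. All function and variable names are hypothetical.

```python
import numpy as np

def score_nodes(embeddings, W, b):
    """Map each node embedding to a scalar importance score (toy dense layer)."""
    return embeddings @ W + b                              # shape: (num_nodes,)

def aggregate_scores(scores, adj_weights):
    """Aggregate neighbor scores along weighted, directed edges.
    adj_weights[i, j] is the weight of the edge i -> j (0 if absent)."""
    out_weight = adj_weights.sum(axis=1, keepdims=True) + 1e-9
    neighbor_part = (adj_weights / out_weight) @ scores    # weighted mean of successors' scores
    return 0.5 * scores + 0.5 * neighbor_part              # combine own and neighbor scores

# Toy example: 4 classes, 8-dimensional embeddings, random weighted call graph.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
W, b = rng.normal(size=8), 0.0
adj = np.array([[0, 2, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 0, 3],
                [0, 0, 0, 0]], dtype=float)
final_scores = aggregate_scores(score_nodes(emb, W, b), adj)
ranking = np.argsort(-final_scores)                        # key-class candidates, descending
print(ranking)
```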
    2023,34(6):2526-2542, DOI: 10.13328/j.cnki.jos.006848
    Abstract:
Since its release, Android has become the most widely used mobile operating system in the world owing to advantages such as its open-source nature, rich hardware ecosystem, and diverse application markets. At the same time, the explosive growth of Android devices and Android applications (apps for short) has made the platform the target of 96% of mobile malware. Among current detection methods, those that directly extract simple program features while ignoring program semantics are fast but less accurate, whereas those that convert program semantics into graph models for analysis improve accuracy but incur high runtime overhead and scale poorly. To address these challenges, the program semantics of an app is distilled into a function call graph, and API calls are abstracted to convert the call graph into a simpler graph. The abstracted graphs are then fed into a graph convolutional network (GCN) model to train a classifier with triplet loss, yielding the proposed detector SriDroid. Experimental analysis on 20 246 Android apps shows that SriDroid can achieve 99.17% malware detection accuracy with sound robustness.
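A minimal sketch of the triplet-loss training step mentioned above, assuming graph-level app embeddings have already been pooled into fixed-size vectors; the GCN encoder is replaced here by a plain feed-forward stand-in, and all names and sizes are illustrative rather than part of SriDroid.

```python
import torch
import torch.nn as nn

# Stand-in for a GCN encoder: maps a pooled graph feature vector to an embedding.
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Toy batch: anchor and positive share a label (e.g., same malware family), negative does not.
anchor, positive, negative = (torch.randn(16, 128) for _ in range(3))

optimizer.zero_grad()
loss = criterion(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
optimizer.step()
print(float(loss))
```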
    2023,34(6):2543-2561, DOI: 10.13328/j.cnki.jos.006849
    Abstract:
As software becomes more complex, the need for research on vulnerability detection is increasing. Rapid discovery and patching of software vulnerabilities can minimize the damage they cause. As an emerging detection approach, deep learning-based vulnerability detection methods learn from vulnerable code and automatically derive the implied vulnerability patterns, saving a great deal of human effort. However, such methods are not yet mature: function-level detection methods have coarse detection granularity and low accuracy, while slice-level detection methods can effectively reduce sample noise but still face two problems. On the one hand, most existing methods use artificial vulnerability datasets for experiments, so their ability to detect vulnerabilities in real environments remains in doubt; on the other hand, existing work only detects whether a vulnerability exists in a slice sample and lacks interpretability of the detection results. To address the above issues, this study proposes a slice-level vulnerability detection and interpretation method based on graph neural networks. The method first normalizes the C/C++ source code and extracts slices to reduce the interference of redundant information in the samples; secondly, a graph neural network model is used to embed the slices into vector representations that preserve the structural information and vulnerability features of the source code; then the vector representations of slices are fed into the vulnerability detection model for training and prediction; finally, the trained vulnerability detection model and the vulnerability slices to be explained are fed into the vulnerability interpreter to obtain the specific lines of vulnerable code. The experimental results show that, in terms of vulnerability detection, the method achieves an F1 score of 75.1% on real-world vulnerabilities, which is 41.2%-110.4% higher than that of the comparative methods. In terms of vulnerability interpretation, the method reaches 73.6% accuracy when limited to the top 10% of critical nodes, which is 8.9% and 24.9% higher than the other two interpreters, while the time overhead is reduced by 42.5% and 15.4%, respectively. Finally, the method correctly detects and explains 59 real vulnerabilities in four open-source software projects, proving its practicality in real-world vulnerability discovery.
    2023,34(6):2562-2585, DOI: 10.13328/j.cnki.jos.006853
    Abstract:
The entity-level evolutionary coupling analysis of software systems is helpful for analysis activities such as co-change candidate prediction, risk identification in the software supply chain, code vulnerability traceability, defect prediction, and architecture problem localization. Evolutionary coupling between two entities indicates that these entities tend to be changed together in the software revision history. Existing methods show low accuracy in detecting the frequent "having distance" co-changes in the revision history. To address this problem, this study proposes an evolutionary coupling analysis method based on the combination of association rule mining, episode mining, and latent semantic indexing (association rule, MINEPI and LSI based method, AR-MIM), which mines "having distance" co-change relations. The effectiveness of AR-MIM is verified by comparing it with four baseline methods on a dataset collected from 58 Python projects, containing 242 074 pieces of training data and 330 660 pieces of ground truth. The results show that the precision, recall, and F1 score of AR-MIM are better than those of existing methods in co-change candidate prediction.
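Of the three ingredients named above, the latent semantic indexing step is the most standard; the sketch below shows a generic LSI similarity computation over per-entity change histories using scikit-learn, purely to illustrate the idea. The file names, corpus construction, and ranking are hypothetical and are not taken from AR-MIM.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# One "document" per entity: text drawn from the commits in which it changed.
histories = {
    "parser.py": "fix tokenizer bug refactor grammar add tests",
    "lexer.py":  "fix tokenizer bug update token table",
    "report.py": "improve pdf export layout",
}
names = list(histories)
tfidf = TfidfVectorizer().fit_transform(histories.values())
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
sim = cosine_similarity(lsi)

# Rank co-change candidates for a changed entity by latent-semantic similarity.
query = names.index("parser.py")
candidates = sorted((n for n in names if n != "parser.py"),
                    key=lambda n: -sim[query, names.index(n)])
print(candidates)
```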
    2023,34(6):2586-2605, DOI: 10.13328/j.cnki.jos.006851
    Abstract:
Providing formal guarantees for self-driving cars is a challenging task, since the input-output space (i.e., all possible combinations of inputs and outputs) is too large to explore exhaustively. This paper presents an automated verification technique that ensures steering angle safety for self-driving cars by combining convex optimization with deep learning verification (DLV), an automated framework for verifying the safety of image classification neural networks. DLV is extended with a convex optimization technique from fail-safe trajectory planning to decide whether a predicted steering angle is safe, thereby verifying steering angle safety for self-driving cars. The benefits of the proposed approach are demonstrated on NVIDIA's end-to-end self-driving architecture, which is a crucial ingredient in many modern self-driving cars. The experimental results indicate that the proposed technique can successfully find adversarial misclassifications (i.e., incorrect steering decisions) within given regions and families of manipulations if they exist. Therefore, the approach achieves either safety verification (if no misclassification is found for all DNN layers, in which case the network can be said to be stable or reliable w.r.t. steering decisions) or falsification (in which case the adversarial examples can be used to fine-tune the network).
    2023,34(6):2606-2627, DOI: 10.13328/j.cnki.jos.006850
    Abstract:
Software occupies an increasingly important position in various fields of the national economy. Against the background of the Internet of Everything, interaction, analysis, and collaboration of information are becoming more and more common, and dependencies among programs and software systems are increasing, which raises higher requirements for system reliability and robustness. A software supply chain consists of open-source components and third-party components, and its security problems have become a focus of both academia and industry in recent years. As an important part of open-source software, library functions are closely related to the security of the software supply chain. To improve development efficiency, software libraries and application programming interfaces (APIs) are frequently used during programming, but errors or vulnerabilities in library functions may be exploited by attackers to compromise the security of the software supply chain. These errors or vulnerabilities are often related to exceptions in library functions. Therefore, this study surveys exception analysis methods for library functions from the two aspects of accuracy and efficiency. The basic idea and key steps of each exception analysis method are described, and preliminary solutions are given for the challenges faced by library function exception analysis. Exception analysis of library functions in the software supply chain helps to enhance the robustness of software systems and to ensure the security of the software supply chain.
    Available online:  June 07, 2023 , DOI: 10.13328/j.cnki.jos.006817
    Abstract:
The ranking function method is the main approach to the termination analysis of loops: the existence of a ranking function shows that a loop program terminates. For single-path linear-constraint loop programs, this study presents a new method to analyze their termination. Based on the calculation of the normal space of the increasing function, the method reduces the computation of the ranking function from the original program space to a subspace. Experimental results show that the method can effectively verify the termination of most loop programs in the existing literature.
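For readers unfamiliar with ranking functions, a standard textbook illustration (not the subspace construction of this paper) is the following: for a single-path linear-constraint loop, a linear function ρ proves termination if it is bounded below on the loop guard and strictly decreases on every iteration.

```latex
% Loop: while (x >= 0) { x := x - 1; }
% The linear ranking function rho(x) = x satisfies both conditions:
\rho(x) = x, \qquad
\underbrace{x \ge 0 \;\Rightarrow\; \rho(x) \ge 0}_{\text{bounded on the guard}}, \qquad
\underbrace{x' = x - 1 \;\Rightarrow\; \rho(x') \le \rho(x) - 1}_{\text{strict decrease}} .
```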
    Available online:  June 07, 2023 , DOI: 10.13328/j.cnki.jos.006897
    Abstract:
    Multi-behavior recommendation aims to utilize interactive data from multiple behaviors of users to improve recommendation performance. Existing multi-behavior recommendation methods generally directly exploit the multi-behavior data for the shared initialized user representations and involve the mining of user preferences and modeling of relationships among different behaviors in the tasks. However, these methods ignore the data imbalance under different interactive behaviors (the amount of interactive data varies greatly among different behaviors) and the information loss caused by the adaptation to the above two tasks. User preferences refer to the interests that users exhibit in different behaviors (e.g., browsing preferences), and the relationship among behaviors indicates a potential conversion from one behavior to another behavior (e.g., the conversion from browsing to purchasing). In multi-behavior recommendation, the mining of user preferences and the modeling of relationships among different behaviors can be regarded as a two-stage task. On the basis of the above considerations, the model of two-stage learning for multi-behavior recommendation (TSL-MBR for short) is proposed, which decouples the above two tasks with a two-stage strategy. In particular, the model retains the end-to-end structure and learns the two tasks by alternating training with fixed parameters. The first stage is to model user preferences under different behaviors. In this stage, the interactive data from all behaviors (without distinction as to behavior type) are first used to model the global preferences of users to alleviate the problem of data sparsity to the greatest extent. Then, the interactive data of each behavior are used to refine the behavior-specific user preference (local preference) and thus lessen the influence of the data imbalance among different behaviors. The second stage is to model the relationships among different behaviors. In this stage, the mining of user preferences and modeling of relationships among different behaviors are decoupled to relieve the information loss problem caused by adaptation to the two tasks. This two-stage model significantly improves the system’s ability to predict target behaviors. Extensive experimental results show that TSL-MBR can substantially outperform the state-of-the-art baseline models, achieving 103.01% and 33.87% of relative gains on average over the best baseline on the Tmall and Beibei datasets, respectively.
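The alternating, fixed-parameter training described above can be pictured with the following toy PyTorch sketch; the module structure, loss function, and data are placeholders and do not reproduce the TSL-MBR architecture.

```python
import torch
import torch.nn as nn

preference = nn.Linear(32, 16)   # stand-in for the user-preference module (stage 1)
relation   = nn.Linear(16, 1)    # stand-in for the behavior-relation module (stage 2)
loss_fn = nn.MSELoss()

def train_stage(trainable, frozen, data, target, steps=100):
    """Train one module while the other module's parameters stay fixed."""
    for p in frozen.parameters():
        p.requires_grad_(False)
    for p in trainable.parameters():
        p.requires_grad_(True)
    opt = torch.optim.Adam(trainable.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        pred = relation(preference(data))
        loss_fn(pred, target).backward()
        opt.step()

x, y = torch.randn(64, 32), torch.randn(64, 1)
for _ in range(3):                                # alternate between the two stages
    train_stage(preference, relation, x, y)       # stage 1: user preferences
    train_stage(relation, preference, x, y)       # stage 2: behavior relationships
```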
    Available online:  May 31, 2023 , DOI: 10.13328/j.cnki.jos.006827
    Abstract:
Microservice architectures have been widely deployed and applied; they can greatly improve the efficiency of software system development, reduce the cost of system update and maintenance, and enhance the extensibility of software systems. However, microservices are characterized by frequent changes and heterogeneous fusion, which results in frequent faults, fast fault propagation, and great impact. Meanwhile, complex call dependencies or logical dependencies between microservices make it difficult to locate and diagnose faults timely and accurately, which poses a challenge to the intelligent operation and maintenance of microservice architecture systems. Service dependency discovery technology identifies and deduces the call dependencies or logical dependencies between services from data collected during system running and constructs a service dependency graph, which helps to timely and accurately discover and locate faults, diagnose their causes during system running, and support intelligent operation and maintenance requirements such as resource scheduling and change management. This study first analyzes the problem of service dependency discovery in microservice systems and then summarizes the state of the art of service dependency discovery from the perspective of three types of runtime data, namely monitoring data, system log data, and trace data. Then, centered on fault cause location, resource scheduling, and change management based on the service dependency graph, the study discusses the application of service dependency discovery technology to intelligent operation and maintenance. Finally, the study discusses how service dependency discovery technology can accurately discover call dependencies or logical dependencies and how the service dependency graph can be used for change management, and it predicts future research directions.
    Available online:  May 24, 2023 , DOI: 10.13328/j.cnki.jos.006894
    Abstract:
    Deep learning has achieved great success in image classification, natural language processing, and speech recognition. Data augmentation can effectively increase the scale and diversity of training data, thereby improving the generalization of deep learning models. However, for a given dataset, a well-designed data augmentation strategy relies heavily on expert experience and domain knowledge and requires repeated attempts, which is time-consuming and labor-intensive. In recent years, automated data augmentation has attracted widespread attention from the academic community and the industry through the automated design of data augmentation strategies. To solve the problem that existing automated data augmentation algorithms cannot strike a good balance between prediction accuracy and search efficiency, this study proposes an efficient automated data augmentation algorithm SGES AA based on a self-guided evolution strategy. First, an effective continuous vector representation method is designed for the data augmentation strategy, and then the automated data augmentation problem is converted into a search problem of continuous strategy vectors. Second, a strategy vector search method based on the self-guided evolution strategy is presented. By introducing historical estimation gradient information to guide the sampling and updating of exploration points, it can effectively avoid the local optimal solution while improving the convergence of the search process. The results of extensive experiments on image, text, and speech datasets show that the proposed algorithm is superior to or matches the current optimal automated data augmentation methods without significantly increasing the time consumption of searches.
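The flavor of guiding an evolution strategy with historical gradient estimates can be conveyed by the generic sketch below (a guided-ES-style sampler on a toy objective). It is not the SGES AA search over augmentation-policy vectors; the mixing coefficient, buffer size, step size, and objective are all arbitrary placeholders.

```python
import numpy as np

def f(x):                       # toy objective standing in for validation accuracy
    return -np.sum((x - 1.0) ** 2)

rng = np.random.default_rng(0)
dim, pop, sigma, lr, alpha = 20, 16, 0.1, 0.05, 0.5
x = np.zeros(dim)
grad_buffer = []                # history of estimated gradients (the "self-guidance")

for it in range(200):
    if grad_buffer:             # orthonormal basis of the historical-gradient subspace
        U, _ = np.linalg.qr(np.stack(grad_buffer, axis=1))
    noise = []
    for _ in range(pop):
        eps = rng.normal(size=dim)
        if grad_buffer:         # mix full-space noise with subspace-guided noise
            eps = np.sqrt(alpha) * eps + np.sqrt(1 - alpha) * U @ rng.normal(size=U.shape[1])
        noise.append(eps)
    g = np.zeros(dim)           # antithetic ES gradient estimate
    for eps in noise:
        g += (f(x + sigma * eps) - f(x - sigma * eps)) / (2 * sigma * pop) * eps
    x += lr * g
    grad_buffer = (grad_buffer + [g])[-5:]        # keep the last few estimates

print(round(f(x), 3))           # approaches 0 as x approaches the optimum
```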
    Available online:  May 24, 2023 , DOI: 10.13328/j.cnki.jos.006823
    Abstract:
How to reduce generic ("safe") and repetitive replies is a challenging problem for open-domain multi-turn dialogue models. However, existing open-domain dialogue models often ignore the guiding role of dialogue objectives and the question of how to introduce and select more accurate knowledge from the dialogue history and dialogue objectives. Based on these observations, this study proposes a multi-turn dialogue model based on knowledge enhancement. Firstly, the model replaces the notional words in the dialogue history with semaphores and domain words, so as to eliminate ambiguity and enrich the dialogue text representation. Then, the knowledge-enhanced dialogue history and the expanded triplet world knowledge are effectively integrated into the knowledge management and knowledge copy modules, so as to combine knowledge, vocabulary, dialogue history, and dialogue objectives and generate diverse responses. Experimental results and visualization on two international benchmark open-domain Chinese dialogue corpora verify the effectiveness of the proposed model under both automatic evaluation and human judgment.
    Available online:  May 24, 2023 , DOI: 10.13328/j.cnki.jos.006813
    Abstract:
Migrating from monolithic systems to microservice systems is one of the mainstream options for the industry to reengineer legacy systems, and microservice architecture refactoring based on monolithic legacy systems is the key to such migration. Currently, academia mainly focuses on microservice identification methods, and industry has accumulated many practices of refactoring legacy systems into microservices, but systematic approaches and efficient, robust tools are still insufficient. Therefore, based on earlier research on microservice identification and model-driven development, this study presents MSA-Lab, an integrated design platform for the microservice refactoring of monolithic legacy systems that follows the model-driven development approach. MSA-Lab analyzes the method call sequences in the running logs of a monolithic legacy system, identifies and clusters classes and data tables to construct abstract microservices, and generates a system architecture design model including a microservice diagram and microservice sequence diagrams. The platform has two core components: MSA-Generator for automatic microservice identification and design model generation, and MSA-Modeller for visualization, interactive modeling, and model syntax constraint checking of microservice static structure and dynamic behavior models. This study conducts experiments on effectiveness, robustness, and function transformation completeness with four open-source projects on the MSA-Lab platform and carries out performance comparison experiments with three tools of the same type. The results show that the platform is effective and robust, achieves complete function transformation for running logs, and offers superior performance.
    Available online:  May 24, 2023 , DOI: 10.13328/j.cnki.jos.006822
    Abstract:
    Data replication is an important way to improve the availability of distributed databases. By placing multiple database replicas in different regions, the response speed of local reading and writing operations can be increased. Furthermore, increasing the number of replicas can improve the linear scalability of the read throughput. In view of these advantages, a number of multi-replica distributed database systems have emerged in recent years, including some mainstream systems from the industry such as Google Spanner, CockroachDB, TiDB, and OceanBase, as well as some excellent systems from academia such as Calvin, Aria, and Berkeley Anna. However, these multi-replica databases bring a series of challenges such as consistency maintenance, cross-node transactions, and transaction isolation while providing many benefits. This study summarizes the existing replication architecture, consistency maintenance strategy, cross-node transaction concurrency control, and other technologies. It also analyzes the differences and similarities between several representative multi-replica database systems in terms of distributed transaction processing. Finally, the study builds a cross-region distributed cluster environment on Alibaba Cloud and conducts multiple experiments to study the distributed transaction processing performance of these several representative systems.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006727
    Abstract:
This study presents the existing and optimized implementation methods for batched lower-upper (LU) matrix decomposition and batched inversion algorithms on the graphics processing unit (GPU). For batched LU decomposition, the study analyzes the number of reads and writes to the global memory when the Left-looking, Right-looking, and other commonly used blocked LU decomposition algorithms are implemented on the GPU. The blocked Left-looking algorithm, which involves less memory access, is selected due to the characteristics of the GPU architecture. In the process of pivoting during LU decomposition, a parallel binary tree search algorithm suitable for the GPU architecture is adopted. In addition, to reduce the impact of the row interchange process caused by the pivoting on the performance of the algorithm, this study proposes two optimization techniques, namely, the Warp-based packet row interchange and row interchange delay. For batched inversion after LU decomposition, this study investigates the correction method employed in the matrix inversion process. When batched inversion is implemented on the GPU, a blocked matrix inversion algorithm with delayed correction is adopted to reduce access to the global memory during the correction. Furthermore, to speed up data reading and writing, the study adopts the optimization method of using more registers and shared memory and that of performing column interchange to reduce memory access data. In addition, a method of dynamic GPU resource allocation during operation is proposed to avoid the idleness of threads and the waste of shared memory and other GPU resources. Compared with the static one-time resource allocation method, the dynamic allocation method improves the performance of the algorithm significantly. Finally, 10 000 random matrices with sizes between 33 and 190 are tested on the TITAN V GPU, and the types of the tested data are single-precision complex, double-precision complex, single-precision real, and double-precision real. The floating-point arithmetic performance of the batched LU decomposition algorithm implemented in this study reaches about 2 TFLOPS, 1.2 TFLOPS, 1 TFLOPS, and 0.67 TFLOPS, respectively. This algorithm achieves the highest speedup of about 9×, 8×, 12×, and 13×, respectively, compared with the implementation in CUBLAS. The highest speedup achieved is about 1.2×–2.5×, 1.2×–3.2×, 1.1×–3×, and 1.1×–2.7×, respectively, compared with the implementation in MAGMA. The floating-point arithmetic performance of the proposed batched inversion algorithm can reach about 4 TFLOPS, 2 TFLOPS, 2.2 TFLOPS, and 1.2 TFLOPS, respectively. This algorithm achieves the highest speedup of about 5×, 4×, 7×, and 7×, respectively, compared with the implementation in CUBLAS. The speedup is about 2×–3×, 2×–3×, 2.8×–3.4×, and 1.6×–2×, respectively, compared with the implementation in MAGMA.
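For reference, a plain NumPy version of LU decomposition with partial pivoting is given below. It is an unblocked right-looking variant meant only to make the pivot search and row-interchange steps that the paper optimizes concrete; it is not the blocked Left-looking GPU kernel itself, and all names are illustrative.

```python
import numpy as np

def lu_partial_pivot(A):
    """Unblocked LU with partial pivoting; L and U are stored in place."""
    A = A.astype(np.float64).copy()
    n = A.shape[0]
    piv = np.arange(n)
    for k in range(n):
        p = k + int(np.argmax(np.abs(A[k:, k])))   # pivot search (a tree reduction on a GPU)
        if p != k:
            A[[k, p], :] = A[[p, k], :]            # row interchange (packed/delayed on a GPU)
            piv[[k, p]] = piv[[p, k]]
        A[k + 1:, k] /= A[k, k]                    # column of L
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])  # trailing update
    return A, piv

M = np.random.default_rng(0).normal(size=(6, 6))
LU, piv = lu_partial_pivot(M)
L = np.tril(LU, -1) + np.eye(6)
U = np.triu(LU)
print(np.allclose(L @ U, M[piv]))                  # True: P*M = L*U
```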
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006728
    Abstract:
    In recent years, the research on influence maximization (IM) for social networks has attracted extensive attention from the scientific community due to the emergence of major application issues, such as information dissemination on the Internet and the blocking of COVID-19’s transmission chain. IM aims to identify a set of optimal influence seed nodes that would maximize the influence of information dissemination according to the propagation model for a specific application issue. The existing IM algorithms mainly focus on unidirectional-link influence propagation models and simulate IM issues as issues of optimizing the selection of discrete influence seed node combinations. However, they have a high computational time complexity and cannot be applied to solve IM issues for signed networks with large-scale conflicting relationships. To solve the above problems, this study starts by building a positive-negative influence propagation model and an IM optimization model readily applicable to signed networks. Then, the issue of selecting discrete seed node combinations is transformed into one of continuous network weight optimization for easier optimization by introducing a deep Q network composed of neural networks to select seed node sets. Finally, this study devises an IM algorithm based on evolutionary deep reinforcement learning for signed networks (SEDRL-IM). SEDRL-IM views the individuals in the evolutionary algorithm as strategies and combines the gradient-free global search of the evolutionary algorithm with the local search characteristics of reinforcement learning. In this way, it achieves the effective search for the optimal solution to the weight optimization issue of the Deep Q Network and further obtains the set of optimal influence seed nodes. Experiments are conducted on the benchmark signed network and real-world social network datasets. The extensive results show that the proposed SEDRL-IM algorithm is superior to the classical benchmark algorithms in both the influence propagation range and the solution efficiency.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006729
    Abstract:
    Social media topic detection aims to mine latent topic information from large-scale short posts. It is a challenging task as posts are short in form and informal in expression and user interactions in social media are complex and diverse. Previous studies only consider the textual content of posts or simultaneously model social contexts in homogeneous situations, ignoring the heterogeneity of social networks. However, different types of user interactions, such as forwarding and commenting, could suggest different behavior patterns and interest preferences and reflect different attention to the topic and understanding of the topic. In addition, different users have different influences on the development and evolution of the same topic. Specifically, compared with ordinary users, the leading authoritative users in a community play a more important role in topic inference. For the above reasons, this study proposes a novel multi-view topic model (MVTM) to infer more complete and coherent topics by encoding heterogeneous social contexts in the microblog conversation network. For this purpose, an attributed multiplex heterogeneous conversation network is built according to the interaction relationships among users and decomposed into multiple views with different interaction semantics. Then, the embedded representation of specific views is obtained by leveraging neighbor-level and interaction-level attention mechanisms, with due consideration given to different types of interactions and the importance of different users. Finally, a multi-view neural variational inference method is designed to capture the deep correlations among different views and adaptively balance their consistency and independence, thereby obtaining more coherent topics. Experiments are conducted on a Sina Weibo dataset covering three months, and the results reveal the effectiveness of the proposed method.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006730
    Abstract:
    Multiple-choice reading comprehension typically adopts the two-stage pipeline framework of evidence extraction and answer prediction, and the effect of answer prediction highly depends on evidence sentence extraction. Traditional evidence extraction methods mostly rely on phrase matching or supervise evidence extraction with noise labels. The resultant unsatisfactory accuracy significantly reduces the performance of answer prediction. To address the above problem, this study proposes a multiple-choice reading comprehension method based on multi-view graph encoding in a joint learning framework. The correlations among the sentences in the text and those of such sentences with question sentences are fully explored from multiple views to effectively model evidence sentences and their relationships. Moreover, evidence extraction and answer prediction tasks are jointly trained so that the strong correlations of the evidence with the answers can be exploited for joint learning, thereby improving the performance of evidence extraction and answer prediction. Specifically, this method encodes texts, questions, and candidate answers jointly with the multi-view graph encoding module. The relationships among the texts, questions, and candidate answers are captured from the three views of statistical characteristics, relative distance, and deep semantics, thereby obtaining question-answer-aware text encoding features. Then, a joint learning module combining evidence extraction with answer prediction is built to strengthen the relationships of evidence with answers through joint training. The evidence extraction submodule is designed to select evidence sentences and fuse the results with text encoding features selectively. The fusion results are then used by the answer prediction submodule to complete the answer prediction. Experimental results on the multiple-choice reading comprehension datasets ReCO and RACE demonstrate that the proposed method attains a higher ability to select evidence sentences from texts and ultimately achieves higher accuracy of answer prediction. In addition, joint learning combining evidence extraction with answer prediction significantly alleviates the error accumulation problem induced by the traditional pipeline framework.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006731
    Abstract:
Event planning on event-based social networks (EBSNs) has been attracting research efforts for decades. The key insight of the event planning problem is to assign a group of users to a set of events such that a pre-defined objective function is maximized, subject to a set of constraints. In real applications, important factors such as event conflicts, event capacities, user capacities, social preferences between users, and event preferences need to be considered in event planning; this work summarizes these factors as conflict, capacity, and preference (CCP for short). Existing works do not consider CCP when computing event plans. Hence, this paper proposes the CCP-based event planning problem on event-based social networks, so that more reasonable event plans can be obtained. Since the CCP-based event planning problem is proposed for the first time, none of the existing methods can be directly applied to solve it. Compared with previous event planning problems that only consider part of the CCP factors, the new problem is more complex and involves more constraints. Hence, this paper proposes algorithms that solve the CCP-based event planning problem efficiently and effectively. Extensive experiments are conducted, and the results demonstrate the effectiveness and efficiency of the proposed algorithms.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006733
    Abstract:
    WiFi is one of the most important communication modes at present, and indoor localization systems based on WiFi signals are most promising for widespread deployment and application in daily life. The latest research shows that such a system can achieve submeter-level localization accuracy when it utilizes the channel state information (CSI) obtained during WiFi communication for target localization. However, the accuracy of localization in experimental scenarios depends on many factors, such as the location of the test points, the layout of the WiFi devices, and that of the antennas. Moreover, the WiFi localization systems deployed often fail to provide the desired accuracy since performance prediction methods for WiFi CSI localization are still unavailable. For the above reasons, this study develops a performance prediction model for WiFi CSI localization that applies to diverse scenarios. Specifically, the study defines the error infinitesimal function between a pair of antennas on the basis of the basic physical CSI localization model. The error infinitesimal matrix and the corresponding heat map of localization performance are generated by analyzing the localization space. Then, multi-antenna fusion and multi-device fusion methods are adopted to extend the antenna pairs, thereby constructing a general performance prediction model for CSI localization. Finally, the study proposes integrating the abovementioned heat map with scenario maps to give due consideration to actual scenario maps and ultimately provide a customized performance prediction solution for a given scenario. In addition to the theoretical analysis, this study verifies the effectiveness of the proposed performance prediction model for localization with experimental data in two scenarios. The experimental results show that the actual localization accuracy is consistent with the proposed theoretical model in variation trend, and the model optimizes the localization accuracy by 32%–37%.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006734
    Abstract:
Time series segmentation is an important research direction in the field of data mining. At present, time series segmentation techniques based on the matrix profile (MP) have received increasing attention from researchers and achieved great research results. However, these techniques and their derivative algorithms also have their own shortcomings. For one thing, when the fast low-cost semantic segmentation algorithm based on MP is employed to segment a given activity state and the nearest neighbors are connected by arcs, similar subsequences may be matched by arcs that cross non-target activity states. For another, the existing segmentation point extraction algorithm uses a window of a given length when extracting segmentation points, so the segmentation points obtained are highly likely to deviate greatly from the real values, which reduces the accuracy. To address the above problems, this study proposes a time series segmentation algorithm limiting arc crossing, namely limit arc curve cross-FLOSS (LAC-FLOSS). This algorithm adds weights to arcs to obtain weighted arcs and solves the subsequence mismatch problem caused by arcs crossing states by setting a matching distance threshold. In addition, an improved segmentation point extraction algorithm, namely the improved extract regimes (IER) algorithm, is proposed. This algorithm extracts the extremes from the troughs according to the shape properties of the sequence of corrected arc crossings (CAC), thereby avoiding the problem that segmentation points are obtained at non-inflection points when windows are used directly. Comparative experiments are conducted on the public datasets datasets_seg and MobiAct, and the results verify the feasibility and effectiveness of the above two solutions.
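To make the arc-crossing idea concrete, the following sketch computes a plain (unweighted) FLOSS-style corrected arc crossing curve from nearest-neighbor indices. LAC-FLOSS additionally weights the arcs and applies a matching distance threshold, which is not reproduced here; the data and names are illustrative only.

```python
import numpy as np

def corrected_arc_crossings(nn_index):
    """nn_index[i] is the position of subsequence i's nearest neighbor.
    Returns crossings per split point, normalized by the idealized expectation."""
    n = len(nn_index)
    crossings = np.zeros(n)
    for i, j in enumerate(nn_index):
        lo, hi = sorted((i, int(j)))
        crossings[lo:hi] += 1                      # this arc spans the split points in [lo, hi)
    pos = np.arange(1, n)                          # idealized parabola for random arcs
    expected = 2.0 * pos * (n - pos) / n
    cac = np.ones(n)
    cac[1:] = np.minimum(crossings[1:] / np.maximum(expected, 1e-9), 1.0)
    return cac                                     # low values suggest segment boundaries

nn = np.array([2, 3, 0, 1, 6, 7, 4, 5])            # two regimes: {0..3} and {4..7}
print(np.round(corrected_arc_crossings(nn), 2))    # minimum away from the edges marks the boundary
```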
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006735
    Abstract:
    Amid the in-depth integration of information technology and education, the booming online education has become the new normal of the education informatization process and has generated massive amounts of education data. However, online education also faces high dropout rates, low course completion rates, insufficient supervision, and other problems. How to mine and analyze the massive education data is the key to solving these problems. A learning community is a learning organization with learners as its core element, and it emphasizes the interactive communication, resource sharing, and collaborative learning among the learners in the learning process so that common learning tasks or goals can be completed. This study reviews, analyzes, and discusses the prospect of the research on learning communities in the online education environment. Firstly, the background and importance of learning communities in the online education environment are outlined. Secondly, the definitions of a learning community in different disciplines are presented. Thirdly, the construction methods for three types of learning communities, namely, homogeneous, heterogeneous, and hybrid learning communities, are summarized. Fourthly, the management mechanism for learning communities is discussed from the three aspects of sharing, collaboration, and incentive. Last but not least, directions for future research on learning communities are suggested.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006736
    Abstract:
Video description technology aims to automatically generate textual descriptions with rich content for videos, and it has attracted extensive research interest in recent years. An accurate and elaborate method of video description generation not only should achieve a global understanding of the video but also depends heavily on the local spatial and time-series features of specific salient objects. How to model a better video feature representation has always been an important but difficult part of video description tasks. In addition, most of the existing work regards a sentence as a chain structure and views a video description task as a process of generating a sequence of words, ignoring the semantic structure of the sentence. Consequently, the currently available algorithms are unable to handle and optimize complex sentence descriptions or avoid logical errors commonly seen in the long sentences generated. To tackle the problems mentioned above, this study proposes a novel generation method for interpretable video descriptions guided by language structure. Due consideration is given to both local object information and the semantic structure of the sentence by designing an attention-based structured tubelet localization mechanism. When it is incorporated with the parse tree constructed from sentences, the proposed method can adaptively attend to corresponding spatial-temporal features with textual contents and further improve the performance of video description generation. Experimental results on mainstream benchmark datasets of video description tasks, i.e., Microsoft research video captioning corpus (MSVD) and Microsoft research video to text (MSR-VTT), show that the proposed approach achieves state-of-the-art performance on most of the evaluation metrics.
    Available online:  May 18, 2023 , DOI: 10.13328/j.cnki.jos.006737
    Abstract:
Due to the continuous breakthroughs in information and communication technologies, information access has become convenient on the one hand, while on the other hand private information is now easier to leak than before. Combining the intelligent field with secure multiparty computation (SMC) technology is expected to solve privacy protection problems. Although SMC has solved many different privacy protection problems so far, many problems remain to be settled. Research results on the SMC of the range and the sum of extremums are currently seldom reported. As common statistical tools, the range and the sum of extremums have been widely used in practice, so their secure computation is of great research significance. This study proposes a new encoding method and uses it to solve two types of SMC problems: the secure computation of the range and that of the sum of extremums. The new encoding method is combined with the lifted ElGamal threshold cryptosystem to design a secure range computation protocol for distributed private datasets in the scenario in which multiple parties participate and each party holds one data item. The new encoding method is then slightly modified for the secure computation of the sum of extremums in the same scenario. On this basis, the study further modifies the new encoding method and combines it with the Paillier cryptosystem to design protocols for the secure computation of the range and the sum of extremums on distributed private datasets in the scenario in which two parties participate and each party holds more than one data item. Furthermore, the study proves that the proposed protocols are secure in the semi-honest model with the simulation paradigm. Finally, the complexities of these protocols are evaluated by simulation experiments. The results of the efficiency analysis and experiments show that the proposed protocols are simple and efficient and can be widely used in practical applications, serving as important building blocks for solving many other SMC problems.
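The Paillier system mentioned above is additively homomorphic, which is what makes securely combining encrypted values possible. Below is a textbook toy implementation (insecure, tiny primes, illustrative only) showing that the product of two ciphertexts decrypts to the sum of the plaintexts; it is not the encoding method or protocol proposed in the paper.

```python
import math
import random

# Toy Paillier key (insecure, small primes; g = n + 1 as in the simplified variant).
p, q = 10007, 10009
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    u = pow(c, lam, n2)
    return ((u - 1) // n * mu) % n

# Additive homomorphism: E(m1) * E(m2) mod n^2 decrypts to m1 + m2.
m1, m2 = 123456, 654321
print(decrypt(encrypt(m1) * encrypt(m2) % n2) == m1 + m2)   # True
```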
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006826
    Abstract:
Recently, with the popularity of ubiquitous computing, intelligent sensing technology has become the focus of researchers, and non-contact sensing based on WiFi is more and more popular in academia and industry because of its excellent generality, low deployment cost, and great user experience. Typical non-contact sensing work based on WiFi includes gesture recognition, breath detection, intrusion detection, behavior recognition, etc. For real-life deployment of these works, one of the major challenges is to avoid interference from irrelevant behaviors in other areas, so it is necessary to judge whether the target is in a specific sensing area or not, which means that the system should be able to determine exactly which side of a boundary line the target is on. However, existing work cannot accurately monitor a freely set boundary, which hinders the actual deployment of WiFi-based sensing applications. In order to solve this problem, based on the physical essence of electromagnetic wave diffraction and the Fresnel diffraction model, this study finds a signal feature, namely the Rayleigh distribution in the Fresnel diffraction model (RFD), that appears when the target passes through the link (the line between the WiFi receiver and transmitter antennas) and reveals the mathematical relationship between the signal feature and human activity. Then, the study realizes a boundary monitoring algorithm through line crossing detection by using the link as the boundary and considering the waveform delay caused by antenna spacing and the features of automatic gain control (AGC) when the link is blocked. On this basis, the study also implements two practical applications, namely, an intrusion detection system and a home state detection system. The intrusion detection system achieves a precision of more than 89% and a recall rate of more than 91%, while the home state detection system achieves an accuracy of more than 89%. While verifying the availability and robustness of the boundary monitoring algorithm, the study also shows the great potential of combining the proposed method with other WiFi-based sensing technologies and provides a direction for the actual deployment of WiFi-based sensing technologies.
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006821
    Abstract:
    As challenges such as serious occlusions and deformations coexist, video segmentation with accurate robustness has become one of the hot topics in computer vision. This study proposes a video segmentation method with absorbing Markov chains and skeleton mapping, which progressively produces accurate object contours through the process of pre-segmentation—optimization—improvement. In the phase of pre-segmentation, based on the twin network and the region proposal network, the study obtains regions of interest for objects, constructs the absorbing Markov chains of superpixels in these regions, and calculates the labels of foreground/background of the superpixels. The absorbing Markov chains can perceive and propagate the object features flexibly and effectively and preliminarily pre-segment the target object from the complex scene. In the phase of optimization, the study designs the short-term and long-term spatial-temporal cue models to obtain the short-term variation and the long-term feature of the object, so as to optimize superpixel labels and reduce errors caused by similar objects and noise. In the phase of improvement, to reduce the artifacts and discontinuities of optimization results, this study proposes an automatic generation algorithm for foreground/background skeleton based on superpixel labels and positions and constructs a skeleton mapping network based on encoding and decoding, so as to learn the pixel-level object contour and finally obtain accurate video segmentation results. Many experiments on standard datasets show that the proposed method is superior to the existing mainstream video segmentation methods and can produce segmentation results with higher region similarity and contour accuracy.
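For context, the standard quantities behind an absorbing Markov chain, on which the pre-segmentation stage relies, are as follows (generic textbook form, not the paper's specific construction over superpixels). With the transient-to-transient transition block Q and the transient-to-absorbing block R:

```latex
P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}, \qquad
N = (I - Q)^{-1}, \qquad
B = N R ,
```

where N_{ij} is the expected number of visits to transient state j when starting from transient state i, and B_{ik} is the probability of eventually being absorbed into absorbing state k. Thresholding such absorption probabilities is one common way to turn the chain into foreground/background labels.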
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006810
    Abstract:
As trusted decentralized applications, smart contracts have attracted widespread attention, whereas their security vulnerabilities threaten their reliability. To this end, researchers have employed various advanced technologies (such as fuzz testing, machine learning, and formal verification) to study vulnerability detection techniques and have achieved sound results. This study collects 84 related papers published by July 2021 to systematically sort out and analyze existing vulnerability detection technologies for smart contracts. First of all, vulnerability detection technologies are categorized according to their core methodologies and analyzed from the aspects of implementation methods, vulnerability categories, and experimental data. Additionally, the differences between domestic and international research in these aspects are compared. Finally, after summarizing the existing technologies, the study discusses the challenges of vulnerability detection technologies and potential research directions.
    Available online:  May 17, 2023 , DOI: 10.13328/j.cnki.jos.006814
    Abstract:
    Efficient mobile charging scheduling is a key technology to build wireless rechargeable sensor networks (WRSN) which have long life cycle and sustainable operation ability. The existing charging methods based on reinforcement learning only consider the spatial dimension of mobile charging scheduling, i.e., the path planning of mobile chargers (MCs), while leaving out the temporal dimension of the problem, i.e., the adjustment of the charging duration, and thus these methods have suffered some performance limitations. This study proposes a dynamic spatiotemporal charging scheduling scheme based on deep reinforcement learning (SCSD) and establishes a deep reinforcement learning model for dynamic adjustment of charging sequence scheduling and charging duration. In view of the discrete charging sequence planning and continuous charging duration adjustment in mobile charging scheduling, the study uses DQN to optimize the charging sequence for nodes to be charged and calculates and dynamically adjusts the charging duration of the nodes. By optimizing the two dimensions of space and time respectively, the SCSD proposed in this study can effectively improve the charging performance while avoiding the power failure of nodes. Simulation experiments show that SCSD has significant performance advantages over several well-known typical charging schemes.
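As background for the DQN component mentioned above (the standard formulation, not anything specific to SCSD), the network parameters θ are fitted against a bootstrapped target computed with a periodically frozen copy θ⁻:

```latex
y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-), \qquad
L(\theta) = \mathbb{E}\big[(y_t - Q(s_t, a_t; \theta))^2\big].
```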
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006815
    Abstract:
    With the development of deep learning and steganography, deep neural networks are widely used in image steganography, especially in a new research direction, namely embedding an image message in an image. The mainstream steganography of embedding an image message in an image based on deep neural networks requires cover images and secret images to be input into a steganographic model to generate stego-images. But recent studies have demonstrated that the steganographic model only needs secret images as input, and then the output secret perturbation is added to cover images, so as to embed secret images. This novel embedding method that does not rely on cover images greatly expands the application scenarios of steganography and realizes the universality of steganography. However, this method currently only verifies the feasibility of embedding and recovering secret images, and the more important evaluation criterion for steganography, namely concealment, has not been considered and verified. This study proposes a high-capacity universal steganography generative adversarial network (USGAN) model based on an attention mechanism. By using the attention module, the USGAN encoder can adjust the perturbation intensity distribution of the pixel position on the channel dimension in the secret image, thereby reducing the influence of the secret perturbation on the cover images. In addition, in this study, the CNN-based steganalyzer is used as the target model of USGAN, and the encoder learns to generate a secret adversarial perturbation through adversarial training with the target model so that the stego-image can become an adversarial example for attacking the steganalyzer at the same time. The experimental results show that the proposed model can not only realize a universal embedding method that does not rely on cover images but also further improves the concealment of steganography.
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006816
    Abstract:
    How brains realize learning and perception is an essential question for both artificial intelligence and neuroscience communities. Since the existing artificial neural networks (ANNs) are different from the real brain in terms of structures and computing mechanisms, they cannot be directly used to explore the mechanisms of learning and dealing with perceptual tasks in the real brain. The dendritic neuron model is a computational model to model and simulate the information processing process of neuron dendrites in the brain and is closer to biological reality than ANNs. The use of the dendritic neural network model to deal with and learn perceptual tasks plays an important role in understanding the learning process in the real brain. However, current learning models based on dendritic neural networks mainly focus on simplified dendritic models and are unable to model the entire signal-processing mechanisms of dendrites. To solve this problem, this study proposes a learning model of the biophysically detailed neural network of medium spiny neurons (MSNs). The neural network can fulfill corresponding perceptual tasks through learning. Experimental results show that the proposed model can achieve high performance on the classical image classification task. In addition, the neural network shows strong robustness under noise interference. By further analyzing the network features, this study finds that the neurons in the network after learning show stimulus selectivity, which is a classical phenomenon in neuroscience. This indicates that the proposed model is biologically plausible and implies that stimulus selectivity is an essential property of the brain in fulfilling perceptual tasks through learning.
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006801
    Abstract:
The Olympic heritage is a treasure of the world. The integration of technology, culture, and art is crucial to the diversified presentation and efficient dissemination of the heritage of the Beijing Winter Olympics. Online exhibition halls are an important form of digital museums in the information era, and existing work lays a good foundation in the research on individual digital museums and interactive technologies, but so far no systematic, intelligent, interactive, and user-friendly Winter Olympics digital museum has been built. This study proposes an online exhibition hall construction method with interactive feedback for the Beijing 2022 Winter Olympics. By constructing an interactive exhibition hall with an intelligent virtual agent, the study further explores the role of interactive feedback in disseminating intangible cultural heritage in a knowledge dissemination-based digital museum. To explore the influence of audio-visual interactive feedback on spreading Olympic spiritual culture in the exhibition hall and to improve the user experience, the study conducts a user experiment with 32 participants. The results show that the constructed exhibition hall can greatly promote the dissemination of Olympic culture and spirit, and the introduction of audio-visual interactive feedback in the exhibition hall can improve users' perceived control, thereby improving the user experience.
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006811
    Abstract:
    Basic linear algebra subprogram (BLAS) is one of the most basic and important math libraries. The matrix-matrix operations covered in the level-3 BLAS functions are particularly significant for a standard BLAS library and are widely employed in many large-scale scientific and engineering computing applications. Additionally, level-3 BLAS functions are computing intensive functions and play a vital role in fully exploiting the computing performance of processors. Multi-core parallel optimization technologies are studied for level-3 BLAS functions on SW26010-Pro, a domestic processor. According to the memory hierarchy of SW26010-Pro, this study designs a multi-level blocking algorithm to exploit the parallelism of matrix operations. Then, a data-sharing scheme based on remote memory access (RMA) mechanism is proposed to improve the data transmission efficiency among CPEs. Additionally, it employs triple buffering and parameter tuning to fully optimize the algorithm and hide the memory access costs of direct memory access (DMA) and the communication overhead of RMA. Besides, the study adopts two hardware pipelines and several vectorized arithmetic/memory access instructions of SW26010-Pro and improves the floating-point computing efficiency of level-3 BLAS functions by writing assembly code manually for matrix-matrix multiplication, matrix equation solving, and matrix transposition. The experimental results show that level-3 BLAS functions can significantly improve the performance on SW26010-Pro by leveraging the proposed parallel optimization. The floating-point computing efficiency of single-core level-3 BLAS is up to 92% of the peak performance, while that of multi-core level-3 BLAS is up to 88% of the peak performance.
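A cache-blocking loop structure of the kind used in level-3 BLAS kernels can be sketched in a few lines. The NumPy version below only illustrates the multi-level blocking idea behind the matrix-matrix multiplication kernel; it is unrelated to the hand-written SW26010-Pro assembly, and the block sizes are arbitrary.

```python
import numpy as np

def blocked_gemm(A, B, bm=64, bn=64, bk=64):
    """C = A @ B computed block by block (illustrative cache blocking)."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for i in range(0, m, bm):
        for j in range(0, n, bn):
            for p in range(0, k, bk):          # accumulate over the K dimension
                C[i:i+bm, j:j+bn] += A[i:i+bm, p:p+bk] @ B[p:p+bk, j:j+bn]
    return C

rng = np.random.default_rng(0)
A, B = rng.normal(size=(200, 150)), rng.normal(size=(150, 180))
print(np.allclose(blocked_gemm(A, B), A @ B))   # True
```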
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006820
    Abstract:
In large-scale and complex software systems, requirements are analyzed and generated through a top-down process, and constructing tracking relationships between cross-level requirements is very important for project management, development, and evolution. The loosely coupled contribution model of open-source systems requires each participant to easily understand the context and state of the requirements, which relies on cross-level requirement tracking. The issue description log is a common way of presenting requirements in open-source systems. It has no fixed template, its content is diverse (including text, code, and debugging information), terms are used freely, and the gap in abstraction level between cross-level requirements is large, all of which brings great challenges to automatic tracking. In this paper, a correlation feedback method for key feature dimensions is proposed. Through static analysis of the project's code structure, code-related terms and their correlation strengths are extracted, and a code vocabulary base is constructed to narrow the gap in abstraction level and the inconsistency of terminology between cross-level requirements. By measuring the importance of terms to the requirement description and screening key feature dimensions on this basis, the query statement is optimized to effectively reduce the noise caused by requirement description length, content form, and other aspects. Experiments with two scenarios on three open-source systems suggest that the proposed method outperforms baseline approaches in cross-level requirement tracking and improves the F2 value to 29.01%, 7.75.1%, and 59.21% compared with the vector space model (VSM), standard Rocchio, and trace bidirectional encoder representations from transformers (BERT), respectively.
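For orientation, the vector space model baseline referred to above ranks candidate requirements by TF-IDF cosine similarity. The few lines below show that baseline with scikit-learn on made-up artifacts; the corpus and identifiers are illustrative, and the paper's query optimization and code vocabulary base are not included.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

high_level = "support OAuth2 login for the web client"
issues = [
    "add token refresh to AuthService when the OAuth2 access token expires",
    "fix NullPointerException in ReportExporter.exportPdf",
    "implement login page redirect after successful OAuth2 authentication",
]
vec = TfidfVectorizer(lowercase=True)
matrix = vec.fit_transform([high_level] + issues)
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
ranked = sorted(zip(scores, issues), reverse=True)   # candidate trace links, best first
for s, text in ranked:
    print(round(float(s), 3), text)
```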
    Available online:  May 10, 2023 , DOI: 10.13328/j.cnki.jos.006824
    Abstract:
    Remaining process time prediction is important for preventing and intervening in abnormal business operations. For predicting the remaining time, existing approaches have achieved high accuracy through deep learning techniques. However, most of these techniques involve complex model structures, and the prediction results are difficult to explain, i.e., they suffer from the problem of unexplainability. In addition, the prediction of the remaining time usually uses the key attribute, namely activity, or selects several other attributes as the input features of the prediction model according to domain knowledge. However, a general feature selection method is missing, which may affect both prediction accuracy and model explainability. To tackle these two challenges, this study introduces a remaining process time prediction framework based on an explainable feature-based hierarchical (EFH) model. Specifically, a feature self-selection strategy is first proposed, and the attributes that have a positive impact on the prediction task are obtained as the input features of the model through backward feature deletion based on priority and forward feature selection based on feature importance. Then an EFH model is proposed. The prediction results of each layer are obtained by adding different features layer by layer, so as to explain the relationship between input features and prediction results. The study also uses the light gradient boosting machine (LightGBM) and long short-term memory (LSTM) algorithms to implement the proposed approach, and the framework is general and not limited to the algorithms selected in this study. Finally, the proposed approach is compared with other methods on eight real-life event logs. The experimental results show that the proposed approach can select effective features and improve prediction accuracy. In addition, the prediction results are explained.
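    A minimal sketch of importance-driven forward feature selection, one half of the feature self-selection strategy summarized above; the scikit-learn gradient boosting regressor is used here as a stand-in for LightGBM, and the scoring metric, stopping rule, and synthetic data are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

def forward_select(X, y, feature_names, max_features=5):
    """Greedy forward selection: add the feature that improves cross-validated
    error the most, stop when no candidate helps. A stand-in for the paper's
    importance-driven selection; model and thresholds are assumptions."""
    selected, best_score = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = []
        for f in remaining:
            cols = selected + [f]
            model = GradientBoostingRegressor(random_state=0)
            s = cross_val_score(model, X[:, cols], y, cv=3,
                                scoring="neg_mean_absolute_error").mean()
            scores.append((s, f))
        s, f = max(scores)
        if s <= best_score:
            break
        best_score = s
        selected.append(f)
        remaining.remove(f)
    return [feature_names[i] for i in selected]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))
    y = 3 * X[:, 0] - 2 * X[:, 3] + rng.standard_normal(200) * 0.1
    print(forward_select(X, y, [f"attr{i}" for i in range(6)]))  # picks attr0, attr3
```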
    Available online:  April 27, 2023 , DOI: 10.13328/j.cnki.jos.006678
    Abstract:
    As a software system is a complex artifact, the interaction between classes exerts a potential impact on software quality, with the cascading propagation effect of software defects as a typical case. How to accurately predict the reasonable relationship between classes in the software system and optimize the design structure is still an open problem in software quality assurance. From the perspective of software network, this study comprehensively considers the interactions between classes in a software system (class external graph, CEG), and those between internal methods of each class (class internal graph, CIG). The software system is abstracted into a software network with a graph of graphs structure. As a result, a class interaction prediction method based on the graph of graphs convolutional network is proposed. Firstly, the initial characteristics of class nodes are obtained through the convolution of each CIG. Then the representation vector of class nodes is updated through the convolution of CEG, and finally, the evaluation values between class nodes are calculated for interaction prediction. The experimental results on six Java open source projects show that the graph of graphs structure is helpful to improve the representation of software system structure. The average growth rates of the area under the curve (AUC) and average precision (AP) of the proposed method are more than 5.5% compared with those of the conventional network embedding methods. In addition, the average growth rates of AUC and AP are more than 9.36% and 5.22%, respectively compared with those of the two peer methods.
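    The graph-of-graphs reading of the system can be pictured with a toy sketch: pool each class internal graph (CIG) into an initial class feature, propagate once over the class external graph (CEG), and score class pairs by a dot product. This is only a mean-aggregation illustration under assumed inputs, not the paper's graph-of-graphs convolutional network.

```python
import numpy as np

def class_embeddings(cigs, ceg_adj):
    """Toy graph-of-graphs propagation: each class embedding starts as the
    mean of its internal-method (CIG) node features, then is updated by
    averaging over its neighbors in the class external graph (CEG)."""
    init = np.stack([c.mean(axis=0) for c in cigs])        # CIG pooling
    deg = ceg_adj.sum(axis=1, keepdims=True).clip(min=1)
    updated = (ceg_adj @ init) / deg                       # CEG message passing
    return np.concatenate([init, updated], axis=1)

def interaction_score(emb, i, j):
    """Evaluation value between two class nodes, used to predict interactions."""
    return float(emb[i] @ emb[j])

rng = np.random.default_rng(0)
cigs = [rng.standard_normal((n_methods, 8)) for n_methods in (3, 5, 2, 4)]  # 4 classes
ceg = np.array([[0, 1, 1, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
emb = class_embeddings(cigs, ceg)
print(interaction_score(emb, 0, 1))
```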
    Available online:  April 27, 2023 , DOI: 10.13328/j.cnki.jos.006685
    Abstract:
    Symmetric searchable encryption (SSE) can retrieve encrypted data without disclosing user privacy and has been widely studied and applied in cloud storage. However, in SSE schemes, semi-honest or dishonest servers may tamper with the data in files and return untrusted files to users, so it is necessary to verify these files. Most existing verifiable SSE schemes are verified by the users locally, and malicious users may forge verification results, which cannot ensure verification fairness. To this end, this study proposes a blockchain-based verifiable dynamic symmetric searchable encryption scheme (VDSSE). VDSSE employs symmetric encryption to achieve forward security in dynamic updating, and on this basis, the blockchain is utilized to verify the search results. During the verification, a new verification tag, Vtag, is proposed. The accumulation of Vtags is leveraged to compress the verification information, reduce the storage cost of verification information on the blockchain, and effectively support the dynamic verification of SSE schemes. Finally, experimental evaluation and security analysis are conducted on VDSSE to verify the feasibility and security of the scheme.
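    The Vtag construction itself is not spelled out in the abstract, so the following is only a hedged sketch of one way per-result tags could be accumulated into a single compact on-chain value (an XOR of per-file digests); the tag layout and accumulator choice are assumptions for illustration.

```python
import hashlib

def vtag(file_id: bytes, content: bytes) -> int:
    """Per-result verification tag: digest of the file identifier and content."""
    return int.from_bytes(hashlib.sha256(file_id + content).digest(), "big")

def accumulate(tags):
    """Compress many Vtags into one value that could be stored on-chain."""
    acc = 0
    for t in tags:
        acc ^= t
    return acc

# The verifier recomputes tags for the returned files and compares accumulators.
results = [(b"doc1", b"alpha"), (b"doc2", b"bravo")]
onchain = accumulate(vtag(i, c) for i, c in results)
assert onchain == accumulate(vtag(i, c) for i, c in results)
```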
    Available online:  April 27, 2023 , DOI: 10.13328/j.cnki.jos.006689
    Abstract:
    In rich-resource scenarios, using similarity translation as the target prototype sequence can improve the performance of neural machine translation. However, in low-resource scenarios, due to the lack of parallel corpus resources, the prototype sequence cannot be matched, or the sequence quality is poor. To address this problem, this study proposes a low-resource neural machine translation approach with multi-strategy prototype generation, and the approach includes two phases. (1) Keyword matching and distributed representation matching are combined to retrieve prototype sequences, and the pseudo prototype generation approach is leveraged to generate available prototype sequences during retrieval failures. (2) The conventional encoder-decoder framework is improved for the effective employment of prototype sequences. The encoder side utilizes additional encoders to receive prototype sequences. The decoder side, while employing a gating mechanism to control information flow, adopts improved loss functions to reduce the negative impact of low-quality prototype sequences on the model. The experimental results on multiple datasets show that the proposed method can effectively improve the translation performance compared with the baseline models.
    Available online:  April 27, 2023 , DOI: 10.13328/j.cnki.jos.006720
    Abstract:
    Sparse triangular solve (SpTRSV) is a vital operation in preconditioners. In particular, in scientific computing programs that solve partial differential equation systems iteratively, structured SpTRSV is a common type of problem and often a performance bottleneck that needs to be addressed. The commercial mathematical libraries tailored to the graphics processing unit (GPU) platform, represented by CUSPARSE, parallelize SpTRSV operations by level-scheduling methods. However, this method is weakened by time-consuming preprocessing and serious GPU thread idleness when it is employed to deal with structured SpTRSV problems. This study proposes a parallel algorithm tailored to structured SpTRSV problems. The proposed algorithm leverages the special non-zero element distribution pattern of structured SpTRSV problems during task allocation to skip the preprocessing and analysis of the non-zero element structure of the input problem. Furthermore, the element-wise operation strategy used in the existing level-scheduling methods is modified. As a result, the problem of GPU thread idleness is effectively alleviated, and the memory access latency of some non-zero elements in the matrix is concealed. This study also adopts a state variable compression technique according to the task allocation characteristics of the proposed algorithm, significantly improving the cache hit rate of the algorithm in state variable operations. Additionally, several hardware features of the GPU, including predicated execution, are investigated to comprehensively optimize the algorithm implementation. The proposed algorithm is tested on an NVIDIA V100 GPU, achieving an average 2.71× acceleration over CUSPARSE and a peak effective memory-access bandwidth of 225.2 GB/s. The modified element-wise operation strategy, combined with a series of other optimization measures for GPU hardware, attains a prominent optimization effect by yielding a nearly 115% increase in the effective memory-access bandwidth of the proposed algorithm.
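    The structural property such an algorithm exploits can be shown with a small, hedged sketch: for a lower-triangular factor coming from a 2D 5-point stencil, every unknown depends only on its west and south neighbors, so whole anti-diagonals (wavefronts) are independent and need no level-scheduling analysis. The coefficients and grid below are assumptions; this is a NumPy illustration of the idea, not the GPU kernel.

```python
import numpy as np

def structured_sptrsv(b, a_c=4.0, a_w=-1.0, a_s=-1.0):
    """Forward substitution for the lower-triangular factor of a 5-point
    stencil on an n x m grid: each unknown depends only on its west and
    south neighbors, so all cells on one anti-diagonal (wavefront) are
    independent and could be solved in parallel."""
    n, m = b.shape
    x = np.zeros_like(b)
    for wave in range(n + m - 1):                   # anti-diagonals i + j == wave
        for i in range(max(0, wave - m + 1), min(n, wave + 1)):
            j = wave - i
            s = b[i, j]
            if j > 0:
                s -= a_w * x[i, j - 1]
            if i > 0:
                s -= a_s * x[i - 1, j]
            x[i, j] = s / a_c
    return x

b = np.arange(20, dtype=float).reshape(4, 5)
x = structured_sptrsv(b)
# self-check: apply the triangular operator to x and compare with b
chk = 4.0 * x
chk[:, 1:] += -1.0 * x[:, :-1]
chk[1:, :] += -1.0 * x[:-1, :]
assert np.allclose(chk, b)
```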
    Available online:  April 26, 2023 , DOI: 10.13328/j.cnki.jos.006805
    Abstract:
    The uncertainty of tasks in mobile edge computing scenarios makes task offloading and resource allocation more complex and difficult. Therefore, a continuous offloading and resource allocation method of uncertain tasks in mobile edge computing is proposed. Firstly, a continuous offloading model of uncertain tasks in mobile edge computing is built, and the multi-batch processing technology based on duration slice partition is employed to address task uncertainty. A multi-device computing resource coordination mechanism is designed to improve the carrying capacity of computation-intensive tasks. Secondly, an adaptive strategy selection algorithm based on load balancing is put forward to avoid channel congestion and additional energy consumption caused by the over-allocation of computing resources. Finally, the uncertain task scenario model is simulated based on Poisson distribution, and experimental results show that the reduction of time slice length can reduce the total energy consumption of the system. In addition, the proposed algorithm can achieve task offloading and resource allocation more effectively and can reduce energy consumption by up to 11.8% compared with comparison algorithms.
    Available online:  April 26, 2023 , DOI: 10.13328/j.cnki.jos.006807
    Abstract:
    Emotional dialogue technology focuses on the “emotional quotient” of conversational robots, aiming to give the robots the ability to observe, understand and express emotions as humans do. This technology can be seen as the intersection of emotional computing and dialogue technology, and can simultaneously consider the “intelligent quotient” and “emotional quotient” of conversational robots to realize spiritual companionship, emotional comfort, and psychological guidance for users. Combined with the characteristics of emotions in dialogues, this study provides a comprehensive analysis of emotional dialogue technology: 1) Three important technical points including emotion recognition, emotion management, and emotion expression in dialogue scenarios are shown, and the technology of emotional dialogues in multimodal scenarios is expanded. 2) This study presents the latest research progress on technology points related to emotional dialogues and summarizes the main challenges and possible solutions correspondingly. 3) Data resources for emotional dialogue technologies are introduced. 4) The difficulty and prospect of emotional dialogue technology are pointed out.
    Available online:  April 26, 2023 , DOI: 10.13328/j.cnki.jos.006809
    Abstract:
    In a hybrid cloud environment, enterprise business applications and data are often transferred across different cloud services. For complex and diversified cloud service environments, most hybrid cloud applications adopt access control policies made around only access subjects and adjust the policies manually, which cannot meet the fine-grained dynamic access control requirements at different stages of the data life cycle. This study proposes AHCAC, an adaptive access control method oriented to the data life cycle in a hybrid cloud environment. Firstly, a policy description approach based on key attributes is employed to unify the heterogeneous policies of the full life cycle of data under the hybrid cloud. In particular, the “stage” attribute is introduced to explicitly identify the life-cycle state of data, which is the basis for achieving fine-grained access control oriented to the data life cycle. Secondly, in view of the similarity and consistency of access control policies within the same life-cycle stage, the policy distance is defined, and a hierarchical clustering algorithm based on the policy distance is proposed to construct the corresponding data access control policy of each life-cycle stage. Finally, when the life-cycle stage of data changes, the adaptation and loading of the policies of the corresponding data stage in the policy evaluation are triggered through key attribute matching, which realizes adaptive access control oriented to the data life cycle. This study also conducts experiments to verify the effectiveness and feasibility of the proposed method on OpenStack and the open-source policy evaluation engine Balana.
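    The policy-distance clustering step can be illustrated with a short, hedged sketch: policies are encoded as attribute vectors, a distance between them is computed, and hierarchical clustering groups policies that belong to the same life-cycle stage. The attribute encoding, the city-block distance, and the cluster count are assumptions; the paper defines its own policy distance over key attributes.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy policies encoded as attribute vectors [stage, sensitivity, allow_external].
policies = np.array([
    [0, 1, 1],   # creation-stage policies
    [0, 1, 0],
    [1, 2, 0],   # usage-stage policies
    [1, 3, 0],
    [2, 3, 0],   # archival-stage policy
])

dist = pdist(policies, metric="cityblock")      # stand-in for the policy distance
tree = linkage(dist, method="average")          # hierarchical (agglomerative) clustering
labels = fcluster(tree, t=3, criterion="maxclust")
print(labels)  # policies grouped per life-cycle stage; one cluster -> one stage policy
```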
    Available online:  April 19, 2023 , DOI: 10.13328/j.cnki.jos.006797
    Abstract:
    In edge computing scenarios, some tasks to be performed will be offloaded to the edge server, which can reduce the load of mobile devices, enhance the performance of mobile applications, and lower the cost of mobile devices. For delay-sensitive tasks, it is critical to ensure they are completed within their deadlines. However, the limited resources of edge servers mean that when data transmission and task processing requests from multiple devices arrive at the same time, some tasks have to wait in a queue before they are scheduled. As a result, the long waiting time may cause time-out failures, which also makes it impossible to balance the performance goals of several devices. Therefore, this study optimizes the task scheduling order on the edge server based on computation offloading. Firstly, task scheduling is modeled as a long-term optimization problem, and an online learning method based on a combinatorial multi-armed bandit is employed to dynamically adjust the scheduling order of the server. Secondly, the dynamically changing order of task execution will lead to different levels of performance enhancement for task offloading, which will influence the validity of offloading decisions. A deep Q-learning method with perturbed rewards is adopted to determine the execution sites for tasks to improve the robustness of offloading strategies. Simulation results show that the proposed strategy can balance multiple user objectives and lower the system cost simultaneously.
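    The bandit-driven adjustment of the scheduling order can be pictured with a toy UCB1 sketch in which each candidate scheduling policy is an arm and the reward is, say, the fraction of tasks meeting their deadlines. The arm set, reward model, and class name are assumptions; the paper's method is a combinatorial bandit over task orderings rather than this simplified per-policy bandit.

```python
import math
import random

class UCBScheduler:
    """Toy UCB1 bandit over candidate scheduling orders (arms)."""
    def __init__(self, arms):
        self.arms = arms
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}
        self.t = 0

    def select(self):
        self.t += 1
        for a in self.arms:                 # play each arm once first
            if self.counts[a] == 0:
                return a
        return max(self.arms, key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

sched = UCBScheduler(["fifo", "edf", "sjf"])
for _ in range(100):
    arm = sched.select()
    # reward: e.g. fraction of tasks meeting deadlines under that order (simulated here)
    reward = {"fifo": 0.3, "edf": 0.8, "sjf": 0.6}[arm] + random.uniform(-0.1, 0.1)
    sched.update(arm, reward)
print(max(sched.values, key=sched.values.get))   # the order the bandit converges to
```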
    Available online:  April 19, 2023 , DOI: 10.13328/j.cnki.jos.006799
    Abstract:
    Several methods have been proposed to address complex questions of knowledge base question answering (KBQA). However, the complex semantic composition and the possible absence of inference paths lead to the poor reasoning effect of complex questions. To this end, this study proposes the CGL-KBQA method based on the global and local features of knowledge graphs. The method employs the knowledge embedding technique to extract the topological structure and semantic features of knowledge graphs as the global features of the candidate entity node, and models the complex questions as a composite triple classification task based on the entity representation and question composition. At the same time, the core inference paths generated by graphs during the search process are utilized as local features, which are then combined with the semantic similarity of questions to construct different dimensional features of the candidate entities and finally form a hybrid feature scorer. Since the final inference paths may be missing, this study also designs a cluster module with unsupervised multi-clustering methods to select final answer clusters directly according to the feature representation of candidate entities, thereby making reasoning under incomplete KG possible. Experimental results show that the proposed method performs well on two common KBQA datasets, especially when KG is incomplete.
    Available online:  April 19, 2023 , DOI: 10.13328/j.cnki.jos.006760
    Abstract:
    In recent years, the localization and tracking of moving targets have been widely used in scenes including indoor navigation, smart homes, security monitoring, and smart medical services. Radio frequency (RF)-based contactless localization and tracking have attracted extensive attention from researchers. Among them, the commercial IR-UWB-based technology can achieve target localization and tracking at low costs and power consumption and has strong development potential. However, most of the existing studies have the following problems: 1) Limited tracking scenes. Modeling and processing methods are only for outdoor or relatively empty indoor scenes under ideal conditions. 2) Limited movement states of targets and unduly ideal modeling. 3) Low tracking accuracy caused by fake moving targets. To solve these problems, this study proposes a moving target tracking method using IR-UWB on the basis of understanding the composition of the received signal spectrum in multipath scenes. First, the dynamic components of the originally received signal spectrum are extracted. Then, the Gaussian blur-based multipath elimination and distance extraction algorithm is employed to eliminate multipath interference, which only retains primary reflection information directly related to the moving target and therefore accurately obtains the distance variation curve of the target. Subsequently, a multi-view fusion algorithm is proposed to fuse the distance information of the devices from different views to achieve accurate localization and tracking of a single freely moving target. In addition, a real-time moving target tracking system based on the low-cost commercial IR-UWB radar is established. The experimental results in the real indoor home scene show that the error between the center position of the human body estimated by the system and the real motion trajectory is always within 20 cm. Moreover, the system remains robust even if influencing factors such as the experimental environment, experimenter, activity speed, and equipment height are altered.
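    The multipath-elimination and distance-extraction stage can be roughly pictured as follows: subtract the static background from the fast-time profiles, apply a Gaussian blur to suppress spiky multipath residues, and keep the strongest remaining bin per frame as the primary reflection distance. The range resolution, blur width, and function name are assumptions; this NumPy/SciPy sketch is not the paper's algorithm.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def target_distance(frames, range_resolution=0.05, sigma=2.0):
    """Toy dynamic-component extraction for an IR-UWB range matrix:
    frames has shape (slow_time, fast_time)."""
    frames = np.asarray(frames, dtype=float)
    dynamic = frames - frames.mean(axis=0)                 # remove static clutter
    smoothed = gaussian_filter1d(np.abs(dynamic), sigma=sigma, axis=1)
    bins = smoothed.argmax(axis=1)                         # strongest dynamic bin per frame
    return bins * range_resolution                         # distance curve over time
```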
    Available online:  April 19, 2023 , DOI: 10.13328/j.cnki.jos.006761
    Abstract:
    SPN construction is the most widely used overall construction of block ciphers at present, which is adopted by block ciphers such as AES and ARIA. The security analysis of SPN ciphers is a research hotspot in cryptanalysis. The application of the subspace trail cryptanalysis to the typical two-dimensional SPN ciphers and typical three-dimensional SPN ciphers can yield the corresponding subspace trails and general properties based on the subspace trails separately. These properties are independent of the secret key and the detailed definitions of the S-box and MixColumns matrix. They can be specifically described as follows: For a typical two-dimensional SPN cipher whose state can be formalized into a two-dimensional array of n×m, the number of different ciphertext pairs belonging to the same coset of the mixed subspace in the ciphertexts obtained by five rounds of encryption of all plaintexts belonging to the same coset of the quasi-diagonal subspace must be a multiple of 2^(n–1). For a typical three-dimensional SPN cipher whose state can be formalized into a three-dimensional array of l×n×m, the number of different ciphertext pairs belonging to the same coset of the mixed subspace in the ciphertexts obtained by seven rounds of encryption of all plaintexts belonging to the same coset of the quasi-diagonal subspace must be a multiple of 2^(nl–1). In addition, this study not only proves these properties but also makes experimental verification on the internal permutations of PHOTON and small-scale variants of the Rijndael, 3D, and Saturnin algorithms. The experimental results are completely consistent with these properties.
    Available online:  April 13, 2023 , DOI: 10.13328/j.cnki.jos.006780
    Abstract:
    Core network slicing achieves flexible networking by combining virtualized network functions (VNFs). However, the failure of any VNF due to software and hardware faults will cause an interruption of the slice service. Since network slices share resources, a specific isolation mechanism is required to meet slice robustness demands. Most of the existing availability guarantee mechanisms focus on random VNF failures, and those involving external attacks rarely consider the special isolation requirements of network slices. To realize slice availability guarantees under isolation mechanisms, this study proposes a method to guarantee network slice availability based on multi-level isolation. First, an availability guarantee model of core network resource awareness is built to meet the isolation requirements while consuming the fewest backup resources. Then, an isolation level assessment model is proposed to evaluate the isolation level of VNFs. Finally, a multi-level isolated backup algorithm (MLIBA) is proposed to solve the availability guarantee problem. In addition, an equivalent backup instance-based calculation method is put forward to address the PP-complete problem of availability calculation for a shared backup. Simulation results show that the proposed availability calculation method has high accuracy, and the introduction of multi-level isolation can double the robustness of slices. The comparison with existing studies shows that under the same isolation constraints and availability targets, the proposed method can reduce resource consumption by 20%–70% and increase the proportion of effective resources by 5%–30%.
    Available online:  April 13, 2023 , DOI: 10.13328/j.cnki.jos.006798
    Abstract:
    In recent years, with the rapid development of blockchain, the types of cryptocurrencies and anonymous transactions have become increasingly diversified. How to make optimal decisions on transaction types in the cryptocurrency market is a major concern of users. The users’ decision-making goal is to minimize transaction costs and maximize privacy while ensuring that transactions are packaged. The cryptocurrency trading market is complex, and cryptocurrency technologies differ greatly from each other. Existing studies focus on the Bitcoin market, and few of them discuss other anonymous currency markets such as Zcash or users’ anonymity demands. Therefore, this study proposes a game-based general cryptocurrency trading market model and explores the trading market and users’ decisions on transaction types and costs by combining the anonymity needs of users and employing game theory. Taking Zcash, the most representative cryptocurrency with optional anonymity, as an example, it analyzes the trading market in combination with the CoinJoin transaction, simulates the trading process about how users and miners find the optimal strategy, and discusses the impact of block size, discount factors, and the number of users on the trading market and user behaviors. Additionally, the model is simulated in a variety of market types to conduct an in-depth discussion of the experimental results. Taking a three-type trading market as an example, in the context of vicious fee competition in the trading market, when plnum = 75, θ = 0.4, st = 100, sz = 400, all users are inclined to choose CoinJoin in the early transaction stage (first 500 rounds). In the middle and late stages of the market (1500–2000 rounds), 97% of users with a privacy sensitivity below 0.7 tend to choose CoinJoin, while 73% of users with a privacy sensitivity above 0.7 tend to choose shielded transactions. CoinJoin transactions and block sizes above 400 can alleviate the vicious competition of transaction fees to some extent. The proposed model can help researchers understand the games in different cryptocurrency trading markets, analyze user trading behavior, and reveal market operation rules.
    Available online:  April 13, 2023 , DOI: 10.13328/j.cnki.jos.006803
    Abstract:
    As a new technology that combines reversible data hiding and fragile watermarking, image reversible authentication (RA) can not only realize the fragile authentication of images but also recover the original carrier image without distortion while extracting the authentication code. Thus, it is of great significance for authenticating the originality and integrity of images. Existing reversible authentication methods have low authentication accuracy and cannot effectively protect images with complex textures or some areas with complex textures in the images. To this end, this study proposes a new reversible authentication method. Firstly, images to be authenticated are divided into blocks, and the obtained sub-blocks are classified as differential blocks (DB) and shifting blocks (SB) according to their embedding capacity. Different reversible embedding methods are employed to embed the authentication codes into the different types of blocks. The method also adopts a hierarchical embedding strategy to increase embedding capacity and improve the authentication effect of each sub-block. On the authentication side, tamper detection and localization can be realized by the authentication code extracted from each sub-block. In addition, the method can be combined with dilation and erosion in morphology to refine tamper detection marks and further improve the detection accuracy. Experimental results show that the proposed method can protect both images with smooth textures and images with complex textures under the same authentication accuracy, and it can also realize independent authentication and restoration of almost all sub-blocks, which gives it widespread applicability.
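    The morphological refinement mentioned above can be sketched with SciPy, here as an opening (erosion then dilation) over a block-level tamper map to drop isolated false alarms; the structuring element, iteration count, and the order of the two operations are assumptions rather than the paper's exact post-processing.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def refine_tamper_map(block_failed, iterations=1):
    """Refine block-level tamper detection marks with an opening
    (erosion followed by dilation), removing isolated false positives
    while largely preserving connected tampered regions."""
    opened = binary_erosion(block_failed, iterations=iterations)
    return binary_dilation(opened, iterations=iterations)

tamper = np.zeros((8, 8), dtype=bool)
tamper[2:5, 2:5] = True      # a genuinely tampered region
tamper[7, 0] = True          # an isolated false alarm
print(refine_tamper_map(tamper).astype(int))   # the isolated mark is removed
```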
    Available online:  April 13, 2023 , DOI: 10.13328/j.cnki.jos.006806
    Abstract:
    By transferring the knowledge of the source domain to the target domain with similar tasks, domain adaptation aims to assist the latter to learn better. When the data label set of the target domain is a subset of the source domain labels, the domain adaptation of this type of scenario is called partial domain adaptation (PDA). Compared with general domain adaptation, although PDA is more general, it is more challenging with few related studies, especially with the lack of systematic reviews. To fill this gap, this study conducts a comprehensive review, analysis and summary of existing PDA methods, and provides an overview and reference of subject research for the relevant community. Firstly, an overview of the PDA background, concepts, and application fields is summarized. Secondly, according to the modeling characteristics, existing PDA methods are divided into two categories: promoting positive transfer and alleviating negative transfer, and this study reviews and analyzes them respectively. Then, the commonly used experimental benchmark datasets are categorized and summarized. Finally, the problems in existing PDA studies are analyzed to point out possible future development directions.
    Available online:  April 04, 2023 , DOI: 10.13328/j.cnki.jos.006669
    Abstract:
    The programmable data plane (PDP), allowing offloading and accelerating network applications, creates revolutionary development opportunities for such applications. Also, it promotes the innovation and evolution of the network by supporting the rapid implementation and deployment of new protocols and services. It has thus been a research hotspot in the field of the network in recent years. With its general computing architecture and rich on-chip resources and extended interfaces, field-programmable gate array (FPGA) provides a variety of implementations of PDP for a wider range of application scenarios. It also offers the possibility to explore more general PDP abstraction. Therefore, FPGA-based PDP (F-PDP) has been widely concerned by the academic and industrial communities. In this study, F-PDP abstraction is described by category. Then, the research progress of key technologies for building network applications with F-PDP is outlined, and programmable network devices based on F-PDP are presented. After that, the application research based on F-PDP in recent years is reviewed in detail from three aspects: improving network performance, building a network measurement framework, and deploying network security applications. Finally, the possible future research trends of F-PDP are discussed.
    Available online:  April 04, 2023 , DOI: 10.13328/j.cnki.jos.006670
    Abstract:
    Bayesian network (BN), as a fundamental framework for representing and inferring uncertain knowledge, is widely used in social networks, knowledge graphs, medical diagnosis, etc. The central computing task of BN-based analysis, diagnosis, and decision support in specific fields consists of multiple probabilistic inferences. However, traditional inference methods incur high time complexity on the same BN because the intermediate results of probability calculations cannot be shared and reused among different inferences. Therefore, to improve the overall efficiency of multiple inferences on the same BN, this study proposes a method of BN embedding and corresponding probabilistic inferences. First, by incorporating the idea of graph embedding, the study proposes a BN embedding method based on the autoencoder and attention mechanism, transforming the BN into a pointwise mutual information matrix to preserve the directed acyclic graph and the conditional probability parameters simultaneously. Specifically, each coding layer of the autoencoder generates node embeddings by using the correlation between a node and its neighbors (parent and child nodes) to preserve the probabilistic dependencies. Then, a probabilistic inference method that measures the joint probability by the distance between embedding vectors is proposed. Experimental results show that the proposed method outperforms other state-of-the-art methods in efficiency while achieving accurate probabilistic inference results.
    Available online:  April 04, 2023 , DOI: 10.13328/j.cnki.jos.006674
    Abstract:
    Code review is an important mechanism in the distributed development of modern software. In code review, providing the context information of the current changes can help code reviewers understand the evolution of the relevant source code quickly, thereby enhancing the efficiency and quality of code review. Existing studies have provided some commit history tracking methods and corresponding tools, but these methods cannot further extract auxiliary information relevant to code review from historical data. Therefore, this study proposes a novel code change tracking approach for code review named C2Tracker. Given a fine-grained code change at the method (function) level, C2Tracker can automatically track the historical commits related to the code change. Furthermore, frequently co-changed code elements and relevant code changes are mined to help reviewers understand the current code changes and make decisions. Experimental verification is conducted on ten well-known open-source projects. The results show that the accuracies of C2Tracker in tracking historical commits, mining frequently co-changed code elements, and tracking related code change fragments are 97%, 95%, and 97%, respectively. Compared with existing review methods, C2Tracker greatly improves code review efficiency and quality in specific cases. Additionally, reviewers acknowledge that it can play a significant role in improving the efficiency and quality of most review cases.
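    Method-level commit tracking of the kind described above can be approximated with plain Git tooling; the sketch below simply shells out to `git log -L :<funcname>:<file>` and collects the commits that touched one function. The repository path and function name in the comment are hypothetical, and this is only a baseline illustration, not C2Tracker itself.

```python
import subprocess

def method_history(repo, src_file, func_name, max_count=20):
    """List commits that modified one function by delegating to
    `git log -L :<funcname>:<file>` inside the given repository."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-L", f":{func_name}:{src_file}",
         f"--max-count={max_count}", "--format=commit:%H %s"],
        capture_output=True, text=True, check=True)
    # `git log -L` interleaves the relevant diff hunks; keep only the commit lines.
    return [line[len("commit:"):] for line in out.stdout.splitlines()
            if line.startswith("commit:")]

# Example (hypothetical repository path and function name):
# print(method_history("/path/to/project", "src/util.c", "parse_config"))
```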
    Available online:  April 04, 2023 , DOI: 10.13328/j.cnki.jos.006675
    Abstract:
    Weakly supervised object localization aims to train target locators only by image-level labels instead of accurate location annotations for algorithm training. Some existing methods can only identify the most discriminative region of the target object and are incapable of covering the complete object, or can easily be misled by irrelevant background information, thereby leading to inaccurate object locations. Therefore, this study proposes a weakly supervised object localization algorithm based on attention mechanism and categorical hierarchy. The proposed method extracts a more complete object area by performing mean segmentation on the attention map of the convolutional neural network. In addition, the category hierarchy network is utilized to weaken the attention caused by background areas, which achieves more accurate object location results. Extensive experimental results on multiple public datasets show that the proposed method can yield better localization effects than other weakly supervised object localization methods under various evaluation metrics.
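    The mean-segmentation step on the attention map can be shown in a few lines of NumPy: keep the pixels whose activation is at least the map's mean and take the bounding box of what remains. The synthetic map and the function name are assumptions; a real pipeline would additionally pick the largest connected component and upsample the map to image resolution.

```python
import numpy as np

def localize_from_attention(cam):
    """Mean-threshold segmentation of an attention/class-activation map,
    returning the bounding box (x1, y1, x2, y2) of the retained pixels."""
    mask = cam >= cam.mean()
    ys, xs = np.where(mask)
    if len(xs) == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

cam = np.zeros((64, 64))
cam[20:45, 10:50] = 1.0          # pretend the object activates this region
print(localize_from_attention(cam))
```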
    Available online:  April 04, 2023 , DOI: 10.13328/j.cnki.jos.006676
    Abstract:
    Reference counts are widely employed in large-scale low-level systems including Linux kernel to manage shared resources, and should be consistent with the number of objects referring to resources. Otherwise, bugs of improper update of reference counts may be caused, and resources can never be released or will be released earlier. To detect improper updates of reference counts, available static detection methods have to know the functions which increase reference counts or decrease the counts. However, manually collecting prior knowledge of reference counts is too time-consuming and may be incomplete. Though mining-based methods can reduce the dependency on prior knowledge, it is difficult to effectively detect path-sensitive bugs containing improper updates of reference counts. To this end, this study proposes a method RTDMiner that deeply integrates data mining into static analysis to detect improper updates of reference counts. First, according to the general principles of reference counts, the data mining technique is leveraged to identify functions that raise or reduce reference counts. Then, a path-sensitive static analysis method is employed to detect defective paths that increase reference counts instead of decreasing the counts. To reduce false positives, the study adopts the data mining technique to identify exceptional patterns during detection. The experiment results on the Linux kernel demonstrate that the proposed method can automatically identify functions increasing or decreasing reference counts with the precision of nearly 90%. Moreover, 24 out of the top 50 suspicious bugs detected by RTDMiner have been confirmed to be real bugs by kernel maintainers.
    Available online:  March 29, 2023 , DOI: 10.13328/j.cnki.jos.006748
    Abstract:
    The authenticated data structure (ADS) solves the problem of untrusted servers in outsourced data storage scenarios as users can verify the correctness and integrity of the query results returned by untrusted servers through the ADS. Nevertheless, the security of data owners is difficult to guarantee, and attackers can tamper with the ADS stored by data owners to impede the integrity and correctness verification of query results. Data owners can store the ADS on the blockchain to solve the above problem by leveraging the immutable nature of the blockchain. However, the existing ADS implementation schemes have high maintenance costs on the blockchain and most of them only support the verifiable query of static data. At present, an efficient ADS tailored to the blockchain is still to be designed. By analyzing the gas consumption mechanism of smart contracts and the gas consumption of the ADS based on the traditional Merkle hash tree (MHT), this study proposes SMT, a new ADS, which achieves efficient and verifiable query of streaming data and has a lower gas consumption on the blockchain. Finally, the study verifies the efficiency of SMT both theoretically and experimentally and proves the security of SMT through security analysis.
    Available online:  March 15, 2023 , DOI: 10.13328/j.cnki.jos.006802
    Abstract:
    Multi-label text classification methods based on deep learning lack multi-granularity learning of text information and the utilization of constraint relations between labels. To solve these problems, this study proposes a multi-label text classification method with enhancing multi-granularity information relations. First, this method embeds text and labels in the same space by joint embedding and employs the BERT pre-trained model to obtain the implicit vector feature representation of text and labels. Then, three multi-granularity information relations enhancing modules including document-level information shallow label attention (DISLA) classification module, word-level information deep label attention (WIDLA) classification module, and label constraint relation matching auxiliary module are constructed. The first two modules carry out multi-granularity learning from shared feature representation: the shallow interactive learning between document-level text information and label information, and the deep interactive learning between word-level text information and label information. The auxiliary module improves the classification performance by learning the relation between labels. Finally, the comparison with current mainstream multi-label text classification algorithms on three representative datasets shows that the proposed method achieves the best performance on main indicators of Micro-F1, Macro-F1, nDCG@k, and P@k.
    Available online:  March 08, 2023 , DOI: 10.13328/j.cnki.jos.006756
    Abstract:
    Density peaks clustering (DPC) is a density-based clustering algorithm that can intuitively determine the number of clusters, identify clusters of any shape, and automatically detect and exclude abnormal points. However, DPC still has some shortcomings: The DPC algorithm only considers the global distribution, and the clustering performance is poor for datasets with large cluster density differences. In addition, the point allocation strategy of DPC is likely to cause a domino effect. Hence, this study proposes a DPC algorithm based on representative points and K-nearest neighbors (KNN), namely, RKNN-DPC. First, the KNN density is constructed, and the representative points are introduced to describe the global distribution of samples and propose a new local density. Then, the KNN information of samples is used to propose a weighted KNN allocation strategy to relieve the domino effect. Finally, a comparative experiment is conducted with five clustering algorithms on artificial datasets and real datasets. The experimental results show that the RKNN-DPC algorithm can more accurately identify cluster centers and obtain better clustering results.
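    The KNN-based local density at the heart of RKNN-DPC-style methods can be sketched as follows; the Gaussian kernel over the mean k-nearest-neighbor distance is an illustrative choice and not necessarily the exact formula used in the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_density(X, k=8):
    """KNN-based local density: points whose k nearest neighbors are close
    receive a high density value."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1 because a point is its own neighbor
    dist, _ = nn.kneighbors(X)
    mean_knn_dist = dist[:, 1:].mean(axis=1)
    return np.exp(-mean_knn_dist ** 2)

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
rho = knn_density(X)
print(X[rho.argmax()])   # a point near one of the cluster centers gets the highest density
```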
    Available online:  March 02, 2023 , DOI: 10.13328/j.cnki.jos.006759
    Abstract:
    The graphical password mitigates the burden of memorizing traditional textual passwords and simplifies the process of entering passwords, which has been widely applied to user authentication of mobile devices in recent years. Existing graphical password authentication schemes face critical threats. First, graphical passwords are vulnerable to shoulder-surfing attacks, namely that users’ graphical passwords may be leaked if attackers capture their login information through eyes or cameras. More seriously, these schemes are subject to credential leakage attacks. In other words, as the server stores authentication credentials related to the graphical passwords of users to verify their identities, if attackers obtain these credentials, they can perform offline password guessing attacks to retrieve users’ graphical passwords. To solve the above problems, this study proposes a secure graphical password authentication scheme, dubbed GADL. GADL embeds random challenge values into the graphical passwords of users to resist shoulder-surfing attacks, and thus attackers cannot obtain users’ passwords even if they capture their login information. To address credential database leakage of the server, GADL utilizes a deterministic threshold blind signature technique to protect users’ graphical passwords. In this technique, multiple key servers are utilized to assist users in the credential generation, which ensures that attackers cannot perform offline guessing attacks to obtain any knowledge of users’ passwords even if they obtain users’ credentials. The security analysis given in this study proves that GADL is resistant to the aforementioned attacks. In addition, the comprehensive performance evaluation of GADL demonstrates its high performance in terms of computation, storage, and communication overhead and proves that it can be easily deployed on mobile devices.
    Available online:  March 02, 2023 , DOI: 10.13328/j.cnki.jos.006779
    Abstract:
    With the development of cloud computing and service architectures including software as a service (SaaS) and function as a service (FaaS), data centers, as service providers, constantly face resource management challenges: the quality of service (QoS) should be guaranteed, and the resource cost should be controlled. Therefore, a method to accurately measure computing power consumption becomes a key research issue for improving resource utilization and keeping the load pressure within an acceptable range. Due to mature virtualization technologies and developing parallel technologies, the traditional estimation metric, CPU utilization, fails to capture the interference caused by resource competition, thus leading to accuracy loss. Meanwhile, hyper-threading (HT) technology is widely employed in mainstream data center processors, which makes it urgent to estimate the computing power consumption of HT processors. To address this estimation challenge, this study proposes the APU method to estimate the computing power consumption of HT processors based on an understanding of the HT running mechanism and thread behavior modeling. Considering that users with different authorities can access different system levels, two implementation schemes are put forward: one based on hardware support and the other based on the operating system (OS). The proposed method adopts CPU utilization as the input without requiring metrics of other dimensions. Additionally, it reduces the development and deployment costs of new monitoring tools without the support of special hardware architectures, thereby making the method universal and easy to apply. Finally, SPEC benchmarks further prove the effectiveness of the method: the estimation errors on the three benchmarks are reduced from 20%, 50%, and 20% to less than 5%. To further prove its applicability, the APU method is applied to ByteDance clusters, and its effects are shown in case studies.
    Available online:  March 02, 2023 , DOI: 10.13328/j.cnki.jos.006763
    Abstract:
    The emergence of the dynamic link library (DLL) provides great convenience for developers, which improves the interaction between the operating system (OS) and applications. However, the potential security problems of DLL cannot be ignored. Determining how to mine DLL-hijacking vulnerabilities during the running of Windows installers is important to ensure the security of Windows OS. In this paper, the attribute features of numerous installers are collected and extracted, and the double-layer bi-directional long short-term memory (BiLSTM) neural network is applied for machine learning from the perspectives of installers, the invocation modes of DLL from installers, and the DLL file itself. The multi-dimensional features of the vulnerability data set are extracted, and unknown DLL-hijacking vulnerabilities are mined. In experiments, DLL-hijacking vulnerabilities can be effectively detected from Windows installers, and 10 unknown vulnerabilities are discovered and assigned CNVD authorizations. In addition, the effectiveness and integrity of this method are further verified by comparison with other vulnerability analyzers.
    Available online:  March 02, 2023 , DOI: 10.13328/j.cnki.jos.006764
    Abstract:
    Entity resolution widely exists in data tasks such as data quality control, information retrieval, and data integration. Traditional entity resolution methods mainly focus on relational data; however, with the development of big data technology and the proliferation of data in different modalities, such as text and images, application requirements for cross-modal data have emerged. Hence, cross-modal entity resolution has become a fundamental problem in big data processing and analysis. In this study, the research development of cross-modal entity resolution is reviewed, and its definition and evaluation indexes are introduced. Then, with the construction of inter-modal relationships and the maintenance of intra-modal relationships as the main line, existing research results are surveyed. In addition, widely used methods are tested on different open datasets, and their differences and the reasons behind them are analyzed. Finally, the problems in present research are summarized, on the basis of which future research trends are given.
    Available online:  February 22, 2023 , DOI: 10.13328/j.cnki.jos.006641
    Abstract:
    Abnormal behavior detection is one of the important functions in the intelligent surveillance system, which plays an active role in ensuring public security. To improve the detection rate of abnormal behavior in surveillance videos, this study designs a semi-supervised abnormal behavior detection network based on a probabilistic memory model from the perspective of learning the distribution of normal behavior, in an attempt to deal with the great imbalance between normal behavior data and abnormal behavior data. The network takes an auto-encoding network as the backbone network and uses the gap between the predicted future frame and the real frame to measure the intensity of the anomaly. When extracting spatiotemporal features, the backbone network employs three-dimensional causal convolutional and temporally-shared full connection layers to avoid future information leakage and ensure the temporal sequence of information. In terms of auxiliary modules, a probabilistic model and a memory module are designed from the perspective of probability entropy and diverse patterns of normal behavior data to improve the quality of video frame reconstruction in the backbone network. Specifically, the probabilistic model uses the autoregressive process to fit the input data distribution, which promotes the model to converge to the low-entropy state of the normal distribution; the memory module stores the prototypical features of normal behavior in the historical data to realize the coexistence of multi-modal data and avoid the reconstruction of abnormal video frames caused by excessive participation of the backbone network. Finally, ablation experiments and comparison experiments with classic algorithms are carried out on public datasets to examine the effectiveness of the proposed algorithm.
    Available online:  February 22, 2023 , DOI: 10.13328/j.cnki.jos.006652
    Abstract:
    After years of technical development and attack-defense confrontation, the reinforcement technology for Android applications has matured to the extent that protection granularity has gradually developed from general dynamic Dalvik executable (DEX) modification to a highly customized Native-layer obfuscation mechanism. Client code protection is strengthened by continuously increasing reverse analysis difficulty and workload. For the newly emerged reinforcement technology of obfuscator low level virtual machine (OLLVM) obfuscation, this study proposes an automatic anti-obfuscation solution CiANa based on Capstone and flow-sensitive concolic execution. The Capstone engine is used to analyze the basic block and its instruction structure, thereby identifying the real blocks scattered in the control flow graph of program disassembly. Then, the execution sequence of the real blocks is determined by leveraging flow-sensitive concolic execution. Finally, the real block assembly instructions are repaired to obtain anti-obfuscated executable binary files. The comparative experimental results show that CiANa can recover the Android Native files under OLLVM obfuscation in the ARM/ARM64 architecture. As the first framework that offers effective anti-obfuscation and generates executable files for all versions (Debug/Release version) of OLLVM in the ARM/ARM64 architecture, CiANa provides necessary auxiliary support for reverse analysis.
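    As background for the basic-block analysis mentioned above, the following is a minimal Capstone (Python bindings) sketch that disassembles a few AArch64 instructions and flags control-flow terminators; the byte string is an arbitrary hand-assembled example rather than code from an obfuscated sample, and the terminator list is deliberately incomplete.

```python
from capstone import Cs, CS_ARCH_ARM64, CS_MODE_ARM

# ldr x0, [x8]; mov w0, #1; ret  (little-endian AArch64 encodings)
CODE = b"\x00\x01\x40\xf9" + b"\x20\x00\x80\x52" + b"\xc0\x03\x5f\xd6"

md = Cs(CS_ARCH_ARM64, CS_MODE_ARM)
md.detail = True
for insn in md.disasm(CODE, 0x1000):
    # a real deobfuscator would use such terminators to split basic blocks
    marker = "<- block terminator" if insn.mnemonic in ("b", "br", "ret", "cbz", "cbnz") else ""
    print(f"0x{insn.address:x}\t{insn.mnemonic}\t{insn.op_str}\t{marker}")
```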
    Available online:  February 22, 2023 , DOI: 10.13328/j.cnki.jos.006757
    Abstract:
    Mixed precision techniques have made many advances in deep learning as well as in precision tuning and optimization, and extensive research shows that mixed precision optimization for stencil computation is challenging. Moreover, the research achievements secured by the polyhedral model in the field of automatic parallelization indicate that the model provides a good mathematical abstraction for loop nesting, on the basis of which loop transformations can be performed. This study designs and implements an automatic mixed precision optimizer for stencil computation on the basis of polyhedral compilation technology. By performing iteration domain partitioning, data flow analysis, and scheduling tree transformation on the intermediate representation layers, this study implements the source-to-source automatic generation of mixed-precision code for stencil computation for the first time. The experiments demonstrate that the code after automatic mixed precision optimization can give full play to its parallelism potential and improve the performance of the program by reducing precision redundancy. With high-precision computing as the benchmark, the maximum speedup is 1.76, and the geometric average speedup is 1.15 on the x86 architecture; on the new-generation Sunway architecture, the maximum speedup is 1.64, and the geometric average speedup is 1.20.
    Available online:  February 22, 2023 , DOI: 10.13328/j.cnki.jos.006758
    Abstract:
    With the increasingly powerful performance of neural network models, they are widely used to solve various computer-related tasks and show excellent capabilities. However, a clear understanding of the operation mechanism of neural network models is lacking. Therefore, this study reviews and summarizes the current research on the interpretability of neural networks. A detailed discussion is rendered on the definition, necessity, classification, and evaluation of research on model interpretability. With the emphasis on the focus of interpretable algorithms, a new classification method for the interpretable algorithms of neural networks is proposed, which provides a novel perspective for the understanding of neural networks. According to the proposed method, this study sorts out the current interpretable methods for convolutional neural networks and comparatively analyzes the characteristics of interpretable algorithms falling within different categories. Moreover, it introduces the evaluation principles and methods of common interpretable algorithms and expounds on the research directions and applications of interpretable neural networks. Finally, the problems confronted in this regard are discussed, and possible solutions to these problems are given.
    Available online:  February 15, 2023 , DOI: 10.13328/j.cnki.jos.006750
    Abstract:
    Entity recognition is a key task of information extraction. With the development of information extraction technology, researchers turn the research direction from the recognition of simple entities to the recognition of complex ones. Complex entities usually have no explicit features, and they are more complicated in syntactic constructions and parts of speech, which makes the recognition of complex entities a great challenge. In addition, existing models widely use span-based methods to identify nested entities. As a result, they always have an ambiguity in the detection of entity boundaries, which affects recognition performance. In response to the above challenge and problem, this paper proposes an entity recognition model GIA-2DPE based on prior semantic knowledge and type embedding. The model uses keyword sequences of entity categories as prior semantic knowledge to improve the cognition of entities, utilizes type embedding to capture potential features of different entity types, and then combines prior knowledge with entity-type features through the gated interactive attention mechanism to assist in the recognition of complex entities. Moreover, the model uses 2D probability encoding to predict entity boundaries and combines boundary features and contextual features to enhance accurate boundary detection, thereby improving the performance of nested entity recognition. This study conducts extensive experiments on seven English datasets and two Chinese datasets. The results show that GIA-2DPE outperforms state-of-the-art models and achieves a 10.4% F1 boost compared with the baseline in entity recognition tasks on the ScienceIE dataset.
    Available online:  February 15, 2023 , DOI: 10.13328/j.cnki.jos.006754
    Abstract:
    Tag-aware recommendation algorithms use tagged data to enhance the recommendation models’ understanding of user preferences and item attributes, which attract extensive attention in the field. Most existing methods, however, neglect the diversities of user concerns, item attributes, and tag semantics and interfere with the correlation inference of the three, which affects the recommendation results. Therefore, this paper introduces the disentangled graph collaborative filtering (DGCF) method into the tag-aware recommendation task and proposes a DGCF-based explainable tag-aware recommendation (DETRec) method. It disentangles the perspectives of users, items, and tags to provide explainable recommendation references. Specifically, DETRec utilizes a correlation graph construction module to model the user–item–tag correlations. Then, it employs a neighborhood routing mechanism and a message propagation mechanism to disentangle the nodes to form the sub-graphs of attributes and thereby describe the nodal correlations under different attributes. Finally, it generates recommendation references on the basis of these attribute sub-graphs. This study implements two types of DETRec instantiation: 1) DETRec based on a single graph (DETRec-S), which describes all correlations of user, item, and tag nodes in a single graph, and 2) DETRec based on multiple graphs (DETRec-M), which utilizes three bipartite graphs to describe the user–item, item–tag, and user–tag correlations separately. Extensive experiments on three public datasets demonstrate that the above two types of DETRec instantiation are significantly superior to the baseline model and generate the references corresponding to the recommendation results. Hence, DETRec is an effective explainable tag-aware recommendation algorithm.
    Available online:  February 15, 2023 , DOI: 10.13328/j.cnki.jos.006765
    [Abstract] (397) [HTML] (0) [PDF 4.99 M] (3879)
    Abstract:
    In recent years, software construction, operation, and evolution have encountered many new requirements, such as the need for efficient switching or configuration in development and testing environments, application isolation, resource consumption reduction, and higher efficiency of testing and deployment. These requirements pose great challenges to developers in developing and maintaining software. Container technology has the potential of releasing developers from the heavy workload of development and maintenance. Of particular note, Docker, as the de facto industrial standard for containers, has recently become a popular research area in the academic community. To help researchers understand the status and trends of research on Docker containers, this study conducts a systematic literature review by collecting 75 high-level papers in this field. First, quantitative methods are used to investigate the basic status of research on Docker containers, including research quantity, research quality, research areas, and research methods. Second, the first classification framework for research on Docker containers is presented in this study, and the current studies are systematically classified and reviewed from the dimensions of the core, platform, and support. Finally, the development trends of Docker container technology are discussed, and seven future research directions are summarized.
    Available online:  February 15, 2023 , DOI: 10.13328/j.cnki.jos.006762
    Abstract:
    The extended Berkeley packet filter (eBPF) mechanism in the Linux kernel can safely load user-provided untrusted programs into the kernel. In the eBPF mechanism, the verifier checks these programs and ensures that they will not cause the kernel to crash or access the kernel address space maliciously. In recent years, the eBPF mechanism has developed rapidly, and its verifier has become more complex as more and more new features are added. This study observes two problems of the complex eBPF verifier. One is the “false negative” problem: There are many bugs in the complex security check logic of the verifier, and attackers can leverage these bugs to design malicious eBPF programs that can pass the verifier to attack the kernel. The other is the “false positive” problem: Since the verifier adopts the static check method, only conservative checks can be performed due to the lack of runtime information. This may cause the originally safe program to fail the check of the verifier and only support limited semantics, which brings difficulties to the development of eBPF programs. Further analysis shows that the static simulation execution check mechanism in the eBPF verifier features massive codes, high complexity, and conservative analysis, which are the main reasons for security vulnerabilities and false positives. Therefore, this study proposes to replace the static simulation execution check mechanism in the eBPF verifier with a lightweight dynamic check method. The bugs and conservative checks that originally existed in the eBPF verifier due to simulation execution no longer exist, and hence, the above-mentioned “false negative” and “false positive” problems can be eliminated. Specifically, the eBPF program is run in a kernel sandbox, which dynamically checks the memory access of the program in the runtime to prevent it from accessing the kernel memory illegally. For efficient implementation of a lightweight kernel sandbox, the Intel protection keys for supervisor (PKS), a new hardware feature, is used to perform a zero-overhead memory access check, and an efficient interaction method between the kernel and the eBPF program in the sandbox is presented. The evaluation results show that this study can eliminate memory security vulnerabilities of the eBPF verifier (this type of vulnerability has accounted for more than 60% of the total vulnerabilities of the eBPF verifier since 2020). Moreover, in the scenario of high-throughput network packet processing, the performance overhead brought by the lightweight kernel sandbox is lower than 3%.
    Available online:  February 08, 2023 , DOI: 10.13328/j.cnki.jos.006651
    Abstract:
    Parallelization is one of the most effective blockchain scalability solutions, and the existing parallelization schemes can be classified into two categories, i.e., starlike structure and parallel structure, according to the network structure. However, current research lacks analysis of the factors that determine the performance boundary and performance bottleneck of the starlike sharding structure. To address this problem, this study abstracts a general starlike sharding structure of blockchains from the schemes adopting different starlike sharding structures, and the transaction process in this general structure is quantitatively modeled to derive the relationship between throughput and the number of shards in the starlike sharding structure. According to the constructed model, the starlike sharding structure has a performance limit, and there exists an optimal number of shards that maximizes the system throughput. An explicit functional relationship exists between the maximal throughput and the functional complexity of the mainchain. With the proposed throughput model, related blockchain systems can balance the number of shards and the functional complexity of the mainchain, taking their specific design into account, to reach the theoretical upper limit of system throughput. Therefore, this work provides significant guidance for the design of schemes adopting starlike parallelization.
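    The abstract does not give the derived formula, but the trade-off it describes, aggregate shard throughput versus growing main-chain coordination cost, can be pictured numerically. The sketch below assumes a purely hypothetical throughput model and simply searches for the shard count that maximizes it; the functional form and all constants are made up and are not the model derived in the paper.

```python
# Illustrative only: find the shard count k that maximizes system throughput
# under an assumed model  T(k) = min(k * t_shard, C_main / (a + b * k)),
# where the second term stands for the main chain becoming the bottleneck as
# cross-shard coordination work grows with k. All constants are invented.

def throughput(k, t_shard=500.0, c_main=2_000_000.0, a=10.0, b=2.5):
    shard_side = k * t_shard                 # aggregate shard processing rate
    main_side = c_main / (a + b * k)         # main-chain capacity as k grows
    return min(shard_side, main_side)

best_k, best_t = max(((k, throughput(k)) for k in range(1, 501)),
                     key=lambda kt: kt[1])
print(f"optimal shard count under the toy model: {best_k}, "
      f"throughput ~ {best_t:.0f} tx/s")
```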
    Available online:  February 08, 2023 , DOI: 10.13328/j.cnki.jos.006660
    Abstract:
    Various business relationships and routing policies exist among the autonomous systems (ASes) in an inter-domain routing system. Routing propagation violating the export policy agreements among the ASes is likely to cause route leaks, ultimately leading to serious consequences such as network interruption, traffic eavesdropping, and link overload. Verifying routing policy compliance is thus essential for ensuring the security and stability of the inter-domain routing system. However, the dual requirements of ASes for the autonomous configuration and privacy protection of local routing policies increase the difficulty in verifying routing policy compliance and consequently pose a hard problem that remains to be settled properly in the field of inter-domain routing security. This study proposes a blockchain-based verification method for inter-domain routing policy compliance. With blockchain and the cryptographic technology as trust endorsements, this method enables ASes to publish, interact, verify, and execute routing policy expectations in a safe and private manner. The authenticity of the routing propagation process is ensured by generating route attestations corresponding to routing updates. Thus, the verification of routing policy compliance is completed by multi-domain cooperation. A prototype system is implemented, and experiments and analyses are carried out on real routing data. The results show that the proposed method offers traceable verification of export policy compliance of routing propagation without leaking the business relationships and local routing policies among ASes, suppresses policy-violating routing propagation effectively with reasonable overhead, and maintains a remarkable ability to suppress policy-violating routing even in partial deployment scenarios.
    Available online:  January 18, 2023 , DOI: 10.13328/j.cnki.jos.006663
    Abstract:
    Distributed hash table (DHT) is widely used in distributed storage because of its efficient data addressing. Nevertheless, traditional DHT-based storage has to store data in specified nodes to achieve efficient data addressing, which severely restricts the application scope of DHT technology. Taking heterogeneous storage networks as an example, the storage space, bandwidth, and stability of nodes vary greatly. Choosing appropriate data storage nodes according to data features and the performance differences among the nodes can greatly improve data access efficiency. However, the tight coupling between data and storage location prevents traditional DHT-based storage from being applied to heterogeneous storage networks. Therefore, this study proposes a vRoute algorithm to decouple the data identifier from the storage location in DHT-based storage. By building a distributed data index based on Bloom filters, the vRoute algorithm allows data to be stored in any node of the network without reducing the efficiency of data addressing. It is implemented by extending the Kademlia algorithm, and its validity is verified theoretically. Finally, the simulation experiments show that vRoute achieves a data addressing efficiency close to that of the traditional DHT algorithm with low storage and network overhead.
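    To make the idea of decoupling a data identifier from its storage location more concrete, the following sketch builds a per-node Bloom filter index: a lookup first asks each node's filter whether it might hold a key and only then contacts the matching nodes. The class and helper names are hypothetical, and the real vRoute index is distributed over the Kademlia routing structure rather than held in one place.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m, self.k = m_bits, k_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def may_contain(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

# Each storage node advertises a filter over the keys it actually stores,
# so data can live on any node while lookups stay cheap.
node_filters = {}

def store(node_id, key):
    node_filters.setdefault(node_id, BloomFilter()).add(key)

def candidate_nodes(key):
    return [n for n, f in node_filters.items() if f.may_contain(key)]

store("node-A", "video:42")
store("node-B", "sensor:7")
print(candidate_nodes("video:42"))   # ['node-A'] (plus rare false positives)
```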
    Available online:  January 18, 2023 , DOI: 10.13328/j.cnki.jos.006649
    Abstract:
    A previously published study investigates the problem of solving ε-NN via Turing reduction. In other words, given a query point q, a point set P, and an approximation factor ε, the goal is to return an approximate nearest neighbor of q in P with an approximation ratio of at most 1+ε. That study proposes a Turing reduction algorithm with O(log n) query time complexity, where the query time is the number of times the oracle is invoked. The O(log n) query time is the lowest among all existing algorithms. However, the disadvantage of the proposed algorithm is that there is a factor of O((d/ε)^d) in the preprocessing time complexity and space complexity. When the number of dimensions d is high or the approximation factor ε is small, this factor becomes unacceptable. Therefore, this study revises the reduction algorithm and analyzes its expected time complexity and space complexity when the input point set follows a Poisson point process. As a result, the expected preprocessing time complexity is reduced to O(n log n), and the expected space complexity is reduced to O(n log n), while the expected query time complexity remains O(log n). In this sense, the future work raised in the published study is completed.
    Available online:  January 13, 2023 , DOI: 10.13328/j.cnki.jos.006818
    Abstract:
    With the development of Internet of Things (IoT) technology, IoT devices are widely applied in many areas of production and life. However, IoT devices also bring severe challenges to equipment asset management and security management. First, due to the diversity of IoT device types and access modes, it is often difficult for network administrators to know the types and operating status of the IoT devices in their networks. Second, IoT devices are becoming the focus of cyber attacks due to their limited computing and storage resources, which make it difficult to deploy traditional defense measures. Therefore, it is important to identify the IoT devices in the network and detect anomalies based on the identification results, so as to ensure the normal operation of IoT devices. In recent years, academia has carried out a large amount of research on these issues. This study systematically reviews the work related to IoT device identification and anomaly detection. In terms of device identification, existing research can be divided into passive identification methods and active identification methods according to whether data packets are sent to the network. The passive identification methods are further investigated according to the identification method, identification granularity, and application scenarios, and the active identification methods are investigated according to the identification method, identification granularity, and detection granularity. In terms of anomaly detection, the existing work can be divided into detection methods based on machine learning algorithms and rule-matching methods based on behavioral norms. On this basis, challenges in IoT device identification and anomaly detection are summarized, and future development directions are proposed.
    Available online:  January 13, 2023 , DOI: 10.13328/j.cnki.jos.006542
    Abstract:
    With the wide application of the global positioning system (GPS), more and more electric bicycles are equipped with GPS sensors. The massive trajectory data recorded by these sensors are of great value in many fields, such as the analysis of users’ travel patterns and decision support for urban planners. However, the low-cost GPS sensors widely used on electric bicycles cannot provide high-precision positioning. Besides, map matching for electric bicycle track data is more complex and challenging because (1) electric bicycle trajectories contain many stay points; (2) the sampling frequency is higher and the distance between adjacent track points is shorter; and (3) some roads are open only to electric bicycles, and the matching accuracy is sensitive to the quality of the road network. To solve these issues, an adaptive and accurate road network map matching algorithm named KFTS-AMM is proposed, which consists of two main components: a segmented Kalman filtering based trajectory simplification (KFTS) algorithm and a segmented hidden Markov model based adaptive map matching (AMM) algorithm. Since the Kalman filter can be used for optimal state estimation, the trajectory simplification algorithm KFTS makes the trajectory curve smoother and reduces the impact of abnormal points on map matching accuracy by automatically correcting trajectory points during simplification. Besides, the matching algorithm AMM is used to reduce the impact of invalid trajectory segments on map matching accuracy. Moreover, a stay point identification and merging step is added to the processing of track data, which further improves the accuracy. Extensive experiments conducted on a real-world track dataset of electric bicycles in Zhengzhou city show that the proposed approach KFTS-AMM outperforms baselines in terms of accuracy and significantly speeds up the matching process by using the simplified track data.
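    As an illustration of why Kalman filtering helps before map matching, the sketch below smooths a noisy one-dimensional coordinate sequence with a constant-position Kalman filter, so that an outlier is pulled back toward the filtered estimate. This is a generic textbook filter, not the segmented KFTS algorithm itself, and the noise parameters and sample values are made up.

```python
def kalman_smooth_1d(zs, process_var=1e-3, meas_var=4.0):
    """Constant-position Kalman filter over one GPS coordinate (illustrative)."""
    x, p = zs[0], 1.0          # initial state estimate and its variance
    out = [x]
    for z in zs[1:]:
        p += process_var        # predict: state unchanged, uncertainty grows
        k = p / (p + meas_var)  # Kalman gain
        x += k * (z - x)        # update with the new measurement
        p *= (1 - k)
        out.append(x)
    return out

noisy_lat = [34.750, 34.751, 34.790, 34.752, 34.753]   # one spurious jump
print([round(v, 4) for v in kalman_smooth_1d(noisy_lat)])
```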
    Available online:  January 13, 2023 , DOI: 10.13328/j.cnki.jos.006537
    Abstract:
    The development of artificial intelligence brings more and more challenges to data hiding technology, and it is urgent to improve the security of existing steganography methods. In this study, a generative multiple adversarial steganography algorithm based on the U-Net structure is proposed to improve image data hiding capability. A generative multiple adversarial steganography network (GMASN), including a generative adversarial network, a steganalyzer optimization network, and a steganalysis network, is first constructed, and the anti-steganalysis ability of the stego image is improved through the competition among the networks in the GMASN. At the same time, to address the problem that existing generative adversarial networks can only generate low-quality images randomly, a generative network based on the U-Net structure is designed to transfer the details of a reference image to the generated carrier image, so that carrier images with high visual quality can be generated. Moreover, the image discrimination loss, mean square error (MSE) loss, and steganalysis loss are dynamically combined in the proposed scheme to enable the GMASN to converge rapidly and stably. Experimental results show that the PSNR of the generated carrier image can reach 48.60 dB, and the discrimination rate between the generated carrier image and the steganographic image is 50.02%. The proposed algorithm can generate high-quality carrier images suitable for data hiding, enable the steganographic network to converge rapidly and stably, and effectively improve the security of image steganography.
    Available online:  January 04, 2023 , DOI: 10.13328/j.cnki.jos.006646
    Abstract:
    SQL is a programming language that is widely used to operate relational databases. Many users (such as data analysts and junior programmers) encounter various difficulties when writing SQL query programs due to a lack of programming experience and knowledge of SQL syntax. Currently, research on the automatic synthesis of SQL query programs from <input-output> (I/O) example tables has attracted more and more attention. The inductive SQL synthesis with positive and negative tuples (ISST) method proposed in this study can automatically synthesize SQL query programs that meet users’ expectations from I/O example tables that are edited by users and contain a small number of tuples. The ISST method contains five main stages: constructing SQL query program sketches, expanding the worksheet data, dividing the sets of positive and negative examples, inductively synthesizing selection predicates, and ranking after verification. The candidate SQL query programs are verified on the PostgreSQL database and then scored and ranked according to the principle of Occam’s razor. The ISST method is implemented in Java and evaluated on a test set containing 28 samples. The results reveal that ISST correctly synthesizes 24 of the samples, taking an average of 2 seconds.
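    The step of inductively synthesizing selection predicates from positive and negative tuples can be pictured as a tiny enumerative search: candidate comparisons over the worksheet columns are kept only if they accept every positive tuple and reject every negative one. The sketch below is a simplification of that general idea, not the ISST implementation; the column names, operators, and example rows are hypothetical.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

def synthesize_predicates(columns, positives, negatives):
    """Enumerate single-column predicates consistent with the I/O examples."""
    constants = {c: {row[c] for row in positives + negatives} for c in columns}
    found = []
    for col in columns:
        for op_name, op in OPS.items():
            for const in constants[col]:
                pred = lambda row, c=col, o=op, k=const: o(row[c], k)
                if all(pred(r) for r in positives) and not any(pred(r) for r in negatives):
                    found.append(f"{col} {op_name} {const}")
    return found

pos = [{"age": 35, "dept": "R&D"}, {"age": 41, "dept": "R&D"}]
neg = [{"age": 29, "dept": "Sales"}, {"age": 52, "dept": "HR"}]
print(synthesize_predicates(["age", "dept"], pos, neg))   # ['dept = R&D']
```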
    Available online:  January 04, 2023 , DOI: 10.13328/j.cnki.jos.006647
    [Abstract] (450) [HTML] (0) [PDF 6.56 M] (1259)
    Abstract:
    In recent years, graph neural networks (GNNs) have attracted wide attention due to their powerful and flexible representation ability. Considering the increasing scale of graph data and the limited GPU memory capacity, it becomes more challenging to train GNNs with traditional general-purpose deep learning systems, and such training cannot fully exploit the performance of GPU devices. Achieving efficient use of GPU hardware for GNN training is thus one of the important research issues in this field. Traditional approaches employ sparse matrix multiplication for the calculation process of GNNs. When GPU memory capacity is limited, the computation tasks are distributed to each device by distributed matrix multiplication. Their shortcomings are mainly as follows: (1) Sparse matrix multiplication ignores the sparse distribution of the graph data, which results in low computation efficiency. (2) These methods ignore the computation and memory access characteristics of GPUs and fail to fully utilize the hardware resources. To improve training efficiency, some studies propose to reduce the cost of each iteration and the storage requirements through graph sampling techniques, which also support flexible distributed scaling. Due to the introduced stochasticity and variance, however, these methods often affect the model accuracy. Therefore, this study proposes a high-performance GNN training framework for multiple GPUs. Different GNN partitioning strategies for multiple GPUs are explored, and the influence of different graph ordering patterns on GPU performance during GNN computation is investigated, while the accuracy of the model is preserved. Moreover, block-sparsity-aware optimization methods are put forward for GPU memory access. The prototype system is implemented in C++ with cuDNN. The experiments on four large-scale GNN datasets demonstrate that (1) the graph reordering method improves the GPU cache hit rate by around 40% and doubles the computation speed, and (2) compared with the existing system DGL, the proposed system achieves a total speedup of 5.8x.
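    One simple instance of a graph ordering pattern that affects locality is degree-based renumbering: placing high-degree vertices in contiguous ID ranges tends to concentrate non-zeros and improve cache behavior during neighbor aggregation. The sketch below applies such a renumbering to a toy adjacency list; it only illustrates the general idea and is not the reordering scheme or the block-sparsity optimization used in the proposed system.

```python
def reorder_by_degree(adj):
    """Relabel vertices so that higher-degree vertices get smaller IDs."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    new_id = {v: i for i, v in enumerate(order)}
    return {new_id[v]: sorted(new_id[u] for u in nbrs) for v, nbrs in adj.items()}

graph = {0: [3], 1: [2, 3, 4], 2: [1], 3: [0, 1, 4], 4: [1, 3]}
print(reorder_by_degree(graph))
# high-degree vertices 1 and 3 become 0 and 1, clustering their edges together
```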
    Available online:  January 04, 2023 , DOI: 10.13328/j.cnki.jos.006665
    Abstract:
    As the scale of business data increases, distributed online analytical processing (OLAP) is widely performed in business intelligence (BI), enterprise resource planning (ERP), user behavior analysis, and other application scenarios to support large-scale data analysis. Moreover, distributed OLAP overcomes the limitations of single-machine storage and stores data in memory to improve the performance of OLAP. However, after the in-memory distributed OLAP eliminates disk input/output (I/O), the join operation becomes one of its new performance bottlenecks. As a common practice in OLAP, the join operation involves a huge amount of data accessing and computation operations. By analyzing existing methods for the join operation, this study presents a graph structure indexing method that can accelerate the join operation and a new join method called LinkJoin based on it. Graph structure indexing stores the in-memory position of data in the form of a graph structure according to the join relationship specified by users. The join method based on graph structure indexing reduces the amount of data accessing and computation operations with a low complexity equivalent to that of the hash join. This study expands the state-of-the-art open-source in-memory OLAP system called MonetDB from a single-machine system to a distributed one and designs and implements a join method based on graph structure indexing on it. A series of designs and optimizations are also conducted in the aspects of graph indexing structure, columnar storage, and distributed execution engine to improve the distributed OLAP performance of the system. The test results show that in the TPC-H benchmark tests, the join operation based on graph structure indexing improves the performance on queries with join operations by 1.64 times on average and 4.1 times at most. For the join operation part of these queries, it enhances the performance by 9.8–22.1 times.
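    The core idea of a graph structure index, as described above, is to precompute for each row the in-memory positions of its matching rows in the other table, so that the join follows pointers instead of probing a hash table at query time. The sketch below builds such an index for a toy fact/dimension pair; the table and column names are hypothetical, and the real system operates on columnar, distributed storage.

```python
# Build-once, follow-at-query-time join index (illustrative).
orders = [  # fact table rows
    {"order_id": 1, "cust_id": 10, "amount": 25.0},
    {"order_id": 2, "cust_id": 11, "amount": 40.0},
    {"order_id": 3, "cust_id": 10, "amount": 15.0},
]
customers = [  # dimension table rows
    {"cust_id": 10, "name": "alice"},
    {"cust_id": 11, "name": "bob"},
]

# "Graph" index: for each orders row position, the position of its customer row.
cust_pos = {row["cust_id"]: i for i, row in enumerate(customers)}
link = [cust_pos[row["cust_id"]] for row in orders]   # built once, ahead of queries

def join():
    # At query time the join is a pointer chase; no hashing or probing is needed.
    return [(o["order_id"], customers[link[i]]["name"], o["amount"])
            for i, o in enumerate(orders)]

print(join())   # [(1, 'alice', 25.0), (2, 'bob', 40.0), (3, 'alice', 15.0)]
```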
    Available online:  December 30, 2022 , DOI: 10.13328/j.cnki.jos.006724
    Abstract:
    The task of knowledge tracing is to trace the changes in students’ knowledge state and predict their future performance in learning according to their historical learning records. In recent years, knowledge-tracing models based on attention mechanisms have proved markedly superior to traditional knowledge-tracing models in both flexibility and prediction performance. However, since they only take into account exercises involving a single concept, most of the existing deep models cannot directly deal with exercises involving multiple concepts, which are nevertheless abundant in intelligent education systems. In addition, how to improve interpretability is one of the key challenges facing deep knowledge tracing models. To solve the above problems, this study proposes a deep knowledge tracing model based on the embedding of fused multiple concepts, which considers the relationships among the concepts in exercises involving multiple concepts. Furthermore, the study puts forward two novel embedding methods for multiple concepts and combines educational psychology models with forgetting factors to improve prediction performance and interpretability. Experiments reveal the superiority of the proposed model over existing models in prediction performance on large-scale real datasets and verify the effectiveness of each module of the proposed model.
    Available online:  December 30, 2022 , DOI: 10.13328/j.cnki.jos.006777
    Abstract:
    Smoothed particle hydrodynamics (SPH) is one of the key technologies for fluid simulation. With the growing demand for applications of SPH fluid simulation technology in production practice, many relevant studies have emerged in recent years, which improve the visual realism, efficiency, and stability of simulations of physical properties including fluid incompressibility, viscosity, and surface tension. Additionally, some researchers focus on high-quality simulation in complex scenarios and on unified simulation frameworks covering multiple scenarios and materials, thereby enhancing the applicability of SPH fluid simulation technology. This study discusses and summarizes related research on SPH fluid simulation technology from the above aspects and offers an outlook on the technology.
    Available online:  December 30, 2022 , DOI: 10.13328/j.cnki.jos.006800
    Abstract:
    With the rapid growth and further application of deep learning (DL), the scale of DL training continues to expand, and memory insufficiency has become one of the major bottlenecks threatening DL availability. Memory swapping mechanism is the key mechanism to alleviate the memory problem of DL training. This mechanism leverages the “time-varying” memory requirement of DL training and moves the data between specific computing accelerating device memory and external storage according to demands. The operation of DL training tasks can be ensured by replacing an accumulated memory requirement with an instant one. This study surveys the memory swapping mechanism for DL training from the aspect of time-varying memory requirements. Key studies of an operator feature-based memory swapping-out mechanism, a data dependency based swapping-in mechanism, and efficiency-driven joint swapping-in and swapping-out decisions are summarized. Finally, the development prospect of this technology is pointed out.
    Available online:  December 30, 2022 , DOI: 10.13328/j.cnki.jos.006804
    Abstract:
    Stochastic configuration network (SCN), as an emerging incremental neural network model, is different from other randomized neural network methods. It can configure the parameters of hidden layer nodes through supervision mechanisms, thereby ensuring the fast convergence performance of SCN. Due to the advantages of high learning efficiency, low human intervention, and strong generalization ability, SCN has attracted a large number of national and international scholars and developed rapidly since it was proposed in 2017. In this study, SCN research is summarized from the aspects of basic theories, typical algorithm variants, application fields, and future research directions of SCN. Firstly, the algorithm principles, universal approximation capacity, and advantages of SCN are analyzed theoretically. Secondly, typical variants of SCN are studied, such as DeepSCN, 2DSCN, Robust SCN, Ensemble SCN, Distributed SCN, Parallel SCN, and Regularized SCN. Then, the applications of SCN in different fields, including hardware implementation, computer vision, medical data analysis, fault detection and diagnosis, and system modeling and prediction are introduced. Finally, the development potential of SCN in convolutional neural network architectures, semi-supervised learning, unsupervised learning, multi-view learning, fuzzy neural network, and recurrent neural network is pointed out.
    Available online:  December 28, 2022 , DOI: 10.13328/j.cnki.jos.006662
    Abstract:
    Recent research studies on social recommendation have focused on the joint modeling of the explicit and implicit relations in social networks and overlooked the special phenomenon that high-order implicit relations are not equally important to each user. The importance of high-order implicit relations to users with plenty of neighbors differs greatly from that to users with few neighbors. In addition, due to the randomness of social relation construction, explicit relations are not always available. This study proposes a novel adaptive high-order implicit relations modeling (AHIRM) method, and the model consists of three components. Specifically, unreliable relations are filtered, and potentially reliable relations are identified, thereby mitigating the adverse effects of unreliable relations and alleviating the data sparsity issue. Then, an adaptive random walk algorithm is designed to capture neighbors at different orders for users according to normalized node centrality, construct high-order implicit relations among the users, and ultimately reconstruct the social network. Finally, a graph convolutional network (GCN) is employed to aggregate information about neighbor nodes. User embeddings are thereby updated to model the high-order implicit relations and further alleviate the data sparsity issue. The influences of social structure and personal preference are both considered during modeling, and the process of social influence propagation is simulated and retained. Comparative verification of the proposed model against existing algorithms is conducted on the LastFM, Douban, and Gowalla datasets, and the results verify the effectiveness and rationality of the proposed AHIRM model.
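    The adaptive part of the walk can be pictured as letting a user's walk depth depend on normalized centrality, so that sparsely connected users reach further to collect high-order implicit neighbors. The sketch below is a generic random walk whose length is scaled by degree centrality; it is only meant to illustrate that idea and is not the AHIRM algorithm, and the scaling rule and toy graph are assumptions.

```python
import random

def adaptive_walks(adj, start, max_len=6, n_walks=10, seed=0):
    """Random walks whose length shrinks as the start node's centrality grows."""
    random.seed(seed)
    max_deg = max(len(nbrs) for nbrs in adj.values())
    centrality = len(adj[start]) / max_deg          # normalized degree centrality
    walk_len = max(1, round(max_len * (1.0 - centrality)) + 1)
    walks = []
    for _ in range(n_walks):
        node, walk = start, [start]
        for _ in range(walk_len):
            node = random.choice(adj[node])
            walk.append(node)
        walks.append(walk)
    return walks

social = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
print(adaptive_walks(social, start=3)[:2])   # the low-degree user 3 walks further
```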
    Available online:  December 28, 2022 , DOI: 10.13328/j.cnki.jos.006644
    Abstract:
    Constraint Programming (CP) is one of the classical paradigms for representing and solving combinatorial problems. Extensional constraints, also called table constraints, are the most common type of constraints in CP, and most CP problems can be expressed by table constraints. In the problem-solving process, consistency algorithms are used to reduce the search space, and the simple table reduction (STR) algorithms are the most efficient consistency algorithms with table constraints, including Compact-Table (CT) and STRbit algorithms, both of which maintain the generalized arc consistency (GAC) during the search. In addition, the full pairwise consistency (fPWC) is stronger than GAC in the pruning capability, and the most efficient fPWC maintenance algorithm is the PW-CT algorithm. Over the years, many consistency algorithms with table constraints are proposed to improve the pruning capability and efficiency. Factor-decomposition encoding (FDE) recodes trivial problems, which enlarges the scale of the problem model to some extent, and as a result, maintaining a relatively weak GAC on a new model is equivalent to maintaining a strong fPWC on the original model. Currently, the appropriate STR algorithms for FDE are STRFDE and STR2 rather than CT as the CT algorithm may produce memory overflow. In the process of maintaining the consistency algorithm, it is necessary to call each constraint iteratively to perform its consistency algorithm to filter the search space. This process is called constraint propagation. The dynamic submission scheme is a parallel constraint propagation scheme, which can schedule constraint execution propagation algorithms in parallel, and it is particularly effective in large-scale problems. Therefore, this study proposes PSTRFDE for FDE by improving STRFDE and dynamic submission propagation algorithms. PSTRFDE can be embedded into the dynamic submission scheme to further improve the efficiency of constraint problem solving. Extensive experiments indicate that PSTRFDE can reduce the used memory compared with CT and STRbit, and compared with the results of STRFDE and STR2, the efficiency of PSTRFDE can be further improved. The research demonstrates that PSTRFDE is the most efficient filtering algorithm for FDE at present.
    Available online:  December 22, 2022 , DOI: 10.13328/j.cnki.jos.006643
    Abstract:
    This study proposes a new feature constrained distillation learning method for visual anomaly detection, which makes full use of the features of the teacher model to instruct the student model to efficiently identify abnormal images. Specifically, the Vision Transformer (ViT) model is introduced as the backbone network of anomaly detection tasks, and a central feature strategy is put forward to constrain the output features of the student network. Considering the strong feature expressiveness of the teacher network, the central feature strategy is developed to dynamically generate the feature representation centers of normal samples for the student network from the teacher network. In this way, the ability of the student network to describe the feature output of normal data is improved, and the feature difference between the student and teacher networks in abnormal data is widened. In addition, to minimize the difference between the student and teacher networks in the feature representation of normal images, the proposed method leverages the Gram loss function to constrain the relationship between the coding layers of the student network. Experiments are conducted on three general anomaly detection data sets and one real-world industrial anomaly detection data set, and the experimental results demonstrate that the proposed method significantly improves the performance of visual anomaly detection compared with the state-of-the-art methods.
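    The Gram loss mentioned above is, in its general form, a mean squared difference between the Gram matrices of two feature maps, which captures pairwise channel correlations rather than raw activations. The sketch below computes it with NumPy for two hypothetical feature tensors; which layers it is applied to in the proposed method is not reproduced here.

```python
import numpy as np

def gram_matrix(feat):
    """feat: (C, H, W) feature map -> (C, C) channel-correlation matrix."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def gram_loss(feat_a, feat_b):
    """Mean squared difference between the two Gram matrices."""
    return float(np.mean((gram_matrix(feat_a) - gram_matrix(feat_b)) ** 2))

rng = np.random.default_rng(0)
fa = rng.standard_normal((64, 14, 14))
fb = fa + 0.1 * rng.standard_normal((64, 14, 14))
print(gram_loss(fa, fb))   # small value: the two maps share channel statistics
```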
    Available online:  December 22, 2022 , DOI: 10.13328/j.cnki.jos.006661
    Abstract:
    Although traditional watermarking attack methods can obstruct the correct extraction of watermark information, they greatly reduce the visual quality of watermarked images. Therefore, a novel imperceptible watermarking attack method based on residual learning is proposed. Specifically, a watermarking attack model based on a convolutional neural network is constructed for end-to-end nonlinear learning between watermarked and unwatermarked images. A mapping from the watermarked image to the unwatermarked one is thereby accomplished to achieve the watermarking attack. Then, an appropriate number of feature extraction blocks is selected according to the embedding region of the watermark information to extract a feature map containing the watermark information. As the difference between the two images is insignificant, the learning ability of the watermarking attack model is limited during training, making it difficult for the model to converge. A residual learning mechanism is thus introduced to improve the convergence speed and learning ability of the watermarking attack model. The imperceptibility of the attacked image can be improved by reducing the difference between the residual image (the watermarked image minus the extracted feature map) and the unwatermarked one. In addition, a dataset for training the watermarking attack model is constructed with the super-resolution dataset DIV2K2017 and, as the attack target, a robust color image watermarking algorithm based on quaternion exponent moments. The experimental results show that the proposed watermarking attack model can attack a robust watermarking algorithm, achieving a high bit error rate (BER) without compromising the visual quality of watermarked images.
    Available online:  December 22, 2022 , DOI: 10.13328/j.cnki.jos.006659
    Abstract:
    The order of label learning is crucial to a classifier chains method. Therefore, this study proposes a classifier chains method based on the association rules and topological sequence (TSECC). Specifically, a measurement strategy for label dependencies based on strong association rules is designed by leveraging frequent patterns. Then, a directed acyclic graph is constructed according to the dependency relationships among the labels to topologically sort all the vertices in the graph. Finally, the topological sequence obtained is used as the order of label learning to iteratively update each label’s classifier successively. In particular, to reduce the impact of “lonely” labels with no or low label dependencies on the prediction performance on the other labels, TSECC excludes “lonely” labels out of the topological sequence and uses a binary relevance model to train them separately. Experimental results on a variety of public multi-label datasets show that TSECC can effectively improve classification performance.
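    The label-ordering step can be illustrated directly: given a directed acyclic graph of label dependencies mined from association rules, a topological sort (Kahn's algorithm below) yields the order in which the chained classifiers are trained. The label names and dependency edges in the example are made up; the sketch does not include the "lonely" label handling or the classifier training itself.

```python
from collections import deque

def topological_order(labels, edges):
    """Kahn's algorithm; an edge (a, b) means label a should be learned before b."""
    indeg = {l: 0 for l in labels}
    succ = {l: [] for l in labels}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(l for l in labels if indeg[l] == 0)
    order = []
    while queue:
        l = queue.popleft()
        order.append(l)
        for m in succ[l]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(labels):
        raise ValueError("dependency graph contains a cycle")
    return order

labels = ["sport", "outdoor", "team", "ball_game"]
deps = [("sport", "ball_game"), ("sport", "team"), ("outdoor", "ball_game")]
print(topological_order(labels, deps))   # training order of the classifier chain
```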
    Available online:  December 16, 2022 , DOI: 10.13328/j.cnki.jos.006655
    Abstract:
    Improving the quality and diversity of generated samples has always been one of the main challenges in the field of generative adversarial networks (GANs). For this reason, a bidirectional constraint GAN (BCGAN) is proposed. Compared with traditional GAN variants, this network adds an additional generator module to the architecture. The two generators approach the data distribution of real samples from two different directions. Then, according to the network architecture of BCGAN, this study designs a new loss function and analyzes and proves it theoretically. During BCGAN training, the diversity of the generated samples is enriched by increasing the distance between the data distributions of the two generators' samples, and the discrepancy, as measured by the discriminator, between these two generated distributions is reduced to stabilize the training process and thereby improve the quality of the generated samples. Finally, experiments are carried out on a synthetic dataset and three open challenge datasets. These experiments show that, compared with other generative methods, the proposed method fits the real data distribution better and effectively improves the quality and diversity of generated samples. In addition, the training process of this method is smoother and more stable.
    Available online:  December 16, 2022 , DOI: 10.13328/j.cnki.jos.006511
    Abstract:
    Stable learning aims to leverage the knowledge obtained from a single training dataset to learn a robust prediction model for accurately predicting the labels of test data drawn from a different but related distribution. To achieve promising performance on test data with agnostic distributions, existing stable learning algorithms focus on eliminating the spurious correlations between the features and the class variable. However, these algorithms can only weaken some of the spurious correlations between the features and the class variable and cannot eliminate them completely. Furthermore, these algorithms may encounter the overfitting problem in learning the prediction model. To tackle these issues, this study proposes a stable learning algorithm based on sample reweighting and dual classifiers, which jointly optimizes the sample weights and the parameters of the dual classifiers to learn a robust prediction model. Specifically, to estimate the effects of all features on classification, the proposed algorithm balances the distribution of confounders by learning global sample weights to remove the spurious correlations between the features and the class variable. In order to eliminate the spurious correlations between some irrelevant features and the class variable and to weaken the influence of irrelevant features on the sample weighting process, the proposed algorithm selects and removes irrelevant features before sample reweighting. To further improve the generalization ability of the model, the algorithm constructs two classifiers and learns a prediction model with an optimal hyperplane by minimizing the parameter difference between the two classifiers during training. Experiments on synthetic and real-world datasets validate the effectiveness of the proposed algorithm.
    Available online:  December 08, 2022 , DOI: 10.13328/j.cnki.jos.006514
    Abstract:
    In smart healthcare, cloud computing and the Internet of Things are combined to solve the problem of real-time access to large-scale data. However, uploading the data to a remote cloud incurs additional communication cost and transmission delay. Fog computing has been introduced into smart healthcare to solve this problem. Fog servers assist the cloud server by handling data storage and access locally, which contributes to low latency and high mobility. Since medical data is highly sensitive, how to design a fog computing-based smart healthcare authentication protocol has become a research hotspot. If the data is tampered with illegally, the consequences can be catastrophic. Hence, the authentication protocol should be secure against various attacks and realize secure data transmission among users, fog nodes, and cloud servers. This study analyzes two schemes for smart healthcare and points out that Hajian et al.’s scheme cannot resist stolen verifier attacks, denial of service attacks, impersonation attacks, node capture attacks, or session key disclosure attacks, and that Wu et al.’s scheme cannot resist offline password guessing attacks or impersonation attacks. Furthermore, a fog computing-based three-party authentication and key agreement protocol is proposed for smart healthcare. Its security is proved using the random oracle model, BAN logic, and heuristic analysis, showing that it is secure against known attacks. The performance comparison with related schemes shows that the proposed scheme is more suitable for fog computing-based smart healthcare.
    Available online:  December 08, 2022 , DOI: 10.13328/j.cnki.jos.006656
    Abstract:
    The observation of tumor location and growth is an important link in the formulation of tumor treatment plans. Intervention methods based on medical images can be employed to visually observe the status of the tumor in the patient in a non-invasive way, predict the growth of the tumor, and ultimately help physicians develop a treatment plan specific to the patient. This study proposes a new deep network model, namely the conditional adversarial spatiotemporal encoder model, to predict tumor growth. This model mainly consists of three parts: the tumor prediction generator, the similarity score discriminator, and conditions composed of the patient’s personal situations. The tumor prediction generator predicts the tumor in the next period according to the tumor images of two periods. The similarity score discriminator is used to calculate the similarity between the predicted tumor and the real one. In addition, this study adds the patient’s personal situations as conditions to the tumor growth prediction process. The proposed model is experimentally verified on two collected medical datasets. The experimental results achieve a recall rate of 76.10%, an accuracy rate of 91.70%, and a Dice coefficient of 82.4%, indicating that the proposed model can accurately predict the tumor images of the next period.
    Available online:  December 08, 2022 , DOI: 10.13328/j.cnki.jos.006654
    Abstract:
    Community is an important attribute of information networks. Community search, as an important content of information network analysis, aims to find a set of nodes that meet the conditions specified by the user. As heterogeneous information networks contain more comprehensive and richer structural and semantic information, community search in such networks has received extensive attention in recent years. However, the existing community search methods for heterogeneous information networks cannot be directly applied when the search conditions are complex. For this reason, this study defines community search under complex conditions and proposes search algorithms considering asymmetric meta-paths, constrained meta-paths, and prohibited node constraints. These three algorithms respectively use the meta-path completion strategy, the strategy of adjusting batch search with labeling, and the way of dividing complex search conditions to search communities. Moreover, two optimization algorithms respectively based on the pruning strategy and the approximate strategy are designed to improve the efficiency of the search algorithm with prohibited node constraints. A large number of experiments are performed on real datasets, and the experimental results verify the effectiveness and efficiency of the proposed algorithms.
    Available online:  November 30, 2022 , DOI: 10.13328/j.cnki.jos.006533
    Abstract:
    Recommendation systems, which can effectively filter information based on user preferences, have been applied widely. The problems of cold start and data sparsity become more and more serious with the explosive growth of the number of users. Multi-source data fusion, which can effectively improve recommendation accuracy under data sparsity and cold start conditions, is favored by researchers. Its main idea is to fuse auxiliary information about users from other aspects to fill in missing values and thus optimize the accuracy of the target recommendation service. Nevertheless, a more serious risk of privacy disclosure is introduced due to the relationships between the data sources. To solve the above problems, this study proposes a deep cross-domain recommendation model with privacy protection. In detail, a deep learning collaborative recommendation method is designed featuring multi-source data fusion and differential privacy protection. On the one hand, this method fuses auxiliary domain information to improve the accuracy of recommendations and corrects the deviation of abnormal points to improve the performance of the recommender system; on the other hand, it adds noise in the collaborative training process based on a differential privacy model to address the data security problem in data fusion. In order to evaluate the long tail effect in the recommendation system, this study proposes a new metric, discovery degree, which measures the ability of the recommendation algorithm to discover users’ latent requirements. Based on the performance comparison and analysis with existing algorithms, the results show that the proposed method achieves better recommendation accuracy and diversity than the existing methods on the premise of ensuring privacy security and can effectively discover the hidden needs of users.
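    The differential privacy step described above generally boils down to adding noise calibrated to a sensitivity bound. The sketch below shows the standard Laplace mechanism applied to a clipped gradient-like vector; the clipping bound, the epsilon value, and where the noise is injected in the actual collaborative training are assumptions, not the paper's exact construction.

```python
import numpy as np

def laplace_perturb(grad, clip_norm=1.0, epsilon=0.5, rng=None):
    """Clip to bound L1 sensitivity, then add Laplace noise (illustrative DP step)."""
    rng = rng or np.random.default_rng(0)
    scale = min(1.0, clip_norm / (np.abs(grad).sum() + 1e-12))
    clipped = grad * scale                                   # bounded contribution
    noise = rng.laplace(loc=0.0, scale=clip_norm / epsilon, size=grad.shape)
    return clipped + noise

grad = np.array([0.8, -0.3, 0.5])
print(laplace_perturb(grad))   # noisy update shared during collaborative training
```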
    Available online:  November 30, 2022 , DOI: 10.13328/j.cnki.jos.006534
    Abstract:
    In order to improve the CPU utilization of spacecraft computers, the new generation of spacecraft operating systems uses a hybrid scheduling algorithm that includes both fixed-point starting tasks and sporadic tasks. Among them, fixed-point starting tasks are often safety-critical; they need to be started at fixed time points and cannot be blocked during execution. When fixed-point starting tasks and sporadic tasks coexist, the existing real-time lock protocols cannot guarantee that the blocking time of fixed-point starting tasks is zero. Therefore, on the basis of the classic priority ceiling protocol, this study proposes a real-time lock protocol based on the idea of blocking avoidance, which ensures that sporadic tasks' access to shared resources will not affect the execution of fixed-point starting tasks by judging in advance and setting a virtual starting point. At the same time, by temporarily increasing the access priority of some resources, the cost caused by task preemption can be reduced. This study presents the worst-case blocking time of the above lock protocol and analyzes its performance with schedulability ratio experiments. Experiments show that, in the case of short critical sections, this protocol can keep the schedulability loss caused by accessing shared resources under 27%.
    Available online:  November 30, 2022 , DOI: 10.13328/j.cnki.jos.006523
    Abstract:
    Microservices are becoming the mainstream architecture of cloud-based software systems because of their agile development and rapid deployment. However, the structure of a microservice system is complex; it often has hundreds of service instances, and the call relationships between services are extremely complex. When an anomaly occurs in a microservice system, it is difficult to locate its root causes. End-to-end request tracing has thus become a standard facility of microservice systems to solve this problem. However, current methods of distributed request tracing are intrusive to applications and rely heavily on the developers’ expertise in request tracing. Besides, they cannot start or stop the tracing functionality at runtime. These defects not only increase the burden on developers but also restrict the adoption of the distributed request tracing technique in practice. This study designs and implements a transparent request tracing system named Trace++, which can generate tracing code automatically and inject the generated code into the running application by using dynamic code instrumentation technology. Trace++ is minimally intrusive to programs, transparent to developers, and can start or stop the tracing functionality flexibly. In addition, the adaptive sampling method of Trace++ effectively reduces the cost of request tracing. The results of experiments conducted on TrainTicket, a microservice system, show that Trace++ can accurately discover the dependencies between services and that its performance cost is close to that of source code instrumentation when request tracing is enabled. When the request tracing functionality is stopped, Trace++ incurs no performance cost. Moreover, the adaptive sampling method preserves representative trace data while reducing the amount of trace data by 89.4%.
    Available online:  November 30, 2022 , DOI: 10.13328/j.cnki.jos.006527
    Abstract:
    BLAS (basic linear algebra subprograms) is an important module of high-performance extended math libraries and is widely used in scientific and engineering computing. Level 1 BLAS provides vector-vector operations, and Level 2 BLAS provides matrix-vector operations. This study designs and implements highly optimized Level 1 and Level 2 BLAS routines for SW26010-Pro, a domestic many-core processor. A reduction strategy among CPEs is designed based on the RMA communication mechanism, which improves the reduction efficiency of many Level 1 and Level 2 BLAS routines. For TRSV, TPSV, and other routines with data dependencies, a series of efficient parallelization algorithms is proposed. These algorithms maintain data dependencies through point-to-point synchronization and adopt an efficient task mapping mechanism suitable for triangular matrices, which effectively reduces the number of point-to-point synchronizations and improves execution efficiency. In this study, adaptive optimization, vector compression, data multiplexing, and other techniques further improve the memory access bandwidth utilization of the Level 1 and Level 2 BLAS routines. The experimental results show that the memory access bandwidth utilization of the Level 1 BLAS routines can reach as high as 95%, with an average of more than 90%, and that of the Level 2 BLAS routines can reach 98%, with an average of more than 80%. Compared with the widely used open-source linear algebra library GotoBLAS, the proposed implementations of the Level 1 and Level 2 BLAS routines achieve average speedups of 18.78 times and 25.96 times, respectively. With the optimized Level 1 and Level 2 BLAS routines, LQ decomposition, QR decomposition, and eigenvalue problems achieve an average speedup of 10.99 times.
    Available online:  November 30, 2022 , DOI: 10.13328/j.cnki.jos.006529
    Abstract:
    Secure multi-party computation is one of the hot issues in the international cryptographic community. The secure computation of intersection-sum is a new problem in secure multi-party computation. The problem has important theoretical significance and practical value in the fields of industry, commerce, and healthcare. The existing solutions assume that the private sets are subsets of a universal set; moreover, they leak the intersection cardinality and have a certain false probability. This study, based on the Paillier cryptosystem, designs three protocols for the intersection-sum problem, all of which are secure in the semi-honest model. Protocol 1 privately computes the number of common identifiers (i.e., the user identifier intersection cardinality) and the sum of the integer values associated with these users; Protocol 2 and Protocol 3 privately compute the sum of the integer values associated with the intersection elements without leaking the intersection cardinality. The whole computation process does not reveal any information about the private inputs beyond the intersection-sum. The protocols do not require the private sets to be subsets of a universal set, so they can be applied in more scenarios. It is proved, by using the simulation paradigm, that these protocols are secure in the semi-honest model. The efficiency of the protocols is also evaluated experimentally.
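    The additive homomorphism of the Paillier cryptosystem is what makes private intersection-sum protocols possible: ciphertexts of the associated values can be multiplied so that the result decrypts to their sum, without revealing the individual values. The toy implementation below uses insecurely small primes purely to demonstrate that property; it is a generic Paillier sketch, not any of the three protocols themselves, and the example values are made up.

```python
from math import gcd
import random

# Toy Paillier with tiny primes (insecure; only to show E(a)*E(b) decrypts to a+b).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)      # lcm(p-1, q-1)
g = n + 1

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)               # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

a, b = 1200, 345                          # e.g. two parties' associated values
c_sum = (encrypt(a) * encrypt(b)) % n2    # homomorphic addition on ciphertexts
print(decrypt(c_sum))                     # 1545, recoverable only by the key holder
```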
    Available online:  November 30, 2022 , DOI: 10.13328/j.cnki.jos.006517
    Abstract:
    One of the main challenges of blockchain technology is to ensure the privacy protection of transaction identities under the conditions of an open ledger and multi-party consensus. At present, identity privacy protection schemes based on anonymous authentication and transaction mixing in public blockchains are difficult to popularize in industry due to the lack of supervision. Based on the identity privacy protection scheme in Monero, this study introduces the role of a regulator and designs a supervised privacy protection scheme for the transaction receiver based on one-time address encryption and zero-knowledge proof. It also designs a linkable revocable ring signature scheme based on linkable ring signatures and revocable ring signatures so as to implement a supervised privacy protection scheme for the transaction sender based on autonomous mixing. The scheme can not only protect the identity privacy of the participants but also support offline transaction identity recovery by the regulator, thus achieving the regulatory goal of “controllable anonymity”. The analysis and test results show that the algorithms in this scheme run in milliseconds, which meets the performance requirements of blockchains in non-high-frequency transaction scenarios.
    Available online:  November 30, 2022 , DOI: 10.13328/j.cnki.jos.006519
    Abstract:
    General matrix multiply (GEMM) is one of the most heavily used functions in scientific and engineering computation and a base routine of many linear algebra libraries. Its performance usually has an essential influence on the whole application. Besides, because it is computation-intensive, its efficiency is often considered an important metric of a hardware platform. This study conducts a systematic optimization of dense GEMM on the domestic SW1621 processor. Based on an analysis of the baseline code and profiling of various overheads, as well as utilization of the architectural features and instruction set, optimizations for DGEMM are carefully designed and performed, including the blocking scheme, packing mechanism, kernel function implementation, and data prefetching. Besides, a code generator is developed, which can generate different assembly and C code according to the input parameters. Using the code generator together with auto-tuning scripts, the optimal values of the tunable parameters can be found. After applying the optimizations and tuning, the proposed single-threaded DGEMM achieves 85% of the peak performance of a single core and 80% of the peak performance of the entire 16-core chip. The optimization of DGEMM not only improves the performance of BLAS on SW1621 but also provides an important reference for optimizing dense data computation on SW series multi-core machines.
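    The blocking scheme at the heart of such an optimization can be shown in miniature: the matrices are processed in cache-sized tiles so that each loaded block is reused many times before eviction. The NumPy sketch below verifies a blocked multiply against the library result; the block size and loop order are illustrative choices, not the SW1621 kernel.

```python
import numpy as np

def blocked_gemm(A, B, bs=64):
    """C = A @ B computed tile by tile to improve cache reuse (illustrative)."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(0, m, bs):
        for p in range(0, k, bs):
            a_tile = A[i:i + bs, p:p + bs]          # packed once, reused below
            for j in range(0, n, bs):
                C[i:i + bs, j:j + bs] += a_tile @ B[p:p + bs, j:j + bs]
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((300, 200)), rng.standard_normal((200, 250))
print(np.allclose(blocked_gemm(A, B), A @ B))   # True
```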
    Available online:  October 26, 2022 , DOI: 10.13328/j.cnki.jos.006755
    Abstract:
    The critical reliability and availability of distributed systems are threatened by crash recovery bugs caused by incorrect crash recovery mechanisms and their implementations. The detection of crash recovery bugs, however, can be extremely challenging since these bugs only manifest themselves when a node crashes under special timing conditions. This study presents a novel approach Deminer to automatically detect crash recovery bugs in distributed systems. Observations in the large-scale distributed systems show that node crashes that interrupt the execution of related I/O write operations, which store a piece of data (i.e., common data) in different places, e.g., different storage paths or nodes, are more likely to trigger crash recovery bugs. Therefore, Deminer detects crash recovery bugs by automatically identifying and injecting such error-prone node crashes under the usage guidance of common data. Deminer first tracks the usage of critical data in a correct run. Then, it identifies I/O write operation pairs that use the common data and predicts error-prone injection points of a node crash on the basis of the execution trace. Finally, Deminer tests the predicted injection points of the node crash and checks failure symptoms to expose and confirm crash recovery bugs. A prototype of Deminer is implemented and evaluated on the latest versions of four widely used distributed systems, i.e., ZooKeeper, HBase, YARN, and HDFS. The experimental results show that Deminer is effective in finding crash recovery bugs. Deminer has detected six crash recovery bugs.
    Available online:  October 26, 2022 , DOI: 10.13328/j.cnki.jos.006753
    [Abstract] (655) [HTML] (0) [PDF 1.88 M] (1262)
    Abstract:
    As a complement and extension of the terrestrial network, the satellite network contributes to the acceleration of bridging the digital divide between different regions and can expand the coverage and service range of the terrestrial network. However, the satellite network features highly dynamic topology, long transmission delay, and limited on-board computing and storage capacity. Hence, various technical challenges, including routing scalability and transmission stability, are encountered in the organic integration of the satellite network and the terrestrial network and the construction of a global space-ground integrated network (SGIN). Considering the research challenges of SGIN, this paper describes the international and domestic research progress of SGIN in terms of network architecture, routing, transmission, multicast-based content delivery, etc., and then discusses the research trends.
    Available online:  October 26, 2022 , DOI: 10.13328/j.cnki.jos.006752
    Abstract:
    Most traditional information hiding methods embed secret data by modifying cover data, which inevitably leaves traces of modification in the cover data and thus makes it difficult to resist detection by existing steganalysis algorithms. Consequently, the technique of coverless information hiding has emerged, which hides secret data without modifying the cover data. To improve the hiding capacity and robustness of coverless information hiding, this study proposes a constructive data hiding method based on texture synthesis and recognition with image style transfer. Firstly, natural images and texture images of different categories are used to construct the content image database and the textural style image database, respectively. A mapping dictionary of binary codes is established according to the categories of natural images in the content image database. Secondly, a labeled texture image database is constructed and fed into a convolutional neural network as the training dataset, and a texture image recognition model is obtained by iterative training. In this way, the secret data can be extracted from stego images at the receiving end. During secret data hiding, natural images are selected from the content image database according to the secret data fragments to be embedded, and the selected images are synthesized to form a stego mosaic image. Then, a texture image is randomly selected from the textural style image database, and a stego texture image is generated from the selected texture image and the stego mosaic image by style transfer to achieve secret data hiding. During secret data extraction, the obtained texture image recognition model can accurately identify the original categories of the stego texture images corresponding to the natural images, and the secret data is finally extracted by reference to the mapping dictionary. The experimental results demonstrate that the proposed method can generate stego texture images with a satisfactory visual effect and a high hiding capacity, and it exhibits strong robustness to attacks such as JPEG compression and Gaussian noise.
    Available online:  October 26, 2022 , DOI: 10.13328/j.cnki.jos.006749
    Abstract:
    Code change is a key activity in software evolution, and its quality has a large impact on software quality. Modeling and representing code changes is the basis of many software engineering tasks, such as just-in-time defect prediction and recovery of software product traceability. Representation learning technologies for code changes have attracted extensive attention and have been applied to diverse applications in recent years. This type of technology aims to represent the semantic information in code changes as low-dimensional dense real-valued vectors, that is, to learn the distributed representation of code changes. Compared with conventional methods of manually designing code change features, such technologies offer the advantages of automatic learning, end-to-end training, and accurate representation. However, this field still faces some challenges, such as great difficulties in utilizing structural information and the absence of benchmark datasets. This study surveys and summarizes the recent progress of studies and applications of representation learning technologies for code changes, and it mainly consists of the following four parts. (1) The study presents the general framework of representation learning of code changes and its applications. (2) Subsequently, it reviews the currently available representation learning technologies for code changes and summarizes their respective advantages and disadvantages. (3) Then, the downstream applications of such technologies are summarized and classified. (4) Finally, this study discusses the challenges and potential opportunities facing representation learning technologies for code changes and suggests directions for the future development of this type of technology.
    Available online:  October 26, 2022 , DOI: 10.13328/j.cnki.jos.006648
    Abstract:
    The many-objective evolutionary algorithm based on decomposition is the main approach to solving many-objective optimization problems, but its performance largely depends on the matching degree between the adopted reference vectors and the real Pareto front (PF). Existing decomposition-based many-objective evolutionary algorithms can hardly deal with all kinds of many-objective optimization problems with different PFs at the same time. To solve this issue, this study proposes a many-objective evolutionary algorithm based on the curvature estimation (MaOEA-CE) of the PF. The core of the proposed algorithm includes two aspects: Firstly, on the basis of PF curvature estimation, different reference vectors are generated in each iteration to gradually match the real PFs of different kinds of problems. Secondly, with the estimated PF curvature, an appropriate aggregation function is used to select elite solutions and dynamically adjust the generated reference vectors in the environmental selection, which can improve the convergence while maintaining the diversity of the population. Moreover, MaOEA-CE is compared with seven advanced many-objective algorithms on three mainstream test problem sets, i.e., DTLZ, WFG, and MaF, to verify its effectiveness. The experimental results prove that MaOEA-CE has strong competitiveness.
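    The following is an illustrative two-objective sketch of adapting reference vectors to an estimated front curvature; the front model f1^p + f2^p = 1 and the parameter names are assumptions made only for illustration, not the exact construction used in MaOEA-CE:

        # Generate n reference directions matched to a front modeled as f1^p + f2^p = 1,
        # where p is the estimated curvature parameter (p = 1 gives a linear front).
        import numpy as np

        def reference_vectors(n, p):
            f1 = np.linspace(0.0, 1.0, n)
            f2 = (1.0 - f1 ** p) ** (1.0 / p)       # project uniform points onto the estimated front
            vecs = np.stack([f1, f2], axis=1)
            return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)   # unit direction vectors

        print(reference_vectors(5, 2.0))            # directions adapted to a p = 2 front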
    Available online:  October 14, 2022 , DOI: 10.13328/j.cnki.jos.006526
    Abstract:
    The file hierarchy ciphertext policy attribute-based encryption (FH-CP-ABE) scheme realizes multi-level file encryption with a single access policy, which saves the computation cost of encryption and decryption and the storage cost of the ciphertext. Nevertheless, the existing file hierarchy CP-ABE scheme cannot support graded user access and suffers from unauthorized access. For this reason, a file hierarchy CP-ABE scheme that supports graded user access is proposed. In the proposed scheme, the graded user access tree is constructed, and the ciphertext subsections are reconstructed to support the access requirements of graded users, thus eliminating the possibility of unauthorized access by users. The security analysis shows that the proposed scheme can resist selective chosen-plaintext attacks. Both theoretical and experimental analyses show that the proposed scheme is more efficient in terms of computation and storage compared with related schemes.
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006536
    Abstract:
    Cross-modal hashing can greatly improve the efficiency of cross-modal retrieval by mapping data of different modalities into more compact hash codes. Nevertheless, existing cross-modal hashing methods usually use a binary similarity matrix, which cannot accurately describe the semantic similarity relationships between samples and suffers from the squared complexity problem. In order to better mine the semantic similarity relationships of data, this study presents a label enhancement based discrete cross-modal hashing method (LEDCH). It first leverages the prior knowledge of transfer learning to generate the label distribution of samples, then constructs a stronger similarity matrix through the label distribution, and generates the hash codes by an efficient discrete optimization algorithm with a small quantization error. Finally, experimental results on two benchmark datasets validate the effectiveness of the proposed method on cross-modal retrieval tasks.
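    A minimal sketch of how a real-valued similarity matrix can be built from label distributions instead of a 0/1 matrix of shared labels; the toy matrix and the use of cosine similarity below are illustrative assumptions, not the exact construction in LEDCH:

        # L is an (n_samples x n_classes) label-distribution matrix whose rows sum to 1.
        import numpy as np

        L = np.array([[0.7, 0.2, 0.1],
                      [0.6, 0.3, 0.1],
                      [0.1, 0.1, 0.8]])
        L_norm = L / np.linalg.norm(L, axis=1, keepdims=True)
        S = L_norm @ L_norm.T        # graded similarities in [0, 1], finer than a binary matrix
        print(np.round(S, 3))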
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006541
    Abstract:
    The asymmetric flows generated by the widely deployed address translation technology bring challenges to the design of load balancing systems. To solve the problem of insufficient use of multi-core processors and network card hardware capabilities by software load balancers, an asymmetric flow load balancing method based on flow characteristics is proposed. Firstly, a data packet dispatching algorithm is proposed to dispatch packets to the expected CPU core via hardware. Then, an elephant flow detection algorithm is constructed by analyzing the temporal and spatial characteristics of packet sequences. Finally, based on the detection results, a load balance offloading method is proposed. The experimental results show that the asymmetric flow load balancing method can correctly handle asymmetric flows. Meanwhile, the average throughput rate increases by 14.5%.
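    As an illustration of the kind of detection logic described above, the following toy detector flags a flow as an elephant flow once its byte count within a time window exceeds a threshold; the window length, threshold, and flow key are assumptions made only for illustration:

        from collections import defaultdict

        WINDOW = 1.0            # seconds per measurement window (assumed)
        THRESHOLD = 1_000_000   # bytes per window above which a flow counts as an elephant (assumed)

        class ElephantDetector:
            def __init__(self):
                self.window_start = 0.0
                self.bytes_per_flow = defaultdict(int)

            def observe(self, ts, flow_key, pkt_len):
                if ts - self.window_start > WINDOW:     # start a new measurement window
                    self.window_start = ts
                    self.bytes_per_flow.clear()
                self.bytes_per_flow[flow_key] += pkt_len
                return self.bytes_per_flow[flow_key] > THRESHOLD   # True => elephant flow

        det = ElephantDetector()
        print(det.observe(0.1, ("10.0.0.1", "10.0.0.2", 80), 1500))   # False for a small flow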
    Available online:  September 30, 2022 , DOI: 10.13328/j.cnki.jos.006545
    Abstract:
    Recently, with the continuous improvement of realism requirements of movies, games, virtual reality applications, etc., the real-time rendering of translucent materials such as human organs and milk has become more and more important. For most of the current subsurface scattering calculation methods, it is difficult to estimate the scattering range correctly. To tackle this estimation issue, a new subsurface scattering calculation formula is proposed to accurately represent the maximum scattering distance. First, brute-force Monte Carlo photon tracking is simulated to obtain the reflectance profile. Second, the selected polynomial model is used to fit the reflectance profile to calculate the precise maximum scattering range at the shading point. Furthermore, a new importance sampling scheme is proposed to reduce the number of Monte Carlo samples, thereby increasing the computational efficiency. In addition, the required parameters are only the reflectance at the shading points and the mean free path of the material, so the rendering effect can be adjusted flexibly. Experimental results show that the proposed model avoids the previous misestimation of the scattering range and produces more accurate rendering results for material areas with complex reflectivity. Meanwhile, the rendering rate meets real-time requirements.
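    A minimal sketch of the fitting step described above, with an exponential curve standing in for the Monte Carlo reflectance profile and a fixed cutoff standing in for the negligible-contribution threshold (both are assumptions made only for illustration):

        import numpy as np

        d = np.linspace(0.0, 5.0, 200)             # distance from the shading point
        R = np.exp(-2.0 * d)                       # stand-in for the simulated reflectance profile
        coeffs = np.polyfit(d, R, deg=5)           # fit the selected polynomial model
        fit = np.polyval(coeffs, d)
        eps = 1e-3                                 # contribution below eps is treated as negligible
        significant = np.nonzero(fit >= eps)[0]
        max_scatter_dist = d[significant[-1]] if significant.size else d[-1]
        print(max_scatter_dist)                    # estimated maximum scattering distance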
    Available online:  September 23, 2022 , DOI: 10.13328/j.cnki.jos.006543
    Abstract:
    The security of traditional cryptographic algorithms is based on the black-box attack model. In this attack model, the attacker can only obtain the input and output of the cryptographic algorithm, but not the internal details of the cryptographic algorithm. In recent years, the concept of the white-box attack model has been proposed. In the white-box attack model, attackers can not only obtain the input and output of the cryptographic algorithm but also directly observe or change its internal data. In order to ensure the security of existing cryptographic algorithms in the white-box attack environment, redesigning the existing cryptographic algorithms through white-box cryptography technology without changing their functions is called the white-box implementation of existing cryptographic algorithms. It is of great significance to study the design and analysis of white-box implementation schemes for solving the issue of digital rights management. In recent years, a kind of side channel analysis method for white-box implementation schemes has emerged. This kind of analysis method needs to know only a few internal details of white-box implementation schemes to extract the key. Therefore, it poses a practical threat to the existing white-box implementation schemes. It is of great practical significance to analyze the existing white-box implementation schemes to ensure their security. The typical representative of this kind of analysis method is the differential computation analysis (DCA) based on the principle of differential power analysis. This study analyzes the Bai-Wu white-box SM4 scheme based on DCA. Based on the research results of the statistical characteristics of n-order uniform random invertible matrices over GF(2), an improved DCA (IDCA) is proposed, which can significantly improve the analysis efficiency on the premise of an almost constant success rate. The results also show that the Bai-Wu white-box SM4 scheme cannot guarantee security in the face of DCA and therefore must be further improved to meet the security requirements of practical scenarios.
    Available online:  September 23, 2022 , DOI: 10.13328/j.cnki.jos.006532
    Abstract:
    As an effective technique for inferring black-box state machine models of software systems, model learning (a.k.a. automata learning) can be divided into active and passive learning. Based on given input and output alphabets, active learning obtains the minimum complete state machine of the target system in polynomial time through active interaction with the black-box system. However, the equivalence query algorithm remains a major obstacle to the development and application of active automata learning tools. This study discusses the influence of counterexamples on learning algorithms based on the discrimination tree, defines comparison rules for hypotheses, and proposes two principles for constructing test cases. According to these principles, the Wp-method equivalence query algorithm is improved to produce better hypotheses and effectively reduce the number of queries and symbols. Based on LearnLib, three kinds of automata are used as experimental objects to verify the effectiveness of the principles and the improved algorithm.
    Available online:  September 23, 2022 , DOI: 10.13328/j.cnki.jos.006530
    Abstract:
    This study proposes a convolutional neural network (CNN) based Transformer to solve the panoptic segmentation task. The method draws on the inherent advantages of the CNN in image feature learning and avoids the increase in computation caused when the Transformer is transplanted to vision tasks. The CNN-based Transformer is composed of two basic structures: the projector, which performs the feature domain transformation, and the extractor, which is responsible for feature extraction. The effective combination of the projector and the extractor forms the framework of the CNN-based Transformer. Specifically, the projector is implemented by a lattice convolution that models the spatial relationship of the image by designing and optimizing the convolution filter configuration. The extractor is implemented by a chain network that improves feature extraction capabilities by stacking chain blocks. Considering the framework and the substantial function of panoptic segmentation, the CNN-based Transformer is successfully applied to solve the panoptic segmentation task. The experimental results on the MS COCO and Cityscapes datasets demonstrate that the proposed method has excellent performance.
    Available online:  September 20, 2022 , DOI: 10.13328/j.cnki.jos.006531
    Abstract:
    The chosen-ciphertext attack (CCA) security model can effectively capture active attacks in the real world. The existing CCA-secure cryptosystems are mainly designed abroad, and China lacks its own CCA-secure cryptosystems. Although there are general transformation approaches to achieving CCA security, they lead to an increase in both computational overhead and communication overhead. Based on the SM9 encryption algorithm, this study proposes an identity-based broadcast encryption scheme with CCA security. The design is derived from SM9, and the size of the private key and the ciphertext is constant and independent of the number of receivers chosen in the data encryption phase. Specifically, the private key includes one element, and the ciphertext is composed of three elements. If the GDDHE assumption holds, the study proves that the proposed scheme has selective CCA security under the random oracle model. In order to achieve CCA security, a dummy identity is introduced in designing the encryption algorithm, and the identity can be used to answer the decryption query successfully. Analysis shows that the proposed scheme is comparable to the existing efficient identity-based broadcast encryption schemes in terms of computational efficiency and storage efficiency.
    Available online:  September 20, 2022 , DOI: 10.13328/j.cnki.jos.006664
    Abstract:
    Smart contracts running on the blockchain can hardly be modified after deployment, and their call and execution rely on a consensus procedure. Consequently, existing debugging methods that require the modification of the smart contract code or the interruption of execution cannot be directly applied to smart contracts. Since the running of a smart contract is composed of ordered execution of blockchain transactions, tracing the execution of the transactions is an effective approach to render the smart contract more debuggable. The major goal of tracing blockchain transaction execution is to unveil how a blockchain transaction produces such a result in execution. The execution of a blockchain transaction relies on the internal state of the blockchain, and this state is determined by the execution results of previous transactions, which results in transitive dependencies. Such dependencies and the characteristics of the execution environment the blockchain provides bring challenges to tracing. The tracing of blockchain transaction execution is mainly faced with three challenges: how to obtain enough information for tracing from the production environment in which the smart contract is deployed, how to obtain the dependencies among the blockchain transactions, and how to ensure the consistency between the result of tracing and the real execution online. This study proposes a tracing method for blockchain transaction execution based on recording and replay. By building a recording and replay mechanism in the contract container, the proposed method enables the recording of state reading and writing operations during transaction execution without modifying the contract code and interrupting the running of the smart contract. A transaction dependency analysis method based on state reading and writing is proposed to support the retracing of previous transactions linked by dependencies on demand. Moreover, a verification mechanism for reading and writing operation recording is designed to ensure that the consistency between the replaying execution and the real online execution can be verified. The tracing method can trace the execution of the blockchain transaction that calls the smart contract, which can be used in debugging of smart contracts. When loss is caused by the failure of smart contracts, the tracing result can be used as evidence. Experiments are conducted for a performance comparison between storing recorded reading and writing operations on chain and off chain. The advantages and effectiveness of the proposed method in tracing blockchain transaction execution are revealed by a case study.
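    A minimal sketch of the record/replay idea at the heart of the method: the contract's state store is wrapped so that every read and write issued while a transaction executes is logged, and the log can later be replayed and checked for consistency (the class and function names below are illustrative, not the actual contract-container interface):

        class RecordingState:
            def __init__(self, backing):
                self.backing = backing      # the real world state: key -> value
                self.log = []               # ordered read/write records for one transaction

            def read(self, key):
                value = self.backing.get(key)
                self.log.append(("read", key, value))
                return value

            def write(self, key, value):
                self.log.append(("write", key, value))
                self.backing[key] = value

        def replay(log, state):
            # re-apply the recorded operations; recorded reads double as consistency checks
            for op, key, value in log:
                if op == "read":
                    assert state.get(key) == value, "replay diverges from the recorded execution"
                else:
                    state[key] = value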
    Available online:  September 09, 2022 , DOI: 10.13328/j.cnki.jos.006525
    Abstract:
    With machine learning widely applied to the natural language processing (NLP) domain in recent years, the security of NLP tasks has naturally received growing concern. Existing studies found that small modifications to examples might lead to wrong machine learning predictions, which is known as an adversarial attack. Textual adversarial attacks can effectively reveal the vulnerability of NLP models for improvement. Nevertheless, existing textual adversarial attack methods all focus on designing complex adversarial example generation strategies with a limited improvement of the success rate, and the highly invasive modifications bring about a decline in textual quality. Thus, a simple and effective method with high adversarial example quality is in demand. To solve this problem, the sememe-level sentence dilution algorithm (SSDA) and the dilution pool construction algorithm (DPCA) are proposed from the new perspective of improving the process of adversarial attack. SSDA is a new process that can be freely embedded into the classical adversarial attack workflow. SSDA first uses dilution pools constructed by DPCA to dilute the original examples and then generates adversarial examples from those diluted examples. It can not only improve the success rate of any adversarial attack method without any limit on datasets or victim models but also obtain higher adversarial example quality compared with the original method. Through experiments on different datasets, dilution pools, victim models, and textual adversarial attack methods, the improvement of SSDA in the success rate is successfully verified, and it is proved that dilution pools constructed by DPCA can further enhance the dilution ability of SSDA. The experimental results demonstrate that SSDA reveals more vulnerabilities of models than classical methods, and DPCA can help SSDA improve the success rate with higher adversarial example quality.
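    A toy illustration of the dilution step: the actual SSDA operates at the sememe level and DPCA builds the pools automatically, so the word pool, insertion rate, and whitespace tokenizer below are illustrative simplifications only:

        import random

        def dilute(sentence, pool, rate=0.3, seed=0):
            # insert semantically light words from the dilution pool after some tokens
            random.seed(seed)
            out = []
            for tok in sentence.split():
                out.append(tok)
                if random.random() < rate:
                    out.append(random.choice(pool))
            return " ".join(out)

        diluted = dilute("the movie was surprisingly good", ["really", "quite", "overall"])
        print(diluted)   # the diluted example is then fed to an off-the-shelf attack method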
    Available online:  June 06, 2022 , DOI: 10.13328/j.cnki.jos.006642
    Abstract:
    The computation offloading problem of multi-access edge computing (MEC) has become one of the research focuses. Current computation offloading schemes only consider the computation offloading problem in cloud, edge, and end structures and do not take into account the attributes of public and private clouds. In this study, a novel computation offloading scheme is proposed, which considers the relationship between the public cloud and the private cloud in edge computing and regards the public cloud as a supplement to private cloud resources to alleviate the insufficient computing power caused by the limitations of private cloud resources. Moreover, a two-layer Stackelberg game is established to solve the computation offloading problem. The optimal strategies of each player are obtained upon the analysis of the strategies and profits of the public cloud, the private cloud, and users, and the existence and uniqueness of the Nash equilibrium solution to the two-layer game are proved. The simulation results and analysis verify the feasibility of the computation offloading scheme based on the two-layer Stackelberg game. Compared with the computation offloading scheme based on the single-layer Stackelberg game, the proposed scheme is more efficient and more suitable for edge computing environments.
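    For intuition, the following toy backward-induction computation shows how a Stackelberg equilibrium can be found numerically: the leader posts a unit price, the follower offloads the amount that maximizes its own utility given that price, and the leader picks the price that maximizes its profit against this best response. The utility functions, cost value, and price grid are assumptions made only for illustration:

        import numpy as np

        def follower_best_response(p, benefit=10.0):
            # follower utility benefit*ln(1+x) - p*x is maximized at x = benefit/p - 1 (clipped at 0)
            return max(benefit / p - 1.0, 0.0)

        def leader_profit(p, cost=2.0):
            x = follower_best_response(p)
            return (p - cost) * x                   # leader anticipates the follower's best response

        prices = np.linspace(0.5, 10.0, 500)
        p_star = prices[int(np.argmax([leader_profit(p) for p in prices]))]
        x_star = follower_best_response(p_star)
        print(p_star, x_star)                       # equilibrium price and offloaded amount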
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006645
    [Abstract] (869) [HTML] (0) [PDF 4.10 M] (1792)
    Abstract:
    Event extraction aims to automatically extract event information that users are interested in from unstructured natural language texts and express it in a structured form. Event extraction is an important direction in natural language processing and understanding and is of high application value in different fields, such as government management of public affairs, financial business, and biomedicine. According to the degree of dependence on manually labeled data, the current event extraction methods based on deep learning are mainly divided into two categories: supervised learning and distantly-supervised learning. This article provides a comprehensive overview of current event extraction techniques in deep learning. Focusing on supervised methods such as CNN, RNN, GAN, and GCN as well as distant supervision, this study systematically summarizes the research in recent years. Additionally, the performance of different deep learning models is compared and analyzed in detail. Finally, the challenges facing event extraction are analyzed, and the research trends are forecasted.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006657
    [Abstract] (438) [HTML] (0) [PDF 8.80 M] (1109)
    Abstract:
    Lattice-based cryptanalysis, an analysis method using the algorithms solving hard lattice problems to analyze the security of public-key cryptosystems, has become one of the powerful mathematical tools for studying the security of the Rivest-Shamir-Adleman (RSA)-type cryptographic algorithms. The key point of this method is the construction of the lattice basis. There exists a general strategy for lattice basis construction. However, this general strategy fails to fully and flexibly utilize the algebraic structure of the RSA algorithm and its variants. In recent years, lattice-based cryptanalysis of RSA-type algorithms mostly focuses on introducing special techniques of lattice base construction on the basis of the general strategy. This study starts by outlining lattice-based cryptanalysis and the general strategy for lattice basis construction and summarizing several commonly used techniques of lattice basis construction. Subsequently, the main achievements in lattice-based cryptanalysis of the standard RSA algorithm are reviewed, and they involve factoring with known bits, small private exponent attacks, and partial key exposure attacks. Then, the special algebraic structures of several mainstream variants of the RSA algorithm and the techniques of lattice basis construction applicable to these variants are summarized. Finally, the available work on lattice-based cryptanalysis of the RSA algorithm and its variants is classified and summed up, and the prospects of the research and development of lattice-based cryptanalysis are presented.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006650
    [Abstract] (350) [HTML] (0) [PDF 6.78 M] (1000)
    Abstract:
    Discourse structure analysis aims to understand the overall structure of an article and the semantic relationships between its various parts. As a research hotspot of natural language processing, it has developed rapidly in recent years. This study first summarizes the mainstream discourse structure analysis theories in English and Chinese and then introduces the research on the popular English and Chinese discourse corpora as well as their calculation models. On this basis, this study surveys the current work context of discourse structure analysis in Chinese and English and constructs its research framework. Moreover, the current research trends and focuses are summarized, and the application of discourse structure in downstream tasks is introduced briefly. Finally, this study points out the issues and challenges in the current Chinese discourse structure analysis to provide guidance and help for future research.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006520
    [Abstract] (450) [HTML] (0) [PDF 7.32 M] (1040)
    Abstract:
    In view of the fact that syntactic relations are not fully utilized and argument roles are missing in event extraction, an event extraction method based on a dual attention mechanism (EEDAM) is proposed to improve the accuracy and recall rate of event extraction. Firstly, sentence encoding is based on four embedding vectors, and dependency relations are introduced to construct a dependency relation graph, so that the deep neural network can make full use of syntactic relations. Then, through a graph transformation attention network, new dependency arcs are generated and node information is aggregated to capture long-range dependencies and potential interactions; a weighted attention network is integrated to capture the key semantic information in sentences, and sentence-level event arguments are extracted to improve the prediction ability of the model. Finally, key sentence detection and similarity ranking are used to fill in the document-level arguments. The experimental results show that the event extraction method based on the dual attention mechanism can improve the accuracy rate, recall rate, and F1-score by 17.82%, 4.61%, and 9.80%, respectively, compared with the optimal baseline joint multiple Chinese event extractor (JMCEE) on the ACE2005 dataset. On the dataset of dam safety operation records, the accuracy, recall rate, and F1-score are 18.08%, 4.41%, and 9.93% higher than those of the optimal baseline JMCEE, respectively.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006666
    [Abstract] (679) [HTML] (0) [PDF 6.45 M] (1565)
    Abstract:
    During software development, developers use third-party libraries extensively to achieve code reuse. Due to the dependencies among different third-party libraries, the incompatibilities among them lead to errors during the installing, loading, or calling of those libraries and ultimately result in system anomalies. Such a problem is called a dependency conflict (DC, also referred to as conflict dependency or CD) issue of third-party libraries. The root cause of such issues is that the third-party libraries loaded fail to cover the required features (e.g., methods) cited by the software. DC issues often occur during the download and install, project compiling, and running of third-party libraries and are difficult to locate. Fixing DC issues requires developers to know the differences among the versions of the third-party libraries they use accurately, and the complex dependencies among the third-party libraries increase the difficulty in this work. To identify the DC issues in the software before its running and to deal with the system anomalies caused by those issues during running, researchers around the world have conducted various studies on such issues. This study presents a systematic review of this research topic from four aspects, including the empirical analysis of third-party library usage, the cause analysis of DC issues, and the detection methods and common fixing ways for such issues. Finally, the potential research opportunities in this field are discussed.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006677
    [Abstract] (722) [HTML] (0) [PDF 7.50 M] (1152)
    Abstract:
    Distributed ledger (DL), as a distributed data management architecture, maintains data records (the ledgers) across distributed nodes based on consensus mechanisms and protocols. It can comprehensively record all information of data ownership, transmission, and trading chains in distributed ledgers. Additionally, data cannot be tampered with or denied throughout the life cycle of data production and transactions, which provides an endorsement for data rights confirmation, protection, and auditing. Blockchain is a typical implementation of DL systems. With emerging digital economy applications including digital currency and data asset trading, DL technologies have received increasingly widespread attention. However, system performance is one of the key technical bottlenecks for the large-scale application of DL systems, and ledger performance optimization has become a focus of academia and industry. This study investigates the methods, technologies, and typical solutions of DL performance optimization from the four perspectives of system architecture, ledger data structure, consensus mechanism, and message communication.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006671
    [Abstract] (922) [HTML] (0) [PDF 5.32 M] (1631)
    Abstract:
    Inverse reinforcement learning (IRL), also known as inverse optimal control (IOC), is an important research method of reinforcement learning and imitation learning. IRL solves a reward function from expert samples, and the optimal strategy is then solved to imitate expert strategies. In recent years, fruitful achievements have been yielded by IRL in imitation learning, with widespread application in vehicle navigation, path recommendation, and robotic optimal control. First, this study presents the theoretical basis of IRL. Then, from the perspective of reward function construction methods, IRL algorithms based on linear and non-linear reward functions are analyzed. The algorithms include maximum marginal IRL, maximum entropy IRL, maximum entropy deep IRL, and generative adversarial imitation learning. In addition, frontier research directions of IRL are reviewed to compare and analyze relevant representative algorithms, including IRL with incomplete expert demonstrations, multi-agent IRL, IRL with sub-optimal expert demonstrations, and guiding IRL. Finally, the primary challenges of IRL and the future development of its theory and applications are summarized.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006672
    Abstract:
    The basic concept of the multiobjective optimization evolutionary algorithm based on decomposition (MOEA/D) is to transform a multiobjective optimization problem into a set of subproblems (single-objective or multiobjective) for optimization solutions. Since MOEA/D was proposed in 2007, it has attracted extensive attention from Chinese and international scholars and become one of the most representative multiobjective optimization evolutionary algorithms. This study summarizes the research progress on MOEA/D in the past thirteen years. The advances include algorithm improvements of MOEA/D, research of MOEA/D on many-objective optimization and constrained optimization, and applications of MOEA/D to practical problems. Then, several representative improved algorithms of MOEA/D are compared through experiments. Finally, the study presents several potential research topics of MOEA/D in the future.
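    As a concrete illustration of the decomposition idea, one widely used scalarization is the Tchebycheff aggregation, in which each subproblem is defined by a weight vector and the ideal point (the numbers below are illustrative; this is only one of several aggregation functions commonly used with MOEA/D):

        import numpy as np

        def tchebycheff(f, lam, z_star):
            # g(x | lambda, z*) = max_i lambda_i * |f_i(x) - z*_i|, to be minimized
            return np.max(lam * np.abs(f - z_star))

        f = np.array([0.4, 0.7])        # objective values of a candidate solution
        lam = np.array([0.5, 0.5])      # weight vector defining one subproblem
        z_star = np.array([0.0, 0.0])   # ideal point
        print(tchebycheff(f, lam, z_star))   # 0.35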
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006679
    [Abstract] (721) [HTML] (0) [PDF 5.88 M] (1186)
    Abstract:
    Deep learning (DL) systems have powerful learning and reasoning capabilities and are widely employed in many fields including unmanned vehicles, speech recognition, intelligent robotics, etc. Due to the dataset limit and dependence on manually labeled data, DL systems are prone to unexpected behaviors. Accordingly, the quality of DL systems has received widespread attention in recent years, especially in safety-critical fields. Fuzz testing with strong fault-detecting ability is utilized to test DL systems, which becomes a research hotspot. This study summarizes existing fuzz testing for DL systems in the aspects of test case generation (including seed queue construction, seed selection, and seed mutation), test result determination, and coverage analysis. Additionally, commonly used datasets and metrics are introduced. Finally, the study prospects for the future development of this field.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006680
    Abstract:
    In recent years, deep learning technology has made remarkable progress in many computer vision tasks. More and more researchers have tried to apply it to medical image processing, such as the segmentation of anatomical structures in high-throughput medical images (CT, MRI), which can improve the efficiency of image reading for doctors. Training deep learning models for medical image processing requires a large amount of labeled data, and the data from a single medical institution cannot meet this requirement. Moreover, due to differences in medical equipment and acquisition protocols, the data from different medical institutions are largely heterogeneous. As a result, it is difficult for a model trained on data from other medical institutions to obtain reliable results on the data of a given institution. In addition, the distribution of different medical data over patients’ disease stages is uneven, thereby reducing the reliability of the model. Technologies including domain adaptation and multi-site learning emerge to reduce the impact of data heterogeneity and improve the generalization ability of the model. As a research hotspot in transfer learning, domain adaptation is intended to transfer knowledge learned from the source domain to data of the unlabeled target domain. Multi-site learning and federated learning with non-independent and identically distributed data aim to improve the robustness of the model by learning a common representation on multiple datasets. This study investigates, analyzes, and summarizes domain adaptation, multi-site learning, and federated learning with non-independent and identically distributed datasets in recent years, providing references for related research.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006681
    Abstract:
    Heterogeneous information network is a representation of heterogeneous data. How to integrate complex semantic information of heterogeneous data is one of the challenges faced by recommendation systems. A high-order embedded learning framework for heterogeneous information networks based on weak ties featured by semantic information and information transmission abilities is constructed. The framework includes three modules of initial information embedding, high-order information embedding aggregation, and recommendation prediction. The initial information embedding module first adopts the best trust path selection algorithm to avoid information loss caused by sampling a fixed number of neighbors in a full-relational heterogeneous information network. Then the newly defined importance measure factors of multi-task shared characteristics based on multi-head attention are adopted to filter out the semantic information of each node. Additionally, combined with the interactive structure, the network nodes are effectively characterized. The high-order information embedding aggregation module realizes high-order information expression by integrating weak ties and good knowledge representation ability of network embedding. The hierarchical propagation mechanism of heterogeneous information networks is utilized to aggregate the characteristics of sampled nodes into the nodes to be predicted. The recommendation prediction module employs the influence recommendation method of high-order information to complete the recommendation. The framework is characterized by rich embedded nodes, fusion of shared attributes, and implicit interactive information. Finally, the experiments have verified that UI-HEHo can effectively improve the accuracy of rating prediction, as well as the pertinence, novelty and diversity of recommendation generation. Especially in application scenarios with sparse data, UI-HEHo yields good recommendation effects.
    Available online:  May 24, 2022 , DOI: 10.13328/j.cnki.jos.006684
    [Abstract] (591) [HTML] (0) [PDF 6.40 M] (1036)
    Abstract:
    Distributed systems play an important role in computing environments. Consensus protocols are employed to guarantee consistency among nodes. Design errors in the consensus protocols might cause failure in system operation and bring catastrophic consequences to humans and the environment. Therefore, it is important to prove the correctness of consensus protocols. Formal verification can strictly prove the correctness of target properties in designed models, which is suitable for verifying consensus protocols. However, the expanding scale of distributed systems results in more complicated issues and challenges to formal verification of consensus protocols. The method for formal verification of the consensus protocol design and increase in the verification scale are significant research issues in the formal verification of consensus protocols. This study investigates the current research on the employment of formal methods to verify consensus protocols, summarizes the key modeling methods and verification technologies, and proposes future research directions in this field.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006535
    Abstract:
    Heterogeneous information networks can be used for modeling several applications in the real world. Their representation learning has received extensive attention from scholars. Most of the representation learning methods extract structural and semantic information based on meta-paths, and their effectiveness in network analysis has been proved. However, these methods ignore the node internal information and the different degrees of importance of meta-path instances. Besides, they can capture only local node information. Thus, this study proposes a heterogeneous network representation learning method fusing mutual information and multiple meta-paths. First, a meta-path internal encoding method called relational rotation encoding is used, which captures the structural and semantic information of the heterogeneous information network according to adjacent nodes and meta-path context nodes. It uses an attention mechanism to model the importance of each meta-path instance. Then, an unsupervised heterogeneous network representation learning method fusing mutual information maximization and multiple meta-paths is proposed, where mutual information can capture both global and local information. Finally, experiments are conducted on two real datasets. Compared with the current mainstream algorithms as well as some semi-supervised algorithms, the results show that the proposed method has better performance on node classification and clustering.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006615
    Abstract:
    Asynchronous programs utilize asynchronous non-blocking calls to achieve program concurrency, and they are widely applied in parallel and distributed systems. However, it is very complex to verify asynchronous programs, and the complexity of verifying both safety and liveness is EXPSPACE. This study proposes a program model of asynchronous programs and defines two problems of asynchronous programs, namely, ?-equivalence and ?-reachability. In addition, the two problems can be proved to be NP-complete by reducing 3-CNF-SAT to them and further reducing them to the reachability problem of communication-free Petri nets. A case study shows that the two problems can be used to solve the verification problems of asynchronous programs.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006616
    Abstract:
    The reliable functioning of safety-critical IT systems depends heavily on the correct execution of programs. Deductive program verification can be performed to guarantee the correct execution to a large extent. There are already a plethora of programming languages in use, and new languages oriented toward high-reliability scenarios are still being invented. As a result, it is difficult to devise a full-fledged logical system for each language to support the verification of programs and prove the soundness and completeness of the logical system with respect to the formal semantics of the language. Furthermore, language-independent verification techniques offer sound verification procedures parameterized over the formal semantics of programming languages. The specialization of the verification procedure with the formal semantics of a concrete programming language directly gives rise to a verification procedure for the language. This study proposes a language-independent verification technique based on big-step operational semantics. The technique features a unified procedure for sound reasoning about program structures that potentially cause unbounded behavior, such as iteration and recursion. In particular, the study employs a functional formalization of big-step operational semantics to support the explicit representation of the computation performed by the sub-structures of a program. This representation enables the exploitation of the auxiliary information provided for these sub-structures in the unified reasoning process. In addition, the study has proved the soundness and relative completeness of the proposed technique, evaluated the technique by verification examples in imperative and functional programming languages, and formalized all theoretical results and verification examples in the Coq proof assistant, and thus provides a basis for developing a language-independent program verifier with big-step operational semantics based on a proof assistant.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006617
    Abstract:
    With the development of the Internet, the 5th generation (5G) of mobile communication technology emerges. The 5G authentication and key agreement (5G-AKA) protocol is proposed mainly to achieve two-way authentication between users and service networks. However, recent research suggests that the protocol may be subject to information deciphering and message replay attacks. At the same time, it is found that some variants of the current 5G-AKA cannot satisfy the protocol’s unlinkability. Therefore, in response to these shortcomings, this study proposes an improved protocol called SM-AKA. SM-AKA is composed of two parallel sub-protocols in a novel way. In addition, through flexible mode switching, the lightweight sub-protocol (GUTI submodule) is frequently adopted, and the other sub-protocol (SUPI submodule) is used to deal with abnormalities caused by authentication. Therefore, this mechanism not only realizes efficient authentication between users and networks but also improves the stability of the protocol. Furthermore, the freshness of variables is effectively maintained to prevent the replay of messages, and strict encryption and decryption methods further strengthen the security of the protocol. Finally, the study carries out a complete evaluation of SM-AKA. Through formal modeling, attack assumptions, and Tamarin derivation, it is proved that the scheme can achieve the authentication and privacy goals, and the theoretical analysis demonstrates the performance advantage of the protocol.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006634
    Abstract:
    Interference among wireless signals hinders the concurrent transmission of signals and reduces the throughput of wireless networks. Link scheduling is an effective way to improve the throughput and decrease the transmission delay of wireless networks, and the signal-to-interference-plus-noise ratio (SINR) model can accurately describe the inherent characteristics of wireless signal propagation and truly reflect the interference among wireless signals. Therefore, this study proposes an online distributed link scheduling (OLD_LS) algorithm with a constant approximation factor in the SINR model for dynamic wireless networks. Specifically, online means that nodes can join and leave wireless networks at any time, and this arbitrary behavior of nodes reflects the dynamic characteristics of wireless networks. The OLD_LS algorithm partitions the network region into hexagons to localize the global interference of the SINR model. In addition, a leader election (LE) subroutine in dynamic networks is designed in this study. It is shown that as long as the dynamic rate of nodes is less than 1/ε, LE can elect a leader with high probability within the time complexity of O(logn + logR), where ε is a constant satisfying $\varepsilon \le 5(1-2^{1-\alpha/2})/6$, with $\alpha$ being the path loss exponent, n the number of senders, and R the longest link length. To the best of our knowledge, the algorithm proposed in this study is the first OLD_LS algorithm for dynamic wireless networks.
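    For reference, the scheduling constraint in the physical interference model referred to above commonly takes the following form: a link from sender s to receiver r is feasible only if its SINR exceeds a hardware-dependent threshold (uniform transmit power P, ambient noise N, and threshold β are assumed here for simplicity):

        $SINR(s,r) = \dfrac{P\,d(s,r)^{-\alpha}}{N + \sum_{s' \neq s} P\,d(s',r)^{-\alpha}} \geqslant \beta$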
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006635
    [Abstract] (1236) [HTML] (0) [PDF 9.23 M] (1393)
    Abstract:
    Network measurement is the basis of scenes including network performance monitoring, traffic management, and fault diagnosis, and in-band network telemetry (INT) has become the focus of network measurement research due to its timeliness, accuracy, and scalability. With the emergence and development of programmable data planes, many practical INT solutions have been proposed thanks to their rich information feedback and flexible function deployment. First, this study analyzes the principles and deployment challenges of typical INT solutions INT and AM-PM. Second, according to the optimization measures and extension of INT, it studies the characteristics of the optimization mechanism from the aspects of the data collection process and multi-tasking, as well as the feasibility of technology extension in terms of wireless networks, optical networks, and hybrid networks. Third, in view of the applications of INT in typical scenes, the characteristics of these INT applications are comparatively investigated from the perspectives of in-network performance sensing, network-level telemetry systems, traffic scheduling, and fault diagnosis. Finally, a research summary of INT is made, and the future research directions are predicted.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006636
    Abstract:
    This study proposes a new classical key recovery attack against schemes such as Feistel, Misty, and Type-1/2 generalized Feistel schemes (GFS), which creatively combines the birthday attack with the periodic property of Simon’s algorithm. Although Simon’s algorithm can recover the periodic value in polynomial time, this study requires the birthday bound to recover the corresponding periodic value in the classical setting. By this new attack, the key to a 5-round Feistel-F scheme can be recovered with the time complexity of ${\rm{O}}({2^{3n/4}})$ under the chosen plaintexts and ciphertexts of ${\rm{O}}({2^{n/4}})$ , and the corresponding memory complexity is ${\rm{O}}({2^{n/4}})$. Compared with the results of Isobe and Shibutani, the above result not only increases one round but also requires lower memory complexity. For the Feistel-FK scheme, a 7-round key recovery attack is constructed. In addition, the above approach is applied to construct the key recovery attacks against Misty schemes and Type-1/2 GFS. Specifically, the key recovery attacks against the 5-round Misty L-F and Misty R-F schemes and those against the 6-round Misty L-KF/FK and Misty R-KF/FK schemes are given; for the d-branch Type-1 GFS, a d2-round key recovery attack is presented, and when d≥6, the number of rounds of the key recovery attack is superior to those of the existing key recovery attacks.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006637
    Abstract:
    Comment generation for software code has been an important research task in the field of software engineering in the past few years. Several research efforts have achieved impressive results on open-source datasets that contain copious <code snippet, comment> pairs. In the practice of software enterprises, however, the code to be commented usually belongs to a software project library, and it should first be decided on which code lines comment generation can achieve better performance; moreover, the code snippets to be commented have different lengths and granularity. Thus, a code comment generation method is required, which can integrate commenting decisions and comment generation and is resistant to noise. To this end, CoComment, a software project-oriented code comment generation approach, is proposed in this study. This approach can automatically extract domain-specific basic concepts from software project documents and then uses code parsing and text matching to propagate and expand these concepts. On this basis, automatic code commenting decisions are made by locating code lines or segments related to these concepts, and corresponding natural language comments with high readability are generated upon the fusion of concepts and contexts with templates. Comparative experiments are conducted on three enterprise software projects containing more than 46000 manually annotated code comments. The experimental results demonstrate that the proposed approach can effectively make code commenting decisions and generate more helpful code comments compared with existing methods, which provides an integrated solution to code commenting decisions and comment generation for software projects.
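    A toy illustration of the commenting-decision step: a line is selected for comment generation when it mentions a domain concept extracted from the project documents. The concept list, tokenizer, and matching rule below are illustrative simplifications of the code parsing and text matching described above:

        import re

        concepts = {"order", "invoice", "discount"}     # assumed concepts mined from project documents

        def lines_to_comment(source):
            selected = []
            for lineno, line in enumerate(source.splitlines(), start=1):
                tokens = set(re.findall(r"[a-zA-Z]+", line.lower()))
                if tokens & concepts:                   # the line touches a domain concept
                    selected.append(lineno)
            return selected

        code = "total = order.amount\nprint(total)\ntotal -= discount.value"
        print(lines_to_comment(code))                   # [1, 3]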
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006638
    Abstract:
    With the rapid development of technologies such as the Internet of Things (IoT) and cloud computing, portable health clinics (PHCs) have been realized and widely used in telemedicine. Relying on the significant advantages of 5G communications, China has actively promoted the construction of smart healthcare and built a multi-function and high-quality telemedicine information service platform. The realization of telemedicine represented by PHCs is inseparable from the technical support of remote data-sharing systems. At present, the remote data-sharing system combining IoT and the cloud server (CS) has attracted wide attention due to its flexibility and efficiency, but its privacy and security issues are rarely studied. Considering the sensitivity of medical data, this paper endeavors to study the security and privacy issues in the PHC data-sharing system. As a result, in the PHC system, this study achieves the secure uploading of IoT awareness data, normalization of personalized ciphertexts, dynamic multi-user fine-grained access control, and efficient decryption operations, and it also presents formal security verification. The specific innovations of this study are as follows: (1) The classical proxy re-encryption (PRE) and attribute-based encryption algorithms are improved, and an IPRE-TO-FAME combined encryption mechanism is proposed to ensure the data-sharing security of the PHC system with cloud-edge collaboration. (2) To address the challenge of key updates caused by many highly distributed IoT terminals, this paper uses the idea of PRE to realize the key updates on the basis of the unilateral transformation without changing the keys of IoT terminals. Meanwhile, the re-encryption entities can be regarded as fully trusted in the application scenarios of this study, which is different from the situation of the conventional PRE mechanism, where the re-encryption entities are usually untrusted third-party servers. Therefore, the conventional PRE algorithm is improved, and an efficient improved PRE (IPRE) algorithm is put forward to adapt to the scenarios proposed in this study. (3) The classical fast attribute-based message encryption (FAME) mechanism is improved to enable dynamic multi-user fine-grained access control. In this way, users can easily use portable intelligent devices to access data anytime and anywhere. The security proofs, theoretical analysis, and experimental results reveal that the proposed solution is highly secure and practical, which is an effective way to ensure secure PHC data sharing.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006594
    [Abstract] (751) [HTML] (0) [PDF 9.71 M] (1583)
    Abstract:
    The realization of safe and efficient behavior decision-making has become a challenging issue for autonomous driving. As the autonomous driving industry develops vigorously, practitioners in industry and academia have proposed many autonomous driving behavior decision-making approaches. However, due to the influence of environmental uncertainties as well as the requirements for effective and highly safe decisions, existing approaches fail to take all these factors into account. Therefore, this study proposes an autonomous driving behavior decision-making approach with the RoboSim model based on the Bayesian network. First, based on domain ontology, the study analyzes the semantic relationship between elements in autonomous driving scenarios and predicts the intention of dynamic entities in scenarios by the LSTM model, so as to provide driving scenario information for establishing the Bayesian network. Next, the autonomous driving behavior decision-making in specific scenarios is inferred by the Bayesian network, and the state transitions of the RoboSim model are employed to carry out the dynamic execution of behavior decision-making and eliminate the redundant operation of the Bayesian network, thus improving the efficiency of decision-making. The RoboSim model is platform-independent. In addition, it can simulate the decision-making cycle and support validation technologies in different forms. To ensure the safety of the behavior decision-making, this study uses the model checking tool UPPAAL to verify and analyze the RoboSim model. Finally, based on lane change and overtaking cases, this study validates the feasibility of the proposed approach and provides a feasible way to achieve safe and efficient autonomous driving behavior decision-making.
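    A toy illustration of the kind of inference involved: the intention distribution predicted by the upstream model is combined with a small conditional probability table to obtain the probability that a lane change is safe, and a maneuver is chosen against a threshold. All probabilities, variable names, and the 0.8 threshold are illustrative assumptions, not values from the paper:

        # P(intent) as predicted by the upstream intention model (e.g., an LSTM)
        intent_probs = {"yielding": 0.7, "accelerating": 0.3}
        # P(lane change is safe | gap, intent): a small conditional probability table
        p_safe = {("large_gap", "yielding"): 0.95, ("large_gap", "accelerating"): 0.60,
                  ("small_gap", "yielding"): 0.55, ("small_gap", "accelerating"): 0.10}

        def p_change_safe(gap):
            # marginalize out the intent variable: P(safe | gap) = sum_i P(safe | gap, i) * P(i)
            return sum(p_safe[(gap, i)] * p for i, p in intent_probs.items())

        decision = "change_lane" if p_change_safe("large_gap") >= 0.8 else "keep_lane"
        print(decision)   # change_lane, since 0.95*0.7 + 0.60*0.3 = 0.845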
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006614
    Abstract:
    Determinization of an automaton refers to the transformation of a nondeterministic automaton into a deterministic automaton recognizing the same language, which is one of the fundamental notions in automata theory. Determinization of ω automata serves as a basic step in the decision procedures of SnS, CTL*, and μ-calculus. Meanwhile, it is also the key to solving infinite games. Therefore, studies on the determinization of ω automata are of great significance. This study focuses on a kind of ω automata called Streett automata. Nondeterministic Streett automata can be transformed into equivalent deterministic Rabin or Parity automata. In previous work, algorithms with optimal state complexity and optimal asymptotic performance, respectively, have been obtained. Furthermore, it is necessary to develop a tool supporting Streett determinization, so as to evaluate the actual effect of the proposed algorithms and show the procedure of determinization visually. This study first introduces four different Streett determinization structures, including μ-Safra trees, H-Safra trees, compact Streett Safra trees, and LIR-H-Safra trees. By H-Safra trees, which are optimal, and μ-Safra trees, deterministic Rabin transformation automata are obtained. In addition, deterministic parity transformation automata are constructed via another two structures, where LIR-H-Safra trees are asymptotically optimal. Furthermore, based on the open-source software named graphical tool for omega-automata and logics (GOAL), the study develops a tool for Streett determinization and names it NS2DR&PT to support the four structures. Besides, corresponding test sets are constructed by randomly generating 100 Streett automata, and comparative experiments are carried out. Results show that the actual effect of state complexity in each structure is consistent with theoretical analysis. Moreover, the efficiency of different algorithms is compared and analyzed.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006595
    [Abstract] (485) [HTML] (0) [PDF 4.84 M] (1128)
    Abstract:
    ARM develops an M-Profile vector extension solution for the ARMv8.1-M microprocessor architecture and names it ARM Helium. It is declared that ARM Helium can increase the machine learning performance of the ARM Cortex-M processor by up to 15 times. As the Internet of Things develops rapidly, the correct execution of microprocessors is important. In addition, the official manual of an instruction set provides the basis for developing chip simulators and on-chip applications, and thus it is the basic guarantee of program correctness. This study examines the semantic correctness of the vectorized machine learning instructions in the official manual of the ARMv8.1-M architecture by using K Framework. Furthermore, the study automatically extracts the pseudocode describing the operations of the vectorized machine learning instructions from the manual and then formalizes it as semantic rules. With the executable framework provided by K Framework, the correctness of the machine learning instructions in arithmetic operations is verified.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006612
    Abstract:
    The security of the trusted execution environment (TEE) has drawn the attention of researchers in China and abroad. Memory tag technology utilized in a TEE helps to achieve finer-grained memory isolation and access control mechanisms. Nevertheless, prior works often rely on testing or empirical analysis to show their effectiveness, which fails to strongly guarantee functional correctness and security properties. This study proposes a general formal model framework for memory tag-based access control and introduces a security analysis method for access control based on model checking. First, a general framework for the memory tag-based access control model of a TEE is constructed through a formal method, and the access control entities are formally defined. The defined rules include access control rules and tag update rules. Then, abstract machines under the framework are incrementally designed and implemented with the formal language B. In addition, the basic properties of the machines are formalized through invariant constraints. Next, a TEE implementation called TIMBER-V is used as an application case. The TIMBER-V access control model is constructed by instantiating the abstract machines, and the security properties are formally specified. Furthermore, the functional correctness and security of the models are verified based on model checking. Finally, this study simulates specific attack scenarios, and these attacks are successfully detected. The evaluation results prove the effectiveness of the proposed security analysis method.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006538
    [Abstract] (668) [HTML] (0) [PDF 1.82 M] (1215)
    Abstract:
    Existing malware similarity measurement methods cannot cope with code obfuscation techniques and lack the ability to model the complex relationships between malware samples. To solve these problems, this study proposes a malware similarity measurement method based on a multiplex heterogeneous graph, called API relation graph enhanced multiplex heterogeneous ProxEmbed (RG-MHPE). The method first uses the dynamic and static features of malware to construct the multiplex heterogeneous graph and then proposes an enhanced proximity embedding method based on relational paths to solve the problem that proximity embedding cannot be applied directly to similarity measurement on multiplex heterogeneous graphs. In addition, this study extracts knowledge from the API documentation on the MSDN website, builds an API relation graph, and learns the similarity between Windows APIs, which effectively slows down the aging of the similarity measurement model. Finally, the experimental results show that RG-MHPE performs best in both similarity measurement and model anti-aging ability.
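    The following Python sketch illustrates, under simplifying assumptions, what a multiplex heterogeneous graph with an API relation layer can look like and how a relation-path-based proximity score might use it; the node and edge types and the Jaccard-style score are illustrative stand-ins, not the paper's RG-MHPE formulation.

```python
from collections import defaultdict

# A multiplex heterogeneous graph kept as one adjacency map per edge type.
# The layers "calls" (malware -> invoked APIs) and "similar_api" (API -> APIs
# judged similar via the API relation graph) are illustrative stand-ins.
graph = {
    "calls":       defaultdict(set),
    "similar_api": defaultdict(set),
}

def add_edge(layer, u, v):
    graph[layer][u].add(v)

def relation_path_score(m1, m2):
    """Toy proximity along the relation path
    malware -calls-> API -similar_api-> API <-calls- malware:
    Jaccard overlap of m1's APIs with the expansion of m2's APIs."""
    apis1 = graph["calls"][m1]
    apis2 = set(graph["calls"][m2])
    for a in list(apis2):
        apis2 |= graph["similar_api"][a]
    return len(apis1 & apis2) / max(1, len(apis1 | apis2))

add_edge("calls", "mal_a", "CreateFileW")
add_edge("calls", "mal_b", "CreateFileA")
add_edge("similar_api", "CreateFileA", "CreateFileW")
print(relation_path_score("mal_a", "mal_b"))   # > 0 thanks to the API relation edge
```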
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006552
    [Abstract] (602) [HTML] (0) [PDF 3.59 M] (1107)
    Abstract:
    As emerging technologies develop rapidly, domain software places new demands on development efficiency. As a declarative programming language with concise syntax and well-defined semantics, Datalog can help developers solve complex problems quickly and has therefore attracted wide attention in recent years. However, when solving real-world problems, existing single-machine Datalog engines are often limited by memory capacity and lack scalability. To solve these problems, this study designs and implements a Datalog engine based on out-of-core computation. The study first designs a series of operators supporting out-of-core computation to evaluate Datalog programs and converts each program into a C++ program built from these operators. Next, it designs a hash-based partition strategy and a minimum-replacement scheduling strategy based on search tree pruning; the corresponding partition files are then scheduled and computed to generate the final results. Based on this method, the study builds the prototype tool DDL (disk-based Datalog engine) and selects widely used real-world Datalog programs to conduct experiments on both synthetic and real-world datasets. The experimental results show that DDL achieves good performance and high scalability.
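    As a rough illustration of the out-of-core idea, the Python sketch below hash-partitions relations to disk files and performs a partition-wise join, so only one partition pair needs to be memory-resident at a time; the file layout, partition count, and join driver are assumptions for the example rather than DDL's actual operators.

```python
import os, pickle

NUM_PARTITIONS = 8

def partition(relation, key_index, name, workdir="parts"):
    """Hash-partition a relation (list of tuples) on one column and spill each
    partition to its own file, so later operators only need one partition pair
    in memory at a time."""
    os.makedirs(workdir, exist_ok=True)
    buckets = [[] for _ in range(NUM_PARTITIONS)]
    for t in relation:
        buckets[hash(t[key_index]) % NUM_PARTITIONS].append(t)
    for i, b in enumerate(buckets):
        with open(os.path.join(workdir, f"{name}.{i}"), "wb") as f:
            pickle.dump(b, f)

def join(name_r, name_s, key_r, key_s, workdir="parts"):
    """Partition-wise hash join: matching tuples can only live in the same
    partition because both sides were hashed on the join key."""
    for i in range(NUM_PARTITIONS):
        with open(os.path.join(workdir, f"{name_r}.{i}"), "rb") as f:
            r = pickle.load(f)
        with open(os.path.join(workdir, f"{name_s}.{i}"), "rb") as f:
            s = pickle.load(f)
        index = {}
        for t in r:
            index.setdefault(t[key_r], []).append(t)
        for t in s:
            for u in index.get(t[key_s], []):
                yield u + t

# edge(x, y) and reach(x, y), as in a typical Datalog transitive closure step.
partition([("a", "b"), ("b", "c")], 0, "edge")
partition([("a", "b")], 1, "reach")            # join reach(x, y) with edge(y, z)
print(list(join("edge", "reach", 0, 1)))
```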
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006618
    Abstract:
    Data races are common defects in multi-threaded programs. Traditional data race analysis methods fail to achieve both high recall and high precision, and their detection reports cannot locate the root cause of the defects. Given the advantages of Petri nets in accurate behavior description and rich analysis tools for modeling and analyzing concurrent systems, this study proposes a new data race detection method based on Petri net unfolding. First, a Petri net model of the program is built by analyzing and mining a single program execution trace. Although mined from one trace, the model implies other possible traces of the program, which reduces the false negative rate of traditional dynamic analysis methods while preserving performance. Then, an unfolding-based detection method for potential data races is proposed, which improves efficiency significantly compared with static analysis methods and can clearly show the triggering path of each data race. Finally, for each potential data race detected in the previous stage, a scheduling scheme is designed to replay the defect on the CalFuzzer platform, which eliminates false positives and ensures the authenticity of the detection results. A corresponding prototype system is developed, and the effectiveness of the proposed method is verified on open-source program instances.
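    A minimal sketch of the place/transition structure and firing rule on which such a mined model rests is shown below (Python, illustrative); the trace mining and unfolding steps are not reproduced, and the lock example is a made-up scenario showing the branching that an unfolding would explore.

```python
class PetriNet:
    """Tiny place/transition net: marking maps places to token counts;
    each transition has a set of input places and a set of output places."""
    def __init__(self, marking, transitions):
        self.marking = dict(marking)
        self.transitions = transitions   # name -> (input places, output places)

    def enabled(self, t):
        ins, _ = self.transitions[t]
        return all(self.marking.get(p, 0) > 0 for p in ins)

    def fire(self, t):
        ins, outs = self.transitions[t]
        assert self.enabled(t), f"{t} is not enabled"
        for p in ins:
            self.marking[p] -= 1
        for p in outs:
            self.marking[p] = self.marking.get(p, 0) + 1

# Two threads competing for a shared variable guarded by one lock place:
# both acquire transitions are initially enabled, which is exactly the kind
# of branching an unfolding explores when looking for potential races.
net = PetriNet(
    {"t1_ready": 1, "t2_ready": 1, "lock": 1},
    {"t1_acquire": ({"t1_ready", "lock"}, {"t1_cs"}),
     "t2_acquire": ({"t2_ready", "lock"}, {"t2_cs"})},
)
print(net.enabled("t1_acquire"), net.enabled("t2_acquire"))  # True True
net.fire("t1_acquire")
print(net.enabled("t2_acquire"))                             # False: lock taken
```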
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006593
    [Abstract] (713) [HTML] (0) [PDF 8.51 M] (1355)
    Abstract:
    In recent years, deep reinforcement learning has been widely and successfully used in sequential decision making, and it has outstanding advantages in application scenarios with high-dimensional inputs and large state spaces. However, deep reinforcement learning still faces limitations such as a lack of interpretability, inefficient initial training, and cold start. To address these issues, this study proposes a dynamic decision framework that combines explicit knowledge reasoning with deep reinforcement learning. The framework embeds prior knowledge into agent training via explicit knowledge representation and lets the knowledge reasoning results intervene in the agent's decisions during reinforcement learning, so as to improve training efficiency and model interpretability. The explicit knowledge is categorized into two kinds: heuristic acceleration knowledge and evasive safety knowledge. Heuristic acceleration knowledge intervenes in the agent's decisions during early training to speed up learning, while evasive safety knowledge keeps the agent from making catastrophic decisions and keeps the training process stable. The experimental results show that the proposed framework significantly improves training efficiency and model interpretability across different application scenarios and reinforcement learning algorithms.
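    The Python sketch below shows one plausible way the two kinds of knowledge could intervene in action selection; the rule callables, warm-up threshold, and epsilon-greedy policy are assumptions for illustration, not the paper's exact mechanism.

```python
import random

def select_action(q_values, state, episode,
                  heuristic_rule=None, safety_rule=None,
                  warmup_episodes=200, epsilon=0.1):
    """Pick an action from q_values (dict action -> value), letting explicit
    knowledge intervene:
      * early in training, a heuristic acceleration rule may dictate the action
        outright to speed up learning;
      * at any time, an evasive safety rule may veto catastrophic actions before
        the greedy/exploratory choice is made.
    Both rules are hypothetical callables: state -> action / set of forbidden actions."""
    if heuristic_rule and episode < warmup_episodes:
        suggested = heuristic_rule(state)
        if suggested in q_values:
            return suggested

    candidates = set(q_values)
    if safety_rule:
        candidates -= safety_rule(state)          # drop unsafe actions
        candidates = candidates or set(q_values)  # never leave the set empty

    if random.random() < epsilon:
        return random.choice(sorted(candidates))
    return max(candidates, key=q_values.get)

q = {"left": 0.2, "right": 0.5, "jump": -1.0}
print(select_action(q, state=None, episode=500,
                    safety_rule=lambda s: {"jump"}))   # "jump" is vetoed
```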
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006558
    Abstract:
    Feature requests are suggestions to improve existing features or requests for new features that software users propose on open platforms, and they reflect users' wishes and needs. Efficient and accurate analysis and processing of feature requests play a vital role in improving user satisfaction and product competitiveness. With users' active participation, feature requests have become an important source of software requirements. However, feature requests differ from traditional requirements in source, content, and form, so the methods for applying them to software development must also differ from those for traditional requirements. At present, a large body of research focuses on applying feature requests to software development, e.g., feature request acquisition, classification, prioritization, quality management, developer recommendation, and localization of relevant code. As related research continues to emerge, a review of feature request analysis and processing becomes increasingly necessary. This study analyzes 121 academic papers on how to analyze and process feature requests in the software development process and systematically organizes existing research results from the perspective of applying feature requests to software development. The study summarizes the research topics on feature requests, outlines the process of applying feature requests to software development, and compares it with the traditional requirements engineering process. Furthermore, it analyzes the research methods used in the different requirements engineering activities and points out their differences. Finally, future research directions for feature requests are discussed to provide guidance for researchers.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006592
    [Abstract] (1440) [HTML] (0) [PDF 4.38 M] (2468)
    Abstract:
    In recent years, artificial intelligence (AI) has developed rapidly. AI systems have penetrated people's lives and become an indispensable part of them. However, these systems require a large amount of data to train models, and data disturbances affect their results. Furthermore, as business scenarios diversify and system scale grows, the trustworthiness of AI systems has attracted wide attention. Firstly, based on the trustworthiness attributes proposed by different organizations and scholars, this study introduces nine trustworthiness attributes of AI systems. Next, in terms of data, model, and result trustworthiness, the study discusses methods for measuring the data, model, and result trustworthiness of existing AI systems and designs an evidence collection method for AI trustworthiness. Then, it summarizes the trustworthiness measurement theory and methods for AI systems. In addition, combining attribute-based software trustworthiness measurement methods with blockchain technology, the study establishes a trustworthiness measurement framework for AI systems, which includes methods for trustworthiness attribute decomposition and evidence acquisition, a federated trustworthiness measurement model, and a blockchain-based trustworthiness measurement architecture for AI systems. Finally, it describes the opportunities and challenges of trustworthiness measurement technologies for AI systems.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006351
    [Abstract] (521) [HTML] (0) [PDF 1.06 M] (1063)
    Abstract:
    Detecting sudden events in social media data streams is a popular research topic in natural language processing. However, current methods for extracting emergencies suffer from low accuracy and low efficiency. To solve these problems, this paper proposes an emergency detection method based on word correlation features, which can quickly detect emergencies from a social network data stream so that decision makers can take timely and effective countermeasures, minimizing the negative impact of emergencies and helping maintain social stability. First, noise filtering and sentiment filtering are applied to obtain microblog texts with negative emotions. Then, based on the time information, the Weibo data are divided into time slices, and the word frequency, user influence, and word frequency growth rate of each word in each time window are computed; a burst computation method is used to extract burst words. Similar words are merged according to the word2vec model, and the feature similarity of the burst words is used to build a burst word relation graph. Finally, a multi-attribute spectral clustering algorithm is used to optimally partition the word relation graph, abnormal words are tracked as the time window slides, and emergencies are judged from the structural changes caused by sudden changes of words in the subgraphs. The experimental results show that the proposed method achieves good event detection performance on real-time microblog data streams. Compared with existing methods, it can meet the needs of emergency detection: it not only detects detailed information about sub-events but also accurately detects information related to the events.
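    As a hedged illustration of the burst computation step, the Python sketch below scores each word in the current time window by combining word frequency, user influence, and frequency growth rate; the particular weighting and smoothing are assumptions made for the example, not the paper's formula.

```python
from collections import Counter

def burst_words(windows, influence, alpha=1.0, beta=1.0, top_k=5):
    """windows: list of token lists, one per time slice (oldest first);
    influence: word -> average influence of the users posting it.
    Scores each word in the newest window by frequency * influence, boosted by
    its frequency growth rate relative to the previous window. The weighting is
    an illustrative choice, not the paper's formula."""
    prev = Counter(windows[-2]) if len(windows) > 1 else Counter()
    curr = Counter(windows[-1])
    scores = {}
    for w, f in curr.items():
        growth = (f - prev.get(w, 0)) / (prev.get(w, 0) + 1)   # smoothed growth rate
        scores[w] = (alpha * f * influence.get(w, 1.0)) * (1 + beta * max(growth, 0))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

windows = [["rain", "traffic"], ["rain", "fire", "fire", "fire"]]
print(burst_words(windows, influence={"fire": 2.0}))   # "fire" ranks first
```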
    Available online:  October 18, 2017 , DOI:
    [Abstract] (2756) [HTML] (0) [PDF 525.21 K] (4459)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 315-325. Original article: https://doi.org/10.1145/3106237.3106242. Readers who wish to cite this paper should cite the original publication.
    Available online:  October 18, 2017 , DOI:
    [Abstract] (2733) [HTML] (0) [PDF 352.38 K] (5612)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 303-314. Original article: https://doi.org/10.1145/3106237.3106239. Readers who wish to cite this paper should cite the original publication.
    Available online:  September 11, 2017 , DOI:
    [Abstract] (3182) [HTML] (0) [PDF 276.42 K] (2613)
    Abstract:
    GitHub, a popular social-software-development platform, has fostered a variety of software ecosystems where projects depend on one another and practitioners interact with each other. Projects within an ecosystem often have complex inter-dependencies that impose new challenges in bug reporting and fixing. In this paper, we conduct an empirical study on cross-project correlated bugs, i.e., causally related bugs reported to different projects, focusing on two aspects: 1) how developers track the root causes across projects; and 2) how the downstream developers coordinate to deal with upstream bugs. Through manual inspection of bug reports collected from the scientific Python ecosystem and an online survey with developers, this study reveals the common practices of developers and the various factors in fixing cross-project bugs. These findings provide implications for future software bug analysis in the scope of ecosystem, as well as shed light on the requirements of issue trackers for such bugs.
    Available online:  June 21, 2017 , DOI:
    [Abstract] (3242) [HTML] (0) [PDF 169.43 K] (2737)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper has been accepted for publication in IEEE Transactions on Software Engineering (2017). Original article: http://ieeexplore.ieee.org/document/7792694. Readers who wish to cite this paper should cite the original publication.
    Available online:  June 13, 2017 , DOI:
    [Abstract] (4443) [HTML] (0) [PDF 174.91 K] (3166)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 39th International Conference on Software Engineering, pp. 27-37, Buenos Aires, Argentina, May 20-28, 2017, IEEE Press, Piscataway, NJ, USA, ©2017, ISBN: 978-1-5386-3868-2. Original article: http://dl.acm.org/citation.cfm?id=3097373. Readers who wish to cite this paper should cite the original publication.
    Available online:  January 25, 2017 , DOI:
    [Abstract] (3321) [HTML] (0) [PDF 254.98 K] (2533)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016), ACM, New York, NY, USA, pp. 871-882, DOI: https://doi.org/10.1145/2950290.2950364. Original article: http://dl.acm.org/citation.cfm?id=2950364. Readers who wish to cite this paper should cite the original publication.
    Available online:  January 18, 2017 , DOI:
    [Abstract] (3780) [HTML] (0) [PDF 472.29 K] (2488)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 133-143, Seattle, WA, USA, November 2016. Original article: http://dl.acm.org/citation.cfm?id=2950327. Readers who wish to cite this paper should cite the original publication.
    Available online:  January 04, 2017 , DOI:
    [Abstract] (3555) [HTML] (0) [PDF 293.93 K] (2318)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE'16), pp. 810-821, November 13-18, 2016. Original article: https://doi.org/10.1145/2950290.2950310. Readers who wish to cite this paper should cite the original publication.
    Available online:  January 04, 2017 , DOI:
    [Abstract] (3884) [HTML] (0) [PDF 244.61 K] (2627)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at FSE 2016. Original article: http://dl.acm.org/citation.cfm?doid=2950290.2950313. Readers who wish to cite this paper should cite the original publication.
    Available online:  December 12, 2016 , DOI:
    [Abstract] (3425) [HTML] (0) [PDF 358.69 K] (2641)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at FSE'16, in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Original article: http://dl.acm.org/citation.cfm?id=2950340. Readers who wish to cite this paper should cite the original publication.
    Available online:  September 30, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at ASE 2016 (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?id=2970366. Readers who wish to cite this paper should cite the original publication.
    Available online:  September 09, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. Junjie's paper was published at ASE 2016 (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?doid=2970276.2970300. Readers who wish to cite this paper should cite the original publication.
    Available online:  September 07, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published in ASE 2016, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. Full text: http://dx.doi.org/10.1145/2970276.2970307. Important note: readers who cite this paper should reference the original publication.
    Available online:  August 29, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published in ACM Transactions on Software Engineering and Methodology (TOSEM, Vol. 25, No. 2, Article 13, May 2016) and was invited as a "Journal First" presentation at the main ICSE 2016 conference; the full text is available at http://dl.acm.org/citation.cfm?id=2876443. The authors are Zhou Minghui, Ma Xiujuan, Zhang Lu, and Mei Hong of Peking University, and Audris Mockus of the University of Tennessee. Important note: readers who cite this paper should reference the original publication.
  • Full-text download ranking (overall / annual / per issue)
    Abstract click ranking (overall / annual / per issue)

    2003,14(7):1282-1291, DOI:
    [Abstract] (36575) [HTML] (0) [PDF 832.28 K] (77516)
    Abstract:
    Sensor networks, formed by the convergence of sensor, micro-electro-mechanical system (MEMS), and network technologies, are a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecast. Combining with existing work, hot research topics, including power-aware routing and media access control schemes, are discussed and presented in detail. Finally, taking account of application requirements, several future research directions are put forward.
    2010,21(3):427-437, DOI:
    [Abstract] (32307) [HTML] (0) [PDF 308.76 K] (36695)
    Abstract:
    Automatic generation of poetry has always been considered a hard problem in natural language generation. This paper reports some pioneering research on a possible genetic algorithm and its automatic generation of SONGCI. In light of the characteristics of ancient Chinese poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette-wheel selection, a partially mapped crossover operator, and a heuristic mutation operator. Tests show that the system built on the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic Chinese poetry generation.
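    For readers unfamiliar with the overall loop, the Python sketch below shows a generic genetic algorithm skeleton with elitism and roulette-wheel selection; the fitness, crossover, and mutation callables are placeholders standing in for the paper's tone-based coding, weighted fitness function, partially mapped crossover, and heuristic mutation operator.

```python
import random

def evolve(init_population, fitness, crossover, mutate,
           generations=100, elite=2, mutation_rate=0.1):
    """Generic GA loop with elitism plus roulette-wheel selection.
    fitness/crossover/mutate are placeholders for the domain-specific operators
    described in the paper."""
    population = list(init_population)
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        next_gen = scored[:elite]                       # elitism: keep the best as-is
        weights = [max(fitness(ind), 1e-9) for ind in population]
        while len(next_gen) < len(population):
            p1, p2 = random.choices(population, weights=weights, k=2)  # roulette wheel
            child = crossover(p1, p2)
            if random.random() < mutation_rate:
                child = mutate(child)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

# Toy stand-ins: individuals are lists of ints, fitness rewards large sums.
pop = [[random.randint(0, 9) for _ in range(8)] for _ in range(30)]
best = evolve(pop, fitness=sum,
              crossover=lambda a, b: a[:4] + b[4:],
              mutate=lambda c: c[:-1] + [random.randint(0, 9)])
print(best, sum(best))
```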
    2011,22(1):71-83, DOI:10.3724/SP.J.1001.2011.03958
    [Abstract] (29234) [HTML] (0) [PDF 781.42 K] (52185)
    Abstract:
    Cloud computing is a fundamental change happening in the field of information technology and represents a movement towards intensive, large-scale specialization. At the same time, it brings not only convenience and efficiency but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the requirements of cloud computing security, key security technologies, standards, and regulations, and provides a cloud computing security framework. The paper argues that changes in the above aspects will result in a technical revolution in the field of information security.
    2016,27(1):45-71, DOI:10.13328/j.cnki.jos.004914
    [Abstract] (28450) [HTML] (1439) [PDF 880.96 K] (28397)
    Abstract:
    Android is the most popular modern software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever, with Apple, Microsoft, Blackberry, and Firefox trailing a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2009,20(5):1337-1348, DOI:
    [Abstract] (27525) [HTML] (0) [PDF 1.06 M] (42979)
    Abstract:
    This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects. One is about the cloud infrastructure which is the building block for the up layer cloud application. The other is of course the cloud application. This paper focuses on the cloud infrastructure including the systems and current research. Some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large scale clusters which contain a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software building on top of redundant hardware instead of mere hardware. All these technologies are for the two important goals for distributed system: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to very large scale even to thousands of nodes. Availability means that the services are available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2008,19(1):48-61, DOI:
    [Abstract] (27363) [HTML] (0) [PDF 671.39 K] (59302)
    Abstract:
    This paper summarizes the research status and recent progress of clustering algorithms. First, representative clustering algorithms are analyzed and summarized in terms of algorithm ideas, key techniques, and advantages and disadvantages. Then, several typical clustering algorithms and well-known datasets are selected, and simulation experiments are conducted with respect to both accuracy and running efficiency; the clustering behavior of each algorithm on different datasets is analyzed and compared with that of other algorithms on the same datasets. Finally, by integrating the above two aspects, the research hotspots, difficulties, and shortcomings of data clustering, as well as some open problems, are addressed. This work can serve as a valuable reference for data clustering and data mining.
    2009,20(2):271-289, DOI:
    [Abstract] (26450) [HTML] (0) [PDF 675.56 K] (41065)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms before 2003, this paper discusses recent advances in EMO in detail and concludes the current research directions. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have emerged. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints on the future research of EMO are proposed.
    2005,16(1):1-7, DOI:
    [Abstract] (21620) [HTML] (0) [PDF 614.61 K] (19115)
    Abstract:
    This paper offers some reflections from the following four aspects: 1) from the law of development of things, it reviews the development history of software engineering technology; 2) from the perspective of the natural characteristics of software, it analyzes the construction of each abstraction layer of the virtual machine; 3) from the perspective of software development, it proposes the research content of the software engineering discipline and studies the pattern of industrialized software production; 4) based on the emergence of Internet technology, it explores the development trend of software technology.
    2004,15(3):428-442, DOI:
    [Abstract] (20209) [HTML] (0) [PDF 1009.57 K] (15282)
    Abstract:
    With the rapid development of e-business, Web applications have evolved from localization to globalization, from B2C (business-to-customer) to B2B (business-to-business), and from centralized to decentralized fashion. Web services are a new application model for decentralized computing and an effective mechanism for data and service integration on the Web; thus, Web services have become a solution for e-business. It is important and necessary to carry out research on new architectures for Web services, on their combination with other useful techniques, and on the integration of services. In this paper, a survey is presented of various aspects of Web services research, from the basic concepts to the principal research problems and underlying techniques, including data integration in Web services, Web service composition, semantic Web services, Web service discovery, Web service security, solutions for Web services in the P2P (peer-to-peer) computing environment, and grid services. The paper also summarizes the current state of these techniques and discusses future research topics and the challenges facing Web services.
    2010,21(8):1834-1848, DOI:
    [Abstract] (19766) [HTML] (0) [PDF 682.96 K] (53280)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
    2005,16(5):857-868, DOI:
    [Abstract] (19522) [HTML] (0) [PDF 489.65 K] (28350)
    Abstract:
    Wireless Sensor Networks, a novel technology about acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the networks, is a challenging one, and yet extremely crucial for many applications. In this paper, the evaluation criterion of the performance and the taxonomy for wireless sensor networks self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and the directions of research in this area are introduced.
    2009,20(1):54-66, DOI:
    [Abstract] (19106) [HTML] (0) [PDF 1.41 M] (48089)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks, within which the links between nodes are very dense, but between which they are quite sparse. Network clustering algorithms which aim to discover all natural network communities from given complex networks are fundamentally important for both theoretical researches and practical applications, and can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks including social networks, biological networks, World Wide Webs and so on. This paper reviews the background, the motivation, the state of arts as well as the main issues of existing works related to discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to the researchers from the communities of complex network analysis, data mining, intelligent Web and bioinformatics.
    2012,23(4):962-986, DOI:10.3724/SP.J.1001.2012.04175
    [Abstract] (18259) [HTML] (0) [PDF 2.09 M] (29275)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, which may easily lead to failures of computers or data. The huge number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure and power costs. Therefore, fault tolerance, scalability, and power consumption of the distributed storage of a data center become key parts of cloud computing technology for ensuring data availability and reliability. This paper surveys the state of the art of the key technologies in cloud computing in the following aspects: design of the data center network, organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. Firstly, many kinds of classical data center network topologies are introduced and compared. Secondly, current fault-tolerant storage techniques are discussed, and data replication and erasure code strategies are compared in particular. Thirdly, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed, and future research trends are predicted.
    2012,23(1):32-45, DOI:10.3724/SP.J.1001.2012.04091
    [Abstract] (18207) [HTML] (0) [PDF 408.86 K] (28838)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly, and parallel techniques that can be expanded cost-effectively must be invented to deal with such big data. Relational data management technology has gone through a history of nearly 40 years, but it now encounters the tough obstacle of scalability: relational techniques cannot handle large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and expanded their application from Web search to territories that used to be occupied by relational database systems, confronting relational techniques with high availability, high scalability, and massive parallel processing capability. The relational community, after losing the big deal of Web search, has begun to learn from MapReduce, while MapReduce also borrows valuable ideas from relational techniques to improve performance. Relational techniques and MapReduce compete with and learn from each other, and a new data analysis platform and eco-system are emerging. Eventually, the two camps of techniques will find their right places in the new eco-system of big data analysis.
    2009,20(3):524-545, DOI:
    [Abstract] (17089) [HTML] (0) [PDF 1.09 M] (20599)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide a direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of the software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2009,20(1):124-137, DOI:
    [Abstract] (16529) [HTML] (0) [PDF 1.06 M] (20616)
    Abstract:
    The appearance of plenty of intelligent devices equipped for short-range wireless communications boosts the fast rise of wireless ad hoc networks application. However, in many realistic application environments, nodes form a disconnected network for most of the time due to nodal mobility, low density, lossy link, etc. Conventional communication model of mobile ad hoc network (MANET) requires at least one path existing from source to destination nodes, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, and implement communications between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, captures great interests from researchers. This paper first introduces the conceptions and theories of opportunistic networks and some current typical applications. Then it elaborates the popular research problems including opportunistic forwarding mechanism, mobility model and opportunistic data dissemination and retrieval. Some other interesting research points such as communication middleware, cooperation and security problem and new applications are stated briefly. Finally, the paper concludes and looks forward to the possible research focuses for opportunistic networks in the future.
    2009,20(11):2965-2976, DOI:
    [Abstract] (16125) [HTML] (0) [PDF 442.42 K] (13876)
    Abstract:
    This paper studies uncertain graph data mining and especially investigates the problem of mining frequent subgraph patterns from uncertain graph data. A data model is introduced for representing uncertainties in graphs, and an expected support is employed to evaluate the significance of subgraph patterns. By using the apriori property of expected support, a depth-first search-based mining algorithm is proposed with an efficient method for computing expected supports and a technique for pruning search space, which reduces the number of subgraph isomorphism testings needed by computing expected support from the exponential scale to the linear scale. Experimental results show that the proposed algorithm is 3 to 5 orders of magnitude faster than a naïve depth-first search algorithm, and is efficient and scalable.
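    Under the common edge-independent uncertainty model, the probability that one specific embedding of a pattern exists is simply the product of its edge probabilities; the Python sketch below computes this building block, while expected support over a graph database aggregates per-graph containment probabilities and in general needs to combine multiple embeddings (a simplification assumed here for illustration).

```python
from math import prod

# One uncertain graph: each edge carries an independent existence probability.
uncertain_edges = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.8,
    ("a", "c"): 0.5,
}

def embedding_probability(embedding_edges):
    """Probability that all edges of one specific embedding of a subgraph
    pattern exist, assuming independent edge probabilities. (Expected support
    over a database aggregates per-graph containment probabilities, which in
    general requires combining several embeddings.)"""
    return prod(uncertain_edges.get(e, 0.0) for e in embedding_edges)

# Probability that the path a-b-c is present via edges (a,b) and (b,c): 0.72.
print(embedding_probability([("a", "b"), ("b", "c")]))
```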
    2004,15(8):1208-1219, DOI:
    [Abstract] (16123) [HTML] (0) [PDF 948.49 K] (12472)
    Abstract:
    With the explosive growth of network applications and complexity, the threat of Internet worms to network security becomes increasingly serious. Especially in the Internet environment, the variety of propagation paths and the complexity of application environments cause worms to break out with much higher frequency, lurk much deeper, and cover a much wider range, and Internet worms have become a primary issue faced by malicious code researchers. In this paper, the concept and research status of Internet worms, their functional components, and their execution mechanism are first presented; then the scanning strategies and propagation models are discussed; and finally the critical techniques of Internet worm prevention are given. Some major problems and research trends in this area are also addressed.
    2009,20(5):1226-1240, DOI:
    [Abstract] (15950) [HTML] (0) [PDF 926.82 K] (14894)
    Abstract:
    This paper introduces the concrete details of combining automated reasoning techniques with planning methods, including planning as satisfiability using propositional logic, conformant planning using modal logic and disjunctive reasoning, planning as nonmonotonic logic, and flexible planning as fuzzy description logic. After considering the experimental results of the International Planning Competition and the relevant papers, it concludes that planning methods based on automated reasoning techniques are helpful and can be adopted. It also proposes the challenges and possible hotspots.
    2003,14(10):1717-1727, DOI:
    [Abstract] (15807) [HTML] (0) [PDF 839.25 K] (13203)
    Abstract:
    Sensor networks are integration of sensor techniques, nested computation techniques, distributed computation techniques and wireless communication techniques. They can be used for testing, sensing, collecting and processing information of monitored objects and transferring the processed information to users. Sensor network is a new research area of computer science and technology and has a wide application future. Both academia and industries are very interested in it. The concepts and characteristics of the sensor networks and the data in the networks are introduced, and the issues of the sensor networks and the data management of sensor networks are discussed. The advance of the research on sensor networks and the data management of sensor networks are also presented.
    2009,20(2):350-362, DOI:
    [Abstract] (15717) [HTML] (0) [PDF 1.39 M] (38155)
    Abstract:
    This paper makes a comprehensive survey of the recommender system research aiming to facilitate readers to understand this field. First the research background is introduced, including commercial application demands, academic institutes, conferences and journals. After formally and informally describing the recommendation problem, a comparison study is conducted based on categorized algorithms. In addition, the commonly adopted benchmarked datasets and evaluation methods are exhibited and most difficulties and future directions are concluded.
    2015,26(1):62-81, DOI:10.13328/j.cnki.jos.004701
    [Abstract] (15329) [HTML] (1274) [PDF 1.04 M] (23660)
    Abstract:
    Network abstraction brings about the naissance of software-defined networking (SDN). SDN decouples the data plane and the control plane and simplifies network management. This paper starts with a discussion of the background of the emergence and development of SDN and outlines its architecture, which includes the data layer, control layer, and application layer. Then, the key technologies are elaborated according to the hierarchical architecture of SDN, and the characteristics of consistency, availability, and tolerance are especially analyzed. Moreover, the latest achievements for profiled scenes are introduced. Future works are summarized in the end.
    2014,25(4):839-862, DOI:10.13328/j.cnki.jos.004558
    [Abstract] (15049) [HTML] (1386) [PDF 1.32 M] (17709)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. The research and discussions on batch computing in big data environment are comparatively sufficient. But how to efficiently deal with stream computing to meet many requirements, such as low latency, high throughput and continuously reliable running, and how to build efficient stream big data computing systems, are great challenges in the big data computing research. This paper provides a research of the data computing architecture and the key issues in stream computing in big data environments. Firstly, the research gives a brief summary of three application scenarios of stream computing in business intelligence, marketing and public service. It also shows distinctive features of the stream computing in big data environment, such as real time, volatility, burstiness, irregularity and infinity. A well-designed stream computing system always optimizes in system structure, data transmission, application interfaces, high-availability, and so on. Subsequently, the research offers detailed analyses and comparisons of five typical and open-source stream computing systems in big data environment. Finally, the research specifically addresses some new challenges of the stream big data systems, such as scalability, fault tolerance, consistency, load balancing and throughput.
    2009,20(10):2729-2743, DOI:
    [Abstract] (14207) [HTML] (0) [PDF 1.12 M] (9949)
    Abstract:
    In a multi-hop wireless sensor network (WSN), the sensors closest to the sink tend to deplete their energy faster than other sensors, which is known as an energy hole around the sink. No more data can be delivered to the sink after an energy hole appears, while a considerable amount of energy is wasted and the network lifetime ends prematurely. This paper investigates the energy hole problem, and based on the improved corona model with levels, it concludes that the assignment of transmission ranges of nodes in different coronas is an effective approach for achieving energy-efficient network. It proves that the optimal transmission ranges for all areas is a multi-objective optimization problem (MOP), which is NP hard. The paper proposes an ACO (ant colony optimization)-based distributed algorithm to prolong the network lifetime, which can help nodes in different areas to adaptively find approximate optimal transmission range based on the node distribution. Furthermore, the simulation results indicate that the network lifetime under this solution approximates to that using the optimal list. Compared with existing algorithms, this ACO-based algorithm can not only make the network lifetime be extended more than two times longer, but also have good performance in the non-uniform node distribution.
    2012,23(1):1-20, DOI:10.3724/SP.J.1001.2012.04100
    [Abstract] (14003) [HTML] (0) [PDF 1017.73 K] (28947)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2012,23(5):1148-1166, DOI:10.3724/SP.J.1001.2012.04195
    [Abstract] (13966) [HTML] (0) [PDF 946.37 K] (16035)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
    2000,11(11):1460-1466, DOI:
    [Abstract] (13864) [HTML] (0) [PDF 520.69 K] (10262)
    Abstract:
    Intrusion detection is a highlighted topic of network security research in recent years. In this paper, first the necessity of intrusion detection is presented, and its concepts and models are described. Then, many intrusion detection techniques and architectures are summarized. Finally, the existing problems and the future direction in this field are discussed.
    2013,24(8):1786-1803, DOI:10.3724/SP.J.1001.2013.04416
    [Abstract] (13571) [HTML] (0) [PDF 1.04 M] (15205)
    Abstract:
    Many specific application oriented NoSQL database systems are developed for satisfying the new requirement of big data management. This paper surveys researches on typical NoSQL database based on key-value data model. First, the characteristics of big data, and the key technique issues supporting big data management are introduced. Then frontier efforts and research challenges are given, including system architecture, data model, access mode, index, transaction, system elasticity, load balance, replica strategy, data consistency, flash cache, MapReduce based data process and new generation data management system etc. Finally, research prospects are given.
    2004,15(4):571-583, DOI:
    [Abstract] (13479) [HTML] (0) [PDF 1005.17 K] (8898)
    Abstract:
    For most peer-to-peer file-swapping applications, sharing is a voluntary action, and peers are not held responsible for their irresponsible bartering history. This indicates that trust between participants cannot be established simply on the traditional trust mechanism. A reasonable approach to trust construction comes from social network analysis, in which trust relations between individuals are set up upon the recommendations of other individuals. Current P2P trust models cannot guarantee the convergence of the iterative trust computation and take no account of model security problems such as sybil attacks and slandering. This paper presents a novel recommendation-based global trust model and gives a distributed implementation method. Mathematical analyses and simulations show that, compared with current global trust models, the proposed model is more robust against trust security problems and more complete in the iterative computation of peer trust.
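    The sketch below shows an EigenTrust-style power iteration, t ← Cᵀt, over a row-normalized local trust matrix C, only to illustrate the kind of iterative global trust computation whose convergence such models must guarantee; it is not the paper's exact model, and the toy trust matrix is invented.

```python
def global_trust(local_trust, iterations=50, tol=1e-9):
    """local_trust[i][j]: peer i's normalized local trust in peer j
    (each row sums to 1). Repeatedly aggregate trust along recommendations,
    t <- C^T t, in the style of EigenTrust."""
    n = len(local_trust)
    t = [1.0 / n] * n                       # start from uniform trust
    for _ in range(iterations):
        new_t = [sum(local_trust[i][j] * t[i] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_t, t)) < tol:
            return new_t
        t = new_t
    return t

# Three peers; peer 2 is widely recommended and ends up with the highest trust.
C = [[0.0, 0.2, 0.8],
     [0.1, 0.0, 0.9],
     [0.5, 0.5, 0.0]]
print(global_trust(C))
```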
    2006,17(7):1588-1600, DOI:
    [Abstract] (13398) [HTML] (0) [PDF 808.73 K] (13336)
    Abstract:
    Routing technology at the network layer is pivotal in the architecture of wireless sensor networks. As an active branch of routing technology, cluster-based routing protocols excel in network topology management, energy minimization, data aggregation and so on. In this paper, cluster-based routing mechanisms for wireless sensor networks are analyzed. Cluster head selection, cluster formation and data transmission are three key techniques in cluster-based routing protocols. As viewed from the three techniques, recent representative cluster-based routing protocols are presented, and their characteristics and application areas are compared. Finally, the future research issues in this area are pointed out.
    2002,13(7):1228-1237, DOI:
    [Abstract] (13390) [HTML] (0) [PDF 500.04 K] (12839)
    Abstract:
    Software architecture (SA) has recently emerged as one of the primary research areas in software engineering and one of the key technologies for the development of large-scale software-intensive systems and software product lines. This paper summarizes the history and major directions of SA and presents the concept of SA by analyzing and comparing several classical definitions. Based on a summary of the activities involved in SA, two categories of SA research are extracted, and advances in SA research are then introduced from seven aspects. Additionally, some disadvantages of current SA research are discussed, and their causes are explained. Finally, the paper concludes with some significantly promising trends in SA research.
    2008,19(zk):112-120, DOI:
    [Abstract] (13372) [HTML] (0) [PDF 594.29 K] (13608)
    Abstract:
    An ad hoc network is a collection of wireless mobile nodes dynamically forming a temporary network without the use of any existing network infrastructure or centralized administration. Due to bandwidth constraint and dynamic topology of mobile ad hoc networks, multipath supported routing is a very important research issue. In this paper, we present an entropy-based metric to support stability multipath on-demand routing (SMDR). The key idea of SMDR protocol is to construct the new metric-entropy and select the stability multipath with the help of entropy metric to reduce the number of route reconstruction so as to provide QoS guarantee in the ad hoc network whose topology changes continuously. Simulation results show that, with the proposed multipath routing protocol, packet delivery ratio, end-to-end delay, and routing overhead ratio can be improved in most of cases. It is an available approach to multipath routing decision.
    2011,22(1):115-131, DOI:10.3724/SP.J.1001.2011.03950
    [Abstract] (13311) [HTML] (0) [PDF 845.91 K] (26535)
    Abstract:
    The Internet traffic model is the key issue for network performance management, Quality of Service management, and admission control. The paper first summarizes the primary characteristics of Internet traffic, as well as the metrics of Internet traffic. It also illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issue and points out possible future research directions in traffic modeling area.
    2009,20(1):11-29, DOI:
    [Abstract] (13258) [HTML] (0) [PDF 787.30 K] (13137)
    Abstract:
    Constrained optimization problems (COPs) are mathematical programming problems frequently encountered in the disciplines of science and engineering application. Solving COPs has become an important research area of evolutionary computation in recent years. In this paper, the state-of-the-art of constrained optimization evolutionary algorithms (COEAs) is surveyed from two basic aspects of COEAs (i.e., constraint-handling techniques and evolutionary algorithms). In addition, this paper discusses some important issues of COEAs. More specifically, several typical algorithms are analyzed in detail. Based on the analyses, it concluded that to obtain competitive results, a proper constraint-handling technique needs to be considered in conjunction with an appropriate search algorithm. Finally, the open research issues in this field are also pointed out.
    2015,26(1):26-39, DOI:10.13328/j.cnki.jos.004631
    [Abstract] (13131) [HTML] (1420) [PDF 763.52 K] (13383)
    Abstract:
    In recent years, transfer learning has provoked vast amount of attention and research. Transfer learning is a new machine learning method that applies the knowledge from related but different domains to target domains. It relaxes the two basic assumptions in traditional machine learning: (1) the training (also referred as source domain) and test data (also referred target domain) follow the independent and identically distributed (i.i.d.) condition; (2) there are enough labeled samples to learn a good classification model, aiming to solve the problems that there are few or even not any labeled data in target domains. This paper surveys the research progress of transfer learning and introduces its own works, especially the ones in building transfer learning models by applying generative model on the concept level. Finally, the paper introduces the applications of transfer learning, such as text classification and collaborative filtering, and further suggests the future research direction of transfer learning.
    2013,24(1):50-66, DOI:10.3724/SP.J.1001.2013.04276
    [Abstract] (13097) [HTML] (0) [PDF 0.00 Byte] (15491)
    Abstract:
    As an important application of acceleration in the cloud, the distributed caching technology has received considerable attention in industry and academia. This paper starts with a discussion on the combination of cloud computing and distributed caching technology, giving an analysis of its characteristics, typical application scenarios, stages of development, standards, and several key elements, which have promoted its development. In order to systematically know the state of art progress and weak points of the distributed caching technology, the paper builds a multi-dimensional framework, DctAF. This framework is constituted of 6 dimensions through analyzing the characteristics of cloud computing and boundary of the caching techniques. Based on DctAF, current techniques have been analyzed and summarized; comparisons among several influential products have also been made. Finally, the paper describes and highlights the several challenges that the cache system faces and examines the current research through in-depth analysis and comparison.
    2003,14(9):1621-1628, DOI:
    [Abstract] (12848) [HTML] (0) [PDF 680.35 K] (18212)
    Abstract:
    Recommendation systems are one of the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extremely sparse user rating data. Traditional similarity measurement methods work poorly in this situation, causing the quality of recommendation systems to decrease dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method predicts the ratings of items that users have not rated by using the similarity between items, and then uses a new similarity measure to find the target users' neighbors. The experimental results show that this method can efficiently alleviate the extreme sparsity of user rating data and provide better recommendation results than traditional collaborative filtering algorithms.
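    The Python sketch below illustrates the item-based idea under simplifying assumptions: item-item similarity is computed from co-ratings, an unrated item is predicted as a similarity-weighted average of the user's existing ratings, and the densified vectors would then be used to find neighbors; cosine similarity is used here for brevity and is not necessarily the paper's measure.

```python
import math

ratings = {   # user -> {item: rating}
    "u1": {"i1": 5, "i2": 4},
    "u2": {"i1": 4, "i3": 5},
    "u3": {"i2": 5, "i3": 4},
}
items = ["i1", "i2", "i3"]

def item_vector(item):
    """Column of the user-item rating matrix, with 0 for missing ratings."""
    return [ratings[u].get(item, 0.0) for u in sorted(ratings)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Item-item similarity from the co-rating columns.
sim = {(i, j): cosine(item_vector(i), item_vector(j))
       for i in items for j in items if i != j}

def predict(user, item):
    """Predict an unrated item as the similarity-weighted average of the user's
    existing ratings, densifying the sparse rating vector before neighbor search."""
    rated = ratings[user]
    num = sum(sim[(item, j)] * r for j, r in rated.items() if j != item)
    den = sum(sim[(item, j)] for j in rated if j != item)
    return num / den if den else 0.0

print(predict("u1", "i3"))   # filled-in rating used before finding u1's neighbors
```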
    2008,19(8):1902-1919, DOI:
    [Abstract] (12793) [HTML] (0) [PDF 521.73 K] (12615)
    Abstract:
    Visual language techniques have exhibited more advantages in describing various software artifacts than one-dimensional textual languages during software development, ranging from the requirement analysis and design to testing and maintenance, as diagrammatic and graphical notations have been well applied in modeling system. In addition to an intuitive appearance, graph grammars provide a well-established foundation for defining visual languages with the power of precise modeling and verification on computers. This paper discusses the issues and techniques for a formal foundation of visual languages, reviews related practical graphical environments, presents a spatial graph grammar formalism, and applies the spatial graph grammar to defining behavioral semantics of UML diagrams and developing a style-driven framework for software architecture design.
    2008,19(8):1947-1964, DOI:
    [Abstract] (12737) [HTML] (0) [PDF 811.11 K] (8979)
    Abstract:
    Wide-Spread deployment for interactive information visualization is difficult. Non-Specialist users need a general development method and a toolkit to support the generic data structures suited to tree, network and multi-dimensional data, special visualization techniques and interaction techniques, and well-known generic information tasks. This paper presents a model driven development method for interactive information visualization. First, an interactive information visualization interface model (IIVM) is proposed. Then, the development method for interactive information visualization based on IIVM is presented. The Daisy toolkit is introduced, which includes Daisy model builder, Daisy IIV generator and runtime framework with Daisy library. Finally, an application example is given. Experimental results show that Daisy can provide a general solution for development for interactive information visualization.
    2002,13(10):1952-1961, DOI:
    [Abstract] (12670) [HTML] (0) [PDF 570.96 K] (10727)
    Abstract:
    The crucial technologies related to personalization are introduced in this paper, which include the representation and modification of user profile, the representation of resource, the recommendation technology, and the architecture of personalization. By comparing with some existing prototype systems, the key technologies about how to implement personalization are discussed in detail. In addition, three representative personalization systems are analyzed. At last, some research directions for personalization are presented.
    2003,14(9):1635-1644, DOI:
    [Abstract] (12646) [HTML] (0) [PDF 622.06 K] (10865)
    Abstract:
    Computer forensics is the technology field that attempts to provide thorough, efficient, and secure means to investigate computer crime. Computer evidence must be authentic, accurate, complete, and convincing to juries. In this paper, the stages of computer forensics are presented, and the theories and the realization of forensics software are described. An example of forensic practice is also given. The deficiencies of computer forensics techniques and anti-forensics are also discussed. The conclusion is that, with the improvement of computer science and technology, forensics techniques will become more integrated and thorough.
    2010,21(2):231-247, DOI:
    [Abstract] (12558) [HTML] (0) [PDF 1.21 M] (15193)
    Abstract:
    In this paper, a framework is proposed for handling faults in service composition by analyzing fault requirements. Petri nets are used in the framework for fault detection and handling, focusing on the failure of available services, component failures, and network failures. The corresponding fault models are given. Based on these models, the correctness criterion of fault handling is given to analyze the fault handling model, and its correctness is proven. Finally, CTL (computational tree logic) is used to specify the related properties and the enforcement algorithm of fault analysis. The simulation results show that this method can ensure the reliability and consistency of service composition.
    2012,23(1):82-96, DOI:10.3724/SP.J.1001.2012.04101
    [Abstract] (12516) [HTML] (0) [PDF 394.07 K] (13207)
    Abstract:
    Botnets are one of the most serious threats to the Internet. Researchers have done a great deal of research and made significant progress. However, botnets keep evolving and have become more and more sophisticated. Due to the underlying security limitations of current systems and the Internet architecture, and the complexity of botnets themselves, how to effectively counter the global threat of botnets is still a very challenging issue. This paper first introduces the evolution of botnet propagation, attack, command, and control mechanisms. Then the paper summarizes recent advances in botnet defense research and categorizes them into five areas: botnet monitoring, botnet infiltration, analysis of botnet characteristics, botnet detection, and botnet disruption. The limitations of current botnet defense techniques, the evolving trend of botnets, and some possible directions for future research are also discussed.
    2010,21(7):1620-1634, DOI:
    [Abstract] (12250) [HTML] (0) [PDF 765.23 K] (18648)
    Abstract:
    As an application of mobile ad hoc networks (MANET) to intelligent transportation information systems, the most important goal of vehicular ad hoc networks (VANET) is to dramatically reduce the number of accidents and their fatal consequences. One of the most important factors contributing to this goal is the design of effective broadcast protocols. This paper briefly introduces the characteristics and application fields of VANET. Then, it discusses and compares the characteristics, performance, and application areas of various categories of broadcast protocols in VANET. Based on the characteristics of VANET and its application requirements, the paper proposes ideas and promising directions for the design of information broadcast models for inter-vehicle communication.
    2017,28(1):1-16, DOI:10.13328/j.cnki.jos.005139
    [Abstract] (12218) [HTML] (1517) [PDF 1.75 M] (7498)
    Abstract:
    The knapsack problem (KP) is a well-known combinatorial optimization problem with many variants, including the 0-1 KP, bounded KP, multi-constraint KP, multiple KP, multiple-choice KP, quadratic KP, dynamic KP, and discounted KP. KP can be considered a mathematical model extracted from a variety of real-world fields and therefore has wide applications. Evolutionary algorithms (EAs) are widely considered an efficient tool for solving KP approximately and quickly. This paper presents a survey of solving KP with EAs over the past ten years. It not only discusses various KP encoding mechanisms and the handling of infeasible individuals, but also provides useful guidelines for designing new EAs to solve KPs.
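    As a hedged illustration of the binary encoding and infeasible-solution handling that such surveys discuss (not a method taken from the paper itself), the sketch below solves a tiny 0-1 KP with a simple EA and a greedy repair operator that drops low value-density items until the capacity constraint holds.

```python
import random

def repair(x, weights, values, capacity):
    """Drop selected items with the lowest value density until feasible."""
    order = sorted((i for i, bit in enumerate(x) if bit),
                   key=lambda i: values[i] / weights[i])
    for i in order:
        if sum(w for w, bit in zip(weights, x) if bit) <= capacity:
            break
        x[i] = 0
    return x

def evolve(weights, values, capacity, pop_size=30, gens=100):
    n = len(weights)
    fitness = lambda x: sum(v for v, bit in zip(values, x) if bit)
    pop = [repair([random.randint(0, 1) for _ in range(n)],
                  weights, values, capacity) for _ in range(pop_size)]
    for _ in range(gens):
        new_pop = [max(pop, key=fitness)]                       # elitism
        while len(new_pop) < pop_size:
            p1 = max(random.sample(pop, 2), key=fitness)        # tournament
            p2 = max(random.sample(pop, 2), key=fitness)
            cut = random.randrange(1, n)
            child = p1[:cut] + p2[cut:]                         # one-point crossover
            if random.random() < 0.1:                           # bit-flip mutation
                j = random.randrange(n)
                child[j] ^= 1
            new_pop.append(repair(child, weights, values, capacity))
        pop = new_pop
    best = max(pop, key=fitness)
    return best, fitness(best)

# Toy instance: weights, values, capacity 5; optimum picks the first two items.
print(evolve([2, 3, 4, 5], [3, 4, 5, 6], capacity=5))
```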
    2008,19(7):1565-1580, DOI:
    [Abstract] (12202) [HTML] (0) [PDF 815.02 K] (14856)
    Abstract:
    Software defect prediction has been one of the active areas of software engineering since it emerged in the 1970s. It plays a very important role in the analysis of software quality and the balance of software cost. This paper investigates and discusses the motivation, evolvement, solutions, and challenges of software defect prediction technologies, and also categorizes, analyzes, and compares representative prediction technologies. Some case studies of software defect distribution models are given to aid understanding.
    2010,21(5):916-929, DOI:
    [Abstract] (11997) [HTML] (0) [PDF 944.50 K] (16188)
    Abstract:
    Data deduplication technologies can be divided into two categories: a) identical data detection techniques, and b) similar data detection and encoding techniques. This paper presents a systematic survey on these two categories of data deduplication technologies and analyzes their advantages and disadvantages. Besides, since data deduplication technologies can affect the reliability and performance of storage systems, this paper also surveys various kinds of technologies proposed to cope with these two aspects of problems. Based on the analysis of the current state of research on data deduplication technologies, this paper makes several conclusions as follows: a) How to mine data characteristic information in data deduplication has not been completely solved, and how to use data characteristic information to effectively eliminate duplicate data also needs further study; b) From the perspective of storage system design, it still needs further study how to introduce proper mechanisms to overcome the reliability limitations of data deduplication techniques and reduce the additional system overheads caused by data deduplication techniques.
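    A minimal sketch of the first category, identical data detection, under the simplifying assumptions of fixed-size chunking and SHA-1 fingerprints; production systems typically use content-defined chunking and more elaborate fingerprint indexes.

```python
import hashlib

def dedup(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each
    distinct chunk, identified by its SHA-1 fingerprint."""
    store = {}            # fingerprint -> chunk bytes
    recipe = []           # sequence of fingerprints that reconstructs the data
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        fp = hashlib.sha1(chunk).hexdigest()
        store.setdefault(fp, chunk)
        recipe.append(fp)
    return store, recipe

def restore(store, recipe):
    return b"".join(store[fp] for fp in recipe)

data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096     # repeated content
store, recipe = dedup(data)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
assert restore(store, recipe) == data
```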
    2008,19(10):2706-2719, DOI:
    [Abstract] (11941) [HTML] (0) [PDF 778.29 K] (10665)
    Abstract:
    Web search engines have become a very important tool for finding information efficiently in the massive Web data. With the explosive growth of Web data, traditional centralized search engines find it increasingly hard to keep up with people's growing information needs. With the rapid development of peer-to-peer (P2P) technology, the notion of P2P Web search has been proposed and has quickly become a research focus. The goal of this paper is to give a brief summary of current P2P Web search technologies in order to facilitate future research. First, some main challenges for P2P Web search are presented. Then, key techniques for building a feasible and efficient P2P Web search engine are reviewed, including system topology, data placement, query routing, index partitioning, collection selection, relevance ranking, and Web crawling. Finally, three recently proposed novel P2P Web search prototypes are introduced.
    2006,17(9):1848-1859, DOI:
    [Abstract] (11902) [HTML] (0) [PDF 770.40 K] (19597)
    Abstract:
    In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining fields. Highlighting the state-of-the-art challenging issues and research trends for content information processing on the Internet and in other complex applications, this paper presents a survey of the up-to-date development of machine-learning-based text categorization, covering models, algorithms, and evaluation. It is pointed out that problems such as nonlinearity, skewed data distributions, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization. Possible solutions to these problems are also discussed. Finally, some future directions of research are given.
    2004,15(12):1751-1763, DOI:
    [Abstract] (11880) [HTML] (0) [PDF 928.33 K] (7110)
    Abstract:
    This paper presents research on a children's Turing test (CTT). The main difference between our test program and others is its knowledge-based character, which is supported by a massive common-sense knowledge base. The motivation, design, techniques, experimental results, and platform (including a knowledge engine and a conversation engine) of the CTT are described in this paper. Finally, some concluding thoughts about the CTT and AI are given.
  • Full-Text Download Ranking (overall ranking / annual ranking / per-issue ranking)
    Abstract Click Ranking (overall ranking / annual ranking / per-issue ranking)

    2003,14(7):1282-1291, DOI:
    [Abstract] (36575) [HTML] (0) [PDF 832.28 K] (77516)
    Abstract:
    Sensor networks, formed by the convergence of sensor, micro-electro-mechanical system, and network technologies, are a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecast. Combined with existing work, hot research topics including power-aware routing and medium access control schemes are discussed in detail. Finally, taking account of application requirements, several future research directions are put forward.
    2008,19(1):48-61, DOI:
    [Abstract] (27363) [HTML] (0) [PDF 671.39 K] (59302)
    Abstract:
    The research status and recent progress of clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and categorized from several aspects, such as algorithm idea, key technology, and advantages and disadvantages. Then, several typical clustering algorithms and well-known data sets are selected, and simulation experiments are carried out with respect to both accuracy and running efficiency; the behavior of each algorithm on different data sets is analyzed, and different algorithms are compared on the same data sets. Finally, by integrating the two aspects above, the research hotspots, difficulties, and shortcomings of data clustering, as well as some open problems, are addressed. This work can provide a valuable reference for data clustering and data mining.
    2010,21(8):1834-1848, DOI:
    [Abstract] (19766) [HTML] (0) [PDF 682.96 K] (53280)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
    2011,22(1):71-83, DOI:10.3724/SP.J.1001.2011.03958
    [Abstract] (29234) [HTML] (0) [PDF 781.42 K] (52185)
    Abstract:
    Cloud computing is a fundamental change happening in the field of information technology and represents a movement towards intensive, large-scale specialization. On the other hand, it brings not only convenience and efficiency but also great challenges in the field of data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the major security requirements in cloud computing, key technologies, standards, regulations, etc., and provides a cloud computing security framework. This paper argues that the changes in the above aspects will result in a technical revolution in the field of information security.
    2009,20(1):54-66, DOI:
    [Abstract] (19106) [HTML] (0) [PDF 1.41 M] (48089)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks, within which the links between nodes are very dense, but between which they are quite sparse. Network clustering algorithms which aim to discover all natural network communities from given complex networks are fundamentally important for both theoretical researches and practical applications, and can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks including social networks, biological networks, World Wide Webs and so on. This paper reviews the background, the motivation, the state of arts as well as the main issues of existing works related to discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to the researchers from the communities of complex network analysis, data mining, intelligent Web and bioinformatics.
    2009,20(5):1337-1348, DOI:
    [Abstract] (27525) [HTML] (0) [PDF 1.06 M] (42979)
    Abstract:
    This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects. One is about the cloud infrastructure which is the building block for the up layer cloud application. The other is of course the cloud application. This paper focuses on the cloud infrastructure including the systems and current research. Some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large scale clusters which contain a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software building on top of redundant hardware instead of mere hardware. All these technologies are for the two important goals for distributed system: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to very large scale even to thousands of nodes. Availability means that the services are available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2009,20(2):271-289, DOI:
    [Abstract] (26450) [HTML] (0) [PDF 675.56 K] (41065)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms before 2003, the recent advances in EMO are discussed in detail. The current research directions are summarized as follows. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have come forth. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints for the future research of EMO are proposed.
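    Since Pareto dominance underlies most of the EMO algorithms surveyed above, here is a small generic sketch (not tied to any particular algorithm in the paper) of a dominance test and a non-dominated filter for minimization problems.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Return the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

pts = [(1, 5), (2, 2), (3, 1), (4, 4)]
print(non_dominated(pts))   # [(1, 5), (2, 2), (3, 1)]; (4, 4) is dominated by (2, 2)
```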
    2009,20(2):350-362, DOI:
    [Abstract] (15717) [HTML] (0) [PDF 1.39 M] (38155)
    Abstract:
    This paper makes a comprehensive survey of recommender system research with the aim of facilitating readers' understanding of this field. First, the research background is introduced, including commercial application demands, academic institutes, conferences, and journals. After formally and informally describing the recommendation problem, a comparative study is conducted on categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are presented, and the main difficulties and future directions are summarized.
    2004,15(10):1493-1504, DOI:
    [Abstract] (8877) [HTML] (0) [PDF 937.72 K] (37774)
    Abstract:
    Graphics processing units (GPUs) have been developing rapidly in recent years at a speed exceeding Moore's law, and as a result, various applications associated with computer graphics have advanced greatly. At the same time, the high processing power, parallelism, and programmability available on contemporary GPUs provide an ideal platform for general-purpose computation. Starting from an introduction to the development history and architecture of GPUs, the technical fundamentals of GPUs are described in this paper. Then, in the main part of the paper, the development of various applications of general-purpose computation on GPUs is introduced, among which fluid dynamics, algebraic computation, database operations, and spectrum analysis are discussed in detail. The experience of our work on fluid dynamics is also given, and the development of software tools in this area is introduced. Finally, a conclusion is made, and the future development and new challenges in both hardware and software in this area are discussed.
    2010,21(3):427-437, DOI:
    [Abstract] (32307) [HTML] (0) [PDF 308.76 K] (36695)
    Abstract:
    Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports some pioneering research on a possible genetic algorithm and its automatic generation of SONGCI. In light of the characteristics of Chinese ancient poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette wheel selection, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system constructed on the basis of the computational model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic generation of Chinese poetry.
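    As an illustration of one of the operators named above, a textbook version of the partially mapped crossover (PMX) for permutation encodings is sketched below; this is a generic formulation, not the paper's exact implementation.

```python
import random

def pmx(p1, p2):
    """Partially mapped crossover: copy a slice from parent 1, then fill the
    remaining positions from parent 2, resolving conflicts via the mapping
    defined by the copied slice so the child stays a valid permutation."""
    n = len(p1)
    c1, c2 = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[c1:c2] = p1[c1:c2]
    segment = set(p1[c1:c2])
    for i in list(range(0, c1)) + list(range(c2, n)):
        gene = p2[i]
        while gene in segment:            # follow the slice mapping until free
            gene = p2[p1.index(gene)]
        child[i] = gene
    return child

print(pmx([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]))   # a valid child permutation
```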
    2013,24(11):2476-2497, DOI:10.3724/SP.J.1001.2013.04486
    [Abstract] (9765) [HTML] (0) [PDF 1.14 M] (32687)
    Abstract:
    Probabilistic graphical models are powerful tools for compactly representing complex probability distributions, efficiently computing (approximate) marginal and conditional distributions, and conveniently learning parameters and hyperparameters in probabilistic models. As a result, they have been widely used in applications that require some sort of automated probabilistic reasoning, such as computer vision and natural language processing, as a formal approach to deal with uncertainty. This paper surveys the basic concepts and key results of representation, inference and learning in probabilistic graphical models, and demonstrates their uses in two important probabilistic models. It also reviews some recent advances in speeding up classic approximate inference algorithms, followed by a discussion of promising research directions.
    2014,25(9):1889-1908, DOI:10.13328/j.cnki.jos.004674
    [Abstract] (11294) [HTML] (1558) [PDF 550.98 K] (32380)
    Abstract:
    This paper first introduces the key features of big data in different processing modes and their typical application scenarios, as well as corresponding representative processing systems. It then summarizes three development trends of big data processing systems. Next, the paper gives a brief survey on system supported analytic technologies and applications (including deep learning, knowledge computing, social computing, and visualization), and summarizes the key roles of individual technologies in big data analysis and understanding. Finally, the paper lays out three grand challenges of big data processing and analysis, i.e., data complexity, computation complexity, and system complexity. Potential ways for dealing with each complexity are also discussed.
    2012,23(4):962-986, DOI:10.3724/SP.J.1001.2012.04175
    [Abstract] (18259) [HTML] (0) [PDF 2.09 M] (29275)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, which may easily lead to failures of computers or data. Such a large number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure costs and power costs. Therefore, fault tolerance, scalability, and power consumption of the distributed storage of a data center become key parts of cloud computing technology, in order to ensure data availability and reliability. In this paper, a survey is made of the state of the art of the key technologies in cloud computing in the following aspects: design of the data center network, organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. Firstly, several classical topologies of data center networks are introduced and compared. Secondly, current fault-tolerant storage techniques are discussed, and data replication and erasure coding strategies are compared in particular. Thirdly, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
    2012,23(1):1-20, DOI:10.3724/SP.J.1001.2012.04100
    [Abstract] (14003) [HTML] (0) [PDF 1017.73 K] (28947)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2012,23(1):32-45, DOI:10.3724/SP.J.1001.2012.04091
    [Abstract] (18207) [HTML] (0) [PDF 408.86 K] (28838)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can be expanded cost-effectively should be invented to deal with such big data. Relational data management techniques have gone through a history of nearly 40 years. Now they encounter the tough obstacle of scalability, as relational techniques cannot handle large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, emerge as a new force and expand their applications from Web search to territories that used to be occupied by relational database systems. They confront relational techniques with high availability, high scalability, and massive parallel processing capability. The relational technique community, after losing the battle of Web search, has begun to learn from MapReduce, while MapReduce also borrows valuable ideas from the relational community to improve performance. Relational techniques and MapReduce compete with and learn from each other; a new data analysis platform and a new data analysis ecosystem are emerging. Finally, the two camps of techniques will find their right places in the new ecosystem of big data analysis.
    2016,27(1):45-71, DOI:10.13328/j.cnki.jos.004914
    [Abstract] (28450) [HTML] (1439) [PDF 880.96 K] (28397)
    Abstract:
    Android is the most popular modern software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever. Apple, Microsoft, Blackberry, and Firefox trailed a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2005,16(5):857-868, DOI:
    [Abstract] (19522) [HTML] (0) [PDF 489.65 K] (28350)
    Abstract:
    Wireless Sensor Networks, a novel technology about acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the networks, is a challenging one, and yet extremely crucial for many applications. In this paper, the evaluation criterion of the performance and the taxonomy for wireless sensor networks self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and the directions of research in this area are introduced.
    2018,29(5):1471-1514, DOI:10.13328/j.cnki.jos.005519
    [Abstract] (5214) [HTML] (1789) [PDF 4.38 M] (27900)
    Abstract:
    Computer-aided detection/diagnosis (CAD) can improve the accuracy of diagnosis, reduce false positives, and provide decision support for doctors. The main purpose of this paper is to analyze the latest developments in computer-aided diagnosis tools. Focusing on the incidence sites of the four most fatal cancers, major recent publications on CAD applications in different medical imaging areas are reviewed in this survey according to different imaging techniques and diseases. Furthermore, a multidimensional analysis is made of the research from the perspectives of image data sets, algorithms, and evaluation methods. Finally, existing problems, research trends, and development directions in the field of medical image CAD systems are discussed.
    2011,22(1):115-131, DOI:10.3724/SP.J.1001.2011.03950
    [Abstract] (13311) [HTML] (0) [PDF 845.91 K] (26535)
    Abstract:
    The Internet traffic model is a key issue for network performance management, quality of service management, and admission control. This paper first summarizes the primary characteristics of Internet traffic, as well as the metrics of Internet traffic. It also illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
    2013,24(1):77-90, DOI:10.3724/SP.J.1001.2013.04339
    [Abstract] (10962) [HTML] (0) [PDF 0.00 Byte] (25157)
    Abstract:
    Task parallel programming model is a widely used parallel programming model on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and the supporting mechanism used in task parallel programming models and discusses issues and the latest achievements from three perspectives: Parallelism expression, data management and task scheduling. In the end, some future trends in this area are discussed.
    2015,26(1):62-81, DOI:10.13328/j.cnki.jos.004701
    [Abstract] (15329) [HTML] (1274) [PDF 1.04 M] (23660)
    Abstract:
    Network abstraction brings about the naissance of software-defined networking (SDN). SDN decouples the data plane and control plane, and simplifies network management. The paper starts with a discussion of the background of the naissance and development of SDN, outlining its architecture, which includes the data layer, control layer, and application layer. Then the key technologies are elaborated according to the hierarchical architecture of SDN; the characteristics of consistency, availability, and tolerance are especially analyzed. Moreover, the latest achievements for profiled scenes are introduced. Future work is summarized in the end.
    2017,28(4):959-992, DOI:10.13328/j.cnki.jos.005143
    [Abstract] (8599) [HTML] (1758) [PDF 3.58 M] (21791)
    Abstract:
    The development of mobile internet and the popularity of mobile terminals produce massive trajectory data of moving objects under the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in the city and the changes of atmospheric environment. However, trajectory data also can be exploited to disclose moving objects' privacy information (e.g., behaviors, hobbies and social relationships). Accordingly, attackers can easily access moving objects' privacy information by digging into their trajectory data such as activities and check-in locations. In another front of research, quantum computation presents an important theoretical direction to mine big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to handle trajectory big data could make some complex problem solvable and achieve higher efficiency. This paper reviews the key technologies of processing trajectory data. First the concept and characteristics of trajectory data is introduced, and the pre-processing methods, including noise filtering and data compression, are summarized. Then, the trajectory indexing and querying techniques, and the current achievements of mining trajectory data, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preserving with respect to trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing framework and data visualization, are presented in detail. Some possible ways of applying quantum computation into trajectory data processing, as well as the implementation of some core trajectory mining algorithms by quantum computation are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
    2021,32(2):349-369, DOI:10.13328/j.cnki.jos.006138
    [Abstract] (6386) [HTML] (2720) [PDF 2.36 M] (21285)
    Abstract:
    Few-shot learning is defined as learning models to solve problems from small samples. In recent years, under the trend of training model with big data, machine learning and deep learning have achieved success in many fields. However, in many application scenarios in the real world, there is not a large amount of data or labeled data for model training, and labeling a large number of unlabeled samples will cost a lot of manpower. Therefore, how to use a small number of samples for learning has become a problem that needs to be paid attention to at present. This paper systematically combs the current approaches of few-shot learning. It introduces each kind of corresponding model from the three categories: fine-tune based, data augmentation based, and transfer learning based. Then, the data augmentation based approaches are subdivided into unlabeled data based, data generation based, and feature augmentation based approaches. The transfer learning based approaches are subdivided into metric learning based, meta-learning based, and graph neural network based methods. In the following, the paper summarizes the few-shot datasets and the results in the experiments of the aforementioned models. Next, the paper summarizes the current situation and challenges in few-shot learning. Finally, the future technological development of few-shot learning is prospected.
    2011,22(6):1299-1315, DOI:10.3724/SP.J.1001.2011.03993
    [Abstract] (10459) [HTML] (0) [PDF 987.90 K] (20854)
    Abstract:
    Attribute-Based encryption (ABE) scheme takes attributes as the public key and associates the ciphertext and user’s secret key with attributes, so that it can support expressive access control policies. This dramatically reduces the cost of network bandwidth and sending node’s operation in fine-grained access control of data sharing. Therefore, ABE has a broad prospect of application in the area of fine-grained access control. After analyzing the basic ABE system and its two variants, Key-Policy ABE (KP-ABE) and Ciphertext-Policy ABE (CP-ABE), this study elaborates the research problems relating to ABE systems, including access structure design for CP-ABE, attribute key revocation, key abuse and multi-authorities ABE with an extensive comparison of their functionality and performance. Finally, this study discusses the need-to-be solved problems and main research directions in ABE.
    2009,20(1):124-137, DOI:
    [Abstract] (16529) [HTML] (0) [PDF 1.06 M] (20616)
    Abstract:
    The appearance of plenty of intelligent devices equipped for short-range wireless communications boosts the fast rise of wireless ad hoc networks application. However, in many realistic application environments, nodes form a disconnected network for most of the time due to nodal mobility, low density, lossy link, etc. Conventional communication model of mobile ad hoc network (MANET) requires at least one path existing from source to destination nodes, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, and implement communications between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, captures great interests from researchers. This paper first introduces the conceptions and theories of opportunistic networks and some current typical applications. Then it elaborates the popular research problems including opportunistic forwarding mechanism, mobility model and opportunistic data dissemination and retrieval. Some other interesting research points such as communication middleware, cooperation and security problem and new applications are stated briefly. Finally, the paper concludes and looks forward to the possible research focuses for opportunistic networks in the future.
    2009,20(3):524-545, DOI:
    [Abstract] (17089) [HTML] (0) [PDF 1.09 M] (20599)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide a direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of the software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2006,17(9):1848-1859, DOI:
    [Abstract] (11902) [HTML] (0) [PDF 770.40 K] (19597)
    Abstract:
    In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining fields. Highlighting the state-of-the-art challenging issues and research trends for content information processing on the Internet and in other complex applications, this paper presents a survey of the up-to-date development of machine-learning-based text categorization, covering models, algorithms, and evaluation. It is pointed out that problems such as nonlinearity, skewed data distributions, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization. Possible solutions to these problems are also discussed. Finally, some future directions of research are given.
    2004,15(11):1583-1594, DOI:
    [Abstract] (8302) [HTML] (0) [PDF 1.57 M] (19370)
    Abstract:
    Uncertainty exists widely in the subjective and objective world. Among all kinds of uncertainty, randomness and fuzziness are the most important and fundamental. In this paper, the relationship between randomness and fuzziness is discussed. Uncertain states and their changes can be measured by entropy and hyper-entropy respectively. Taking advantage of entropy and hyper-entropy, the uncertainty of chaos, fractals, and complex networks arising from their various evolutions and differentiations is further studied. A simple and effective way is proposed to simulate uncertainty by means of knowledge representation, which provides a basis for the automation of both logic thinking and image thinking with uncertainty. AI (artificial intelligence) with uncertainty is a new cross-discipline, which covers computer science, physics, mathematics, brain science, psychology, cognitive science, biology, and philosophy, and results in the automation of representation, processing, and thinking for uncertain information and knowledge.
    2005,16(1):1-7, DOI:
    [Abstract] (21620) [HTML] (0) [PDF 614.61 K] (19115)
    Abstract:
    The paper offers some reflections on the following four aspects: 1) from the law of the development of things, revealing the development history of software engineering technology; 2) from the perspective of software's natural characteristics, analyzing the construction of each abstraction layer of the virtual machine; 3) from the perspective of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the appearance of Internet technology, exploring the development trend of software technology.
    2012,23(8):2058-2072, DOI:10.3724/SP.J.1001.2012.04237
    [Abstract] (9752) [HTML] (0) [PDF 800.05 K] (18959)
    Abstract:
    The Distributed denial of service (DDoS) attack is a major threat to the current network. Based on the attack packet level, the study divides DDoS attacks into network-level DDoS attacks and application-level DDoS attacks. Next, the study analyzes the detection and control methods of these two kinds of DDoS attacks in detail, and it also analyzes the drawbacks of different control methods implemented in different network positions. Finally, the study analyzes the drawbacks of the current detection and control methods, the development trend of the DDoS filter system, and corresponding technological challenges are also proposed.
    2014,25(1):37-50, DOI:10.13328/j.cnki.jos.004497
    [Abstract] (9291) [HTML] (1206) [PDF 929.87 K] (18681)
    Abstract:
    This paper surveys the state of the art of speech emotion recognition (SER), and presents an outlook on the trend of future SER technology. First, the survey summarizes and analyzes SER in detail from five perspectives, including emotion representation models, representative emotional speech corpora, emotion-related acoustic features extraction, SER methods and applications. Then, based on the survey, the challenges faced by current SER research are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, and presents detailed comparison and analysis between these methods.
    2010,21(7):1620-1634, DOI:
    [Abstract] (12250) [HTML] (0) [PDF 765.23 K] (18648)
    Abstract:
    As an application of mobile ad hoc networks (MANET) to intelligent transportation information systems, the most important goal of vehicular ad hoc networks (VANET) is to dramatically reduce the number of accidents and their fatal consequences. One of the most important factors contributing to this goal is the design of effective broadcast protocols. This paper briefly introduces the characteristics and application fields of VANET. Then, it discusses and compares the characteristics, performance, and application areas of various categories of broadcast protocols in VANET. Based on the characteristics of VANET and its application requirements, the paper proposes ideas and promising directions for the design of information broadcast models for inter-vehicle communication.
    2003,14(9):1621-1628, DOI:
    [Abstract] (12848) [HTML] (0) [PDF 680.35 K] (18212)
    Abstract:
    Recommendation systems are among the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extreme sparsity of the user rating data. Traditional similarity measures perform poorly in this situation, and the quality of recommendation degrades dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. The method first predicts the ratings of items that users have not rated from the similarity between items, and then uses a new similarity measure to find the target user's neighbors. The experimental results show that this method can effectively alleviate the extreme sparsity of user rating data and provide better recommendations than traditional collaborative filtering algorithms.
    2005,16(10):1743-1756, DOI:
    [Abstract] (9676) [HTML] (0) [PDF 545.62 K] (18091)
    Abstract:
    This paper presents a survey on the theory of provable security and its applications to the design and analysis of security protocols. It clarifies what the provable security is, explains some basic notions involved in the theory of provable security and illustrates the basic idea of random oracle model. It also reviews the development and advances of provably secure public-key encryption and digital signature schemes, in the random oracle model or the standard model, as well as the applications of provable security to the design and analysis of session-key distribution protocols and their advances.
    2018,29(10):2966-2994, DOI:10.13328/j.cnki.jos.005551
    [Abstract] (8310) [HTML] (2460) [PDF 610.06 K] (17911)
    Abstract:
    In recent years, the rapid development of Internet technology and Web applications has triggered the explosion of various data on the Internet, which generates a large amount of valuable knowledge. How to organize, represent and analyze these knowledge has attracted much attention. Knowledge graph was thus developed to organize these knowledge in a semantical and visualized manner. Knowledge reasoning over knowledge graph then becomes one of the hot research topics and plays an important role in many applications such as vertical search and intelligent question-answer. The goal of knowledge reasoning over knowledge graph is to infer new facts or identify erroneous facts according to existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graph is more diversified, due to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graph. Starting with the basic concept of knowledge reasoning, this paper presents a survey on the recently developed methods for knowledge reasoning over knowledge graph. Specifically, the research progress is reviewed in detail from two aspects:One-Step reasoning and multi-step reasoning, each including rule based reasoning, distributed embedding based reasoning, neural network based reasoning and hybrid reasoning. Finally, future research directions and outlook of knowledge reasoning over knowledge graph are discussed.
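    As a hedged sketch of the "distributed embedding based reasoning" family mentioned above, the snippet below scores candidate triples with a TransE-style translation criterion (h + r ≈ t); the embeddings are random placeholders, so it only illustrates the scoring mechanics, not a trained model or a method proposed in the surveyed paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings for entities and relations (in practice these are learned).
entities = {name: rng.normal(size=8)
            for name in ["Paris", "France", "Berlin", "Germany"]}
relations = {"capital_of": rng.normal(size=8)}

def transe_score(h, r, t):
    """TransE plausibility: smaller ||h + r - t|| means a more plausible triple,
    so the negated norm is returned (higher is better)."""
    return -np.linalg.norm(entities[h] + relations[r] - entities[t])

# Rank candidate tails for the query (Paris, capital_of, ?).
candidates = ["France", "Germany", "Berlin"]
ranked = sorted(candidates,
                key=lambda t: transe_score("Paris", "capital_of", t),
                reverse=True)
print(ranked)   # ordering is arbitrary here because the embeddings are random
```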
    2013,24(5):1078-1097, DOI:10.3724/SP.J.1001.2013.04390
    [Abstract] (11465) [HTML] (0) [PDF 1.74 M] (17830)
    Abstract:
    The control and data planes are decoupled in software-defined networking, which provide a new solution for research on new network applications and future Internet technologies. The development status of OpenFlow-based SDN technologies is surveyed in this paper. The research background of decoupled architecture of network control and data transmission in OpenFlow network is summarized first, and the key components and research progress including OpenFlow switch, controller, and SDN technologies are introduced. Moreover, current problems and solutions of OpenFlow-based SDN technologies are analyzed in four aspects. Combined with the development status in recent years, the applications used in campus, data center, network management and network security are summarized. Finally, future research trends are discussed.
    2013,24(2):295-316, DOI:10.3724/SP.J.1001.2013.04336
    [Abstract] (9656) [HTML] (0) [PDF 0.00 Byte] (17779)
    Abstract:
    Under the new application mode, the traditional hierarchy data centers face several limitations in size, bandwidth, scalability, and cost. In order to meet the needs of new applications, data center network should fulfill the requirements with low-cost, such as high scalability, low configuration overhead, robustness and energy-saving. First, the shortcomings of the traditional data center network architecture are summarized, and new requirements are pointed out. Secondly, the existing proposals are divided into two categories, i.e. server-centric and network-centric. Then, several representative architectures of these two categories are overviewed and compared in detail. Finally, the future directions of data center network are discussed.
    2014,25(4):839-862, DOI:10.13328/j.cnki.jos.004558
    [Abstract] (15049) [HTML] (1386) [PDF 1.32 M] (17709)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. The research and discussion on batch computing in the big data environment are comparatively sufficient. However, how to efficiently handle stream computing so as to meet requirements such as low latency, high throughput, and continuously reliable running, and how to build efficient stream big data computing systems, remain great challenges in big data computing research. This paper provides a study of the data computing architecture and the key issues of stream computing in big data environments. Firstly, it gives a brief summary of three application scenarios of stream computing in business intelligence, marketing, and public service, and shows the distinctive features of stream computing in the big data environment, such as real-time processing, volatility, burstiness, irregularity, and infinity. A well-designed stream computing system always optimizes its system structure, data transmission, application interfaces, high availability, and so on. Subsequently, the paper offers detailed analyses and comparisons of five typical open-source stream computing systems in the big data environment. Finally, it specifically addresses some new challenges of stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2020,31(7):2245-2282, DOI:10.13328/j.cnki.jos.006037
    [Abstract] (2522) [HTML] (1641) [PDF 967.02 K] (17160)
    Abstract:
    Ultrasonography is the first choice of imaging examination and preoperative evaluation for thyroid and breast cancer. However, the ultrasonic characteristics of benign and malignant nodules commonly overlap, and diagnosis relies heavily on the operator's experience rather than on quantitative and stable methods. In recent years, medical imaging analysis based on computer technology has developed rapidly, and a series of landmark breakthroughs have been made, which provide effective decision support for medical imaging diagnosis. In this work, the research progress of computer vision and image recognition technologies for thyroid and breast ultrasound images is studied. A series of key technologies involved in the automatic diagnosis of ultrasound images forms the main line of the work. The major algorithms in recent years are summarized and analyzed, such as ultrasound image preprocessing, lesion localization and segmentation, and feature extraction and classification. Moreover, a multi-dimensional analysis is made of the algorithms, data sets, and evaluation methods. Finally, existing problems related to the automatic analysis of these two kinds of ultrasound imaging are discussed, and the research trends and development directions in the field of ultrasound image analysis are presented.
    2010,21(7):1605-1619, DOI:
    [Abstract] (9677) [HTML] (0) [PDF 856.25 K] (17082)
    Abstract:
    The rapid development of the Internet leads to an increase in system complexity and uncertainty. Traditional network management cannot meet the requirements and shall evolve into fusion-based cyberspace situational awareness (CSA). Based on an analysis of functional shortcomings and development requirements, this paper introduces CSA along with its origin, concept, objectives, and characteristics. First, a CSA research framework is proposed and the research history is investigated, based on which the main aspects and existing issues of the research are analyzed. Meanwhile, assessment methods are divided into three categories: mathematical models, knowledge reasoning, and pattern recognition. Then, this paper discusses CSA from three aspects: models, knowledge representation, and assessment methods, and goes into detail about the main ideas, assessment processes, merits, and shortcomings of novel methods; many typical methods are compared. The current application research of CSA in the fields of security, transmission, survivability, system evaluation, and so on is presented. Finally, this paper points out the development directions of CSA and offers conclusions from the perspectives of the issue system, technical system, and application system.
    2009,20(6):1393-1405, DOI:
    [Abstract] (11698) [HTML] (0) [PDF 831.86 K] (17069)
    Abstract:
    Combinatorial testing can use a small number of test cases to test systems while preserving fault detection ability. However, the complexity of test case generation problem for combinatorial testing is NP-complete. The efficiency and complexity of this testing method have attracted many researchers from the area of combinatorics and software engineering. This paper summarizes the research works on this topic in recent years. They include: various combinatorial test criteria, the relations between the test generation problem and other NP-complete problems, the mathematical methods for constructing test cases, the computer search techniques for test generation and fault localization techniques based on combinatorial testing.
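    To make the pairwise (2-way) coverage criterion concrete, the following small checker (a generic sketch, not taken from the paper) verifies whether a test suite exercises every pair of parameter values.

```python
from itertools import combinations, product

def uncovered_pairs(parameters, tests):
    """Return the 2-way value combinations not exercised by any test.

    parameters: list of value domains, e.g. [["a", "b"], [0, 1], ["x", "y"]]
    tests: list of tuples, one value per parameter.
    """
    missing = set()
    for (i, di), (j, dj) in combinations(enumerate(parameters), 2):
        for vi, vj in product(di, dj):
            if not any(t[i] == vi and t[j] == vj for t in tests):
                missing.add(((i, vi), (j, vj)))
    return missing

params = [["a", "b"], [0, 1], ["x", "y"]]
suite = [("a", 0, "x"), ("a", 1, "y"), ("b", 0, "y"), ("b", 1, "x")]
print(uncovered_pairs(params, suite))   # empty set: 4 tests cover all 12 pairs
```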
    2008,19(11):2803-2813, DOI:
    [Abstract] (8938) [HTML] (0) [PDF 319.20 K] (16845)
    Abstract:
    A semi-supervised clustering method based on the affinity propagation (AP) algorithm is proposed in this paper. AP takes as input measures of similarity between pairs of data points. Compared with existing clustering algorithms such as K-center clustering, AP is an efficient and fast clustering algorithm for large data sets, but for data sets with complex cluster structures it cannot produce good clustering results. The clustering performance of AP can be improved by using a priori known labeled data or pairwise constraints to adjust the similarity matrix. Experimental results show that this method indeed reaches its goal on complex data sets, and it outperforms the comparative methods when there are a large number of pairwise constraints.
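    A minimal sketch of this idea using scikit-learn's AffinityPropagation with a precomputed similarity matrix; the constraint weights below are illustrative assumptions, not the paper's exact settings. Must-link pairs get their similarity raised and cannot-link pairs get it lowered before AP is run.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def constrained_ap(X, must_link=(), cannot_link=()):
    """Run affinity propagation on a negative-squared-distance similarity
    matrix adjusted by pairwise constraints."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = -d2                                  # base similarity
    hi, lo = S.max(), S.min()
    for i, j in must_link:                   # pull constrained pairs together
        S[i, j] = S[j, i] = hi
    for i, j in cannot_link:                 # push constrained pairs apart
        S[i, j] = S[j, i] = lo
    ap = AffinityPropagation(affinity="precomputed", random_state=0)
    return ap.fit_predict(S)

X = np.array([[0, 0], [0.2, 0.1], [5, 5], [5.1, 4.9], [2.5, 2.5]])
print(constrained_ap(X, must_link=[(4, 0)], cannot_link=[(4, 2)]))
```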
    2009,20(8):2241-2254, DOI:
    [Abstract] (6499) [HTML] (0) [PDF 1.99 M] (16803)
    Abstract:
    Inspired by the idea of data fields, a community discovery algorithm based on topological potential is proposed. The basic idea is that a topological potential function is introduced to analytically model the virtual interaction among all nodes in a network and, by regarding each community as a local high-potential area, the community structure in the network can be uncovered by detecting all local high-potential areas margined by low-potential nodes. The experiments on some real-world networks show that the algorithm requires no input parameters and can discover the intrinsic or even overlapping community structure in networks. The time complexity of the algorithm is O(m+n^(3/γ))~O(n^2), where n is the number of nodes to be explored, m is the number of edges, and 2<γ<3 is a constant.
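    A rough sketch of the topological potential idea: each node receives Gaussian-decayed contributions from the others based on hop distance, and local maxima of the resulting field would be treated as community cores. The impact factor sigma is an assumed parameter here; the paper's exact potential function and its parameter-free selection are not reproduced.

```python
import math
import networkx as nx

def topological_potential(G, sigma=1.0):
    """Potential of each node: sum of Gaussian-decayed contributions from
    every other node, based on shortest-path hop distance."""
    dist = dict(nx.all_pairs_shortest_path_length(G))
    return {v: sum(math.exp(-(d / sigma) ** 2)
                   for u, d in dist[v].items() if u != v)
            for v in G}

# Two triangles joined by the edge (2, 3); the bridge endpoints 2 and 3
# accumulate the highest potential in this toy graph.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)])
phi = topological_potential(G)
print({v: round(p, 3) for v, p in sorted(phi.items())})
```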
    2009,20(3):567-582, DOI:
    [Abstract] (8070) [HTML] (0) [PDF 780.38 K] (16345)
    Abstract:
    The research on software quality models and software quality evaluation models has always been a hot topic in the area of software quality assurance and assessment. A great amount of domestic and foreign research has been done on building software quality models and quality assessment models, and so far certain accomplishments have been achieved in these areas. In recent years, platform building and systematization have become the trends in developing basic software based on operating systems. Therefore, the quality evaluation of foundational software platforms becomes an essential issue to be solved. This article analyzes and summarizes the current development of research on software quality models and software quality assessment models, focusing on summarizing and depicting the development process of quality evaluation of foundational software platforms. It also briefly discusses the future development of research on quality assessment of foundational software platforms, trying to establish a good foundation for it.
    2009,20(8):2199-2213, DOI:
    [Abstract] (10117) [HTML] (0) [PDF 2.05 M] (16310)
    Abstract:
    This paper analyzes previous studies of applying P2P technology to the mobile Internet. It first introduces P2P technology and the concept of the mobile Internet, and presents the challenges and service patterns of P2P technology in the mobile Internet. Second, the architectures of P2P technology in the mobile Internet are described in terms of centralized architecture, super-node architecture, and ad hoc architecture, respectively. Furthermore, the resource location algorithms and cross-layer optimizations are introduced based on two different terminal access patterns. Detailed analyses of the different key technologies are presented and their disadvantages are pointed out. Finally, this paper outlines future research directions.
    2010,21(5):916-929, DOI:
    [Abstract] (11997) [HTML] (0) [PDF 944.50 K] (16188)
    Abstract:
    Data deduplication technologies can be divided into two categories: a) identical data detection techniques, and b) similar data detection and encoding techniques. This paper presents a systematic survey on these two categories of data deduplication technologies and analyzes their advantages and disadvantages. Besides, since data deduplication technologies can affect the reliability and performance of storage systems, this paper also surveys various kinds of technologies proposed to cope with these two aspects of problems. Based on the analysis of the current state of research on data deduplication technologies, this paper makes several conclusions as follows: a) How to mine data characteristic information in data deduplication has not been completely solved, and how to use data characteristic information to effectively eliminate duplicate data also needs further study; b) From the perspective of storage system design, it still needs further study how to introduce proper mechanisms to overcome the reliability limitations of data deduplication techniques and reduce the additional system overheads caused by data deduplication techniques.
    2017,28(1):160-183, DOI:10.13328/j.cnki.jos.005136
    [Abstract] (8386) [HTML] (2202) [PDF 3.12 M] (16093)
    Abstract:
    Image segmentation is the process of dividing an image into a number of regions with similar properties, and it is a preprocessing step for many image processing tasks. In recent years, domestic and foreign scholars have mainly focused on content-based image segmentation algorithms. Based on extensive research of the existing literature and the latest achievements, this paper categorizes image segmentation algorithms into three types: graph theory based methods, pixel clustering based methods, and semantic segmentation methods. The basic ideas, advantages, and disadvantages of typical algorithms in each category, especially the most recent image semantic segmentation algorithms based on deep neural networks, are analyzed, compared, and summarized. Furthermore, the paper introduces the data sets commonly used as benchmarks in image segmentation and the evaluation criteria for algorithms, and compares several image segmentation algorithms with experiments. Finally, some potential future research directions are discussed.
    2013,24(4):825-842, DOI:10.3724/SP.J.1001.2013.04369
    [Abstract] (7982) [HTML] (0) [PDF 1.09 M] (16046)
    Abstract:
    Honeypot is a proactive defense technology, introduced by the defense side to change the asymmetric situation of a network attack and defensive game. Through the deployment of the honeypots, i.e. security resources without any production purpose, the defenders can deceive attackers to illegally take advantage of the honeypots and capture and analyze the attack behaviors to understand the attack tools and methods, and to learn the intentions and motivations. Honeypot technology has won the sustained attention of the security community to make considerable progress and get wide application, and has become one of the main technical means of the Internet security threat monitoring and analysis. In this paper, the origin and evolution process of the honeypot technology are presented first. Next, the key mechanisms of honeypot technology are comprehensively analyzed, the development process of the honeypot deployment structure is also reviewed, and the latest applications of honeypot technology in the directions of Internet security threat monitoring, analysis and prevention are summarized. Finally, the problems of honeypot technology, development trends and further research directions are discussed.
    2012,23(5):1148-1166, DOI:10.3724/SP.J.1001.2012.04195
    [Abstract] (13966) [HTML] (0) [PDF 946.37 K] (16035)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
    2016,27(3):691-713, DOI:10.13328/j.cnki.jos.004948
    [Abstract] (8959) [HTML] (1046) [PDF 2.43 M] (15981)
    Abstract:
    Learning to rank (L2R) techniques try to solve sorting problems using machine learning methods, and have been well studied and widely used in various fields such as information retrieval, text mining, personalized recommendation, and biomedicine. The main task of L2R based recommendation algorithms is integrating L2R techniques into recommendation algorithms, and studying how to organize a large number of users and item features, build more suitable user models according to user preferences and requirements, and improve the performance and user satisfaction of recommendation algorithms. This paper surveys L2R based recommendation algorithms of recent years, summarizes the problem definition, compares the key technologies, and analyzes the evaluation metrics and their applications. In addition, the paper discusses the future development trend of L2R based recommendation algorithms.