    2022,33(5):1529-1550, DOI: 10.13328/j.cnki.jos.006547
    Abstract:
Industrial computational fluid dynamics (CFD) software is a kind of computer-aided engineering (CAE) software with a wide range of applications in aeronautics, astronautics, and other fields. Its development strongly relies on knowledge and models from fluid mechanics, mathematics, computer science, and other disciplines, and involves a large amount of specialized, fundamental scientific research, such as theoretical derivation, physical model establishment, algorithm optimization, and verification and validation. This leads to a very complex software composition and great research and development difficulty. By introducing software engineering methods and practices, software development can be effectively organized and managed to shorten the development cycle and improve software quality. This study briefly analyzes the characteristics and new trends of industrial CFD software. Based on this, a development model combining incremental and iterative development suitable for industrial CFD software is proposed, and an automated continuous integration platform for CFD simulation software is developed. Suggestions for industrial CFD software design are given from the aspects of software interaction, encapsulation and efficiency, functional scalability, and deployment in high-performance cluster environments. Targeted verification and validation methods suitable for scientific computing software are established. Finally, a demonstration case of domestic independent CFD software is illustrated, with a view to providing a reference for related researchers and practitioners.
    2022,33(5):1551-1568, DOI: 10.13328/j.cnki.jos.006548
    Abstract:
Most existing code smell detection approaches rely on code structure information and heuristic rules, pay little attention to the semantic information embedded in different levels of code, and achieve limited accuracy. To solve this problem, this study proposes DeepSmell, a novel approach based on a pre-trained model and multi-level metrics. First, a static analysis tool is used to extract code smell instances and multi-level code metric information from the source program, and these instances are labeled. Second, the level information related to code smells in the source code is parsed and obtained through the abstract syntax tree; the textual information composed of the level information is combined with the code metric information to generate the data set. Finally, the textual information is converted into word vectors using the BERT pre-trained model; a GRU-LSTM model is applied to capture the latent semantic relationships among identifiers, and a CNN model combined with an attention mechanism is used to detect code smells. The experiments tested four kinds of code smells, including feature envy, long method, data class, and god class, on 24 open-source programs such as JUnit, Xalan, and SPECjbb2005. The results show that DeepSmell improves the average recall and F1 by 9.3% and 10.44%, respectively, compared with existing detection methods, while maintaining a high level of precision.
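Below is a minimal, illustrative sketch of the pipeline the abstract describes (BERT token embeddings → GRU-LSTM → CNN with attention, concatenated with metric features). It assumes PyTorch, treats all layer sizes as hypothetical, and is not the authors' implementation.

```python
# Hypothetical DeepSmell-style classifier; dimensions are illustrative.
import torch
import torch.nn as nn

class SmellClassifier(nn.Module):
    def __init__(self, embed_dim=768, hidden=128, n_metrics=20, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.attn = nn.Linear(hidden, 1)            # additive attention scores
        self.out = nn.Linear(hidden + n_metrics, n_classes)

    def forward(self, bert_embeds, metrics):
        # bert_embeds: (batch, seq_len, 768); metrics: (batch, n_metrics)
        h, _ = self.gru(bert_embeds)                # latent identifier relations
        h, _ = self.lstm(h)
        h = self.conv(h.transpose(1, 2)).transpose(1, 2)
        w = torch.softmax(self.attn(h), dim=1)      # attention over positions
        ctx = (w * h).sum(dim=1)                    # weighted pooling
        return self.out(torch.cat([ctx, metrics], dim=-1))
```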
    2022,33(5):1569-1586, DOI: 10.13328/j.cnki.jos.006549
    Abstract:
With the maturity of deep learning technology, intelligent speech recognition software has been widely used, and the deep neural networks inside such software play a crucial role. Recent studies have shown that the minor perturbations in adversarial examples significantly threaten the security and robustness of deep neural networks. Researchers usually take generated adversarial examples as test cases and feed them into intelligent speech recognition software to test whether the adversarial examples cause the software to misjudge; defense methods are then adopted to improve the security and robustness of the intelligent software. For adversarial example generation, black-box intelligent speech software is more common in practice and thus has greater practical research value, but existing generation methods have limitations. Therefore, this study proposes a targeted adversarial example generation method for black-box speech software based on the firefly algorithm and gradient estimation, namely the firefly-gradient adversarial example generation method. Given a target text, perturbations are added to the original speech example, and either the firefly algorithm or the gradient estimation method is chosen to optimize the adversarial example, according to the edit distance between the transcription of the current adversarial example and the target text, until the targeted adversarial example is generated. To verify the effectiveness of the method, this study conducts an experimental evaluation on common speech recognition software using three different speech datasets, the Common Speech dataset, the Google Command dataset, and the LibriSpeech dataset, and recruits volunteers to evaluate the generated adversarial examples. Experimental results show that the proposed method can effectively improve the success rate of targeted adversarial example generation; for example, for the DeepSpeech speech recognition software, the success rate of generating adversarial examples on the Common Speech dataset is 13% higher than that of the compared method.
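A minimal sketch of the switching loop described above, under stated assumptions: transcribe(), firefly_step(), grad_estimate_step(), and edit_distance() are hypothetical helpers standing in for the speech model, the two optimizers, and the distance metric; the switching threshold is illustrative, not the paper's value.

```python
# Sketch: switch between global firefly search and local gradient estimation
# based on the edit distance to the target transcription (all helpers assumed).
import numpy as np

def generate_adversarial(x, target_text, steps=1000, switch_at=2):
    population = [x + np.random.normal(0, 0.001, x.shape) for _ in range(20)]
    best = population[0]
    for _ in range(steps):
        best = min(population,
                   key=lambda a: edit_distance(transcribe(a), target_text))
        d = edit_distance(transcribe(best), target_text)
        if d == 0:
            return best                     # target transcription reached
        if d > switch_at:                   # far from target: global search
            population = firefly_step(population, target_text)
        else:                               # close: local gradient estimation
            population = [grad_estimate_step(best, target_text)]
    return best
```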
    2022,33(5):1587-1611, DOI: 10.13328/j.cnki.jos.006550
    Abstract:
With the rise of blockchain technology, more and more researchers and companies are paying attention to the security of smart contracts. There are already studies on smart contract defect detection and testing techniques. Software defect prediction technology is an effective supplement to defect detection, as it can optimize the allocation of testing resources and improve testing efficiency; however, there has been no research on software defect prediction for smart contracts. To address this gap, this study proposes a defect prediction method for Solidity smart contracts. First, it designs a metrics suite (smart contract-Solidity, SC-Sol) that considers the variables, functions, structures, and features of Solidity smart contracts, and combines SC-Sol with a traditional metrics suite covering object-oriented features (code complexity and features of object-oriented program, COOP) into the COOP-SC-Sol metrics suite. Then, it extracts the relevant metric meta-information from Solidity code and performs defect detection to obtain defect information, from which a Solidity smart contract defect data set is constructed. On this basis, seven regression models and six classification models are applied to predict defects in Solidity smart contracts, in order to verify the performance differences of different metrics suites and models in predicting the number and proneness of defects. Experimental results show that, compared with COOP, COOP-SC-Sol improves the performance of the defect prediction model by 8% in terms of F1-score. In addition, the class imbalance problem in smart contract defect prediction is further studied; the results show that random under-sampling can improve the performance of the defect prediction model by 9% in F1-score. In predicting the proneness of specific types of defects, model performance is affected by the imbalance of the data set; better performance is achieved for defect types in which the percentage of defective modules is greater than 10%.
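As an illustration of the under-sampling experiment, the following sketch (not the paper's code) balances a defect data set with random under-sampling before training a classifier. It assumes scikit-learn and imbalanced-learn, and a metrics matrix X (e.g., COOP-SC-Sol features) with binary defect labels y.

```python
# Illustrative defect-proneness classification with random under-sampling.
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y)
# Balance the training set only; the test set keeps its natural distribution.
X_bal, y_bal = RandomUnderSampler(random_state=0).fit_resample(X_tr, y_tr)
clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```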
    2022,33(5):1612-1634, DOI: 10.13328/j.cnki.jos.006551
    Abstract:
GUI event-based record and replay technologies for Android apps aim to automatically capture and play back the UI interactions between users and apps. Record and replay is challenging because it involves a cross-understanding of three different program semantics: application difference, version evolution, and device compatibility. This study models record and replay as a search problem and analyzes the problem from a human perspective. Accordingly, it proposes a general framework to demonstrate the key points of record and replay: widget representation and recording technologies, event semantic equivalence strategies, and local search strategies. By summarizing and analyzing existing technologies from a new perspective suited to this framework, this study clarifies the advantages and disadvantages of existing technologies and proposes feasible future research directions.
    2022,33(5):1635-1651, DOI: 10.13328/j.cnki.jos.006553
    Abstract:
Existing developer recommendation algorithms extract explicit features of tasks and developers by mining their explicit information, so as to recommend developers for specific tasks. However, since the description information in the explicit information is subjective and often imprecise, the performance of existing developer recommendation algorithms based on explicit features is not ideal. Crowdsourcing software development platforms not only contain a large amount of imprecise description information but also objective and more accurate "task-developer" rating information, from which implicit features of tasks and developers can be effectively inferred. Considering that implicit features supplement explicit features and can effectively alleviate the problem of imprecise description information, this study proposes a hybrid developer recommendation algorithm that combines explicit and implicit features. First, explicit features are fully extracted from the visible information of tasks and developers on the platform, and an explicit-feature-oriented factorization machine (FM) recommendation model is proposed to learn the relationship between the explicit features of tasks and developers and the corresponding ratings. Then, implicit features are inferred from the "task-developer" rating matrix, and an implicit-feature-oriented matrix factorization (MF) recommendation model is proposed. Finally, a multi-layer perceptron fusion algorithm is proposed to fuse the explicit-feature-oriented FM model and the implicit-feature-oriented MF model. Further, for the cold-start problem, a multi-layer perceptron model is first trained on historical data to learn the mapping from explicit features to implicit features; for cold-start tasks or developers, the implicit features are then obtained from their explicit features, and ratings are predicted with the trained multi-layer perceptron fusion algorithm. Simulation experiments on the Topcoder software crowdsourcing platform show that the proposed algorithm significantly outperforms the comparison algorithms on four different evaluation metrics.
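A minimal sketch of the fusion stage only, assuming PyTorch and that the FM and MF models already yield per-(task, developer) latent vectors; dimensions are illustrative and not taken from the paper.

```python
# Hypothetical MLP fusion of FM (explicit) and MF (implicit) representations.
import torch
import torch.nn as nn

class FusionMLP(nn.Module):
    def __init__(self, fm_dim=32, mf_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(fm_dim + mf_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))               # predicted "task-developer" rating

    def forward(self, fm_vec, mf_vec):
        # Concatenate the two views and regress the rating.
        return self.net(torch.cat([fm_vec, mf_vec], dim=-1))
```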
    2022,33(5):1652-1673, DOI: 10.13328/j.cnki.jos.006554
    Abstract:
While the functionality and complexity of modern civil aircraft airborne software are growing rapidly, safety standards for airborne software (such as DO-178B/C) must still be satisfied. This raises greater challenges for analyzing and verifying the consistency and integrity of airborne software requirements at an early stage of system development. This study introduces a formal modeling and analysis tool platform (avionics requirement tools, ART) for natural language requirements of airborne software and carries out a case study on the requirements of a cockpit display and control software subsystem (EICAS). First, the semantics of a formal variable relationship model (VRM) is given, and the platform architecture and tool chain of ART are described. Then, a multi-paradigm methodology for the formal analysis of requirement consistency and integrity is given. After that, details of the EICAS case study are presented, including the pre-modeling of the initial natural language requirements and the automatic analysis of the requirement model, such as the preprocessing and standardization of the original requirement items, the automatic generation of VRM models, and multi-paradigm-based formal analysis. Finally, lessons learned from the case study are summarized.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006286
    Abstract:
Solving minimal attribute reduction (MAR) in rough set theory (RST) is an NP-hard combinatorial optimization problem. The ant colony optimization (ACO) algorithm is a globally heuristic evolutionary algorithm, so combining RST with ACO is an effective and feasible way to solve attribute reduction. However, ACO often falls into local optima, and its convergence is slow. This paper first uses an improved information gain rate as the heuristic information, performs a deduction test on each selected attribute and on the optimal reduction set of each generation, and proposes a mechanism that computes transition probabilities in advance to avoid repeatedly computing information on the same path during each ant's search. However, this algorithm can only handle small-scale data sets. Since ACO exhibits good parallelism and the equivalence classes in rough set theory can be computed via cloud computing, this paper further proposes a parallel attribute reduction algorithm based on ACO and cloud computing for massive data sets, and investigates a multi-objective parallel solution scheme that can simultaneously calculate the importance of the remaining attributes relative to the current attribute or reduction set. Experiments show that the algorithm can obtain the MAR when processing big data, and the time complexity of calculating attribute importance decreases from O(n²) to O(|n|).
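For illustration, a single ant's attribute-selection step under the usual ACO transition rule might look like the sketch below; gain_ratio() is a hypothetical stand-in for the paper's improved information gain rate, and alpha/beta are the standard pheromone/heuristic weights.

```python
# Sketch of one ant choosing the next attribute: pheromone^alpha * heuristic^beta.
import random

def choose_attribute(candidates, pheromone, data, alpha=1.0, beta=2.0):
    # Heuristic information: improved information gain rate (assumed helper).
    weights = [(pheromone[a] ** alpha) * (gain_ratio(data, a) ** beta)
               for a in candidates]
    total = sum(weights)
    # Roulette-wheel selection proportional to the combined desirability.
    return random.choices(candidates, [w / total for w in weights])[0]
```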
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006423
    Abstract:
Compared with witness encryption, offline witness encryption is more widely applicable in practice because of its efficiency, achieved by transferring the hard computational work to a setup phase. However, most current offline witness encryption schemes only satisfy selective security: the adversary must commit to a pair of challenge messages (m0, m1) and an instance x before obtaining the public parameters. Chvojka et al. proposed an offline witness encryption construction that achieves semi-adaptive security by introducing puncturable encryption. Semi-adaptive security permits the adversary to choose the challenge messages adaptively, but the instance x of the considered NP language that is used to create the challenge ciphertext must still be fixed before the adversary gets the public parameters (ppe, ppd). They therefore left the construction of offline witness encryption schemes with fully adaptive security as an open problem.
This paper proposes the first offline witness encryption scheme that achieves fully adaptive security. The setup algorithm outputs public parameters (ppe, ppd), where ppe, used as the encryption key, contains two public keys, a common reference string, and a commitment, and the decryption key ppd is an obfuscated circuit. This algorithm needs to be run only once, and the parameters can be used for arbitrarily many encryptions. The encryption algorithm outputs a Naor-Yung-style ciphertext using a key encapsulation mechanism and a non-interactive witness-indistinguishable proof system. By selecting the encapsulation key in advance, we solve the problem of having to output the challenge plaintext in advance that arises in proofs of selective security. In addition, the scheme can be turned directly into a functional offline witness encryption scheme: by embedding a function f into the decryption key in the key generation phase, the decryption key for f can be reused.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006535
    Abstract:
Heterogeneous information networks can model many real-world applications, and their representation learning has received extensive attention. Most representation learning methods extract structural and semantic information based on meta-paths, and their effectiveness in network analysis has been proven. However, these methods ignore nodes' internal information and the differing importance of meta-path instances, and they capture only local node information. Thus, this paper proposes a heterogeneous network representation learning method that fuses mutual information and multiple meta-paths. First, a meta-path internal encoding method called relational rotation encoding captures the structural and semantic information of the heterogeneous information network according to adjacent nodes and meta-path context nodes, and an attention mechanism models the importance of each meta-path instance. Then, an unsupervised heterogeneous network representation learning method fusing mutual information maximization and multiple meta-paths is proposed, in which mutual information captures both global and local information. Finally, experiments are conducted on two real datasets. Compared with current mainstream algorithms and some semi-supervised algorithms, the proposed method performs better on node classification and clustering.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006615
    Abstract:
Asynchronous programs use asynchronous non-blocking calls to achieve concurrency and are widely used in parallel and distributed systems. The complexity of verifying asynchronous programs is very high, whether for safety or liveness properties. This paper proposes a program model of asynchronous programs and defines two problems on them: the equivalence problem and the reachability problem. By reducing 3-CNF-SAT to these two problems, and then reducing them to the reachability problem of communication-free Petri nets, we prove that both problems are NP-complete. A case study shows that these two problems can capture a series of verification problems for asynchronous programs.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006616
    Abstract:
The reliable functioning of safety-critical IT systems depends heavily on the correct execution of program code, and deductive program verification can provide a high level of correctness guarantees for computer programs. There is a plethora of programming languages in use, and new languages oriented toward high-reliability scenarios are still being invented. It can be difficult to devise for each such language a full-fledged logical system supporting program verification, and to prove the soundness and completeness of that logical system with respect to the formal semantics of the language. Language-independent verification techniques offer sound verification procedures parameterized over the formal semantics of programming languages: specializing the verification procedure with the formal semantics of a concrete programming language directly yields a verification procedure for that language. In this article, we propose a language-independent verification technique based on big-step operational semantics. The technique features a unified procedure for sound reasoning about program structures that potentially cause unbounded behavior, such as iteration and recursion. In particular, we employ a functional formalization of big-step semantics to support the explicit representation of the computation performed by the sub-structures of a program; this representation enables the exploitation of the auxiliary information provided for these sub-structures in the unified reasoning process. We prove the soundness and relative completeness of the proposed technique, evaluate it using verification examples in imperative and functional programming languages, and mechanize all the formal results and verification examples in the Coq proof assistant. The development provides a basis for implementing a language-independent program verifier based on big-step operational semantics in a proof assistant.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006617
    Abstract:
With the development of the Internet, the fifth generation of mobile communication technology (5G) has arrived. The 5G authentication and key agreement (5G-AKA) protocol is mainly intended to achieve two-way authentication between users and service networks. However, recent research suggests that it may be subject to information-deciphering and message-replay attacks, and we found that some current variants of 5G-AKA cannot satisfy unlinkability. In response to these shortcomings, we propose an improved scheme called SM-AKA. SM-AKA is composed of two parallel sub-protocols in a novel way: through clever mode switching, the lighter sub-protocol (GUTI submodule) is adopted frequently, while the other sub-protocol (SUPI submodule) handles abnormalities arising in authentication. This mechanism not only realizes efficient authentication but also improves the stability of the protocol. The freshness of variables is effectively maintained, which prevents message replay, and strict encryption and decryption further improve the security of the protocol. Finally, we carry out a complete evaluation of SM-AKA: through formal modeling, attack assumptions, and Tamarin derivation, we prove that the scheme achieves the authentication and privacy goals, and a theoretical analysis demonstrates the correctness of the protocol design.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006634
    Abstract:
Interference among wireless signals hinders concurrent transmissions and thus decreases the throughput of wireless networks. Link scheduling is known to be an effective way to improve throughput and decrease transmission delay. The SINR (signal to interference plus noise ratio) model accurately describes the inherent characteristics of wireless signal propagation. Therefore, this paper proposes an online distributed link scheduling (OLD_LS) algorithm with a constant approximation factor under the SINR model. Here, online means that nodes can join and leave the wireless network at any time; such arbitrary joining and leaving reflects the dynamic characteristics of wireless networks. OLD_LS partitions the network region into hexagons and localizes the SINR model, which is a global interference model. A leader election (LE) subroutine for dynamic networks is also proposed. It is shown that if the dynamic rate of nodes is less than 1/ε, LE elects a leader with high probability in time O(log n + log R), where ε is a constant satisfying ε ≤ 5(1 − 2^(1−α/2))/6, α is the path loss exponent, n is the number of senders, and R is the longest link length. To the best of our knowledge, the proposed algorithm is the first online distributed link scheduling algorithm for dynamic wireless networks.
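For reference, the textbook SINR feasibility condition that such models build on can be written as follows (the notation is ours, not quoted from the paper):

```latex
% A transmission from sender s to receiver r succeeds iff
\[
  \mathrm{SINR}(s,r)
  = \frac{P \, d(s,r)^{-\alpha}}{N + \sum_{s' \neq s} P \, d(s',r)^{-\alpha}}
  \;\ge\; \beta ,
\]
% where P is the transmit power, N the ambient noise, d(\cdot,\cdot) the
% distance, \alpha the path loss exponent, and \beta the decoding threshold.
```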
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006635
    Abstract:
Network measurement is the basis of research on network performance monitoring, traffic management, and fault diagnosis. In-band network telemetry has become a hot topic in network measurement research due to its real-time capability, accuracy, and scalability. With the emergence and development of programmable data planes, many practical in-band network telemetry solutions have been proposed, thanks to rich information feedback and flexible function deployment. First, we analyze the principles and deployment challenges of the typical in-band network telemetry solutions INT and AM-PM. Second, following the optimization and extension of in-band network telemetry, we analyze the characteristics of optimization mechanisms in terms of the data collection process and multi-task orchestration, and we analyze the feasibility of extending the technology to wireless, optical, and hybrid networks. Third, we compare and analyze the latest applications in in-network performance sensing, network-level telemetry systems, traffic scheduling, and fault diagnosis. Finally, we summarize the research on in-band network telemetry and highlight future research directions.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006636
    Abstract:
This paper proposes new classical key recovery attacks against Feistel, Misty, and Type-1/2 generalized Feistel schemes. The new attacks are constructed by combining the birthday attack with the periodic property exploited by Simon's algorithm, and they differ from previous classical attacks. Using Simon's quantum algorithm, an adversary can recover the periodic value in polynomial time in the quantum setting, whereas in the classical setting the birthday bound is required to recover candidate values for the period. By combining the periodic property with the birthday attack, our chosen-ciphertext key recovery attack can recover the key of a 5-round Feistel-F construction in O(2^(3n/4)) time with O(2^(n/4)) chosen plaintexts and ciphertexts, and O(2^(n/4)) memory. Compared with Isobe's result, our attack not only covers one more round but also requires lower memory complexity. For the Feistel-FK structure, we construct a 7-round key recovery attack. In addition, we apply the above approach to Misty schemes and Type-1/2 generalized Feistel schemes: this paper proposes key recovery attacks against 5-round Misty L-F and Misty R-F, as well as against 6-round Misty L-KF/FK and Misty R-KF/FK. Furthermore, we construct a d²-round key recovery attack on the d-branch Type-1 generalized Feistel scheme, and, when d ≥ 6 and d is even, a better key recovery attack on the d-branch Type-2 generalized Feistel scheme than previous work.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006637
    Abstract:
Code comment generation has been an important research task in software engineering in the past few years. Existing work has achieved impressive results on open-source datasets that contain large numbers of <code snippet, comment> pairs. However, in software enterprises, the code to be commented usually belongs to a software project. Unlike the code snippets in open-source datasets, the code in a software project varies in length and granularity, and developers need to know not only how to comment but also where to add comments, i.e., the commenting decision. This paper proposes CoComment, a software-project-oriented code comment generation approach. It automatically extracts domain-specific concepts from software documents, then propagates and expands these concepts through code parsing and text matching. On this basis, commenting decisions are made automatically by locating code lines or segments related to these concepts, and the corresponding natural language comments are generated by fusing concepts and contexts. We conduct comparative experiments on three software projects containing more than 46,000 manually annotated code comments. The results demonstrate that our approach makes commenting decisions accurately and generates more helpful comments than existing work, effectively addressing automatic code commenting for software projects.
    Available online:  March 24, 2022 , DOI: 10.13328/j.cnki.jos.006638
    Abstract:
With the rapid development of the Internet of Things (IoT) and cloud computing, the portable health clinic (PHC) has been realized and widely used in telemedicine. Given the significant advantages of 5G communication, China has actively promoted the construction of intelligent medicine and built a multi-functional telemedicine information service platform. The realization of telemedicine is inseparable from remote data sharing, and the current PHC data sharing system uses a network architecture combining IoT and cloud computing; however, its privacy and security issues are rarely studied. This paper focuses on security and privacy in PHC data sharing and realizes secure upload of IoT data, normalization of personalized ciphertext, dynamic multi-user fine-grained access control, efficient decryption, and formal security verification. The paper first improves the classical proxy re-encryption and attribute-based encryption algorithms and proposes an IPRE-TO-FAME combined encryption mechanism suitable for the combined IoT and cloud computing architecture. To address the challenge of key updates across many distributed IoT terminals, it draws on the idea of proxy re-encryption (PRE) to realize key updates through a unilateral transformation without changing the keys held by IoT devices; since this setting differs from conventional PRE in that the re-encryption entity can be regarded as fully trusted, the conventional PRE algorithm is improved into an efficient IPRE (improved PRE) algorithm. Furthermore, the classic FAME (fast attribute-based message encryption) mechanism is improved to realize dynamic multi-user fine-grained access control, making it convenient for users to access data anytime and anywhere with portable intelligent devices. Security proofs, theoretical analysis, and experimental results show that the proposed scheme is secure and practical, and an effective solution to secure PHC data sharing.
    Available online:  February 22, 2022 , DOI: 10.13328/j.cnki.jos.006609
    Abstract:
Termination bugs such as deadlocks and infinite loops are common in concurrent file systems due to their complex implementations, yet existing efforts on file system verification have ignored the termination property. Based on AtomFS, a verified concurrent file system, this paper presents the verification of its termination property, which ensures that every method call always returns under fair scheduling. Proving a method's termination requires showing that, when the method is blocked, the source thread of the obstruction makes progress; the core lies in showing the termination of the lock-coupling traversal. Two major challenges arise in applying this idea. (1) The file system is shaped as a tree and allows threads that block others to diverge during traversal; as a result, multiple sources of obstruction may be found globally, which breaks locality in the proof. (2) The rename operation traverses two paths and can cause obstruction across paths; this not only makes source location harder but can also make the source of obstruction impossible to find when two renames block each other. This paper handles these challenges with two key techniques: (1) recognizing only each local blocking chain for source location; (2) determining partial orders of obstruction among threads. We have built a framework called CRL-T for termination verification and applied it to verify the termination of AtomFS. All the proofs are mechanized in Coq.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006639
    Abstract:
The failure of a safety-critical system can cause serious consequences, so ensuring its correctness is very important. The space embedded operating system is a typical safety-critical system: its memory management design must guarantee efficient allocation and deallocation while minimizing the occupancy of system resources. In the traditional software development process, centralized testing and verification are usually carried out after the entire software development is completed, which inevitably introduces uncertainty into development. Therefore, this paper combines formal verification with the three-tier "demand-design-implementation" development framework of software engineering and ensures the consistency of the levels through layered transfer verification. First, starting from the demand analysis at the demand level, formal proof is introduced to establish the correctness of the demand-level logic, which better guides the design of the program. Second, verification at the design level greatly reduces the error rate of the developed code by proving the correctness of the calling logic between the designed algorithms and the functions to be implemented. Third, at the code level, the consistency of the implemented code with the functional design and the correctness of the code are proven. Using the interactive theorem prover Coq, this paper takes the memory management module of a domestic space embedded operating system as an example to prove the correctness of the memory management algorithm and the consistency of demand, design, and code.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006387
    Abstract:
Path testing is a very important and widely used structural testing method. Existing path generation methods are either time-consuming or labor-intensive, or they generate a large number of redundant paths. To solve this problem, this paper studies an optimization model for the path selection problem and an evolutionary method for solving it, aiming to reduce the number of redundant paths and the test cost without reducing test coverage. First, a set of paths is taken as the decision variable, and the number of edges covered by these paths and the number of paths are taken as the objectives, yielding a multi-objective optimization model; then, a multi-objective evolutionary algorithm is employed to solve the model and obtain the target path set. We apply the proposed method to seven benchmark programs and compare it with an existing method and a greedy algorithm. Experimental results show that, compared with the other algorithms, the proposed method reduces test cost while ensuring test sufficiency, thereby improving test efficiency.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006594
    Abstract:
Providing safe, reliable, and efficient decisions is a challenging issue in autonomous driving. At present, with the vigorous development of the autonomous driving industry, various behavioral decision-making methods have been proposed. However, autonomous driving decisions are influenced by uncertainties in the environment, and the decisions themselves must be effective and highly safe; current methods can hardly cover all of these issues. Therefore, we propose an autonomous driving decision-making approach that combines a RoboSim model with a Bayesian network. Semantic relationship information in driving scenarios is modeled by a domain ontology, and an LSTM model for predicting the intentions of dynamic entities in the scenario is combined with it to provide driving scenario information to the Bayesian network. Based on the decisions inferred by the Bayesian network, we abstract a platform-independent RoboSim model for autonomous driving behavior decision-making that can simulate the execution cycle of decisions. The RoboSim model can also be transformed into other formal verification models; in this paper, we use the model checking tool UPPAAL for verification and analysis to ensure the safety of the decision-making model. A lane-change and overtaking scenario illustrates the feasibility of constructing the Bayesian network and the RoboSim model for autonomous driving behavior decision-making, laying a foundation for a safe and efficient autonomous driving decision-making approach.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006492
    Abstract:
For a given set of moving objects, a continuous k nearest neighbor (CkNN) query q over moving objects continuously identifies and monitors the k nearest objects as the objects and the query point evolve. In real life, many location-based applications in transportation, social networks, e-commerce, and other fields involve the basic problem of processing CkNN queries over moving objects. Most existing work on CkNN queries determines a query range containing the k nearest neighbors through multiple iterations, where each iteration has to count the objects in the current query range; this counting dominates the query cost. To address this issue, this work proposes a dual index called GGI, which consists of a grid index and a Gaussian mixture function that simulates the varying distribution of objects. The bottom layer of GGI employs a grid index to maintain the moving objects, while the upper layer constructs a Gaussian mixture model to simulate their distribution in two-dimensional space. Based on GGI, an incremental search algorithm called IS-CKNN is proposed to process CkNN queries. The algorithm directly determines, based on the Gaussian mixture model, a query region that contains at least k neighbors of q, which greatly reduces the number of iterations. When the objects and the query point evolve, an efficient incremental query strategy is further employed to maximize the reuse of existing query results and reduce the computation of the current query. Finally, extensive experiments on one real dataset and two synthetic datasets confirm the superiority of our proposal.
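To illustrate the core idea of using the Gaussian mixture to skip iterative counting, the sketch below estimates an initial radius around q whose expected object count reaches k. The Monte-Carlo estimate and the doubling search are assumptions of this sketch, not the paper's exact algorithm; the scikit-learn GaussianMixture API is assumed.

```python
# Use a fitted density model to pick a query radius expected to cover >= k objects.
import numpy as np
from sklearn.mixture import GaussianMixture

def initial_radius(gmm: GaussianMixture, q, k, n_objects, samples=20000):
    pts, _ = gmm.sample(samples)            # draw points from the fitted density
    r = 1.0
    while True:
        # Fraction of probability mass within distance r of the query point q.
        frac = np.mean(np.linalg.norm(pts - q, axis=1) <= r)
        if frac * n_objects >= k:           # expected count in the disc >= k
            return r
        r *= 2.0                            # doubling search on the radius
```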
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006614
    Abstract:
Determinization of a nondeterministic automaton constructs a deterministic automaton that recognizes the same language, and it is one of the fundamental notions in automata theory. Determinization of ω-automata serves as a basic step in the decision procedures of SnS, CTL*, μ-calculus, etc.; it is also the key to solving infinite games. Therefore, it is of great significance to study the determinization of ω-automata. We focus on a kind of ω-automata called Streett automata: nondeterministic Streett automata can be transformed into equivalent deterministic Rabin or parity automata, and in our previous work we obtained optimal and asymptotically optimal determinization algorithms, respectively. To evaluate the theoretical results of the proposed algorithms and to visualize the determinization procedure, it is necessary to develop a tool supporting Streett determinization. In this paper, we first introduce four Streett determinization constructions: μ-Safra trees, H-Safra trees, compact Streett Safra trees, and LIR-H-Safra trees. H-Safra trees, which are optimal, and μ-Safra trees yield deterministic Rabin transition automata, while the other two structures yield deterministic parity transition automata, with LIR-H-Safra trees being asymptotically optimal. Further, based on the open-source software GOAL (Graphical Tool for Omega-Automata and Logics), we implement a tool for Streett determinization named NS2DR&PT. In addition, a benchmark of 100 randomly generated Streett automata is constructed. Experiments on the benchmark show that the results are consistent with the theoretical analyses of state complexity; the efficiency of the different constructions is also compared and analyzed.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006538
    Abstract:
Existing malware similarity measurement methods cannot cope with code obfuscation and lack the ability to model the complex relationships between malware samples. This paper proposes a malware similarity measurement method called RG-MHPE (API Relation Graph enhanced Multiple Heterogeneous ProxEmbed) based on a multiplex heterogeneous graph to solve these problems. The method first uses the dynamic and static features of malware to construct the multiplex heterogeneous graph, and then proposes an enhanced proximity embedding method based on relational paths to solve the problem that proximity embedding cannot be applied directly to similarity measurement on multiplex heterogeneous graphs. In addition, this paper extracts knowledge from the API documents on the MSDN website, builds an API relation graph, and learns the similarity between Windows APIs, effectively slowing the aging of similarity measurement models. Finally, experimental results show that RG-MHPE performs best in terms of both similarity measurement and model anti-aging ability.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006595
    Abstract:
ARM's Armv8.1-M architecture, with the Arm Helium technology of the M-Profile vector extension, is claimed to increase the machine learning performance of the Arm Cortex-M processor by up to 15 times. With the rapid development of the Internet of Things, the correct execution of microprocessors is important; and since the development of chip simulators and on-chip programs relies on the official reference manual, ensuring the manual's correctness is equally important. This paper introduces the correctness verification of the vectorized machine learning instructions in the official reference manual of the Armv8.1-M architecture. We automatically extract the operation pseudo-code of the vectorized machine learning instructions and formalize it as semantic rules. With the executable framework provided by K Framework, the formalized semantic rules can be executed and tested against benchmarks.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006612
    Abstract:
The security of trusted execution environments (TEEs) has long drawn attention from researchers at home and abroad. Memory tag technology in TEEs helps achieve finer-grained memory isolation and access control, but prior works often rely on testing or empirical analysis to show their effectiveness, lacking strong assurance of functional correctness and security. This paper proposes a general formal model framework for memory-tag-based access control and presents a model-checking-based security analysis method for access control. First, a general framework for the memory-tag-based TEE access control model is constructed with formal methods, and the access control entities are formally defined; the defined rules include access control rules and tag update rules. Then, the abstract machines of the framework are incrementally designed and implemented in the formal language B, formalizing the basic properties as invariant constraints. Next, the TEE implementation TIMBER-V is used as an application case: the TIMBER-V access control model is constructed by instantiating the abstract machines, the security properties are formally specified, and the functional correctness and security of the instantiated model are verified by model checking. Finally, specific attack scenarios are simulated, and the attacks are successfully detected. The evaluation results show the effectiveness of the security analysis method.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006552
    Abstract:
With the rapid development of emerging technologies, domain software places new demands on development efficiency. Datalog, a declarative programming language with concise syntax and clean semantics, can help developers reason about and solve complex problems rapidly. However, when solving real-world problems, existing single-machine Datalog engines are often limited by memory capacity and lack scalability. To solve these problems, this paper designs and implements a Datalog engine based on out-of-core computation. The method first designs a series of out-of-core operators and converts the Datalog program into a C++ program using these operators; it then designs a hash-based partition strategy and a minimum-replacement scheduling strategy based on search tree pruning, with which the partition files are scheduled and computed to produce the final results. Based on this method, the prototype tool DDL (disk-based Datalog engine) is implemented, and widely used real-world Datalog programs are selected for experiments on both synthetic and real-world datasets. The experimental results show that DDL has good performance and high scalability.
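A minimal sketch of the hash-based partition idea: tuples with equal keys land in the same partition file, so each partition can later be processed independently in memory. The file naming and CSV encoding are illustrative assumptions, not details from the paper.

```python
# Hash-partition a relation to disk for out-of-core processing.
import csv

def partition(tuples, key_index, n_parts, prefix="rel"):
    files, writers = [], []
    for i in range(n_parts):
        f = open(f"{prefix}.part{i}.csv", "w", newline="")
        files.append(f)
        writers.append(csv.writer(f))
    for t in tuples:
        # Equal keys always hash to the same partition file.
        writers[hash(t[key_index]) % n_parts].writerow(t)
    for f in files:
        f.close()
```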
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006593
    Abstract:
In recent years, deep reinforcement learning has been widely used in sequential decision making. It works well in many applications, especially in scenarios with high-dimensional inputs and large state spaces, but it still has limitations such as a lack of interpretability, inefficient initial training, and cold start. This paper proposes a framework that combines explicit knowledge reasoning with deep reinforcement learning to alleviate these problems. The framework leverages high-level prior knowledge in the deep learning process via explicit knowledge representation, improving training efficiency and interpretability. The explicit knowledge is categorized into two kinds: acceleration knowledge and safety knowledge. The former intervenes in training, especially at the early stage, to speed up learning, while the latter keeps the agent from catastrophic actions to keep it safe. Experiments in several domains against several baselines show that the proposed framework significantly improves training efficiency and interpretability, and that the improvement generalizes across different reinforcement learning algorithms and scenarios.
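As an illustration of how safety knowledge can intervene, the sketch below wraps an agent with a hypothetical is_safe(state, action) predicate that encodes the explicit rules; agent.act() and agent.q_value() are assumed interfaces, not the paper's API.

```python
# Safety shield: override the policy's action only when it violates a rule.
def shielded_action(agent, state, action_space, is_safe):
    action = agent.act(state)               # policy's proposed action
    if is_safe(state, action):
        return action
    # Fall back to the best-valued action among the safe ones.
    safe = [a for a in action_space if is_safe(state, a)]
    return max(safe, key=lambda a: agent.q_value(state, a)) if safe else action
```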
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006618
    Abstract:
Data races are common defects in multi-threaded programs. Traditional data race analysis methods struggle to achieve both high recall and high precision, and their detection reports make it difficult to locate the root cause of a defect. Since Petri nets offer precise behavior description and rich analysis tools for modeling and analyzing concurrent systems, a new data race detection method based on Petri net unfolding is proposed. First, by analyzing a single program execution trace, a Petri net model of the program is mined; although mined from only one trace, the model implies multiple different traces of the program, which reduces the false negative rate of traditional dynamic methods while preserving performance. Then, an unfolding-based detection method for potential data races is proposed, which improves efficiency significantly compared with static methods and clearly shows the triggering path of each data race defect. Finally, for each potential data race detected in the previous stage, a scheduling scheme is designed to replay the defect on the CalFuzzer platform, which eliminates false positives and ensures the authenticity of detection results. A prototype system has been developed, and the effectiveness of the proposed method is verified on open-source program instances.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006558
    Abstract:
Feature requests are enhancements to existing features or requests for new features proposed by end users on open forums; they reflect users' wishes and represent users' needs. Feature requests play a vital role in improving user satisfaction and product competitiveness, and they have become an important source of software requirements. However, feature requests differ from traditional requirements in source, content, and form, so applying them to software development must also differ from applying traditional requirements. At present, there is much research on feature requests covering topics such as classification, prioritization, and quality management. With the continuous growth of related research, a survey of feature request analysis and processing has become necessary. In this paper, we investigate 121 academic papers on how to analyze and process feature requests in the software development process. We organize the existing research from the perspective of applying feature requests to software development, summarize the research topics, and investigate the research progress. Besides, we map the feature request research topics to traditional requirements engineering processes, analyze the existing research methods, and point out research gaps. Finally, to provide guidance for future work, we discuss perspectives for future research in this area.
    Available online:  January 28, 2022 , DOI: 10.13328/j.cnki.jos.006592
    Abstract:
In recent years, artificial intelligence has advanced rapidly, and AI systems have penetrated daily life, becoming an indispensable part of it. However, AI systems require large amounts of data to train models, and data disturbances affect their results; moreover, as business forms change and system scale grows more complex, the trustworthiness of AI systems is receiving more and more attention. First, summarizing the trustworthiness attributes proposed by various organizations and scholars, this paper introduces nine trustworthiness attributes of artificial intelligence. Next, it presents existing methods for measuring the data, model, and result trustworthiness of AI systems, and proposes a method for collecting AI trustworthiness evidence. Then, it discusses trustworthiness measurement models for AI systems: combining existing attribute-based software trustworthiness measurement methods with blockchain technology, it proposes an AI system trustworthiness measurement framework, including the decomposition of trustworthiness attributes and the evidence acquisition method, a federated trustworthiness measurement model, and a blockchain-based AI trustworthiness measurement architecture. Finally, it analyzes the opportunities and challenges of trustworthiness measurement technology for AI systems.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006539
    Abstract:
Databases provide efficient storage and access for massive data. However, it is nontrivial for non-experts to master database query languages such as SQL, which are essential for querying databases. Hence, querying databases using natural language (i.e., text-to-SQL) has received extensive attention in recent years. This paper provides a holistic view of text-to-SQL technologies and elaborates on current advancements. It first introduces the research background and describes the research problem. It then focuses on current text-to-SQL technologies, including pipeline-based methods, statistical-learning-based methods, and techniques developed for the multi-turn text-to-SQL task, and goes further to discuss the field of semantic parsing, to which text-to-SQL belongs. Afterward, it introduces the benchmarks and evaluation metrics widely used in the field, and compares and analyzes state-of-the-art models from multiple perspectives. Finally, the paper summarizes the remaining challenges of the text-to-SQL task and gives suggestions for future research.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006540
    Abstract:
    In the era of today’s Internet of Things, embedded systems are becoming important components for accessing the cloud, which are used in both secure and privacy-sensitive applications or devices frequently. However, the underlying software (a.k.a. firmware) often suffered from a wide range of security vulnerabilities. The complexity and heterogeneous of the underlying hardware platform, the difference of the hardware and software implementation, the specificity and limited document, together with limited running environment made some of very good dynamic testing tools for desktop systems hard to (even impossible) be adapted to embedded devices/firmware environment directly. In recent years, researchers have made great progress in detecting well-known vulnerabilities in embedded device firmware based on binary code similarity analysis. Focusing on the key technical challenges of binary code similarity analysis, we studied the existing binary code similarity analysis technologies systematically, analyzed and compared the general process, technical characteristics and evaluation criteria of these technologies comprehensively. Then we analyzed and summarized the application of these technologies in the field of the embedded device firmware vulnerability search. At the end, we presented some technical challenges in this field and proposed some open future research directions for the related researchers.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006544
    Abstract:
Text style transfer is one of the hot topics in natural language processing in recent years. It aims to transfer specific styles or attributes of text (such as emotion, tense, and gender) through editing or generation while retaining the text content. To advance this research field, this article sorts out the existing methods. First, the text style transfer problem is defined and its challenges are given. Then, the existing methods are classified and reviewed, focusing on TST methods based on unsupervised learning, which are further divided into implicit and explicit methods; the implementation mechanisms, advantages, limitations, and performance of each method are analyzed. Subsequently, the performance of several representative methods is compared experimentally on automatic evaluation metrics such as transfer accuracy, content retention, and perplexity. Finally, the research on text style transfer is summarized and future directions are discussed.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006522
    Abstract:
Reasoning over knowledge graphs aims to infer new facts from known ones so as to make the graphs as complete as possible. In recent years, distributed embedding-based reasoning methods have achieved great success on this task. However, due to their black-box nature, these methods cannot provide interpretability for specific predictions, so there is growing interest in designing reasoning models that users can understand and trust. Starting from the basic concept of interpretability, this paper systematically surveys the recently developed methods for interpretable reasoning on knowledge graphs, covering the research progress of both ante-hoc and post-hoc interpretable reasoning models. According to the scope of interpretability, ante-hoc interpretable models can be further divided into locally interpretable and globally interpretable models. For post-hoc interpretable reasoning models, this paper reviews representative reasoning methods and introduces two post-hoc interpretation methods in detail. Next, it summarizes the applications of explainable knowledge reasoning in fields such as finance and healthcare. Then, it surveys the current state of explainable knowledge learning. Finally, the future technological development of interpretable reasoning models is discussed.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006524
    [Abstract] (1195) [HTML] (0) [PDF 1.86 M] (829)
    Abstract:
    Time-sensitive networking (TSN) is an important research area for upgrading the infrastructure of the industrial Internet of Things. Deterministic transmission is the key technology in TSN, mainly including time-triggered scheduling on the control plane, mixed-criticality transmission, and deterministic delay analysis, to support the deterministic real-time transmission requirements of industrial control. This paper surveys the related work on deterministic transmission technologies of TSN in recent years and systematically organizes and summarizes it. First, this paper presents the transmission models of the different kinds of flows in TSN. Second, based on these models, it presents, on the one hand, the time-triggered scheduling model together with its research status and open challenges on the control plane; on the other hand, it presents the architecture of TSN switches, the strategies of mixed-criticality transmission, their disadvantages, and the corresponding improvement approaches. Third, this paper models the transmission delay of the whole TSN based on network calculus and presents the delay analysis methods, their research status, and possible improvement directions. Finally, this paper summarizes the challenges and research prospects of deterministic transmission technologies in TSN.
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006528
    Abstract:
    Blockchains such as Ethereum execute the smart contract transactions in a block serially, which strictly guarantees the consistency of the blockchain state between nodes after execution, but this has become a serious bottleneck restricting the throughput of these blockchains. Therefore, using parallel methods to optimize the execution of smart contract transactions has gradually become a focus of industry and academia. This paper summarizes the research progress on parallel execution methods for smart contracts in blockchains and proposes a research framework. From the perspective of the phases of parallel execution, the framework condenses four parallel execution models of smart contracts, namely the parallel execution model based on static analysis, the parallel execution model based on dynamic analysis, the parallel execution model between nodes, and the divide-and-conquer parallel execution model, and describes the typical parallel execution methods under each model. Finally, this paper discusses the factors affecting parallel execution, such as the transaction dependency graph and concurrency control strategies, and proposes future research directions.
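    As a minimal illustration of the dependency-graph idea discussed above (not taken from any of the surveyed systems), the following Python sketch builds conflict-free batches of transactions from their read/write sets and runs each batch in parallel; the transaction structure and the thread-pool execution are assumptions made purely for illustration.
        # Illustrative sketch: schedule transactions so that conflicting ones
        # keep their submission order while independent ones run in parallel.
        from concurrent.futures import ThreadPoolExecutor

        def conflicts(t1, t2):
            # Two transactions conflict if either writes data the other reads or writes.
            return bool(t1["writes"] & (t2["reads"] | t2["writes"]) or
                        t2["writes"] & t1["reads"])

        def schedule(txs):
            # Place each transaction in the earliest batch after all earlier
            # transactions that conflict with it, preserving submission order.
            batches, batch_of = [], {}
            for i, tx in enumerate(txs):
                idx = 0
                for j in range(i):
                    if conflicts(txs[j], tx):
                        idx = max(idx, batch_of[j] + 1)
                while len(batches) <= idx:
                    batches.append([])
                batches[idx].append(tx)
                batch_of[i] = idx
            return batches

        txs = [
            {"reads": {"A"}, "writes": {"B"}},   # T0
            {"reads": {"B"}, "writes": {"C"}},   # T1 conflicts with T0
            {"reads": {"D"}, "writes": {"E"}},   # T2 independent of both
        ]
        with ThreadPoolExecutor() as pool:
            for batch in schedule(txs):
                # Each batch is internally conflict-free, so it can run in parallel.
                list(pool.map(lambda tx: tx, batch))  # stand-in for executing tx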
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006377
    Abstract:
    As a typical form of the serverless architecture, the Function as a Service (FaaS) architecture abstracts business logic into fine-grained functions and provides automatic operation and maintenance functionality such as auto-scaling, which can greatly reduce operation and maintenance costs. Some highly concurrent, highly available, and highly flexible services (such as payment and red packet services) in many online service systems have been migrated to FaaS platforms, but a large number of traditional monolithic applications still find it difficult to take advantage of the FaaS architecture. To solve this problem, a FaaS migration approach for monolithic applications based on dynamic and static analysis is proposed in this paper. This approach identifies and strips the implementation code and dependencies of a specified monolithic application API by combining dynamic and static analysis, and then completes the code refactoring according to a function template. Aiming at the cold-start problem of functions in high-concurrency scenarios, this approach uses a master-slave multithreaded Reactor model based on IO multiplexing to optimize the function template and improve the concurrency processing capability of a single function instance. Based on this approach, we implemented Codext, a prototype tool for the Java language, and carried out experimental verification of four open source monolithic applications on OpenFaaS, an open source serverless platform.
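    To illustrate the master-slave Reactor pattern mentioned above, here is a rough Python sketch using the standard selectors module; the echo handler, port, and worker count are illustrative stand-ins (Codext itself targets Java), and cross-thread selector registration is left unsynchronized for brevity.
        # Sketch of a master-slave reactor over IO multiplexing (Unix-oriented):
        # a master selector accepts connections and distributes them round-robin
        # to worker threads, each running its own selector loop.
        import selectors, socket, threading

        def worker_loop(sel):
            while True:
                for key, _ in sel.select(timeout=1):   # short timeout to pick up new fds
                    conn = key.fileobj
                    data = conn.recv(4096)
                    if data:
                        conn.sendall(data)             # echo as a stand-in for dispatch
                    else:
                        sel.unregister(conn)
                        conn.close()

        def serve(host="127.0.0.1", port=9999, n_workers=2):
            workers = [selectors.DefaultSelector() for _ in range(n_workers)]
            for sel in workers:
                threading.Thread(target=worker_loop, args=(sel,), daemon=True).start()
            master = selectors.DefaultSelector()
            lsock = socket.socket()
            lsock.bind((host, port))
            lsock.listen()
            lsock.setblocking(False)
            master.register(lsock, selectors.EVENT_READ)
            i = 0
            while True:                                # master loop: accept and distribute
                for key, _ in master.select():
                    conn, _ = key.fileobj.accept()
                    conn.setblocking(False)
                    workers[i % n_workers].register(conn, selectors.EVENT_READ)
                    i += 1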
    Available online:  December 24, 2021 , DOI: 10.13328/j.cnki.jos.006350
    Abstract:
    With the popularization of digital information technology, reversible data hiding in encrypted images (RDHEI) has gradually become a research hotspot of privacy protection in cloud storage. As a technology that can embed additional information in the encrypted domain, extract the embedded information correctly, and recover the original image losslessly, RDHEI has received extensive attention from researchers. To embed sufficient additional information in the encrypted image, a high-capacity RDHEI method using adaptive encoding is proposed in this paper. Firstly, the occurrence frequencies of the different prediction errors of the original image are calculated and the corresponding adaptive Huffman code is generated. Then, the original image is encrypted with a stream cipher and the encrypted pixels are marked with different Huffman codewords according to their prediction errors. Finally, additional information is embedded in the room reserved by the marked pixels through bit substitution. The experimental results show that the proposed algorithm can extract the embedded information correctly and recover the original image losslessly. Compared with similar algorithms, the proposed algorithm makes full use of the characteristics of the image itself and greatly improves the embedding rate. On the UCID, BOSSBase, and BOWS-2 datasets, the average embedding rate of the proposed algorithm reaches 3.162 bpp, 3.917 bpp, and 3.775 bpp, which is 0.263 bpp, 0.292 bpp, and 0.280 bpp higher than the state-of-the-art algorithm, respectively.
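    The adaptive encoding step can be pictured with a small sketch: build a Huffman code from prediction-error frequencies so that frequent errors get the shortest codewords, leaving more room for embedding. This is a minimal Python illustration; the paper's exact predictor and pixel-marking scheme are not reproduced.
        # Sketch: derive an adaptive Huffman code from prediction-error frequencies.
        import heapq
        from collections import Counter

        def huffman_code(freqs):
            # freqs: {symbol: count}; returns {symbol: bitstring}.
            heap = [[count, i, {sym: ""}] for i, (sym, count) in enumerate(freqs.items())]
            heapq.heapify(heap)
            i = len(heap)
            while len(heap) > 1:
                lo = heapq.heappop(heap)
                hi = heapq.heappop(heap)
                merged = {s: "0" + c for s, c in lo[2].items()}
                merged.update({s: "1" + c for s, c in hi[2].items()})
                heapq.heappush(heap, [lo[0] + hi[0], i, merged])
                i += 1
            return heap[0][2]

        errors = [0, 0, 0, 1, -1, 0, 2, 1, 0, -1]   # toy prediction errors
        code = huffman_code(Counter(errors))
        print(code)   # the most frequent error (0) gets the shortest codeword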
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006363
    Abstract:
    Given a distributed set of categorical data defined on a finite domain, we study differentially private algorithms for releasing a histogram that approximates the data distribution. Existing solutions for this problem mostly use the central or local differential privacy model, which are two extreme assumptions of differential privacy, and thus cannot balance the contradiction between the privacy requirements of users and the analysis accuracy for collectors. To remedy this deficiency, this paper proposes a differentially private histogram release method in the shuffle model, called HP-SDP. HP-SDP first employs local hashing to design a shuffled randomized response mechanism. Based on this mechanism, each user perturbs her/his data via a linear decomposition of the perturbation function, without worrying about the domain size, and reports the perturbed messages to the shuffler. The shuffler then permutes the reported messages with a uniformly random permutation, which ensures that the shuffled messages satisfy central differential privacy and that the collector cannot re-identify a target user. Furthermore, HP-SDP adopts convex programming to boost the accuracy of the released histogram. Theoretical analysis and experimental evaluations show that the proposed method can effectively improve the utility of the histogram and outperforms existing solutions.
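    A minimal sketch of the shuffle model may help: each user applies k-ary randomized response, a shuffler permutes the reports, and the collector inverts the expected perturbation to estimate the histogram. HP-SDP's local hashing and convex-programming refinement are omitted here; the mechanism below is a generic stand-in, not the paper's exact construction.
        # Shuffle-model sketch: k-ary randomized response + shuffling + debiasing.
        import math, random

        def k_rr(value, k, eps):
            # Report the true value with probability p, otherwise a uniform other value.
            p = math.exp(eps) / (math.exp(eps) + k - 1)
            if random.random() < p:
                return value
            return random.choice([v for v in range(k) if v != value])

        def estimate_histogram(reports, k, eps):
            n = len(reports)
            p = math.exp(eps) / (math.exp(eps) + k - 1)
            q = (1 - p) / (k - 1)
            counts = [reports.count(v) for v in range(k)]
            # Invert the expected mixing: E[count_v] = n*q + (p - q) * true_count_v.
            return [(c - n * q) / (p - q) for c in counts]

        k, eps = 4, 1.0
        data = [random.choice([0, 0, 1, 2, 3]) for _ in range(10000)]
        reports = [k_rr(v, k, eps) for v in data]
        random.shuffle(reports)       # the shuffler breaks the user-report link
        print(estimate_histogram(reports, k, eps))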
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006368
    Abstract:
    Video click-through rate (CTR) prediction is one of the important tasks in the context of video recommendation. Based on click-through prediction, recommendation systems can adjust the order of the recommended video sequence to improve recommendation performance. In recent years, with the explosive growth of videos, the video cold-start problem has become more and more serious. Aiming at this problem, we propose a novel video CTR prediction model that utilizes both video content features and context features to improve CTR prediction; we also propose simulated training of the cold-start scenario and a neighbor-based new-video replacement method to enhance the model's CTR prediction ability for new videos. The proposed model is able to predict CTR for both old and new videos. Experiments on two real-world video CTR datasets (Track_1_series and Track_2_movies) show the effectiveness of the proposed method. Specifically, the proposed model, using both video content and contextual information, improves the performance of CTR prediction for old videos and outperforms the existing models on both datasets. Additionally, for new videos, a baseline model that does not consider the cold-start problem achieves an AUC score of about 0.57, while the proposed model gives much better AUC scores of 0.645 and 0.615 on Track_1_series and Track_2_movies, respectively, showing better robustness to the cold-start problem.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006500
    Abstract:
    During software development and maintenance, bug fixers usually refer to bug reports submitted by end-users or developers/testers to locate and fix a bug. In this sense, the quality of a bug report largely determines whether the bug fixer can quickly and precisely locate and fix the bug. Researchers have done much work on characterizing, modeling, and improving the quality of bug reports. This paper offers a systematic survey of existing work on bug report quality, with an attempt to understand the current state of research in this area and to open new avenues for future research. First, we summarize the quality problems of bug reports identified by existing studies, such as missing key information and errors in information items. Then, we present a series of works on automatically modeling bug report quality. After that, we introduce approaches that aim to improve bug report quality. Finally, we discuss the challenges and potential opportunities for research on bug report quality.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006502
    [Abstract] (370) [HTML] (0) [PDF 1.10 M] (1171)
    Abstract:
    Nowadays, the big data processing frameworks such as Hadoop and Spark, have been widely used for data processing and analysis in industry and academia. These big data processing frameworks adopt the distributed architecture, generally developed in object-oriented languages like Java, Scala, etc. These frameworks take Java Virtual Machine (JVM) as the runtime environment on cluster nodes to execute computing tasks, i.e., relying on JVM's automatic memory management mechanism to allocate and reclaim data objects. However, current JVMs are not designed for the big data processing frameworks, leading to many problems such as long garbage collection (GC) time and high cost of data serialization and deserialization. As reported by users and researchers, GC time can take even more than 50% of the overall application execution time in some cases. Therefore, JVM memory management problem has become the performance bottleneck of the big data processing frameworks. This paper makes a systematic review of the recent JVM optimization research work for big data processing frameworks. Our contributions include (1) We summarize the root causes of the performance degradation of big data applications when executed in JVM; (2) We summarize the existing JVM optimization techniques for big data processing frameworks. We also classify these methods into categories, compare and analyze the advantages and disadvantages of each, including the method's optimization effects, application scopes and burdens on users; (3) We finally propose some future JVM optimization directions, which will help the performance improvement of big data processing frameworks.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006503
    Abstract:
    Object-oriented software metrics are important for understanding and guaranteeing the quality of object-oriented software. By comparing object-oriented software metrics with their thresholds, we can simply and intuitively evaluate whether a bug is likely. Methods for deriving metric thresholds mainly include unsupervised learning methods based on the distribution of metric data and supervised learning methods based on the relationship between metrics and defect-proneness. The two types of methods have their own advantages and disadvantages: unsupervised methods do not require label information to derive thresholds and are easy to implement, but the resulting thresholds often perform poorly in defect prediction; supervised methods improve defect prediction performance through machine learning algorithms, but they need label information, which is not easy to obtain, and the techniques linking metrics to defect-proneness are complex. In recent years, researchers of both types of methods have continued to explore and have made great progress. At the same time, deriving thresholds for object-oriented software metrics remains challenging. This paper offers a systematic survey of recent research achievements in deriving metric thresholds. First, we introduce the research problem of object-oriented software metric threshold derivation. Then, we describe the current main research work in detail from two aspects: unsupervised and supervised learning methods. After that, we discuss related techniques. Finally, we summarize the opportunities and challenges in this field and outline future research directions.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006510
    Abstract:
    With the increasing scale and complexity of computer networks, it is difficult for network administrators to ensure that the network intent has been correctly realized, and incorrect network configurations will affect the security and availability of the network. Inspired by the successful application of formal methods in the fields of hardware and software verification, researchers have applied formal methods to networks, forming a new research field, namely network verification, which aims to use rigorous mathematical methods to prove the correctness of the network. Network verification has become a hot research topic in the field of networking and security, and its research results have been successfully applied in production networks. From the three research directions of data plane verification, control plane verification, and stateful network verification, this paper systematically summarizes the existing research results in the field of network verification and analyzes the research hotspots and related solutions, aiming to organize the field and provide systematic references and future work prospects for researchers.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006513
    Abstract:
    Anonymous networks aim to protect users' communication privacy in open network environments. Since Chaum proposed Mix-net, related work has been progressing for decades. Nowadays, based on Mix-net, DC-net, or PIR, many anonymous networks have been developed for various application scenarios and threat models by integrating multiple design elements. Starting from the concept of anonymity, this paper introduces the overall development of the anonymous network area. Representative works and their design choices are classified and articulated. This paper systematically analyzes the characteristics of anonymous networks from the aspects of anonymity, latency, bandwidth overhead, etc.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006515
    Abstract:
    This paper proposes a feature extraction algorithm based on principal component analysis with an anisotropic Gaussian kernel penalty, which differs from traditional kernel principal component analysis (KPCA) algorithms. In non-linear dimensionality reduction, traditional KPCA algorithms ignore the dimensionless normalization of the raw data. Meanwhile, the usual kernel function is controlled by a single identical kernel width parameter across all dimensions, which cannot precisely reflect the significance of the features in different dimensions, resulting in low accuracy of the dimensionality reduction process. To address these issues, for the normalization of the raw data, an averaging algorithm is first presented, which shows good performance in improving the variance contribution rate of the original data. Then, an anisotropic Gaussian kernel function is introduced, in which each dimension has its own kernel width parameter, so that the importance of each dimension's features can be properly reflected. In addition, a feature penalty function for kernel principal component analysis is formulated based on the anisotropic Gaussian kernel to represent the raw data with fewer features while reflecting the importance of each principal component. Furthermore, gradient descent is introduced to update the kernel widths of the feature penalty function and control the iterative process of the feature extraction algorithm. To verify the effectiveness of the proposed algorithm, several algorithms are compared on the UCI public datasets and the KDDCUP99 dataset. The experimental results show that the proposed algorithm is on average 4.49% better than previous principal component analysis algorithms on the UCI public datasets and on average 8% better on the KDDCUP99 dataset.
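    The core of the anisotropic kernel is one width parameter per dimension. The NumPy sketch below computes such a kernel matrix; the penalty function and the gradient-descent width updates described above are omitted, so this illustrates only the kernel itself.
        # Anisotropic Gaussian kernel: k(x, y) = exp(-sum_d (x_d - y_d)^2 / (2 * sigma_d^2)).
        import numpy as np

        def anisotropic_gaussian_kernel(X, sigmas):
            # X: (n, d) data matrix; sigmas: (d,) per-dimension kernel widths.
            Z = X / sigmas                       # scale each feature by its own width
            sq = np.sum(Z**2, axis=1)
            d2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T   # pairwise scaled distances
            return np.exp(-0.5 * np.maximum(d2, 0.0))      # clip tiny negatives

        X = np.random.randn(5, 3)
        K = anisotropic_gaussian_kernel(X, sigmas=np.array([1.0, 0.5, 2.0]))
        # A smaller sigma makes that dimension matter more in the kernel distance.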
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006518
    Abstract:
    Recently, with the rapid development of information technology, emerging technologies represented by artificial intelligence have been widely applied in education, triggering profound changes in the concepts and modes of learning. Online learning transcends the limitations of time and space, providing more possibilities for learners to learn anytime and anywhere. However, the temporal and spatial separation of teachers and students in online learning means that teachers cannot closely track students' learning process, which limits the quality of teaching and learning. Diversified learning targets and massive learning resources also generate new problems, i.e., how to quickly accomplish learning targets, reduce learning costs, and reasonably allocate learning resources. These problems have become limitations on the development of individuals and society. The traditional one-size-fits-all educational model can no longer meet human needs; thus, a more efficient and scientific personalized education model is needed to help learners achieve their learning targets with minimal learning costs. Based on these considerations, what is needed is a new adaptive learning system that can automatically and efficiently identify learners' personalized characteristics, efficiently organize and allocate learning resources, and plan a globally personalized learning path. In this paper, we systematically review and analyze current research on personalized learning path recommendation and examine the different research perspectives from a multidisciplinary point of view. Then, we summarize the most widely applied algorithms in current research. Finally, we highlight the main shortcomings of current research, to which more attention should be paid.
    Available online:  November 24, 2021 , DOI: 10.13328/j.cnki.jos.006521
    Abstract:
    A recommender system is an information filtering system that helps users filter a large amount of invalid information to obtain information or items by estimating their interests and preferences. Mainstream traditional recommender systems mainly use offline and historical user data to continuously train and optimize offline models, and then recommend items for online users. This approach has three main problems: the unreliable estimation of user preferences based on sparse and noisy historical data, the ignorance of online contextual factors that affect user behavior, and the unreliable assumption that users are aware of their preferences by default. Since dialogue systems focus on the user's real-time feedback data and capture the user's current interaction intentions, conversational recommendation combines the interactive form of the dialogue system with the recommendation task and has become an effective means to solve these traditional recommendation problems. Through online interaction, conversational recommendation can guide and capture users' current preferences and interests and provide timely feedback and updates. Thanks to the widespread use of voice assistants and chatbot technologies, as well as the mature application of technologies such as reinforcement learning and knowledge graphs in recommendation strategies, more and more researchers have paid attention to conversational recommender systems in the past few years. This survey combs the overall framework of the conversational recommender system, classifies the datasets used in conversational recommendation algorithms, and discusses the relevant metrics for evaluating the effect of conversational recommendation. Focusing on the interaction strategies and recommendation logic in conversational recommendation, this survey summarizes the research achievements of domestic and foreign researchers in recent years. Finally, this survey summarizes and prospects future work on conversational recommendation.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006499
    Abstract:
    Accurately predicting the status of 1p/19q is of great significance for formulating treatment plans and evaluating the prognosis of gliomas. Although some works can predict the 1p/19q status accurately based on magnetic resonance images and machine learning methods, they require the tumor contour to be delineated in advance, which cannot satisfy the needs of computer-aided diagnosis. To deal with this issue, this work proposes a novel deep multi-scale invariant feature network (DMIF-Net) for predicting the 1p/19q status in gliomas. Firstly, it uses a wavelet-scattering network to extract multi-scale and multi-orientation invariant features, and a deep split-and-aggregation network to extract semantic features. Then, it reduces the feature dimensions using a multi-scale pooling module and fuses these features by concatenation. Finally, taking only the bounding box of the tumor region as input, it can predict the 1p/19q status accurately. The experimental results show that, without requiring an accurate delineation of the tumor region, the AUC of DMIF-Net reaches 0.92 (95% CI = [0.91, 0.94]). Compared with the best deep learning model, its AUC, sensitivity, and specificity increase by 4.1%, 4.6%, and 3.4%, respectively. Compared with state-of-the-art glioma models, its AUC and accuracy increase by 4.9% and 5.5%, respectively. Moreover, ablation experiments demonstrate that the proposed multi-scale invariant feature extraction module effectively improves 1p/19q prediction performance, verifying that combining semantic and multi-scale invariant features can significantly increase prediction accuracy without knowing the tumor region boundaries, therefore providing an auxiliary means for formulating personalized treatment plans for low-grade glioma.
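    The multi-scale pooling and concatenation fusion step can be sketched as follows in PyTorch; the channel counts, pooling scales, and output dimension are illustrative assumptions, and the wavelet-scattering and split-and-aggregation branches are not reproduced.
        # Sketch: pool two feature maps at several scales, then fuse by concatenation.
        import torch
        import torch.nn as nn

        class MultiScaleFusion(nn.Module):
            def __init__(self, scales=(1, 2, 4)):
                super().__init__()
                self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(s) for s in scales)
                self.fc = nn.LazyLinear(128)   # fused dimension chosen arbitrarily

            def forward(self, feat_a, feat_b):
                # feat_a, feat_b: (N, C, H, W) maps from two feature extractors.
                parts = []
                for x in (feat_a, feat_b):
                    parts += [p(x).flatten(1) for p in self.pools]
                return self.fc(torch.cat(parts, dim=1))   # concatenate, then project

        fused = MultiScaleFusion()(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
        print(fused.shape)   # torch.Size([2, 128])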
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006485
    [Abstract] (2082) [HTML] (0) [PDF 841.81 K] (2229)
    Abstract:
    Reinforcement learning is a technique that discovers optimal strategies in a trial-and-error way, and it has become a general method for solving environmental interaction problems. However, as a machine learning method, reinforcement learning faces the unexplainability problem common to machine learning. Unexplainability limits the application of reinforcement learning in safety-sensitive fields, e.g., medicine, the military, and transportation, and leads to a lack of universally applicable solutions for environment simulation and task generalization. Although many works have been devoted to overcoming this weakness, the academic community still lacks a consistent understanding of explainable reinforcement learning. In this paper, we explore the basic problems of reinforcement learning and review existing works. To begin with, we explore the parent problem, i.e., explainable artificial intelligence, and summarize its existing definitions. Next, we construct an interpretability theoretical system to describe the common problems of explainable reinforcement learning and explainable artificial intelligence, discussing intelligent algorithms and mechanical algorithms, interpretation, factors that affect interpretability, and the intuitiveness of explanations. Then, three problems unique to explainable reinforcement learning, i.e., environmental interpretation, task interpretation, and strategy interpretation, are defined based on the characteristics of reinforcement learning. After that, the latest research on explainable reinforcement learning is reviewed, and the existing methods are systematically classified. Finally, we discuss future research directions.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006488
    Abstract:
    In recent years, with the continuous development of computer vision, semantic segmentation and shape completion of 3D scenes have received more and more attention from academia and industry. Among them, semantic scene completion is an emerging research topic in this field, which aims to simultaneously predict the spatial layout and semantic labels of a 3D scene and has developed rapidly in recent years. In this paper, we classify and summarize the RGB-D image-based methods proposed in this field in recent years. These methods are divided into two categories based on whether deep learning is used, namely traditional methods and deep learning-based methods. The deep learning-based methods are further divided into two categories according to the input data type, namely methods based on a single depth image and methods based on RGB-D images. Based on the classification and overview of the existing methods, we collate the relevant datasets used for the semantic scene completion task and analyze the experimental results. Finally, we summarize the challenges and development prospects of this field.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006425
    Abstract:
    In the process of software testing, the expected output of the program under test is an important factor in judging whether the program is defective. The metamorphic testing technique uses properties of the program under test to check its outputs, effectively alleviating the problem that the expected output of the program is difficult to construct. In recent years, metamorphic testing has blossomed in the field of software testing. Many researchers have optimized techniques related to metamorphic testing and applied them to various fields to effectively improve software quality. This study summarizes and analyzes the research work on metamorphic testing from three aspects: theoretical foundations, improvement strategies, and application areas, especially the research results of the past five years, and discusses possible research directions for applying metamorphic testing to parallel programs. First, the basic concepts of metamorphic testing and the metamorphic testing process are presented; next, following its steps, the optimization techniques for metamorphic testing are summarized from four perspectives: metamorphic relations, test case generation, test execution, and metamorphic testing tools; then, the application fields of metamorphic testing are listed; finally, based on the existing research results, the problems faced by metamorphic testing of parallel programs are discussed and possible solutions are suggested for further research.
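    A tiny example makes the idea concrete: instead of an expected-output oracle, we check relations that must hold between the outputs of related inputs. The sketch below tests a sorting routine with two such metamorphic relations; the relations chosen are generic textbook examples, not ones from the surveyed literature.
        # Metamorphic testing sketch: no oracle for the "correct" output is needed;
        # we only check properties that must hold across related inputs.
        import random

        def program_under_test(xs):
            return sorted(xs)            # stand-in for the real implementation

        def test_metamorphic_relations(trials=100):
            for _ in range(trials):
                xs = [random.randint(-50, 50) for _ in range(20)]
                out = program_under_test(xs)
                # MR1: permuting the input must not change the output.
                shuffled = xs[:]
                random.shuffle(shuffled)
                assert program_under_test(shuffled) == out
                # MR2: appending a maximal element appends it to the output.
                assert program_under_test(xs + [100]) == out + [100]

        test_metamorphic_relations()     # passes silently when no relation is violated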
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006426
    [Abstract] (1341) [HTML] (0) [PDF 1.38 M] (1248)
    Abstract:
    Knowledge graphs (KGs) serve as a kind of knowledge base by storing facts in a network structure, representing each piece of fact as a triple, i.e., (head, relation, tail). Thanks to the broad applications of KGs in various fields, the embedding learning of knowledge graphs has also quickly gained massive attention. In this article, we classify the existing embedding algorithms into five types: translation-based models, tensor factorization-based models, traditional deep learning-based models, graph neural network-based models, and models fusing extra information. We then introduce and analyze the key ideas, algorithmic features, advantages, and disadvantages of the different embedding models, to provide first-time researchers with a guide that helps them get started quickly.
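    As a concrete taste of the translation-based family, the sketch below scores a triple with the classic TransE criterion, where a plausible triple (head, relation, tail) should satisfy h + r ≈ t in the embedding space; the embedding dimension and the toy data are arbitrary.
        # TransE scoring sketch: lower score = more plausible triple.
        import numpy as np

        def transe_score(h, r, t, norm=1):
            # ||h + r - t|| with L1 or L2 norm, as in the original TransE model.
            return np.linalg.norm(h + r - t, ord=norm)

        rng = np.random.default_rng(0)
        h, r = rng.normal(size=50), rng.normal(size=50)
        t_true = h + r + 0.01 * rng.normal(size=50)   # consistent with h + r
        t_fake = rng.normal(size=50)                  # unrelated embedding
        print(transe_score(h, r, t_true) < transe_score(h, r, t_fake))   # True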
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006429
    [Abstract] (1543) [HTML] (0) [PDF 1.97 M] (2234)
    Abstract:
    A knowledge graph (KG) is a technology that uses a graph model to describe the relationships between knowledge and to model things. The main idea of knowledge graph embedding (KGE), a widely adopted knowledge representation method, is to embed the entities and relations of a knowledge graph into a continuous vector space, so as to simplify operations while preserving the intrinsic structure of the KG. It can benefit a variety of downstream tasks, such as KG completion and relation extraction. Firstly, the existing knowledge graph embedding technologies are comprehensively reviewed, including not only techniques that embed the facts observed in the KG, but also dynamic KG embedding methods that add the time dimension, as well as KG embedding technologies that integrate multi-source information. The relevant models are analyzed, compared, and summarized from the aspects of entity embedding, relation embedding, and scoring functions. Then, typical applications of KG embedding technologies in downstream tasks are briefly introduced, including question answering systems, recommender systems, and relation extraction. Finally, the challenges of knowledge graph embedding are expounded, and future research directions are prospected.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006431
    Abstract:
    Code smells are low-quality code snippets that are in urgent need of refactoring. Code smell is a research hotspot in software engineering, with many related research topics, a large time span, and rich research results. To sort out the relevant research approaches and results, analyze the research hotspots, and predict future research directions, this paper systematically analyzes and classifies more than 300 papers related to code smells published between 1990 and June 2020. This paper analyzes the development trend of code smell research, quantitatively reveals its mainstream and hot spots, identifies the key code smells of concern to academia, and also studies the differences in concerns between industry and academia.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006434
    Abstract:
    Since ordinary city road maps do not cover road restriction information for lorries and lack hot-spot labeling, they cannot satisfy the large-batch, long-distance road transportation requirements of bulk commodity transport. In order to address the issues of frequent transportation accidents and low logistics efficiency, and to further improve truck drivers' travel experience, it is urgent to study methods of building customized logistics maps for bulk commodity transport that combine the type of goods transported, the type of truck, and the driver's route selection preferences. With the widespread application of the mobile Internet and the Internet of Vehicles, the spatio-temporal data generated by bulk commodity transport is growing rapidly. Together with other logistics operational data, it constitutes logistics big data, which provides a solid data foundation for logistics map building. In this paper, we first comprehensively review the state-of-the-art work on map building using trajectory data. Then, to tackle the limitations of existing digital map building methods in the field of bulk commodity transport, we put forward a data-driven logistics map building framework using multi-source logistics data. We focus on the following research topics: (1) multi-constraint logistics map construction based on users' prior knowledge; (2) incremental logistics map updating driven by dynamic spatio-temporal data. Logistics maps will become AI infrastructure for a new generation of logistics technology fit for bulk commodity transportation. The research results of this paper provide rich practical content for the technical innovation of logistics map building and offer new solutions to promote cost reduction and efficiency improvement in logistics, which have important theoretical significance and application value.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006435
    Abstract:
    Anycast uses BGP to achieve best-path selection by assigning the same IP address to multiple terminal nodes. In recent years, as anycast technology has become more and more common, it has been widely used in DNS and CDN services. This paper first introduces anycast technology in an all-round way, then discusses the current problems of anycast technology and summarizes them into three categories: anycast inference is imperfect, anycast performance cannot be guaranteed, and anycast load balancing is difficult to control. In response to these problems, the latest research progress is described. Finally, we summarize the remaining problems and the directions for improvement, providing useful references for researchers in related fields.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006437
    Abstract:
    As distributed storage solutions with high performance and high scalability, key-value storage systems have been widely adopted in recent years, examples being Redis, MongoDB, and Cassandra. The multi-replication mechanism widely used in distributed storage systems improves system throughput and reliability, but it also adds the extra overhead of system coordination and replica consistency. For cross-region distributed systems, the long-distance replication coordination overhead may even become the performance bottleneck, reducing system availability and throughput. The distributed key-value storage system proposed in this article, called Elsa, is a coordination-free multi-master key-value store designed for cross-region architectures. While ensuring high performance and high scalability, Elsa adopts conflict-free replicated data types (CRDTs) to guarantee strong eventual consistency between replicas without coordination, reducing the coordination overhead between system nodes. In this paper, we set up a cross-region distributed environment spanning 4 data centers and 8 nodes on the Aliyun platform and conduct large-scale distributed performance comparison experiments. The experimental results show that, in a cross-region distributed environment, the throughput of Elsa has obvious advantages under highly concurrent contention loads, reaching up to 7.37 times that of a MongoDB cluster and 1.62 times that of a Cassandra cluster.
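    The CRDT idea behind such coordination-free designs can be shown with a last-writer-wins register, one of the simplest CRDTs: replicas accept writes independently and converge after merging, with no coordination. This is a generic sketch; Elsa's concrete CRDT types are not specified here.
        # Minimal last-writer-wins (LWW) register CRDT sketch.
        class LWWRegister:
            def __init__(self, replica_id):
                self.value, self.stamp = None, (0, replica_id)

            def write(self, value, clock):
                # Tag each write with a (clock, replica_id) timestamp.
                self.value, self.stamp = value, (clock, self.stamp[1])

            def merge(self, other):
                # Highest (clock, replica_id) wins; ties break deterministically,
                # so merge order does not matter and replicas converge.
                if other.stamp > self.stamp:
                    self.value, self.stamp = other.value, other.stamp

        a, b = LWWRegister("A"), LWWRegister("B")
        a.write("x", clock=1)
        b.write("y", clock=2)
        a.merge(b)
        b.merge(a)
        assert a.value == b.value == "y"   # both replicas converge, no coordination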
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006402
    Abstract:
    Blockchain is a distributed ledger constructed by a series of network nodes. It has the following security attributes: unforgeability, decentralization, trustlessness, provable security based on cryptography, and non-repudiation. This paper summarizes the security services involved, including data confidentiality, data integrity, authentication, data privacy, and assured data erasure. It first introduces the concepts of blockchain and public key cryptography. For each of the five security services mentioned above, we analyze the security threats faced by users in actual scenarios together with their corresponding solutions, discuss the drawbacks of the traditional implementations, and then introduce countermeasures based on blockchain. Finally, the values and challenges associated with blockchain are discussed.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006420
    [Abstract] (423) [HTML] (0) [PDF 1.28 M] (1142)
    Abstract:
    Emotion is the external expression of affect, and it influences cognition, perception, and decision-making in our daily life. As one of the basic problems in realizing overall computer intelligence, emotion recognition has been studied in depth and widely applied in the fields of affective computing and human-computer interaction. Compared with facial expressions, speech, and other physiological signals, using EEG to recognize emotion is attracting more attention for its higher temporal resolution, lower cost, better identification accuracy, and higher reliability. In recent years, deep learning architectures have been increasingly applied to this task and have achieved better performance than traditional machine learning methods. Deep learning for EEG-based emotion recognition is a research focus, and many challenges remain to be overcome. Considering that there is little review literature to refer to, in this paper we investigate the application of deep learning to EEG-based emotion recognition. Specifically, input formulations, deep learning architectures, experimental settings, and results are surveyed. Besides, we carefully screen articles that evaluated their models on the widely used DEAP and SEED datasets, perform qualitative and quantitative analyses from different aspects, and make a comparison. Finally, we summarize the work as a whole and give prospects for future work.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006421
    Abstract:
    In order to ensure network-wide consensus and the tamper-proofness of the transaction ledger, traditional blockchain technology requires miner nodes to possess strong computing and storage resources, which greatly limits resource-constrained devices from joining blockchain systems. In recent years, blockchain technology has expanded into many fields, such as finance, health care, the Internet of Things, and supply chains. However, these application scenarios contain a large number of devices with weak computing power and low storage capacity, which brings great challenges to the application of blockchain. Therefore, lightweight blockchain technology is emerging. In this paper, we summarize related work on lightweight blockchains from the two aspects of lightweight computing and lightweight storage, and we compare and analyze their advantages and disadvantages. Finally, we look forward to the future development of lightweight blockchain systems.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006358
    Abstract:
    In recent years, deep learning has shown excellent performance in image steganalysis. At present, most image steganalysis models based on deep learning are special-purpose steganalysis models, which apply only to a specific steganography. To detect the stego images of other steganographic algorithms with such a model, a large number of stego images encoded by those algorithms are needed as a dataset to retrain the model. However, in practical steganalysis tasks, it is difficult to obtain a large number of encoded stego images, and training a universal steganalysis model with very few stego image samples is a great challenge. Inspired by research results in the field of few-shot learning, we propose a universal steganalysis method based on a transductive propagation network. First, the feature extraction network is improved based on an existing few-shot learning classification framework, and a multi-scale feature fusion network is designed, so that the few-shot classification model can extract more steganalysis features for classification tasks based on weak information such as secret noise residue. Second, to solve the problem that steganalysis models based on few-shot learning are difficult to converge, an initial model with prior knowledge is obtained by pre-training. Then, steganalysis models based on few-shot learning are trained in the frequency domain and the spatial domain respectively; self-test and cross-test results show that the average detection accuracy is above 80%. Furthermore, these models are retrained by means of dataset enhancement, which improves their detection accuracy to more than 87%. Finally, the proposed few-shot steganalysis model is compared with existing steganalysis models in the frequency and spatial domains: under the few-shot experimental setup, its detection accuracy is slightly below those of SRNet and ZhuNet in the spatial domain and exceeds that of the best existing steganalysis model in the frequency domain. The experimental results show that the proposed few-shot learning-based method is efficient and robust in detecting unknown steganographic algorithms.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006351
    Abstract:
    How to detect sudden events in social media data streams is a popular research topic in natural language processing. However, current methods for extracting emergencies suffer from low accuracy and low efficiency. To solve these problems, this paper proposes an emergency detection method based on word correlation characteristics, which can quickly detect emergency events from a social network data stream, so that relevant decision makers can take timely and effective measures, the negative impact of emergencies can be reduced as much as possible, and social stability can be maintained. First, through noise filtering and emotion filtering, we obtain microblog texts full of negative emotions. Then, based on time information, the Weibo data is sliced into time windows; for each word in each window, the word frequency, user influence, and word frequency growth rate are computed, and a burst computation method is used to extract burst words. According to a word2vec model, similar words are merged, and the feature similarity of the burst words is used to form a burst word relationship graph. Finally, a multi-attribute spectral clustering algorithm is used to optimally divide the word relationship graph; as the time window slides, abnormal words are monitored, and sudden events are judged through the structural changes caused by sudden word changes in the subgraphs. The experimental results show that the proposed method achieves a good event detection effect on real-time blog post data streams. Compared with existing methods, it can meet the needs of emergency detection: it accurately detects not only the detailed information of sub-events but also the relevant information of events.
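    The burst-word step can be illustrated with a small sketch that slices posts into time windows and flags words whose frequency growth rate across adjacent windows exceeds a threshold; the user-influence weighting and spectral clustering described above are omitted, and the threshold values are arbitrary.
        # Burst-word sketch: per-window word counts plus a growth-rate test.
        from collections import Counter

        def burst_words(windows, threshold=3.0, min_count=5):
            # windows: list of lists of posts (strings), one inner list per time window.
            bursts, prev = [], None
            for t, posts in enumerate(windows):
                cur = Counter(w for post in posts for w in post.split())
                if prev is not None:              # skip the first window: no baseline yet
                    for w, c in cur.items():
                        growth = c / (prev[w] + 1)    # +1 smooths words unseen before
                        if c >= min_count and growth >= threshold:
                            bursts.append((t, w, c))
                prev = cur
            return bursts

        windows = [
            ["calm day"] * 10,
            ["fire fire downtown"] * 8 + ["calm day"] * 2,
        ]
        print(burst_words(windows))   # flags 'fire' and 'downtown' in window 1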
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006321
    Abstract:
    The regional network border describes the topological border nodes in cyberspace between countries and regions in the real world. By combining active and passive measurement techniques, this paper proposes RNB, a dual-stage method for discovering regional network border nodes. The first stage discovers candidate sets of regional network border nodes by using directed topology measurement and multi-source geolocation; the second stage accurately identifies border nodes from the candidate sets by using multi-source information weighted geolocation and dual-PING geolocation. The experiment took mainland China as the target region and discovered 1,644 border nodes. Compared with the CAIDA dataset, our results include 37% exclusively discovered border nodes at only 2.5% of the measurement cost. The accuracy rate is 99.3% under manual verification and 75% under verification by an ISP operator.
    Available online:  October 20, 2021 , DOI: 10.13328/j.cnki.jos.006501
    Abstract:
    To protect the execution environment of security-sensitive programs on computing devices, researchers have proposed TEE technology, which provides security-sensitive programs with a secure execution environment isolated from the rich computing environment through hardware and software isolation. Side-channel attacks have evolved from traditionally requiring expensive equipment to now inferring confidential information from access patterns obtained purely in software via microarchitectural state. TEE architectures only provide an isolation mechanism and cannot resist this type of emerging software side-channel attack. This paper thoroughly investigates the software side-channel attacks and corresponding countermeasures of the three TEE architectures ARM TrustZone, Intel SGX, and AMD SEV, and discusses the development trends of their attack and defense mechanisms. First, we introduce the basic principles of ARM TrustZone, Intel SGX, and AMD SEV, and elaborate on the definition and classification of software cache side-channel attacks, as well as practical side-channel attack methods and steps. Second, from the perspective of processor instruction execution, we propose a TEE attack surface classification method, use it to classify TEE software side-channel attacks, and explain attacks that combine software side channels with other attacks. Third, we discuss the threat model of TEE software side-channel attacks in detail. Finally, we comprehensively summarize the industry's countermeasures against TEE software side-channel attacks and discuss future research trends from the two aspects of attack and defense.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006390
    Abstract:
    Human pose estimation is a basic and challenging task in the field of computer vision. It is the basis for many computer vision tasks, such as action recognition and action detection. With the development of deep learning methods, deep learning-based human pose estimation algorithms have shown excellent results. In this paper, we divide pose estimation methods into three categories: single-person pose estimation, top-down multi-person pose estimation, and bottom-up multi-person pose estimation. We introduce the development of 2D human pose estimation algorithms in recent years and discuss the current challenges of 2D human pose estimation. Finally, we give an outlook on the future development of human pose estimation.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006391
    Abstract:
    Deep reinforcement learning combines the representation ability of deep learning with the decision-making ability of reinforcement learning, and has aroused great research interest due to its remarkable effectiveness in complex control tasks. This paper classifies model-free deep reinforcement learning methods into Q-value function methods and policy gradient methods according to whether the Bellman equation is used, and introduces the two kinds of methods from the aspects of model structure, optimization process, and evaluation. Regarding the low sample-efficiency problem in deep reinforcement learning, this paper explains, in terms of model structure, that the overestimation problem in Q-value function methods and the unbiased sampling constraint in policy gradient methods are the main factors that affect sample efficiency. Then, from the perspectives of enhancing exploration efficiency and improving sample exploitation, this paper summarizes feasible optimization methods according to recent research hotspots and trends, analyzes their advantages together with existing problems, and compares them according to scope of application and optimization effect. Finally, this paper proposes enhancing the generality of optimization methods, exploring the migration of optimization mechanisms between the two kinds of methods, and improving theoretical completeness as future research directions.
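    For reference, the two families differ in the quantity they optimize: Q-value function methods rely on the Bellman optimality equation, while policy gradient methods directly ascend the policy-gradient objective. The standard forms of both are shown below.
        \[ Q^{*}(s,a) \;=\; \mathbb{E}_{s'}\!\left[ r(s,a) + \gamma \max_{a'} Q^{*}(s',a') \right] \]
        \[ \nabla_{\theta} J(\theta) \;=\; \mathbb{E}_{\pi_{\theta}}\!\left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s)\, Q^{\pi_{\theta}}(s,a) \right] \]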
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006395
    Abstract:
    How to utilize multi-source and heterogeneous spatio-temporal data to achieve accurate trajectory prediction that reflects the movement characteristics of moving objects is a core issue in the research field of trajectory prediction. Most existing trajectory prediction models predict long sequential trajectory patterns according to the characteristics of historical trajectories, or integrate the current locations of moving objects into spatio-temporal semantic scenarios to predict trajectories based on the objects' historical trajectories. This survey summarizes the commonly used trajectory prediction models and algorithms across different research fields. Firstly, the state-of-the-art work on multiple-motion trajectory prediction and the basic models of trajectory prediction are described. Secondly, the prediction models of different categories are summarized, including mathematical statistics, machine learning, and filtering algorithms, together with the representative methods in these fields. Thirdly, context awareness techniques are introduced: the definitions of context awareness given by scholars from different research fields are described; the key technical points, such as the different models of context-aware computing, context acquisition, and context reasoning, are presented; and the categorization, filtering, storage, and fusion of context information and their implementation methods are analyzed. The technical roadmap of context-aware multiple-motion-pattern trajectory prediction for moving objects and the working mechanism of each task are introduced in detail. This survey also presents real-world application scenarios of context awareness techniques, for example, location recommendation and point-of-interest recommendation, and, by comparison with traditional algorithms, discusses the advantages and disadvantages of context awareness techniques in these applications. New methods for pedestrian trajectory prediction based on context awareness and long short-term memory (LSTM) techniques are introduced in detail. Lastly, the current problems and future trends of trajectory prediction and context awareness are summarized.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006384
    Abstract:
    In the era of big data, there are more and more application analysis scenarios driven by large-scale data. How to quickly and efficiently extract the information needed for analysis and decision-making from massive data poses great challenges to database systems. At the same time, the real-time requirements of modern business analysis and decision-making demand that database systems be able to process both ACID transactions and complex analytical queries. However, traditional data partition granularity is too coarse to adapt to the dynamic changes of complex analytical workloads, and traditional data layouts are too rigid to cope with the growing number of mixed transactional-analytical application scenarios. To solve these problems, intelligent data partitioning and layout has become one of the current research hotspots. It extracts effective workload characteristics through data mining, machine learning, and other techniques, designs appropriate partitioning strategies to avoid scanning large amounts of irrelevant data, and guides layout structure design to adapt to different types of workloads. This paper first introduces the background of data partitioning and layout techniques, then elaborates the research motivation, development trends, and key technologies of intelligent data partitioning and layout, and finally summarizes and prospects future research in this area.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006385
    Abstract:
    Spoken language understanding is one of the hot research topics in the field of natural language processing. It is applied in many fields such as personal assistants, intelligent customer service, human-computer dialogue, and medical treatment. Spoken language understanding technology converts the natural language input by the user into a semantic representation, and mainly includes two sub-tasks: intent recognition and slot filling. At this stage, joint deep modeling of the intent recognition and slot filling tasks has become mainstream and has achieved good results. Summarizing and analyzing deep learning-based joint modeling algorithms for spoken language understanding is therefore of great significance. First, this paper introduces the related work on applying deep learning to spoken language understanding, and then analyzes the existing research from the perspective of the relationship between intent recognition and slot filling. The experimental results of different models are compared and summarized. Finally, the challenges that future research may face are prospected.
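    The joint modeling idea can be sketched as a shared encoder with two heads, one utterance-level head for intent and one token-level head for slots; the PyTorch module below is a minimal illustration, with the encoder choice and all hyper-parameters assumed for the example rather than taken from any surveyed system.
        # Joint intent detection + slot filling sketch: one shared encoder, two heads.
        import torch
        import torch.nn as nn

        class JointSLU(nn.Module):
            def __init__(self, vocab, n_intents, n_slots, dim=64):
                super().__init__()
                self.emb = nn.Embedding(vocab, dim)
                self.enc = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
                self.intent_head = nn.Linear(2 * dim, n_intents)  # utterance level
                self.slot_head = nn.Linear(2 * dim, n_slots)      # token level

            def forward(self, tokens):
                h, _ = self.enc(self.emb(tokens))          # (B, T, 2*dim)
                intent_logits = self.intent_head(h.mean(dim=1))
                slot_logits = self.slot_head(h)            # one label per token
                return intent_logits, slot_logits

        model = JointSLU(vocab=1000, n_intents=5, n_slots=10)
        intent, slots = model(torch.randint(0, 1000, (2, 7)))
        # Joint training sums the two cross-entropy losses over the shared encoder.
        print(intent.shape, slots.shape)   # (2, 5) and (2, 7, 10)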
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006407
    Abstract:
    Separation logic is an extension of classical Hoare logic for reasoning about pointers and dynamic data structures, and it has been extensively used in the formal analysis and verification of fundamental software, including operating system kernels. Automated constraint solving is one of the key means of automating separation logic-based verification of such programs. The verification of programs manipulating dynamic data structures usually involves both shape properties, e.g., singly or doubly linked lists and trees, and data constraints, e.g., sortedness and the invariance of data sets/multisets. This paper introduces COMPSPEN, a separation logic solver capable of simultaneously reasoning about the shape properties and data constraints of linear dynamic data structures. We first introduce the theoretical foundations of COMPSPEN, including the definition of the separation logic fragment SLIDdata as well as the decision procedures for the satisfiability and entailment problems of SLIDdata. Then, we present the implementation and architecture of the COMPSPEN tool. Finally, we report experimental results for COMPSPEN. We collected 600 test cases and compared the performance of COMPSPEN against state-of-the-art separation logic solvers, including ASTERIX, S2S, Songbird, and SPEN. The experimental results show that COMPSPEN is the only tool capable of solving separation logic formulas involving set data constraints and that, overall, it efficiently solves the satisfiability problem for separation logic formulas involving both shape properties and linear arithmetic data constraints over linear dynamic data structures, and it is also capable of solving the entailment problem.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006409
    Abstract:
    With the rapid development of neural networks and related technologies, artificial intelligence has been widely applied in safety-critical and mission-critical systems, such as autopilot systems, disease diagnosis systems, and malware detection systems. Due to the lack of a comprehensive and in-depth understanding of artificial intelligence software systems, errors with serious consequences occur frequently. Functional and non-functional attributes of artificial intelligence software systems have been proposed to deepen understanding of such systems and assure their quality. While a large number of researchers have focused on functional attributes, non-functional attributes are attracting more and more attention. This paper surveys 138 papers in related fields, systematically organizes existing research in terms of attribute necessity, attribute definitions, attribute examples, and common quality assurance methods, and summarizes research on the non-functional attributes of artificial intelligence software systems, together with an analysis of the relationships among these attributes. Open source tools available for such research are also surveyed. Finally, potential future research directions and challenges for non-functional attributes of artificial intelligence software systems are summarized, which, hopefully, will provide references for researchers interested in related directions.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006410
    Abstract:
    Cyber-physical systems (CPS) play an increasingly important role in social life. On-demand choreography of CPS resources relies on the software defining of those resources, and the definition of software interfaces depends on a full description of resource capabilities. At present, the CPS field lacks both a knowledge base that can describe resources and their capabilities and an effective way to construct such a knowledge base. Starting from textual descriptions of CPS resources, this study proposes to construct a CPS resource capability knowledge graph and designs a bottom-up automatic construction method. Given CPS resources, the method first extracts textual descriptions of resource capabilities from code and text and generates normalized capability phrases based on a predefined representation pattern. Then, capability phrases are divided, aggregated, and abstracted based on the key components of their verb-object structure to generate hierarchical abstract capability descriptions for different categories of resources. Finally, the CPS knowledge graph is constructed. Based on the Home Assistant platform, this study constructs a knowledge graph containing 32 resource categories and 957 resource capabilities. In the construction experiment, the results of manual construction and of automatic construction with the proposed method are compared along several dimensions. The results show that the proposed method is feasible for automatically constructing a CPS resource capability knowledge graph; it reduces the workload of manual construction, supplements descriptions of resource services and capabilities in the CPS field, and improves knowledge completeness.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006411
    Abstract:
    With the vigorous development of big data and cloud computing, public attention to data security and privacy has become a worldwide trend. Different parties are reluctant to share data in order to protect their own interests and privacy, which leads to data silos. Federated learning enables multiple parties to build a common, robust model without exchanging their data samples, thus addressing critical issues such as data fragmentation and data isolation. However, more and more studies have shown that the federated learning algorithm first proposed by Google cannot resist sophisticated privacy attacks. Therefore, how to strengthen privacy protection and protect users' data privacy in federated learning is an important issue. This paper offers a systematic survey of research on privacy attacks and protection in federated learning in recent years. First, the definition, characteristics, and classification of federated learning are introduced. Then the adversarial model of privacy threats in federated learning is analyzed, and typical privacy attacks are classified with respect to the adversary's objectives. Next, several mainstream privacy-preserving technologies are introduced and their advantages and disadvantages in practical applications are pointed out. Furthermore, existing work on protection against privacy attacks is summarized and six privacy-preserving schemes are elaborated. Finally, future challenges of privacy preservation in federated learning are concluded and promising research directions are discussed.
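    For readers unfamiliar with the baseline these attacks target, the following is a minimal NumPy sketch of federated averaging (synthetic data, illustrative only): clients train locally and the server averages their models, and it is precisely the exchanged updates that leak information.

```python
# A minimal FedAvg sketch with synthetic least-squares clients.
# Everything here (data, dimensions, rounds) is a placeholder.
import numpy as np

def client_update(w, X, y, lr=0.1, epochs=5):
    """Local SGD on a least-squares objective; returns local weights."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
w_global = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]

for rnd in range(10):                      # communication rounds
    local = [client_update(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(local, axis=0)      # server-side averaging
# The local updates themselves reveal information about local data,
# which is the attack surface the surveyed schemes aim to protect.
```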
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006415
    Abstract:
    Deep learning has made great achievements in computer vision, natural language processing, speech recognition, and other fields. Compared with traditional machine learning algorithms, deep models achieve higher accuracy on many tasks, but because deep learning is end-to-end, highly non-linear, and complex, the interpretability of deep models is inferior to that of traditional machine learning algorithms, which hinders the application of deep learning in real life. Studying the interpretability of deep models is therefore significant and necessary, and in recent years many scholars have proposed algorithms on this issue. For image classification tasks, this article divides interpretability algorithms into global and local ones. From the perspective of interpretation granularity, global interpretability algorithms are further divided into model-level and neuron-level algorithms, and local interpretability algorithms are divided into pixel-level, concept-level, and image-level feature algorithms. Based on this framework, the article summarizes common deep model interpretability algorithms and related evaluation metrics, and discusses current challenges and future research directions. We believe that research on the interpretability and theoretical foundations of deep models is a necessary path to opening the black box of deep models, and that interpretability algorithms have huge potential to help solve other problems of deep models, such as fairness and generalization.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006380
    Abstract:
    As one of the most widely deployed payment channel networks (PCN), the Lightning Network (LN) has attracted much attention since it was proposed in 2016. The Lightning Network is a layer-2 technology addressing the scalability problem of Bitcoin. In LN, participants only need to submit layer-1 transactions on the blockchain to open and close a payment channel, and can issue multiple transactions off-chain. This mechanism avoids waiting for every transaction to be verified and saves transaction fees. However, as the Lightning Network has been in practice for a rather short time, previous studies were based on small volumes of rapidly changing data and lack time-effectiveness. To fill this gap and obtain a comprehensive understanding of the topology of the Lightning Network and its evolution, this paper characterizes both static and dynamic features of LN through graph analysis of highly time-effective data updated to July 2020. We perform a clustering analysis of the nodes and present conclusions and insights derived from the clustering results. Moreover, we study the charging mechanism in LN by comparing on-chain and off-chain transaction fees.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006381
    Abstract:
    Sparse triangular solve (SpTRSV) is an important computation kernel in scientific computing. Its irregular memory access pattern makes efficient data reuse difficult to achieve. Structured grid problems possess special nonzero patterns. On the SW26010 processor, the major building block of the Sunway TaihuLight supercomputer, these patterns are often exploited during task partitioning to facilitate on-chip reuse of computed unknowns, and software-based routing is usually employed to implement inter-thread communication. Routing incurs overhead and imposes restrictions on nonzero patterns. In this paper, we achieve on-chip data reuse without routing: the input problem is partitioned and mapped onto SW26010 such that threads with data dependencies are always connected by the register communication network, which enables direct thread communication and obviates routing. We describe our solver and test it on a variety of problems. In the experiments, our solver sustains an average memory bandwidth utilization of 88.1%, with peak efficiency reaching 94.5% (24.5 GB/s).
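    The dependency analysis underlying SpTRSV task partitioning can be sketched in a hardware-agnostic way (a Python/SciPy illustration only; the SW26010 register-communication mapping itself is beyond this sketch): rows are grouped into levels such that rows in the same level have no mutual dependency and can be solved in parallel.

```python
# A minimal level-set analysis for sparse lower-triangular solves.
import numpy as np
import scipy.sparse as sp

def sptrsv_levels(L):
    """Assign each row of a sparse lower-triangular matrix to a level:
    a row's level is one more than the deepest row it depends on."""
    n = L.shape[0]
    L = L.tocsr()
    level = np.zeros(n, dtype=int)
    for i in range(n):
        deps = L.indices[L.indptr[i]:L.indptr[i + 1]]
        deps = deps[deps < i]                      # strictly lower part
        level[i] = 1 + level[deps].max() if len(deps) else 0
    return level

L = sp.csr_matrix(np.tril(np.array([[2., 0, 0, 0],
                                    [1., 2, 0, 0],
                                    [0., 0, 2, 0],
                                    [0., 1, 1, 2]])))
print(sptrsv_levels(L))   # -> [0 1 0 2]; same-level rows are independent
```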
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006312
    Abstract:
    Code review is the manual inspection of source code by developers other than the author. In a code review system, software developers submit code changes to fix defects or add features. Not all code changes are integrated into the codebase; some are abandoned. Abandoned code changes can be restored for further review, which allows contributors to continue improving them, but reviewing restored code changes takes more time. This paper collects 920,700 restored code changes from four open source projects, investigates the reasons for which code changes are restored, identifies 11 categories of such reasons using thematic analysis, and quantitatively analyzes the characteristics of restored code changes. The main findings include: 1) among the reasons for restoring code changes, "improve and update" accounts for the largest proportion; 2) the distribution of reasons differs across the four projects, but not significantly; 3) compared with non-restored code changes, restored code changes have a 10% lower acceptance rate, 1.9 times more comments, and 5.8 times longer review time on average; 4) 81% of restored code changes were eventually accepted, while 19% remained abandoned.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006283
    Abstract:
    To reveal parent-child influence relationships between nodes in a diffusion network, most prior work requires knowledge of node infection times, which is available only by carefully monitoring the diffusion process. In this work, we investigate how to solve this problem by learning from diffusion results, which contain only the final infection statuses of nodes in each diffusion process and are often more easily accessible in practice. A conditional entropy based method is presented to infer potential candidate parent nodes for each node in the network. Furthermore, we refine the inference results by identifying and pruning inferred influence relations that are unlikely to exist in reality. Experimental results on both synthetic and real-world networks verify the effectiveness and efficiency of our approach.
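    A minimal sketch of the conditional-entropy criterion (synthetic data, hypothetical setup, not the paper's full method): with final infection statuses collected over many cascades, a plausible parent of node c is a node p for which H(c | p) is low.

```python
# Rows of the status data are cascades, entries are {0,1} infection flags.
import numpy as np

def cond_entropy(child, parent):
    """H(child | parent) over two binary status vectors."""
    h = 0.0
    for pv in (0, 1):
        mask = parent == pv
        if not mask.any():
            continue
        p_pv = mask.mean()
        for cv in (0, 1):
            p_joint = ((child == cv) & mask).mean()
            if p_joint > 0:
                h -= p_joint * np.log2(p_joint / p_pv)
    return h

rng = np.random.default_rng(1)
parent = rng.integers(0, 2, 500)
child = parent.copy()
child[rng.random(500) < 0.1] ^= 1          # child mostly follows parent
noise = rng.integers(0, 2, 500)            # an unrelated node
print(cond_entropy(child, parent), cond_entropy(child, noise))
# The true parent yields much lower conditional entropy than the noise node.
```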
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006330
    Abstract:
    VM consolidation for cloud data centers is one of the hottest research topics in cloud computing. It is challenging to minimize energy consumption while ensuring the QoS of hosts in cloud data centers, which is essentially an NP-hard multi-objective optimization problem. In this paper, we propose an energy-efficient hybrid swarm intelligence VM consolidation method (HSI-VMC) for heterogeneous cloud environments, which includes a peak efficiency based static threshold overloaded host detection strategy (PEBST), a migration ratio based reallocate-VM selection strategy (MRB), a target host selection strategy, a hybrid discrete heuristic differential evolutionary particle swarm optimization VM placement algorithm (HDH-DEPSO), and a load average based underloaded host processing strategy (AVG). Specifically, the combination of PEBST, MRB, and AVG detects overloaded and underloaded hosts and selects appropriate VMs for migration to reduce SLAV and the number of VM migrations, while HDH-DEPSO combines the advantages of DE and PSO to search for the best VM placement solution and effectively reduce the cluster's real-time power. A series of experiments on real cloud datasets (PlanetLab, Mix, and Gan) shows that HSI-VMC reduces energy consumption sharply while accommodating multiple QoS metrics, and outperforms several existing mainstream energy-aware VM consolidation approaches.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006331
    Abstract:
    Directed grey-box fuzzing measures the effectiveness of seeds in steering execution towards target code. Besides the closeness between the triggered execution and the target code lines, the ability to explore diversified execution paths is also important to avoid local optima. Current directed grey-box fuzzing methods measure this capability by counting coverage over the whole program, but only part of the program affects the computation of the target state. If a new seed brings only target-irrelevant state changes, it cannot enhance the seed queue for state exploration; worse, it may distract the fuzzer and waste time on exploring target-irrelevant code logic. To solve this problem, this paper presents a valid coverage guided directed grey-box fuzzing method. We use static program slicing to locate the code region that can affect the target state and detect interesting seeds that bring new coverage in this region. By enlarging the energy of these seeds and reducing that of others (adjusting the power schedule), the fuzzer is guided to focus on seeds that help explore the different control flows on which the target depends, mitigating the interference of redundant seeds. Experiments on the provided benchmark show that this strategy brings significant performance improvements over AFLGo.
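    The power-schedule adjustment can be pictured with a toy sketch (all structures and constants here are hypothetical, not AFLGo's actual interface): seed energy is boosted only for coverage novelty inside the sliced, target-relevant region.

```python
# Toy model: edges are integers; the slice defines the relevant region.
def assign_energy(seed, base_energy, relevant_edges, seen_relevant):
    """Boost seeds that add coverage within the target-relevant slice,
    damp seeds whose novelty is entirely target-irrelevant."""
    new_relevant = (seed["edges"] & relevant_edges) - seen_relevant
    if new_relevant:
        seen_relevant |= new_relevant
        return base_energy * 4          # reward target-relevant novelty
    return base_energy // 2             # damp target-irrelevant seeds

relevant_edges = {2, 3, 5}              # edges kept by program slicing
seen = set()
seeds = [{"id": 0, "edges": {1, 2}}, {"id": 1, "edges": {4, 6}}]
for s in seeds:
    print(s["id"], assign_energy(s, 100, relevant_edges, seen))
# Seed 0 touches sliced edge 2 and is boosted; seed 1 only adds
# target-irrelevant coverage and is damped.
```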
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006313
    [Abstract] (634) [HTML] (0) [PDF 1.76 M] (1174)
    Abstract:
    A knowledge graph is a graph-based structural representation of knowledge. One of the key problems about knowledge graphs, in both research and practice, is how to construct large-scale, high-quality knowledge graphs. This paper presents an approach to constructing knowledge graphs based on Internet-based human collective intelligence. The core of the approach is a continuously executing loop, called the EIF loop (EIFL), consisting of three activities: free exploration, automatic integration, and proactive feedback. In free exploration, each participant constructs an individual knowledge graph alone. In automatic integration, all participants' current individual knowledge graphs are integrated in real time into a collective knowledge graph. In proactive feedback, each participant receives personalized feedback from the current collective knowledge graph to improve the efficiency of constructing the individual knowledge graph. In particular, we propose a hierarchical knowledge graph representation mechanism, design a knowledge graph merging algorithm driven by the goal of minimizing the collective knowledge graph's general entropy, and introduce two ways of providing context-dependent and context-independent information feedback, respectively. To investigate the feasibility of the proposed approach, we design and carry out three kinds of experiments: (1) merging experiments on simulated graphs with structural information only; (2) merging experiments on real large-scale knowledge graphs; (3) knowledge graph construction experiments with different numbers of participants. The experimental results show that: (1) the proposed merging algorithm can find high-quality merging solutions by utilizing both the structural information of knowledge graphs and the semantic information of their elements; (2) EIFL-based collective collaboration improves both the efficiency of participants in constructing individual knowledge graphs and the scale of the merged collective knowledge graph, and shows good scalability with respect to the number of participants.
    Available online:  August 02, 2021 , DOI: 10.13328/j.cnki.jos.006383
    Abstract:
    As an important way to access and use web services, REST APIs provide a technical means for developing and implementing application systems based on service-oriented architecture. However, REST API design quality varies, so practical and reasonable design guidelines are essential for standardizing and improving it. First, based on the connotation of REST APIs, a multi-dimensional, two-layer classification framework for REST API design guidelines, RADRC (REST API Design Rule Catalog), is established, and twenty-five popular design guidelines are classified within it. Second, a compliance inspection tool for REST API design guidelines, named RESTer, is implemented. Finally, RESTer is employed in an empirical study of current REST API design based on nearly 2,000 real-world REST API documents from APIs.guru. RESTer analyzes the documents and extracts design information for characterizing REST API design and inspecting compliance with the design guidelines. The study finds that REST APIs in different application categories vary in resources and operation modes, giving different categories of REST APIs distinct characteristics in terms of design guidelines and overall architecture. The results help in understanding the characteristics, status quo, and shortcomings of current REST APIs and their adoption of design guidelines, which is of practical significance for improving REST API design quality and design guidelines.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006365
    [Abstract] (1132) [HTML] (0) [PDF 1.02 M] (952)
    Abstract:
    With the in-depth penetration of information technology into various fields, abundant data exists in the real world, which can help data-driven machine learning algorithms obtain valuable knowledge. Meanwhile, high dimensionality, excessive redundancy, and strong noise are inherent characteristics of these varied and complex data. In order to eliminate redundancy, discover data structure, and improve data quality, prototype learning has been developed. By finding a prototype set from the target set, the data in the sample space can be reduced, thereby improving the efficiency and effectiveness of machine learning algorithms. Its feasibility has been proven in many applications, and research on prototype learning has recently been one of the hot topics in machine learning. This paper introduces the research background and application value of prototype learning, and provides an overview of related methods, the quality evaluation of prototypes, and typical applications. It then presents research progress with respect to supervision mode and model design: the former involves unsupervised, semi-supervised, and fully supervised modes, while the latter compares four kinds of prototype learning methods based on similarity, determinantal point processes, data reconstruction, and low-rank approximation, respectively. Finally, the paper looks forward to the future development of prototype learning.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006367
    Abstract:
    Software bugs are inevitable in software development and maintenance, and bug reports are important bug description documents in the maintenance process. A high-quality bug report can effectively improve the efficiency of bug fixing. However, because many developers, testers, and users interact with the bug tracking system and submit bug reports, the same bug may be reported by different people, resulting in a large number of duplicate bug reports. Duplicates increase the workload of manual duplicate detection, waste manpower and material resources, and reduce the efficiency of bug fixing. Through a literature survey, this paper systematically analyzes the research of domestic and foreign scholars on duplicate bug report detection in recent years, summarizing research methods, data set selection, and performance evaluation, and puts forward the remaining problems and challenges in this field along with our suggestions.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006369
    Abstract:
    Code snippets in open-source and enterprise software projects and on various software development websites are important software development resources. However, developers' needs in code search often reflect high-level intentions and topics, which are difficult to satisfy with information retrieval based code search techniques. It is thus highly desirable that code snippets be accompanied by semantic tags reflecting their high-level intentions and topics to facilitate code search and understanding. Existing tag generation technologies are mainly oriented to text content or rely on historical data, and cannot meet the needs of large-scale code semantic annotation or of assisting code search and understanding. To address this problem, this paper proposes a software knowledge graph based approach, called KGCodeTagger, that automatically generates semantic tags for code snippets. KGCodeTagger constructs a software knowledge graph from concepts and relations extracted from API documentation and software development Q&A text, and uses the graph as the basis of code semantic tag generation. Given a code snippet, KGCodeTagger identifies and extracts API invocations and concept mentions, and links them to the corresponding concepts in the software knowledge graph. On this basis, the approach identifies other concepts related to the linked concepts as candidates and selects semantic tags from them based on diversity and representativeness. We evaluate the knowledge graph construction steps of KGCodeTagger and the quality of the generated code tags. The results show that KGCodeTagger produces a high-quality, meaningful software knowledge graph and code semantic tags that can help developers quickly understand the intention of code.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006372
    Abstract:
    Sampling is a fundamental class of computational problems. The problem of generating random samples from a solution space according to a certain probability distribution has numerous important applications in approximate counting, probabilistic inference, statistical learning, etc. In the big data era, distributed sampling has attracted considerably more attention, and in recent years a line of research has systematically studied its theory. This paper surveys important results on distributed sampling, including distributed sampling algorithms with theoretically provable guarantees, the computational complexity of sampling in the distributed computing model, and the mutual relation between sampling and inference in the distributed computing model.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006373
    Abstract:
    A god class is a class that carries heavy tasks and responsibilities; it typically contains a large number of attributes and methods and has multiple dependencies with other classes in the system. God class is a typical code smell that negatively impacts software development and maintenance. In recent years, many studies have been devoted to discovering or refactoring god classes; however, the detection ability of existing methods is limited and their precision is not high enough. This paper proposes a god class detection approach based on a graph model and the isolation forest algorithm, divided into two stages: graph structure information analysis and intra-class measurement evaluation. In the first stage, an inter-class method call graph and an intra-class structure graph are established, and the isolation forest algorithm is used to narrow the god class detection range. In the second stage, the scale and architecture of the project are taken into account: the project-wide average of god class related metrics is used as a benchmark, scale factors are determined experimentally, and the product of the average and the scale factors serves as the detection threshold to obtain the god class detection result. Experimental results on a standard code smell data set show that the proposed method improves precision and F1 by 25.82% and 33.39% respectively compared with existing god class detection methods, while maintaining a high level of recall.
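    The outlier view of god classes can be sketched with scikit-learn's IsolationForest (synthetic metric values, illustrative only; the paper's graphs, metrics, and thresholds differ):

```python
# Treat per-class metrics as feature vectors; isolation forest flags
# outlier classes as god class candidates.
import numpy as np
from sklearn.ensemble import IsolationForest

# columns: [number of methods, number of attributes, fan-out to other classes]
metrics = np.array([
    [8, 4, 3], [10, 5, 4], [7, 3, 2], [9, 6, 3],
    [64, 41, 28],          # a suspiciously large, highly coupled class
])
clf = IsolationForest(contamination=0.2, random_state=0).fit(metrics)
print(clf.predict(metrics))   # -1 marks outliers, i.e. god class candidates
```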
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006374
    Abstract:
    With the progress of the open source movement, open source software has become the trend in software development, and the use of open source software is governed by various open source licenses. How open source participants can correctly choose licenses in their development, so as to ensure the efficient and reasonable use of the collaborative results of community groups, is still an urgent issue. To this end, this paper first analyzes and interprets commonly used OSI-certified open source licenses. Furthermore, by studying license terms and structure, an open source license framework and compatibility derivation models are deduced, and the model is applied to the analysis and interpretation of the Mulan Permissive Software License, which was independently developed in China. Finally, based on the above work, a license choosing tool is developed, which provides references and decision support for open source developers in understanding and using licenses.
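    A toy sketch of compatibility-based license checking follows (the matrix below is a simplified illustration of well-known one-way compatibilities, not the paper's derivation model):

```python
# Map each license to the project licenses its code may be combined into.
# Real compatibility rules are far more nuanced than this table.
COMPATIBLE_WITH = {
    "MIT":        {"MIT", "Apache-2.0", "GPL-3.0"},
    "Apache-2.0": {"Apache-2.0", "GPL-3.0"},
    "GPL-3.0":    {"GPL-3.0"},
}

def can_combine(component_licenses, project_license):
    """Every component's license must allow use under the project's license."""
    return all(project_license in COMPATIBLE_WITH.get(l, set())
               for l in component_licenses)

print(can_combine(["MIT", "Apache-2.0"], "GPL-3.0"))  # True
print(can_combine(["GPL-3.0"], "Apache-2.0"))         # False
```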
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006375
    [Abstract] (2038) [HTML] (0) [PDF 1.14 M] (2296)
    Abstract:
    Smart contracts, one of the most successful applications of blockchain, provide the foundation for realizing various real-world blockchain applications and play an essential role in the blockchain ecosystem. However, frequent smart contract security incidents have not only caused huge economic losses but also undermined the blockchain-based credit system, so the security and reliability of smart contracts have gained wide attention from researchers worldwide. This paper first introduces the common types and typical cases of smart contract vulnerabilities at three levels, i.e., the Solidity code layer, the EVM execution layer, and the blockchain system layer. It then reviews the research progress of smart contract vulnerability detection and classifies existing efforts into five categories: formal verification, symbolic execution, fuzzing, intermediate representation, and deep learning. The detectable vulnerability types, accuracy, and time consumption of existing detection methods are compared in detail, along with their limitations and improvements. Finally, based on this summary, the paper discusses the challenges in smart contract vulnerability detection and looks forward to future research directions combining deep learning technology.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006355
    Abstract:
    The study of code naturalness is a common research hotspot of natural language processing and software engineering, aiming to solve various software engineering tasks by building code naturalness models with natural language processing techniques. In recent years, as the amount of source code and data in the open source community continues to grow, more and more researchers have focused on the information contained in source code, and a series of research results have been achieved. At the same time, code naturalness research faces many challenges in corpus construction, model building, and task application. In view of this, this paper reviews recent progress in code naturalness research and application in terms of corpus construction, model construction, and task application. The main contents include: (1) introducing the basic concept of code naturalness and an overview of its research; (2) summarizing the corpora currently used in code naturalness research and classifying the modeling methods; (3) summarizing the experimental validation methods and evaluation metrics of code naturalness models; (4) summarizing and categorizing the current applications of code naturalness; (5) summarizing the key issues of code naturalness techniques; (6) looking ahead to the future development of code naturalness techniques.
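    The classic naturalness measurement behind much of this line of work can be sketched with a bigram model over code tokens (naive whitespace tokenization, add-one smoothing, and an assumed vocabulary size are illustrative simplifications): lower cross-entropy means the code looks more "natural" with respect to the corpus.

```python
from collections import Counter, defaultdict
import math

def train_bigram(corpus):
    ctx, pair = Counter(), defaultdict(Counter)
    for line in corpus:
        toks = ["<s>"] + line.split()
        for a, b in zip(toks, toks[1:]):
            ctx[a] += 1
            pair[a][b] += 1
    return ctx, pair

def cross_entropy(line, ctx, pair, vocab=1000):
    toks = ["<s>"] + line.split()
    logp = 0.0
    for a, b in zip(toks, toks[1:]):
        # add-one smoothing over an assumed vocabulary size
        logp += math.log2((pair[a][b] + 1) / (ctx[a] + vocab))
    return -logp / (len(toks) - 1)

corpus = ["for i in range ( n ) :", "for j in range ( m ) :"]
ctx, pair = train_bigram(corpus)
print(cross_entropy("for k in range ( p ) :", ctx, pair))   # fairly natural
print(cross_entropy("banana quantum :: xyzzy", ctx, pair))  # unnatural
```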
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006356
    Abstract:
    The DataFlow model unifies batch processing and stream processing in big data computing. However, existing cluster resource scheduling frameworks for big data computing are oriented either to stream processing or to batch processing, and are not suitable for batch and stream processing jobs sharing cluster resources. In addition, when GPUs are used for big data analysis, resource usage efficiency suffers from the lack of effective CPU-GPU resource decoupling methods. Based on an analysis of existing cluster scheduling frameworks, a hybrid resource scheduling framework called HRM is designed and implemented that is aware of batch/stream processing applications. Based on a shared-state architecture, HRM uses a combination of optimistic and pessimistic locking protocols to meet the different resource requirements of stream and batch processing jobs. On computing nodes, it provides flexible binding of CPU-GPU resources and adopts queue stacking, which not only meets the real-time requirements of stream processing jobs but also reduces feedback delays and enables sharing of GPU resources. In simulations of large-scale job scheduling, the scheduling delay of HRM is only about 75% of that of a centralized scheduling framework; in tests with real workloads, CPU resource utilization increases by more than 25% when batch and stream processing share the cluster; and with fine-grained job scheduling, GPU utilization increases by more than 2 times while job completion time drops by about 50%.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006362
    Abstract:
    Data-intensive jobs comprise large numbers of tasks, and using GPU devices to improve task performance is currently the main approach. However, existing research does not comprehensively address the contradiction between the fair sharing of GPU resources among data-intensive jobs and the cost of network data transmission. This paper analyzes the characteristics of GPU cluster resource scheduling and proposes a scheduling algorithm based on minimum cost and maximum task parallelism, which resolves the contradiction between fair GPU allocation and high data transmission cost. Scheduling proceeds in two stages: in the first stage, each job produces its own optimal plan according to data transmission costs; in the second stage, the resource allocator merges the plans of all jobs. The paper first gives the overall structure of the framework, in which the resource allocator works globally after each job gives its own optimal plan. It then presents the network bandwidth estimation strategy and the method of computing the data transmission cost of a task, followed by a basic algorithm for fair resource allocation based on the number of GPUs, and then the minimum-cost, maximum-task scheduling algorithm, describing the implementation of non-preemptive, preemptive, and resource fairness strategies. Finally, six data-intensive computing jobs are designed to test the proposed algorithm, and the experiments verify that the scheduling algorithm achieves about 90% resource fairness while also minimizing the parallel running time of jobs.
    Available online:  May 21, 2021 , DOI: 10.13328/j.cnki.jos.006359
    Abstract:
    The areal density of traditional HDDs is ceasing to increase. To extend disk capacity, several new storage techniques have been proposed, among which Shingled Magnetic Recording (SMR) is the first to reach the market. However, the shingled track structure of SMR disks suffers serious write amplification and declining performance when processing random write requests. Furthermore, constructing RAID5 over SMR drives worsens write amplification (WA), because RAID5 parity updates are frequent and produce many random writes. For current SMR disk structures, we observe that the first track of each band can be overwritten without affecting other tracks, because the wide write head can be shifted slightly to cover both the first track and the guard region. In other words, the first track of each band is a free track that can be overwritten freely without causing any write amplification. We therefore propose FT-RAID, a Free-Track-based RAID system on SMR drives, to fully exploit the overwrite-free regions of SMR disks. FT-RAID consists of two key techniques, FT-Mapping and FT-Buffer. FT-Mapping is an SMR-friendly data mapping scheme in RAID that maps frequently updated parity blocks to the free tracks; FT-Buffer adopts an SMR-friendly two-layer cache structure in which the upper level supports in-place updates for hot blocks and the lower level supplies higher capacity for the write buffer. Both mitigate performance degradation by reducing SMR write amplification, leading to an 80.4% lower WA ratio than CMR-based RAID5 under practical enterprise I/O workloads.
    Available online:  April 21, 2021 , DOI: 10.13328/j.cnki.jos.006339
    Abstract:
    Bug localization based on information retrieval (IR) builds a retrieval model on cross-language semantic similarity to locate buggy source code from bug reports. However, traditional IR-based bug localization treats code as plain text and uses only the lexical semantic information of source code, which leads to low accuracy in fine-grained bug localization owing to the missing semantics of candidate code, and the usefulness of the results needs to be improved. By analyzing the relationship between code changes and bug introduction during program evolution, this paper proposes a fine-grained bug localization method based on extended source code information: the explicit semantic information of code vocabulary and the implicit information of code execution are used to enrich source code semantics. Around candidate locations, the semantic context enriches the code representation, and the structural semantics of the intermediate language of code execution makes fine-grained code distinguishable. Meanwhile, natural language semantics guides the generation of code representations through an attention mechanism, realizing a semantic mapping between fine-grained code and natural language; on this basis, the fine-grained bug localization method FlowLocator is implemented. Experimental results show that, compared with classical IR-based bug localization, this method significantly improves localization accuracy in terms of Top-N rank, mean average precision (MAP), and mean reciprocal rank (MRR).
    Available online:  April 21, 2021 , DOI: 10.13328/j.cnki.jos.006340
    Abstract:
    Recent research on multi-turn dialogue generation has focused on RNN- or Transformer-based encoder-decoder architectures. However, most of these models ignore the influence of dialogue structure on dialogue generation. To solve this problem, this paper proposes to use a graph neural network to model dialogue structure information, thus effectively describing the complex logic within a dialogue. We propose a text similarity based relation structure, a turn-switching based relation structure, and a speaker based relation structure for dialogue generation, and employ a graph neural network to realize information transmission and iteration within the dialogue context. Extensive experiments on the DailyDialog dataset show that the proposed model consistently outperforms baseline models on many metrics, indicating that modeling the various correlation structures in dialogue with graph neural networks contributes to high-quality dialogue response generation.
    Available online:  April 21, 2021 , DOI: 10.13328/j.cnki.jos.006324
    Abstract:
    In the field of software engineering, code completion is one of the most useful technologies in integrated development environments (IDEs). It improves the efficiency of software development and has become an important technology for accelerating modern software development. Predicting class names, method names, keywords, and so on through code completion improves, to a certain extent, conformance to code conventions and reduces programmers' workload. In recent years, artificial intelligence techniques have been applied to code completion: in general, smart code completion trains networks on source code, learns code characteristics from the corpus, and makes recommendations and predictions based on the context around the position to be completed. However, most existing code feature representations are based on program syntax and do not reflect program semantics, and the network structures currently used still cannot handle long-distance dependencies in long code sequences. Therefore, this paper presents a method that characterizes code using program control dependencies as well as syntax information, and treats code completion as an abstract syntax tree (AST) node prediction problem based on a temporal convolutional network (TCN), enabling the model to better capture long-range dependencies. Experiments show that this method is about 2.8% more accurate than existing methods.
    Available online:  April 21, 2021 , DOI: 10.13328/j.cnki.jos.006284
    Abstract:
    Entity resolution is a key aspect of data integration and a necessary preprocessing step in big data analytics and mining. In the big data era, more and more query-driven data analytics applications emerge, and query-based entity resolution has become a hot topic. We study multi-attribute indexing technology for an entity cache in order to improve query-resolution efficiency. There are two core problems. 1) How to design the multi-attribute index. We design an R-tree based multi-attribute index; since the entity cache is produced online, we propose an online index construction method based on spatial clustering, together with a filter-verify multi-dimensional query method: impossible records are filtered out by the multi-attribute index, and each candidate record is then verified with similarity or distance functions. 2) How to insert different string attributes into the tree index. The basic solution is to map strings into integer spaces. For Jaccard similarity and edit similarity, we propose a q-gram based mapping method and improve it with vector dimension reduction and z-order, achieving high mapping quality. Finally, the proposed hybrid index is experimentally evaluated on two datasets: its effectiveness is validated, and different aspects of the multi-attribute index are also tested.
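    The q-gram-plus-z-order mapping idea can be sketched as follows (the dimensions and hash choice are illustrative assumptions, not the paper's construction; Python's hash() is salted per process, which is fine within a single run):

```python
# Hash a string's q-grams into a small count vector, then interleave the
# vector's bits (z-order) so similar strings tend to get nearby integer keys.
def qgram_vector(s, q=2, dim=4):
    v = [0] * dim
    for i in range(len(s) - q + 1):
        v[hash(s[i:i + q]) % dim] += 1
    return v

def z_order(v, bits=8):
    """Interleave the bits of each vector component into one integer."""
    key = 0
    for b in range(bits):
        for d, x in enumerate(v):
            key |= ((x >> b) & 1) << (b * len(v) + d)
    return key

for s in ["database", "databases", "graph"]:
    print(s, z_order(qgram_vector(s)))
# "database" and "databases" share most q-grams, so their count vectors --
# and hence their z-order keys -- tend to be close; "graph" lands far away.
```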
    Available online:  April 21, 2021 , DOI: 10.13328/j.cnki.jos.006290
    Abstract:
    Adaptive image steganography has become a hot topic, as it conceals covert information within the texture regions of an image by employing a defined distortion function, which guarantees remarkable security. In spatial gray-scale image steganography, research on automatically generating steganographic distortion with generative adversarial networks has recently achieved a significant breakthrough. However, to the best of our knowledge, there is no related work in spatial color image steganography. Compared with gray-scale images, color image steganography must preserve the correlation among RGB channels while reasonably assigning embedding capacity among them. This paper proposes the first framework based on generative adversarial networks that automatically learns to generate steganographic distortion for spatial color images, termed CIS-GAN (color image steganography based on generative adversarial network). The generator is composed of two U-Net subnetworks: one translates a cover image into a modification probability map, the sum of the positive and negative modification probabilities, while the other learns the proportion of the positive modification probability. This generator structure effectively preserves the correlation among RGB channels, thereby enhancing steganographic security. Moreover, the generator automatically learns to allocate embedding capacity across the three channels by controlling the total steganographic capacity in the generator's loss function and alternately training the discriminator. Experimental results show that the proposed framework outperforms state-of-the-art spatial color image steganographic schemes in resisting color image steganalysis.
    Available online:  April 21, 2021 , DOI: 10.13328/j.cnki.jos.006296
    Abstract:
    In a multi-stage secret sharing scheme, the participants of the authorized sets in each level of the access structures can jointly reconstruct the corresponding secret. In reality, however, an adversary who corrupts an unauthorized set can obtain some or even all of the share information of uncorrupted participants through memory attacks, thereby illegally obtaining some or even all of the shared secrets. In the face of such memory leakage, existing multi-stage secret sharing schemes are no longer secure. This paper first presents a formal computational security model of indistinguishability against chosen secret attacks for multi-stage secret sharing. Then, combining a physical unclonable function with a fuzzy extractor, a verifiable memory-leakage-resistant multi-stage secret sharing scheme for general access structures is constructed based on minimal linear codes. Furthermore, in the presence of a memory attacker, the scheme is proven computationally secure in the random oracle model. Finally, the proposed scheme is compared with existing schemes in terms of properties and computational complexity.
    Available online:  February 07, 2021 , DOI: 10.13328/j.cnki.jos.006305
    Abstract:
    Identifying the boundaries of Chinese named entities is difficult because Chinese text has no word separators, and the lack of well-annotated NER data makes Chinese NER even more challenging in vertical domains such as the clinical and financial domains. To address these issues, this paper proposes a novel cross-domain Chinese NER model based on dynamically transferring entity span information (TES-NER). Cross-domain shared entity span information, where spans represent the scope of Chinese named entities, is transferred from a general domain (source domain) with sufficient corpus to a vertical-domain (target domain) Chinese NER model through a dynamic fusion layer based on a gate mechanism. Specifically, TES-NER first introduces a cross-domain shared entity span recognition module based on a bidirectional LSTM (BiLSTM) layer and a fully connected network (FCN), which identifies the boundaries of Chinese named entities. Then, a Chinese NER module is constructed to identify domain-specific named entities, applying an independent bidirectional LSTM with a conditional random field model (BiLSTM-CRF). Finally, a dynamic fusion layer is designed to determine, via the gate mechanism, how much of the shared entity span information extracted by the span recognition module is transferred to the domain-specific NER model. The general (source) domain dataset is a news dataset (MSRA) with abundant labels, while the vertical (target) domain datasets comprise a mixed-domain dataset (OntoNotes 5.0, a corpus integrating six vertical domains), a financial-domain dataset (Resume), and a medical-domain dataset (CCKS 2017). The F1 values of the proposed model are 2.18%, 1.68%, and 0.99% higher than those of the BiLSTM-CRF model, respectively.
    Available online:  January 15, 2021 , DOI: 10.13328/j.cnki.jos.006299
    [Abstract] (1336) [HTML] (0) [PDF 1.34 M] (1451)
    Abstract:
    Free-hand sketches have always been one of the important tools for human communication. As they can quickly express complex human thoughts in a succinct form, the study of free-hand sketches has long been one of the research hotspots in computer vision. Research on free-hand sketches mainly focuses on recognition, retrieval, and completion; as researchers turn to fine-grained operations on sketches, sketch segmentation has also received more and more attention. In recent years, with the development of deep learning and computer vision technology, a large number of deep learning based sketch segmentation methods have been proposed, and the accuracy and efficiency of sketch segmentation have increased significantly. However, sketch segmentation remains a very challenging topic because of the abstraction, sparsity, and diversity of free-hand sketches. At present, there are few Chinese surveys on sketch segmentation; to fill this gap, this paper organizes, categorizes, analyzes, and summarizes deep learning based free-hand sketch segmentation algorithms. It first presents three basic sketch representation methods and the commonly used sketch segmentation datasets. Then, according to the prediction results of the algorithms, it introduces sketch semantic segmentation, sketch perceptual grouping, and sketch parsing, respectively, and collects and organizes the evaluation results of sketch segmentation on the primary datasets. Finally, it summarizes the applications of sketch segmentation and discusses possible future directions.
    Available online:  October 18, 2017 , DOI:
    [Abstract] (2458) [HTML] (0) [PDF 525.21 K] (3784)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper appeared in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 315-325. Original article: https://doi.org/10.1145/3106237.3106242. Readers who wish to cite this work should cite the original publication.
    Available online:  October 18, 2017 , DOI:
    [Abstract] (2462) [HTML] (0) [PDF 352.38 K] (5002)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper appeared in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 303-314. Original article: https://doi.org/10.1145/3106237.3106239. Readers who wish to cite this work should cite the original publication.
    Available online:  September 11, 2017 , DOI:
    [Abstract] (2850) [HTML] (0) [PDF 276.42 K] (1978)
    Abstract:
    GitHub, a popular social-software-development platform, has fostered a variety of software ecosystems where projects depend on one another and practitioners interact with each other. Projects within an ecosystem often have complex inter-dependencies that impose new challenges in bug reporting and fixing. In this paper, we conduct an empirical study on cross-project correlated bugs, i.e., causally related bugs reported to different projects, focusing on two aspects: 1) how developers track the root causes across projects; and 2) how the downstream developers coordinate to deal with upstream bugs. Through manual inspection of bug reports collected from the scientific Python ecosystem and an online survey with developers, this study reveals the common practices of developers and the various factors in fixing cross-project bugs. These findings provide implications for future software bug analysis in the scope of ecosystem, as well as shed light on the requirements of issue trackers for such bugs.
    Available online:  June 21, 2017 , DOI:
    [Abstract] (2927) [HTML] (0) [PDF 169.43 K] (2118)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper has been accepted for publication in IEEE Transactions on Software Engineering, 2017. Original article: http://ieeexplore.ieee.org/document/7792694. Readers who wish to cite this work should cite the original publication.
    Available online:  June 13, 2017 , DOI:
    [Abstract] (4138) [HTML] (0) [PDF 174.91 K] (2571)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper appeared in Proceedings of the 39th International Conference on Software Engineering (ICSE 2017), pp. 27-37, Buenos Aires, Argentina, May 20-28, 2017, IEEE Press, Piscataway, NJ, USA, 2017, ISBN: 978-1-5386-3868-2. Original article: http://dl.acm.org/citation.cfm?id=3097373. Readers who wish to cite this work should cite the original publication.
    Available online:  January 25, 2017 , DOI:
    [Abstract] (3058) [HTML] (0) [PDF 254.98 K] (1914)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper appeared in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016), ACM, New York, NY, USA, pp. 871-882, DOI: https://doi.org/10.1145/2950290.2950364. Original article: http://dl.acm.org/citation.cfm?id=2950364. Readers who wish to cite this work should cite the original publication.
    Available online:  January 18, 2017 , DOI:
    [Abstract] (3422) [HTML] (0) [PDF 472.29 K] (1868)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper appeared in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 133-143, Seattle, WA, USA, November 2016. Original article: http://dl.acm.org/citation.cfm?id=2950327. Readers who wish to cite this work should cite the original publication.
    Available online:  January 04, 2017 , DOI:
    [Abstract] (3257) [HTML] (0) [PDF 293.93 K] (1716)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper appeared in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE'16), pp. 810-821, November 13-18, 2016. Original article: https://doi.org/10.1145/2950290.2950310. Readers who wish to cite this work should cite the original publication.
    Available online:  January 04, 2017 , DOI:
    [Abstract] (3590) [HTML] (0) [PDF 244.61 K] (1961)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper appeared at FSE 2016. Original article: http://dl.acm.org/citation.cfm?doid=2950290.2950313. Readers who wish to cite this work should cite the original publication.
    Available online:  December 12, 2016 , DOI:
    [Abstract] (3093) [HTML] (0) [PDF 358.69 K] (1999)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper appeared at FSE'16, in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Original article: http://dl.acm.org/citation.cfm?id=2950340. Readers who wish to cite this work should cite the original publication.
    Available online:  September 30, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper appeared at ASE 2016 (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?id=2970366. Readers who wish to cite this work should cite the original publication.
    Available online:  September 09, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. Junjie's paper appeared at ASE 2016 (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?doid=2970276.2970300. Readers who wish to cite this work should cite the original publication.
    Available online:  September 07, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper appeared in ASE 2016, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. Full text: http://dx.doi.org/10.1145/2970276.2970307. Important note: readers who cite this work should cite the original publication.
    Available online:  August 29, 2016 , DOI:
    Abstract:
    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published in ACM Transactions on Software Engineering and Methodology (TOSEM, Vol. 25, No. 2, Article 13, May 2016) and was invited as a "Journal First" presentation at the ICSE 2016 main conference. Full text: http://dl.acm.org/citation.cfm?id=2876443. The authors include Zhou Minghui, Ma Xiujuan, Zhang Lu, and Mei Hong of Peking University, and Audris Mockus of the University of Tennessee. Important note: readers who cite this work should cite the original publication.
  • Full-Text Download Ranking (overall / annual / per issue)
    Abstract Click Ranking (overall / annual / per issue)

    2003,14(7):1282-1291, DOI:
    [Abstract] (35623) [HTML] (0) [PDF 832.28 K] (75397)
    Abstract:
    Sensor networks, created by the convergence of sensor, micro-electro-mechanical system, and network technologies, are a novel technology for acquiring and processing information. This paper briefly introduces the architecture of wireless sensor networks, then explains and forecasts some valuable applications. Combining with existing work, hot topics including power-aware routing and medium access control schemes are discussed and presented in detail. Finally, taking application requirements into account, several future research directions are put forward.
    2010,21(3):427-437, DOI:
    [Abstract] (31201) [HTML] (0) [PDF 308.76 K] (35062)
    Abstract:
    Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports pioneering research on a genetic algorithm and its automatic generation of SONGCI. In light of the characteristics of classical Chinese poetry, the paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette wheel selection, a partially mapped crossover operator, and a heuristic mutation operator. Tests show that a system built on this computing model is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic generation of Chinese poetry.
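    The partially mapped crossover (PMX) operator mentioned above can be sketched on generic permutations (purely to show the mechanics; the paper applies it to SONGCI encodings):

```python
def pmx(p1, p2, a, b):
    """Copy p1[a:b] into the child; fill the remaining slots from p2,
    following the value mapping between the two segments so the result
    stays a valid permutation."""
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    for i in list(range(a)) + list(range(b, len(p1))):
        gene = p2[i]
        while gene in child[a:b]:          # follow the mapping chain
            gene = p2[p1.index(gene)]
        child[i] = gene
    return child

p1, p2 = [1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]
print(pmx(p1, p2, 2, 4))   # -> [6, 5, 3, 4, 2, 1], a valid permutation
```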
    2011,22(1):71-83, DOI:10.3724/SP.J.1001.2011.03958
    [Abstract] (28183) [HTML] (0) [PDF 781.42 K] (49525)
    Abstract:
    Cloud computing is a fundamental change happening in the field of information technology and represents a movement towards intensive, large-scale specialization. On the other hand, it brings not only convenience and efficiency, but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the major security requirements in cloud computing, key security technologies, standards, regulations, etc., and provides a cloud computing security framework. The paper argues that changes in these aspects will result in a technical revolution in the field of information security.
    2016,27(1):45-71, DOI:10.13328/j.cnki.jos.004914
    [Abstract] (27532) [HTML] (497) [PDF 880.96 K] (26895)
    Abstract:
Android is a modern and highly popular software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever; Apple, Microsoft, Blackberry, and Firefox trailed a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2009,20(5):1337-1348, DOI:
    [Abstract] (26632) [HTML] (0) [PDF 1.06 M] (41368)
    Abstract:
This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects: one is the cloud infrastructure, which is the building block for upper-layer cloud applications; the other is the cloud application itself. This paper focuses on the cloud infrastructure, including existing systems and current research, and some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large-scale clusters containing a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure so that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software built on top of redundant hardware, instead of by hardware alone. All these technologies serve the two important goals of distributed systems: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to a very large scale, even to thousands of nodes. Availability means that the services remain available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2008,19(1):48-61, DOI:
    [Abstract] (26141) [HTML] (0) [PDF 671.39 K] (57336)
    Abstract:
The state of research and recent progress in clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and categorized from several aspects, such as algorithmic ideas, key techniques, and advantages and disadvantages. Second, several typical clustering algorithms and well-known data sets are selected, and simulation experiments are carried out on both accuracy and running efficiency: the behavior of each algorithm on different data sets is analyzed, and the clusterings of the same data set under different algorithms are compared. Finally, combining these two lines of analysis, the research hotspots, difficulties, and shortcomings of data clustering, together with some open problems, are addressed. This work can serve as a valuable reference for data clustering and data mining.
    2009,20(2):271-289, DOI:
    [Abstract] (25697) [HTML] (0) [PDF 675.56 K] (39313)
    Abstract:
Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms before 2003, the recent advances in EMO are discussed in detail, and the current research directions are summarized. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have come forth. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints on the future research of EMO are proposed.
    2005,16(1):1-7, DOI:
    [Abstract] (20803) [HTML] (0) [PDF 614.61 K] (17638)
    Abstract:
The paper offers some reflections from the following four aspects: 1) from the law of the development of things, it reviews the development history of software engineering technology; 2) from the natural characteristics of software, it analyzes the construction of each abstraction layer of the virtual machine; 3) from the viewpoint of software development, it proposes the research content of the software engineering discipline and studies the pattern of industrialized software production; 4) based on the emergence of Internet technology, it explores the development trend of software technology.
    2004,15(3):428-442, DOI:
    [Abstract] (19703) [HTML] (0) [PDF 1009.57 K] (13842)
    Abstract:
With the rapid development of e-business, Web applications have developed from localization to globalization, from B2C (business-to-customer) to B2B (business-to-business), and from a centralized fashion to a decentralized fashion. Web services are a new application model for decentralized computing, and also an effective mechanism for data and service integration on the Web; thus, Web services have become a solution for e-business. It is important and necessary to carry out research on new architectures for Web services, on combinations with other effective techniques, and on service integration. In this paper, a survey is presented on various aspects of Web services research, from the basic concepts to the principal research problems and underlying techniques, including data integration in Web services, Web service composition, semantic Web services, Web service discovery, Web service security, solutions for Web services in the P2P (peer-to-peer) computing environment, and grid services. This paper also summarizes the current state of the art of these techniques and discusses future research topics and the challenges facing Web services.
    2005,16(5):857-868, DOI:
    [Abstract] (19038) [HTML] (0) [PDF 489.65 K] (26719)
    Abstract:
Wireless sensor networks, a novel technology for acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the network, is a challenging one, and yet extremely crucial for many applications. In this paper, the performance evaluation criteria and taxonomy for self-localization systems and algorithms in wireless sensor networks are described, the principles and characteristics of recent representative localization approaches are discussed, and directions for research in this area are introduced.
    2010,21(8):1834-1848, DOI:
    [Abstract] (18427) [HTML] (0) [PDF 682.96 K] (50530)
    Abstract:
This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail: sentiment extraction, sentiment classification, and sentiment retrieval and summarization. Then, the evaluation methods and corpora for sentiment analysis are introduced. Finally, the applications of sentiment analysis are summarized. This paper aims to provide deep insight into the mainstream methods and recent progress in this field, making detailed comparisons and analyses.
    2009,20(1):54-66, DOI:
    [Abstract] (18039) [HTML] (0) [PDF 1.41 M] (46239)
    Abstract:
Network community structure is one of the most fundamental and important topological properties of complex networks: within a community, the links between nodes are very dense, while between communities they are quite sparse. Network clustering algorithms, which aim to discover all natural network communities from given complex networks, are fundamentally important for both theoretical research and practical applications, and can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks, including social networks, biological networks, the World Wide Web, and so on. This paper reviews the background, motivation, state of the art, and main issues of existing work on discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to researchers from the communities of complex network analysis, data mining, intelligent Web, and bioinformatics.
    2012,23(1):32-45, DOI:10.3724/SP.J.1001.2012.04091
    [Abstract] (17610) [HTML] (0) [PDF 408.86 K] (27330)
    Abstract:
In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can be expanded cost-effectively should be invented to deal with such big data. Relational data management techniques have gone through a history of nearly 40 years, but now encounter the tough obstacle of scalability: relational techniques cannot handle large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, emerge as a new force and expand their applications from Web search to territories that used to be occupied by relational database systems, confronting relational techniques with high availability, high scalability, and massive parallel processing capability. The relational technique community, after losing the big deal of Web search, has begun to learn from MapReduce, while MapReduce also borrows valuable ideas from the relational community to improve performance. Relational techniques and MapReduce compete with, and learn from, each other; a new data analysis platform and a new data analysis ecosystem are emerging. Eventually the two camps of techniques will find their right places in the new ecosystem of big data analysis.
    2012,23(4):962-986, DOI:10.3724/SP.J.1001.2012.04175
    [Abstract] (17445) [HTML] (0) [PDF 2.09 M] (27752)
    Abstract:
Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, which makes failures of computers or data commonplace. The large number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure costs and power costs. Therefore, fault tolerance, scalability, and power consumption of the distributed storage of a data center have become key issues in cloud computing technology for ensuring data availability and reliability. In this paper, a survey is made of the state of the art of the key technologies in cloud computing with respect to the design of data center networks, the organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. First, several classical topologies of data center networks are introduced and compared. Second, current fault-tolerant storage techniques are discussed, and data replication and erasure-code strategies are compared in particular. Third, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
    2009,20(3):524-545, DOI:
    [Abstract] (16628) [HTML] (0) [PDF 1.09 M] (19148)
    Abstract:
Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, research on software process modeling and analysis aims to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process; thus, the enforcement of the process model can directly contribute to the improvement of software quality. In this paper, a systematic review is carried out to survey recent developments in software process modeling, with 72 papers from 20 conference proceedings and 7 journals identified as evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method, based on a multi-dimensional and integrated methodology, which is intended to address several core issues facing the community.
    2009,20(1):124-137, DOI:
    [Abstract] (15653) [HTML] (0) [PDF 1.06 M] (19524)
    Abstract:
The appearance of plenty of intelligent devices equipped for short-range wireless communication has boosted the fast rise of wireless ad hoc network applications. However, in many realistic application environments, nodes form a disconnected network most of the time, due to node mobility, low density, lossy links, etc. The conventional communication model of mobile ad hoc networks (MANET) requires at least one path to exist from the source to the destination node, which results in communication failures in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, and implement communication between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has captured great interest from researchers. This paper first introduces the concepts and theories of opportunistic networks and some current typical applications. Then it elaborates on popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Other interesting research points, such as communication middleware, cooperation and security problems, and new applications, are stated briefly. Finally, the paper concludes and looks forward to possible research focuses for opportunistic networks in the future.
    2009,20(11):2965-2976, DOI:
    [Abstract] (15635) [HTML] (0) [PDF 442.42 K] (11222)
    Abstract:
This paper studies uncertain graph data mining and especially investigates the problem of mining frequent subgraph patterns from uncertain graph data. A data model is introduced for representing uncertainties in graphs, and expected support is employed to evaluate the significance of subgraph patterns. By using the apriori property of expected support, a depth-first-search-based mining algorithm is proposed, with an efficient method for computing expected supports and a technique for pruning the search space, which reduces the number of subgraph isomorphism tests needed to compute expected support from an exponential scale to a linear scale. Experimental results show that the proposed algorithm is 3 to 5 orders of magnitude faster than a naïve depth-first search algorithm, and is efficient and scalable.
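To make the expected-support notion concrete, here is a small hedged sketch under the common assumption of independent edge existence probabilities; the "at least one embedding" probability used below is exact only when the embeddings share no edges, and the paper's own computation is more sophisticated.

```python
from math import prod

def embedding_prob(embedding, edge_prob):
    # Probability that every edge of one embedding exists,
    # assuming independent edge existence.
    return prod(edge_prob[e] for e in embedding)

def expected_support(db):
    """db: one (embeddings, edge_prob) pair per uncertain graph, where
    `embeddings` lists the pattern's embeddings as edge collections.
    By linearity of expectation, expected support over the database is
    the sum of per-graph occurrence probabilities."""
    total = 0.0
    for embeddings, edge_prob in db:
        p_none = prod(1.0 - embedding_prob(m, edge_prob) for m in embeddings)
        total += 1.0 - p_none
    return total
```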
    2004,15(8):1208-1219, DOI:
    [Abstract] (15589) [HTML] (0) [PDF 948.49 K] (11280)
    Abstract:
With the explosive growth of network applications and complexity, the threat of Internet worms against network security has become increasingly serious. Especially in the Internet environment, the variety of propagation vectors and the complexity of application environments cause worms to break out with much higher frequency, hide with much deeper latency, and spread with wider coverage; Internet worms have become a primary issue faced by malicious code researchers. In this paper, the concept and research situation of Internet worms, their function components, and their execution mechanisms are first presented; then the scanning strategies and propagation models are discussed; and finally the critical techniques of Internet worm prevention are given. Some major problems and research trends in this area are also addressed.
    2009,20(5):1226-1240, DOI:
    [Abstract] (15338) [HTML] (0) [PDF 926.82 K] (13709)
    Abstract:
This paper introduces the concrete details of combining automated reasoning techniques with planning methods, including planning as satisfiability using propositional logic, conformant planning using modal logic and disjunctive reasoning, planning as nonmonotonic logic, and flexible planning as fuzzy description logic. After considering the experimental results of the International Planning Competition and relevant papers, it concludes that planning methods based on automated reasoning techniques are helpful and can be adopted. It also proposes challenges and possible hotspots.
    2003,14(10):1717-1727, DOI:
    [Abstract] (15292) [HTML] (0) [PDF 839.25 K] (11839)
    Abstract:
Sensor networks are an integration of sensor techniques, embedded computation techniques, distributed computation techniques, and wireless communication techniques. They can be used for testing, sensing, collecting, and processing information about monitored objects and transferring the processed information to users. Sensor networks are a new research area of computer science and technology with a wide application future, and both academia and industry are very interested in them. The concepts and characteristics of sensor networks and the data in such networks are introduced, and the issues of sensor networks and their data management are discussed. Advances in research on sensor networks and their data management are also presented.
    2009,20(2):350-362, DOI:
    [Abstract] (15017) [HTML] (0) [PDF 1.39 M] (36154)
    Abstract:
This paper makes a comprehensive survey of recommender system research, aiming to help readers understand this field. First, the research background is introduced, including commercial application demands and relevant academic institutions, conferences, and journals. After describing the recommendation problem both formally and informally, a comparative study is conducted of the categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are exhibited, and the main difficulties and future directions are summarized.
    2015,26(1):62-81, DOI:10.13328/j.cnki.jos.004701
    [Abstract] (14733) [HTML] (411) [PDF 1.04 M] (21833)
    Abstract:
Network abstraction has brought about the birth of software-defined networking (SDN). SDN decouples the data plane from the control plane and simplifies network management. The paper starts with a discussion of the background of the birth and development of SDN, outlining its architecture, which includes the data layer, control layer, and application layer. Key technologies are then elaborated according to the hierarchical architecture of SDN, with the characteristics of consistency, availability, and tolerance analyzed in particular. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized at the end.
    2014,25(4):839-862, DOI:10.13328/j.cnki.jos.004558
    [Abstract] (14583) [HTML] (522) [PDF 1.32 M] (15949)
    Abstract:
Batch computing and stream computing are two important forms of big data computing. Research and discussion on batch computing in the big data environment are comparatively sufficient, but how to efficiently deal with stream computing so as to meet many requirements, such as low latency, high throughput, and continuously reliable running, and how to build efficient stream big data computing systems, remain great challenges in big data computing research. This paper surveys the system architecture and key issues of stream computing in big data environments. First, it gives a brief summary of three application scenarios of stream computing: business intelligence, marketing, and public service. It also shows the distinctive features of stream computing in the big data environment, such as real-time, volatility, burstiness, irregularity, and infinity. A well-designed stream computing system is always optimized in system structure, data transmission, application interfaces, high availability, and so on. Subsequently, the paper offers detailed analyses and comparisons of five typical open-source stream computing systems for the big data environment. Finally, it specifically addresses some new challenges for stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2009,20(10):2729-2743, DOI:
    [Abstract] (13752) [HTML] (0) [PDF 1.12 M] (9096)
    Abstract:
In a multi-hop wireless sensor network (WSN), the sensors closest to the sink tend to deplete their energy faster than other sensors, which is known as an energy hole around the sink. No more data can be delivered to the sink after an energy hole appears, while a considerable amount of energy is wasted and the network lifetime ends prematurely. This paper investigates the energy hole problem and, based on an improved corona model with levels, concludes that assigning different transmission ranges to nodes in different coronas is an effective approach to an energy-efficient network. It proves that finding the optimal transmission ranges for all coronas is a multi-objective optimization problem (MOP), which is NP-hard. The paper proposes an ACO (ant colony optimization)-based distributed algorithm to prolong the network lifetime, which can help nodes in different areas adaptively find an approximately optimal transmission range based on the node distribution. Furthermore, the simulation results indicate that the network lifetime under this solution approximates that obtained with the optimal list. Compared with existing algorithms, this ACO-based algorithm can not only extend the network lifetime by more than a factor of two, but also perform well under non-uniform node distributions.
    2012,23(5):1148-1166, DOI:10.3724/SP.J.1001.2012.04195
    [Abstract] (13482) [HTML] (0) [PDF 946.37 K] (14899)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
    2000,11(11):1460-1466, DOI:
    [Abstract] (13328) [HTML] (0) [PDF 520.69 K] (9240)
    Abstract:
Intrusion detection has been a hot topic of network security research in recent years. In this paper, first the necessity of intrusion detection is presented, and its concepts and models are described. Then, many intrusion detection techniques and architectures are summarized. Finally, the existing problems and the future direction in this field are discussed.
    2012,23(1):1-20, DOI:10.3724/SP.J.1001.2012.04100
    [Abstract] (13304) [HTML] (0) [PDF 1017.73 K] (27115)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2008,19(zk):112-120, DOI:
    [Abstract] (13088) [HTML] (0) [PDF 594.29 K] (12701)
    Abstract:
An ad hoc network is a collection of wireless mobile nodes dynamically forming a temporary network without the use of any existing network infrastructure or centralized administration. Due to the bandwidth constraints and dynamic topology of mobile ad hoc networks, multipath-supported routing is a very important research issue. In this paper, we present an entropy-based metric to support stable multipath on-demand routing (SMDR). The key idea of the SMDR protocol is to construct a new entropy metric and use it to select stable multipaths, reducing the number of route reconstructions and thus providing QoS guarantees in ad hoc networks whose topology changes continuously. Simulation results show that, with the proposed multipath routing protocol, packet delivery ratio, end-to-end delay, and routing overhead ratio can be improved in most cases. It is a viable approach to multipath routing decisions.
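The exact entropy metric is defined in the paper; purely as a generic illustration of scoring a path by the Shannon entropy of its link-stability weights, one might sketch the following (the stability inputs and the selection rule are hypothetical):

```python
from math import log

def path_entropy(link_stability):
    """Shannon entropy of the normalized stability weights along one path.
    A generic sketch only; SMDR's actual metric is defined in the paper."""
    total = sum(link_stability)
    probs = [s / total for s in link_stability]
    return -sum(p * log(p) for p in probs if p > 0)

def most_stable(paths):
    # Hypothetical selection rule: prefer the candidate path whose
    # stability weights are most evenly spread (no single fragile link).
    return max(paths, key=path_entropy)
```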
    2004,15(4):571-583, DOI:
    [Abstract] (13063) [HTML] (0) [PDF 1005.17 K] (8043)
    Abstract:
For most peer-to-peer file-swapping applications, sharing is a voluntary action, and peers are not held responsible for their irresponsible bartering history. This situation indicates that trust between participants cannot be established simply by traditional trust mechanisms. A reasonable approach to trust construction comes from social network analysis, in which trust relations between individuals are set up upon the recommendations of other individuals. Current P2P trust models cannot guarantee the convergence of the iteration for trust computation, and take no account of model security problems such as Sybil attacks and slandering. This paper presents a novel recommendation-based global trust model and gives a distributed implementation method. Mathematical analyses and simulations show that, compared with current global trust models, the proposed model is more robust against trust security problems and more complete in the iterative computation of peer trust.
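The convergence question concerns iterative aggregation of recommendations; a minimal power-iteration sketch in the spirit of such global trust models (not the paper's exact formulation) looks like this, assuming a row-stochastic matrix C of normalized local recommendations:

```python
import numpy as np

def global_trust(C, tol=1e-9, max_iter=1000):
    """Iterate t <- C^T t from a uniform start until the global trust
    vector stabilizes. C[i, j] is the share of trust peer i places in
    peer j; rows are assumed to sum to 1."""
    n = C.shape[0]
    t = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        t_next = C.T @ t          # everyone's trust, weighted by recommenders
        if np.linalg.norm(t_next - t, 1) < tol:
            return t_next
        t = t_next
    return t
```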
    2013,24(8):1786-1803, DOI:10.3724/SP.J.1001.2013.04416
    [Abstract] (13002) [HTML] (0) [PDF 1.04 M] (13967)
    Abstract:
Many application-oriented NoSQL database systems have been developed to satisfy the new requirements of big data management. This paper surveys research on typical NoSQL databases based on the key-value data model. First, the characteristics of big data and the key technical issues in supporting big data management are introduced. Then, frontier efforts and research challenges are given, including system architecture, data models, access modes, indexing, transactions, system elasticity, load balancing, replica strategies, data consistency, flash caches, MapReduce-based data processing, new-generation data management systems, etc. Finally, research prospects are given.
    2011,22(1):115-131, DOI:10.3724/SP.J.1001.2011.03950
    [Abstract] (12787) [HTML] (0) [PDF 845.91 K] (25028)
    Abstract:
The Internet traffic model is a key issue for network performance management, quality-of-service management, and admission control. The paper first summarizes the primary characteristics of Internet traffic, as well as its metrics, and illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
    2006,17(7):1588-1600, DOI:
    [Abstract] (12780) [HTML] (0) [PDF 808.73 K] (12338)
    Abstract:
Routing technology at the network layer is pivotal in the architecture of wireless sensor networks. As an active branch of routing technology, cluster-based routing protocols excel in network topology management, energy minimization, data aggregation, and so on. In this paper, cluster-based routing mechanisms for wireless sensor networks are analyzed. Cluster head selection, cluster formation, and data transmission are the three key techniques in cluster-based routing protocols. From the viewpoint of these three techniques, recent representative cluster-based routing protocols are presented, and their characteristics and application areas are compared. Finally, future research issues in this area are pointed out.
    2009,20(1):11-29, DOI:
    [Abstract] (12765) [HTML] (0) [PDF 787.30 K] (11818)
    Abstract:
Constrained optimization problems (COPs) are mathematical programming problems frequently encountered in the disciplines of science and engineering. Solving COPs has become an important research area of evolutionary computation in recent years. In this paper, the state of the art of constrained optimization evolutionary algorithms (COEAs) is surveyed from two basic aspects of COEAs (i.e., constraint-handling techniques and evolutionary algorithms). In addition, this paper discusses some important issues of COEAs, and several typical algorithms are analyzed in detail. Based on the analyses, it is concluded that, to obtain competitive results, a proper constraint-handling technique needs to be considered in conjunction with an appropriate search algorithm. Finally, the open research issues in this field are pointed out.
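For instance, the classical static penalty family of constraint-handling techniques that such surveys cover can be sketched as follows; the penalty coefficient and the maximization convention are assumptions of this illustration, not the paper's prescription:

```python
def penalized_fitness(x, objective, constraints, rho=1e3):
    """Static penalty handling: degrade the objective by the summed
    violation of inequality constraints written as g(x) <= 0.
    rho is a problem-dependent weight (assumed here)."""
    violation = sum(max(0.0, g(x)) for g in constraints)
    return objective(x) - rho * violation   # maximization convention
```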
    2002,13(7):1228-1237, DOI:
    [Abstract] (12727) [HTML] (0) [PDF 500.04 K] (11693)
    Abstract:
Software architecture (SA) has recently emerged as one of the primary research areas in software engineering and one of the key technologies for the development of large-scale software-intensive systems and software product lines. This paper summarizes the history and major directions of SA, and brings up the concept of SA by analyzing and comparing several classical definitions. By summing up the activities around SA, two categories of SA study are extracted, and advances in SA research are then introduced from seven aspects. Additionally, some disadvantages of SA study are discussed, and their causes are explained. Finally, the paper concludes with some significantly promising research tendencies for SA.
    2015,26(1):26-39, DOI:10.13328/j.cnki.jos.004631
    [Abstract] (12568) [HTML] (341) [PDF 763.52 K] (11454)
    Abstract:
In recent years, transfer learning has attracted a vast amount of attention and research. Transfer learning is a new machine learning method that applies knowledge from related but different domains to target domains. It relaxes the two basic assumptions of traditional machine learning: (1) that the training data (also referred to as the source domain) and the test data (also referred to as the target domain) satisfy the independent and identically distributed (i.i.d.) condition; and (2) that there are enough labeled samples to learn a good classification model. Transfer learning aims to solve the problem that there are few or even no labeled data in target domains. This paper surveys the research progress of transfer learning and introduces our own work, especially on building transfer learning models by applying generative models at the concept level. Finally, the paper introduces applications of transfer learning, such as text classification and collaborative filtering, and suggests future research directions for transfer learning.
    2013,24(1):50-66, DOI:10.3724/SP.J.1001.2013.04276
    [Abstract] (12505) [HTML] (0) [PDF 0.00 Byte] (14352)
    Abstract:
As an important application for acceleration in the cloud, distributed caching technology has received considerable attention from industry and academia. This paper starts with a discussion of the combination of cloud computing and distributed caching technology, giving an analysis of its characteristics, typical application scenarios, stages of development, standards, and several key elements that have promoted its development. In order to systematically understand the state-of-the-art progress and weak points of distributed caching technology, the paper builds a multi-dimensional framework, DctAF, constituted of six dimensions derived by analyzing the characteristics of cloud computing and the boundary of caching techniques. Based on DctAF, current techniques are analyzed and summarized, and comparisons among several influential products are made. Finally, the paper describes and highlights several challenges that cache systems face and examines current research through in-depth analysis and comparison.
    2008,19(8):1902-1919, DOI:
    [Abstract] (12468) [HTML] (0) [PDF 521.73 K] (11510)
    Abstract:
Visual language techniques have exhibited more advantages in describing various software artifacts than one-dimensional textual languages during software development, ranging from requirements analysis and design to testing and maintenance, as diagrammatic and graphical notations have been well applied in system modeling. In addition to an intuitive appearance, graph grammars provide a well-established foundation for defining visual languages, with the power of precise modeling and verification on computers. This paper discusses the issues and techniques for a formal foundation of visual languages, reviews related practical graphical environments, presents a spatial graph grammar formalism, and applies the spatial graph grammar to defining the behavioral semantics of UML diagrams and to developing a style-driven framework for software architecture design.
    2003,14(9):1621-1628, DOI:
    [Abstract] (12248) [HTML] (0) [PDF 680.35 K] (16669)
    Abstract:
Recommendation systems are one of the most important technologies in e-commerce. With the development of e-commerce, the magnitudes of users and commodities have grown rapidly, resulting in extremely sparse user rating data. Traditional similarity measures work poorly in this situation, making the quality of recommendation systems decrease dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method first predicts the ratings of items that users have not rated, using the similarity between items, and then uses a new similarity measure to find the target user's neighbors. Experimental results show that this method can efficiently alleviate the extreme sparsity of user rating data and provide better recommendation results than traditional collaborative filtering algorithms.
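As an illustration of the two-stage idea, the hedged sketch below fills unrated cells from item-item similarity before neighbor search; the cosine measure and the parameters here are generic placeholders rather than the paper's exact formulas.

```python
import numpy as np

def item_similarity(R):
    """Cosine similarity between item columns of a user-item rating
    matrix R (a NumPy array where 0 marks a missing rating)."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0               # avoid division by zero
    return (R.T @ R) / np.outer(norms, norms)

def predict_missing(R, k=2):
    """Fill each unrated cell with a similarity-weighted average of the
    user's ratings on the k most similar items they did rate."""
    S = item_similarity(R)
    P = R.astype(float)
    for u, i in zip(*np.where(R == 0)):
        rated = np.where(R[u] > 0)[0]
        if rated.size == 0:
            continue
        top = rated[np.argsort(S[i, rated])[-k:]]
        w = S[i, top]
        if w.sum() > 0:
            P[u, i] = (w @ R[u, top].astype(float)) / w.sum()
    return P
```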
    2008,19(8):1947-1964, DOI:
    [Abstract] (12209) [HTML] (0) [PDF 811.11 K] (7843)
    Abstract:
Widespread deployment of interactive information visualization is difficult. Non-specialist users need a general development method and a toolkit that support the generic data structures suited to tree, network, and multi-dimensional data, special visualization techniques, interaction techniques, and well-known generic information tasks. This paper presents a model-driven development method for interactive information visualization. First, an interactive information visualization interface model (IIVM) is proposed. Then, the development method for interactive information visualization based on IIVM is presented. The Daisy toolkit is introduced, which includes the Daisy model builder, the Daisy IIV generator, and a runtime framework with the Daisy library. Finally, an application example is given. Experimental results show that Daisy can provide a general solution for the development of interactive information visualization.
    2002,13(10):1952-1961, DOI:
    [Abstract] (12163) [HTML] (0) [PDF 570.96 K] (9699)
    Abstract:
The crucial technologies related to personalization are introduced in this paper, including the representation and modification of user profiles, the representation of resources, recommendation technology, and the architecture of personalization. By comparing some existing prototype systems, the key technologies for implementing personalization are discussed in detail. In addition, three representative personalization systems are analyzed. Finally, some research directions for personalization are presented.
    2010,21(2):231-247, DOI:
    [Abstract] (12140) [HTML] (0) [PDF 1.21 M] (14229)
    Abstract:
In this paper, a framework is proposed for handling faults in service composition through analyzing fault requirements. Petri nets are used in the framework for fault detection and handling, focusing on the failure of available services, component failures, and network failures, and the corresponding fault models are given. Based on the model, the correctness criterion of fault handling is given to analyze the fault handling model, and its correctness is proven. Finally, CTL (computation tree logic) is used to specify the related properties and the enforcement algorithm of fault analysis. The simulation results show that this method can ensure the reliability and consistency of service composition.
    2003,14(9):1635-1644, DOI:
    [Abstract] (12033) [HTML] (0) [PDF 622.06 K] (9938)
    Abstract:
Computer forensics is the technology field that attempts to provide thorough, efficient, and secure means to investigate computer crime. Computer evidence must be authentic, accurate, complete, and convincing to juries. In this paper, the stages of computer forensics are presented, and the theories and realization of forensics software are described. An example of forensic practice is also given. The deficiencies of computer forensics techniques and anti-forensics are also discussed. The conclusion is that, with the improvement of computer science and technology, forensics techniques will become more integrated and thorough.
    2012,23(1):82-96, DOI:10.3724/SP.J.1001.2012.04101
    [Abstract] (11910) [HTML] (0) [PDF 394.07 K] (11863)
    Abstract:
Botnets are one of the most serious threats to the Internet. Researchers have done plenty of research and made significant progress. However, botnets keep evolving and have become more and more sophisticated. Due to the underlying security limitations of current systems and the Internet architecture, and the complexity of botnets themselves, how to effectively counter the global threat of botnets is still a very challenging issue. This paper first introduces the evolution of botnets' propagation, attack, command, and control mechanisms. Then the paper summarizes recent advances in botnet defense research, categorized into five areas: botnet monitoring, botnet infiltration, analysis of botnet characteristics, botnet detection, and botnet disruption. The limitations of current botnet defense techniques, the evolving trends of botnets, and some possible directions for future research are also discussed.
    2010,21(7):1620-1634, DOI:
    [Abstract] (11861) [HTML] (0) [PDF 765.23 K] (17654)
    Abstract:
As an application of mobile ad hoc networks (MANET) to intelligent transportation information systems, the most important goal of vehicular ad hoc networks (VANET) is to dramatically reduce the high number of accidents and their fatal consequences. One of the most important factors contributing to the realization of this goal is the design of effective broadcast protocols. This paper briefly introduces the characteristics and application fields of VANET. It then discusses the characteristics, performance, and application areas of various categories of broadcast protocols in VANET, with analysis and comparison. According to the characteristics of VANET and its application requirements, the paper proposes ideas and breakthrough directions for the design of information broadcast models for inter-vehicle communication.
    2017,28(1):1-16, DOI:10.13328/j.cnki.jos.005139
    [Abstract] (11812) [HTML] (335) [PDF 1.75 M] (6322)
    Abstract:
The knapsack problem (KP) is a well-known combinatorial optimization problem that includes the 0-1 KP, the bounded KP, the multi-constraint KP, the multiple KP, the multiple-choice KP, the quadratic KP, the dynamic KP, the discounted KP, and other variants. KP can be considered a mathematical model extracted from a variety of real-world fields and therefore has wide applications. Evolutionary algorithms (EAs) are universally considered an efficient tool for solving KP approximately and quickly. This paper presents a survey of solving KP with EAs over the past ten years. It not only discusses the various KP encoding mechanisms and the handling of infeasible individuals, but also provides useful guidelines for designing new EAs to solve KPs.
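For the 0-1 KP, a typical bit-string encoding with greedy repair of infeasible individuals, one of the handling strategies such surveys cover, might be sketched as follows; the ratio-based drop order and all parameters are illustrative assumptions:

```python
import random

def repair(bits, weights, values, capacity):
    """Greedy repair of an infeasible 0-1 knapsack individual: drop the
    selected items with the worst value/weight ratio until the load fits
    (assumes strictly positive weights)."""
    load = sum(w for b, w in zip(bits, weights) if b)
    order = sorted((i for i, b in enumerate(bits) if b),
                   key=lambda i: values[i] / weights[i])
    for i in order:
        if load <= capacity:
            break
        bits[i] = 0
        load -= weights[i]
    return bits

def random_individual(n, weights, values, capacity):
    """Random bit-string genome, repaired into feasibility."""
    bits = [random.randint(0, 1) for _ in range(n)]
    return repair(bits, weights, values, capacity)
```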
    2008,19(10):2706-2719, DOI:
    [Abstract] (11564) [HTML] (0) [PDF 778.29 K] (9809)
    Abstract:
Web search engines have become a very important tool for finding information efficiently in the massive Web data. With the explosive growth of Web data, traditional centralized search engines find it harder and harder to keep up with people's growing information needs. With the rapid development of peer-to-peer (P2P) technology, the notion of P2P Web search has been proposed and has quickly become a research focus. The goal of this paper is to give a brief summary of current P2P Web search technologies in order to facilitate future research. First, some main challenges for P2P Web search are presented. Then, key techniques for building a feasible and efficient P2P Web search engine are reviewed, including system topology, data placement, query routing, index partitioning, collection selection, relevance ranking, and Web crawling. Finally, three recently proposed novel P2P Web search prototypes are introduced.
    2004,15(12):1751-1763, DOI:
    [Abstract] (11518) [HTML] (0) [PDF 928.33 K] (6274)
    Abstract:
This paper presents research work on the children's Turing test (CTT). The main difference between our test program and others is its knowledge-based character, which is supported by a massive commonsense knowledge base. The motivation, design, techniques, experimental results, and platform (including a knowledge engine and a conversation engine) of the CTT are described in this paper. Finally, some concluding thoughts about the CTT and AI are given.
    2008,19(7):1565-1580, DOI:
    [Abstract] (11503) [HTML] (0) [PDF 815.02 K] (13269)
    Abstract:
Software defect prediction has been one of the most active areas of software engineering since it was developed in the 1970s. It plays a very important role in the analysis of software quality and the balancing of software cost. This paper investigates and discusses the motivation, evolvement, solutions, and challenges of software defect prediction technologies, and categorizes, analyzes, and compares representative prediction technologies. Some case studies of software defect distribution models are given to aid understanding.
    2010,21(5):916-929, DOI:
    [Abstract] (11451) [HTML] (0) [PDF 944.50 K] (15042)
    Abstract:
Data deduplication technologies can be divided into two categories: a) identical-data detection techniques, and b) similar-data detection and encoding techniques. This paper presents a systematic survey of these two categories of data deduplication technologies and analyzes their advantages and disadvantages. In addition, since data deduplication technologies can affect the reliability and performance of storage systems, this paper also surveys the various technologies proposed to cope with these two aspects. Based on an analysis of the current state of research on data deduplication, this paper draws the following conclusions: a) how to mine data characteristic information for deduplication has not been completely solved, and how to use such information to effectively eliminate duplicate data also needs further study; b) from the perspective of storage system design, further study is needed on how to introduce proper mechanisms to overcome the reliability limitations of data deduplication techniques and to reduce the additional system overheads they cause.
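As a concrete instance of identical-data detection, here is a minimal sketch of fixed-size chunking with SHA-256 fingerprints; real systems add content-defined chunking, collision policies, and persistent on-disk indexes, so this is only a toy model of the idea.

```python
import hashlib

def dedup_chunks(data, chunk_size=4096):
    """Identical-data detection by fingerprinting fixed-size chunks:
    each unique chunk is stored once, keyed by its SHA-256 digest,
    and the file is described by a recipe of digests."""
    store, recipe = {}, []
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # keep only the first copy
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe):
    """Reassemble the original bytes from the chunk store and recipe."""
    return b"".join(store[d] for d in recipe)
```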
    1999,10(11):1206-1211, DOI:
    [Abstract] (11422) [HTML] (0) [PDF 392.66 K] (5096)
    Abstract:
In this paper, the authors discuss two important issues in rough set research: attribute reduction and value reduction. A new attribute reduction approach, which can reach the optimal attribute reduction, is presented based on the discernibility matrix and logic computation, and a multivariate decision tree can be obtained with this method. Some improvements to a widely used value reduction method are also achieved. The complexity of the acquired rule knowledge can be reduced effectively in this way.
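The discernibility matrix that such reduction approaches build on can be sketched as follows, assuming a decision table given as rows of condition-attribute values plus a decision column (the names and table layout are illustrative); attribute reduction then amounts to finding a minimal attribute set that hits every non-empty cell.

```python
def discernibility_matrix(objects, decisions):
    """For every pair of objects with different decisions, record the set
    of condition attributes (by index) on which the two objects differ.
    A sketch of the classical construction, not the paper's refinement."""
    n = len(objects)
    matrix = {}
    for i in range(n):
        for j in range(i + 1, n):
            if decisions[i] != decisions[j]:
                matrix[(i, j)] = {a for a, (x, y)
                                  in enumerate(zip(objects[i], objects[j]))
                                  if x != y}
    return matrix
```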
• Full-text download ranking (overall / yearly / per issue)
Abstract click ranking (overall / yearly / per issue)

    2003,14(7):1282-1291, DOI:
    [Abstract] (35623) [HTML] (0) [PDF 832.28 K] (75397)
    Abstract:
A sensor network, formed by the convergence of sensor, micro-electro-mechanical system (MEMS), and network technologies, is a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecasted. Combined with existing work, hot research topics, including power-aware routing and medium access control schemes, are discussed in detail. Finally, taking application requirements into account, several future research directions are put forward.
    2008,19(1):48-61, DOI:
    [Abstract] (26141) [HTML] (0) [PDF 671.39 K] (57336)
    Abstract:
The state of research and recent progress in clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and categorized from several aspects, such as algorithmic ideas, key techniques, and advantages and disadvantages. Second, several typical clustering algorithms and well-known data sets are selected, and simulation experiments are carried out on both accuracy and running efficiency: the behavior of each algorithm on different data sets is analyzed, and the clusterings of the same data set under different algorithms are compared. Finally, combining these two lines of analysis, the research hotspots, difficulties, and shortcomings of data clustering, together with some open problems, are addressed. This work can serve as a valuable reference for data clustering and data mining.
    2010,21(8):1834-1848, DOI:
    [Abstract] (18427) [HTML] (0) [PDF 682.96 K] (50530)
    Abstract:
This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail: sentiment extraction, sentiment classification, and sentiment retrieval and summarization. Then, the evaluation methods and corpora for sentiment analysis are introduced. Finally, the applications of sentiment analysis are summarized. This paper aims to provide deep insight into the mainstream methods and recent progress in this field, making detailed comparisons and analyses.
    2011,22(1):71-83, DOI:10.3724/SP.J.1001.2011.03958
    [Abstract] (28183) [HTML] (0) [PDF 781.42 K] (49525)
    Abstract:
Cloud computing is a fundamental change happening in the field of information technology, representing a movement towards intensive, large-scale specialization. On the other hand, it brings not only convenience and efficiency, but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the major security requirements in cloud computing, key security technologies, standards, and regulations, and provides a cloud computing security framework. This paper argues that changes in the above aspects will result in a technical revolution in the field of information security.
    2009,20(1):54-66, DOI:
    [Abstract] (18039) [HTML] (0) [PDF 1.41 M] (46239)
    Abstract:
Network community structure is one of the most fundamental and important topological properties of complex networks: within a community, the links between nodes are very dense, while between communities they are quite sparse. Network clustering algorithms, which aim to discover all natural network communities from given complex networks, are fundamentally important for both theoretical research and practical applications, and can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks, including social networks, biological networks, the World Wide Web, and so on. This paper reviews the background, motivation, state of the art, and main issues of existing work on discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to researchers from the communities of complex network analysis, data mining, intelligent Web, and bioinformatics.
    2009,20(5):1337-1348, DOI:
    [Abstract] (26632) [HTML] (0) [PDF 1.06 M] (41368)
    Abstract:
This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects: one is the cloud infrastructure, which is the building block for upper-layer cloud applications; the other is the cloud application itself. This paper focuses on the cloud infrastructure, including existing systems and current research, and some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large-scale clusters containing a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure so that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software built on top of redundant hardware, instead of by hardware alone. All these technologies serve the two important goals of distributed systems: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to a very large scale, even to thousands of nodes. Availability means that the services remain available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2009,20(2):271-289, DOI:
    [Abstract] (25697) [HTML] (0) [PDF 675.56 K] (39313)
    Abstract:
Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms before 2003, the recent advances in EMO are discussed in detail, and the current research directions are summarized. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have come forth. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints on the future research of EMO are proposed.
    2004,15(10):1493-1504, DOI:
    [Abstract] (8412) [HTML] (0) [PDF 937.72 K] (36579)
    Abstract:
The graphics processing unit (GPU) has been developing rapidly in recent years, at a speed exceeding Moore's law, and as a result, various applications associated with computer graphics have advanced greatly. At the same time, the high processing power, parallelism, and programmability available on contemporary GPUs provide an ideal platform for general-purpose computation. Starting from an introduction to the development history and architecture of the GPU, the technical fundamentals of the GPU are described in this paper. Then, in the main part of the paper, the development of various applications of general-purpose computation on the GPU is introduced, among which fluid dynamics, algebraic computation, database operations, and spectrum analysis are covered in detail. The experience of our work on fluid dynamics is also given, and the development of software tools in this area is introduced. Finally, a conclusion is made, and future developments and new challenges in both hardware and software in this area are discussed.
    2009,20(2):350-362, DOI:
    [Abstract] (15017) [HTML] (0) [PDF 1.39 M] (36154)
    Abstract:
This paper makes a comprehensive survey of recommender system research, aiming to help readers understand this field. First, the research background is introduced, including commercial application demands and relevant academic institutions, conferences, and journals. After describing the recommendation problem both formally and informally, a comparative study is conducted of the categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are exhibited, and the main difficulties and future directions are summarized.
    2010,21(3):427-437, DOI:
    [Abstract] (31201) [HTML] (0) [PDF 308.76 K] (35062)
    Abstract:
Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports some pioneering research on a possible genetic algorithm and its automatic generation of SONGCI. In light of the characteristics of Chinese ancient poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette-wheel selection, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system constructed on the basis of this computing model is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic Chinese poetry generation.
    2013,24(11):2476-2497, DOI:10.3724/SP.J.1001.2013.04486
    [Abstract] (9228) [HTML] (0) [PDF 1.14 M] (31278)
    Abstract:
    Probabilistic graphical models are powerful tools for compactly representing complex probability distributions, efficiently computing (approximate) marginal and conditional distributions, and conveniently learning parameters and hyperparameters in probabilistic models. As a result, they have been widely used in applications that require some sort of automated probabilistic reasoning, such as computer vision and natural language processing, as a formal approach to deal with uncertainty. This paper surveys the basic concepts and key results of representation, inference and learning in probabilistic graphical models, and demonstrates their uses in two important probabilistic models. It also reviews some recent advances in speeding up classic approximate inference algorithms, followed by a discussion of promising research directions.
    2014,25(9):1889-1908, DOI:10.13328/j.cnki.jos.004674
    [Abstract] (10633) [HTML] (431) [PDF 550.98 K] (29692)
    Abstract:
    This paper first introduces the key features of big data in different processing modes and their typical application scenarios, as well as corresponding representative processing systems. It then summarizes three development trends of big data processing systems. Next, the paper gives a brief survey on system supported analytic technologies and applications (including deep learning, knowledge computing, social computing, and visualization), and summarizes the key roles of individual technologies in big data analysis and understanding. Finally, the paper lays out three grand challenges of big data processing and analysis, i.e., data complexity, computation complexity, and system complexity. Potential ways for dealing with each complexity are also discussed.
    2012,23(4):962-986, DOI:10.3724/SP.J.1001.2012.04175
    [Abstract] (17445) [HTML] (0) [PDF 2.09 M] (27752)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, sometimes up to millions, and stores petabytes or even exabytes of data, a scale at which computer and data failures become common. Such a large number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware and power costs. Fault tolerance, scalability, and power consumption of the distributed storage in a data center therefore become key issues in cloud computing technology, in order to ensure data availability and reliability. This paper surveys the state of the art of the key storage technologies in cloud computing in the following aspects: design of the data center network, organization and placement of data, strategies to improve fault tolerance, and methods to save storage space and energy. Firstly, several classical topologies of data center networks are introduced and compared. Secondly, current fault-tolerant storage techniques are discussed, with data replication and erasure-code strategies compared in particular. Thirdly, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
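    One of the fault-tolerance trade-offs compared in such surveys, replication versus erasure coding, can be sketched with a single-parity code: k data blocks plus one XOR parity block tolerate the loss of any one block at a storage cost of (k+1)/k, versus 2x or 3x for replication. A minimal Python illustration, not any specific system's scheme:

        # Single-parity erasure coding: recover one lost block by XOR.
        def xor_blocks(blocks):
            out = bytearray(len(blocks[0]))
            for b in blocks:
                for i, byte in enumerate(b):
                    out[i] ^= byte
            return bytes(out)

        data = [b"blk0", b"blk1", b"blk2"]   # k = 3 equal-sized blocks
        parity = xor_blocks(data)            # XOR of all data blocks

        lost = 1                             # pretend block 1 is gone
        survivors = [b for i, b in enumerate(data) if i != lost]
        recovered = xor_blocks(survivors + [parity])
        assert recovered == data[lost]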
    2012,23(1):32-45, DOI:10.3724/SP.J.1001.2012.04091
    [Abstract] (17610) [HTML] (0) [PDF 408.86 K] (27330)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can scale out cost-effectively are needed to deal with such big data. Relational data management technology has gone through a history of nearly 40 years; now it encounters the tough obstacle of scalability, as relational techniques cannot easily handle very large data. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and expanded their application from Web search to territories that used to be occupied by relational database systems. They challenge relational techniques with high availability, high scalability, and massive parallel processing capability. The relational community, after losing the Web search market, has begun to learn from MapReduce, while MapReduce has also borrowed valuable ideas from relational techniques to improve performance. As relational techniques and MapReduce compete with and learn from each other, a new data analysis platform and ecosystem are emerging. Finally, the two camps of techniques will find their proper places in the new ecosystem of big data analysis.
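    The MapReduce model mentioned above can be sketched in a few lines of Python: user code supplies a map function emitting key-value pairs and a reduce function aggregating values per key, while the framework handles grouping (and, in a real system, distribution across a cluster). The canonical word-count example:

        # In-process sketch of the MapReduce programming model.
        from collections import defaultdict

        def map_phase(doc):
            for word in doc.split():
                yield word, 1                    # emit (key, value)

        def shuffle(pairs):
            groups = defaultdict(list)           # group values by key
            for k, v in pairs:
                groups[k].append(v)
            return groups

        def reduce_phase(key, values):
            return key, sum(values)              # aggregate per key

        docs = ["big data big deal", "map and reduce big data"]
        pairs = [kv for d in docs for kv in map_phase(d)]
        print(dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items()))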
    2012,23(1):1-20, DOI:10.3724/SP.J.1001.2012.04100
    [Abstract] (13304) [HTML] (0) [PDF 1017.73 K] (27115)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2016,27(1):45-71, DOI:10.13328/j.cnki.jos.004914
    [Abstract] (27532) [HTML] (497) [PDF 880.96 K] (26895)
    Abstract:
    Android is the most popular modern software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever, with Apple, Microsoft, Blackberry, and Firefox trailing a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2005,16(5):857-868, DOI:
    [Abstract] (19038) [HTML] (0) [PDF 489.65 K] (26719)
    Abstract:
    Wireless sensor networks, a novel technology for acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the network, is a challenging one, yet extremely crucial for many applications. In this paper, evaluation criteria for the performance of wireless sensor network self-localization systems and a taxonomy of such systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed, and directions for research in this area are introduced.
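    A standard building block in range-based localization systems of this kind is trilateration: given distances to a few anchor nodes with known positions, the unknown position is solved by linearized least squares. A minimal numpy sketch with made-up anchors and noise-free ranges:

        # Linearized trilateration solved by least squares.
        import numpy as np

        def trilaterate(anchors, dists):
            (xn, yn), dn = anchors[-1], dists[-1]   # reference anchor
            A, b = [], []
            for (xi, yi), di in zip(anchors[:-1], dists[:-1]):
                A.append([2 * (xn - xi), 2 * (yn - yi)])
                b.append(di**2 - dn**2 - xi**2 + xn**2 - yi**2 + yn**2)
            pos, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
            return pos

        anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
        true = np.array([3.0, 4.0])
        dists = [float(np.linalg.norm(true - a)) for a in anchors]
        print(trilaterate(anchors, dists))          # ~ [3. 4.]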
    2011,22(1):115-131, DOI:10.3724/SP.J.1001.2011.03950
    [Abstract] (12787) [HTML] (0) [PDF 845.91 K] (25028)
    Abstract:
    The Internet traffic model is a key issue for network performance management, quality of service management, and admission control. This paper first summarizes the primary characteristics of Internet traffic and its metrics, and illustrates the significance and classification of traffic modeling. Next, it chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
    2018,29(5):1471-1514, DOI:10.13328/j.cnki.jos.005519
    [Abstract] (4562) [HTML] (535) [PDF 4.38 M] (24954)
    Abstract:
    Computer aided detection/diagnosis (CAD) can improve the accuracy of diagnosis, reduce false positives, and provide decision support for doctors. The main purpose of this paper is to analyze the latest developments in computer aided diagnosis tools. Focusing on the incidence sites of the four most fatal cancers, major recent publications on CAD applications in different medical imaging areas are reviewed in this survey according to imaging technique and disease. Furthermore, a multidimensional analysis is made of the research in terms of image data sets, algorithms, and evaluation methods. Finally, existing problems, research trends, and development directions in the field of medical image CAD systems are discussed.
    2013,24(1):77-90, DOI:10.3724/SP.J.1001.2013.04339
    [Abstract] (10412) [HTML] (0) [PDF 0.00 Byte] (24054)
    Abstract:
    The task parallel programming model is a widely used parallel programming model on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and supporting mechanisms used in task parallel programming models, and discusses issues and the latest achievements from three perspectives: parallelism expression, data management, and task scheduling. In the end, some future trends in this area are discussed.
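    The style this abstract describes, expressing work as independent tasks handed to a runtime scheduler, can be sketched with Python's standard concurrent.futures; the task body is an arbitrary stand-in:

        # Tasks submitted to a pool; the scheduler maps them onto cores.
        from concurrent.futures import ProcessPoolExecutor, as_completed

        def task(n):                             # a CPU-bound unit of work
            return n, sum(i * i for i in range(n * 100000))

        if __name__ == "__main__":
            with ProcessPoolExecutor() as pool:
                futures = [pool.submit(task, n) for n in range(1, 9)]
                for fut in as_completed(futures): # results in completion order
                    print(fut.result())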
    2015,26(1):62-81, DOI:10.13328/j.cnki.jos.004701
    [Abstract] (14733) [HTML] (411) [PDF 1.04 M] (21833)
    Abstract:
    Network abstraction has brought about the naissance of software-defined networking (SDN). SDN decouples the data plane and the control plane, and simplifies network management. This paper starts with a discussion of the background of the naissance and development of SDN, sorting out its architecture, which includes the data layer, control layer, and application layer. The key technologies are then elaborated according to this hierarchical architecture, with the properties of consistency, availability, and tolerance especially analyzed. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized in the end.
    2017,28(4):959-992, DOI:10.13328/j.cnki.jos.005143
    [Abstract] (8007) [HTML] (337) [PDF 3.58 M] (19784)
    Abstract:
    The development of the mobile Internet and the popularity of mobile terminals produce massive trajectory data of moving objects in the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in the city, and the changes of the atmospheric environment. However, trajectory data can also be exploited to disclose moving objects' privacy information (e.g., behaviors, hobbies, and social relationships), so attackers can easily access moving objects' privacy by digging into their trajectory data such as activities and check-in locations. On another front of research, quantum computation presents an important theoretical direction for mining big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to handle trajectory big data could make some complex problems solvable and achieve higher efficiency. This paper reviews the key technologies of processing trajectory data. First, the concept and characteristics of trajectory data are introduced, and pre-processing methods, including noise filtering and data compression, are summarized. Then, trajectory indexing and querying techniques, and the current achievements of trajectory data mining, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preservation with respect to trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing frameworks and data visualization, are presented in detail. Some possible ways of applying quantum computation to trajectory data processing, as well as quantum implementations of some core trajectory mining algorithms, are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
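    One classic instance of the trajectory compression step mentioned above is Douglas-Peucker line simplification, which recursively keeps only the points deviating from a straight-line approximation by more than a tolerance. A compact Python sketch (the sample trajectory is made up):

        # Douglas-Peucker trajectory compression.
        import math

        def _dist(p, a, b):
            (px, py), (ax, ay), (bx, by) = p, a, b
            dx, dy = bx - ax, by - ay
            if dx == dy == 0:
                return math.hypot(px - ax, py - ay)
            return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

        def douglas_peucker(traj, eps):
            if len(traj) < 3:
                return traj
            # farthest point from the chord between the endpoints
            i, dmax = max(((i, _dist(traj[i], traj[0], traj[-1]))
                           for i in range(1, len(traj) - 1)),
                          key=lambda t: t[1])
            if dmax <= eps:
                return [traj[0], traj[-1]]
            return (douglas_peucker(traj[:i + 1], eps)[:-1]
                    + douglas_peucker(traj[i:], eps))

        traj = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
        print(douglas_peucker(traj, 0.5))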
    2009,20(1):124-137, DOI:
    [Abstract] (15653) [HTML] (0) [PDF 1.06 M] (19524)
    Abstract:
    The appearance of plenty of intelligent devices equipped for short-range wireless communications has boosted the rapid rise of wireless ad hoc network applications. However, in many realistic application environments, nodes form a disconnected network most of the time due to node mobility, low density, lossy links, etc. The conventional communication model of mobile ad hoc networks (MANET) requires at least one path existing from source to destination node, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, implementing communication between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has captured great interest from researchers. This paper first introduces the concepts and theories of opportunistic networks and some current typical applications. It then elaborates the popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Some other interesting research points, such as communication middleware, cooperation and security problems, and new applications, are stated briefly. Finally, the paper concludes and looks forward to possible research focuses for opportunistic networks in the future.
    2011,22(6):1299-1315, DOI:10.3724/SP.J.1001.2011.03993
    [Abstract] (9396) [HTML] (0) [PDF 987.90 K] (19399)
    Abstract:
    The attribute-based encryption (ABE) scheme takes attributes as the public key and associates the ciphertext and the user's secret key with attributes, so that it can support expressive access control policies. This dramatically reduces the cost of network bandwidth and of the sending node's operations in fine-grained access control of data sharing. Therefore, ABE has broad application prospects in the area of fine-grained access control. After analyzing the basic ABE system and its two variants, key-policy ABE (KP-ABE) and ciphertext-policy ABE (CP-ABE), this study elaborates the research problems relating to ABE systems, including access structure design for CP-ABE, attribute key revocation, key abuse, and multi-authority ABE, with an extensive comparison of their functionality and performance. Finally, the study discusses the problems that still need to be solved and the main research directions in ABE.
    2009,20(3):524-545, DOI:
    [Abstract] (16628) [HTML] (0) [PDF 1.09 M] (19148)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2006,17(9):1848-1859, DOI:
    [Abstract] (11162) [HTML] (0) [PDF 770.40 K] (18018)
    Abstract:
    In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the state-of-the-art challenging issues and research trends for content information processing of the Internet and other complex applications, this paper presents a survey of the up-to-date development in text categorization based on machine learning, covering models, algorithms, and evaluation. It is pointed out that problems such as nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization. Possible solutions to these problems are also discussed. Finally, some future directions of research are given.
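    The standard machine-learning pipeline such surveys cover, vectorize the text, train a classifier, evaluate, looks roughly like the following scikit-learn sketch; the two-class toy corpus is invented for illustration:

        # TF-IDF features + a linear classifier for text categorization.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        docs = ["cheap pills buy now", "meeting agenda attached",
                "win money now", "project status report"]
        labels = ["spam", "ham", "spam", "ham"]

        clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
        clf.fit(docs, labels)
        print(clf.predict(["buy cheap money now"]))   # likely ['spam']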
    2013,24(5):1078-1097, DOI:10.3724/SP.J.1001.2013.04390
    [Abstract] (10772) [HTML] (0) [PDF 1.74 M] (17790)
    Abstract:
    The control and data planes are decoupled in software-defined networking (SDN), which provides a new solution for research on new network applications and future Internet technologies. This paper surveys the development status of OpenFlow-based SDN technologies. The research background of the decoupled architecture of network control and data transmission in OpenFlow networks is summarized first, and the key components and research progress, including the OpenFlow switch, the controller, and SDN technologies, are introduced. Moreover, current problems and solutions of OpenFlow-based SDN technologies are analyzed in four aspects. Combined with the development status in recent years, applications in campus networks, data centers, network management, and network security are summarized. Finally, future research trends are discussed.
    2012,23(8):2058-2072, DOI:10.3724/SP.J.1001.2012.04237
    [Abstract] (9267) [HTML] (0) [PDF 800.05 K] (17762)
    Abstract:
    Distributed denial of service (DDoS) attacks are a major threat to the current network. Based on the attack packet level, this study divides DDoS attacks into network-level DDoS attacks and application-level DDoS attacks. It then analyzes the detection and control methods of these two kinds of DDoS attacks in detail, as well as the drawbacks of different control methods implemented at different network positions. Finally, the study analyzes the shortcomings of the current detection and control methods, and discusses the development trend of DDoS filter systems along with the corresponding technological challenges.
    2010,21(7):1620-1634, DOI:
    [Abstract] (11861) [HTML] (0) [PDF 765.23 K] (17654)
    Abstract:
    As an application of mobile ad hoc networks (MANET) in intelligent transportation information systems, the most important goal of vehicular ad hoc networks (VANET) is to dramatically reduce the high number of accidents and their fatal consequences. One of the most important factors contributing to this goal is the design of effective broadcast protocols. This paper briefly introduces the characteristics and application fields of VANET. It then discusses the characteristics, performance, and application areas of various categories of broadcast protocols in VANET, with analysis and comparison. According to the characteristics of VANET and its application requirements, the paper proposes design ideas and breakthrough directions for inter-vehicle information broadcast models.
    2005,16(1):1-7, DOI:
    [Abstract] (20803) [HTML] (0) [PDF 614.61 K] (17638)
    Abstract:
    The paper offers some thoughts from the following four aspects: 1) from the law of the development of things, revealing the development history of software engineering technology; 2) from the point of view of the natural characteristics of software, analyzing the construction of each abstraction layer of the virtual machine; 3) from the point of view of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the appearance of Internet technology, exploring the development trend of software technology.
    2004,15(11):1583-1594, DOI:
    [Abstract] (7452) [HTML] (0) [PDF 1.57 M] (17403)
    Abstract:
    Uncertainty exists widely in the subjective and objective world. Among all kinds of uncertainty, randomness and fuzziness are the most important and fundamental. In this paper, the relationship between randomness and fuzziness is discussed. Uncertain states and their changes can be measured by entropy and hyper-entropy respectively. Taking advantage of entropy and hyper-entropy, the uncertainty of chaos, fractals, and complex networks under their various evolutions and differentiations is further studied. A simple and effective way is proposed to simulate uncertainty by means of knowledge representation, which provides a basis for the automation of both logic and image thinking with uncertainty. AI (artificial intelligence) with uncertainty is a new cross-discipline, covering computer science, physics, mathematics, brain science, psychology, cognitive science, biology, and philosophy, and results in the automation of representation, processing, and thinking for uncertain information and knowledge.
    2014,25(1):37-50, DOI:10.13328/j.cnki.jos.004497
    [Abstract] (8766) [HTML] (365) [PDF 929.87 K] (17162)
    Abstract:
    This paper surveys the state of the art of speech emotion recognition (SER) and presents an outlook on future SER technology. First, the survey summarizes and analyzes SER in detail from five perspectives: emotion representation models, representative emotional speech corpora, emotion-related acoustic feature extraction, SER methods, and applications. Then, based on the survey, the challenges faced by current SER research are summarized. This paper aims to give deep insight into the mainstream methods and recent progress in this field, and presents detailed comparisons and analyses of these methods.
    2005,16(10):1743-1756, DOI:
    [Abstract] (9122) [HTML] (0) [PDF 545.62 K] (16761)
    Abstract:
    This paper presents a survey on the theory of provable security and its applications to the design and analysis of security protocols. It clarifies what provable security is, explains some basic notions involved in the theory, and illustrates the basic idea of the random oracle model. It also reviews the development and advances of provably secure public-key encryption and digital signature schemes, in the random oracle model or the standard model, as well as the applications of provable security to the design and analysis of session-key distribution protocols and their advances.
    2003,14(9):1621-1628, DOI:
    [Abstract] (12248) [HTML] (0) [PDF 680.35 K] (16669)
    Abstract:
    The recommendation system is one of the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extreme sparsity of user rating data. Traditional similarity measures perform poorly in this situation, causing the quality of recommendation systems to decrease dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method first predicts the ratings of items that users have not rated using item similarity, then uses a new similarity measure to find the target users' neighbors. The experimental results show that this method can efficiently alleviate the extreme sparsity of user rating data, and provides better recommendation results than traditional collaborative filtering algorithms.
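    The two-stage idea described above can be sketched in numpy: first densify the rating matrix by predicting each user's unrated items from item-item similarity, then search for neighbors on the filled matrix. Cosine similarity over co-rated entries and the tiny matrix are illustrative stand-ins, not the paper's exact measure:

        # Item-rating prediction before neighbor search.
        import numpy as np

        R = np.array([[5, 3, 0, 1],   # rows: users, cols: items, 0 = unrated
                      [4, 0, 0, 1],
                      [1, 1, 5, 4],
                      [0, 1, 5, 4.0]])

        def sim(a, b):
            m = (a > 0) & (b > 0)     # co-rated entries only
            if not m.any():
                return 0.0
            return a[m] @ b[m] / (np.linalg.norm(a[m]) * np.linalg.norm(b[m]))

        n = R.shape[1]
        S = np.array([[sim(R[:, i], R[:, j]) for j in range(n)]
                      for i in range(n)])

        P = R.copy()                  # predicted (densified) ratings
        for u in range(R.shape[0]):
            rated = R[u] > 0
            for i in np.where(~rated)[0]:
                w = S[i, rated]
                if w.sum() > 0:       # similarity-weighted mean
                    P[u, i] = w @ R[u, rated] / w.sum()
        print(np.round(P, 2))         # neighbors are now found on P, not R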
    2018,29(10):2966-2994, DOI:10.13328/j.cnki.jos.005551
    [Abstract] (7190) [HTML] (709) [PDF 610.06 K] (16074)
    Abstract:
    In recent years, the rapid development of Internet technology and Web applications has triggered an explosion of various data on the Internet, which contains a large amount of valuable knowledge. How to organize, represent, and analyze this knowledge has attracted much attention. Knowledge graphs were developed to organize such knowledge in a semantic and visualized manner. Knowledge reasoning over knowledge graphs has thus become one of the hot research topics, playing an important role in many applications such as vertical search and intelligent question answering. The goal of knowledge reasoning over knowledge graphs is to infer new facts or identify erroneous facts according to existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graphs is more diversified, owing to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graphs. Starting with the basic concept of knowledge reasoning, this paper presents a survey of recently developed methods for knowledge reasoning over knowledge graphs. Specifically, the research progress is reviewed in detail from two aspects: one-step reasoning and multi-step reasoning, each including rule-based reasoning, distributed-embedding-based reasoning, neural-network-based reasoning, and hybrid reasoning. Finally, future research directions and an outlook on knowledge reasoning over knowledge graphs are discussed.
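    Of the method families listed, distributed-embedding-based reasoning is the easiest to sketch. In TransE-style models, a triple (h, r, t) is scored by how close h + r lands to t in the embedding space; the random vectors below are placeholders for embeddings that would in practice be learned from the graph:

        # TransE-style triple scoring (lower distance = more plausible).
        import numpy as np

        rng = np.random.default_rng(0)
        dim = 16
        ent = {e: rng.normal(size=dim) for e in ["Paris", "France", "Tokyo"]}
        rel = {"capital_of": rng.normal(size=dim)}

        def score(h, r, t):
            return float(np.linalg.norm(ent[h] + rel[r] - ent[t]))

        # After training, score("Paris", "capital_of", "France") should be
        # far smaller than score("Paris", "capital_of", "Tokyo").
        print(score("Paris", "capital_of", "France"),
              score("Paris", "capital_of", "Tokyo"))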
    2014,25(4):839-862, DOI:10.13328/j.cnki.jos.004558
    [Abstract] (14583) [HTML] (522) [PDF 1.32 M] (15949)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. The research and discussion on batch computing in big data environments is comparatively sufficient. However, how to efficiently deal with stream computing so as to meet requirements such as low latency, high throughput, and continuously reliable running, and how to build efficient stream big data computing systems, are great challenges in big data computing research. This paper surveys the system architecture and the key issues of stream computing in big data environments. Firstly, it gives a brief summary of three application scenarios of stream computing: business intelligence, marketing, and public service. It also shows the distinctive features of stream computing in big data environments, such as real time, volatility, burstiness, irregularity, and infinity. A well-designed stream computing system is always optimized in terms of system structure, data transmission, application interfaces, high availability, and so on. Subsequently, the paper offers detailed analyses and comparisons of five typical open-source stream computing systems for big data environments. Finally, it specifically addresses some new challenges of stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2013,24(2):295-316, DOI:10.3724/SP.J.1001.2013.04336
    [Abstract] (9150) [HTML] (0) [PDF 0.00 Byte] (15843)
    Abstract:
    Under the new application mode, traditional hierarchical data centers face several limitations in size, bandwidth, scalability, and cost. In order to meet the needs of new applications, a data center network should fulfill requirements such as high scalability, low configuration overhead, robustness, and energy saving at low cost. First, the shortcomings of the traditional data center network architecture are summarized, and the new requirements are pointed out. Secondly, the existing proposals are divided into two categories, i.e., server-centric and network-centric. Then, several representative architectures of these two categories are reviewed and compared in detail. Finally, future directions of data center networks are discussed.
    2009,20(6):1393-1405, DOI:
    [Abstract] (10800) [HTML] (0) [PDF 831.86 K] (15801)
    Abstract:
    Combinatorial testing can use a small number of test cases to test systems while preserving fault detection ability. However, the test case generation problem for combinatorial testing is NP-complete. The efficiency and complexity of this testing method have attracted many researchers from the areas of combinatorics and software engineering. This paper summarizes the research work on this topic in recent years, including: various combinatorial test criteria, the relation between the test generation problem and other NP-complete problems, mathematical methods for constructing test cases, computer search techniques for test generation, and fault localization techniques based on combinatorial testing.
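    The most common combinatorial criterion, 2-way (pairwise) coverage, and the greedy flavor of many generation methods can be illustrated with a naive Python sketch; real tools use far more refined constructions:

        # Greedy pairwise test generation over a toy parameter model.
        from itertools import combinations, product

        params = {"os": ["linux", "win"], "db": ["pg", "my", "lite"],
                  "net": ["v4", "v6"]}
        names = list(params)

        def pairs(test):
            return {((a, test[a]), (b, test[b]))
                    for a, b in combinations(names, 2)}

        uncovered = set()
        for a, b in combinations(names, 2):
            uncovered |= {((a, x), (b, y))
                          for x in params[a] for y in params[b]}

        candidates = [dict(zip(names, vs)) for vs in product(*params.values())]
        suite = []
        while uncovered:              # pick the test covering most new pairs
            best = max(candidates, key=lambda t: len(pairs(t) & uncovered))
            suite.append(best)
            uncovered -= pairs(best)
        print(len(suite), "tests instead of", len(candidates))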
    2008,19(11):2803-2813, DOI:
    [Abstract] (8486) [HTML] (0) [PDF 319.20 K] (15733)
    Abstract:
    A semi-supervised clustering method based on the affinity propagation (AP) algorithm is proposed in this paper. AP takes as input measures of similarity between pairs of data points. Compared with existing clustering algorithms such as K-centers clustering, AP is an efficient and fast clustering algorithm for large datasets, but for datasets with complex cluster structures it cannot produce good clustering results. The clustering performance of AP can be improved by using a priori known labeled data or pairwise constraints to adjust the similarity matrix. Experimental results show that this method indeed reaches its goal on complex datasets, and it outperforms comparative methods when a large number of pairwise constraints are available.
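    The similarity-adjustment step can be sketched directly with scikit-learn's AffinityPropagation on a precomputed similarity matrix: must-link pairs are raised to the maximum similarity and cannot-link pairs lowered to the minimum before clustering. The points and constraints below are invented for illustration:

        # Semi-supervised AP via constraint-adjusted similarities.
        import numpy as np
        from sklearn.cluster import AffinityPropagation
        from sklearn.metrics import euclidean_distances

        X = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5], [2.5, 2.5]])
        S = -euclidean_distances(X, squared=True)   # similarity = -dist^2

        must_link, cannot_link = [(4, 0)], [(4, 2)]
        hi, lo = S.max(), S.min()
        for i, j in must_link:
            S[i, j] = S[j, i] = hi                  # pull pair together
        for i, j in cannot_link:
            S[i, j] = S[j, i] = lo                  # push pair apart

        ap = AffinityPropagation(affinity="precomputed", random_state=0)
        print(ap.fit(S).labels_)   # point 4 is biased toward point 0's cluster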
    2009,20(8):2241-2254, DOI:
    [Abstract] (6018) [HTML] (0) [PDF 1.99 M] (15727)
    Abstract:
    Inspired by the idea of data fields, a community discovery algorithm based on topological potential is proposed. The basic idea is that a topological potential function is introduced to analytically model the virtual interaction among all nodes in a network and, by regarding each community as a local high-potential area, the community structure in the network can be uncovered by detecting all local high-potential areas margined by low-potential nodes. Experiments on some real-world networks show that the algorithm requires no input parameters and can discover the intrinsic and even overlapping community structure in networks. The time complexity of the algorithm is O(m + n^(3/γ)) ~ O(n^2), where n is the number of nodes to be explored, m is the number of edges, and 2 < γ < 3 is a constant.
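    A common form of the potential function in this line of work gives every node a Gaussian-decaying contribution from all others over hop distance, so that local maxima mark candidate community centers. A small self-contained sketch (the graph and the factor sigma are illustrative):

        # Topological potential over hop distances; local maxima as seeds.
        import math
        from collections import deque

        adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
               3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}  # two linked triangles
        sigma = 1.0

        def hops(src):                   # BFS hop distances from src
            dist, q = {src: 0}, deque([src])
            while q:
                u = q.popleft()
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        q.append(v)
            return dist

        def potential(v):
            d = hops(v)
            return sum(math.exp(-(d[u] / sigma) ** 2)
                       for u in adj if u != v)

        phi = {v: potential(v) for v in adj}
        seeds = [v for v in adj if all(phi[v] >= phi[u] for u in adj[v])]
        print(seeds)                     # the two triangle hubs, nodes 2 and 3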
    2020,31(7):2245-2282, DOI:10.13328/j.cnki.jos.006037
    [Abstract] (2148) [HTML] (338) [PDF 967.02 K] (15498)
    Abstract:
    Ultrasonography is the first choice of imaging examination and preoperative evaluation for thyroid and breast cancer. However, the ultrasonic characteristics of benign and malignant nodules commonly overlap, so diagnosis relies heavily on the operator's experience rather than on quantitative and stable methods. In recent years, medical image analysis based on computer technology has developed rapidly, and a series of landmark breakthroughs have been made, providing effective decision support for medical imaging diagnosis. This paper studies the research progress of computer vision and image recognition technologies for thyroid and breast ultrasound images. The key technologies involved in the automatic diagnosis of ultrasound images form the main line of the work: the major algorithms of recent years are summarized and analyzed, including ultrasound image preprocessing, lesion localization and segmentation, and feature extraction and classification. Moreover, a multi-dimensional analysis is made of the algorithms, data sets, and evaluation methods. Finally, existing problems related to the automatic analysis of these two kinds of ultrasound imaging are discussed, and research trends and development directions in the field of ultrasound image analysis are outlined.
    2010,21(7):1605-1619, DOI:
    [Abstract] (9234) [HTML] (0) [PDF 856.25 K] (15492)
    Abstract:
    The rapid development of the Internet leads to an increase in system complexity and uncertainty. Traditional network management cannot meet these requirements and shall evolve toward fusion-based cyberspace situational awareness (CSA). Based on an analysis of functional shortcomings and development requirements, this paper introduces CSA along with its origin, conception, objectives, and characteristics. Firstly, a CSA research framework is proposed and the research history is investigated, based on which the main aspects and existing issues of the research are analyzed. Assessment methods are divided into three categories: mathematical models, knowledge reasoning, and pattern recognition. The paper then discusses CSA from three aspects, model, knowledge representation, and assessment methods, and goes into detail about the main ideas, assessment processes, merits, and shortcomings of novel methods, comparing many typical methods. The current application research of CSA in the fields of security, transmission, survivability, system evaluation, and so on is presented. Finally, the paper points out the development directions of CSA and offers conclusions in terms of the issue system, technical system, and application system.
    2009,20(8):2199-2213, DOI:
    [Abstract] (9808) [HTML] (0) [PDF 2.05 M] (15418)
    Abstract:
    This paper analyzes previous studies of applying P2P technology in the mobile Internet. It first introduces P2P technology and the conception of the mobile Internet, and presents the challenges and service patterns of P2P technology in the mobile Internet. Second, the architectures of P2P technology in the mobile Internet are described in terms of centralized architecture, super-node architecture, and ad hoc architecture, respectively. Furthermore, resource location algorithms and cross-layer optimizations are introduced based on two different terminal access patterns. Detailed analyses of the different key technologies are presented and their disadvantages are pointed out. At last, this paper outlines future research directions.
    2009,20(3):567-582, DOI:
    [Abstract] (7646) [HTML] (0) [PDF 780.38 K] (15155)
    Abstract:
    Research on software quality models and software quality evaluation models has always been a hot topic in the area of software quality assurance and assessment. A great amount of domestic and foreign research has been done on building software quality models and quality assessment models, and so far certain accomplishments have been achieved in these areas. In recent years, platform building and systematization have become the trends in developing basic software based on operating systems. Therefore, the quality evaluation of the foundational software platform becomes an essential issue to be solved. This article analyzes and summarizes the current development of research on software quality models and software quality assessment models, focusing on depicting the developing process of quality evaluation of foundational software platforms. It also briefly discusses the future development of research on quality assessment of foundational software platforms, trying to establish a good foundation for it.
    2010,21(5):916-929, DOI:
    [Abstract] (11451) [HTML] (0) [PDF 944.50 K] (15042)
    Abstract:
    Data deduplication technologies can be divided into two categories: a) identical data detection techniques, and b) similar data detection and encoding techniques. This paper presents a systematic survey of these two categories of data deduplication technologies and analyzes their advantages and disadvantages. Besides, since data deduplication technologies can affect the reliability and performance of storage systems, this paper also surveys the various technologies proposed to cope with these two problems. Based on an analysis of the current state of research on data deduplication technologies, this paper draws several conclusions: a) how to mine data characteristic information in data deduplication has not been completely solved, and how to use such information to effectively eliminate duplicate data also needs further study; b) from the perspective of storage system design, it still needs further study how to introduce proper mechanisms to overcome the reliability limitations of data deduplication techniques and to reduce the additional system overheads they cause.
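    The core of identical-data detection can be sketched in a few lines: split the stream into chunks, fingerprint each chunk with a cryptographic hash, and store any fingerprint only once. Fixed-size chunking is used here for brevity; production systems typically prefer content-defined chunking so that edits do not shift every boundary:

        # Chunk-and-fingerprint deduplication with SHA-256.
        import hashlib

        CHUNK = 8                 # unrealistically small, for illustration
        store = {}                # fingerprint -> chunk, stored once

        def dedup_write(data):
            recipe = []           # fingerprints that rebuild the data
            for i in range(0, len(data), CHUNK):
                chunk = data[i:i + CHUNK]
                fp = hashlib.sha256(chunk).hexdigest()
                store.setdefault(fp, chunk)    # skip duplicates
                recipe.append(fp)
            return recipe

        def dedup_read(recipe):
            return b"".join(store[fp] for fp in recipe)

        r = dedup_write(b"aaaaaaaabbbbbbbbaaaaaaaa")
        assert dedup_read(r) == b"aaaaaaaabbbbbbbbaaaaaaaa"
        print(len(store), "unique chunks for", len(r), "written")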
    2007,18(1):146-156, DOI:
    [Abstract] (9272) [HTML] (0) [PDF 728.16 K] (14982)
    Abstract:
    A new surrogate placement strategy, CCSP (capacity-constrained surrogate placement), is proposed to enhance the performance of content distribution networks (CDNs). CCSP aims to address surrogate placement in a manner that minimizes the communication cost while at the same time maximizing system throughput. Unlike existing works on the resource allocation problem in communication networks, CCSP considers load distribution and processing capacity constraints on surrogates by modeling the underlying request-routing mechanism, thus guaranteeing a CDN minimum network resource consumption, maximum system throughput, and better load balancing among surrogates. An efficient greedy algorithm is developed for a simplified version of the CCSP problem on tree networks. The efficiency of the proposed algorithm is systematically analyzed through experimental simulations.
    2012,23(5):1148-1166, DOI:10.3724/SP.J.1001.2012.04195
    [Abstract] (13482) [HTML] (0) [PDF 946.37 K] (14899)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
    2009,20(6):1425-1443, DOI:
    [Abstract] (9466) [HTML] (0) [PDF 1.09 M] (14602)
    Abstract:
    The software fault injection testing (SFIT) technique has been developed for thirty years and is one of the most active areas in software testing research. As a non-traditional testing technique, it plays a very important role in enhancing software quality, eliminating software failures, and improving the process of software development. A detailed review of the research on SFIT is presented based on a survey and classification of current SFIT techniques. Some important testing frameworks and tools that are effective at present are then discussed, and a brief description of the testing system CSTS (Component Security Testing System) is provided as well. Based on this investigation of SFIT, the issues and challenges of SFIT are pointed out and future development trends for SFIT are proposed.
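    The essence of the technique can be sketched as a wrapper that makes a target call fail with controlled probability, letting a test observe whether the caller degrades gracefully. All names below are hypothetical illustrations, not part of CSTS or any surveyed tool:

        # A probabilistic fault-injection wrapper for testing.
        import random

        def inject_faults(func, p, exc=IOError):
            def faulty(*args, **kwargs):
                if random.random() < p:
                    raise exc("injected fault")
                return func(*args, **kwargs)
            return faulty

        def fetch(key):                        # component under test
            return {"a": 1}.get(key)

        fetch_unreliable = inject_faults(fetch, p=0.5)

        def fetch_with_retry(key, attempts=5):
            for _ in range(attempts):          # fault-tolerance logic
                try:
                    return fetch_unreliable(key)
                except IOError:
                    continue
            raise IOError("all retries failed")

        random.seed(1)
        print(fetch_with_retry("a"))           # usually survives the faults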
    2016,27(3):691-713, DOI:10.13328/j.cnki.jos.004948
    [Abstract] (8431) [HTML] (308) [PDF 2.43 M] (14539)
    Abstract:
    Learning to rank (L2R) techniques try to solve sorting problems using machine learning methods, and have been well studied and widely used in various fields such as information retrieval, text mining, personalized recommendation, and biomedicine. The main task of L2R-based recommendation algorithms is to integrate L2R techniques into recommendation algorithms, studying how to organize the large numbers of users and features of items, build more suitable user models according to user preferences and requirements, and improve the performance and user satisfaction of recommendation algorithms. This paper surveys L2R-based recommendation algorithms of recent years, summarizes the problem definition, compares key technologies, and analyzes evaluation metrics and their applications. In addition, the paper discusses future development trends of L2R-based recommendation algorithms.
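    The pairwise flavor of L2R, common in recommendation, can be sketched with a linear scorer trained so that, within a query, preferred items outscore the rest (the idea behind RankNet-style methods). The feature pairs below are synthetic placeholders:

        # Pairwise learning to rank with a linear scorer.
        import numpy as np

        rng = np.random.default_rng(0)
        # (features of preferred item, features of the other item)
        pairs = [(rng.normal(loc=1.0, size=4), rng.normal(loc=0.0, size=4))
                 for _ in range(200)]

        w, lr = np.zeros(4), 0.1
        for _ in range(100):
            for xi, xj in pairs:                   # want w @ (xi - xj) > 0
                d = xi - xj
                p = 1.0 / (1.0 + np.exp(-w @ d))   # P(xi ranked above xj)
                w += lr * (1.0 - p) * d            # log-likelihood gradient
        print(w)    # weights grow positive: preferred items rank higher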
    2011,22(1):132-148, DOI:10.3724/SP.J.1001.2011.03899
    [Abstract] (9063) [HTML] (0) [PDF 852.82 K] (14496)
    Abstract:
    The Internet has become a vital information infrastructure for modern society. However, the concurrent nature of networks introduces a wide range of difficulties into traditional programming methodology, making it hard to develop high-quality network programs and significantly reducing productivity. The influence of concurrency on the complexity of software development is comparable to the "concurrency crisis" of software brought about by multi-core processors, but it receives much less attention than it deserves. There is no universal approach to cope with this issue, and there are even disagreements between different approaches. In this paper, the basic concurrency models and their implementations are introduced; the paper then surveys the inherent complexities of these approaches, comparing their advantages and disadvantages. Finally, the paper offers an opinion on possibilities for future research on this topic.
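    The contrast among basic concurrency models can be made concrete with a small Python example: the same two concurrent waits expressed once in the thread-based model and once in the event-driven (single-threaded event loop) model. This is a generic illustration, not drawn from the paper:

        # Thread model vs. event-loop model for two concurrent waits.
        import threading, time, asyncio

        def blocking_task(name):
            time.sleep(0.1)                  # blocks its own thread
            print("thread", name, "done")

        ts = [threading.Thread(target=blocking_task, args=(i,)) for i in (1, 2)]
        [t.start() for t in ts]
        [t.join() for t in ts]

        async def nonblocking_task(name):
            await asyncio.sleep(0.1)         # yields to the event loop
            print("coroutine", name, "done")

        async def main():
            await asyncio.gather(nonblocking_task(1), nonblocking_task(2))

        asyncio.run(main())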