    2025,36(7):2929-2946, DOI: 10.13328/j.cnki.jos.007332, CSTR: 32375.14.jos.007332
    Abstract:
Differential privacy, owing to its strong privacy protection capacity, is applied to the random forest algorithm to address the privacy leakage problem. However, directly applying differential privacy to the random forest algorithm leads to a significant decline in the model’s classification accuracy. To balance privacy protection and model accuracy, this study proposes an efficient differential privacy random forest (eDPRF) training algorithm. Specifically, the study designs a decision tree construction method based on the permute-and-flip mechanism. Leveraging the efficient query output of the permute-and-flip mechanism, corresponding utility functions are designed to output split features and labels precisely, effectively enhancing the tree model’s ability to learn from data under perturbation. At the same time, the study designs a privacy budget allocation strategy based on the composition theorem, which improves the privacy budget utilization of nodes by obtaining training subsets through sampling without replacement and by differentially adjusting internal budgets. Finally, theoretical analysis and experimental evaluation demonstrate that, given the same privacy budget, the proposed algorithm outperforms comparable algorithms in classification accuracy.
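For readers unfamiliar with the mechanism named above, the following Python sketch illustrates generic permute-and-flip selection for differential privacy; the utility scores, the sensitivity value, and the permute_and_flip helper are illustrative assumptions and do not reproduce eDPRF's utility functions or budget allocation.

import math
import random

def permute_and_flip(utilities, epsilon, sensitivity=1.0):
    # Select one candidate index with epsilon-differential privacy:
    # visit candidates in random order and accept each with a probability
    # that decays exponentially in its utility gap to the best candidate.
    q_max = max(utilities)
    candidates = list(range(len(utilities)))
    random.shuffle(candidates)                      # permute
    for r in candidates:
        p = math.exp(epsilon * (utilities[r] - q_max) / (2.0 * sensitivity))
        if random.random() <= p:                    # flip a biased coin
            return r
    return candidates[-1]  # unreachable: the maximizer is accepted with probability 1

# Hypothetical use: choose a split feature from information-gain scores.
scores = [0.62, 0.55, 0.71, 0.40]
print(permute_and_flip(scores, epsilon=0.5))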
    2025,36(7):2947-2963, DOI: 10.13328/j.cnki.jos.007333, CSTR: 32375.14.jos.007333
    Abstract:
Fuzz testing techniques play a significant role in software quality assurance and software security testing. However, when dealing with systems such as compilers that have complex input semantics, existing fuzz testing tools often struggle: the lack of semantic awareness in their mutation strategies causes the generated programs to fail compiler front-end checks. This study proposes a semantics-aware greybox fuzz testing method aimed at enhancing the efficiency of fuzz testing tools in the domain of compiler testing. It designs and implements a series of mutation operators that maintain input semantic validity and explore contextual diversity, and develops efficient selection strategies according to the characteristics of these operators. By integrating these strategies with traditional greybox fuzz testing tools, the greybox fuzz testing tool SemaAFL is developed. Experimental results indicate that with these mutation operators, SemaAFL achieves approximately 14.5% and 11.2% higher code coverage on the GCC and Clang compilers than AFL++ and similar tools such as GrayC. During a week-long experimental period, six previously unknown bugs in GCC and Clang are discovered and reported by SemaAFL.
    2025,36(7):2964-3002, DOI: 10.13328/j.cnki.jos.007334, CSTR: 32375.14.jos.007334
    Abstract:
Distributed systems are the pillars of the current computing ecosystem, making modern computing more powerful, reliable, and flexible, and covering key fields from cloud computing and big data processing to the Internet of Things. However, due to system complexity, code defects are inevitably introduced during the implementation of distributed systems, posing a huge threat to their availability, robustness, and security. Therefore, testing and defect detection for distributed systems are very important. Dynamic testing technology analyzes a system in real time during operation to detect its defects and evaluate its behavior and functions; it is widely used in defect detection for various system applications and has successfully found many code defects. This study proposes a four-layer defect threat model for distributed systems, based on which the testing requirements and main challenges of distributed systems are analyzed, and a general framework for dynamic testing of distributed systems is proposed. Then, typical dynamic testing tools for distributed systems are introduced from the perspective of detecting different types of system defects. Next, the study highlights critical techniques such as multidimensional test input generation, system-critical state awareness, and defect judgment criteria. Additionally, the study reviews popular dynamic testing tools for distributed systems and evaluates their coverage and defect discovery capabilities. The findings show that multidimensional input generation significantly enhances testing efficiency. Finally, the study discusses emerging trends and future directions in dynamic testing of distributed systems, aiming to address their inherent challenges and improve testing outcomes.
    2025,36(7):3003-3021, DOI: 10.13328/j.cnki.jos.007335, CSTR: 32375.14.jos.007335
    Abstract:
Binary2source function similarity detection is regarded as one of the fundamental tasks in software composition analysis. Existing binary2source matching works mainly adopt a 1-to-1 matching mechanism, where one binary function is matched against one source function. However, such a mapping may be 1-to-n (one binary function maps to multiple source functions) due to function inlining, and existing binary2source matching methods suffer a 30% performance loss under function inlining because of this difference. To meet the requirement of matching binary functions to source functions in the presence of function inlining, this study proposes a binary2source function similarity detection method for 1-to-n matching, which generates source function sets as the matching objects for inlined binary functions to compensate for the deficiency of the source function library. The effectiveness of the proposed method is evaluated through a series of experiments. The experimental data indicate that the method not only improves existing binary2source function similarity detection but also identifies the inlined source functions, helping existing tools better cope with the challenges of inlining.
    2025,36(7):3022-3040, DOI: 10.13328/j.cnki.jos.007336, CSTR: 32375.14.jos.007336
    Abstract:
Deep learning (DL) compilers are widely applied to optimize and deploy deep learning models. Similar to traditional compilers, DL compilers also contain bugs. Buggy DL compilers can cause compilation failures, generate incorrect compilation results, and sometimes even lead to disastrous consequences. To understand the characteristics of DL compiler bugs in depth, existing work has analyzed 603 early bugs in DL compilers. In recent years, DL compilers have been updated rapidly, with a large number of new features introduced and some old ones abandoned. At the same time, several bug detection approaches for DL compilers have been developed. Therefore, it is necessary to analyze whether previous research conclusions on DL compiler bugs are still applicable. In addition, the relationships among the symptoms, root causes, and locations of bugs have not been explored in depth, and the characteristics of bug-revealing tests and bug-fixing patches have not been studied. To analyze the current characteristics of DL compiler bugs and their evolution over time, this study collects 613 recently fixed bugs in three mainstream DL compilers (i.e., TVM from Apache, Glow from Facebook, and AKG from Huawei) and manually labels their root causes, symptoms, locations, and other characteristics. Based on the labeling results, this study explores the distribution characteristics of bugs from multiple dimensions and compares them with those reported in existing works. Meanwhile, the characteristics of bug-revealing regression tests and bug-fixing patches are further investigated. In total, this study summarizes 12 major findings to comprehensively understand the current situation and evolution of DL compiler bugs and provides a series of feasible suggestions for the detection, localization, and repair of DL compiler bugs. Finally, to verify the effectiveness of these findings, a configuration-based testing tool, CfgFuzz, is developed. CfgFuzz conducts combinatorial testing on compilation configuration options and detects 8 TVM bugs, 7 of which have been confirmed or fixed by developers.
    2025,36(7):3041-3086, DOI: 10.13328/j.cnki.jos.007338, CSTR: 32375.14.jos.007338
    Abstract:
Java has become one of the most popular programming languages for application development, owing to its rich dependency libraries and convenient build tools such as Maven and Gradle. However, with the continuous growth in the scale of dependency libraries, the dependency management of Java projects becomes increasingly complex and constantly exceeds the capabilities of existing tools. Potential problems are likely to be triggered unexpectedly, seriously affecting the building and running of the current project and of other projects in the Java ecosystem, for example by causing build errors, runtime crashes, or semantic conflicts. To address the gaps in the analysis of dependency management issues in existing research and technical literature, this study introduces the concept of “dependency smell” to build a unified model for these problems. This study conducts a comprehensive empirical study on dependency management issues, covering all categories of Maven- and Gradle-related problems. It analyzes diverse dependency management issues gathered from open-source communities (e.g., GitHub), official documentation (e.g., the Maven manual), as well as various surveys and technical papers. Finally, 13 types of dependency smells, together with their triggering roots and impact characteristics, are summarized. Based on the findings of this empirical study, a unified detection algorithm for dependency smells in Java projects is designed, and a dedicated detection tool, JDepAna, supporting the Maven and Gradle build tools, is implemented. Experimental results demonstrate that for known dependency smells, JDepAna achieves a detection recall rate of 95.9%. For hundreds of new Java projects, JDepAna detects 30689 instances of dependency smells. Among them, 360 instances are sampled for manual verification, and the true positive rate reaches 96.1%. Additionally, this study reports 48 instances to developers, with 42 promptly confirmed and 21 promptly fixed, validating the efficacy and practicality of the proposed Java dependency smell detection algorithm and tool in facilitating quality assurance for Java projects.
    2025,36(7):3087-3108, DOI: 10.13328/j.cnki.jos.007339, CSTR: 32375.14.jos.007339
    Abstract:
With the rapid development of information technology, security authentication technology has become a crucial safeguard for personal privacy and data security. Among such technologies, iris recognition, with its outstanding accuracy and stability, is widely applied in fields such as system access control, healthcare, and judicial practice. However, once a user’s iris feature data is leaked, the loss is permanent, as iris features cannot be changed or revoked. Therefore, the privacy protection of iris feature data is particularly important. With the prominent performance of neural networks in image processing, secure iris recognition schemes based on neural networks have been proposed, which maintain the high performance of recognition systems while protecting private data. However, in the face of constantly changing data and environments, secure iris recognition schemes are also required to be scalable; that is, a recognition scheme should maintain its performance as new users register. Yet most existing research on neural network-based secure iris recognition does not consider scalability. To address these problems, this study proposes the generative feature replay-based secure incremental iris recognition (GFR-SIR) method and the privacy-preserving template replay-based secure incremental iris recognition (PTR-SIR) method. Specifically, the GFR-SIR method uses generative feature replay and feature distillation to alleviate the forgetting of previous task knowledge during the expansion of neural networks and adopts the improved TNCB method to protect the privacy of iris feature data. The PTR-SIR method preserves the privacy-protecting templates obtained through the TNCB method in previous tasks and replays these templates during model training on the current task to achieve scalability of the recognition scheme. Experimental results show that after 5 rounds of expansion tasks, the recognition accuracy of GFR-SIR and PTR-SIR on the CASIA-Iris-Lamp dataset reaches 68.32% and 98.49%, respectively, improvements of 58.49% and 88.66% over the fine-tuning method. The analysis indicates that GFR-SIR has significant advantages in security and training efficiency since the data of previous tasks is not saved, while PTR-SIR is better at maintaining recognition performance but is less secure and efficient than GFR-SIR.
Available online: July 09, 2025, DOI: 10.13328/j.cnki.jos.007295
    Abstract:
Resource auction mechanisms can maximize the benefit of resource allocation by fully introducing competition, and they have found widespread application in mobile edge computing resource allocation and pricing. Current auction mechanisms for mobile edge computing resource allocation mainly focus on allocating the computing resources of edge servers; in a multi-base-station environment, they fall short of jointly considering the allocation of wireless bandwidth resources that do not belong to any edge server and of computing resources that belong to specific edge servers. Furthermore, when multiple types of resources are considered, designing a resource allocation and pricing strategy that guarantees benefits for both resource providers and users is challenging. By analyzing the characteristics of multiple base stations and resource constraints, this study proposes a double-auction-based combinational resource allocation (DACRA) mechanism for mobile edge computing. The mechanism considers the allocation of wireless bandwidth resources across multiple communication base stations and of multiple computing resources of edge servers, and it introduces resource scarcity and bidding density to ensure high allocation efficiency. Theoretical analysis shows that DACRA is a polynomial-time mechanism that satisfies incentive compatibility, budget balance, and individual rationality. Simulation results based on a publicly available dataset show that the proposed mechanism yields lower computational time costs, as well as higher social welfare, request success rates, and resource utilization rates, than existing approaches.
Available online: July 09, 2025, DOI: 10.13328/j.cnki.jos.007316
    Abstract:
The uBlock algorithm has been widely studied in areas such as algorithm design, side-channel protection, Internet of Things applications, and cryptanalysis. Although uBlock is suitable for high-speed implementation, the throughput of its publicly available implementations is far lower than that of algorithms such as AES and SM4. Bit slicing is a common method for optimizing block ciphers; however, when bit slicing is used to optimize the uBlock algorithm, it incurs huge memory access overhead because register resources are insufficient. In this study, a flexible bit-slicing optimization method named FBS-uBlock is designed for the uBlock algorithm. It reduces the number of registers occupied by the algorithm under bit slicing, thereby reducing memory access overhead and improving speed. Tests show that the proposed optimization reduces the memory access instructions of the uBlock-128/128, uBlock-128/256, and uBlock-256/256 algorithms by up to 71%, 71%, and 72%, respectively. The maximum encryption rates reach 12758 Mb/s, 8944 Mb/s, and 8984 Mb/s, respectively, which are 3.9, 4.2, and 3.4 times higher than the implementation rates in the design documentation.
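As context for the optimization target mentioned above, the sketch below shows the general bit-slicing idea in Python: one bit of each of 32 independent blocks is packed into the same bit position of a word, so a single bitwise instruction applies the same Boolean gate to all 32 blocks at once. The 4-bit toy state and XOR round are illustrative assumptions and do not reflect the FBS-uBlock register layout.

MASK32 = 0xFFFFFFFF

def slice_bits(states, width=4):
    # Transpose the width-bit states of 32 blocks into width 32-bit slices.
    slices = [0] * width
    for lane, s in enumerate(states):
        for b in range(width):
            slices[b] |= ((s >> b) & 1) << lane
    return slices

def unslice_bits(slices, width=4):
    # Inverse transposition back to one state per block.
    states = [0] * 32
    for b in range(width):
        for lane in range(32):
            states[lane] |= ((slices[b] >> lane) & 1) << b
    return states

# Toy round y = x XOR key applied to 32 blocks in parallel:
# four word-level XORs instead of 32 per-block operations.
blocks = [i & 0xF for i in range(32)]
key = 0b1010
state = slice_bits(blocks)
key_slices = slice_bits([key] * 32)
state = [(a ^ b) & MASK32 for a, b in zip(state, key_slices)]
assert unslice_bits(state) == [x ^ key for x in blocks]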
Available online: July 09, 2025, DOI: 10.13328/j.cnki.jos.007386
    Abstract:
Contract cases account for a substantial proportion of daily civil disputes. The limited accessibility and cumbersome management of traditional paper contracts have significantly hindered the efficiency of contract execution and dispute resolution. As computer protocols designed to execute contract terms, smart contracts offer new possibilities for the execution and processing of legal contracts, with advantages such as automated execution, decentralization, and immutability. However, their reliance on strict programming logic, lack of interpretative flexibility, and difficulty of dynamic adjustment after deployment constrain the intentions of contract participants and result in uncertainties regarding legal applicability and binding force. Based on the distinctions between legal contracts and smart contracts, this study proposes four key principles, namely grammatical requirements, the non-empowerment principle, validity review, and security criteria, providing a theoretical framework for generating and executing legally effective smart contracts. A smart contract transformation and verification model adhering to these four principles is further designed. The proposed model enhances the processing of legal contracts expressed as transition systems, prevents reentrancy attacks, and converts core and additional specifications into computation tree logic for security property verification. Contracts that pass verification are automatically converted into smart contracts. The entire transformation process complies with the proposed four principles, ensuring that the resulting smart contracts meet current legal standards and can be regarded as legal contracts. Experimental validation includes a simplified sales contract as a case study, demonstrating its initial and enhanced transition system models, partial verification results, and the representative Solidity code generated. The pre-processing operation yields a high-quality dataset constructed from 270592 samples. Consistency evaluation between contract terms and legal provisions achieves recall rates of 90.27% at R@1, 97.91% at R@5, and 99.30% at R@10. The feature extraction model, aided by a format conversion tool with nearly 100% fidelity, achieves 91.87% accuracy at the token level, confirming the model’s accuracy and reliability. The findings indicate that the proposed principles are highly feasible, and the transformation and verification model effectively addresses the cumbersome nature of paper contract processing, enhances the convenience and flexibility of legal contract execution and management, and enables smart contracts to obtain legal protection while mitigating potential risks.
Available online: July 09, 2025, DOI: 10.13328/j.cnki.jos.007402
    Abstract:
To address the issue that current OWL representation learning methods lack the ability to jointly represent complex semantic information across both the concept layer and the instance layer, an OWL representation learning approach based on multi-semantic views of concepts, properties, and instances, termed MSV-KRL, is proposed. The method adopts a three-stage architecture comprising multi-semantic view partitioning, semantics-aware self-supervised post-training, and joint multi-task representation learning. First, MSV-KRL optimizes the mapping strategy from OWL to RDF graphs based on OWL2Vec*, and five fine-grained semantic view partitioning strategies are proposed. Subsequently, serialized post-training data is generated through random walks and an annotated attribute replacement strategy, and self-supervised post-training of the pre-trained model is carried out to enhance its adaptability to multi-semantic views. Finally, by employing a multi-task learning strategy, complex semantic representation learning for concepts, properties, and instances in OWL graphs is achieved through the jointly optimized loss of multi-semantic view prediction tasks. Experimental results demonstrate that MSV-KRL outperforms baseline representation learning methods on multiple benchmarks. MSV-KRL can be adapted to multiple language models, significantly improving the representation capability for OWL’s complex semantics.
Available online: July 09, 2025, DOI: 10.13328/j.cnki.jos.007427
    Abstract:
Blockchain, also known as a distributed ledger, is a prominent example of next-generation information technology and has been widely applied in fields including finance, healthcare, energy, and government affairs. Regulatable privacy protection technologies for blockchain not only safeguard users’ privacy and enhance trust but also prevent the misuse of blockchain for illegal activities, ensuring regulatory compliance. Current privacy protection schemes for regulatable blockchains are typically based on bilinear pairing, which exhibits relatively low computational efficiency and fails to meet the demands of high-concurrency scenarios. To address these issues, this study proposes an efficient regulatable identity privacy protection scheme for blockchain. By designing a zero-knowledge proof that verifies the consistency of the receiver’s identity without bilinear pairing, together with a traceable ring signature scheme, the approach effectively protects the identity privacy of both transaction parties while preserving the effectiveness of supervision. Experimental results indicate that, when the number of ring members is set to 16 as required by Monero, the execution time of every algorithm in the scheme is within 5 milliseconds. Compared with similar schemes, efficiency is improved by more than 14 times, and the message length is reduced to 50% of that of the original scheme, demonstrating higher computational efficiency and shorter messages.
Available online: July 09, 2025, DOI: 10.13328/j.cnki.jos.007428
    Abstract:
    Attribute-based searchable encryption (ABSE) enables secure and fine-grained sharing of encrypted data in multi-user environments. However, it typically encounters challenges such as high computational overhead for encryption and decryption, limited query efficiency, and the inability to update indexes dynamically. To address these limitations, this study proposes an efficient searchable scheme based on ABSE that supports dynamic index updates. The reuse of identical access policies minimizes redundant computation during encryption. Most decryption operations are securely outsourced to the cloud, thus reducing the local device’s computational load. An inverted index structure supporting multi-keyword Boolean retrieval is constructed by integrating hash tables with skip lists. BLS short signature technology is employed to verify the permissions for index updates, ensuring data owners can manage the retrieval of encrypted data. Formal security analysis confirms that the proposed scheme effectively defends against collusion attacks, chosen plaintext attacks, forged update tokens, and decryption key forgery. Experimental results demonstrate high efficiency in both retrieval and index update operations, along with a significant reduction in encryption overhead when access policy reuse occurs.
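To make the index structure named above concrete, the following Python sketch shows a keyword-to-posting-list inverted index with Boolean AND retrieval; for brevity a plain sorted list stands in for the skip list, documents are plaintext, and the encrypted index, BLS-signed update tokens, and attribute-based access policies of the scheme are not modeled.

from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(list)      # hash table: keyword -> doc ids

    def add_document(self, doc_id, keywords):
        # Documents are assumed to be added in increasing doc_id order,
        # so each posting list stays sorted without extra work.
        for kw in keywords:
            lst = self.postings[kw]
            if not lst or lst[-1] != doc_id:
                lst.append(doc_id)

    def search_and(self, keywords):
        # Boolean AND: intersect posting lists, shortest list first.
        lists = sorted((self.postings.get(kw, []) for kw in keywords), key=len)
        if not lists or not lists[0]:
            return []
        result = set(lists[0])
        for lst in lists[1:]:
            result &= set(lst)
        return sorted(result)

index = InvertedIndex()
index.add_document(1, ["cloud", "encryption"])
index.add_document(2, ["cloud", "search", "encryption"])
print(index.search_and(["cloud", "encryption"]))   # [1, 2]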
Available online: July 09, 2025, DOI: 10.13328/j.cnki.jos.007430
    Abstract:
    In the field of time series data analysis, cross-domain data distribution shifts significantly weaken model generalization performance. To address this, an end-to-end time series domain adaptation framework, called TPN, is developed. This framework creatively integrates a temporal pattern activation module (TPAM) with a Transformer encoder. TPAM captures spatial and temporal dependencies of sequence features through dual-layer spatio-temporal convolution operations, combines Sigmoid and Tanh activation functions for the non-linear fusion of extracted features, and restores the original channel dimensions via linear projection, thus enhancing the model’s ability to extract temporal features. TPN also introduces an enhanced adversarial paradigm (EAP), which strengthens generator-discriminator-based collaborative adversarial learning through domain classification loss and operation order prediction loss. This effectively reduces data distribution discrepancies between source and target domains, improving the model’s domain adaptability. Empirical results on three public human activity recognition datasets (Opportunity, WISDM, and HHAR) demonstrate that TPN improves accuracy and F1 by up to 6% compared to existing methods, with fewer parameters and shorter runtime. In-depth ablation and visualization experiments further validate the effectiveness of TPAM and EAP, showing TPN’s strong performance in feature extraction and domain alignment.
Available online: July 02, 2025, DOI: 10.13328/j.cnki.jos.007384
    Abstract:
    To meet the economic design requirements of aircraft and reduce internal payload, the transition from wired to wireless networks has emerged as a key direction in the upgrade of airborne networks. However, traditional wireless technologies are unable to satisfy the real-time transmission requirements of time-triggered services in airborne networks. In this study, the application characteristics of the airborne wireless communication network (AWCN) are defined, and a hybrid topology is designed by integrating the AWCN with the airborne backbone switching network. By considering conflict-free nodes, interference-free channels, path dependencies, and end-to-end delay requirements, a first-order logic formulation for the deterministic scheduling of time-triggered AWCN is developed. The minimum number of time slots required for scheduling and the primary factors affecting end-to-end delay are theoretically analyzed under different channel configurations. In addition, the expected value of the information age for data flows at the gateway in a steady state is established. A scheduling method based on integer programming is designed, and an incremental solution strategy is proposed to address the low computational efficiency caused by the large number of decision variables and the high coupling of constraints in large-scale networks. The effectiveness of the deterministic scheduling model and theoretical analysis is validated through experiments, and the impact of various scheduling factors on total flow delay and scheduling scale is examined.
Available online: June 27, 2025, DOI: 10.13328/j.cnki.jos.007294
    Abstract:
Business rules are crucial in the securities domain and serve as the source of requirements for securities trading systems. Because these business rules change frequently, improving the efficiency of specifying software requirements from business rule documents has become a core problem. Securities business rule documents contain numerous descriptions unrelated to software, abundant professional terms, and many context-dependent expressions and abstract representations, so automatic specification requires the support of domain-specific knowledge. As a result, how to integrate domain knowledge into the automatic process becomes a key problem for specification. This study proposes an automatic specification method for securities domain business rules that integrates large language models and a domain knowledge base. It leverages large language models, employing techniques such as fine-tuning and in-context learning to embed domain knowledge for natural language processing tasks such as rule classification and requirement information extraction. It also employs the domain knowledge base to provide professional knowledge and to assist in the operationalization of requirements and relationship extraction. Finally, requirement specifications in the form of data flows are produced. The evaluation results show that the proposed approach can process business rule documents from various securities trading fields, achieving an average function point identification rate of 91.97% on the evaluation dataset, which matches or even surpasses the level of domain experts, with efficiency improved by an average factor of 10 compared with human participants.
Available online: June 25, 2025, DOI: 10.13328/j.cnki.jos.007385
    Abstract:
    With the rapid growth of network applications such as cloud computing, mobile internet, and artificial intelligence, network attacks and threats are becoming increasingly frequent and complex. This necessitates the development of network security defense technologies capable of effectively countering these threats and ensuring the security of critical infrastructure networks. Traditional defense technologies based on middleboxes can achieve high performance using specialized hardware; however, these solutions are costly, and deploying new defenses typically requires hardware upgrades. Software-based defense technologies offer high flexibility, but software-based packet processing leads to significant performance overhead. The emergence of programmable switches presents new opportunities for network security defense by offering notable advantages in both flexibility and performance, making this a prominent research focus. This study first reviews the origin and architecture of programmable switches and explores their relevant features and advantages in network security applications, including ease of management, low cost, high flexibility, and high performance. Subsequently, from the perspective of the basic triad of network security defense, namely prevention, detection, and response, this study systematically elaborates on various defense techniques utilizing programmable switches, such as access control, network scanning, network obfuscation, deep packet inspection, DDoS detection and mitigation, and intelligent data planes. The design principles, implementation mechanisms, and potential limitations of these technologies are analyzed. Finally, an outlook is provided on future research directions for network security based on programmable switches.
Available online: June 25, 2025, DOI: 10.13328/j.cnki.jos.007400
    Abstract:
Knowledge graphs (KGs), with their unique approach to knowledge management and their representation capabilities, have been widely applied in various knowledge computing fields, including question answering. However, KGs often contain incomplete information, which undermines their quality and limits the performance of downstream tasks. As a result, knowledge graph completion (KGC) has emerged, aiming to enhance the quality of KGs by predicting the missing information in triples using different methods. In recent years, extensive research has been conducted in the field of KGC. This study classifies KGC techniques into three categories based on the number of samples used: zero-shot KGC, few-shot KGC, and multi-shot KGC. To provide a first-hand reference on the core concepts and current status of KGC research, this study offers a comprehensive review of the latest advancements in KGC, covering theoretical research, experimental analysis, and practical applications such as the Huapu system. The problems and challenges faced by current KGC technologies are summarized, and potential future research directions are discussed.
Available online: June 25, 2025, DOI: 10.13328/j.cnki.jos.007399
    Abstract:
The use of computer technology for the intelligent management of genealogy data plays a significant role in inheriting and popularizing traditional Chinese culture. In recent years, with the widespread application of retrieval-augmented large language models (LLMs) in knowledge question answering (Q&A), presenting diverse genealogy scenarios to users through dialogues with LLMs has become a highly anticipated research direction. However, the heterogeneity, autonomy, complexity, and evolution (HACE) characteristics of genealogy data make it difficult for existing knowledge retrieval frameworks to perform comprehensive knowledge reasoning over complex genealogy information. To address this issue, Huaputong, a genealogy Q&A system based on LLMs with knowledge graph reasoning, is proposed. A knowledge graph reasoning framework suitable for LLM-based genealogy Q&A is constructed from two aspects: the completeness of logic reasoning and the accuracy of information filtering. For the completeness of logic reasoning, knowledge graphs are used as the medium for genealogy knowledge, and a comprehensive set of genealogy reasoning rules based on the Jena framework is proposed to improve the retrieval recall of genealogy knowledge reasoning. For information filtering, scenarios involving name ambiguity and multiple kinship relations in genealogy are considered; a multi-condition matching mechanism based on problem-condition triples and a Dijkstra path ranking algorithm using a max heap are designed to filter redundant retrieval information, thus ensuring accurate prompting for LLMs. Huaputong has been deployed on the Huapu platform, a publicly available intelligent genealogy website, where its effectiveness has been validated using real-world genealogical data.
Available online: June 18, 2025, DOI: 10.13328/j.cnki.jos.007403
    Abstract:
    Quality issues, such as errors or deficiencies in triplets, become increasingly prominent in knowledge graphs, severely affecting the credibility of downstream applications. Accuracy evaluation is crucial for building confidence in the use and optimization of knowledge graphs. An embedding-model-based method is proposed to reduce reliance on manually labeled data and to achieve scalable automatic evaluation. Triplet verification is formulated as an automated threshold selection problem, with three threshold selection strategies proposed to enhance the robustness of the evaluation. In addition, triplet importance indicators are incorporated to place greater emphasis on critical triplets, with importance scores defined based on network structure and relationship semantics. Experiments are conducted to analyze and compare the impact on performance from various perspectives, such as embedding model capacity, knowledge graph sparsity, and triplet importance definition. The results demonstrate that, compared to existing automated evaluation methods, the proposed method can significantly reduce evaluation errors by nearly 30% in zero-shot conditions, particularly on datasets of dense graphs with high error rates.
Available online: June 18, 2025, DOI: 10.13328/j.cnki.jos.007404
    Abstract:
Session-based recommendation aims to predict the next item a user will interact with based on a sequence of items. Most existing session-based recommender systems do not fully utilize the temporal interval information between items within a session, which affects recommendation accuracy. In recent years, graph neural networks have gained significant attention in session-based recommendation due to their strong ability to model complex relationships. However, recommendation methods that rely solely on graph neural networks overlook the hidden high-order relationships between sessions, resulting in less rich information. In addition, data sparsity has long been a challenge in recommender systems, and contrastive learning is often employed to address it; however, most contrastive learning frameworks lack strong generalization capabilities because they rely on a single form of contrast. Based on this, a session-based recommendation model combined with self-supervised learning is proposed. First, the model utilizes the temporal interval information between items within user sessions to perform data augmentation, enriching the information within sessions to improve recommendation accuracy. Second, a dual-view encoder is constructed, combining a hypergraph convolutional network encoder and a Transformer encoder to capture the hidden high-order relationships between sessions from multiple perspectives, thus enhancing the diversity of recommendations. Finally, the model integrates the augmented intra-session information, the multi-view inter-session information, and the original session information for contrastive learning to strengthen generalization. Comparisons with 11 classic models on 4 datasets show that the proposed model is feasible and efficient, with average improvements of 5.96% and 5.89% on the HR and NDCG metrics, respectively.
Available online: June 11, 2025, DOI: 10.13328/j.cnki.jos.007401
    Abstract:
    Knowledge graph completion (KGC) models require inductive ability to generalize to new entities as the knowledge graph expands. However, current approaches understand entities only from a local perspective by aggregating neighboring information, failing to capture valuable interconnections between entities across different views. This study argues that global and sequential perspectives are essential for understanding entities beyond the local view by enabling interaction between disconnected and distant entity pairs. More importantly, it emphasizes that the aggregated information must be complementary across different views to avoid redundancy. Therefore, a multi-view framework with the differentiation mechanism is proposed for inductive KGC, aimed at learning complementary entity representations from various perspectives. Specifically, in addition to aggregating neighboring information to obtain the entity’s local representation through R-GCN, an attention-based differentiation mechanism is employed to aggregate complementary information from semantically related entities and entity-related paths, thus obtaining global and sequential representations of the entities. Finally, these representations are fused and used to score the triples. Experimental results demonstrate that the proposed framework consistently outperforms state-of-the-art approaches in the inductive setting. Moreover, it retains competitive performance in the transductive setting.
Available online: June 11, 2025, DOI: 10.13328/j.cnki.jos.007397
    Abstract:
In model-based diagnosis, the system description is first encoded, all minimal conflict sets are then obtained using a mature SAT solver, and finally the minimal hitting sets of the minimal conflict sets are computed as candidate diagnoses for the device under diagnosis. However, this strategy consumes a significant amount of time, as it is equivalent to solving two NP-hard problems: computing the minimal conflict sets and computing the minimal hitting sets. This study re-encodes the description of the circuit system and proposes a novel variant hitting set algorithm, HSDiag, which computes diagnoses directly from the encoding. Compared with state-of-the-art diagnosis algorithms that first solve conflict sets and then hitting sets, efficiency improves by a factor of 5 to 100. As the number of circuit components increases, the encoding clauses grow linearly, while the number of diagnoses grows exponentially. Since solving all conflict sets of large-scale circuits (ISCAS-85) is impractical, the proposed HSDiag algorithm yields, within the same cutoff time, more than twice as many solutions as conflict-set-based diagnosis algorithms. In addition, this study proposes an equivalence class optimization strategy, which further decomposes the conflict set by using the newly proposed set splitting rule, even if the initial conflict set is inseparable. The efficiency of the HSDiag algorithm optimized by equivalence classes is improved by more than 2 times on the standard Polybox and Fulladder circuits.
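For readers unfamiliar with the classical pipeline that HSDiag bypasses, the brute-force Python sketch below enumerates the minimal hitting sets of a family of conflict sets by increasing cardinality; the two conflict sets follow the textbook Polybox example and the component names are illustrative. It is exponential and only meant to clarify the concept.

from itertools import combinations

def minimal_hitting_sets(conflicts):
    # Enumerate candidate sets by increasing size; a candidate is kept if it
    # intersects every conflict set and contains no smaller hitting set.
    universe = sorted(set().union(*conflicts))
    found = []
    for k in range(1, len(universe) + 1):
        for cand in combinations(universe, k):
            s = set(cand)
            if all(s & c for c in conflicts) and not any(m <= s for m in found):
                found.append(s)
    return found

conflicts = [{"M1", "M2", "A1"}, {"M1", "M3", "A1", "A2"}]
# Candidate diagnoses: {A1}, {M1}, {A2, M2}, {M2, M3}
print(minimal_hitting_sets(conflicts))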
Available online: June 11, 2025, DOI: 10.13328/j.cnki.jos.007396
    Abstract:
Smart contracts are computer programs running on blockchain platforms; they extend the functionality of the blockchain and enable complex applications. However, potential security vulnerabilities in smart contracts can lead to significant financial losses. Security vulnerability detection methods based on symbolic execution offer advantages such as high accuracy and the ability to generate test cases that reproduce vulnerabilities. Nevertheless, as code size increases, symbolic execution faces challenges such as path explosion and excessive constraint-solving overhead. To address these issues, a novel approach for detecting smart contract security vulnerabilities through target-guided symbolic execution is proposed. First, vulnerable statements identified by static analysis tools or manually are treated as targets; the statements that depend on these target statements are analyzed, and the transaction sequence is augmented with symbolic constraints for the relevant variables. Second, the control flow graph (CFG) is constructed from the bytecode of smart contracts, the basic blocks containing the target statements and the dependent statements are located, and the CFG is then pruned to generate guidance information. Third, path exploration in symbolic execution is optimized by reducing the number of basic blocks to be analyzed and the time required for solving path constraints. With this guidance information, vulnerabilities are efficiently detected, and test cases capable of reproducing the vulnerabilities are generated. Based on this approach, a prototype tool named Smart-Target is developed. Experiments conducted on the SB Curated dataset, in comparison with the symbolic execution tool Mythril, demonstrate that Smart-Target reduces time overheads by 60.76% and 92.16% in vulnerability detection and replication scenarios, respectively. In addition, the analysis of target statement dependencies enhances vulnerability detection capability, identifying 22.02% more security vulnerabilities.
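To illustrate the pruning step described above, the Python sketch below keeps only the basic blocks from which a target block is reachable, so that path exploration can ignore blocks that can never lead to the vulnerable statement; the tiny control flow graph and block names are hypothetical and unrelated to real contract bytecode.

from collections import defaultdict, deque

def blocks_reaching(cfg, target):
    # Backward breadth-first search over reversed edges: return every block
    # that can reach the target, including the target itself.
    reverse = defaultdict(set)
    for src, dsts in cfg.items():
        for dst in dsts:
            reverse[dst].add(src)
    reachable, queue = {target}, deque([target])
    while queue:
        node = queue.popleft()
        for pred in reverse[node]:
            if pred not in reachable:
                reachable.add(pred)
                queue.append(pred)
    return reachable

cfg = {"entry": ["check", "exit"], "check": ["vuln", "safe"],
       "vuln": ["exit"], "safe": ["exit"], "exit": []}
print(blocks_reaching(cfg, "vuln"))   # {'entry', 'check', 'vuln'}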
Available online: June 11, 2025, DOI: 10.13328/j.cnki.jos.007407
    Abstract:
    With the increasing adoption of heterogeneous integrated architectures in high-performance computing, it has become essential to harness their potential and explore new strategies for application development. Traditional static compilation methodologies are no longer sufficient to meet the complex computational demands. Therefore, dynamic programming languages, known for their flexibility and efficiency, are gaining prominence. Julia, a modern high-performance language characterized by its JIT compilation mechanism, has demonstrated significant performance in fields such as scientific computing. Targeting the unique features of the Sunway heterogeneous many-core architecture, the ORCJIT engine is introduced, along with an on-chip storage management approach specifically designed for dynamic modes. Based on these advancements, swJulia is developed as a Julia dynamic language compiler tailored for the new generation of the Sunway supercomputer. This compiler not only inherits the flexibility of the Julia compiler but also provides robust support for the SACA many-core programming model and runtime encapsulation. By utilizing the swJulia compilation system, the deployment of the NNQS-Transformer quantum chemistry simulator on the new generation of the Sunway supercomputer is successfully achieved. Comprehensive validation across multiple dimensions demonstrates the efficacy and efficiency of swJulia. Experimental results show exceptional performance in single-threaded benchmark tests and many-core acceleration, significantly improving ultra-large-scale parallel simulations for the NNQS-Transformer quantum chemistry simulator.
Available online: June 11, 2025, DOI: 10.13328/j.cnki.jos.007409
    Abstract:
Temporal logic has been extensively applied in domains such as formal verification and robotics control, yet it remains challenging for non-expert users to master. Therefore, the automated extraction of temporal logic formulas from natural language texts is crucial. However, existing efforts are hindered by issues such as sparse samples and the ambiguity of natural language semantics, which impede the accurate identification of implicit temporal semantics within natural language texts and thus lead to errors when translating the original natural language semantics into temporal logic formulas. To address this issue, a novel method for temporal logic semantic analysis based on a few-shot learning network, termed FSLNets-TLSA, is proposed. The method employs data preprocessing techniques to enhance the temporal semantic features of the text. The network architecture consists of an encoder, an induction module, and a relation module, which together capture the implicit temporal logic semantics in the input text. In addition, an enhancement module is incorporated to improve the accuracy of semantic recognition. The effectiveness of the proposed method is validated through experiments on three public datasets comprising a total of 3533 samples and through a comparison with similar tools. The analysis demonstrates average accuracy, recall, and F1-score of 96.55%, 96.29%, and 96.42%, respectively.
Available online: June 04, 2025, DOI: 10.13328/j.cnki.jos.007410
    Abstract:
In recent years, the increasing complexity of space missions has led to exponential growth in space-generated data. However, limited satellite-to-ground bandwidth and scarce frequency resources pose significant challenges to the traditional bent-pipe architecture, which faces severe transmission bottlenecks. In addition, onboard data must wait for satellites to pass over ground stations before transmission. The large-scale construction of ground stations is not only cost-prohibitive but also carries geopolitical and economic risks. Satellite edge computing has emerged as a promising solution to these bottlenecks by integrating mobile edge computing technology into satellite edges. This approach significantly enhances user experience and reduces redundant network traffic. By enabling onboard data processing, satellite edge computing shortens data acquisition times and reduces reliance on extensive ground station infrastructure. Furthermore, the integration of artificial intelligence (AI) and edge computing technologies offers an efficient and forward-looking path to address existing challenges. This study reviews the latest progress in intelligent satellite edge computing. First, the demands and applications of satellite edge computing in typical scenarios are discussed. Next, key challenges and recent research advances in this field are analyzed. Finally, several open research topics are highlighted, and new ideas are proposed to guide future studies, aiming to provide valuable insights that promote technological innovation and the practical implementation of satellite edge computing.
Available online: June 04, 2025, DOI: 10.13328/j.cnki.jos.007408
    Abstract:
With the rapid development of autonomous driving technology, the issue of vehicle control takeover has become a prominent research topic. A car equipped with an assisted driving system cannot fully handle all driving scenarios; when the actual driving scenario exceeds the operational design domain of the assisted system, human intervention is still required to control the vehicle and ensure the safe completion of the driving task. Takeover performance is an extremely important metric for evaluating a driver’s behavior during the takeover process and includes takeover reaction time and takeover quality. Takeover reaction time refers to the time from the system’s takeover request to the driver’s taking control of the steering wheel; its length not only reflects the driver’s current state but also affects the subsequent handling of complex scenarios. Takeover quality refers to the quality of the driver’s manual operation of the vehicle after regaining control. Based on the CARLA driving simulator, this study constructs 6 typical driving scenarios, simulates the vehicle control takeover process, and collects physiological signals and eye movement data from 31 drivers using a multi-channel acquisition system. Based on the drivers’ takeover performance and with reference to international standards, an objective takeover performance evaluation metric is proposed, incorporating the driver’s takeover reaction time, maximum horizontal and vertical accelerations, and minimum time to collision derived from multiple vehicle data. By combining driver data, vehicle data, and scenario data, a deep neural network (DNN) model predicts takeover performance, while the SHAP model analyzes the impact of each feature, improving the model’s interpretability and transparency. The experimental results show that the proposed DNN model outperforms traditional machine learning methods in predicting takeover performance, achieving an accuracy of 92.2% and demonstrating good generalization. The SHAP analysis reveals the impact of key features such as heart rate variability, driving experience, and minimum safe distance on the prediction results. This research provides a theoretical and empirical foundation for the safety optimization and human-computer interaction design of autonomous driving systems and is of great significance for improving the efficiency and safety of human-vehicle cooperation in autonomous driving.
Available online: June 04, 2025, DOI: 10.13328/j.cnki.jos.007406
    Abstract:
The compiler is one of the performance tuning tools that program developers rely on most. However, due to the limited precision of floating-point encoding, many compiler optimization options can alter the semantics of floating-point calculations, leading to result inconsistency. Locating the program statements that cause compilation optimization-induced result inconsistency is crucial for performance tuning and result reproducibility. The state-of-the-art approach employs precision enhancement-based binary search to locate the code snippets causing result inconsistency, but it suffers from insufficient support for multi-source localization and low search efficiency. This study proposes a floating-point instruction difference-guided Delta-Debugging localization method, FI3D, which utilizes the backtracking mechanism in Delta-Debugging to better support localization of multi-source problem code and exploits the differences in floating-point instruction sequences under different compiler optimization options to guide the localization. FI3D is evaluated using 6 applications from the NPB benchmark, 10 programs from the GNU Scientific Library, and 2 programs from the FloatSmith mixed-precision benchmark. Experimental results demonstrate that FI3D successfully locates the 4 applications where PLiner fails and achieves an average 26.8% performance improvement on the 14 cases successfully located by PLiner.
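As a reminder of how Delta-Debugging narrows down culprit statements, the ddmin-style Python sketch below repeatedly discards chunks of candidate lines while an oracle still reports the inconsistency; the oracle, the line set, and the two culprit lines are hypothetical, and FI3D's instruction-difference guidance and backtracking are not reproduced.

def ddmin(candidates, is_inconsistent):
    # Classic complement-reduction loop: try to drop one chunk at a time;
    # if the remaining lines still reproduce the inconsistency, keep only them.
    n = 2
    while len(candidates) >= 2:
        chunk = max(1, len(candidates) // n)
        subsets = [candidates[i:i + chunk] for i in range(0, len(candidates), chunk)]
        reduced = False
        for subset in subsets:
            complement = [c for c in candidates if c not in subset]
            if complement and is_inconsistent(complement):
                candidates = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(candidates):
                break
            n = min(len(candidates), n * 2)   # refine granularity
    return candidates

# Hypothetical oracle: lines 3 and 7 must both keep the optimized code
# for the result inconsistency to appear.
lines = list(range(10))
print(ddmin(lines, lambda kept: 3 in kept and 7 in kept))   # [3, 7]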
Available online: June 04, 2025, DOI: 10.13328/j.cnki.jos.007380
    Abstract:
    With the widespread adoption and rapid advancement of open-source software, the maintenance of open-source software projects has become a critical phase within the software development cycle. As a globally representative developer community, GitHub hosts numerous software project repositories with similar functionalities within the same domain, creating challenges for users when selecting the appropriate project repository for use or further development. Therefore, accurate identification of project repository maintenance status holds substantial practical value. However, the GitHub platform does not provide direct metrics for assessing the maintenance status of repositories. This study proposes an automatic identification method for project repository maintenance status based on machine learning. A classification model, GitMT, has been developed and implemented to achieve this objective. By effectively integrating dynamic time series features and descriptive features, the proposed model enables accurate identification of “active” and “unmaintained” repository status. Through a series of experiments conducted on large-scale real-world data, an AUC value of 0.964 is achieved in maintenance status identification tasks. In addition, this study constructs an open-source dataset centered on the maintenance status of software project repositories—GitMT Dataset: https://doi.org/10.7910/DVN/OJ2NI3.
Available online: May 22, 2025, DOI: 10.13328/j.cnki.jos.007371
    Abstract:
Entity alignment (EA) aims to identify equivalent entities across different knowledge graphs (KGs). Embedding-based EA methods still have several limitations. First, the heterogeneous structures within KGs are not fully modeled. Second, the utilization of text information is constrained by word embeddings. Third, alignment inference algorithms are underexplored. To address these limitations, we propose a heterogeneous graph attention network for entity alignment (HGAT-EA). HGAT-EA consists of two channels: one for learning structural embeddings and the other for learning character-level semantic embeddings. The first channel employs a heterogeneous graph attention network (HGAT), which fully leverages heterogeneous structures and relation triples to learn entity embeddings. The second channel utilizes character-level literals to learn character-level semantic embeddings. HGAT-EA incorporates multiple views through these channels and maximizes the use of heterogeneous structures through HGAT. HGAT-EA introduces three alignment inference algorithms. Experimental results validate the effectiveness of HGAT-EA. Based on these results, we provide detailed analyses of the various components of HGAT-EA and present the corresponding conclusions.
Available online: May 22, 2025, DOI: 10.13328/j.cnki.jos.007372
    Abstract:
    The reproducibility of scientific research results is a fundamental guarantee for the reliability of scientific research and the cornerstone of scientific and technological advancement. However, the research community is currently facing a serious reproducibility crisis, with many research results published in top journals and conferences being irreproducible. In the field of data science, the reproducibility of research results faces challenges such as heterogeneous research data from multiple sources, complex computational processes, and intricate computational environments. To address these issues, this study proposes ReproLink, a reproducibility-oriented research data management system. ReproLink constructs a unified model of research data, abstracting it into research data objects that consist of three elements: identifier, attribute set, and data entity. Through fine-grained modeling of the reproduction process, ReproLink establishes a precise method for describing multi-step, complex reproduction processes. By integrating code and operating environment modeling, ReproLink eliminates the uncertainties caused by different environments affecting code execution. Performance tests and case studies show that ReproLink performs well with data scales up to one million records, demonstrating practical value in real-world scenarios such as paper reproduction and data provenance tracking. The technical architecture of ReproLink has been integrated into Conow Software, the only integrated comprehensive management and service platform in China specifically designed for scientific research institutes, supporting the reproducibility needs of hundreds of such institutes across the country.
    Available online:  May 22, 2025 , DOI: 10.13328/j.cnki.jos.007362
    Abstract:
    The longest common subsequence (LCS) is a practical metric for assessing code similarity. However, traditional LCS-based methods face challenges in scalability and in effectively capturing critical semantics for identifying code fragments that are textually different but semantically similar, due to their reliance on discrete representation-based token encoding. To address these limitations, this study proposes an LCS-oriented embedding method that encodes code into low-dimensional dense vectors, effectively capturing semantic information. This transformation enables the computationally expensive LCS calculation to be replaced with efficient vector arithmetic, further accelerated using an approximate nearest neighbor algorithm. To support this approach, an embeddable LCS-based distance metric is developed, as the original LCS metric is non-embeddable. Experimental results demonstrate that the proposed metric outperforms tree-based and literal similarity metrics in detecting complex code clones. In addition, two targeted loss functions and corresponding training datasets are designed to prioritize retaining critical semantics in the embedding process, allowing the model to identify textually different but semantically similar code elements. This improves performance in detecting complex code similarities. The proposed method demonstrates strong scalability and high accuracy in detecting complex clones. When applied to similar bug identification, it has reported 23 previously unknown bugs, all of which are confirmed by developers in real-world projects. Notably, several of these bugs are complex and challenging to detect using traditional LCS-based techniques.
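    For context on the baseline being replaced, the classical LCS length between two token sequences is computed with the quadratic dynamic program sketched below in Python; this is only a minimal illustrative sketch of the traditional metric (the token lists are hypothetical), not the embedding method proposed in the paper.
    def lcs_length(a, b):
        # Classical O(len(a) * len(b)) dynamic program for the longest
        # common subsequence length of two token sequences.
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a, 1):
            for j, y in enumerate(b, 1):
                dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
        return dp[len(a)][len(b)]

    # Hypothetical token streams of two code fragments; prints 3 ("int", "=", "0").
    print(lcs_length(["int", "i", "=", "0"], ["int", "j", "=", "0"]))
    It is exactly this quadratic per-pair cost that the proposed embedding replaces with vector arithmetic and approximate nearest neighbor search.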
    Available online:  May 14, 2025 , DOI: 10.13328/j.cnki.jos.007356
    Abstract:
    Since the advent of Bitcoin, blockchain technology has profoundly influenced numerous fields. However, the absence of effective communication mechanisms between heterogeneous and isolated blockchain systems has hindered the advancement and sustainable development of the blockchain ecosystem. In response, cross-chain technology has emerged as a rapidly evolving field and a focal point of research. The decentralized nature of blockchain, coupled with the complexity of cross-chain scenarios, introduces significant security challenges. This study proposes a formal analysis of the IBC (inter-blockchain communications) protocol, one of the most widely adopted cross-chain communication protocols, to assist developers in designing and implementing cross-chain technologies with enhanced security. The IBC protocol is formalized using TLA+, a temporal logic specification language, and its critical properties are verified through the model-checking tool TLC. An in-depth analysis of the verification results reveals several issues impacting the correctness of packet transmission and token transfer. Corresponding recommendations are proposed to mitigate these security risks. The findings have been reported to the IBC developer community, with most of them receiving acknowledgment.
    Available online:  May 14, 2025 , DOI: 10.13328/j.cnki.jos.007293
    Abstract:
    Text-image person re-identification aims to retrieve target persons from an image database given a text description. The main challenge of this technology is to embed image and text features into a common latent space to achieve cross-modal alignment. Many existing studies adopt separate pre-trained unimodal models to extract visual and text features, and then employ segmentation or attention mechanisms to obtain explicit cross-modal alignment. However, these explicit alignment methods generally lack the underlying alignment ability needed to effectively match multimodal features, and relying on preset cross-modal correspondences to achieve explicit alignment may distort modal information. An implicit multi-scale alignment and interaction method for text-image person re-identification is therefore proposed. Firstly, a semantically consistent feature pyramid network is employed to extract multi-scale image features, and attention weights are adopted to fuse features at different scales, covering both global and local information. Secondly, the association between image and text is learned using a multivariate interaction attention mechanism, which can effectively capture the correspondence between different visual features and text information, narrow the gap between modalities, and achieve implicit multi-scale semantic alignment. Additionally, a foreground enhancement discriminator is adopted to highlight the target person and extract purer person features, which helps alleviate the information inequality between images and texts. Experimental results on three mainstream text-image person re-identification datasets, CUHK-PEDES, ICFG-PEDES, and RSTPReid, show that the proposed method effectively improves cross-modal retrieval performance, with Rank-1 accuracy 2%–9% higher than that of SOTA algorithms.
    Available online:  May 14, 2025 , DOI: 10.13328/j.cnki.jos.007376
    Abstract:
    Software vulnerabilities are code segments in software that are prone to exploitation. Ensuring that software is not easily attacked is a crucial security requirement in software development. Software vulnerability prediction involves analyzing and predicting potential vulnerabilities in software code. Deep learning-driven software vulnerability prediction has become a popular research field in recent years, with a long time span, numerous studies, and substantial research achievements. To review relevant research findings and summarize the research hotspots, a survey of 151 studies related to deep learning-driven software vulnerability prediction published between 2017 and 2024 is conducted. It summarizes the research problems, progress, and challenges discussed in the literature, providing a reference for future research.
    Available online:  May 14, 2025 , DOI: 10.13328/j.cnki.jos.007377
    Abstract:
    A timer is used to schedule and execute delayed tasks in an operating system. It operates asynchronously in an atomic context and can execute concurrently with different threads at any time. If developers fail to account for all possible scenarios of multithread interleaving, various types of concurrency bugs may be introduced, posing a serious threat to the security of the operating system. Timer concurrency bugs are more difficult to detect than typical concurrency bugs because they involve not only multithread interleaving but also the delayed and repeated scheduling of timer handlers. Currently, there are no tools that can effectively detect such bugs. In this study, three types of timer concurrency bugs are summarized: sleeping timer bugs, timer deadlock bugs, and zombie timer bugs. To enhance detection efficiency, firstly, all timer-related code is extracted through pointer analysis, reducing unnecessary analysis overhead. A context-sensitive, path-sensitive, and flow-sensitive interprocedural control flow graph is then constructed to provide a foundation for subsequent analysis. Based on static analysis techniques, including call graph traversal, lockset analysis, points-to analysis, and control flow analysis, three detection algorithms are designed to identify the different types of timer concurrency bugs. To evaluate the effectiveness of the proposed algorithms, they are applied to the Linux 5.15 kernel, where 328 real-world timer concurrency bugs are detected. A total of 56 patches are submitted to the Linux kernel community, with 49 patches merged into the mainline kernel, 295 bugs confirmed and fixed, and 14 CVE identifiers assigned. These results demonstrate the effectiveness of the proposed method. Finally, a systematic analysis of performance, false positives, and false negatives is conducted through comparative experiments, and methods for repairing the three types of bugs are summarized.
    Available online:  May 14, 2025 , DOI: 10.13328/j.cnki.jos.007378
    Abstract:
    With the rapid development of embedded technology, mobile computing, and the Internet of Things (IoT), an increasing number of sensing devices have been integrated into people’s daily lives, including smartphones, cameras, smart bracelets, smart routers, and headsets. The sensors embedded in these devices facilitate the collection of personal information such as location, activities, vital signs, and social interactions, thus fostering a new class of applications known as human-centric sensing. Compared with traditional sensing methods, including wearable-based, vision-based, and wireless signal-based sensing, millimeter wave (mmWave) signals offer numerous advantages, such as high accuracy, non-line-of-sight capability, passive sensing (without requiring users to carry sensors), high spatiotemporal resolution, easy deployment, and robust environmental adaptability. The advantages of mmWave-based sensing have made it a research focus in both academia and industry in recent years, enabling non-contact, fine-grained perception of human activities and physical signs. Based on an overview of recent studies, the background and research significance of mmWave-based human sensing are examined. The existing methods are categorized into four main areas: tracking and positioning, motion recognition, biometric measurement, and human imaging. Commonly used publicly available datasets are also introduced. Finally, potential research challenges and future directions are discussed, highlighting promising developments toward achieving accurate, ubiquitous, and stable human perception.
    Available online:  May 14, 2025 , DOI: 10.13328/j.cnki.jos.007379
    Abstract:
    In recent years, impressive capabilities have been demonstrated by deep learning-based vulnerability detection models in detecting vulnerabilities. Previous research has widely explored adversarial attacks using variable renaming to introduce disturbances in source code and evade detection. However, the effectiveness of introducing multiple disturbances through various transformation techniques in source code has not been adequately investigated. In this study, multiple synonymous transformation operators are applied to introduce disturbances in source code. A combination optimization strategy based on genetic algorithms is proposed, enabling the selection of source code transformation operators with the highest fitness to guide the generation of adversarial code segments capable of evading vulnerability detection. The proposed method is implemented in a framework named non-vulnerability generator (NonVulGen) and evaluated against deep learning-based vulnerability detection models. When applied to recently developed deep learning models, an average attack success rate of 91.38% is achieved against the CodeBERT-based model and 93.65% against the GraphCodeBERT-based model, representing improvements of 28.94% and 15.52% over state-of-the-art baselines, respectively. To assess the generalization ability of the proposed attack method, common models including Devign, ReGVD, and LineVul are targeted, achieving average success rates of 98.88%, 97.85%, and 92.57%, respectively. Experimental results indicate that adversarial code segments generated by NonVulGen cannot be effectively distinguished by deep learning-based vulnerability detection models. Furthermore, significant reductions in attack success rates are observed after retraining the models with adversarial samples generated based on the training data, with a decrease of 96.83% for CodeBERT, 97.12% for GraphCodeBERT, 98.79% for Devign, 98.57% for ReGVD, and 97.94% for LineVul. These findings reveal the critical challenge of adversarial attacks in deep learning-based vulnerability detection models and highlight the necessity for model reinforcement before deployment.
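    For readers unfamiliar with the search strategy named above, the sketch below shows a generic genetic-algorithm loop of the kind described (fitness-weighted selection, crossover, and mutation over candidate operator combinations); the function names, parameters, and the assumption of a non-negative fitness are hypothetical placeholders and do not reproduce the NonVulGen implementation.
    import random

    def genetic_search(init_pop, fitness, crossover, mutate, generations=50, elite=2):
        # Generic GA skeleton: keep the fittest individuals (elitism), then refill
        # the population with crossover + mutation of fitness-weighted parents.
        # fitness(x) is assumed to return a non-negative score (not all zero).
        pop = list(init_pop)
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            next_pop = pop[:elite]
            while len(next_pop) < len(pop):
                p1, p2 = random.choices(pop, weights=[fitness(x) for x in pop], k=2)
                next_pop.append(mutate(crossover(p1, p2)))
            pop = next_pop
        return max(pop, key=fitness)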
    Available online:  May 07, 2025 , DOI: 10.13328/j.cnki.jos.007309
    Abstract:
    With the continuous development of information technology, the quantity and variety of software products are increasing, but even high-quality software may contain vulnerabilities. In addition, the software update speed is fast, and the software architecture is increasingly complex, which leads to the gradual evolution of vulnerabilities into new forms. Consequently, traditional vulnerability detection methods and rules are difficult to apply to new vulnerability features. Due to the scarcity of zero-day vulnerability samples, zero-day vulnerabilities that appear in the software evolution process are difficult to find, which brings great potential risks to software security. This study proposes a vulnerability sample generation method based on abstract syntax tree mutation, which can simulate the structure and syntax rules of real vulnerabilities, generate vulnerability samples more in line with the actual situation, and provide a more effective solution for software security and reliability. This method analyzes the abstract syntax tree structure generated by Eclipse CDT, extracts the syntactic information in the nodes, reconstructs the nodes and abstract syntax trees, optimizes the abstract syntax tree structure, and designs a series of mutation operators. Subsequently, it performs mutation operations on the optimized abstract syntax trees. The method proposed in this paper can generate mutation samples with the characteristics of UAF and CUAF vulnerabilities, which can be used for the detection of zero-day vulnerabilities and help to improve their detection rate. Experimental results show that this method reduces the number of invalid samples by 34% on average compared with the random mutation approach used in traditional detection methods, and generates more complex mutated samples, enhancing the coverage and accuracy of detection.
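    The paper mutates Eclipse CDT abstract syntax trees for C/C++ code; purely as an illustration of what a node-level mutation operator looks like, the hedged sketch below applies an analogous (and entirely hypothetical) operator with Python's built-in ast module, which is unrelated to the UAF/CUAF operators designed in the paper.
    import ast  # ast.unparse requires Python 3.9+

    class SwapAddToSub(ast.NodeTransformer):
        # Toy mutation operator: rewrite every '+' in the tree into '-'.
        def visit_BinOp(self, node):
            self.generic_visit(node)
            if isinstance(node.op, ast.Add):
                node.op = ast.Sub()
            return node

    tree = ast.parse("total = price + tax")
    mutated = ast.fix_missing_locations(SwapAddToSub().visit(tree))
    print(ast.unparse(mutated))  # total = price - tax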
    Available online:  April 30, 2025 , DOI: 10.13328/j.cnki.jos.007311
    Abstract:
    To address the issue of untrustworthy behaviors resulting from malicious attackers exploiting security vulnerabilities within smart contracts in the consortium blockchain system, this study introduces a trusted verification mechanism of smart contract behavior for consortium blockchain to conduct trusted verification for contract behavior integrity. Firstly, the proposed approach takes the system call as the smallest behavior unit and describes the historical behavioral state with the behavior sequence based on system calls. Subsequently, on the premise of ensuring the trustworthiness of contract code release and the execution environment, it performs trusted verification according to predefined behavioral rules during contract execution. Finally, a theoretical analysis of this mechanism is carried out, and an experimental evaluation is conducted in the Hyperledger Fabric environment. Results demonstrate that the proposed method can effectively achieve the trusted verification of smart contract behavior and ensure the credibility of behavior within the life cycle of smart contracts.
    Available online:  April 30, 2025 , DOI: 10.13328/j.cnki.jos.007373
    Abstract:
    Chinese idioms, as an essential part of Chinese writing, possess concise expressiveness and profound cultural significance. They are typically phrases or short sentences that have become fixed through long-term use, with diverse origins and relatively stable meanings. However, due to the pictographic nature of Chinese characters and the historical evolution of Chinese vocabulary and semantics, there is often a discrepancy between the literal and actual meanings of idioms, exhibiting a unique non-compositional characteristic. This feature makes idioms prone to misuse in practice, with research showing that certain idioms are misused at a rate as high as 98.6%. Unlike in other languages, the misuse of Chinese idioms does not typically result in lexical or grammatical errors, which makes traditional spelling and grammar error detection methods ineffective at identifying idiom misuse. An intuitive approach is to incorporate the interpretations of idioms into the model, but simply combining these interpretations can lead to problems such as excessively long sentences that are hard to process and noise in knowledge. To address this, this study proposes a novel model that uses levitating knowledge injection to incorporate idiom interpretations. This model introduces learnable weight factors to control the injection process and explores effective strategies for knowledge infusion. To validate the model’s effectiveness, a dataset specifically for diagnosing the misuse of Chinese idioms is created. Experimental results show that the model achieves optimal performance across all test sets, particularly in complex scenarios involving long texts and multiple idioms, where its performance improves by 12.4%–13.9% compared to the baseline model. At the same time, training speed increases by 30%–40%, and testing speed is improved by 90%. These results demonstrate that the proposed model not only effectively integrates the interpretative features of idioms but also significantly reduces the negative impact of interpretation concatenation on the model’s processing capacity and efficiency, thus enhancing the performance of Chinese idiom misuse diagnosis and strengthening the model’s ability to handle complex scenarios with multiple idioms and lengthy interpretations.
    Available online:  April 25, 2025 , DOI: 10.13328/j.cnki.jos.007370
    Abstract:
    Knowledge graphs (KGs), as structured representations of knowledge, have a wide range of applications in the medical field. Entity alignment, which involves identifying equivalent entities across different KGs, is a fundamental step in constructing large-scale KGs. Although extensive research has focused on this issue, most of it has concentrated on aligning pairs of KGs, typically by capturing the semantic and structural information of entities to generate embeddings, followed by calculating embedding similarity to identify equivalent entities. This study identifies the problem of alignment error propagation when aligning multiple KGs. Given the high accuracy requirements for entity alignment in medical contexts, we propose a multi-source Chinese medical knowledge graph entity alignment method (MSOI-Align) that integrates entity semantics and ontology information. Our method pairs multiple KGs and uses representation learning to generate entity embeddings. It also incorporates both the similarity of entity names and ontology consistency constraints, leveraging a large language model to filter a set of candidate entities. Subsequently, based on triadic closure theory and the large language model, MSOI-Align automatically identifies and corrects the propagation of alignment errors for the candidate entities. Experimental results on four Chinese medical knowledge graphs show that MSOI-Align significantly enhances the precision of the entity alignment task, with the Hits@1 metric increasing from 0.42 to 0.92 compared to the state-of-the-art baseline. The fused knowledge graph, CMKG, contains 13 types of ontologies, 190,000 entities, and approximately 700,000 triplets. Due to copyright restrictions on one of the KGs, we are releasing the fusion of the other three KGs, named OpenCMKG.
    Available online:  April 23, 2025 , DOI: 10.13328/j.cnki.jos.007381
    Abstract:
    The prediction of future water quality, which involves leveraging historical water quality data from various observation nodes and their corresponding topological relationships, is recognized as a critical application of graph neural networks in environmental protection. This task is complicated by the presence of noise within both the collected numerical data and the inter-node topological structures, compounded by a coupling phenomenon. The varying directions of pollutant flow intensify the complexity of coupling between numerical and structural noise. To address these challenges, a novel tendency-aware graph neural network is proposed for water quality prediction with coupled noise. First, historical water quality trend features are used to uncover local interdependencies among raw water quality indicators, enabling the construction of multiple potential hydrological topological structures and the disentanglement of structural noise. Second, spatio-temporal features are extracted from the constructed adjacency matrices and original data to separate numerical noise. Finally, water quality predictions are obtained by aggregating coherent node representations derived from the inferred latent structures across pre- and post-structure construction phases. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on real-world datasets and generates potential hydrological topological structures that closely align with actual observations. The code and data are publicly available on GitHub: https://github.com/aTongs1/TaGNN.
    Available online:  April 23, 2025 , DOI: 10.13328/j.cnki.jos.007374
    Abstract:
    Attributed graphs are increasingly used to represent data with relational structures, and detecting anomalies in them is gaining attention. Due to their characteristics, such as rich attribute information and complex structural relationships, various types of anomalies may exist, including global, structural, and community anomalies, which often remain hidden within the graph’s deep structure. Existing methods face challenges such as loss of structural information and difficulty in identifying abnormal nodes. Structural information theory leverages encoding trees to represent hierarchical relationships within data and establishes correlations across different levels by minimizing structural entropy, effectively capturing the graph’s essential structure. This study proposes an anomaly detection method for attributed graphs based on structural entropy. First, by integrating the structural and attribute information of attributed graphs, a K-dimensional encoding tree representing the hierarchical community structure is constructed through structural entropy minimization. Next, using the node attributes and hierarchical community information within the encoding tree, scoring mechanisms for detecting structural and attribute anomalies based on Euclidean distance and connection strength between nodes are designed. This approach identifies abnormal nodes and detects various types of anomalies. The proposed method is evaluated through comparative tests on several attributed graph datasets. Experimental results demonstrate that the proposed method effectively detects different types of anomalies and significantly outperforms existing state-of-the-art methods.
    Available online:  April 23, 2025 , DOI: 10.13328/j.cnki.jos.007375
    Abstract:
    Software vulnerabilities pose significant threats to real-world systems. In recent years, learning-based vulnerability detection methods, especially deep learning-based approaches, have gained widespread attention due to their ability to extract implicit vulnerability features from large-scale vulnerability samples. However, due to differences in features among different types of vulnerabilities and the problem of imbalanced data distribution, existing deep learning-based vulnerability detection methods struggle to accurately identify specific vulnerability types. To address this issue, this study proposes MulVD, a deep learning-based multi-class vulnerability detection method. MulVD constructs a structure-aware graph neural network (SA-GNN) that can adaptively extract local and representative vulnerability patterns while rebalancing the data distribution without introducing noise. The effectiveness of the proposed approach in both binary and multi-class vulnerability detection tasks is evaluated. Experimental results demonstrate that MulVD significantly improves the performance of existing deep learning-based vulnerability detection techniques.
    Available online:  April 23, 2025 , DOI: 10.13328/j.cnki.jos.007369
    Abstract:
    With the widespread adoption of programming naming conventions and the increasing emphasis on self-explanatory code, traditional summarizing code comments, which often merely restate the literal meaning of the code, are losing appeal among developers. Instead, developers value supplementary code comments that provide additional information beyond the code itself to facilitate program understanding and maintenance. However, generating such comments typically requires external information resources beyond the code base, and the diversity of supplementary information presents significant challenges to existing methods. This study leverages Issue reports as a crucial external information source and proposes an Issue-based retrieval augmentation method using large language models (LLMs) to generate supplementary code comments. The proposed method classifies the supplementary information found in Issue reports into five categories, retrieves Issue sentences containing this information, and generates corresponding comments using LLMs. In addition, the code relevance and Issue verifiability of the generated comments are evaluated to minimize hallucinations. Experiments conducted on two popular LLMs, ChatGPT and GPT-4o, demonstrate the effectiveness of the proposed method. Compared to existing approaches, the proposed method significantly improves the coverage of manual supplementary comments from 33.6% to 72.2% for ChatGPT and from 35.8% to 88.4% for GPT-4o. Moreover, the generated comments offer developers valuable supplementary information, which proves essential for understanding tricky code.
    Available online:  April 18, 2025 , DOI: 10.13328/j.cnki.jos.007383
    Abstract:
    Stochastic optimization algorithms are recognized as essential for addressing large-scale data and complex models in machine learning. Among these, variance reduction methods, such as the STORM algorithm, have gained attention for their ability to achieve optimal convergence rates of $ {\mathrm{O}}\left({T}^{-1/3}\right) $. However, traditional variance reduction methods typically depend on specific problem parameters (e.g., the smoothness constant, noise variance, and gradient upper bound) for setting the learning rate and momentum, limiting their practical applicability. To overcome this limitation, this study proposes an adaptive variance reduction method based on a normalization technique, which eliminates the need for prior knowledge of problem parameters while maintaining optimal convergence rates. Compared to existing adaptive variance reduction methods, the proposed approach offers several advantages: (1) no reliance on additional assumptions, such as bounded gradients, bounded function values, or excessively large initial batch sizes; (2) the achievement of the optimal convergence rate of $ {\mathrm{O}}\left({T}^{-1/3}\right) $ without an extra $ {\mathrm{O}}\left(\mathrm{log}T\right) $ factor; (3) a concise and straightforward proof, facilitating extensions to other stochastic optimization problems. The superiority of the proposed method is further validated through numerical experiments, demonstrating enhanced performance when compared to other approaches.
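    For context, the STORM-style recursive momentum estimator that such variance-reduction methods build on is commonly written as follows, with the last step normalized as suggested by the normalization technique mentioned above; this display is a hedged sketch of the standard recursion, not the paper's exact update rule.
    $$ d_t = \nabla f(x_t;\xi_t) + (1-a_t)\left(d_{t-1} - \nabla f(x_{t-1};\xi_t)\right), \qquad x_{t+1} = x_t - \eta_t \frac{d_t}{\left\lVert d_t \right\rVert} $$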
    Available online:  March 26, 2025 , DOI: 10.13328/j.cnki.jos.007318
    Abstract:
    Blockchain has shown strong vitality in the field of cryptocurrency investment, attracting the participation of a large number of investors. However, the anonymity of blockchain gives rise to considerable fraud, among which the Ponzi scheme smart contract is a typical fraudulent investment activity, causing huge economic losses for investors. Therefore, the detection of Ponzi scheme contracts on Ethereum becomes particularly important. Nevertheless, most existing studies have ignored control flow information in the source code of Ponzi scheme contracts. To extract more semantic and structural information from Ponzi scheme contracts, this study proposes a Ponzi scheme contract detection model based on the code control flow graph. First, the model represents the obtained contract source code as a control flow graph. Then, key features, including data flow information and code structure information, are extracted by the Word2Vec algorithm. Because the functions of smart contracts differ and their code length varies significantly, the extracted feature vectors differ considerably; therefore, the feature vectors generated by different smart contracts are aligned so that all of them have the same dimension, which is convenient for subsequent processing. Next, a feature learning module based on graph convolution and Transformer introduces a multi-head attention mechanism to learn the dependencies among node features. Finally, a multilayer perceptron is used to identify Ponzi scheme contracts. By comparing the proposed model with traditional graph feature learning models on the dataset provided by the Xblock website, the benefit of the introduced multi-head attention mechanism is verified. Experimental results demonstrate that this model effectively improves the ability to detect Ponzi scheme contracts.
    Available online:  March 26, 2025 , DOI: 10.13328/j.cnki.jos.007320
    Abstract:
    The application of artificial intelligence technology has extended from relatively static tasks such as classification, translation, and question answering to relatively dynamic tasks, such as autonomous driving, robotic control, and games, that are completed through a series of “interaction-action” exchanges with the environment. The core of the model executing such tasks is the sequential decision-making (SDM) algorithm. Because these tasks face higher uncertainties in the environment and the interaction, and are often safety-critical systems, their testing techniques are confronted with great challenges. Existing testing technologies for intelligent algorithm models mainly focus on the reliability of a single model, the generation of diverse test scenarios for complex tasks, simulation testing, etc., while no attention is paid to the “interaction-action” decision sequences of the SDM model, leading to poor adaptability or low cost-effectiveness. In this study, a fuzz testing method named IIFuzzing is proposed for intervening in the execution of inert “interaction-action” decision sequences. Within the fuzz testing framework, by learning the “interaction-action” decision sequence pattern, inert “interaction-action” decision sequences that will not trigger failure accidents are predicted, and the testing execution of such sequences is terminated to improve testing efficiency. Experimental evaluations are conducted in four common test configurations, and the results show that, compared with the latest fuzz testing for SDM models, IIFuzzing can detect 16.7%–54.5% more failure accidents within the same time, and the diversity of accidents is also better than that of the baseline approach.
    Available online:  March 12, 2025 , DOI: 10.13328/j.cnki.jos.007310
    Abstract:
    With the continuous deepening of research on the security and privacy of deep learning models, researchers have found that model stealing attacks pose a tremendous threat to neural networks. A typical data-dependent model stealing attack uses a certain percentage of real data to query the target model and trains a substitute model locally to steal the target model. Since 2020, a novel data-free model stealing attack has been proposed, which can steal and attack deep neural networks simply by using fake query examples generated by generative models. Since it does not rely on real data, the data-free model stealing attack can cause more serious damage. However, the query examples constructed by current data-free model stealing attack methods lack diversity and effectiveness, and the stealing process suffers from a large number of queries and a relatively low attack success rate. Therefore, this study proposes a vision feature decoupling-based model stealing attack (VFDA), which decouples and generates the visual features of query examples in the data-free model stealing process by using a multi-decoder structure, thus improving the diversity of query examples and the effectiveness of model stealing. Specifically, VFDA uses three decoders to generate the texture information, region encoding, and smoothing information of query examples respectively, completing the decoupling of their visual features. Furthermore, to make the generated query examples more consistent with the visual features of real examples, the sparsity of the texture information is limited and the generated smoothing information is filtered. VFDA exploits the property that the representational tendency of neural networks depends on image texture features and can generate query examples with inter-class diversity, thus effectively improving the similarity of the stolen model and the success rate of the attack. In addition, VFDA adds an intra-class diversity loss to the smoothing information generated through decoupling to make the query examples more consistent with the real example distribution. Comparisons with multiple model stealing attack methods show that the proposed VFDA achieves better performance in both stolen-model similarity and attack success rate. In particular, on the high-resolution GTSRB and Tiny-ImageNet datasets, the attack success rate is improved by 3.86% and 4.15% on average, respectively, compared with the currently stronger EBFA method.
    Available online:  February 26, 2025 , DOI: 10.13328/j.cnki.jos.007306
    Abstract:
    Twin support vector machine (TSVM) can effectively tackle data such as cross or XOR data. However, when set-valued data are handled, TSVM usually makes use of statistical information of set-valued objects such as the mean and the median. Unlike TSVM, this study proposes the twin support function machine (TSFM), which can directly deal with set-valued data. In terms of support functions defined for set-valued objects, TSFM obtains nonparallel hyperplanes in a Banach space. To suppress outliers in set-valued data, TSFM adopts the pinball loss function and introduces the weights of set-valued objects. Considering that TSFM involves optimization problems in an infinite-dimensional space, the measure is represented as a linear combination of Dirac measures, and thus an optimization model in a finite-dimensional space is constructed. To solve the optimization model effectively, this study employs a sampling strategy to transform the model into quadratic programming (QP) problems. The dual formulations of the QP problems are derived, which provides theoretical foundations for determining which sampling points are support vectors. To classify set-valued data, the distance from a set-valued object to the hyperplane in a Banach space is defined, and the decision rule is derived therefrom. This study also considers the kernelization of support functions to capture the nonlinear features of data, which makes the proposed model available for indefinite kernels. Experimental results demonstrate that TSFM can capture the intrinsic structure of cross-plane set-valued data and obtain good classification performance in the case of outliers or set-valued objects containing a few high-dimensional examples.
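    For reference, the pinball (quantile) loss adopted to suppress outliers is standardly defined as below, where $ \tau \in (0,1) $ is the quantile parameter and $ u $ denotes the residual; this is the textbook definition rather than notation specific to TSFM.
    $$ L_{\tau}(u) = \begin{cases} \tau u, & u \geq 0, \\ (\tau - 1)u, & u < 0. \end{cases} $$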
    Available online:  February 26, 2025 , DOI: 10.13328/j.cnki.jos.007299
    Abstract:
    Large language models (LLMs) such as ChatGPT have found widespread applications across various fields due to their strong natural language understanding and generation capabilities. However, deep learning models exhibit vulnerability when subjected to adversarial example attacks. In natural language processing, current research on adversarial example generation methods typically employs CNN-based models, RNN-based models, and Transformer-based pre-trained models as target models, with few studies exploring the robustness of LLMs under adversarial attacks and quantifying the evaluation criteria of LLM robustness. Taking ChatGPT against Chinese adversarial attacks as an example, this study introduces a novel concept termed offset average difference (OAD) and proposes a quantifiable LLM robustness evaluation metric based on OAD, named OAD-based robustness score (ORS). In a black-box attack scenario, this study selects nine mainstream Chinese adversarial attack methods based on word importance to generate adversarial texts, which are then employed to attack ChatGPT and yield the attack success rate of each method. The proposed ORS assigns a robustness score to LLMs for each attack method based on the attack success rate. In addition to ChatGPT, which outputs hard labels, this study designs ORS for target models with soft-labeled outputs based on the attack success rate and the proportion of misclassified adversarial texts with high confidence. Meanwhile, this study extends the scoring formula to the fluency assessment of adversarial texts, proposing an OAD-based adversarial text fluency scoring method, named OAD-based fluency score (OFS). Compared to traditional methods requiring human involvement, the proposed OFS greatly reduces evaluation costs. Experiments conducted on real-world Chinese news and sentiment classification datasets provide initial evidence that, for text classification tasks, the robustness score of ChatGPT against adversarial attacks is nearly 20% higher than that of Chinese BERT. However, the powerful ChatGPT still produces erroneous predictions under adversarial attacks, with the highest attack success rate exceeding 40%.
    Available online:  February 26, 2025 , DOI: 10.13328/j.cnki.jos.007302
    Abstract:
    This study discusses the computational complexity of the partition function of the symmetric dual-spin system on regular graphs. Based on the #exponential time hypothesis (#ETH) and the randomized exponential time hypothesis (rETH), this study strengthens the classical dichotomies for this problem class into exponential dichotomies, also known as fine-grained dichotomies. In other words, it proves that when the given tractability conditions are satisfied, the problem is solvable in polynomial time; otherwise, no sub-exponential time algorithm exists as long as #ETH holds. This study also proposes two solutions to the ineffectiveness of existing interpolation methods in building sqrt-sub-exponential time reductions under the planar graph restriction, and utilizes these two solutions to discuss the related fine-grained complexity and dichotomy of this problem under the planar graph restriction.
    Available online:  February 26, 2025 , DOI: 10.13328/j.cnki.jos.007321
    Abstract:
    Visual-language pre-training (VLP) aims to obtain a powerful multimodal representation by learning on a large-scale image-text multimodal dataset. Multimodal feature fusion and alignment is a key challenge in multimodal model training. In most existing visual-language pre-training models, the extracted visual and text features are directly fed into a Transformer model for fusion and alignment. Since the attention mechanism in the Transformer computes pairwise similarity, it is difficult to achieve alignment among multiple entities. The hyperedges of hypergraph neural networks, however, can connect multiple entities and encode high-order entity correlations, thus enabling relationships among multiple entities to be established. In this study, a visual-language multimodal model pre-training method based on multi-entity alignment with hypergraph neural networks is proposed. In this method, a hypergraph neural network learning module is introduced into the Transformer multimodal fusion encoder to learn the alignment relationships of multimodal entities, thereby enhancing the entity alignment ability of the multimodal fusion encoder in the pre-training model. The proposed visual-language pre-training model is pre-trained on large-scale image-text datasets and fine-tuned on multiple visual-language downstream tasks such as visual question answering, image-text retrieval, visual grounding, and natural language visual reasoning. The experimental results indicate that compared with the baseline method, the proposed method achieves performance improvements on multiple downstream tasks, including an accuracy improvement of 1.8% on the NLVR2 task.
    Available online:  February 26, 2025 , DOI: 10.13328/j.cnki.jos.007322
    Abstract:
    Online information comes from numerous and heterogeneous sources, and judging in a timely and accurate manner whether a piece of information is a rumor is a crucial issue in research on the cognitive domain of social media. Most previous studies have mainly concentrated on the text content of rumors, user characteristics, or the inherent features confined to the propagation mode, ignoring the key clues of the collective emotions generated by users’ participation in event discussions and the emotional steady-state characteristics hidden in the spread of rumors. In this study, a social network rumor detection method that is oriented by collective emotional stabilization and integrates temporal and spatial steady-state features is proposed. Based on the text features and user behaviors in rumor propagation, the temporal and spatial relationship steady-state features of collective emotions are combined for the first time, which achieves strong expressiveness and detection accuracy. Specifically, this method takes the emotional keywords of users’ attitudes towards a certain event or topic as the basis and uses recurrent neural networks to construct emotional steady-state features of the temporal relationship, enabling the collective emotions to have temporally consistent features with strong expressiveness, which can reflect the convergence effect of the collective emotions over time. A heterogeneous graph neural network is utilized to establish the connections between users and keywords, as well as between texts and keywords, so that the collective emotions possess fine-grained collective emotional steady-state features of the spatial relationship. Finally, the two types of local steady-state features are fused to obtain global coverage and improve feature expression, and further classification yields the rumor detection results. The proposed method is evaluated on two widely used public Twitter datasets. Compared with the best-performing method among the baselines, the accuracy is improved by 3.4% and 3.2% respectively; the T-F1 value is improved by 3.0% and 1.8% respectively; the N-F1 value is improved by 2.7% and 2.3% respectively; the U-F1 value is improved by 2.3% and 1.0% respectively.
    Available online:  February 19, 2025 , DOI: 10.13328/j.cnki.jos.007296
    Abstract:
    Offline reinforcement learning has yielded significant results in tasks with continuous and dense rewards. However, since the training process does not interact with the environment, generalization ability is reduced, and performance is difficult to guarantee in environments with discrete and sparse rewards. By adding noise, the diffusion model combines information from the neighborhood of the sample data to generate actions close to the sample distribution, which strengthens the learning and generalization ability of the agent. To this end, offline reinforcement learning with diffusion models and expectation maximization (DMEM) is proposed. The method updates the objective function by maximizing the expectation of the log-likelihood to make the policy more generalizable. Additionally, the diffusion model is introduced into the policy network to exploit its diffusion characteristics and enhance the policy’s ability to learn from data samples. Meanwhile, expectile regression is employed to update the value function from the perspective of high-dimensional space, and a penalty term is introduced to make the evaluation of the value function more accurate. DMEM is applied to a series of tasks with discrete and sparse rewards, and experiments show that DMEM has a large performance advantage over other classical offline reinforcement learning methods.
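    The expectile regression mentioned for the value-function update typically minimizes the asymmetric squared loss below over residuals $ u $, where $ \tau \in (0,1) $ controls how strongly positive residuals are weighted; this standard form is given only as a hedged reference point, not as the exact objective used in DMEM.
    $$ L_{2}^{\tau}(u) = \left|\tau - \mathbb{1}(u < 0)\right| u^{2} $$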
    Available online:  February 19, 2025 , DOI: 10.13328/j.cnki.jos.007297
    Abstract:
    In recent years, as an algorithm for identifying bug-introducing changes, SZZ has been widely employed in just-in-time software defect prediction. Previous studies show that the SZZ algorithm may mislabel data during data annotation, which could influence the dataset quality and consequently the performance of the defect prediction model. Therefore, researchers have made improvements to the SZZ algorithm and proposed multiple variants of SZZ. However, there is no empirical study exploring the effect of SZZ data annotation quality on the performance and interpretability of just-in-time defect prediction for mobile APP. To investigate the influence of mislabeled changes by SZZ on just-in-time defect prediction for mobile APP, this study conducts an extensive and in-depth empirical comparison of four SZZ algorithms. Firstly, 17 large-scale mobile APP projects are selected from the GitHub repository, and software metrics are extracted by adopting the PyDriller tool. Then, B-SZZ (original SZZ), AG-SZZ, MA-SZZ, and RA-SZZ are employed for data annotation. Next, the just-in-time defect prediction models are built with random forest, naive Bayes, and logistic regression classifiers based on time-series data partitioning. Finally, the performance of the models is evaluated by the traditional measures of AUC, MCC, and G-mean and the effort-aware measures of F-measure@20% and IFA, and a statistical significance test and interpretability analysis are conducted on the results by employing SKESD and SHAP respectively. A comparison of the annotation performance of the four SZZ algorithms yields the following results. (1) The data annotation quality conforms to the progressive relationship among SZZ variants. (2) The mislabeled changes by B-SZZ, AG-SZZ, and MA-SZZ can cause performance reductions of different degrees in AUC and MCC, but do not reduce G-mean. (3) B-SZZ is likely to cause a performance reduction of F-measure@20%, while B-SZZ, AG-SZZ, and MA-SZZ are unlikely to increase effort during code inspection. (4) In terms of model interpretation, different SZZ algorithms influence the three metrics with the largest contribution during the prediction, and the la metric has a significant influence on the prediction results.
    Available online:  January 24, 2025 , DOI: 10.13328/j.cnki.jos.007300
    Abstract:
    Existing adversarial example detection methods based on image transformation exploit the fact that image transformation significantly changes the feature distribution of adversarial examples but only slightly changes that of benign examples. Adversarial examples can thus be detected by calculating the feature distance before and after image transformation. However, with the deepening research on adversarial attacks, researchers pay more attention to enhancing the robustness of adversarial examples, so that some attacks are “immune” to the effect of image transformation, and existing methods struggle to detect such robust adversarial examples effectively. This study observes that such adversarial examples are, in effect, too robust: the feature distribution distance of robust adversarial examples under image transformation is much smaller than that of benign examples, which is inconsistent with the feature distribution behavior of benign examples. Based on this key observation, this study proposes a dual-threshold adversarial example detection method based on image transformation, which adds a lower threshold to existing single-threshold methods to form a dual-threshold detection interval. An example whose feature distribution distance is not within the dual-threshold detection interval is judged to be an adversarial example. Additionally, this study conducts extensive experiments on VGG19, DenseNet, and ConvNeXt models for image classification. The results show that the proposed approach retains the detection ability of existing single-threshold schemes while yielding outstanding detection performance against robust adversarial examples.
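    The decision rule described above reduces to a simple interval test on the feature-distance score, as in the minimal Python sketch below; the threshold values and the distance score are hypothetical placeholders, not the calibrated thresholds of the proposed detector.
    def is_adversarial(distance, low=0.05, high=0.60):
        # Dual-threshold rule: benign examples are expected to fall inside
        # [low, high]; scores outside the interval are flagged as adversarial.
        return not (low <= distance <= high)

    print(is_adversarial(0.02))  # True: suspiciously stable under transformation (robust adversarial)
    print(is_adversarial(0.30))  # False: behaves like a benign example
    print(is_adversarial(0.85))  # True: large shift, typical adversarial behavior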
    Available online:  January 24, 2025 , DOI: 10.13328/j.cnki.jos.007301
    Abstract:
    Scalar multiplication is the core operation in traditional elliptic curve cryptography (ECC). Scalar representations determine the iterations in scalar multiplication algorithms, which directly affect the security and efficiency of the algorithms. This study proposes two new scalar representation algorithms. One is the ordered window-width non-adjacent form (OWNAF), which combines the traditional window non-adjacent form with random key segmentation and can resist power analysis attacks while yielding better efficiency. The other is the window joint regular form (wJRF), an improvement on the traditional joint regular form. The wJRF algorithm is applicable to multi-scalar multiplication and can reduce computational cost while ensuring sound security compared with existing algorithms.
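    As background for the window-based representations discussed above, the classical non-adjacent form (NAF) of a scalar can be computed with the short routine below; this reproduces only the textbook NAF recoding and is not the OWNAF or wJRF construction proposed in the paper.
    def naf(k):
        # Textbook non-adjacent form: digits in {-1, 0, 1}, least significant first,
        # with no two adjacent non-zero digits.
        digits = []
        while k > 0:
            if k & 1:
                z = 2 - (k % 4)  # choose -1 or +1 so the next bit becomes 0
                k -= z
            else:
                z = 0
            digits.append(z)
            k //= 2
        return digits

    print(naf(7))  # [-1, 0, 0, 1], i.e. 7 = -1 + 0*2 + 0*4 + 1*8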
    Available online:  January 24, 2025 , DOI: 10.13328/j.cnki.jos.007291
    Abstract:
    The deep stochastic configuration network (DSCN) adopts a feedforward learning approach and randomly assigns node parameters based on a unique supervisory mechanism, which gives it the universal approximation property. However, in actual scenarios, potential outliers and noise introduced during data collection can negatively affect classification results. To improve the performance of DSCN on binary classification problems, this study introduces the idea of intuitionistic fuzzy numbers into DSCN and proposes an intuitionistic fuzzy deep stochastic configuration network (IFDSCN). Different from the standard DSCN, IFDSCN assigns an intuitionistic fuzzy number to each sample by calculating the sample’s membership and non-membership, and generates the optimal classifier by a weighting method to overcome the negative effect of noise and outliers on data classification. Experimental results on eight benchmark datasets show that, compared to other learning models including the intuitionistic fuzzy twin support vector machine (IFTWSVM), kernel ridge regression (KRR), intuitionistic fuzzy kernel ridge regression (IFKRR), the random vector functional link neural network (RVFL), and SCN, IFDSCN has better binary classification performance.
    Available online:  October 18, 2017
    [Abstract] (3034) [HTML] (0) [PDF 525.21 K] (6513)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 315-325. Original article: https://doi.org/10.1145/3106237.3106242. Readers who wish to cite this work should cite the original publication.
    Available online:  October 18, 2017
    [Abstract] (2955) [HTML] (0) [PDF 352.38 K] (7316)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 303-314. Original article: https://doi.org/10.1145/3106237.3106239. Readers who wish to cite this work should cite the original publication.
    Available online:  September 11, 2017
    [Abstract] (3526) [HTML] (0) [PDF 276.42 K] (4670)
    Abstract:
    GitHub, a popular social-software-development platform, has fostered a variety of software ecosystems where projects depend on one another and practitioners interact with each other. Projects within an ecosystem often have complex inter-dependencies that impose new challenges in bug reporting and fixing. In this paper, we conduct an empirical study on cross-project correlated bugs, i.e., causally related bugs reported to different projects, focusing on two aspects: 1) how developers track the root causes across projects; and 2) how the downstream developers coordinate to deal with upstream bugs. Through manual inspection of bug reports collected from the scientific Python ecosystem and an online survey with developers, this study reveals the common practices of developers and the various factors in fixing cross-project bugs. These findings provide implications for future software bug analysis in the scope of ecosystem, as well as shed light on the requirements of issue trackers for such bugs.
    Available online:  June 21, 2017
    [Abstract] (3536) [HTML] (0) [PDF 169.43 K] (4453)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper has been accepted by IEEE Transactions on Software Engineering (2017) and is awaiting publication. Original article: http://ieeexplore.ieee.org/document/7792694. Readers who wish to cite this work should cite the original publication.
    Available online:  June 13, 2017
    [Abstract] (4790) [HTML] (0) [PDF 174.91 K] (4888)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 39th International Conference on Software Engineering, pp. 27-37, Buenos Aires, Argentina, May 20-28, 2017, IEEE Press, Piscataway, NJ, USA, ©2017, ISBN: 978-1-5386-3868-2. Original article: http://dl.acm.org/citation.cfm?id=3097373. Readers who wish to cite this work should cite the original publication.
    Available online:  January 25, 2017
    [Abstract] (3639) [HTML] (0) [PDF 254.98 K] (4441)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016), ACM, New York, NY, USA, pp. 871-882. DOI: https://doi.org/10.1145/2950290.2950364. Original article: http://dl.acm.org/citation.cfm?id=2950364. Readers who wish to cite this work should cite the original publication.
    Available online:  January 18, 2017
    [Abstract] (4134) [HTML] (0) [PDF 472.29 K] (4646)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 133-143, Seattle, WA, USA, November 2016. Original article: http://dl.acm.org/citation.cfm?id=2950327. Readers who wish to cite this work should cite the original publication.
    Available online:  January 04, 2017
    [Abstract] (3859) [HTML] (0) [PDF 293.93 K] (4130)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE'16), pp. 810-821, November 13-18, 2016. Original article: https://doi.org/10.1145/2950290.2950310. Readers who wish to cite this work should cite the original publication.
    Available online:  January 04, 2017
    [Abstract] (4188) [HTML] (0) [PDF 244.61 K] (4768)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at FSE 2016. Original article: http://dl.acm.org/citation.cfm?doid=2950290.2950313. Readers who wish to cite this work should cite the original publication.
    Available online:  December 12, 2016
    [Abstract] (3711) [HTML] (0) [PDF 358.69 K] (4565)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at FSE'16 (Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering). Original article: http://dl.acm.org/citation.cfm?id=2950340. Readers who wish to cite this work should cite the original publication.
    Available online:  September 30, 2016
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at the ASE 2016 conference (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?id=2970366. Readers who wish to cite this work should cite the original publication.
    Available online:  September 09, 2016
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. Junjie's paper was published at the ASE 2016 conference (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?doid=2970276.2970300. Readers who wish to cite this work should cite the original publication.
    Available online:  September 07, 2016
    Abstract:
    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published in ASE 2016: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. Full text: http://dx.doi.org/10.1145/2970276.2970307. Important note: readers who cite this work should cite the original publication.
    Available online:  August 29, 2016
    Abstract:
    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published in ACM Transactions on Software Engineering and Methodology (TOSEM, Vol. 25, No. 2, Article 13, May 2016) and was invited to the ICSE 2016 main conference as a "Journal first" presentation. Full text: http://dl.acm.org/citation.cfm?id=2876443. The authors include Minghui Zhou, Xiujuan Ma, Lu Zhang, and Hong Mei of Peking University, and Audris Mockus of the University of Tennessee. Important note: readers who cite this work should cite the original publication.
  • Full-text download ranking (overall | annual | per issue)
    Abstract view ranking (overall | annual | per issue)

    2003,14(7):1282-1291
    [Abstract] (37887) [HTML] (0) [PDF 832.28 K] (84205)
    Abstract:
    Sensor networks, which emerge from the convergence of sensor, micro-electro-mechanical system, and network technologies, are a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecasted. Combining with existing work, hot research topics, including power-aware routing and media access control schemes, are discussed and presented in detail. Finally, taking account of application requirements, several future research directions are put forward.
    2010,21(3):427-437
    [Abstract] (33398) [HTML] (0) [PDF 308.76 K] (42472)
    Abstract:
    Automatic generation of poetry has always been considered a hard problem in natural language generation. This paper reports pioneering research on a genetic algorithm for the automatic generation of SONGCI. In light of the characteristics of classical Chinese poetry, the paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette-wheel selection, a partially mapped crossover operator, and a heuristic mutation operator. Tests show that the system built on this computing model is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the automatic generation of Chinese poetry.
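    As a rough illustration of the evolutionary loop described above (elitism plus roulette-wheel selection driving crossover and mutation), the following Python sketch shows a generic framework. The fitness, crossover, and mutate functions are placeholders supplied by the caller; they are illustrative assumptions, not the paper's tonal coding or operators.
```python
import random

def roulette_select(population, fitnesses):
    # Roulette-wheel (fitness-proportional) selection.
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for ind, fit in zip(population, fitnesses):
        acc += fit
        if acc >= r:
            return ind
    return population[-1]

def evolve(init_pop, fitness, crossover, mutate, generations=200, elite=2):
    pop = list(init_pop)
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        # Elitism: carry the best individuals over unchanged.
        ranked = [ind for _, ind in sorted(zip(fits, pop),
                                           key=lambda p: p[0], reverse=True)]
        next_pop = ranked[:elite]
        while len(next_pop) < len(pop):
            p1 = roulette_select(pop, fits)
            p2 = roulette_select(pop, fits)
            next_pop.append(mutate(crossover(p1, p2)))
        pop = next_pop
    fits = [fitness(ind) for ind in pop]
    return max(zip(fits, pop), key=lambda p: p[0])[1]

# Toy usage: evolve bit strings toward all ones (stand-in for a real fitness).
rand_bits = lambda: [random.randint(0, 1) for _ in range(12)]
best = evolve([rand_bits() for _ in range(30)],
              fitness=lambda s: sum(s) + 1,
              crossover=lambda a, b: a[:6] + b[6:],
              mutate=lambda s: [b ^ (random.random() < 0.05) for b in s])
print(best)
```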
    2011,22(1):71-83 , DOI: 10.3724/SP.J.1001.2011.03958
    [Abstract] (30535) [HTML] (0) [PDF 781.42 K] (61519)
    Abstract:
    Cloud computing is a fundamental change happening in the field of information technology and represents a movement towards intensive, large-scale specialization. At the same time, it brings not only convenience and efficiency but also great challenges in data security and privacy protection. Security is currently regarded as one of the greatest obstacles to the development of cloud computing. This paper describes the major security requirements in cloud computing, the key security technologies, and relevant standards and regulations, and provides a cloud computing security framework. The paper argues that changes in these aspects will result in a technical revolution in the field of information security.
    2016,27(1):45-71 , DOI: 10.13328/j.cnki.jos.004914
    [Abstract] (30496) [HTML] (5162) [PDF 880.96 K] (39188)
    Abstract:
    Android is a modern and highly popular software platform for smartphones. According to reports, Android accounted for 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time, with Apple, Microsoft, Blackberry, and Firefox trailing far behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multiple perspectives, covering the Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and proposed security extensions.
    2008,19(1):48-61
    [Abstract] (28921) [HTML] (0) [PDF 671.39 K] (65750)
    Abstract:
    This paper summarizes the research status and recent progress of clustering algorithms. First, representative clustering algorithms are analyzed and categorized from several aspects, such as algorithmic ideas, key techniques, advantages, and disadvantages. Then, several typical clustering algorithms and well-known data sets are selected, and simulation experiments compare them in terms of accuracy and running efficiency; the behavior of one algorithm on different data sets is analyzed, as well as the behavior of different algorithms on the same data set. Finally, combining both kinds of analysis, the research hot spots, difficulties, and open problems of data clustering are addressed. This work provides a valuable reference for research on data clustering and data mining.
    2009,20(5):1337-1348
    [Abstract] (28628) [HTML] (0) [PDF 1.06 M] (48302)
    Abstract:
    This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects. One is about the cloud infrastructure which is the building block for the up layer cloud application. The other is of course the cloud application. This paper focuses on the cloud infrastructure including the systems and current research. Some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large scale clusters which contain a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software building on top of redundant hardware instead of mere hardware. All these technologies are for the two important goals for distributed system: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to very large scale even to thousands of nodes. Availability means that the services are available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2009,20(2):271-289
    [Abstract] (27686) [HTML] (0) [PDF 675.56 K] (48928)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to solve multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing EMO algorithms proposed before 2003, this paper discusses recent advances in EMO in detail and summarizes the current research directions. On the one hand, new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, to deal with many-objective optimization problems, many new dominance schemes differing from traditional Pareto dominance have emerged. Furthermore, the essential characteristics of multi-objective optimization problems are investigated in depth. The paper also gives an experimental comparison of several representative algorithms and proposes several viewpoints for future EMO research.
    2005,16(1):1-7
    [Abstract] (22770) [HTML] (0) [PDF 614.61 K] (24498)
    Abstract:
    This paper offers reflections from the following four perspectives: 1) from the general law of how things develop, revealing the development history of software engineering technology; 2) from the natural characteristics of software, analyzing the construction of each abstraction layer of the virtual machine; 3) from the viewpoint of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the emergence of Internet technology, exploring the development trend of software technology.
    2010,21(8):1834-1848
    [Abstract] (21569) [HTML] (0) [PDF 682.96 K] (61408)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
    2004,15(3):428-442
    [Abstract] (20954) [HTML] (0) [PDF 1009.57 K] (19830)
    Abstract:
    With the rapid development of e-business, Web applications have evolved from localized to global, from B2C (business-to-customer) to B2B (business-to-business), and from centralized to decentralized. Web services are a new application model for decentralized computing and an effective mechanism for data and service integration on the Web; they have therefore become a key enabler of e-business. It is important and necessary to carry out research on new Web service architectures, on their combination with other techniques, and on service integration. This paper surveys various aspects of Web services research, from basic concepts to the principal research problems and underlying techniques, including data integration in Web services, Web service composition, semantic Web services, Web service discovery, Web service security, Web services in P2P (peer-to-peer) computing environments, and grid services. The paper also summarizes the current state of the art of these techniques and discusses future research topics and the challenges facing Web services.
    2005,16(5):857-868
    [Abstract] (20060) [HTML] (0) [PDF 489.65 K] (33690)
    Abstract:
    Wireless Sensor Networks, a novel technology about acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the networks, is a challenging one, and yet extremely crucial for many applications. In this paper, the evaluation criterion of the performance and the taxonomy for wireless sensor networks self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and the directions of research in this area are introduced.
    2009,20(1):54-66
    [Abstract] (20045) [HTML] (0) [PDF 1.41 M] (54378)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks: links within a community are very dense, while links between communities are quite sparse. Network clustering algorithms, which aim to discover all natural communities in a given complex network, are fundamentally important for both theoretical research and practical applications; they can be used to analyze topological structures, understand functions, recognize hidden patterns, and predict the behaviors of complex networks such as social networks, biological networks, and the World Wide Web. This paper reviews the background, motivation, state of the art, and main issues of existing work on discovering network communities, and tries to draw a comprehensive and clear outline of this new and active research area. This work should be beneficial to researchers in complex network analysis, data mining, the intelligent Web, and bioinformatics.
    2012,23(4):962-986 , DOI: 10.3724/SP.J.1001.2012.04175
    [Abstract] (19267) [HTML] (0) [PDF 2.09 M] (36308)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, often up to millions, and stores petabytes or even exabytes of data, which makes failures of computers or data quite likely. The sheer number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware and power costs. Therefore, the fault tolerance, scalability, and power consumption of a data center's distributed storage are key issues in cloud computing for ensuring data availability and reliability. This paper surveys the state of the art of the key technologies in this area: the design of data center networks, the organization and placement of data, strategies for improving fault tolerance, and methods for saving storage space and energy. First, several classical data center network topologies are introduced and compared. Second, current fault-tolerant storage techniques are discussed, with a particular comparison of data replication and erasure-code strategies. Third, the main current energy-saving technologies are presented and analyzed. Finally, the challenges of distributed storage are reviewed and future research trends are predicted.
    2012,23(1):32-45 , DOI: 10.3724/SP.J.1001.2012.04091
    [Abstract] (18953) [HTML] (0) [PDF 408.86 K] (35652)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly, and parallel techniques that can be scaled out cost-effectively are needed to deal with this big data. Relational data management technology has a history of nearly 40 years, but it now faces the tough obstacle of scalability: relational techniques cannot handle very large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force, expanding from Web search into territories that used to be occupied by relational database systems and confronting relational technology with high availability, high scalability, and massive parallel processing capability. The relational community, after losing the Web search market, has begun to learn from MapReduce, while MapReduce has also borrowed valuable ideas from relational technology to improve performance. As the two camps compete with and learn from each other, a new data analysis platform and ecosystem are emerging, in which both kinds of techniques will eventually find their proper places.
    2009,20(3):524-545
    [Abstract] (17657) [HTML] (0) [PDF 1.09 M] (26391)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2009,20(1):124-137
    [Abstract] (17314) [HTML] (0) [PDF 1.06 M] (25714)
    Abstract:
    The appearance of plenty of intelligent devices equipped for short-range wireless communications boosts the fast rise of wireless ad hoc networks application. However, in many realistic application environments, nodes form a disconnected network for most of the time due to nodal mobility, low density, lossy link, etc. Conventional communication model of mobile ad hoc network (MANET) requires at least one path existing from source to destination nodes, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, and implement communications between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, captures great interests from researchers. This paper first introduces the conceptions and theories of opportunistic networks and some current typical applications. Then it elaborates the popular research problems including opportunistic forwarding mechanism, mobility model and opportunistic data dissemination and retrieval. Some other interesting research points such as communication middleware, cooperation and security problem and new applications are stated briefly. Finally, the paper concludes and looks forward to the possible research focuses for opportunistic networks in the future.
    2015,26(1):62-81 , DOI: 10.13328/j.cnki.jos.004701
    [Abstract] (17228) [HTML] (5781) [PDF 1.04 M] (36466)
    Abstract:
    Network abstraction brought about the birth of software-defined networking (SDN). SDN decouples the data plane and the control plane and simplifies network management. This paper starts with a discussion of the background and development of SDN, outlining its architecture of data layer, control layer, and application layer. The key technologies are then elaborated according to this hierarchical architecture, with particular analysis of the characteristics of consistency, availability, and tolerance. Moreover, the latest achievements for specific application scenarios are introduced. Future work is summarized at the end.
    2009,20(2):350-362
    [Abstract] (16811) [HTML] (0) [PDF 1.39 M] (44468)
    Abstract:
    This paper makes a comprehensive survey of the recommender system research aiming to facilitate readers to understand this field. First the research background is introduced, including commercial application demands, academic institutes, conferences and journals. After formally and informally describing the recommendation problem, a comparison study is conducted based on categorized algorithms. In addition, the commonly adopted benchmarked datasets and evaluation methods are exhibited and most difficulties and future directions are concluded.
    2004,15(8):1208-1219
    [Abstract] (16736) [HTML] (0) [PDF 948.49 K] (18001)
    Abstract:
    With the explosive growth of network applications and their complexity, the threat of Internet worms to network security is becoming increasingly serious. Especially in the Internet environment, the variety of propagation paths and the complexity of application environments lead to worms that break out more frequently, stay latent longer, and cover a wider range, and Internet worms have become a primary issue faced by malicious code researchers. In this paper, the concept and research status of Internet worms, their functional components, and their execution mechanisms are first presented; then scanning strategies and propagation models are discussed; and finally the critical techniques of Internet worm prevention are given. Major problems and research trends in this area are also addressed.
    2009,20(11):2965-2976
    [Abstract] (16680) [HTML] (0) [PDF 442.42 K] (18949)
    Abstract:
    This paper studies uncertain graph data mining and especially investigates the problem of mining frequent subgraph patterns from uncertain graph data. A data model is introduced for representing uncertainties in graphs, and expected support is employed to evaluate the significance of subgraph patterns. By using the apriori property of expected support, a depth-first search-based mining algorithm is proposed with an efficient method for computing expected support and a technique for pruning the search space, which reduces the number of subgraph isomorphism tests needed to compute expected support from exponential to linear scale. Experimental results show that the proposed algorithm is 3 to 5 orders of magnitude faster than a naïve depth-first search algorithm, and is efficient and scalable.
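    For reference, one common possible-world formalization of expected support and its apriori (anti-monotone) property is sketched below; the paper's exact definition may differ in detail. Here D is an uncertain graph database, W(G) ranges over the possible worlds of graph G, and "P ⊑ W" denotes that pattern P is subgraph-isomorphic to world W.
```latex
% Expected support of a subgraph pattern P over an uncertain graph database D
% (a common possible-world formalization; assumed for illustration).
\mathrm{esup}(P) \;=\; \frac{1}{|\mathcal{D}|}\sum_{G \in \mathcal{D}}
\;\sum_{W \in \mathcal{W}(G)} \Pr[W \mid G]\cdot \mathbf{1}\!\left[P \sqsubseteq W\right],
\qquad
P' \subseteq P \;\Longrightarrow\; \mathrm{esup}(P') \,\ge\, \mathrm{esup}(P).
```
    The right-hand implication is the apriori property the abstract relies on: any pattern can be pruned once one of its subpatterns already falls below the support threshold.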
    2009,20(5):1226-1240
    [Abstract] (16606) [HTML] (0) [PDF 926.82 K] (20577)
    Abstract:
    This paper introduces concrete details of combining automated reasoning techniques with planning methods, including planning as satisfiability using propositional logic, conformant planning using modal logic and disjunctive reasoning, planning as nonmonotonic logic, and flexible planning as fuzzy description logic. After considering the experimental results of the International Planning Competition and relevant papers, it concludes that planning methods based on automated reasoning techniques are helpful and can be adopted. It also identifies challenges and possible research hotspots.
    2003,14(10):1717-1727
    [Abstract] (16463) [HTML] (0) [PDF 839.25 K] (18734)
    Abstract:
    Sensor networks integrate sensor technology, embedded computing, distributed computing, and wireless communication. They can be used to monitor, sense, collect, and process information about monitored objects and to transfer the processed information to users. Sensor networks are a new research area of computer science and technology with broad application prospects, and both academia and industry are very interested in them. This paper introduces the concepts and characteristics of sensor networks and of the data in such networks, and discusses the issues of sensor networks and sensor network data management. The progress of research on sensor networks and their data management is also presented.
    2014,25(4):839-862 , DOI: 10.13328/j.cnki.jos.004558
    [Abstract] (15811) [HTML] (4324) [PDF 1.32 M] (24589)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. The research and discussions on batch computing in big data environment are comparatively sufficient. But how to efficiently deal with stream computing to meet many requirements, such as low latency, high throughput and continuously reliable running, and how to build efficient stream big data computing systems, are great challenges in the big data computing research. This paper provides a research of the data computing architecture and the key issues in stream computing in big data environments. Firstly, the research gives a brief summary of three application scenarios of stream computing in business intelligence, marketing and public service. It also shows distinctive features of the stream computing in big data environment, such as real time, volatility, burstiness, irregularity and infinity. A well-designed stream computing system always optimizes in system structure, data transmission, application interfaces, high-availability, and so on. Subsequently, the research offers detailed analyses and comparisons of five typical and open-source stream computing systems in big data environment. Finally, the research specifically addresses some new challenges of the stream big data systems, such as scalability, fault tolerance, consistency, load balancing and throughput.
    2012,23(1):1-20 , DOI: 10.3724/SP.J.1001.2012.04100
    [Abstract] (14909) [HTML] (0) [PDF 1017.73 K] (36220)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2015,26(1):26-39 , DOI: 10.13328/j.cnki.jos.004631
    [Abstract] (14733) [HTML] (3788) [PDF 763.52 K] (20523)
    Abstract:
    In recent years, transfer learning has provoked vast amount of attention and research. Transfer learning is a new machine learning method that applies the knowledge from related but different domains to target domains. It relaxes the two basic assumptions in traditional machine learning: (1) the training (also referred as source domain) and test data (also referred target domain) follow the independent and identically distributed (i.i.d.) condition; (2) there are enough labeled samples to learn a good classification model, aiming to solve the problems that there are few or even not any labeled data in target domains. This paper surveys the research progress of transfer learning and introduces its own works, especially the ones in building transfer learning models by applying generative model on the concept level. Finally, the paper introduces the applications of transfer learning, such as text classification and collaborative filtering, and further suggests the future research direction of transfer learning.
    2009,20(10):2729-2743
    [Abstract] (14624) [HTML] (0) [PDF 1.12 M] (13974)
    Abstract:
    In a multi-hop wireless sensor network (WSN), the sensors closest to the sink tend to deplete their energy faster than other sensors, creating what is known as an energy hole around the sink. Once an energy hole appears, no more data can be delivered to the sink, a considerable amount of energy is wasted, and the network lifetime ends prematurely. This paper investigates the energy hole problem and, based on an improved corona model with levels, concludes that assigning different transmission ranges to nodes in different coronas is an effective approach to an energy-efficient network. It proves that finding the optimal transmission ranges for all areas is a multi-objective optimization problem (MOP), which is NP-hard. The paper proposes an ACO (ant colony optimization)-based distributed algorithm to prolong the network lifetime, which helps nodes in different areas adaptively find approximately optimal transmission ranges based on the node distribution. Simulation results indicate that the network lifetime under this solution approximates that obtained with the optimal assignment. Compared with existing algorithms, the ACO-based algorithm not only extends the network lifetime by more than a factor of two, but also performs well under non-uniform node distributions.
    2000,11(11):1460-1466
    [Abstract] (14599) [HTML] (0) [PDF 520.69 K] (13935)
    Abstract:
    Intrusion detection is a highlighted topic of network security research in recent years. In this paper, first the necessity of intrusion detection is presented, and its concepts and models are described. Then, many intrusion detection techniques and architectures are summarized. Finally, the existing problems and the future direction in this field are discussed.
    2012,23(5):1148-1166 , DOI: 10.3724/SP.J.1001.2012.04195
    [Abstract] (14595) [HTML] (0) [PDF 946.37 K] (20918)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
    2002,13(7):1228-1237
    [Abstract] (14383) [HTML] (0) [PDF 500.04 K] (18081)
    Abstract:
    Software architecture (SA) has recently emerged as one of the primary research areas in software engineering and one of the key technologies for developing large-scale software-intensive systems and software product lines. This paper summarizes the history and major directions of SA research, and formulates the concept of SA by analyzing and comparing several classical definitions. Based on a summary of SA-related activities, two categories of SA research are identified, and advances in SA research are then introduced from seven aspects. Additionally, some shortcomings of current SA research are discussed and their causes explained. Finally, some significantly promising trends in SA research are outlined.
    2013,24(8):1786-1803 , DOI: 10.3724/SP.J.1001.2013.04416
    [Abstract] (14266) [HTML] (0) [PDF 1.04 M] (23064)
    Abstract:
    Many specific application oriented NoSQL database systems are developed for satisfying the new requirement of big data management. This paper surveys researches on typical NoSQL database based on key-value data model. First, the characteristics of big data, and the key technique issues supporting big data management are introduced. Then frontier efforts and research challenges are given, including system architecture, data model, access mode, index, transaction, system elasticity, load balance, replica strategy, data consistency, flash cache, MapReduce based data process and new generation data management system etc. Finally, research prospects are given.
    2006,17(7):1588-1600
    [Abstract] (14054) [HTML] (0) [PDF 808.73 K] (17792)
    Abstract:
    Routing technology at the network layer is pivotal in the architecture of wireless sensor networks. As an active branch of routing technology, cluster-based routing protocols excel in network topology management, energy minimization, data aggregation and so on. In this paper, cluster-based routing mechanisms for wireless sensor networks are analyzed. Cluster head selection, cluster formation and data transmission are three key techniques in cluster-based routing protocols. As viewed from the three techniques, recent representative cluster-based routing protocols are presented, and their characteristics and application areas are compared. Finally, the future research issues in this area are pointed out.
    2011,22(1):115-131 , DOI: 10.3724/SP.J.1001.2011.03950
    [Abstract] (13996) [HTML] (0) [PDF 845.91 K] (31734)
    Abstract:
    The Internet traffic model is the key issue for network performance management, Quality of Service management, and admission control. The paper first summarizes the primary characteristics of Internet traffic, as well as the metrics of Internet traffic. It also illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issue and points out possible future research directions in traffic modeling area.
    2004,15(4):571-583
    [Abstract] (13951) [HTML] (0) [PDF 1005.17 K] (12770)
    Abstract:
    For most peer-to-peer file-swapping applications, sharing is a volunteer action, and peers are not responsible for their irresponsible bartering history. This situation indicates the trust between participants can not be set up simply on the traditional trust mechanism. A reasonable trust construction approach comes from the social network analysis, in which trust relations between individuals are set up upon recommendations of other individuals. Current p2p trust model could not promise the convergence of iteration for trust computation, and takes no consideration for model security problems, such as sybil attack and slandering. This paper presents a novel recommendation-based global trust model and gives a distributed implementation method. Mathematic analyses and simulations show that, compared to the current global trust model, the proposed model is more robust on trust security problems and more complete on iteration for computing peer trust.
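    To illustrate the general idea of aggregating peers' recommendations into global trust by iteration, the following Python sketch uses a simple power-iteration-style computation over a normalized local-trust matrix. It is an illustrative simplification under assumed conventions (row-normalized recommendations, uniform start vector), not the model or the distributed implementation proposed in the paper.
```python
import numpy as np

def global_trust(local_trust, tol=1e-9, max_iter=1000):
    """Iteratively aggregate local (recommendation) trust into a global
    trust vector via power iteration. Illustrative sketch only."""
    C = np.asarray(local_trust, dtype=float)
    # Row-normalize so each peer's outgoing recommendations sum to 1.
    row_sums = C.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    C = C / row_sums
    n = C.shape[0]
    t = np.full(n, 1.0 / n)              # start from uniform trust
    for _ in range(max_iter):
        t_next = C.T @ t                 # weight each peer by its recommenders
        t_next /= t_next.sum()
        if np.linalg.norm(t_next - t, 1) < tol:
            break
        t = t_next
    return t

# Example: three peers; peer 2 is strongly recommended by the other two.
print(global_trust([[0, 1, 3], [1, 0, 4], [1, 1, 0]]))
```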
    2009,20(1):11-29
    [Abstract] (13855) [HTML] (0) [PDF 787.30 K] (18303)
    Abstract:
    Constrained optimization problems (COPs) are mathematical programming problems frequently encountered in science and engineering applications. Solving COPs has become an important research area of evolutionary computation in recent years. In this paper, the state of the art of constrained optimization evolutionary algorithms (COEAs) is surveyed from two basic aspects of COEAs (i.e., constraint-handling techniques and evolutionary algorithms). In addition, this paper discusses some important issues of COEAs, and several typical algorithms are analyzed in detail. Based on these analyses, it is concluded that to obtain competitive results, a proper constraint-handling technique needs to be considered in conjunction with an appropriate search algorithm. Finally, open research issues in this field are pointed out.
    2008,19(zk):112-120
    [Abstract] (13701) [HTML] (0) [PDF 594.29 K] (17630)
    Abstract:
    An ad hoc network is a collection of wireless mobile nodes dynamically forming a temporary network without the use of any existing network infrastructure or centralized administration. Due to bandwidth constraint and dynamic topology of mobile ad hoc networks, multipath supported routing is a very important research issue. In this paper, we present an entropy-based metric to support stability multipath on-demand routing (SMDR). The key idea of SMDR protocol is to construct the new metric-entropy and select the stability multipath with the help of entropy metric to reduce the number of route reconstruction so as to provide QoS guarantee in the ad hoc network whose topology changes continuously. Simulation results show that, with the proposed multipath routing protocol, packet delivery ratio, end-to-end delay, and routing overhead ratio can be improved in most of cases. It is an available approach to multipath routing decision.
    2013,24(1):50-66 , DOI: 10.3724/SP.J.1001.2013.04276
    [Abstract] (13698) [HTML] (0) [PDF 0.00 Byte] (20681)
    Abstract:
    As an important application of acceleration in the cloud, the distributed caching technology has received considerable attention in industry and academia. This paper starts with a discussion on the combination of cloud computing and distributed caching technology, giving an analysis of its characteristics, typical application scenarios, stages of development, standards, and several key elements, which have promoted its development. In order to systematically know the state of art progress and weak points of the distributed caching technology, the paper builds a multi-dimensional framework, DctAF. This framework is constituted of 6 dimensions through analyzing the characteristics of cloud computing and boundary of the caching techniques. Based on DctAF, current techniques have been analyzed and summarized; comparisons among several influential products have also been made. Finally, the paper describes and highlights the several challenges that the cache system faces and examines the current research through in-depth analysis and comparison.
    2002,13(10):1952-1961
    [Abstract] (13544) [HTML] (0) [PDF 570.96 K] (16462)
    Abstract:
    The crucial technologies related to personalization are introduced in this paper, which include the representation and modification of user profile, the representation of resource, the recommendation technology, and the architecture of personalization. By comparing with some existing prototype systems, the key technologies about how to implement personalization are discussed in detail. In addition, three representative personalization systems are analyzed. At last, some research directions for personalization are presented.
    2003,14(9):1621-1628
    [Abstract] (13470) [HTML] (0) [PDF 680.35 K] (23513)
    Abstract:
    Recommendation systems are among the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extremely sparse user rating data. Traditional similarity measures perform poorly in this situation, dramatically decreasing the quality of recommendations. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. The method first predicts the ratings of items that users have not rated based on item similarity, and then uses a new similarity measure to find the target user's neighbors. Experimental results show that this method can effectively alleviate the extreme sparsity of user rating data and provides better recommendations than traditional collaborative filtering algorithms.
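    The two-step idea (predict unrated items from item-item similarity, then use the densified profiles for neighbor search) can be sketched as follows. The rating matrix, the cosine similarity, and the weighting scheme are illustrative assumptions rather than the algorithm's exact formulas.
```python
import numpy as np

def item_similarity(R):
    """Cosine similarity between item columns of a user-item rating
    matrix R (0 marks a missing rating). Illustrative sketch only."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0
    return (R.T @ R) / np.outer(norms, norms)

def predict_missing(R):
    """Fill unrated cells with similarity-weighted averages of the
    user's ratings on similar items (the first step described above)."""
    S = item_similarity(R)
    filled = R.astype(float)
    for u, i in zip(*np.where(R == 0)):
        rated = np.where(R[u] > 0)[0]
        w = S[i, rated]
        if w.sum() > 0:
            filled[u, i] = (w @ R[u, rated]) / w.sum()
    return filled

# Toy rating matrix: rows are users, columns are items, 0 means "not rated".
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5]])
print(predict_missing(R))
```
    Neighbor search (the second step) would then compute user-user similarities on the filled matrix instead of the sparse original, which is the point the abstract makes about alleviating sparsity.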
    2003,14(9):1635-1644
    [Abstract] (13375) [HTML] (0) [PDF 622.06 K] (15327)
    Abstract:
    Computer forensics is the technology field that attempts to provide thorough, efficient, and secure means of investigating computer crime. Computer evidence must be authentic, accurate, complete, and convincing to juries. In this paper, the stages of computer forensics are presented, and the theory and realization of forensic software are described. An example of forensic practice is also given. The deficiencies of current computer forensics techniques and the problem of anti-forensics are also discussed. The conclusion is that, as computer science and technology improve, forensic techniques will become more integrated and thorough.
    2008,19(7):1565-1580
    [Abstract] (13353) [HTML] (0) [PDF 815.02 K] (19898)
    Abstract:
    Software defect prediction has been one of the most active areas of software engineering since it emerged in the 1970s. It plays a very important role in analyzing software quality and balancing software cost. This paper investigates and discusses the motivation, evolution, solutions, and challenges of software defect prediction technologies, and categorizes, analyzes, and compares representative prediction techniques. Some case studies of software defect distribution models are also given to aid understanding.
    2012,23(1):82-96 , DOI: 10.3724/SP.J.1001.2012.04101
    [Abstract] (13216) [HTML] (0) [PDF 394.07 K] (18317)
    Abstract:
    Botnets are one of the most serious threats to the Internet. Researchers have done plenty of research and made significant progress. However, botnets keep evolving and have become more and more sophisticated. Due to the underlying security limitation of current system and Internet architecture, and the complexity of botnet itself, how to effectively counter the global threat of botnets is still a very challenging issue. This paper first introduces the evolving of botnet’s propagation, attack, command, and control mechanisms. Then the paper summarizes recent advances of botnet defense research and categorizes into five areas: Botnet monitoring, botnet infiltration, analysis of botnet characteristics, botnet detection and botnet disruption. The limitation of current botnet defense techniques, the evolving trend of botnet, and some possible directions for future research are also discussed.
    2008,19(8):1947-1964
    [Abstract] (13182) [HTML] (0) [PDF 811.11 K] (13076)
    Abstract:
    Wide-Spread deployment for interactive information visualization is difficult. Non-Specialist users need a general development method and a toolkit to support the generic data structures suited to tree, network and multi-dimensional data, special visualization techniques and interaction techniques, and well-known generic information tasks. This paper presents a model driven development method for interactive information visualization. First, an interactive information visualization interface model (IIVM) is proposed. Then, the development method for interactive information visualization based on IIVM is presented. The Daisy toolkit is introduced, which includes Daisy model builder, Daisy IIV generator and runtime framework with Daisy library. Finally, an application example is given. Experimental results show that Daisy can provide a general solution for development for interactive information visualization.
    2008,19(8):1902-1919
    [Abstract] (13101) [HTML] (0) [PDF 521.73 K] (16216)
    Abstract:
    Visual language techniques have exhibited more advantages in describing various software artifacts than one-dimensional textual languages during software development, ranging from the requirement analysis and design to testing and maintenance, as diagrammatic and graphical notations have been well applied in modeling system. In addition to an intuitive appearance, graph grammars provide a well-established foundation for defining visual languages with the power of precise modeling and verification on computers. This paper discusses the issues and techniques for a formal foundation of visual languages, reviews related practical graphical environments, presents a spatial graph grammar formalism, and applies the spatial graph grammar to defining behavioral semantics of UML diagrams and developing a style-driven framework for software architecture design.
    2006,17(9):1848-1859
    [Abstract] (12979) [HTML] (0) [PDF 770.40 K] (24030)
    Abstract:
    In recent years, there have been extensive studies and rapid progresses in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the state-of-art challenging issues and research trends for content information processing of Internet and other complex applications, this paper presents a survey on the up-to-date development in text categorization based on machine learning, including model, algorithm and evaluation. It is pointed out that problems such as nonlinearity, skewed data distribution, labeling bottleneck, hierarchical categorization, scalability of algorithms and categorization of Web pages are the key problems to the study of text categorization. Possible solutions to these problems are also discussed respectively. Finally, some future directions of research are given.
    2010,21(2):231-247
    [Abstract] (12880) [HTML] (0) [PDF 1.21 M] (19672)
    Abstract:
    In this paper, a framework is proposed for handling fault of service composition through analyzing fault requirements. Petri nets are used in the framework for fault detecting and its handling, which focuses on targeting the failure of available services, component failure and network failure. The corresponding fault models are given. Based on the model, the correctness criterion of fault handling is given to analyze fault handling model, and its correctness is proven. Finally, CTL (computational tree logic) is used to specify the related properties and enforcement algorithm of fault analysis. The simulation results show that this method can ensure the reliability and consistency of service composition.
    2017,28(1):1-16 , DOI: 10.13328/j.cnki.jos.005139
    [Abstract] (12793) [HTML] (5369) [PDF 1.75 M] (14920)
    Abstract:
    The knapsack problem (KP) is a well-known combinatorial optimization problem with many variants, including the 0-1 KP, bounded KP, multi-constraint KP, multiple KP, multiple-choice KP, quadratic KP, dynamic KP, discounted KP, and others. KP can be regarded as a mathematical model abstracted from a variety of real-world settings and therefore has wide applications. Evolutionary algorithms (EAs) are widely considered an efficient tool for solving KPs approximately and quickly. This paper presents a survey of work over the past ten years on solving KPs with EAs. It not only discusses various KP encoding mechanisms and the handling of infeasible individuals, but also provides useful guidelines for designing new EAs to solve KPs.
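    As an illustration of how infeasible individuals can be handled under a bit-string encoding of the 0-1 KP, the following Python sketch shows a greedy repair operator and a penalty-style fitness. Both are generic textbook devices assumed for illustration, not a specific algorithm from the surveyed papers.
```python
def repair(bits, values, weights, capacity):
    # Greedy repair: drop selected items with the worst value/weight
    # ratio until the weight constraint is satisfied.
    bits = list(bits)
    chosen = sorted((i for i, b in enumerate(bits) if b),
                    key=lambda i: values[i] / weights[i])
    total = sum(weights[i] for i, b in enumerate(bits) if b)
    for i in chosen:
        if total <= capacity:
            break
        bits[i] = 0
        total -= weights[i]
    return bits

def fitness(bits, values, weights, capacity):
    # Penalty-style alternative: infeasible individuals simply score 0.
    w = sum(wi for wi, b in zip(weights, bits) if b)
    v = sum(vi for vi, b in zip(values, bits) if b)
    return v if w <= capacity else 0

values, weights, capacity = [6, 10, 12, 7], [1, 2, 3, 2], 5
cand = repair([1, 1, 1, 1], values, weights, capacity)
print(cand, fitness(cand, values, weights, capacity))
```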
    2010,21(7):1620-1634
    [Abstract] (12686) [HTML] (0) [PDF 765.23 K] (22998)
    Abstract:
    As an application of mobile ad hoc networks (MANET) on Intelligent Transportation Information System, the most important goal of vehicular ad hoc networks (VANET) is to reduce the high number of accidents and fatal consequences dramatically. One of the most important factors that would contribute to the realization of this goal is the design of effective broadcast protocols. This paper introduces the characteristics and application fields of VANET briefly. Then, it discusses the characteristics, performance, and application areas with analysis and comparison of various categories of broadcast protocols in VANET. According to the characteristic of VANET and its application requirement, the paper proposes the ideas and breakthrough direction of information broadcast model design of inter-vehicle communication.
    2010,21(5):916-929
    [Abstract] (12664) [HTML] (0) [PDF 944.50 K] (21382)
    Abstract:
    Data deduplication technologies can be divided into two categories: a) identical data detection techniques, and b) similar data detection and encoding techniques. This paper presents a systematic survey on these two categories of data deduplication technologies and analyzes their advantages and disadvantages. Besides, since data deduplication technologies can affect the reliability and performance of storage systems, this paper also surveys various kinds of technologies proposed to cope with these two aspects of problems. Based on the analysis of the current state of research on data deduplication technologies, this paper makes several conclusions as follows: a) How to mine data characteristic information in data deduplication has not been completely solved, and how to use data characteristic information to effectively eliminate duplicate data also needs further study; b) From the perspective of storage system design, it still needs further study how to introduce proper mechanisms to overcome the reliability limitations of data deduplication techniques and reduce the additional system overheads caused by data deduplication techniques.
    2009,20(6):1393-1405
    [Abstract] (12490) [HTML] (0) [PDF 831.86 K] (22833)
    Abstract:
    Combinatorial testing can use a small number of test cases to test systems while preserving fault detection ability. However, the complexity of test case generation problem for combinatorial testing is NP-complete. The efficiency and complexity of this testing method have attracted many researchers from the area of combinatorics and software engineering. This paper summarizes the research works on this topic in recent years. They include: various combinatorial test criteria, the relations between the test generation problem and other NP-complete problems, the mathematical methods for constructing test cases, the computer search techniques for test generation and fault localization techniques based on combinatorial testing.
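    To make the test-generation problem concrete, the following Python sketch greedily builds a 2-way (pairwise) covering test set, one test at a time. This brute-force greedy strategy is only an illustrative assumption about how pairwise coverage can be reached with few tests; it is far simpler than the mathematical constructions and search techniques the survey covers.
```python
from itertools import combinations, product

def pairwise_tests(parameters):
    """Greedy one-row-at-a-time generation of a 2-way covering test set.
    `parameters` is a list of value lists. Illustrative, not optimal."""
    uncovered = {((i, vi), (j, vj))
                 for i, j in combinations(range(len(parameters)), 2)
                 for vi in parameters[i] for vj in parameters[j]}
    tests = []
    while uncovered:
        # Pick the full assignment covering the most still-uncovered pairs.
        best = max(product(*parameters),
                   key=lambda t: sum(((i, t[i]), (j, t[j])) in uncovered
                                     for i, j in combinations(range(len(t)), 2)))
        tests.append(best)
        uncovered -= {((i, best[i]), (j, best[j]))
                      for i, j in combinations(range(len(best)), 2)}
    return tests

# Three parameters with 2, 2, and 3 values: all value pairs are covered
# by far fewer tests than the 12 exhaustive combinations.
print(pairwise_tests([[0, 1], ['a', 'b'], ['x', 'y', 'z']]))
```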
    2008,19(10):2706-2719
    [Abstract] (12339) [HTML] (0) [PDF 778.29 K] (14693)
    Abstract:
    Web search engine has become a very important tool for finding information efficiently from the massive Web data. With the explosive growth of the Web data, traditional centralized search engines become harder to catch up with the growing step of people's information needs. With the rapid development of peer-to-peer (P2P) technology, the notion of P2P Web search has been proposed and quickly becomes a research focus. The goal of this paper is to give a brief summary of current P2P Web search technologies in order to facilitate future research. First, some main challenges for P2P Web search are presented. Then, key techniques for building a feasible and efficient P2P Web search engine are reviewed, including system topology, data placement, query routing, index partitioning, collection selection, relevance ranking and Web crawling. Finally, three recently proposed novel P2P Web search prototypes are introduced.
  • Article Search
    Search by issue
    Select AllDeselectExport
    Display Method:
    2014,25(9):1889-1908 , DOI: 10.13328/j.cnki.jos.004674
    [Abstract] (12211) [HTML] (5029) [PDF 550.98 K] (45362)
    Abstract:
    This paper first introduces the key features of big data in different processing modes and their typical application scenarios, as well as corresponding representative processing systems. It then summarizes three development trends of big data processing systems. Next, the paper gives a brief survey on system supported analytic technologies and applications (including deep learning, knowledge computing, social computing, and visualization), and summarizes the key roles of individual technologies in big data analysis and understanding. Finally, the paper lays out three grand challenges of big data processing and analysis, i.e., data complexity, computation complexity, and system complexity. Potential ways for dealing with each complexity are also discussed.
    2004,15(10):1493-1504
    [Abstract] (9326) [HTML] (0) [PDF 937.72 K] (42395)
    Abstract:
    Graphics processing units (GPU) have been developing rapidly in recent years at a pace exceeding Moore's law, and as a result, various applications associated with computer graphics have advanced greatly. At the same time, the high processing power, parallelism, and programmability available on contemporary GPUs provide an ideal platform for general-purpose computation. Starting from an introduction to the development history and the architecture of the GPU, the technical fundamentals of the GPU are described in the paper. Then, in the main part of the paper, the development of various applications of general-purpose computation on the GPU is introduced, among which fluid dynamics, algebraic computation, database operations, and spectrum analysis are introduced in detail. The authors' experience with work on fluid dynamics is also given, and the development of software tools in this area is introduced. Finally, a conclusion is drawn, and future developments and new challenges on both the hardware and software sides of this subject are discussed.
    2021,32(2):349-369 , DOI: 10.13328/j.cnki.jos.006138
    [Abstract] (9305) [HTML] (10979) [PDF 2.36 M] (41973)
    Abstract:
    Few-shot learning is defined as learning models to solve problems from small samples. In recent years, under the trend of training models with big data, machine learning and deep learning have achieved success in many fields. However, in many real-world application scenarios, there is not a large amount of data or labeled data for model training, and labeling a large number of unlabeled samples costs a lot of manpower. Therefore, how to learn from a small number of samples has become a problem that currently demands attention. This paper systematically reviews the current approaches to few-shot learning. It introduces the corresponding models in three categories: fine-tuning based, data augmentation based, and transfer learning based. The data augmentation based approaches are further subdivided into unlabeled data based, data generation based, and feature augmentation based approaches, and the transfer learning based approaches are subdivided into metric learning based, meta-learning based, and graph neural network based methods. The paper then summarizes the few-shot datasets and the experimental results of the aforementioned models. Next, the paper summarizes the current situation and challenges of few-shot learning. Finally, an outlook on the future technological development of few-shot learning is given.
    2013,24(11):2476-2497 , DOI: 10.3724/SP.J.1001.2013.04486
    [Abstract] (10857) [HTML] (0) [PDF 1.14 M] (41397)
    Abstract:
    Probabilistic graphical models are powerful tools for compactly representing complex probability distributions, efficiently computing (approximate) marginal and conditional distributions, and conveniently learning parameters and hyperparameters in probabilistic models. As a result, they have been widely used in applications that require some sort of automated probabilistic reasoning, such as computer vision and natural language processing, as a formal approach to deal with uncertainty. This paper surveys the basic concepts and key results of representation, inference and learning in probabilistic graphical models, and demonstrates their uses in two important probabilistic models. It also reviews some recent advances in speeding up classic approximate inference algorithms, followed by a discussion of promising research directions.
    2022,33(7):2464-2481 , DOI: 10.13328/j.cnki.jos.006585
    [Abstract] (1433) [HTML] (3163) [PDF 2.00 M] (39936)
    Abstract:
    Symbolic propagation methods based on linear abstraction play a significant role in neural network verification. This study proposes the notion of multi-path back-propagation for these methods. Existing methods are viewed as using only a single back-propagation path to calculate the upper and lower bounds of each node in a given neural network, being specific instances of the proposed notion. Leveraging multiple back-propagation paths effectively improves the accuracy of this kind of method. For evaluation, the proposed method is quantitatively compared using multiple back-propagation paths with the state-of-the-art tool DeepPoly on benchmarks ACAS Xu, MNIST, and CIFAR10. The experiment results show that the proposed method achieves significant accuracy improvement while introducing only a low extra time cost. In addition, the multi-path back-propagation method is compared with the Optimized LiRPA based on global optimization, on the dataset MNIST. The results show that the proposed method still has an accuracy advantage.
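    The bound computation described in this abstract refines linear symbolic propagation (as in DeepPoly) with multiple back-propagation paths. As a much simpler, hedged illustration of the underlying idea of computing per-node lower and upper bounds layer by layer, the sketch below performs plain interval bound propagation through a tiny ReLU network; the weights and input box are made up for illustration, and the method shown is deliberately coarser than the paper's.

```python
import numpy as np

def interval_affine(lower, upper, W, b):
    # Propagate an axis-aligned box through x -> W x + b.
    # Positive weights take the matching bound, negative weights the opposite one.
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    new_lower = W_pos @ lower + W_neg @ upper + b
    new_upper = W_pos @ upper + W_neg @ lower + b
    return new_lower, new_upper

def interval_relu(lower, upper):
    # ReLU is monotone, so bounds are simply clipped at zero.
    return np.maximum(lower, 0), np.maximum(upper, 0)

def propagate(layers, lower, upper):
    # layers: list of (W, b); ReLU is applied between affine layers.
    for i, (W, b) in enumerate(layers):
        lower, upper = interval_affine(lower, upper, W, b)
        if i < len(layers) - 1:
            lower, upper = interval_relu(lower, upper)
    return lower, upper

# Toy 2-2-1 network with the input box [0,1] x [0,1]; numbers are illustrative.
layers = [(np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, -1.0])),
          (np.array([[1.0, 1.0]]), np.array([0.5]))]
lo, hi = propagate(layers, np.zeros(2), np.ones(2))
print(lo, hi)   # sound lower/upper bounds on the network output
```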
    2016,27(1):45-71 , DOI: 10.13328/j.cnki.jos.004914
    [Abstract] (30496) [HTML] (5162) [PDF 880.96 K] (39188)
    Abstract:
    Android is a modern and highly popular software platform for smartphones. According to reports, Android accounted for 81% of all smartphones shipped in 2014, exceeding 1 billion units worldwide for the first time ever, with Apple, Microsoft, Blackberry, and Firefox trailing a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2018,29(5):1471-1514 , DOI: 10.13328/j.cnki.jos.005519
    [Abstract] (6667) [HTML] (7014) [PDF 4.38 M] (37524)
    Abstract:
    Computer aided detection/diagnosis (CAD) can improve the accuracy of diagnosis, reduce false positives, and provide decision support for doctors. The main purpose of this paper is to analyze the latest developments in computer aided diagnosis tools. Focusing on the incidence sites of the four most fatal cancers, major recent publications on CAD applications in different medical imaging areas are reviewed in this survey according to different imaging techniques and diseases. Furthermore, a multidimensional analysis is made of the research in terms of image datasets, algorithms, and evaluation methods. Finally, existing problems, research trends, and development directions in the field of medical image CAD systems are discussed.
    2015,26(1):62-81 , DOI: 10.13328/j.cnki.jos.004701
    [Abstract] (17228) [HTML] (5781) [PDF 1.04 M] (36466)
    Abstract:
    Network abstraction brings about the birth of software-defined networking (SDN). SDN decouples the data plane and the control plane, and simplifies network management. The paper starts with a discussion of the background of the birth and development of SDN, outlining its architecture, which includes the data layer, control layer, and application layer. The key technologies are then elaborated according to the hierarchical architecture of SDN, with the characteristics of consistency, availability, and tolerance analyzed in particular. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized at the end.
    2012,23(4):962-986 , DOI: 10.3724/SP.J.1001.2012.04175
    [Abstract] (19267) [HTML] (0) [PDF 2.09 M] (36308)
    Abstract:
    Considered as the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, so computer or data failures can easily occur. Such a large number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure costs and power costs. Therefore, fault tolerance, scalability, and power consumption of the distributed storage of a data center become key parts of cloud computing technology for ensuring data availability and reliability. In this paper, a survey is made of the state of the art of the key technologies in cloud computing in the following aspects: design of the data center network, organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. Firstly, many kinds of classical topologies of data center networks are introduced and compared. Secondly, current fault-tolerant storage techniques are discussed, and data replication and erasure code strategies are especially compared. Thirdly, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed, and future research trends are predicted.
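    As a back-of-the-envelope illustration of the replication versus erasure coding comparison mentioned above: n-way replication tolerates n-1 lost copies at n times the storage, while a (k, m) erasure code (e.g., Reed-Solomon) tolerates m lost blocks at (k+m)/k times the storage. The snippet below only computes these figures; the parameter choices are illustrative and not taken from the paper.

```python
def replication_overhead(n_replicas):
    # n-way replication: tolerates n-1 lost copies, stores the data n times.
    return {"failures_tolerated": n_replicas - 1, "storage_factor": float(n_replicas)}

def erasure_code_overhead(k_data, m_parity):
    # (k, m) erasure coding: any k of the k+m blocks can rebuild the data.
    return {"failures_tolerated": m_parity, "storage_factor": (k_data + m_parity) / k_data}

print(replication_overhead(3))      # {'failures_tolerated': 2, 'storage_factor': 3.0}
print(erasure_code_overhead(6, 3))  # {'failures_tolerated': 3, 'storage_factor': 1.5}
```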
    2012,23(1):1-20 , DOI: 10.3724/SP.J.1001.2012.04100
    [Abstract] (14909) [HTML] (0) [PDF 1017.73 K] (36220)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2012,23(1):32-45 , DOI: 10.3724/SP.J.1001.2012.04091
    [Abstract] (18953) [HTML] (0) [PDF 408.86 K] (35652)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can scale out cost-effectively are needed to deal with such big data. Relational data management technology has a history of nearly 40 years, but it now encounters the tough obstacle of scalability: relational techniques cannot easily handle very large data. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and expanded their application from Web search to territories that used to be occupied by relational database systems. They challenge relational techniques with high availability, high scalability, and massive parallel processing capability. The relational community, after losing the big deal of Web search, has begun to learn from MapReduce, while MapReduce also borrows valuable ideas from the relational community to improve performance. Relational techniques and MapReduce compete with and learn from each other, and a new data analysis platform and ecosystem are emerging. Eventually the two camps of techniques will find their right places in the new ecosystem of big data analysis.
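    Since MapReduce is cited above as the typical non-relational representative, the following minimal sketch shows its programming model (map, shuffle, reduce) as a sequential Python word count; the function names are illustrative and do not correspond to any specific framework's API.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit (word, 1) for every word in one input split.
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: sum the counts for one word.
    return key, sum(values)

documents = ["big data needs parallel processing",
             "MapReduce makes parallel processing simple"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(result["parallel"])   # 2
```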
    2020,31(7):2245-2282 , DOI: 10.13328/j.cnki.jos.006037
    [Abstract] (3313) [HTML] (6632) [PDF 967.02 K] (35043)
    Abstract:
    Ultrasonography is the first choice of imaging examination and preoperative evaluation for thyroid and breast cancer. However, the ultrasonic characteristics of benign and malignant nodules commonly overlap, and diagnosis heavily relies on the operator's experience rather than on quantitative and stable methods. In recent years, medical imaging analysis based on computer technology has developed rapidly, and a series of landmark breakthroughs have been made, providing effective decision support for medical imaging diagnosis. In this work, the research progress of computer vision and image recognition technologies in thyroid and breast ultrasound images is studied. A series of key technologies involved in automatic diagnosis of ultrasound images forms the main line of the work. The major algorithms of recent years are summarized and analyzed, such as ultrasound image preprocessing, lesion localization and segmentation, and feature extraction and classification. Moreover, a multidimensional analysis is made of the algorithms, datasets, and evaluation methods. Finally, existing problems related to the automatic analysis of these two kinds of ultrasound imaging are discussed, and research trends and development directions in the field of ultrasound image analysis are outlined.
    2005,16(5):857-868
    [Abstract] (20060) [HTML] (0) [PDF 489.65 K] (33690)
    Abstract:
    Wireless sensor networks, a novel technology for acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the network, is a challenging one, yet extremely crucial for many applications. In this paper, the evaluation criteria of performance and a taxonomy for wireless sensor network self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed, and the directions of research in this area are introduced.
    2013,24(1):77-90 , DOI: 10.3724/SP.J.1001.2013.04339
    [Abstract] (11436) [HTML] (0) [PDF 0.00 Byte] (32404)
    Abstract:
    The task parallel programming model is widely used for parallel programming on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and the supporting mechanisms used in task parallel programming models, and discusses issues and the latest achievements from three perspectives: parallelism expression, data management, and task scheduling. In the end, some future trends in this area are discussed.
    2011,22(1):115-131 , DOI: 10.3724/SP.J.1001.2011.03950
    [Abstract] (13996) [HTML] (0) [PDF 845.91 K] (31734)
    Abstract:
    The Internet traffic model is a key issue for network performance management, quality of service management, and admission control. The paper first summarizes the primary characteristics and metrics of Internet traffic, and illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
    2017,28(4):959-992 , DOI: 10.13328/j.cnki.jos.005143
    [Abstract] (9930) [HTML] (7375) [PDF 3.58 M] (31534)
    Abstract:
    The development of the mobile Internet and the popularity of mobile terminals produce massive trajectory data of moving objects in the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in the city, and the changes of the atmospheric environment. However, trajectory data can also be exploited to disclose moving objects' private information (e.g., behaviors, hobbies, and social relationships). Accordingly, attackers can easily access moving objects' private information by digging into their trajectory data, such as activities and check-in locations. On another research front, quantum computation presents an important theoretical direction for mining big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to handle trajectory big data could make some complex problems solvable and achieve higher efficiency. This paper reviews the key technologies of processing trajectory data. First, the concept and characteristics of trajectory data are introduced, and the pre-processing methods, including noise filtering and data compression, are summarized. Then, the trajectory indexing and querying techniques and the current achievements of mining trajectory data, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preservation with respect to trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing frameworks and data visualization, are presented in detail. Some possible ways of applying quantum computation to trajectory data processing, as well as the implementation of some core trajectory mining algorithms with quantum computation, are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
    2010,21(2):344-358
    [Abstract] (8589) [HTML] (0) [PDF 1.01 M] (29390)
    Abstract:
    In this paper, existing intrusion tolerance and self-destruction technologies are integrated into autonomic computing in order to construct an autonomic dependability model based on SM-PEPA (semi-Markov performance evaluation process algebra), which is capable of formal analysis and verification. It can hierarchically anticipate threats to dependability (TtD) at different levels in a self-managing manner to satisfy the special dependability requirements of mission-critical systems. Based on this model, a quantification approach is proposed from the view of steady-state probability to evaluate autonomic dependability. Finally, this paper analyzes the impacts of the model parameters on autonomic dependability in a case study, and the experimental results demonstrate that improving the detection rate of TtD as well as the success rate of self-healing greatly increases autonomic dependability.
    2014,25(1):37-50 , DOI: 10.13328/j.cnki.jos.004497
    [Abstract] (10669) [HTML] (5601) [PDF 929.87 K] (28143)
    Abstract:
    This paper surveys the state of the art of speech emotion recognition (SER) and presents an outlook on the trends of future SER technology. First, the survey summarizes and analyzes SER in detail from five perspectives, including emotion representation models, representative emotional speech corpora, extraction of emotion-related acoustic features, SER methods, and applications. Then, based on the survey, the challenges faced by current SER research are summarized. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, and presents a detailed comparison and analysis of these methods.
    2011,22(6):1299-1315 , DOI: 10.3724/SP.J.1001.2011.03993
    [Abstract] (11759) [HTML] (0) [PDF 987.90 K] (28102)
    Abstract:
    Attribute-based encryption (ABE) takes attributes as the public key and associates the ciphertext and the user's secret key with attributes, so that it can support expressive access control policies. This dramatically reduces the network bandwidth and the sending node's operation cost in fine-grained access control of data sharing. Therefore, ABE has a broad prospect of application in the area of fine-grained access control. After analyzing the basic ABE system and its two variants, key-policy ABE (KP-ABE) and ciphertext-policy ABE (CP-ABE), this study elaborates the research problems relating to ABE systems, including access structure design for CP-ABE, attribute key revocation, key abuse, and multi-authority ABE, with an extensive comparison of their functionality and performance. Finally, this study discusses the problems that remain to be solved and the main research directions in ABE.
    2018,29(10):2966-2994 , DOI: 10.13328/j.cnki.jos.005551
    [Abstract] (10628) [HTML] (7098) [PDF 610.06 K] (26813)
    Abstract:
    In recent years, the rapid development of Internet technology and Web applications has triggered an explosion of various data on the Internet, generating a large amount of valuable knowledge. How to organize, represent, and analyze this knowledge has attracted much attention. The knowledge graph was thus developed to organize this knowledge in a semantic and visualized manner. Knowledge reasoning over knowledge graphs has then become one of the hot research topics and plays an important role in many applications such as vertical search and intelligent question answering. The goal of knowledge reasoning over knowledge graphs is to infer new facts or identify erroneous facts according to existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graphs is more diversified, due to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graphs. Starting with the basic concept of knowledge reasoning, this paper presents a survey of the recently developed methods for knowledge reasoning over knowledge graphs. Specifically, the research progress is reviewed in detail from two aspects, one-step reasoning and multi-step reasoning, each including rule based reasoning, distributed embedding based reasoning, neural network based reasoning, and hybrid reasoning. Finally, future research directions and an outlook on knowledge reasoning over knowledge graphs are discussed.
    2009,20(3):524-545
    [Abstract] (17657) [HTML] (0) [PDF 1.09 M] (26391)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2013,24(4):825-842 , DOI: 10.3724/SP.J.1001.2013.04369
    [Abstract] (9063) [HTML] (0) [PDF 1.09 M] (26030)
    Abstract:
    Honeypot is a proactive defense technology, introduced by the defense side to change the asymmetric situation of the network attack-defense game. By deploying honeypots, i.e., security resources without any production purpose, defenders can deceive attackers into illegally taking advantage of the honeypots, and then capture and analyze the attack behaviors to understand the attack tools and methods and to learn the attackers' intentions and motivations. Honeypot technology has won the sustained attention of the security community, made considerable progress, and gained wide application, becoming one of the main technical means of Internet security threat monitoring and analysis. In this paper, the origin and evolution of honeypot technology are presented first. Next, the key mechanisms of honeypot technology are comprehensively analyzed, the development of honeypot deployment structures is reviewed, and the latest applications of honeypot technology in Internet security threat monitoring, analysis, and prevention are summarized. Finally, the problems of honeypot technology, its development trends, and further research directions are discussed.
    2018,29(10):3068-3090 , DOI: 10.13328/j.cnki.jos.005607
    [Abstract] (9478) [HTML] (9886) [PDF 2.28 M] (25872)
    Abstract:
    Design problems are ubiquitous in scientific research and industrial applications. In recent years, Bayesian optimization, a very effective global optimization algorithm, has been widely applied to design problems. By structuring the probabilistic surrogate model and the acquisition function appropriately, the Bayesian optimization framework can obtain the optimal solution within a small number of function evaluations, and thus it is very suitable for solving extremely complex optimization problems whose objective functions cannot be expressed analytically, or are non-convex, multimodal, and computationally expensive. This paper provides a detailed analysis of Bayesian optimization in terms of methodology and application areas, and discusses its research status and open problems for future research. This work is hopefully beneficial to researchers from the related communities.
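    A minimal sketch of the surrogate-plus-acquisition loop described above: a Gaussian process surrogate (scikit-learn) is refit after every evaluation, and the next point is chosen by expected improvement over randomly sampled candidates. The objective function, search range, and budget are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Expensive black-box function to minimize (illustrative stand-in).
    return np.sin(3 * x) + 0.1 * x ** 2

def expected_improvement(candidates, gp, best_y):
    # Acquisition function: expected improvement for minimization.
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    imp = best_y - mu
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(3, 1))            # a few initial evaluations
y = objective(X).ravel()

for _ in range(15):                            # small evaluation budget
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    candidates = rng.uniform(-3, 3, size=(256, 1))
    ei = expected_improvement(candidates, gp, y.min())
    x_next = candidates[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best x:", X[np.argmin(y)], "best value:", y.min())
```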
    2009,20(1):124-137
    [Abstract] (17314) [HTML] (0) [PDF 1.06 M] (25714)
    Abstract:
    The appearance of plenty of intelligent devices equipped with short-range wireless communications has boosted the fast rise of wireless ad hoc network applications. However, in many realistic application environments, nodes form a disconnected network most of the time due to nodal mobility, low density, lossy links, etc. The conventional communication model of mobile ad hoc networks (MANET) requires at least one path to exist from the source to the destination node, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, and implement communication between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has captured great interest from researchers. This paper first introduces the concepts and theories of opportunistic networks and some current typical applications. It then elaborates on the popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Some other interesting research points, such as communication middleware, cooperation and security problems, and new applications, are stated briefly. Finally, the paper concludes and looks forward to the possible research focuses of opportunistic networks in the future.
    2011,22(3):381-407 , DOI: 10.3724/SP.J.1001.2011.03934
    [Abstract] (10677) [HTML] (0) [PDF 614.69 K] (25403)
    Abstract:
    The popularity of the Internet and the boom of the World Wide Web foster innovative changes in software technology that give birth to a new form of software, networked software, which delivers diversified and personalized on-demand services to the public. With the ever-increasing expansion of applications and users, the scale and complexity of networked software are growing beyond the information processing capability of human beings, which brings software engineers a series of challenges. In order to come to a scientific understanding of this kind of ultra-large-scale artificial complex system, a survey of the infrastructure, application services, and social interactions of networked software is conducted from a three-dimensional perspective of cyberization, servicesation, and socialization. Interestingly, most of them have been found to share the same global characteristics of complex networks, such as "Small World" and "Scale Free". Next, the impact of the empirical study on software engineering research and practice and its implications for further investigations are systematically set forth. The convergence of software engineering and other disciplines will put forth new ideas and thoughts, breed a new way of thinking, and bring new methodologies for the study of networked software. This convergence is also expected to achieve innovations in the theories, methods, and key technologies of software engineering to promote the rapid development of the software service industry in China.
    2019,30(2):440-468 , DOI: 10.13328/j.cnki.jos.005659
    [Abstract] (9532) [HTML] (7881) [PDF 3.27 M] (25391)
    Abstract:
    In recent years, applying deep learning (DL) to image semantic segmentation (ISS) has been widely adopted due to its state-of-the-art performance and high-quality results. This paper systematically reviews the contribution of DL to the field of ISS. Different methods of ISS based on DL (ISSbDL) are summarized. These methods are divided into ISS based on Regional Classification (ISSbRC) and ISS based on Pixel Classification (ISSbPC) according to the image segmentation characteristics and segmentation granularity. Then, the methods of ISSbPC are surveyed from two points of view: ISS based on Fully Supervised Learning (ISSbFSL) and ISS based on Weakly Supervised Learning (ISSbWSL). The representative algorithms of each method are introduced and analyzed, and the basic workflow, framework, advantages, and disadvantages of these methods are analyzed and compared in detail. In addition, the related experiments of ISS are analyzed and summarized, and the common datasets and performance evaluation indexes in ISS experiments are introduced. Finally, possible research directions and trends are given and analyzed.
    2004,15(11):1583-1594
    [Abstract] (9424) [HTML] (0) [PDF 1.57 M] (25114)
    Abstract:
    Uncertainty exists widely in the subjective and objective world. Among all kinds of uncertainty, randomness and fuzziness are the most important and fundamental. In this paper, the relationship between randomness and fuzziness is discussed. Uncertain states and their changes can be measured by entropy and hyper-entropy, respectively. Taking advantage of entropy and hyper-entropy, the uncertainty of chaos, fractals, and complex networks in their various evolution and differentiation is further studied. A simple and effective way is proposed to simulate uncertainty by means of knowledge representation, which provides a basis for the automation of both logical and image thinking with uncertainty. AI (artificial intelligence) with uncertainty is a new cross-discipline, which covers computer science, physics, mathematics, brain science, psychology, cognitive science, biology, and philosophy, and results in the automation of representation, processing, and thinking for uncertain information and knowledge.
    2018,29(7):2092-2115 , DOI: 10.13328/j.cnki.jos.005589
    [Abstract] (11059) [HTML] (7811) [PDF 2.52 M] (24969)
    Abstract:
    Blockchain is a distributed public ledger technology that originates from the digital cryptocurrency bitcoin. Its development has attracted wide attention from both industry and academia. Blockchain has the advantages of decentralization, trustworthiness, anonymity, and immutability. It breaks through the limitations of traditional center-based technology and has broad development prospects. This paper introduces the research progress of blockchain technology and its application in the field of information security. Firstly, the basic theory and model of blockchain are introduced from five aspects: basic framework, key technologies, technical features, application modes, and application areas. Secondly, from the perspective of the current research situation of blockchain in the field of information security, this paper summarizes the research progress of blockchain in authentication technology, access control technology, and data protection technology, and compares the characteristics of the various studies. Finally, the application challenges of blockchain technology are analyzed, and the development outlook of blockchain in the field of information security is highlighted. This study intends to provide a useful reference for future research work.
    2014,25(4):839-862 , DOI: 10.13328/j.cnki.jos.004558
    [Abstract] (15811) [HTML] (4324) [PDF 1.32 M] (24589)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. The research and discussion on batch computing in big data environments are comparatively sufficient. However, how to efficiently deal with stream computing so as to meet many requirements, such as low latency, high throughput, and continuously reliable running, and how to build efficient stream big data computing systems, remain great challenges in big data computing research. This paper provides a study of the data computing architecture and the key issues of stream computing in big data environments. Firstly, the paper gives a brief summary of three application scenarios of stream computing in business intelligence, marketing, and public service, and shows the distinctive features of stream computing in big data environments, such as real-time, volatility, burstiness, irregularity, and infinity. A well-designed stream computing system always optimizes system structure, data transmission, application interfaces, high availability, and so on. Subsequently, the paper offers detailed analyses and comparisons of five typical open-source stream computing systems in big data environments. Finally, the paper specifically addresses some new challenges of stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2005,16(1):1-7
    [Abstract] (22770) [HTML] (0) [PDF 614.61 K] (24498)
    Abstract:
    The paper offers some reflections on the following four aspects: 1) from the law of the development of things, revealing the development history of software engineering technology; 2) from the perspective of the natural characteristics of software, analyzing the construction of each abstraction layer of the virtual machine; 3) from the perspective of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the emergence of Internet technology, exploring the development trend of software technology.
    2006,17(9):1848-1859
    [Abstract] (12979) [HTML] (0) [PDF 770.40 K] (24030)
    Abstract:
    In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the state-of-the-art challenging issues and research trends for content information processing of the Internet and other complex applications, this paper presents a survey of the up-to-date development of text categorization based on machine learning, including models, algorithms, and evaluation. It is pointed out that problems such as nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization. Possible solutions to these problems are also discussed. Finally, some future directions of research are given.
    2013,24(2):295-316 , DOI: 10.3724/SP.J.1001.2013.04336
    [Abstract] (10050) [HTML] (0) [PDF 0.00 Byte] (23870)
    Abstract:
    Under the new application mode, traditional hierarchical data centers face several limitations in size, bandwidth, scalability, and cost. In order to meet the needs of new applications, data center networks should fulfill, at low cost, requirements such as high scalability, low configuration overhead, robustness, and energy saving. First, the shortcomings of the traditional data center network architecture are summarized, and new requirements are pointed out. Secondly, the existing proposals are divided into two categories, i.e., server-centric and network-centric. Then, several representative architectures of these two categories are reviewed and compared in detail. Finally, the future directions of data center networks are discussed.
    2012,23(8):2058-2072 , DOI: 10.3724/SP.J.1001.2012.04237
    [Abstract] (10370) [HTML] (0) [PDF 800.05 K] (23834)
    Abstract:
    The distributed denial of service (DDoS) attack is a major threat to the current network. Based on the attack packet level, this study divides DDoS attacks into network-level DDoS attacks and application-level DDoS attacks. It then analyzes the detection and control methods of these two kinds of DDoS attacks in detail, as well as the drawbacks of different control methods implemented at different network positions. Finally, the study analyzes the drawbacks of current detection and control methods and the development trend of DDoS filtering systems, and corresponding technological challenges are also proposed.
    2023,34(2):625-654 , DOI: 10.13328/j.cnki.jos.006696
    [Abstract] (3854) [HTML] (5276) [PDF 3.04 M] (23789)
    Abstract:
    Source code bug (vulnerability) detection is the process of judging whether there are unexpected behaviors in program code. It is widely used in software engineering tasks such as software testing and software maintenance, and plays a vital role in software functional assurance and application security. Traditional vulnerability detection research is based on program analysis, which usually requires strong domain knowledge and complex calculation rules, and faces the problem of state explosion, resulting in limited detection performance and leaving considerable room for improvement in false positive and false negative rates. In recent years, the vigorous development of the open source community has accumulated massive amounts of data centered on open source code. In this context, the feature learning capability of deep learning can automatically learn semantically rich code representations, thereby providing a new way for vulnerability detection. This study collects the latest high-quality papers in this field and systematically summarizes and explains the current methods from two aspects: vulnerability code datasets and deep learning vulnerability detection models. Finally, it summarizes the main challenges faced by research in this field and looks forward to possible future research focuses.
    2005,16(10):1743-1756
    [Abstract] (10526) [HTML] (0) [PDF 545.62 K] (23746)
    Abstract:
    This paper presents a survey of the theory of provable security and its applications to the design and analysis of security protocols. It clarifies what provable security is, explains some basic notions involved in the theory of provable security, and illustrates the basic idea of the random oracle model. It also reviews the development and advances of provably secure public-key encryption and digital signature schemes, in the random oracle model or the standard model, as well as the applications of provable security to the design and analysis of session-key distribution protocols and their advances.
    2021,32(2):496-518 , DOI: 10.13328/j.cnki.jos.006140
    [Abstract] (6407) [HTML] (9926) [PDF 2.20 M] (23702)
    Abstract:
    Deep learning has achieved great success in the field of computer vision, surpassing many traditional methods. However, in recent years, deep learning technology has been abused in the production of fake videos, making fake videos represented by Deepfakes flood the Internet. This technique produces pornographic movies, fake news, and political rumors by tampering with or replacing the face information of the original videos and synthesizing fake speech. In order to eliminate the negative effects brought by such forgery technologies, many researchers have conducted in-depth research on the identification of fake videos and proposed a series of detection methods to help institutions or communities identify such fake videos. Nevertheless, current detection technology still has many limitations, such as dependence on data of specific distributions or specific compression ratios, and lags far behind fake video generation technology. In addition, different researchers handle the problem from different angles, and the datasets and evaluation indicators used are not uniform. So far, the academic community still lacks a unified understanding of deep forgery and detection technology, and the architecture of research on deep forgery and detection technology is not clear. In this review, the development of deep forgery and detection technologies is surveyed, and existing research works are systematically summarized and scientifically classified. Finally, the social risks posed by the spread of Deepfakes technology are discussed, the limitations of detection technology are analyzed, and the challenges and potential research directions of detection technology are discussed, aiming to provide guidance for follow-up researchers to further promote the development and deployment of Deepfakes detection technology.
    2003,14(9):1621-1628
    [Abstract] (13470) [HTML] (0) [PDF 680.35 K] (23513)
    Abstract:
    Recommender systems are among the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extreme sparsity of user rating data. Traditional similarity measures work poorly in this situation, causing the quality of recommendations to decrease dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method first predicts the ratings of items that users have not rated using item similarity, and then uses a new similarity measure to find the target user's neighbors. The experimental results show that this method can effectively alleviate the extreme sparsity of user rating data and provide better recommendation results than traditional collaborative filtering algorithms.
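    A small sketch in the spirit of the item-based rating prediction described above: missing ratings are estimated as a similarity-weighted average of the user's ratings on similar items. Plain cosine similarity and the toy rating matrix are illustrative assumptions; the paper's new similarity measure is not reproduced here.

```python
import numpy as np

def item_similarity(R):
    # Cosine similarity between item columns; 0 entries mean "not rated".
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0
    sim = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)      # an item should not predict itself
    return sim

def predict_rating(R, sim, user, item):
    # Similarity-weighted average of the user's ratings on other items.
    rated = np.nonzero(R[user])[0]
    weights = sim[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ R[user, rated] / weights.sum())

# Toy user-item rating matrix (rows: users, columns: items, 0 = missing).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
sim = item_similarity(R)
print(round(predict_rating(R, sim, user=1, item=2), 2))  # estimate for a missing rating
```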
    2020,31(7):2127-2156 , DOI: 10.13328/j.cnki.jos.006052
    [Abstract] (6714) [HTML] (7793) [PDF 802.56 K] (23487)
    Abstract:
    Machine learning has become a core technology in areas such as big data, the Internet of Things, and cloud computing. Training machine learning models requires a large amount of data, which is often collected by means of crowdsourcing and contains a large amount of private data, including personally identifiable information (such as phone numbers and ID numbers) and sensitive information (such as financial and health care data). How to protect these data at low cost and with high efficiency is an important issue. This paper first introduces the concept of machine learning, explains various definitions of privacy in machine learning, and demonstrates the various privacy threats encountered in machine learning. It then elaborates on the working principles and outstanding features of the mainstream technologies for machine learning privacy protection. The research achievements in this field are summarized according to differential privacy, homomorphic encryption, and secure multi-party computation, respectively. On this basis, the paper comparatively analyzes the main advantages and disadvantages of different privacy-preserving mechanisms for machine learning. Finally, the development trend of privacy preservation for machine learning is discussed, and possible research directions in this field are proposed.
    2010,21(7):1605-1619
    [Abstract] (10160) [HTML] (0) [PDF 856.25 K] (23197)
    Abstract:
    The rapid development of the Internet leads to an increase in system complexity and uncertainty. Traditional network management cannot meet the requirements and shall evolve toward fusion-based cyberspace situational awareness (CSA). Based on an analysis of functional shortcomings and development requirements, this paper introduces CSA as well as its origin, concept, objectives, and characteristics. Firstly, a CSA research framework is proposed and the research history is investigated, based on which the main aspects and existing issues of the research are analyzed. Meanwhile, assessment methods are divided into three categories: mathematical models, knowledge reasoning, and pattern recognition. Then, this paper discusses CSA from three aspects, namely models, knowledge representation, and assessment methods, and details the main ideas, assessment processes, merits, and shortcomings of novel methods, comparing many typical methods. The current application research of CSA in the fields of security, transmission, survivability, system evaluation, and so on is presented. Finally, this paper points out the development directions of CSA and offers conclusions from the perspectives of the issue system, technical system, and application system.
    2016,27(11):2855-2869 , DOI: 10.13328/j.cnki.jos.004932
    [Abstract] (3276) [HTML] (2871) [PDF 1.85 M] (23130)
    Abstract:
    With the proliferation of Chinese social networks (especially the rise of Weibo), the productivity and lifestyle of the country's society are more and more profoundly influenced by Chinese Internet public events. Due to the lack of effective technical means, the efficiency of information processing is limited. This paper proposes a method for calculating the information entropy of public events. First, a mathematical model of event information content is built. Then, the information entropy of the multidimensional random variables describing a public event is calculated based on Shannon information theory. Furthermore, a new technical index for quantitative analysis of Internet public events is put forward, laying a foundation for further research.
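    For a discrete multidimensional variable estimated from observed events, the entropy calculation referred to above reduces to the standard Shannon formula H(X) = -Σ p(x) log2 p(x) over the joint empirical distribution. The sketch below computes it from raw observations; the event attributes and values are purely illustrative, not taken from the paper.

```python
import math
from collections import Counter

def shannon_entropy(observations):
    # Empirical Shannon entropy (in bits) of a sequence of discrete outcomes.
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Each observation is a tuple of discrete attributes of one event,
# e.g. (topic, sentiment, region); the values here are only illustrative.
events = [("policy", "neg", "north"), ("policy", "neg", "south"),
          ("sports", "pos", "north"), ("policy", "pos", "north")]
print(round(shannon_entropy(events), 3))                 # joint entropy of the 3-D variable
print(round(shannon_entropy(e[0] for e in events), 3))   # entropy of one dimension alone
```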
    2013,24(8):1786-1803 , DOI: 10.3724/SP.J.1001.2013.04416
    [Abstract] (14266) [HTML] (0) [PDF 1.04 M] (23064)
    Abstract:
    Many application-oriented NoSQL database systems have been developed to satisfy the new requirements of big data management. This paper surveys research on typical NoSQL databases based on the key-value data model. First, the characteristics of big data and the key technical issues in supporting big data management are introduced. Then, frontier efforts and research challenges are presented, including system architecture, data models, access modes, indexing, transactions, system elasticity, load balancing, replica strategies, data consistency, flash caching, MapReduce-based data processing, and new-generation data management systems. Finally, research prospects are given.
