    2025,36(5):1907-1923, DOI: 10.13328/j.cnki.jos.007176, CSTR: 32375.14.jos.007176
    Abstract:
Safety verification of continuous dynamical systems is an important research problem, yet over the years various verification methods have remained very limited in the scale of problems they can handle. For a given continuous dynamical system, this study proposes an algorithm that generates a set of compositional probably approximately correct (PAC) barrier certificates through a counterexample-guided approach. A formal description of the infinite-time-domain safety verification problem is given in terms of probability and statistics. By establishing and solving a mixed-integer programming model based on the big-M method, the barrier certificate problem is transformed into a constrained optimization problem, and nonlinear inequalities are linearized over intervals using the mean value theorem of differentiation. Finally, this study implements the compositional PAC barrier certificate generator CPBC and evaluates its performance on 11 benchmark systems. The experimental results show that CPBC can successfully verify the safety of each dynamical system under different specified safety requirement thresholds. Compared with existing methods, the proposed method can more efficiently generate reliable probabilistic barrier certificates for complex or high-dimensional systems, with the verified examples reaching up to hundreds of dimensions.
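As a point of reference for the two techniques named in this abstract (generic textbook forms, not necessarily the authors' exact formulation): a big-M encoding uses a binary variable $z \in \{0,1\}$ and a sufficiently large constant $M$ to switch a constraint on or off,
$ g(x) \le M(1-z), $
so that $g(x) \le 0$ is enforced only when $z = 1$; and the mean value theorem yields linear envelopes of a nonlinear term on an interval,
$ g(a) + L\,(x-a) \;\le\; g(x) \;\le\; g(a) + U\,(x-a), \quad x \in [a,b], $
where $L$ and $U$ are lower and upper bounds on $g'$ over $[a,b]$.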
    2025,36(5):1924-1948, DOI: 10.13328/j.cnki.jos.007178, CSTR: 32375.14.jos.007178
    Abstract:
Software traceability is considered critical to trustworthy software engineering, as it ensures software reliability by tracking the software development process. Despite significant progress in automatic software traceability recovery techniques in recent years, their application in real-world commercial software projects does not meet expectations. This study investigates the application of learning-based software traceability recovery classifier models in commercial software projects and uncovers three critical challenges faced in industrial settings that contribute to underperforming traceability models: low-quality raw data, data sparsity, and class imbalance. In response to these challenges, STRACE(AL+SSL), a software traceability recovery framework that integrates active learning and semi-supervised learning, is proposed. By strategically selecting valuable samples for annotation and generating high-quality pseudo-labeled samples, STRACE(AL+SSL) effectively harnesses unlabeled data to address these data-related challenges. Multiple comparative experiments are conducted with nearly one million issue-commit trace pair samples from 10 different enterprise projects, and the results validate the effectiveness of the proposed framework for real-world software traceability recovery tasks. The ablation results show that the unlabeled samples selected by active learning in STRACE(AL+SSL) play a crucial role in the traceability recovery task. Additionally, the optimal combination of sample selection strategies in STRACE(AL+SSL) is confirmed, namely CBST-Adjust for semi-supervised sample rebalancing and SMI_Flqmi, which is recognized for its cost-effectiveness and efficiency, for active learning.
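For readers unfamiliar with the AL+SSL combination described here, the sketch below shows a generic active-learning plus self-training round for an issue-commit trace classifier; the helper names, thresholds, and the random-forest classifier are illustrative assumptions, not the STRACE(AL+SSL) implementation (which uses SMI_Flqmi for selection and CBST-Adjust for rebalancing).

```python
# Minimal sketch of one active-learning + self-training round for an issue-commit
# trace-link classifier. Helper names, thresholds, and the random-forest classifier
# are illustrative assumptions, not the STRACE(AL+SSL) implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def al_ssl_round(clf, X_labeled, y_labeled, X_unlabeled, budget=100, pseudo_threshold=0.95):
    clf.fit(X_labeled, y_labeled)
    proba = clf.predict_proba(X_unlabeled)[:, 1]       # P(true trace link)

    # Active learning: send the most uncertain pairs to human annotators.
    uncertainty = np.abs(proba - 0.5)
    to_annotate = np.argsort(uncertainty)[:budget]

    # Semi-supervised learning: keep highly confident predictions as pseudo-labels.
    confident = np.where((proba >= pseudo_threshold) | (proba <= 1 - pseudo_threshold))[0]
    pseudo_labels = (proba[confident] >= 0.5).astype(int)
    return to_annotate, confident, pseudo_labels

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced")  # mitigates class imbalance
```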
    2025,36(5):1949-1973, DOI: 10.13328/j.cnki.jos.007185, CSTR: 32375.14.jos.007185
    Abstract:
Service descriptions provide limited information about application scenarios, creating a gap between Web API recommendations for Mashup service components based on functional similarity calculation and the desired expectations, so the accuracy of function matching needs to be enhanced. While some researchers utilize collaborative associations among Web APIs to enhance recommendation compatibility, they overlook the adverse effects of functional associations on Mashup service creation, thereby limiting the improvement of recommendation diversity. To address this issue, this study proposes a Web API recommendation method for Mashup service components that integrates latent related words and heterogeneous association compatibility. The method extracts latent related words associated with application scenarios for both Mashup requirements and Web APIs and integrates them into the generation of function vectors. By enhancing the accuracy of functional similarity matching, it obtains a high-quality candidate set of Web API components. Function associations and collaboration associations are modeled as heterogeneous service associations, and heterogeneous association compatibility replaces the collaboration compatibility used in traditional methods, thus enhancing the recommendation diversity of Web APIs. Compared with existing methods, the proposed approach improves Recall, Precision, and NDCG by 4.17% to 16.05%, 4.46% to 16.62%, and 5.57% to 17.26%, respectively, while the diversity index ILS is reduced by 8.22% to 15.23%. The Recall and Precision values for cold-start Web API recommendation reach 47.71% and 46.58% of those for non-cold-start Web API recommendation, respectively. Experimental results demonstrate that the proposed method not only enhances the quality of Web API recommendation but also yields favorable results for cold-start Web API recommendations.
    2025,36(5):1974-2005, DOI: 10.13328/j.cnki.jos.007191, CSTR: 32375.14.jos.007191
    Abstract:
Serverless computing is an emerging cloud computing paradigm that allows developers to focus only on application logic without managing complex underlying tasks, enabling them to quickly build applications at a finer granularity, namely the function level. With the increasing popularity of serverless computing, major cloud computing vendors have introduced their commercial serverless platforms one after another. However, the characteristics of these platforms have yet to be systematically studied and reliably compared. A comprehensive analysis of these characteristics can help developers choose an appropriate serverless platform and develop and execute serverless applications in the right way. To this end, an empirical study is conducted on the characteristics of mainstream commercial serverless platforms, covering AWS Lambda, Google Cloud Functions, Microsoft Azure Functions, and Alibaba Function Compute. The study consists of two major parts: feature summarization and runtime performance analysis. In the feature summarization, the official documents of these serverless platforms are examined, and their key features are summarized and compared in terms of development, deployment, and runtime. In the runtime performance analysis, representative benchmarks are applied to analyze the runtime performance of these serverless platforms along multiple dimensions. Specifically, key factors affecting the cold-start performance of applications, such as programming languages and memory sizes, are analyzed first, followed by a discussion of the task-execution performance of the platforms. Based on the results of the feature summarization and runtime performance analysis, this study sums up a series of findings and provides practical insights and potential research opportunities for developers, cloud computing vendors, and researchers.
    2025,36(5):2006-2025, DOI: 10.13328/j.cnki.jos.007203, CSTR: 32375.14.jos.007203
    Abstract:
Crash test sequences generated by Android automated testing tools often contain many redundant events, which makes test replay, defect comprehension, and repair difficult, and a great number of test sequence reduction works have therefore been proposed. However, current works focus only on application interface changes and ignore the internal state changes during program execution. Moreover, they model application states at a single, abstract granularity, such as control widget granularity or activity granularity, resulting in long test sequences after reduction or inefficient reduction. This study proposes an Android test sequence reduction method that combines multiple granularities based on event labeling. By taking the Android lifecycle management mechanism and data flow analysis into account to label the critical events that trigger crashes, the method narrows the sequence reduction space and adopts a strategy of coarse selection at low granularity followed by detailed reduction at high granularity. Finally, a crash test sequence set containing complex scenarios such as inter-application interaction and user input is collected, and a comparison with other test sequence reduction works on this set verifies the effectiveness of the proposed method.
    2025,36(5):2026-2042, DOI: 10.13328/j.cnki.jos.007204, CSTR: 32375.14.jos.007204
    Abstract:
    During the path coverage testing of a message passing interface (MPI) program based on evolutionary optimization, the fitness of evolutionary individuals needs to be evaluated by repeatedly executing the MPI program. However, repeated execution of an MPI program often requires high computational costs. Therefore, this study proposes an approach to generate test cases for path coverage of MPI programs guided by surrogate-assisted multi-task evolutionary optimization, which significantly reduces the actual execution times of MPI programs, thereby improving testing efficiency. Firstly, surrogate models are trained for each target sub-path in the target path of an MPI program. Then, the fitness of evolutionary individuals is estimated using the surrogate model corresponding to each target sub-path, and a candidate set of test cases is formed. Finally, all surrogate models are updated based on the candidate set and the actual fitness for each target sub-path. The proposed approach is applied to the basis path coverage testing of seven benchmark MPI programs and compared with several state-of-the-art approaches. The experimental results show that the proposed approach significantly improves testing efficiency while ensuring high effectiveness in generating test cases.
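The sketch below illustrates the general surrogate-assisted loop described in this abstract, assuming scikit-learn-style regressors as per-sub-path surrogates and a caller-supplied run_program() placeholder for the expensive MPI execution; it is a simplification, not the paper's algorithm.

```python
# Minimal sketch of surrogate-assisted fitness evaluation for path-coverage test generation.
# run_program(x) is a caller-supplied placeholder that actually executes the MPI program and
# returns one fitness value per target sub-path; one surrogate model is kept per sub-path.
import numpy as np

def evolve_one_generation(population, surrogates, archive_X, archive_y, run_program, top_k=5):
    # Cheap estimate: sum the per-sub-path fitness predicted by the surrogates (lower is better).
    estimated = np.sum([m.predict(population) for m in surrogates], axis=0)
    candidates = population[np.argsort(estimated)[:top_k]]

    # Execute only the promising candidates on the real program (the expensive step).
    real_fitness = np.array([run_program(x) for x in candidates])   # shape: (top_k, n_subpaths)

    # Update every surrogate with the newly observed (test case, fitness) pairs.
    archive_X = np.vstack([archive_X, candidates])
    archive_y = np.vstack([archive_y, real_fitness])
    for j, model in enumerate(surrogates):
        model.fit(archive_X, archive_y[:, j])
    return candidates, archive_X, archive_y
```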
    2025,36(5):2043-2063, DOI: 10.13328/j.cnki.jos.007207, CSTR: 32375.14.jos.007207
    Abstract:
The interactions between elements in contemporary software systems are notably intricate, encompassing relationships between packages, classes, and functions. Accurate comprehension of these relationships is pivotal for optimizing system structures and enhancing software quality. Analyzing inter-package relationships can help unveil dependencies between modules, thereby assisting developers in managing and organizing software architectures more effectively. A clear understanding of inter-class relationships contributes to the creation of code repositories that are more scalable and maintainable, and a clear understanding of inter-function relationships facilitates rapid identification and resolution of logical errors within programs, consequently enhancing the robustness and reliability of the software. However, current predictions of software system interactions confront challenges such as granularity disparities, inadequate features, and version changes. To address these challenges, this study constructs corresponding software network models at three granularities: software packages, classes, and functions. It introduces a novel approach combining local and global features to reinforce the analysis and prediction of software systems through feature extraction and link prediction on software networks. The approach is based on the construction and processing of software networks: the node2vec method is leveraged to learn local features of software networks, and Laplacian eigenvector encoding is combined to represent the global positional information of nodes. Subsequently, the Graph Transformer model is employed to further optimize the feature vectors of node attributes, completing the interaction prediction task of the software system. Extensive experimental validations are conducted on three Java open-source projects, encompassing within-version and cross-version interaction prediction tasks. The experimental results demonstrate that, compared to benchmark methods, the proposed approach achieves average increases of 8.2% and 8.5% in AUC and AP values, respectively, in within-version prediction tasks, and average increases of 3.5% and 2.4% in AUC and AP values, respectively, in cross-version prediction tasks.
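A minimal sketch of the feature-construction step described here, combining local node embeddings with Laplacian eigenvector positional encodings before a Transformer-style predictor; the node2vec step is stubbed out and the dimensions are illustrative assumptions.

```python
# Minimal sketch: concatenate (placeholder) node2vec-style local embeddings with
# Laplacian eigenvector positional encodings as input features for a Graph Transformer.
import numpy as np
import networkx as nx

def laplacian_positional_encoding(G, k=8):
    L = nx.normalized_laplacian_matrix(G).toarray()
    eigvals, eigvecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]                      # skip the trivial first eigenvector

def node_features(G, local_dim=64, k=8):
    local = np.random.randn(G.number_of_nodes(), local_dim)  # placeholder for node2vec vectors
    pos = laplacian_positional_encoding(G, k)
    return np.concatenate([local, pos], axis=1)     # per-node input to the Transformer
```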
Available online: May 14, 2025, DOI: 10.13328/j.cnki.jos.007293
    Abstract:
Text-image person re-identification aims to employ a text description to retrieve the target person in an image database. The main challenge of this technology is to embed image and text features into a common latent space to achieve cross-modal alignment. Many existing studies adopt separately pre-trained unimodal models to extract visual and text features and then employ segmentation or attention mechanisms to obtain explicit cross-modal alignment. However, these explicit alignment methods generally lack the underlying alignment ability needed to effectively match multimodal features, and using preset cross-modal correspondences to achieve explicit alignment may result in modal information distortion. This study proposes an implicit multi-scale alignment and interaction method for text-image person re-identification. Firstly, a semantically consistent feature pyramid network is employed to extract multi-scale features of the images, and attention weights are adopted to fuse features at different scales containing global and local information. Secondly, the association between image and text is learned using a multivariate interaction attention mechanism, which effectively captures the correspondence between different visual features and text information, narrows the gap between modalities, and achieves implicit multi-scale semantic alignment. Additionally, a foreground enhancement discriminator is adopted to highlight the target person and extract purer person features, which helps alleviate the information inequality between images and texts. Experimental results on three mainstream text-image person re-identification datasets, CUHK-PEDES, ICFG-PEDES, and RSTPReid, show that the proposed method effectively improves cross-modal retrieval performance, with Rank-1 accuracy 2%–9% higher than that of state-of-the-art algorithms.
Available online: May 14, 2025, DOI: 10.13328/j.cnki.jos.007356
    Abstract:
    Since the advent of Bitcoin, blockchain technology has profoundly influenced numerous fields. However, the absence of effective communication mechanisms between heterogeneous and isolated blockchain systems has hindered the advancement and sustainable development of the blockchain ecosystem. In response, cross-chain technology has emerged as a rapidly evolving field and a focal point of research. The decentralized nature of blockchain, coupled with the complexity of cross-chain scenarios, introduces significant security challenges. This study proposes a formal analysis of the IBC (inter-blockchain communications) protocol, one of the most widely adopted cross-chain communication protocols, to assist developers in designing and implementing cross-chain technologies with enhanced security. The IBC protocol is formalized using TLA+, a temporal logic specification language, and its critical properties are verified through the model-checking tool TLC. An in-depth analysis of the verification results reveals several issues impacting the correctness of packet transmission and token transfer. Corresponding recommendations are proposed to mitigate these security risks. The findings have been reported to the IBC developer community, with most of them receiving acknowledgment.
Available online: May 14, 2025, DOI: 10.13328/j.cnki.jos.007376
    Abstract:
    Software vulnerabilities are code segments in software that are prone to exploitation. Ensuring that software is not easily attacked is a crucial security requirement in software development. Software vulnerability prediction involves analyzing and predicting potential vulnerabilities in software code. Deep learning-driven software vulnerability prediction has become a popular research field in recent years, with a long time span, numerous studies, and substantial research achievements. To review relevant research findings and summarize the research hotspots, a survey of 151 studies related to deep learning-driven software vulnerability prediction published between 2017 and 2024 is conducted. It summarizes the research problems, progress, and challenges discussed in the literature, providing a reference for future research.
Available online: May 14, 2025, DOI: 10.13328/j.cnki.jos.007377
    Abstract:
A timer is used to schedule and execute delayed tasks in an operating system. It operates asynchronously in an atomic context and can execute concurrently with different threads at any time. If developers fail to account for all possible scenarios of multithread interleaving, various types of concurrency bugs may be introduced, posing a serious threat to the security of the operating system. Timer concurrency bugs are more difficult to detect than typical concurrency bugs because they involve not only multithread interleaving but also the delayed and repeated scheduling of timer handlers. Currently, there are no tools that can effectively detect such bugs. In this study, three types of timer concurrency bugs are summarized: sleeping timer bugs, timer deadlock bugs, and zombie timer bugs. To enhance detection efficiency, all timer-related code is first extracted through pointer analysis, reducing unnecessary analysis overhead. A context-sensitive, path-sensitive, and flow-sensitive interprocedural control flow graph is then constructed to provide a foundation for subsequent analysis. Based on static analysis techniques, including call graph traversal, lockset analysis, points-to analysis, and control flow analysis, three detection algorithms are designed to identify the different types of timer concurrency bugs. To evaluate their effectiveness, the algorithms are applied to the Linux 5.15 kernel, where 328 real-world timer concurrency bugs are detected. A total of 56 patches are submitted to the Linux kernel community, with 49 patches merged into the mainline kernel, 295 bugs confirmed and fixed, and 14 CVE identifiers assigned. These results demonstrate the effectiveness of the proposed method. Finally, a systematic analysis of performance, false positives, and false negatives is conducted through comparative experiments, and methods for repairing the three types of bugs are summarized.
Available online: May 14, 2025, DOI: 10.13328/j.cnki.jos.007378
    Abstract:
    With the rapid development of embedded technology, mobile computing, and the Internet of Things (IoT), an increasing number of sensing devices have been integrated into people’s daily lives, including smartphones, cameras, smart bracelets, smart routers, and headsets. The sensors embedded in these devices facilitate the collection of personal information such as location, activities, vital signs, and social interactions, thus fostering a new class of applications known as human-centric sensing. Compared with traditional sensing methods, including wearable-based, vision-based, and wireless signal-based sensing, millimeter wave (mmWave) signals offer numerous advantages, such as high accuracy, non-line-of-sight capability, passive sensing (without requiring users to carry sensors), high spatiotemporal resolution, easy deployment, and robust environmental adaptability. The advantages of mmWave-based sensing have made it a research focus in both academia and industry in recent years, enabling non-contact, fine-grained perception of human activities and physical signs. Based on an overview of recent studies, the background and research significance of mmWave-based human sensing are examined. The existing methods are categorized into four main areas: tracking and positioning, motion recognition, biometric measurement, and human imaging. Commonly used publicly available datasets are also introduced. Finally, potential research challenges and future directions are discussed, highlighting promising developments toward achieving accurate, ubiquitous, and stable human perception.
Available online: May 14, 2025, DOI: 10.13328/j.cnki.jos.007379
    Abstract:
In recent years, deep learning-based vulnerability detection models have demonstrated impressive capabilities in detecting vulnerabilities. Previous research has widely explored adversarial attacks that use variable renaming to introduce disturbances into source code and evade detection. However, the effectiveness of introducing multiple disturbances through various transformation techniques has not been adequately investigated. In this study, multiple synonymous transformation operators are applied to introduce disturbances into source code. A combination optimization strategy based on genetic algorithms is proposed, which selects the source code transformation operators with the highest fitness to guide the generation of adversarial code segments capable of evading vulnerability detection. The proposed method is implemented in a framework named non-vulnerability generator (NonVulGen) and evaluated against deep learning-based vulnerability detection models. When applied to recently developed deep learning models, an average attack success rate of 91.38% is achieved against the CodeBERT-based model and 93.65% against the GraphCodeBERT-based model, representing improvements of 28.94% and 15.52% over state-of-the-art baselines, respectively. To assess the generalization ability of the proposed attack method, common models including Devign, ReGVD, and LineVul are targeted, achieving average success rates of 98.88%, 97.85%, and 92.57%, respectively. Experimental results indicate that adversarial code segments generated by NonVulGen cannot be effectively distinguished by deep learning-based vulnerability detection models. Furthermore, after retraining the models with adversarial samples generated from the training data, the attack success rate drops by 96.83% for CodeBERT, 97.12% for GraphCodeBERT, 98.79% for Devign, 98.57% for ReGVD, and 97.94% for LineVul. These findings reveal the critical challenge that adversarial attacks pose to deep learning-based vulnerability detection models and highlight the necessity of model reinforcement before deployment.
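The following sketch shows one generic way a genetic algorithm can search over combinations of transformation operators, as described above; the operator names and the caller-supplied apply_operators()/evasion_score() functions are hypothetical placeholders, not NonVulGen's actual operators or fitness function.

```python
# Minimal sketch of a genetic-algorithm search over combinations of code transformation
# operators. The operator list and the caller-supplied apply_operators()/evasion_score()
# (e.g., the target detector's confidence drop) are illustrative placeholders.
import random

OPERATORS = ["rename_vars", "insert_dead_code", "for_to_while", "split_decl", "reorder_stmts"]

def genetic_search(code, apply_operators, evasion_score,
                   pop_size=20, generations=30, mutation_rate=0.2):
    def fitness(combo):
        return evasion_score(apply_operators(code, combo))   # higher = more likely to evade

    # Each individual is a bit vector indicating which operators are applied.
    population = [[random.random() < 0.5 for _ in OPERATORS] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]                     # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(OPERATORS))         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [not g if random.random() < mutation_rate else g for g in child]  # mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```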
Available online: May 07, 2025, DOI: 10.13328/j.cnki.jos.007309
    Abstract:
With the continuous development of information technology, the quantity and variety of software products are increasing, but even high-quality software may contain vulnerabilities. In addition, software is updated rapidly and software architectures are increasingly complex, which leads to the gradual evolution of vulnerabilities into new forms. Consequently, traditional vulnerability detection methods and rules are difficult to apply to new vulnerability features. Due to the scarcity of zero-day vulnerability samples, zero-day vulnerabilities that appear during software evolution are difficult to find, which brings great potential risks to software security. This study proposes a vulnerability sample generation method based on abstract syntax tree mutation, which can simulate the structure and syntax rules of real vulnerabilities, generate vulnerability samples that are more in line with real situations, and provide a more effective solution for software security and reliability. The method analyzes the abstract syntax tree structure generated by Eclipse CDT, extracts the syntactic information in the nodes, reconstructs the nodes and abstract syntax trees, optimizes the abstract syntax tree structure, and designs a series of mutation operators, which are then applied to the optimized abstract syntax trees. The proposed method can generate mutation samples with the characteristics of UAF and CUAF vulnerabilities, which can be used for zero-day vulnerability detection and help improve the detection rate of zero-day vulnerabilities. Experimental results show that this method reduces the number of invalid samples by 34% on average compared with the random mutation method used in traditional detection approaches, and can generate more complex mutated samples, enhancing the coverage and accuracy of detection.
Available online: April 30, 2025, DOI: 10.13328/j.cnki.jos.007311
    Abstract:
    To address the issue of untrustworthy behaviors resulting from malicious attackers exploiting security vulnerabilities within smart contracts in the consortium blockchain system, this study introduces a trusted verification mechanism of smart contract behavior for consortium blockchain to conduct trusted verification for contract behavior integrity. Firstly, the proposed approach takes the system call as the smallest behavior unit and describes the historical behavioral state with the behavior sequence based on system calls. Subsequently, on the premise of ensuring the trustworthiness of contract code release and the execution environment, it performs trusted verification according to predefined behavioral rules during contract execution. Finally, a theoretical analysis of this mechanism is carried out, and an experimental evaluation is conducted in the Hyperledger Fabric environment. Results demonstrate that the proposed method can effectively achieve the trusted verification of smart contract behavior and ensure the credibility of behavior within the life cycle of smart contracts.
Available online: April 30, 2025, DOI: 10.13328/j.cnki.jos.007373
    Abstract:
Chinese idioms, as an essential part of Chinese writing, possess concise expressiveness and profound cultural significance. They are typically phrases or short sentences that have become fixed through long-term use, with diverse origins and relatively stable meanings. However, due to the pictographic nature of Chinese characters and the historical evolution of Chinese vocabulary and semantics, there is often a discrepancy between the literal and actual meanings of idioms, which exhibits a unique non-compositional characteristic. This feature makes idioms prone to misuse in practice, with research showing that certain idioms are misused at a rate as high as 98.6%. Unlike in other languages, the misuse of Chinese idioms does not typically result in lexical or grammatical errors, which makes traditional spelling and grammar error detection methods ineffective at identifying idiom misuse. An intuitive approach is to incorporate the interpretations of idioms into the model, but simply concatenating these interpretations can lead to problems such as excessively long sentences that are hard to process and noise in the injected knowledge. To address this, this study proposes a novel model that uses levitating knowledge injection to incorporate idiom interpretations. The model introduces learnable weight factors to control the injection process and explores effective strategies for knowledge infusion. To validate the model’s effectiveness, a dataset specifically for diagnosing the misuse of Chinese idioms is created. Experimental results show that the model achieves optimal performance across all test sets, particularly in complex scenarios involving long texts and multiple idioms, where its performance improves by 12.4%–13.9% compared to the baseline model. At the same time, training speed increases by 30%–40%, and testing speed improves by 90%. These results demonstrate that the proposed model not only effectively integrates the interpretative features of idioms but also significantly reduces the negative impact of interpretation concatenation on the model’s processing capacity and efficiency, thus enhancing the performance of Chinese idiom misuse diagnosis and strengthening the model’s ability to handle complex scenarios with multiple idioms and lengthy interpretations.
Available online: April 25, 2025, DOI: 10.13328/j.cnki.jos.007370
    Abstract:
Knowledge graphs (KGs), as structured representations of knowledge, have a wide range of applications in the medical field. Entity alignment, which identifies equivalent entities across different KGs, is a fundamental step in constructing large-scale KGs. Although extensive research has focused on this issue, most of it has concentrated on aligning pairs of KGs, typically by capturing the semantic and structural information of entities to generate embeddings and then calculating embedding similarity to identify equivalent entities. This study identifies the problem of alignment error propagation when aligning multiple KGs. Given the high accuracy requirements for entity alignment in medical contexts, we propose a multi-source Chinese medical knowledge graph entity alignment method (MSOI-Align) that integrates entity semantics and ontology information. Our method pairs multiple KGs and uses representation learning to generate entity embeddings. It also incorporates the similarity of entity names and ontology consistency constraints, leveraging a large language model to filter a set of candidate entities. Subsequently, based on triadic closure theory and the large language model, MSOI-Align automatically identifies and corrects the propagation of alignment errors among the candidate entities. Experimental results on four Chinese medical knowledge graphs show that MSOI-Align significantly enhances the precision of the entity alignment task, with the Hits@1 metric increasing from 0.42 to 0.92 compared to the state-of-the-art baseline. The fused knowledge graph, CMKG, contains 13 types of ontologies, 190,000 entities, and approximately 700,000 triplets. Due to copyright restrictions on one of the KGs, we release the fusion of the other three KGs, named OpenCMKG.
Available online: April 23, 2025, DOI: 10.13328/j.cnki.jos.007374
    Abstract:
Attributed graphs are increasingly used to represent data with relational structures, and anomaly detection on them is gaining attention. Owing to characteristics such as rich attribute information and complex structural relationships, various types of anomalies may exist, including global, structural, and community anomalies, which often remain hidden within the graph’s deep structure. Existing methods face challenges such as loss of structural information and difficulty in identifying abnormal nodes. Structural information theory leverages encoding trees to represent hierarchical relationships within data and establishes correlations across different levels by minimizing structural entropy, effectively capturing the essential structure of a graph. This study proposes an anomaly detection method for attributed graphs based on structural entropy. First, by integrating the structural and attribute information of attributed graphs, a K-dimensional encoding tree representing the hierarchical community structure is constructed through structural entropy minimization. Next, using the node attributes and hierarchical community information within the encoding tree, scoring mechanisms for detecting structural and attribute anomalies are designed based on Euclidean distance and the connection strength between nodes. This approach identifies abnormal nodes and detects various types of anomalies. The proposed method is evaluated through comparative tests on several attributed graph datasets. Experimental results demonstrate that it effectively detects different types of anomalies and significantly outperforms existing state-of-the-art methods.
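For reference, in the structural information theory this method builds on, the structural entropy of a graph $G$ under an encoding tree $T$ is commonly written as
$ \mathcal{H}^{T}(G) = -\sum_{\alpha \in T,\, \alpha \neq \lambda} \dfrac{g_{\alpha}}{\mathrm{vol}(G)} \log_2 \dfrac{\mathrm{vol}(\alpha)}{\mathrm{vol}(\alpha^{-})}, $
where $\lambda$ is the root, $g_{\alpha}$ is the number of edges with exactly one endpoint inside the module $\alpha$, $\mathrm{vol}(\cdot)$ sums vertex degrees, and $\alpha^{-}$ is the parent of $\alpha$; the K-dimensional encoding tree mentioned above is obtained by minimizing this quantity over trees of height at most K. (This is the standard textbook form; the paper's notation may differ.)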
Available online: April 23, 2025, DOI: 10.13328/j.cnki.jos.007375
    Abstract:
    Software vulnerabilities pose significant threats to real-world systems. In recent years, learning-based vulnerability detection methods, especially deep learning-based approaches, have gained widespread attention due to their ability to extract implicit vulnerability features from large-scale vulnerability samples. However, due to differences in features among different types of vulnerabilities and the problem of imbalanced data distribution, existing deep learning-based vulnerability detection methods struggle to accurately identify specific vulnerability types. To address this issue, this study proposes MulVD, a deep learning-based multi-class vulnerability detection method. MulVD constructs a structure-aware graph neural network (SA-GNN) that can adaptively extract local and representative vulnerability patterns while rebalancing the data distribution without introducing noise. The effectiveness of the proposed approach in both binary and multi-class vulnerability detection tasks is evaluated. Experimental results demonstrate that MulVD significantly improves the performance of existing deep learning-based vulnerability detection techniques.
Available online: April 23, 2025, DOI: 10.13328/j.cnki.jos.007369
    Abstract:
With the widespread adoption of naming conventions in programming and the increasing emphasis on self-explanatory code, traditional summarizing code comments, which often merely restate the literal meaning of the code, are losing appeal among developers. Instead, developers value supplementary code comments that provide additional information beyond the code itself to facilitate program understanding and maintenance. However, generating such comments typically requires external information resources beyond the code base, and the diversity of supplementary information presents significant challenges to existing methods. This study leverages Issue reports as a crucial external information source and proposes an Issue-based retrieval augmentation method using large language models (LLMs) to generate supplementary code comments. The proposed method classifies the supplementary information found in Issue reports into five categories, retrieves Issue sentences containing this information, and generates corresponding comments using LLMs. In addition, the code relevance and Issue verifiability of the generated comments are evaluated to minimize hallucinations. Experiments conducted on two popular LLMs, ChatGPT and GPT-4o, demonstrate the effectiveness of the proposed method. Compared to existing approaches, the proposed method significantly improves the coverage of manual supplementary comments from 33.6% to 72.2% for ChatGPT and from 35.8% to 88.4% for GPT-4o. Moreover, the generated comments offer developers valuable supplementary information, proving essential for understanding tricky code.
Available online: April 23, 2025, DOI: 10.13328/j.cnki.jos.007381
    Abstract:
    The prediction of future water quality, which involves leveraging historical water quality data from various observation nodes and their corresponding topological relationships, is recognized as a critical application of graph neural networks in environmental protection. This task is complicated by the presence of noise within both the collected numerical data and the inter-node topological structures, compounded by a coupling phenomenon. The varying directions of pollutant flow intensify the complexity of coupling between numerical and structural noise. To address these challenges, a novel tendency-aware graph neural network is proposed for water quality prediction with coupled noise. First, historical water quality trend features are used to uncover local interdependencies among raw water quality indicators, enabling the construction of multiple potential hydrological topological structures and the disentanglement of structural noise. Second, spatio-temporal features are extracted from the constructed adjacency matrices and original data to separate numerical noise. Finally, water quality predictions are obtained by aggregating coherent node representations derived from the inferred latent structures across pre- and post-structure construction phases. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on real-world datasets and generates potential hydrological topological structures that closely align with actual observations. The code and data are publicly available on GitHub: https://github.com/aTongs1/TaGNN.
Available online: April 18, 2025, DOI: 10.13328/j.cnki.jos.007383
    Abstract:
Stochastic optimization algorithms are recognized as essential for addressing large-scale data and complex models in machine learning. Among these, variance reduction methods, such as the STORM algorithm, have gained attention for their ability to achieve the optimal convergence rate of $O(T^{-1/3})$. However, traditional variance reduction methods typically depend on specific problem parameters (e.g., the smoothness constant, noise variance, and gradient upper bound) for setting the learning rate and momentum, limiting their practical applicability. To overcome this limitation, this study proposes an adaptive variance reduction method based on a normalization technique, which eliminates the need for prior knowledge of problem parameters while maintaining the optimal convergence rate. Compared to existing adaptive variance reduction methods, the proposed approach offers several advantages: (1) no reliance on additional assumptions, such as bounded gradients, bounded function values, or excessively large initial batch sizes; (2) achievement of the optimal convergence rate of $O(T^{-1/3})$ without an extra $O(\log T)$ factor; (3) a concise and straightforward proof, facilitating extensions to other stochastic optimization problems. The superiority of the proposed method is further validated through numerical experiments, demonstrating enhanced performance compared to other approaches.
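As background (a typical form from the normalized-momentum literature, not necessarily the exact update analyzed in this paper), a normalized STORM-style recursion reads
$ d_t = \nabla f(x_t;\xi_t) + (1-\beta_t)\bigl(d_{t-1} - \nabla f(x_{t-1};\xi_t)\bigr), \qquad x_{t+1} = x_t - \eta\, d_t / \lVert d_t\rVert , $
where normalizing by $\lVert d_t\rVert$ is what removes the dependence of the step size on problem parameters such as the smoothness constant and noise variance.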
Available online: March 26, 2025, DOI: 10.13328/j.cnki.jos.007318
    Abstract:
Blockchain has shown strong vitality in the field of cryptocurrency investment, attracting a large number of investors. However, the anonymity of blockchain induces a great deal of fraud, among which Ponzi scheme smart contracts are a typical fraudulent investment activity, causing huge economic losses for investors. Therefore, the detection of Ponzi scheme contracts on Ethereum becomes particularly important. Nevertheless, most existing studies have ignored the control flow information in the source code of Ponzi scheme contracts. To extract more semantic and structural information from Ponzi scheme contracts, this study proposes a Ponzi scheme contract detection model based on the code control flow graph. First, the model represents the obtained contract source code as a control flow graph, and key features, including data flow information and code structure information, are extracted by the Word2Vec algorithm. Since the functions of smart contracts differ and their code length varies significantly, the extracted feature vectors differ greatly; therefore, the feature vectors generated by different smart contracts are aligned so that they all have the same dimension, which facilitates subsequent processing. Next, a feature learning module based on graph convolution and Transformer introduces a multi-head attention mechanism to learn the dependencies among node features. Finally, a multilayer perceptron is used to identify Ponzi scheme contracts. By comparing the proposed model with traditional graph feature learning models on the dataset provided by the Xblock website, the benefit of the multi-head attention mechanism introduced by the model is verified. Experimental results demonstrate that this model effectively improves the ability to detect Ponzi scheme contracts.
Available online: March 26, 2025, DOI: 10.13328/j.cnki.jos.007320
    Abstract:
    The application of artificial intelligence technology has extended from relatively static tasks such as classification, translation, and question answering to relatively dynamic tasks that require a series of “interaction-action” with the environment to be completed, like autonomous driving, robotic control, and games. The core of the model for executing such tasks is the sequential decision-making (SDM) algorithm. As it faces higher uncertainties of the environment and interaction and these tasks are often safety-critical systems, the testing techniques are confronted with great challenges. The existing testing technologies for intelligent algorithm models mainly focus on the reliability of a single model, the generation of diverse test scenarios for complex tasks, simulation testing, etc., while no attention is paid to the “interaction-action” decision sequence of the SDM model, leading to unadaptability or low cost-effectiveness. In this study, a fuzz testing method named IIFuzzing for intervening in the execution of inert “interaction-action” decision sequences is proposed. In the fuzz testing framework, by learning the “interaction-action” decision sequence pattern, the inert “interaction-action” decision sequences that will not trigger failure accidents are predicted and the testing execution of such sequences is terminated to improve the testing efficiency. The experimental evaluations are conducted in four common test configurations, and the results show that compared with the latest fuzz testing for SDM models, IIFuzzing can detect 16.7%–54.5% more failure accidents within the same time, and the diversity of accidents is also better than that of the baseline approach.
Available online: March 12, 2025, DOI: 10.13328/j.cnki.jos.007310
    Abstract:
With the continuous deepening of research on the security and privacy of deep learning models, researchers have found that model stealing attacks pose a tremendous threat to neural networks. A typical data-dependent model stealing attack uses a certain proportion of real data to query the target model and trains a substitute model locally to steal the target model. Since 2020, a novel data-free model stealing attack has been proposed, which can steal deep neural networks simply by using fake query examples generated by generative models. Since it does not rely on real data, the data-free model stealing attack can cause more serious damage. However, the diversity and effectiveness of the query examples constructed by current data-free model stealing attack methods are insufficient, leading to a large number of queries and a relatively low attack success rate during model stealing. Therefore, this study proposes a vision feature decoupling-based model stealing attack (VFDA), which decouples the visual features of the query examples generated during data-free model stealing by using a multi-decoder structure, thus improving the diversity of query examples and the effectiveness of model stealing. Specifically, VFDA uses three decoders to respectively generate the texture information, region encoding, and smoothing information of query examples, completing the decoupling of their visual features. Secondly, to make the generated query examples more consistent with the visual features of real examples, the sparsity of the texture information is limited and the generated smoothing information is filtered. VFDA exploits the property that the representational tendency of neural networks depends on image texture features and can generate query examples with inter-class diversity, thus effectively improving the similarity of model stealing and the attack success rate. In addition, VFDA adds an intra-class diversity loss to the smoothing information of query examples generated through decoupling to make the query examples more consistent with the real example distribution. Comparisons with multiple model stealing attack methods show that VFDA performs better in terms of the similarity of model stealing and the attack success rate. In particular, on the high-resolution GTSRB and Tiny-ImageNet datasets, the attack success rate is improved by 3.86% and 4.15% on average, respectively, compared with the best-performing existing method, EBFA.
Available online: February 26, 2025, DOI: 10.13328/j.cnki.jos.007303
    Abstract:
    Phrasal visual grounding, a fundamental and critical research task in the field of multimodal studies, aims at predicting fine-grained alignment relationships between textual phrases and image regions. Despite the remarkable progress achieved by existing phrasal visual grounding approaches, they all ignore the implicit alignment relationships between textual phrases and their corresponding image regions, commonly referred to as implicit phrase-region alignment. Predicting such relationships can effectively evaluate the ability of models to understand deep multimodal semantics. Therefore, to effectively model implicit phrase-region alignment relationships, this study proposes an implicit-enhanced causal modeling (ICM) approach for phrasal visual grounding, which employs the intervention strategies of causal reasoning to mitigate the confusion caused by shallow semantics. To evaluate models’ ability to understand deep multimodal semantics, this study annotates a high-quality implicit dataset and conducts a large number of experiments. Multiple sets of comparative experimental results demonstrate the effectiveness of the proposed ICM approach in modeling implicit phrase-region alignment relationships. Furthermore, the proposed ICM approach outperforms some advanced multimodal large language models (MLLMs) on the implicit dataset, further promoting the research of MLLMs towards more implicit scenarios.
Available online: February 26, 2025, DOI: 10.13328/j.cnki.jos.007306
    Abstract:
Twin support vector machine (TSVM) can effectively handle data such as cross-plane or XOR data. However, when handling set-valued data, TSVM usually relies on statistical summaries of set-valued objects such as the mean and the median. Unlike TSVM, this study proposes the twin support function machine (TSFM), which can deal with set-valued data directly. Using support functions defined for set-valued objects, TSFM obtains nonparallel hyperplanes in a Banach space. To suppress outliers in set-valued data, TSFM adopts the pinball loss function and introduces weights for set-valued objects. Since TSFM involves optimization problems in an infinite-dimensional space, the measure is taken in the form of a linear combination of Dirac measures, and an optimization model in a finite-dimensional space is thus constructed. To solve the optimization model effectively, this study employs a sampling strategy to transform the model into quadratic programming (QP) problems, and the dual formulations of the QP problems are derived, which provides the theoretical foundation for determining which sampling points are support vectors. To classify set-valued data, the distance from a set-valued object to the hyperplane in a Banach space is defined, and the decision rule is derived from it. This study also considers the kernelization of support functions to capture the nonlinear features of data, which makes the proposed model applicable to indefinite kernels. Experimental results demonstrate that TSFM can capture the intrinsic structure of cross-plane set-valued data and achieves good classification performance in the presence of outliers or set-valued objects containing a few high-dimensional examples.
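For reference, a commonly used form of the pinball loss in SVM-type classifiers is $ L_{\tau}(u) = u $ for $ u \ge 0 $ and $ L_{\tau}(u) = -\tau u $ for $ u < 0 $, where $u$ is the classification residual; $\tau = 0$ recovers the hinge loss, while $\tau > 0$ also penalizes points on the correct side of the margin, which is what makes the loss less sensitive to feature noise and outliers. (The exact weighted variant used in TSFM may differ.)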
Available online: February 26, 2025, DOI: 10.13328/j.cnki.jos.007299
    Abstract:
Large language models (LLMs) such as ChatGPT have found widespread application across various fields due to their strong natural language understanding and generation capabilities. However, deep learning models are vulnerable to adversarial example attacks. In natural language processing, current research on adversarial example generation typically employs CNN-based models, RNN-based models, and Transformer-based pre-trained models as target models, and few studies explore the robustness of LLMs under adversarial attacks or quantify evaluation criteria for LLM robustness. Taking ChatGPT under Chinese adversarial attacks as an example, this study introduces a novel concept termed offset average difference (OAD) and proposes a quantifiable LLM robustness evaluation metric based on OAD, named the OAD-based robustness score (ORS). In a black-box attack scenario, this study selects nine mainstream Chinese adversarial attack methods based on word importance to generate adversarial texts, which are then employed to attack ChatGPT and yield the attack success rate of each method. The proposed ORS assigns a robustness score to LLMs for each attack method based on the attack success rate. In addition to ChatGPT, which outputs hard labels, this study designs an ORS for target models with soft-labeled outputs based on the attack success rate and the proportion of misclassified adversarial texts with high confidence. Meanwhile, the scoring formula is extended to the fluency assessment of adversarial texts, yielding an OAD-based adversarial text fluency scoring method, named the OAD-based fluency score (OFS). Compared to traditional methods requiring human involvement, the proposed OFS greatly reduces evaluation costs. Experiments conducted on real-world Chinese news and sentiment classification datasets provide initial evidence that, for text classification tasks, the robustness score of ChatGPT against adversarial attacks is nearly 20% higher than that of Chinese BERT. Nevertheless, the powerful ChatGPT still produces erroneous predictions under adversarial attacks, with the highest attack success rate exceeding 40%.
Available online: February 26, 2025, DOI: 10.13328/j.cnki.jos.007289
    Abstract:
    Cross-domain recommendation (CDR) alleviates the cold start problem by transferring the user-item rating patterns from a dataset in a dense rating auxiliary domain to one in a sparse rating target domain, and has been widely studied in recent years. The clustering methods based on single-domain recommendation adopted by most CDR algorithms fail to effectively utilize overlapping information and sufficiently adapt to CDR, resulting in inaccurate clustering results. In CDR, graph convolution network (GCN) methods can fully utilize the associations between nodes to improve recommendation accuracy. However, GCN-based CDR often employs static graph learning for node embedding, ignoring the fact that user preferences may change with different recommendation scenarios, which causes poor model performance across different recommendation tasks and ineffective mitigation of data sparsity. To this end, a multi-layer recurrent GCN CDR model based on a pseudo-overlap detection mechanism is proposed. Firstly, by fully leveraging overlapping data based on the community clustering algorithm Louvain, a pseudo-overlap detection mechanism is designed to mine user trust relationships and similar user communities, thereby enhancing the adaptability and accuracy of clustering algorithms in CDR. Secondly, a multi-layer recurrent GCN consisting of an embedding learning module and a graph learning module is proposed to learn dynamic domain-shared features, domain-specific features, and dynamic graph structures. By conducting iterative enhancement of the two modules, the latest user preferences are obtained to alleviate data sparsity. Finally, a multi-layer perceptron (MLP) is employed to model user-item interactions and obtain predicted ratings. Comparative results with 12 related models across four groups of data domains demonstrate the effectiveness of the proposed method, with average improvements of 5.47%, 3.44%, and 2.38% in MRR, NDCG, and HR metrics respectively.
Available online: February 26, 2025, DOI: 10.13328/j.cnki.jos.007302
    Abstract:
This study discusses the computational complexity of the partition function of symmetric two-spin systems on regular graphs. Based on the counting exponential time hypothesis (#ETH) and the randomized exponential time hypothesis (rETH), this study refines the classical dichotomies for this class of problems into exponential dichotomies, also known as fine-grained dichotomies. In other words, it proves that when the given tractability conditions are satisfied, the problem is solvable in polynomial time; otherwise, assuming #ETH, no sub-exponential time algorithm exists. This study also proposes two solutions to the ineffectiveness of existing interpolation methods in building sqrt-sub-exponential time reductions under the restriction to planar graphs, and it utilizes these two solutions to discuss the related fine-grained complexity and dichotomy of this problem under the planar graph restriction.
Available online: February 26, 2025, DOI: 10.13328/j.cnki.jos.007321
    Abstract:
Visual-language pre-training (VLP) aims to obtain powerful multimodal representations by learning on large-scale image-text multimodal datasets. Multimodal feature fusion and alignment is a key challenge in multimodal model training. In most existing visual-language pre-training models, the extracted visual features and text features are directly fed into a Transformer model for fusion and alignment. Since the attention mechanism in the Transformer computes pairwise similarity, it is difficult to achieve alignment among multiple entities. In contrast, the hyperedges of hypergraph neural networks connect multiple entities and encode high-order entity correlations, thus enabling relationships to be established among multiple entities. In this study, a visual-language multimodal pre-training method based on multi-entity alignment with hypergraph neural networks is proposed. In this method, a hypergraph neural network learning module is introduced into the Transformer multimodal fusion encoder to learn the alignment relationships of multimodal entities, thereby enhancing the entity alignment ability of the multimodal fusion encoder in the pre-training model. The proposed visual-language pre-training model is pre-trained on large-scale image-text datasets and fine-tuned on multiple visual-language downstream tasks such as visual question answering, image-text retrieval, visual grounding, and natural language visual reasoning. The experimental results indicate that, compared with the baseline method, the proposed method improves performance on multiple downstream tasks, including an accuracy improvement of 1.8% on the NLVR2 task.
Available online: February 26, 2025, DOI: 10.13328/j.cnki.jos.007322
    Abstract:
Online information comes from numerous and miscellaneous sources, and judging in a timely and accurate manner whether a piece of information is a rumor is a crucial issue in research on the cognitive domain of social media. Most previous studies have concentrated on the text content of rumors, user characteristics, or features confined to the propagation mode, ignoring the key clues provided by the collective emotions generated by users’ participation in event discussions and the emotional steady-state characteristics hidden in rumor spreading. In this study, a social network rumor detection method oriented by collective emotional stabilization and integrating temporal and spatial steady-state features is proposed. Based on the text features and user behaviors in rumor propagation, the temporal and spatial steady-state features of collective emotions are combined for the first time, achieving strong expressiveness and detection accuracy. Specifically, the method takes the emotional keywords expressing users’ attitudes towards a certain event or topic as the basis and uses recurrent neural networks to construct emotional steady-state features of the temporal relationship, endowing the collective emotions with temporally consistent and strongly expressive features that reflect the convergence of collective emotions over time. A heterogeneous graph neural network is utilized to establish the connections between users and keywords, as well as between texts and keywords, so that the collective emotions possess fine-grained spatial steady-state features. Finally, the two types of local steady-state features are fused to obtain a global representation with improved feature expression, and further classification yields the rumor detection results. The proposed method is evaluated on two internationally public and widely used Twitter datasets. Compared with the best-performing method among the baselines, the accuracy is improved by 3.4% and 3.2%, the T-F1 value by 3.0% and 1.8%, the N-F1 value by 2.7% and 2.3%, and the U-F1 value by 2.3% and 1.0%, respectively.
    Available online:  February 19, 2025 , DOI: 10.13328/j.cnki.jos.007307
    Abstract:
    For point-in-polygon tests, a recently proposed grid method exhibits high computational efficiency. This method organizes the polygon fragments within each grid cell into stripe structures, ensuring that the edges in each stripe intersect both the left and right boundaries of the stripe. In this way, localized computation is enhanced, and GPUs can be used for convenient parallel computation, resulting in detection efficiency superior to that of various previous methods. However, stripe structures constructed from grid cells generate redundant stripes. Besides, the method has a high space requirement for stripe construction, making it inconvenient to construct stripe structures on GPUs. In response, this study proposes constructing stripe structures via grid rows. Redundant stripes are thus eliminated and the space required for construction is reduced, so stripe structures can be built on GPUs and work efficiency is improved. Experimental results show that, compared with the original method, the new method significantly accelerates the construction of stripe structures, by over 40 times in some cases. Moreover, it has a faster detection speed and can handle dynamic polygons more efficiently.
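    To make the stripe idea concrete, here is a minimal Python sketch that buckets polygon edges into horizontal stripes and answers a point-in-polygon query with a crossing-number test restricted to the point's stripe. It is a single-threaded, row-only simplification for illustration and does not reproduce the grid-cell organization or the GPU construction discussed in the paper.

        def build_stripes(polygon, n_stripes):
            # Bucket polygon edges into horizontal stripes over the bounding box;
            # polygon is a list of (x, y) vertices in order.
            ys = [p[1] for p in polygon]
            y_min, y_max = min(ys), max(ys)
            h = (y_max - y_min) / n_stripes or 1.0
            stripes = [[] for _ in range(n_stripes)]
            n = len(polygon)
            for i in range(n):
                (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
                lo = max(int((min(y1, y2) - y_min) / h), 0)
                hi = min(int((max(y1, y2) - y_min) / h), n_stripes - 1)
                for s in range(lo, hi + 1):
                    stripes[s].append(((x1, y1), (x2, y2)))
            return stripes, y_min, h

        def point_in_polygon(pt, stripes, y_min, h):
            # Crossing-number test using only the stripe that contains the point.
            px, py = pt
            idx = int((py - y_min) / h)
            if idx < 0 or idx >= len(stripes):
                return False
            crossings = 0
            for (x1, y1), (x2, y2) in stripes[idx]:
                if (y1 > py) != (y2 > py):          # edge straddles the horizontal ray
                    x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                    if x_cross > px:
                        crossings += 1
            return crossings % 2 == 1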
    Available online:  February 19, 2025 , DOI: 10.13328/j.cnki.jos.007296
    Abstract:
    Offline reinforcement learning has yielded significant results in tasks with continuous and dense rewards. However, since the training process does not interact with the environment, generalization ability is reduced, and performance is difficult to guarantee in environments with discrete and sparse rewards. By adding noise, the diffusion model combines the information in the neighborhood of the sample data to generate actions close to the sample distribution, which strengthens the learning and generalization ability of the agents. To this end, offline reinforcement learning with diffusion models and expectation maximization (DMEM) is proposed. The method updates the objective function by maximizing the expected log-likelihood to make the policy more generalizable. Additionally, the diffusion model is introduced into the policy network so that its diffusion characteristics enhance the ability of the policy to learn from data samples. Meanwhile, expectile regression is employed to update the value function from the perspective of high-dimensional space, and a penalty term is introduced to make the evaluation of the value function more accurate. DMEM is applied to a series of tasks with discrete and sparse rewards, and experiments show that DMEM has a large performance advantage over other classical offline reinforcement learning methods.
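    For readers unfamiliar with expectile regression, the sketch below shows the asymmetric squared loss it minimizes; the expectile level tau and the array shapes are illustrative assumptions, and this is not the full DMEM value-function update.

        import numpy as np

        def expectile_loss(target, prediction, tau=0.7):
            # Asymmetric squared error: positive residuals (under-estimation) are
            # weighted by tau, negative ones by 1 - tau. tau = 0.5 recovers ordinary
            # least squares; tau > 0.5 pushes the estimate toward an upper expectile.
            residual = target - prediction
            weight = np.where(residual > 0, tau, 1.0 - tau)
            return np.mean(weight * residual ** 2)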
    Available online:  February 19, 2025 , DOI: 10.13328/j.cnki.jos.007297
    Abstract:
    In recent years, as an algorithm for identifying bug-introducing changes, SZZ has been widely employed in just-in-time software defect prediction. Previous studies show that the SZZ algorithm may mislabel data during data annotation, which could influence dataset quality and consequently the performance of the defect prediction model. Therefore, researchers have improved the SZZ algorithm and proposed multiple variants of SZZ. However, no empirical study has explored the effect of SZZ data annotation quality on the performance and interpretability of just-in-time defect prediction for mobile apps. To investigate the influence of changes mislabeled by SZZ on just-in-time defect prediction for mobile apps, this study conducts an extensive and in-depth empirical comparison of four SZZ algorithms. Firstly, 17 large-scale mobile app projects are selected from GitHub, and software metrics are extracted with the PyDriller tool. Then, B-SZZ (the original SZZ), AG-SZZ, MA-SZZ, and RA-SZZ are employed for data annotation. Next, just-in-time defect prediction models are built with random forest, naive Bayes, and logistic regression classifiers based on time-series data partitioning. Finally, the performance of the models is evaluated with the traditional measures AUC, MCC, and G-mean and the effort-aware measures F-measure@20% and IFA, and a statistical significance test and an interpretability analysis are conducted on the results by employing SKESD and SHAP, respectively. By comparing the annotation performance of the four SZZ algorithms, the results are as follows. (1) The data annotation quality conforms to the progressive relationship among SZZ variants. (2) The changes mislabeled by B-SZZ, AG-SZZ, and MA-SZZ cause performance reductions in AUC and MCC of different degrees, but do not lead to a performance reduction in G-mean. (3) B-SZZ is likely to cause a performance reduction in F-measure@20%, while B-SZZ, AG-SZZ, and MA-SZZ are unlikely to increase effort during code inspection. (4) In terms of model interpretation, different SZZ algorithms influence the three metrics with the largest contribution during prediction, and the la metric has a significant influence on the prediction results.
    Available online:  January 24, 2025 , DOI: 10.13328/j.cnki.jos.007300
    Abstract:
    Existing adversarial example detection methods based on image transformation exploit the observation that image transformation significantly changes the feature distribution of adversarial examples but only slightly changes that of benign examples. Adversarial examples can thus be detected by calculating the feature distance before and after image transformation. However, with deepening research on adversarial attacks, researchers pay more attention to enhancing the robustness of adversarial examples, so that some attacks can be “immune” to the effect exerted by image transformation. Existing methods struggle to detect such robust adversarial examples effectively. This study observes that existing robust adversarial examples are in fact too robust: their feature distribution distance under image transformation is much smaller than that of benign examples, which is inconsistent with the feature distribution pattern of benign examples. Based on this key observation, this study proposes a dual-threshold adversarial example detection method based on image transformation, which adds a lower threshold to existing single-threshold methods to form a dual-threshold detection interval. An example whose feature distance is not within the dual-threshold detection interval is judged to be an adversarial example. Additionally, this study conducts extensive experiments on the VGG19, DenseNet, and ConvNeXt image classification models. The results show that the proposed approach retains the detection ability of existing single-threshold detection schemes and yields outstanding detection performance against robust adversarial examples.
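    The dual-threshold rule itself is simple; the following Python sketch illustrates it with placeholder components (the feature extractor, the image transformation, and the two thresholds are assumptions to be calibrated on benign data, not values from the paper).

        import numpy as np

        def feature_distance(x, extract_features, transform):
            # Distance between feature vectors of an input before and after an image
            # transformation (e.g., rotation or scaling); both callables are placeholders.
            f_orig = extract_features(x)
            f_trans = extract_features(transform(x))
            return float(np.linalg.norm(f_orig - f_trans))

        def is_adversarial(x, extract_features, transform, lower, upper):
            # Benign inputs tend to fall inside [lower, upper]; ordinary adversarial
            # examples exceed the upper threshold, while overly robust ones fall below
            # the lower threshold. Anything outside the interval is flagged.
            d = feature_distance(x, extract_features, transform)
            return not (lower <= d <= upper)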
    Available online:  January 24, 2025 , DOI: 10.13328/j.cnki.jos.007301
    Abstract:
    Scalar multiplication is the core operation in traditional elliptic curve cryptography (ECC). Scalar representations determine the iterations in scalar multiplication algorithms, which directly affect the security and efficiency of the algorithms. This study proposes two new scalar representation algorithms. One algorithm is ordered window width non-adjacent form (OWNAF) which combines traditional window non-adjacent form with random key segmentation and can resist energy analysis attacks while yielding better efficiency. The other is called window joint regular form (wJRF), which is improved from the traditional joint regular form. The wJRF algorithm is applicable to multi-scalar multiplication algorithms, which can reduce computational costs and ensure sound security compared with the existing algorithms.
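    For background, the sketch below computes the traditional width-w non-adjacent form of a scalar, the recoding that window-based representations such as OWNAF build on; it illustrates only the classical recoding, not the proposed ordering or random key segmentation countermeasures.

        def width_w_naf(k, w):
            # Classical width-w NAF recoding of a positive integer k: every non-zero
            # digit is odd with absolute value below 2**(w - 1), and at most one of
            # any w consecutive digits is non-zero, which reduces the number of point
            # additions in scalar multiplication.
            digits = []
            while k > 0:
                if k & 1:
                    d = k % (1 << w)
                    if d >= (1 << (w - 1)):
                        d -= 1 << w
                    k -= d
                else:
                    d = 0
                digits.append(d)
                k >>= 1
            return digits  # least-significant digit first

        # Sanity check: sum(d * 2**i for i, d in enumerate(width_w_naf(1234567, 4))) == 1234567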
    Available online:  January 24, 2025 , DOI: 10.13328/j.cnki.jos.007291
    Abstract:
    The deep stochastic configuration network (DSCN) adopts a feedforward learning approach and randomly assigns node parameters under a unique supervisory mechanism, which gives it the universal approximation property. However, in actual scenarios, potential outliers and noise introduced during data collection can negatively affect the classification results. To improve the performance of DSCN in solving binary classification problems, this study introduces the idea of intuitionistic fuzzy numbers into DSCN and proposes an intuitionistic fuzzy deep stochastic configuration network (IFDSCN). Different from the standard DSCN, IFDSCN assigns an intuitionistic fuzzy number to each sample by calculating the sample membership and non-membership, and generates the optimal classifier by a weighting method to overcome the negative effect of noise and outliers on data classification. The experimental results on eight benchmark datasets show that, compared with other learning models including the intuitionistic fuzzy twin support vector machine (IFTWSVM), kernel ridge regression (KRR), intuitionistic fuzzy kernel ridge regression (IFKRR), the random vector functional link neural network (RVFL), and SCN, IFDSCN has better binary classification performance.
    Available online:  January 24, 2025 , DOI: 10.13328/j.cnki.jos.007259
    Abstract:
    High-quality training data is instrumental in pre-trained language models (PLMs), yet privacy concerns often preclude the centralized collection of data from many professional domains. Federated learning offers a solution by enabling model training while safeguarding data privacy. However, the limited resources of federated learning clients pose a challenge to the training of pre-trained language models. This study addresses this issue through several steps. Firstly, it defines the problem of completing model training with limited resources and explores strategies to balance computational and communication costs for optimizing training efficiency. Secondly, it introduces an efficient federated learning framework for BERT further pre-training and fine-tuning (FedBT). FedBT facilitates the training of the BERT model on federated learning clients, encompassing both further pre-training and downstream task fine-tuning. Depending on the application context, FedBT selectively trains key parameters of the BERT model at the clients, uploading only the updated parameters to the server for aggregation. This approach significantly reduces both computational and communication overhead during training. Finally, extensive experiments are conducted on datasets from multiple professional domains. Results demonstrate that FedBT reduces client-side computational costs to 34.31% and communication costs to 7.04% during further pre-training. In downstream task fine-tuning, it reduces client-side computational costs to 48.26% and communication costs to 20.19%. The accuracy achieved in both pre-training and downstream task fine-tuning is comparable to traditional federated learning methods that train the entire model.
    Available online:  January 24, 2025 , DOI: 10.13328/j.cnki.jos.007267
    Abstract:
    Bug triaging is the process of assigning bug reports to developers suitable for resolving the reported bugs, ensuring timely fixes. Current research in bug triaging mainly focuses on the text classification of bug reports. However, according to the Pareto principle, the data distribution of bug reports used for classification is unbalanced, which may lead to ineffective triaging for inactive developers. Additionally, existing classification models often neglect to model developers and struggle to capture the correlations between bugs and developers, affecting the efficiency of bug triaging. To address these issues, this study proposes a collaborative bug triaging method based on multimodal fusion (CBT-MF). This method first preprocesses bug reports and constructs a bug-developer bipartite graph. To mitigate the impact of the unbalanced distribution of bug fix records, the bipartite graph data is enhanced using K-means clustering and positive-negative sampling. To represent developer information, node features are extracted from the bipartite graph using a graph convolutional network model. Finally, correlations between bugs and developers are captured by matching inner products, and Bayesian personalized ranking (BPR) is utilized for bug report recommendation and triaging. Comprehensive experiments conducted on publicly available datasets demonstrate that CBT-MF outperforms several state-of-the-art methods in bug triaging.
    Available online:  January 24, 2025 , DOI: 10.13328/j.cnki.jos.007269
    Abstract:
    Rollup is an emerging off-chain transaction processing solution for blockchains. With the continuous development of applications, the need for interoperability among different types of Rollups is growing. Existing cross-Rollup interoperability solutions typically rely on third-party service providers for assistance, which introduces security risks such as new trust assumptions and single points of failure. Achieving interoperability among Rollups via the native chain does not require new trust assumptions, but it consumes the computing and storage resources of the native chain and reduces its transaction throughput, thus seriously affecting cross-Rollup performance. To address this, this study proposes a cross-Rollup interoperability mechanism based on the native chain. By aggregating and processing transactions in batches, it effectively reduces the average on-chain computation and storage costs of individual transactions. Specifically, a transaction validity proof scheme based on zero-knowledge proofs is proposed to significantly reduce the on-chain computation overhead of transaction validity verification. A transaction storage scheme based on index-table data compression is proposed, reducing the average on-chain storage overhead of cross-Rollup transactions. An aggregation scale balance adjustment algorithm is proposed, which obtains the optimal aggregation scale and achieves a balance between on-chain resource consumption and processing latency. Finally, this study validates the proposed solution through experiments. The experimental results demonstrate that, under completely trustless conditions, the proposed solution reduces on-chain computing and storage overheads while balancing on-chain resource consumption and processing latency. Moreover, compared with existing cross-Rollup solutions, the proposed solution exhibits good system throughput.
    Available online:  January 15, 2025 , DOI: 10.13328/j.cnki.jos.007264
    Abstract:
    Mainstream methods for scene text detection often use complex networks with plenty of layers to improve detection accuracy, which requires high computational costs and large storage space, thus making them difficult to deploy on embedded devices with limited computing resources. Knowledge distillation assists in training lightweight student networks by introducing soft target information related to teacher networks, thus achieving model compression. However, existing knowledge distillation methods are mostly designed for image classification and extract the soft probability distributions from teacher networks as knowledge. The amount of information carried by such methods is highly correlated with the number of categories, resulting in insufficient information when directly applied to the binary classification task in text detection. To address the problem of scene text detection, this study introduces a novel concept of information entropy and proposes a knowledge distillation method based on mask entropy transfer (MaskET). MaskET combines information entropy with traditional knowledge distillation methods to increase the amount of information transferred to student networks. Moreover, to eliminate the interference of background information in images, MaskET only extracts the knowledge within the text area by adding mask operations. Experiments conducted on six public benchmark datasets, namely ICDAR 2013, ICDAR 2015, TD500, TD-TR, Total-Text and CASIA-10K, show that MaskET outperforms the baseline model and other knowledge distillation methods. For example, MaskET improves the F1 score of MobileNetV3-based DBNet from 65.3% to 67.2% on the CASIA-10K dataset.
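    A rough PyTorch-style sketch of the mask-plus-entropy idea is given below; the exact loss form, mask granularity, and weighting used by MaskET may differ, so the function names and the squared-error matching are illustrative assumptions rather than the paper's implementation.

        import torch

        def binary_entropy_map(logits):
            # Per-pixel entropy of the text/background probability predicted by a
            # text-detection head; high entropy marks uncertain regions.
            p = torch.sigmoid(logits).clamp(1e-6, 1 - 1e-6)
            return -(p * p.log() + (1 - p) * (1 - p).log())

        def mask_entropy_transfer_loss(student_logits, teacher_logits, text_mask):
            # Match the student's entropy map to the teacher's, but only inside the
            # annotated text region; the mask suppresses background interference.
            s_ent = binary_entropy_map(student_logits)
            t_ent = binary_entropy_map(teacher_logits.detach())
            mask = text_mask.float()
            return ((s_ent - t_ent) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)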
    Available online:  January 15, 2025 , DOI: 10.13328/j.cnki.jos.007286
    Abstract:
    Social network link prediction can help to reveal the potential connections between network nodes, and has important practical application value in friend recommendation and cooperation prediction. However, existing link prediction algorithms ignore the medium and long-term development trend of social network time series, and do not consider the interaction relationship between nodes in the network from a long-term perspective. To address the above-mentioned problems, a spatiotemporal attention-based multi-granularity link prediction algorithm is proposed, which can integrate the spatiotemporal features of social network time series with different granularities to improve the accuracy of link prediction. Firstly, the weights of the social network snapshot graph are constructed with the time decay function, and a graph-weighted moving average strategy is proposed to generate social network time series with different granularities reflecting short-term, medium-term, and long-term trends. Then, a neural network based on the multi-head attention mechanism is used to extract the global temporal features of social network sequences. Next, the historical interaction information of nodes within social network sequences is combined, and the neural network based on the mask attention mechanism is used to adaptively construct the network topology from a long-term perspective to dynamically adjust the interactions between nodes and is combined with graph convolutional network to model spatial information. Finally, the fusion attention neural network is proposed to extract useful short-term, medium-term and long-term information from short-term, medium-term and long-term spatiotemporal features, and perform feature fusion to accurately predict the future links of social networks. Experimental comparisons with seven existing link prediction algorithms on four social network public datasets confirm the effectiveness and superiority of the proposed method.
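    As an illustration of the first step, the sketch below builds exponentially time-decayed weights over network snapshots and blends their adjacency matrices into a single weighted graph; the decay function, its rate, and the matrix representation are assumptions for illustration, and the split into short-, medium-, and long-term sequences is omitted.

        import numpy as np

        def decay_weights(num_snapshots, decay_rate=0.5):
            # Older snapshots receive exponentially smaller weights; weights sum to 1.
            ages = np.arange(num_snapshots - 1, -1, -1)   # oldest -> newest
            w = np.exp(-decay_rate * ages)
            return w / w.sum()

        def graph_weighted_moving_average(adj_snapshots, decay_rate=0.5):
            # adj_snapshots: list of (n, n) adjacency matrices ordered oldest to newest;
            # returns one weighted adjacency matrix reflecting the recent trend.
            w = decay_weights(len(adj_snapshots), decay_rate)
            return sum(wi * a for wi, a in zip(w, adj_snapshots))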
    Available online:  January 08, 2025 , DOI: 10.13328/j.cnki.jos.007285
    Abstract:
    Face anti-spoofing is a powerful guarantee for the practical security of facial recognition technology. However, the constant evolution of live attack methods poses significant challenges to existing detection methods. To address the increasing number of unknown scenarios and attack methods, a two-stream face anti-spoofing model based on visual attention and domain feature fusion is proposed. First, a visual attention-based feature extraction module is proposed to strengthen the model’s capacity to extract content features based on global information. Second, a novel style feature fusion module is designed to optimize the feature representation of the sample by fusing content features with low-level textural style features. Third, a feature mapping strategy based on the Siamese network is developed and the contrast loss function is modified to improve the model robustness and avoid easy gradient oscillation during training, respectively. Furthermore, domain adversarial training (DAT) is used to reduce the sensitivity of the model to differences between sample data domains and further improve its generalization. Extensive experimental results verify the generality and strong robustness of the proposed method, demonstrating that it outperforms existing models in cross-domain performance on mainstream datasets.
    Available online:  January 08, 2025 , DOI: 10.13328/j.cnki.jos.007265
    Abstract:
    Image-level weakly supervised semantic segmentation usually uses convolutional neural networks (CNNs) to generate class activation maps to accurately locate targets. However, CNNs have a limited capacity to perceive global information, which results in excessively narrow foregrounds. Recently, Transformer-based weakly supervised semantic segmentation has utilized self-attention mechanisms to capture global dependencies, addressing the inherent defects of CNNs. Nevertheless, the initial class activation map generated by a Transformer often introduces a lot of background noise around the target area, resulting in unsatisfactory performance if used directly. This study comprehensively utilizes both class-to-patch and patch-to-patch attention generated by a Transformer to optimize the initial class activation map. At the same time, a semantic modulation strategy is designed to correct errors in the class-to-patch attention, using the semantic context information of the patch-to-patch attention. Finally, a class activation map that accurately covers more target areas is obtained. On this basis, a novel model for weakly supervised semantic segmentation based on a Transformer is constructed. The mIoU of the proposed method reaches 72.7% and 71.9% on the PASCAL VOC 2012 validation and test sets, respectively, and 42.3% on the MS COCO 2014 validation set, demonstrating that the proposed method achieves improved performance in weakly supervised semantic segmentation.
    Available online:  January 08, 2025 , DOI: 10.13328/j.cnki.jos.007258
    Abstract:
    Molecular dynamics simulation plays an important role in material simulation, biopharmaceuticals, and other areas. In recent years, the development of AI-for-Science has greatly improved the accuracy of neural network force fields in predicting energy, force, and other properties, compared with traditional methods using potential functions. Neural network force field models may face challenges such as hyperparameter tuning and gradient explosion when trained with first-order methods. Based on an optimizer named reorganized layer extended Kalman filtering, this study provides several strategies to avoid hyperparameters and offers theoretical evidence for preventing gradient explosion. This study also proposes an alternate training method and analyzes its accuracy gains and time costs. A performance model of block thresholding is proposed, and its effectiveness is explored. Additionally, the property of preventing gradient explosion is proven, and the optimizer’s robustness with respect to activation functions and weight initialization is validated. Four typical neural network force field models are tested on 11 representative datasets. Results show that when the proposed optimizer and the first-order optimizer achieve comparable prediction accuracy, the proposed optimizer is 8 to 10 times faster than the first-order optimizer. It is believed that the proposed training method can inspire other AI-for-Science applications.
    Available online:  December 31, 2024 , DOI: 10.13328/j.cnki.jos.007262
    Abstract:
    Commonsense knowledge is usually not explicitly expressed in natural languages but is implicitly understood in human cognition. Providing machines with commonsense knowledge has been a longstanding aim in artificial intelligence. Initially, this study manually constructs a high-precision, event-centric commonsense knowledge graph (ECKG) for seed events in Chinese. It contains 26 606 commonsense event triples encompassing causal, temporal, conditional, and other common event relationships. Although the constructed ECKG holds considerable value, its limited scale curtails practical applications. Besides, large-scale event commonsense knowledge graphs are rare in current studies. To overcome these challenges, this paper uses large language models from the GPT series to expand the above-mentioned three event relationships and sub-events of the proposed ECKG. The expansion method involves three primary steps. Firstly, specific prompts for event knowledge (ek-prompts) are designed by combining the events in the ECKG with four relationships, and GPT-4-Turbo is used to generate corresponding event triples. Secondly, the triples of the ECKG are integrated with accurate triples obtained by ek-prompts to create a specialized dataset. Additionally, GPT-3.5-Turbo is fine-tuned on the dataset to generate more specific event triples and validate the accuracy of new triples. Lastly, by analyzing the similarities among events in the ECKG and implementing an event-sharing mechanism, similar events within the same relationship are interconnected, ensuring consistency across similar event triples. Experimental results show that the newly acquired triples are of high quality, particularly those of the temporal relationships, with an accuracy rate of 98.2%. Ultimately, the proposed expansion method appends 2 433 012 commonsense event triples to the original ECKG, significantly expanding its scale and providing more commonsense knowledge for many applications in artificial intelligence.
    Available online:  December 31, 2024 , DOI: 10.13328/j.cnki.jos.007241
    Abstract:
    The minimum load coloring problem (MLCP) is an important NP-complete problem arising from wavelength division multiplexing (WDM), a technology used for building optical communication networks. The solution space of NP-complete problems grows exponentially as the problem size increases, so heuristic algorithms are often used to solve such problems. Analysis of existing domestic and international research shows that among heuristic algorithms for solving the MLCP, local search algorithms exhibit the best performance. This study proposes two optimization strategies to overcome the limitations of existing local search algorithms in data preprocessing and neighborhood space search. First, during data preprocessing, a one-degree vertex rule is proposed to reduce the data size and thus shrink the search space of the MLCP. Second, in the search phase of the algorithm, a strategy termed two-stage best from multiple selections (TSBMS) is proposed to help local search algorithms efficiently select a high-quality neighborhood solution from neighborhood spaces of different sizes, which effectively improves the performance of local search algorithms when processing data of different sizes. The optimized local search algorithm is named IRLTS. Seventy-four classic test instances are adopted to validate the effectiveness of the IRLTS algorithm. Experimental results demonstrate that the IRLTS algorithm outperforms the three best local search algorithms on most test instances in terms of both optimal and average solutions. Furthermore, the effectiveness of the proposed strategy is validated through experiments, and the influence of key parameters on the IRLTS algorithm is analyzed.
    Available online:  December 31, 2024 , DOI: 10.13328/j.cnki.jos.007244
    Abstract:
    With the rapid development of Transformer-based large models, computing power has gradually become a bottleneck in this field. A current research hotspot is how to accelerate and optimize the training of large language models based on the structural characteristics of accelerator hardware. This study proposes and implements MTTorch, a PyTorch extension library for the CPU+DSP heterogeneous architecture, which is applicable to the MT-3000 accelerator chip of the new-generation Tianhe supercomputer. The core of MTTorch is a multi-core parallel operator library that vectorizes and optimizes the core operators used in training Transformer-based models. Additionally, this study innovatively proposes a high-performance reduction algorithm and a ping-pong algorithm for the multi-core DSP, significantly improving the computational performance of the operators. MTTorch also has good generality, as it can be loaded as a dynamic link library for different versions of PyTorch without changing the native implementation of PyTorch. Extensive experiments show that the core operators implemented in this study achieve excellent performance on the MT-3000 chip, with an 8-fold speedup on a single DSP cluster. Using MTTorch for training tasks on multiple nodes achieves nearly linear acceleration, greatly improving the training efficiency of Transformer-based models on the MT-3000 chip.
    Available online:  December 31, 2024 , DOI: 10.13328/j.cnki.jos.007247
    Abstract:
    As large language models (LLMs) continue to evolve, they have shown impressive performance in open-domain tasks. However, they exhibit limited effectiveness in domain-specific question-answering due to a lack of domain-specific knowledge. This limitation has attracted widespread attention from researchers in the field. Current research attempts to infuse domain knowledge into LLMs through a retrieve-answer approach to enhance their performance. However, this method often retrieves additional, irrelevant data, leading to a degradation in LLM effectiveness. Therefore, this study proposes a method for knowledge graph question answering based on the relevance of knowledge. This method focuses on distinguishing essential knowledge required for specific questions from noisy data. Under a framework of retrieval-relevance assessment-answering, this method guides LLMs to select appropriate knowledge for accurate answers. Moreover, this study introduces a dataset named Mecha-QA for question-answering using a mechanical domain knowledge graph, covering traditional machinery manufacturing and additive manufacturing, to promote research that integrates LLMs with knowledge graph question answering in this field. To validate the effectiveness of the proposed method, experiments are conducted on the Aero-QA dataset in the aerospace domain and the Mecha-QA dataset. Results demonstrate that the proposed method significantly improves the performance of LLMs in knowledge graph question answering in vertical domains.
    Available online:  December 31, 2024 , DOI: 10.13328/j.cnki.jos.007249
    Abstract:
    Traditional detection and defense mechanisms for distributed denial-of-service (DDoS) attacks require traffic mirroring, collection, and centralized remote analysis, which introduces extra performance overhead and fails to achieve real-time protection in high-performance networks. With the development of network devices such as programmable switches, the programmable data plane has emerged as a solid foundation for achieving high-performance DDoS attack detection. However, existing detection methods based on the programmable data plane cannot guarantee accuracy and are difficult to deploy directly in programmable switches (such as Intel Tofino) due to programming constraints. To this end, this paper proposes a programmable switch-based mechanism for detecting and defending against DDoS attacks. First, the mechanism uses the difference between the entropy of source and destination addresses to determine whether DDoS attacks occur. When DDoS attacks occur, a traffic filtration mechanism based on the difference in counts between source and destination address will defend against DDoS attacks in real time. Experimental results indicate that the proposed mechanism effectively identifies and defends against DDoS attacks. Compared with the benchmark method, the accuracy of this mechanism in window-level attack detection is increased by 17.75% on average, and the accuracy of packet-level attack filtration is increased by 3.7% on average.
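    The detection criterion above can be expressed compactly; the following Python sketch computes the Shannon entropy of the source and destination addresses observed in a time window and raises an alarm when their difference exceeds a threshold. The threshold value and window handling are illustrative assumptions, and a switch-side implementation would use fixed-point sketch structures rather than floating-point arithmetic.

        import math
        from collections import Counter

        def shannon_entropy(addresses):
            # Entropy of the empirical address distribution within one (non-empty) window.
            counts = Counter(addresses)
            total = len(addresses)
            return -sum((c / total) * math.log2(c / total) for c in counts.values())

        def ddos_alarm(src_addrs, dst_addrs, threshold=1.5):
            # Volumetric attacks typically scatter source addresses (high entropy)
            # while concentrating traffic on few victims (low destination entropy),
            # so a large entropy difference signals a likely attack.
            return shannon_entropy(src_addrs) - shannon_entropy(dst_addrs) > threshold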
    Available online:  December 31, 2024 , DOI: 10.13328/j.cnki.jos.007255
    Abstract:
    As the Internet of Things and mobile Internet technologies continue to advance, a wide range of mobile devices are connected to the Internet. To identify and authenticate these devices, it is necessary to verify the digital signatures they submit. However, many mobile devices have limited computing power and typically use software modules to store keys locally or on smart chips, which increases the risk of key exposure. To avoid this risk, threshold signatures are commonly employed in real-world applications. These signatures rely on multi-party cooperation to decentralize risks and enhance device availability. The SM2 digital signature algorithm, an elliptic curve public key cryptographic algorithm developed independently by China, was adopted as the national cryptography standard in 2016. It finds extensive use in various sectors including government agencies, financial institutions, and electronic authentication service providers. While there has been interest in constructing SM2 threshold signatures with high availability, there are still limited schemes available, and participant weights have not been adequately considered. This study proposes a flexible SM2 weighted threshold signature scheme. In this scheme, signers are assigned different weights, and multiple signers collaborate to generate a valid signature. The key of the SM2 digital signature is divided based on the weighted threshold secret sharing of the Chinese remainder theorem. Participants do not acquire a signing key only by meeting the threshold value. They have to meet the corresponding secret threshold t and the reconstruction threshold T by calculating the sum of the weights of participants to obtain part of the key information or recover the signing key. During secret segmentation, the private signing key of the SM2 digital signature algorithm is transformed to complete the inversion of the SM2 key during the signing stage. Finally, the proposed scheme is compared with other schemes such as SM2 threshold signatures and joint SM2 signatures. The proposed scheme not only reduces computational overhead but also enhances the functionality of the SM2 signature.
    Available online:  December 31, 2024 , DOI: 10.13328/j.cnki.jos.007256
    Abstract:
    In recent years, streaming graph analysis has gained increasing importance in both research and industry. A streaming graph is a continuous sequence of edges received from a data source at a high speed. Those edges form a dynamic graph that is continuously changing. Various analyses can be performed on streaming graphs. Among them, triangle counting is one of the most basic operations. However, the large volume and high update speed of streaming graphs make it inefficient to count triangles accurately on them. It is also unnecessary, as most applications for triangle counting can tolerate small errors. Therefore, approximate triangle counting in streaming graphs has been a hot research topic. This study focuses on sample-based approximate triangle counting in streaming graphs with a sliding window model. Sliding window models focus on the most recent edges in a streaming graph and consider older edges as expired. They are widely applied in various industrial scenarios and research. This study combines a count-before-sample strategy with the state-of-the-art approximate triangle counting algorithm and designs a set of novel strategies to deal with the difficulty brought by edge expiration. Extensive experiments are conducted on real-world datasets to evaluate the proposed algorithm. Results prove that the algorithm decreases the estimation error of the state-of-the-art method by more than 70%.
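    For intuition, the sketch below combines a count-before-sample step with simple Bernoulli edge sampling and sliding-window eviction: each arriving edge first counts the triangles it closes in the current sample (scaled by 1/p^2 to unbias the two sampled wedge edges) and is then kept with probability p. This is an illustrative simplification, not the paper's estimator; in particular it does not correct the running count for triangles whose edges later expire.

        import random
        from collections import defaultdict, deque

        class WindowTriangleEstimator:
            def __init__(self, p=0.1, window=10_000):
                self.p, self.window = p, window
                self.adj = defaultdict(set)     # sampled graph (adjacency sets)
                self.buffer = deque()           # (timestamp, u, v) of sampled edges
                self.estimate = 0.0

            def add_edge(self, t, u, v):
                self._evict(t)
                # Count-before-sample: triangles closed by (u, v) within the sample.
                common = self.adj[u] & self.adj[v]
                self.estimate += len(common) / (self.p ** 2)
                if random.random() < self.p:    # Bernoulli sampling of the new edge
                    self.adj[u].add(v); self.adj[v].add(u)
                    self.buffer.append((t, u, v))

            def _evict(self, t):
                # Drop sampled edges that have slid out of the window.
                while self.buffer and self.buffer[0][0] <= t - self.window:
                    _, u, v = self.buffer.popleft()
                    self.adj[u].discard(v); self.adj[v].discard(u)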
    Available online:  December 25, 2024 , DOI: 10.13328/j.cnki.jos.007257
    Abstract:
    Due to the difficulty in determining the structure and training the parameters of recurrent neural network (RNN), an incremental-construction random RNN (IRRNN) is proposed to realize the incremental construction of RNN structures and the random learning of network parameters. The IRRNN establishes an incremental constraint mechanism for hidden nodes and uses the candidate node pool strategy to realize the optimal selection of hidden nodes, avoiding the blindness of random construction of the network. Two incremental random learning methods, termed IR-1 and IR-2, are designed for local and global optimization of model parameters. Additionally, their universal approximation property is proved. Meanwhile, the dynamic property of the IRRNN model is studied to analyze its generalization performance. Experiments validated that the IRRNN exhibits favorable dynamic properties, compactness, and accuracy.
    Available online:  December 25, 2024 , DOI: 10.13328/j.cnki.jos.007260
    Abstract:
    Fuzz testing automatically uncovers vulnerabilities in software. However, existing fuzz testing tools for network protocols are not able to fully explore their internal state space, resulting in limited coverage. Finite state machines comprehensively model the implementation of network protocols to provide an in-depth understanding of their system behavior and internal state space. This study proposes a fuzz testing method for network protocols based on finite state machines. It focuses on the commonly used TLS protocol, using finite state machine learning to model the implementation of the TLS protocol, reflecting the protocol’s internal state space and system behavior. Subsequently, guided by finite state machines, the fuzz testing of the TLS protocol achieves deeper depth and broader code coverage. This study also implements a prototype system, SNETFuzzer, which outperforms existing methods in important metrics such as coverage in a series of comparative experiments. SNETFuzzer successfully discovers multiple vulnerabilities, including two new ones, demonstrating its practicality and effectiveness.
    Available online:  December 25, 2024 , DOI: 10.13328/j.cnki.jos.007261
    Abstract:
    Causal discovery aims to uncover causal relationships among variables from observational data, serving as a crucial method for understanding various phenomena and changes in natural, social, and technological systems. A mainstream approach for causal discovery is a constraint-based algorithm, which determines the causal structure among variables by examining their conditional independence. However, data collection in the real world often faces challenges such as limited sample sizes and high variance among nodes due to resource or technical constraints. In these scenarios, the accuracy of conditional independence tests is greatly affected, leading to erroneous deletion of causal edges of some variables in learned causal graphs, thereby impacting the accuracy of the algorithm’s output. To address this issue, this study proposes an enhanced method for conditional independence testing, which focuses on minimizing the interference of irrelevant external noise on the variables being tested, thereby improving the accuracy of conditional independence tests. Based on this enhanced method, the paper introduces a structure learning algorithm based on heuristic search, which iteratively searches for mistakenly deleted causal edges on a graph with an initial structure. This algorithm reconstructs the causal structure by combining enhanced conditional independence tests with score optimization. Experimental results show that, compared to existing methods, the proposed algorithm significantly improves both the F1 score and the structural Hamming distance (SHD) on simulated, Bayesian network, and real data, demonstrating its ability to more accurately reveal underlying causal structures in observational data with limited samples and high-variance nodes.
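    As background for the constraint-based setting, the sketch below implements the standard Fisher-z conditional independence test based on partial correlation; it is the conventional test whose reliability degrades under small samples and high variance, not the enhanced test proposed in the paper.

        import math
        import numpy as np

        def fisher_z_ci_test(data, x, y, cond=(), alpha=0.05):
            # data: (n_samples, n_vars) array; tests X independent of Y given the
            # variables indexed by cond via the partial-correlation Fisher-z statistic.
            cols = [x, y] + list(cond)
            corr = np.corrcoef(data[:, cols], rowvar=False)
            prec = np.linalg.pinv(corr)                        # precision matrix
            r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
            r = max(min(r, 0.999999), -0.999999)               # numerical safety
            n = data.shape[0]
            z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
            p_value = math.erfc(abs(z) / math.sqrt(2))         # two-sided p-value
            return p_value > alpha                             # True: independence not rejected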
    Available online:  December 25, 2024 , DOI: 10.13328/j.cnki.jos.007274
    Abstract:
    In the industrial field, currently used access permission control technologies are increasingly struggling to address access control issues of distributed systems deployed in wide-area internet scenarios. This situation is particularly exacerbated when dealing with large-scale information systems distributed across multiple trust domains, thereby engendering an escalating proliferation of vulnerabilities. Consensus-based access control policy sharing technologies can facilitate the secure and expeditious attainment of consensus decisions among access control nodes deployed across trust domains. This study first proposes a consensus-based access permission control model for multiple nodes and presents the Super-Dumbo consensus algorithm for access control engines, which features robust security and high performance. Super-Dumbo surmounts the performance bottlenecks of Dumbo2 by optimizing the design of key steps encompassing message broadcasting, random coin toss procedures, and consensus algorithm constructs. Notably, it reduces computational overhead such as digital signature verification, thereby effectively enhancing bandwidth utilization. This achieves a substantial improvement in performance metrics, such as throughput and latency, aligning seamlessly with the performance prerequisites of the CBAC access control model, which demands low latency and high throughput from the underlying consensus algorithm.
    Available online:  December 11, 2024 , DOI: 10.13328/j.cnki.jos.007266
    Abstract:
    With the proliferation of massive data and the ever-growing demand for intelligent applications, ensuring data security has become a critical measure for enhancing data quality and realizing data value. The cloud-edge-client architecture has emerged as a promising technology for efficient data processing and optimization. Federated learning (FL), an efficient decentralized machine learning paradigm that can provide privacy protection for data, has garnered extensive attention from academia and industry in recent years. However, FL has demonstrated inherent vulnerabilities that render it highly susceptible to poisoning attacks. Most existing methods for defending against poisoning attacks rely on continuously updated space, but in practical scenarios, those methods may be less robust when facing flexible attack strategies and varied attack scenarios. Therefore, this study proposes FedDiscrete, a defense method for resisting poisoning attacks in cloud-edge FL (CEFL) systems. The key idea is to compute local rankings on the client side using the scores of network model edges to create discrete update space. To ensure fairness among clients participating in the FL task, this study also introduces a contribution metric. In this way, FedDiscrete can penalize potential attackers by allocating updated global rankings. Extensive experiments demonstrate that the proposed method exhibits significant advantages and robustness against poisoning attacks, and is applicable to both independent and identically distributed (IID) and non-IID scenarios, providing protection for CEFL systems.
    Available online:  December 11, 2024 , DOI: 10.13328/j.cnki.jos.007271
    Abstract:
    Cloud storage has become an important part of the digital economy as it brings great convenience to users’ data management. However, complex and diverse network environments and third parties that are not fully trusted pose great threats to users' privacy. To protect users’ privacy, data is usually encrypted before storage, but the ciphertext generated by traditional encryption techniques hinders subsequent data retrieval. Public-key encryption with keyword search (PEKS) technology can provide a confidential retrieval function while guaranteeing data encryption, but the traditional PEKS scheme is vulnerable to keyword guessing attacks due to the small number of common keywords. Public-key authenticated encryption with keyword search (PAEKS) introduces authentication technology based on PEKS, which can further improve security. However, most of the existing PAEKS schemes are designed based on foreign cryptographic algorithms, which do not meet the development needs of independent innovation of cryptography in China. This study proposes an SM9-PAEKS scheme, which can effectively improve user-side retrieval efficiency by redesigning algorithm structure and transferring time-consuming operations to a resource-rich cloud server. Scheme security is also proved under the random oracle model based on q-BDHI and Gap-q-BCCA1 security assumptions. Finally, theoretical analysis and experimental results show that compared with the optimal communication cost among similar schemes, SM9-PAEKS can reduce the total computational overhead by at least 59.34% with only 96 bytes of additional communication cost, and the computational overhead reduction of keyword trapdoor generation is particularly significant, about 77.55%. This study not only helps to enrich national security algorithm applications but also provides theoretical and technical support for data encryption and retrieval in cloud storage.
    Available online:  December 04, 2024 , DOI: 10.13328/j.cnki.jos.007251
    Abstract:
    Dynamic searchable symmetric encryption has attracted much attention because it allows users to securely search and dynamically update encrypted documents stored on a semi-trusted cloud server. However, most searchable symmetric encryption schemes only support single-keyword search and fail to achieve conjunctive search while protecting forward and backward privacy. In addition, most schemes are not robust, which means that they cannot handle irrational update requests from a client, such as repeatedly adding or deleting the same keyword/file identifier pair, or deleting non-existent keyword/file identifier pairs. To address these challenges, this study proposes a robust scheme for conjunctive dynamic symmetric searchable encryption that preserves both forward and backward privacy, called RFBC. In this scheme, the server constructs two Bloom filters for each keyword, which are used to store the relevant hash values of the keyword/file identifier pairs to be added and deleted, respectively. When the client sends update requests, the server uses the two Bloom filters to identify and filter irrational update requests, so as to guarantee the robustness of the scheme. In addition, by combining the status information of the lowest-frequency keyword among the queried keywords, the Bloom filters, and the update counter, RFBC realizes conjunctive search by filtering out file identifiers that do not contain the remaining keywords. Finally, by defining the leakage function, RFBC is proved to be forward private and Type-III backward private through a series of security analyses. Experimental results show that, compared with related schemes, RFBC greatly improves computation and communication efficiency. Specifically, the computational overhead of update operations in RFBC is about 28% and 61.7% of that in ODXT and BDXT, respectively. The computational overhead of search operations in RFBC is about 21.9% and 27.3% of that in ODXT and BDXT, respectively. The communication overhead of search operations in RFBC is about 19.7% and 31.6% of that in ODXT and BDXT, respectively. Moreover, as the proportion of irrational updates gradually increases, RFBC exhibits a significantly greater improvement in search efficiency than both BDXT and ODXT.
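    To illustrate how two Bloom filters per keyword can screen out irrational updates, here is a minimal Python sketch; the filter sizes, hash construction, and the exact rationality rules are assumptions for illustration and omit the encrypted-index details of RFBC.

        import hashlib

        class BloomFilter:
            def __init__(self, num_bits=1 << 16, num_hashes=4):
                self.m, self.k = num_bits, num_hashes
                self.bits = bytearray(num_bits // 8)

            def _positions(self, item):
                for i in range(self.k):
                    digest = hashlib.sha256(f"{i}|{item}".encode()).digest()
                    yield int.from_bytes(digest[:8], "big") % self.m

            def add(self, item):
                for pos in self._positions(item):
                    self.bits[pos // 8] |= 1 << (pos % 8)

            def contains(self, item):
                # May report false positives, never false negatives.
                return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))

        def is_rational_update(op, pair, added_bf, deleted_bf):
            # An "add" is irrational if the pair already appears added and not deleted;
            # a "del" is irrational if the pair was never added or is already deleted.
            exists = added_bf.contains(pair) and not deleted_bf.contains(pair)
            return (op == "add" and not exists) or (op == "del" and exists)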
    Available online:  December 04, 2024 , DOI: 10.13328/j.cnki.jos.007252
    Abstract:
    Code comment generation is an important research task in software engineering. Mainstream methods for comment generation train deep learning models to generate comments, relying on metrics such as BLEU to evaluate comment quality on open code comment datasets. These evaluations mainly reflect the similarity between generated comments and manual reference comments in the datasets. However, the quality of the manual reference comments in open comment datasets varies widely, which leads to more and more doubts about the effectiveness of these metrics. Therefore, for code comment generation tasks, there is an urgent need for direct and effective methods to evaluate code comment quality. Such methods can improve the quality of open comment datasets and enhance the evaluation of generated comments. This study conducts research and analysis on existing quantifiable methods for code comment quality evaluation and applies a set of multi-dimensional metrics to directly evaluate the quality of code comments in mainstream open datasets, comments generated by traditional methods, and comments generated by ChatGPT. The study reveals the following findings. 1) The quality of code comments in mainstream open datasets needs improvement, with issues such as inaccuracy, poor readability, excessive simplicity, and a lack of useful information. 2) Comments generated by traditional methods are more lexically and semantically similar to the code but lack information that is more useful to developers, such as high-level intentions of the code. 3) One important reason for the low BLEU scores of generated comments is the large number of poor-quality reference comments in datasets, which lack relevance with the code or exhibit poor naturalness. These kinds of reference comments should be filtered or improved. 4) Comments generated by LLMs like ChatGPT are rich in content but tend to be lengthy. Their quality evaluation needs to be tailored to developer intentions and specific scenarios. Based on these findings, this study provides several suggestions for future research in code comment generation and comment quality evaluation.
    Available online:  December 04, 2024 , DOI: 10.13328/j.cnki.jos.007253
    Abstract:
    Software concept drift means that the structure and composition of the same type of software will change over time. In malware classification, concept drift means that the structure and composition characteristics of malware samples from the same family can change over time. This will cause a decline in the performance of fixed-mode malware classification algorithms over time. Existing methods for static malware classification experience significant performance degradation when faced with concept drift scenarios, making it difficult to meet the needs of practical applications. To address this problem, given the commonalities between natural language understanding and binary byte stream analysis, a highly accurate and robust malware classification method is proposed based on BERT and a custom autoencoder architecture. This method extracts execution-oriented malware opcode sequences through disassembly analysis to reduce redundant information. Then, it uses BERT to understand the contextual semantics of the sequences and perform vector embedding to effectively understand the deep program semantics of the malware samples. It also screens effective task-related features through the geometric median subspace projection and bottleneck autoencoders. Finally, a classifier composed of fully connected layers is used to output the classification results. The practical effectiveness of the proposed method is validated through comparative experiments with nine state-of-the-art malware classification methods in both normal and concept drift scenarios. Experimental results show that the proposed method achieves an F1 score of 99.49% in normal scenarios, outperforming those nine methods. Moreover, in concept drift scenarios, the F1 score is improved by 10.78% to 43.71% compared to the nine methods.
    Available online:  November 18, 2024 , DOI: 10.13328/j.cnki.jos.007238
    Abstract:
    As the scale of cities continues to increase, urban transportation systems are facing more and more challenges, such as traffic congestion and traffic safety. Traffic simulation is a method to solve urban traffic problems. It uses virtual and real computing technologies to process real-time traffic data and optimize urban traffic efficiency. It is an important method to achieve the parallel city theory in intelligent transportation. However, traditional computing systems often encounter problems such as insufficient computing resources and long simulation delays when running large-scale urban traffic simulations. To solve the above problems, this study proposes a parallel algorithm for traffic simulation of parallel cities based on the parallel city theory and the heterogeneous architecture of China’s new-generation supercomputer, Tianhe. This algorithm accurately simulates traffic elements such as vehicles, roads, and traffic signals, and applies methods such as road network division, parallel driving of vehicles, and parallel control of signal lights to achieve high-performance traffic simulation. The algorithm runs on Tianhe, a supercomputing platform with 16 nodes and more than 25 000 cores, and simulates real traffic scenarios involving 2.4 million vehicles, 7 797 intersections, and 170 000 lanes within the Fifth Ring Road in Beijing. Compared with traditional single-node simulation, the proposed algorithm reduces the simulation time of each step from 2.21 s to 0.37 s, achieving nearly 6 times acceleration. An urban traffic simulation with a scale of one million vehicles has been successfully implemented on a domestic heterogeneous supercomputing platform.
    Available online:  November 18, 2024 , DOI: 10.13328/j.cnki.jos.007239
    Abstract:
    Traffic flow prediction is an important foundation and a hot research direction for traffic management in intelligent transportation systems (ITS). Traditional methods for traffic flow prediction typically rely on a large amount of high-quality historical observation data to achieve accurate predictions, but the prediction accuracy significantly decreases in more common scenarios with data scarcity in traffic networks. To address this problem, a transfer learning model is proposed based on spatial-temporal graph convolutional networks (TL-STGCN), which leverages traffic flow features from a source network with abundant data to assist in predicting future traffic flow in a target network with data scarcity. Firstly, a spatial-temporal graph convolutional network based on time attention is employed to learn the spatial and temporal features of the traffic flow data in both the source and target networks. Secondly, domain-invariant spatial-temporal features are extracted from the representations of the two networks using transfer learning techniques. Lastly, these domain-invariant features are utilized to predict the future traffic flow in the target network. To validate the effectiveness of the proposed model, experiments are conducted on real-world datasets. The results demonstrate that TL-STGCN outperforms existing methods by achieving the highest accuracy in mean absolute error, root mean square error, and mean absolute percentage error, which proves that TL-STGCN provides more accurate traffic flow predictions for scenarios with data scarcity in traffic networks.
    Available online:  September 30, 2024 , DOI: 10.13328/j.cnki.jos.007234
    [Abstract] (169) [HTML] (0) [PDF 3.97 M] (1127)
    Abstract:
    The minimum weakly connected dominating set problem is a classic NP-hard problem with wide applications in various fields. This study proposes an efficient local search algorithm to solve this problem. The algorithm employs a method that constructs an initial solution based on locked vertices and frequency feedback. This method ensures the inclusion of vertices that are certain or highly likely to be in the optimal solution, resulting in a high-quality initial solution. Furthermore, the study introduces a method to avoid cycling based on two-hop configuration checking, age properties, and tabu strategies. A perturbation strategy is also proposed to enable the algorithm to effectively escape from local optima. Additionally, effective vertex selection methods are presented to assist the algorithm in choosing vertices suitable for addition to or removal from the candidate solution by combining two scoring functions, Dscore and Nscore, with the cycle-avoidance strategies. Finally, the proposed local search algorithm is evaluated on four benchmark test instances and compared with four state-of-the-art algorithms and the CPLEX solver. Experimental results demonstrate that the proposed algorithm achieves better performance.
    Available online:  July 03, 2024 , DOI: 10.13328/j.cnki.jos.007220
    [Abstract] (105) [HTML] (0) [PDF 6.30 M] (1890)
    Abstract:
    Learned indexes are assisting or gradually replacing traditional index structures due to their low memory usage and high query performance. However, the online retraining caused by data updates makes it unable to adapt to the scenario of frequent data updates. To avoid index reconstruction due to frequent data updates without significantly increasing memory consumption, this study proposes an adaptive update-distribution-aware learned index named DRAMA. It uses an LSM-Tree-like delayed learning method to actively learn the characteristics of the data update distribution, approximate fitting techniques to quickly establish the update-distribution model, a model merging strategy to replace the frequent retraining, and a hybrid compression technique to reduce the memory usage of model parameters in the index. The index is constructed and validated on both real and synthetic datasets. The results show that, compared to traditional indexes and state-of-the-art learned indexes, the proposed index can effectively reduce query delay in a data update environment without additional memory consumption.
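    For readers new to learned indexes, the sketch below shows the core lookup idea that such indexes build on: a model predicts a key's position in a sorted array and a bounded local search corrects the prediction. The linear model and max-error bound are assumptions for illustration; the delayed learning, model merging, and hybrid compression components of DRAMA are not shown.

        import bisect
        import numpy as np

        class LinearLearnedIndex:
            def __init__(self, keys):
                self.keys = np.sort(np.asarray(keys, dtype=float))
                pos = np.arange(len(self.keys))
                self.slope, self.intercept = np.polyfit(self.keys, pos, 1)
                pred = self.slope * self.keys + self.intercept
                self.err = int(np.ceil(np.max(np.abs(pred - pos))))   # max prediction error

            def lookup(self, key):
                # Predict the position, then search only within the error bound.
                guess = int(self.slope * key + self.intercept)
                lo = max(0, guess - self.err)
                hi = min(len(self.keys), guess + self.err + 1)
                i = lo + bisect.bisect_left(self.keys[lo:hi].tolist(), key)
                return i if i < len(self.keys) and self.keys[i] == key else None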
    Available online:  July 03, 2024 , DOI: 10.13328/j.cnki.jos.007222
    Abstract:
Smart contracts are scripts running on the Ethereum blockchain that can handle intricate business logic, and most of them are written in Solidity. As security concerns surrounding smart contracts intensify, a formal verification method employing the modeling, simulation, and verification language (MSVL) together with propositional projection temporal logic (PPTL) has been proposed, and a SOL2M converter has been developed to support semi-automatic modeling from Solidity programs to MSVL programs. However, a proof of the operational semantic equivalence between Solidity and MSVL is still lacking. This study first defines Solidity's operational semantics in big-step style at four levels: semantic elements, evaluation rules, expressions, and statements. It then establishes equivalence relations between the states, expressions, and statements of Solidity and MSVL. Finally, leveraging the operational semantics of both languages, it uses structural induction to prove expression equivalence and rule induction to prove statement equivalence.
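    To make the big-step style concrete for readers who have not seen it, the following Python sketch evaluates a toy expression and statement language in one step per construct. It covers none of Solidity's or MSVL's actual constructs and is only meant to illustrate what "semantic elements, evaluation rules, expressions, and statements" refer to.

```python
def eval_expr(expr, state):
    """Big-step evaluation of a tiny expression language (toy illustration)."""
    kind = expr[0]
    if kind == "lit":
        return expr[1]
    if kind == "var":
        return state[expr[1]]
    if kind == "bin":
        _, op, lhs, rhs = expr
        a, b = eval_expr(lhs, state), eval_expr(rhs, state)
        return {"+": a + b, "-": a - b, "*": a * b}[op]
    raise ValueError(f"unknown expression {expr!r}")

def exec_stmt(stmt, state):
    """Big-step execution: a statement maps a state to a new state."""
    kind = stmt[0]
    if kind == "assign":                      # <x := e, s>  =>  s[x -> v]
        _, name, expr = stmt
        return {**state, name: eval_expr(expr, state)}
    if kind == "seq":                         # sequencing composes the two steps
        _, first, second = stmt
        return exec_stmt(second, exec_stmt(first, state))
    if kind == "if":
        _, cond, then_s, else_s = stmt
        branch = then_s if eval_expr(cond, state) != 0 else else_s
        return exec_stmt(branch, state)
    raise ValueError(f"unknown statement {stmt!r}")

prog = ("seq", ("assign", "x", ("lit", 2)),
               ("assign", "y", ("bin", "*", ("var", "x"), ("lit", 21))))
print(exec_stmt(prog, {}))     # {'x': 2, 'y': 42}
```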
    Available online:  October 18, 2017
    [Abstract] (3000) [HTML] (0) [PDF 525.21 K] (6387)
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 315-325. Original article: https://doi.org/10.1145/3106237.3106242. Readers who wish to cite this work should cite the original publication.
    Available online:  October 18, 2017
    [Abstract] (2926) [HTML] (0) [PDF 352.38 K] (7248)
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 303-314. Original article: https://doi.org/10.1145/3106237.3106239. Readers who wish to cite this work should cite the original publication.
    Available online:  September 11, 2017
    [Abstract] (3499) [HTML] (0) [PDF 276.42 K] (4558)
    Abstract:
GitHub, a popular social-software-development platform, has fostered a variety of software ecosystems where projects depend on one another and practitioners interact with each other. Projects within an ecosystem often have complex inter-dependencies that impose new challenges in bug reporting and fixing. In this paper, we conduct an empirical study on cross-project correlated bugs, i.e., causally related bugs reported to different projects, focusing on two aspects: 1) how developers track the root causes across projects; and 2) how the downstream developers coordinate to deal with upstream bugs. Through manual inspection of bug reports collected from the scientific Python ecosystem and an online survey with developers, this study reveals the common practices of developers and the various factors in fixing cross-project bugs. These findings provide implications for future software bug analysis at the ecosystem scope and shed light on the requirements of issue trackers for such bugs.
    Available online:  June 21, 2017
    [Abstract] (3509) [HTML] (0) [PDF 169.43 K] (4389)
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article has been accepted by IEEE Transactions on Software Engineering (2017, to appear). Original article: http://ieeexplore.ieee.org/document/7792694. Readers who wish to cite this work should cite the original publication.
    Available online:  June 13, 2017
    [Abstract] (4764) [HTML] (0) [PDF 174.91 K] (4823)
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 39th International Conference on Software Engineering (ICSE 2017), pp. 27-37, Buenos Aires, Argentina, May 20-28, 2017, IEEE Press, Piscataway, NJ, USA, ©2017, ISBN: 978-1-5386-3868-2. Original article: http://dl.acm.org/citation.cfm?id=3097373. Readers who wish to cite this work should cite the original publication.
    Available online:  January 25, 2017
    [Abstract] (3602) [HTML] (0) [PDF 254.98 K] (4361)
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016), ACM, New York, NY, USA, pp. 871-882, DOI: https://doi.org/10.1145/2950290.2950364. Original article: http://dl.acm.org/citation.cfm?id=2950364. Readers who wish to cite this work should cite the original publication.
    Available online:  January 18, 2017
    [Abstract] (4105) [HTML] (0) [PDF 472.29 K] (4558)
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016), pp. 133-143, Seattle, WA, USA, November 2016. Original article: http://dl.acm.org/citation.cfm?id=2950327. Readers who wish to cite this work should cite the original publication.
    Available online:  January 04, 2017
    [Abstract] (3822) [HTML] (0) [PDF 293.93 K] (4058)
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016), pp. 810-821, November 13-18, 2016. Original article: https://doi.org/10.1145/2950290.2950310. Readers who wish to cite this work should cite the original publication.
    Available online:  January 04, 2017
    [Abstract] (4160) [HTML] (0) [PDF 244.61 K] (4649)
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published at FSE 2016. Original article: http://dl.acm.org/citation.cfm?doid=2950290.2950313. Readers who wish to cite this work should cite the original publication.
    Available online:  December 12, 2016
    [Abstract] (3674) [HTML] (0) [PDF 358.69 K] (4472)
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published at FSE 2016 (Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering). Original article: http://dl.acm.org/citation.cfm?id=2950340. Readers who wish to cite this work should cite the original publication.
    Available online:  September 30, 2016
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article was published at ASE 2016 (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?id=2970366. Readers who wish to cite this work should cite the original publication.
    Available online:  September 09, 2016
    Abstract:
Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The article by Junjie was published at ASE 2016 (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?doid=2970276.2970300. Readers who wish to cite this work should cite the original publication.
    Available online:  September 07, 2016
    Abstract:
Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The article was published in ASE 2016: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. Full text: http://dx.doi.org/10.1145/2970276.2970307. Important note: readers who cite this work should cite the original publication.
    Available online:  August 29, 2016
    Abstract:
Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published in ACM Transactions on Software Engineering and Methodology (TOSEM, Vol. 25, No. 2, Article 13, May 2016) and was invited as a "Journal first" talk at the ICSE 2016 main conference. Full text: http://dl.acm.org/citation.cfm?id=2876443. The authors are Minghui Zhou, Xiujuan Ma, Lu Zhang, and Hong Mei of Peking University, and Audris Mockus of the University of Tennessee. Important note: readers who cite this work should cite the original publication.
  • Full-text download ranking (overall / annual / per issue)
    Abstract view ranking (overall / annual / per issue)

    2003,14(7):1282-1291
    [Abstract] (37761) [HTML] (0) [PDF 832.28 K] (83770)
    Abstract:
Sensor networks, formed by the convergence of sensor, micro-electro-mechanical system (MEMS), and networking technologies, are a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecast. Combined with existing work, research hot spots including power-aware routing and medium access control schemes are discussed and presented in detail. Finally, taking application requirements into account, several future research directions are put forward.
    2010,21(3):427-437
    [Abstract] (33348) [HTML] (0) [PDF 308.76 K] (41970)
    Abstract:
Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports pioneering research on a genetic algorithm for the automatic generation of SONGCI (a form of classical Chinese poetry). In light of the characteristics of ancient Chinese poetry, the paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette wheel, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system constructed on the basis of the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic Chinese poetry generation.
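    The sketch below shows the bare genetic-algorithm skeleton the abstract refers to, with elitism and roulette-wheel selection. The fitness here merely counts matches against a required level/oblique tonal pattern, standing in for the paper's syntactically and semantically weighted fitness, and one-point crossover with bit-flip mutation replaces the partially mapped crossover and heuristic mutation operators.

```python
import random

def evolve(target_pattern, pop_size=60, generations=200, pm=0.02):
    """Bare-bones genetic algorithm with elitism and roulette-wheel selection.

    Toy stand-in for the paper's SONGCI generator: each gene is 0 (level tone)
    or 1 (oblique tone) and fitness counts matches against a target pattern.
    """
    n = len(target_pattern)
    fitness = lambda ind: sum(a == b for a, b in zip(ind, target_pattern))
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = pop[0]

    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        best = pop[scores.index(max(scores))]
        if max(scores) == n:
            return best
        # Roulette-wheel selection proportional to fitness (+1 avoids zero weights).
        def pick():
            return random.choices(pop, weights=[s + 1 for s in scores], k=1)[0]
        nxt = [best[:]]                                  # elitism: keep the best
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = random.randrange(1, n)                 # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [g ^ (random.random() < pm) for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return best

print(evolve([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]))
```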
    2016,27(1):45-71 , DOI: 10.13328/j.cnki.jos.004914
    [Abstract] (30444) [HTML] (4856) [PDF 880.96 K] (35547)
    Abstract:
Android is a modern and the most popular software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and, for the first time ever, shipped over 1 billion units worldwide; Apple, Microsoft, Blackberry, and Firefox trailed a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2011,22(1):71-83 , DOI: 10.3724/SP.J.1001.2011.03958
    [Abstract] (30442) [HTML] (0) [PDF 781.42 K] (60910)
    Abstract:
Cloud computing is a fundamental change taking place in the field of information technology; it represents a movement towards intensive, large-scale specialization. At the same time, it brings not only convenience and efficiency but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the major requirements of cloud computing security, key technologies, standards, and regulations, and provides a cloud computing security framework. The paper argues that changes in the above aspects will lead to a technical revolution in the field of information security.
    2008,19(1):48-61
    [Abstract] (28835) [HTML] (0) [PDF 671.39 K] (65258)
    Abstract:
The research status and recent progress of clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and categorized from several aspects, such as algorithmic ideas, key techniques, and their advantages and disadvantages. Second, several typical clustering algorithms and well-known data sets are selected, and simulation experiments are conducted to compare both accuracy and running efficiency; the behavior of each algorithm on different data sets is analyzed, and the same data set is also clustered under different algorithms for comparison. Finally, by integrating the two aspects above, the research hot spots, difficulties, and shortcomings of data clustering as well as some open problems are addressed. This work provides a valuable reference for research on data clustering and data mining.
    2009,20(5):1337-1348
    [Abstract] (28564) [HTML] (0) [PDF 1.06 M] (47840)
    Abstract:
    This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects. One is about the cloud infrastructure which is the building block for the up layer cloud application. The other is of course the cloud application. This paper focuses on the cloud infrastructure including the systems and current research. Some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large scale clusters which contain a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software building on top of redundant hardware instead of mere hardware. All these technologies are for the two important goals for distributed system: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to very large scale even to thousands of nodes. Availability means that the services are available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2009,20(2):271-289
    [Abstract] (27613) [HTML] (0) [PDF 675.56 K] (47594)
    Abstract:
Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms developed before 2003, this paper discusses recent advances in EMO in detail and summarizes the current research directions. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have emerged. Furthermore, the essential characteristics of multi-objective optimization problems are investigated in depth. The paper also gives an experimental comparison of several representative algorithms and proposes several viewpoints for future EMO research.
    2005,16(1):1-7
    [Abstract] (22718) [HTML] (0) [PDF 614.61 K] (23981)
    Abstract:
This paper offers some reflections from the following four perspectives: 1) from the law of how things develop, revealing the development history of software engineering technology; 2) from the natural characteristics of software, analyzing the construction of each abstraction layer of the virtual machine; 3) from the viewpoint of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the emergence of Internet technology, exploring the development trend of software technology.
    2010,21(8):1834-1848
    [Abstract] (21461) [HTML] (0) [PDF 682.96 K] (60914)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
    2004,15(3):428-442
    [Abstract] (20884) [HTML] (0) [PDF 1009.57 K] (19545)
    Abstract:
With the rapid development of e-business, Web applications are evolving from localization to globalization, from B2C (business-to-customer) to B2B (business-to-business), and from a centralized to a decentralized fashion. Web services are a new application model for decentralized computing and an effective mechanism for data and service integration on the Web; thus, Web services have become a solution for e-business. It is important and necessary to carry out research on new architectures for Web services, on combinations with other proven techniques, and on the integration of services. This paper presents a survey of various aspects of Web services research, from basic concepts to the principal research problems and underlying techniques, including data integration in Web services, Web service composition, semantic Web services, Web service discovery, Web service security, Web services in the P2P (peer-to-peer) computing environment, and grid services. The paper also summarizes the current state of these techniques and discusses future research topics and challenges for Web services.
    2005,16(5):857-868
    [Abstract] (20013) [HTML] (0) [PDF 489.65 K] (33350)
    Abstract:
    Wireless Sensor Networks, a novel technology about acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the networks, is a challenging one, and yet extremely crucial for many applications. In this paper, the evaluation criterion of the performance and the taxonomy for wireless sensor networks self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and the directions of research in this area are introduced.
    2009,20(1):54-66
    [Abstract] (19996) [HTML] (0) [PDF 1.41 M] (53983)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks, within which the links between nodes are very dense, but between which they are quite sparse. Network clustering algorithms which aim to discover all natural network communities from given complex networks are fundamentally important for both theoretical researches and practical applications, and can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks including social networks, biological networks, World Wide Webs and so on. This paper reviews the background, the motivation, the state of arts as well as the main issues of existing works related to discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to the researchers from the communities of complex network analysis, data mining, intelligent Web and bioinformatics.
    2012,23(4):962-986 , DOI: 10.3724/SP.J.1001.2012.04175
    [Abstract] (19169) [HTML] (0) [PDF 2.09 M] (35942)
    Abstract:
Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, which easily leads to failures of computers or data. Such a large number of computers not only poses great challenges to the scalability of the data center and its storage system but also results in high hardware infrastructure and power costs. Therefore, fault tolerance, scalability, and power consumption of the distributed storage in a data center become key issues in cloud computing technology, in order to ensure data availability and reliability. In this paper, a survey is made of the state of the art of the key technologies in cloud computing in the following aspects: design of the data center network, organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. First, several classical topologies of data center networks are introduced and compared. Second, current fault-tolerant storage techniques are discussed, and data replication and erasure-coding strategies are compared in particular. Third, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
    2012,23(1):32-45 , DOI: 10.3724/SP.J.1001.2012.04091
    [Abstract] (18910) [HTML] (0) [PDF 408.86 K] (34656)
    Abstract:
In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can be expanded cost-effectively should be invented to deal with such big data. Relational data management technology has gone through a history of nearly 40 years, and it now encounters the tough obstacle of scalability: relational techniques cannot handle very large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, emerge as a new force and expand their applications from Web search to territories that used to be occupied by relational database systems, confronting relational techniques with high availability, high scalability, and massive parallel processing capability. The relational community, after losing the big deal of Web search, has begun to learn from MapReduce, while MapReduce also borrows valuable ideas from relational techniques to improve performance. The two camps compete with and learn from each other; a new data analysis platform and eco-system are emerging, and eventually both will find their right places in the new eco-system of big data analysis.
    2009,20(3):524-545
    [Abstract] (17620) [HTML] (0) [PDF 1.09 M] (25960)
    Abstract:
Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide a direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of the software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2009,20(1):124-137
    [Abstract] (17252) [HTML] (0) [PDF 1.06 M] (25253)
    Abstract:
    The appearance of plenty of intelligent devices equipped for short-range wireless communications boosts the fast rise of wireless ad hoc networks application. However, in many realistic application environments, nodes form a disconnected network for most of the time due to nodal mobility, low density, lossy link, etc. Conventional communication model of mobile ad hoc network (MANET) requires at least one path existing from source to destination nodes, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, and implement communications between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, captures great interests from researchers. This paper first introduces the conceptions and theories of opportunistic networks and some current typical applications. Then it elaborates the popular research problems including opportunistic forwarding mechanism, mobility model and opportunistic data dissemination and retrieval. Some other interesting research points such as communication middleware, cooperation and security problem and new applications are stated briefly. Finally, the paper concludes and looks forward to the possible research focuses for opportunistic networks in the future.
    2015,26(1):62-81 , DOI: 10.13328/j.cnki.jos.004701
    [Abstract] (16829) [HTML] (5479) [PDF 1.04 M] (34097)
    Abstract:
Network abstraction brings about the birth of software-defined networking (SDN). SDN decouples the data plane from the control plane and simplifies network management. The paper starts with a discussion of the background of the emergence and development of SDN and outlines its architecture, which includes the data layer, control layer, and application layer. Key technologies are then elaborated according to the hierarchical architecture of SDN, and the characteristics of consistency, availability, and tolerance are analyzed in particular. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized in the end.
    2009,20(2):350-362
    [Abstract] (16732) [HTML] (0) [PDF 1.39 M] (43930)
    Abstract:
    This paper makes a comprehensive survey of the recommender system research aiming to facilitate readers to understand this field. First the research background is introduced, including commercial application demands, academic institutes, conferences and journals. After formally and informally describing the recommendation problem, a comparison study is conducted based on categorized algorithms. In addition, the commonly adopted benchmarked datasets and evaluation methods are exhibited and most difficulties and future directions are concluded.
    2004,15(8):1208-1219
    [Abstract] (16700) [HTML] (0) [PDF 948.49 K] (17473)
    Abstract:
With the explosive growth of network applications and complexity, the threat of Internet worms to network security is becoming increasingly serious. Especially in the Internet environment, the variety of propagation paths and the complexity of application environments cause worms to break out much more frequently, hide much more deeply, and cover a much wider range, and Internet worms have become a primary issue faced by malicious code researchers. In this paper, the concept and research status of Internet worms, their functional components, and their execution mechanisms are first presented; then the scanning strategies and propagation models are discussed; finally, the critical techniques of Internet worm prevention are given. Some major problems and research trends in this area are also addressed.
    2009,20(11):2965-2976
    [Abstract] (16636) [HTML] (0) [PDF 442.42 K] (18503)
    Abstract:
This paper studies uncertain graph data mining and in particular investigates the problem of mining frequent subgraph patterns from uncertain graph data. A data model is introduced for representing uncertainties in graphs, and expected support is employed to evaluate the significance of subgraph patterns. By using the apriori property of expected support, a depth-first search-based mining algorithm is proposed with an efficient method for computing expected supports and a technique for pruning the search space, which reduces the number of subgraph isomorphism tests needed for computing expected support from an exponential scale to a linear scale. Experimental results show that the proposed algorithm is 3 to 5 orders of magnitude faster than a naïve depth-first search algorithm, and is efficient and scalable.
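    The central quantity in the abstract is the expected support of a subgraph pattern. Under the usual assumption of independent edge probabilities, the probability that one fixed set of pattern edges is present in an uncertain graph is the product of those edge probabilities; the Python sketch below sums this over a hand-made database. The paper's algorithm additionally handles all isomorphic embeddings of a pattern and prunes the search space using the apriori property, which is not reproduced here.

```python
from math import prod

def expected_support(embedding_edges, uncertain_db):
    """Expected support of a fixed set of pattern edges across a database.

    Each uncertain graph maps an edge (u, v) to its independent existence
    probability; the fixed edge set occurs in a graph with the product of its
    edge probabilities, and the expected support sums these over the database.
    """
    def occ_prob(graph):
        return prod(graph.get(e, 0.0) for e in embedding_edges)
    return sum(occ_prob(g) for g in uncertain_db)

# Two small uncertain graphs; keys are undirected edges stored as sorted tuples.
db = [
    {(1, 2): 0.9, (2, 3): 0.8, (1, 3): 0.5},
    {(1, 2): 0.6, (2, 3): 0.7},
]
pattern = [(1, 2), (2, 3)]             # a two-edge path as the fixed edge set
print(expected_support(pattern, db))   # 0.9*0.8 + 0.6*0.7 = 1.14
```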
    2009,20(5):1226-1240
    [Abstract] (16574) [HTML] (0) [PDF 926.82 K] (20181)
    Abstract:
This paper introduces the concrete details of combining automated reasoning techniques with planning methods, including planning as satisfiability using propositional logic, conformant planning using modal logic and disjunctive reasoning, planning as nonmonotonic logic, and flexible planning as fuzzy description logic. After considering the experimental results of the International Planning Competition and relevant papers, it concludes that planning methods based on automated reasoning techniques are helpful and can be adopted. It also points out the challenges and possible research hotspots.
    2003,14(10):1717-1727
    [Abstract] (16418) [HTML] (0) [PDF 839.25 K] (18244)
    Abstract:
Sensor networks are the integration of sensor techniques, embedded computation techniques, distributed computation techniques, and wireless communication techniques. They can be used for testing, sensing, collecting, and processing information about monitored objects and transferring the processed information to users. Sensor networks are a new research area of computer science and technology with broad application prospects, and both academia and industry are very interested in them. The concepts and characteristics of sensor networks and of the data in such networks are introduced, and the issues of sensor networks and of sensor network data management are discussed. Advances in research on sensor networks and on sensor network data management are also presented.
    2014,25(4):839-862 , DOI: 10.13328/j.cnki.jos.004558
    [Abstract] (15751) [HTML] (4089) [PDF 1.32 M] (24123)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. The research and discussions on batch computing in big data environment are comparatively sufficient. But how to efficiently deal with stream computing to meet many requirements, such as low latency, high throughput and continuously reliable running, and how to build efficient stream big data computing systems, are great challenges in the big data computing research. This paper provides a research of the data computing architecture and the key issues in stream computing in big data environments. Firstly, the research gives a brief summary of three application scenarios of stream computing in business intelligence, marketing and public service. It also shows distinctive features of the stream computing in big data environment, such as real time, volatility, burstiness, irregularity and infinity. A well-designed stream computing system always optimizes in system structure, data transmission, application interfaces, high-availability, and so on. Subsequently, the research offers detailed analyses and comparisons of five typical and open-source stream computing systems in big data environment. Finally, the research specifically addresses some new challenges of the stream big data systems, such as scalability, fault tolerance, consistency, load balancing and throughput.
    2012,23(1):1-20 , DOI: 10.3724/SP.J.1001.2012.04100
    [Abstract] (14849) [HTML] (0) [PDF 1017.73 K] (35570)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2009,20(10):2729-2743
    [Abstract] (14588) [HTML] (0) [PDF 1.12 M] (13578)
    Abstract:
In a multi-hop wireless sensor network (WSN), the sensors closest to the sink tend to deplete their energy faster than other sensors, which is known as the energy hole around the sink. No more data can be delivered to the sink after an energy hole appears, so a considerable amount of energy is wasted and the network lifetime ends prematurely. This paper investigates the energy hole problem and, based on an improved corona model with levels, concludes that assigning different transmission ranges to nodes in different coronas is an effective approach to an energy-efficient network. It proves that finding the optimal transmission ranges for all coronas is a multi-objective optimization problem (MOP), which is NP-hard. The paper proposes an ACO (ant colony optimization)-based distributed algorithm to prolong the network lifetime, which helps nodes in different areas adaptively find an approximately optimal transmission range based on the node distribution. Furthermore, the simulation results indicate that the network lifetime under this solution approximates that obtained with the optimal assignment. Compared with existing algorithms, the ACO-based algorithm not only extends the network lifetime by more than a factor of two but also performs well under non-uniform node distributions.
    2012,23(5):1148-1166 , DOI: 10.3724/SP.J.1001.2012.04195
    [Abstract] (14557) [HTML] (0) [PDF 946.37 K] (20441)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
    2000,11(11):1460-1466
    [Abstract] (14550) [HTML] (0) [PDF 520.69 K] (13532)
    Abstract:
Intrusion detection is a highlighted topic of network security research in recent years. In this paper, first the necessity of intrusion detection is presented, and its concepts and models are described. Then, many intrusion detection techniques and architectures are summarized. Finally, the existing problems and the future direction in this field are discussed.
    2015,26(1):26-39 , DOI: 10.13328/j.cnki.jos.004631
    [Abstract] (14516) [HTML] (3616) [PDF 763.52 K] (20148)
    Abstract:
In recent years, transfer learning has attracted a vast amount of attention and research. Transfer learning is a new machine learning method that applies knowledge from related but different domains to target domains. It relaxes the two basic assumptions in traditional machine learning: (1) the training data (also referred to as the source domain) and the test data (also referred to as the target domain) follow the independent and identically distributed (i.i.d.) condition; (2) there are enough labeled samples to learn a good classification model. It aims to solve problems where there are few or even no labeled data in the target domain. This paper surveys the research progress of transfer learning and introduces the authors' own work, especially on building transfer learning models by applying generative models at the concept level. Finally, the paper introduces applications of transfer learning, such as text classification and collaborative filtering, and suggests future research directions for transfer learning.
    2002,13(7):1228-1237
    [Abstract] (14321) [HTML] (0) [PDF 500.04 K] (17664)
    Abstract:
Software architecture (SA) has recently emerged as one of the primary research areas in software engineering and one of the key technologies for developing large-scale software-intensive systems and software product lines. This paper summarizes the history and major directions of SA research and, by analyzing and comparing several classical definitions, presents the concept of SA. Based on a summary of SA-related activities, two categories of SA study are extracted, and advances in SA research are then introduced from seven aspects. Additionally, some disadvantages of current SA research are discussed and their causes are explained. Finally, the paper concludes with some significant and promising trends in SA research.
    2013,24(8):1786-1803 , DOI: 10.3724/SP.J.1001.2013.04416
    [Abstract] (14223) [HTML] (0) [PDF 1.04 M] (21902)
    Abstract:
    Many specific application oriented NoSQL database systems are developed for satisfying the new requirement of big data management. This paper surveys researches on typical NoSQL database based on key-value data model. First, the characteristics of big data, and the key technique issues supporting big data management are introduced. Then frontier efforts and research challenges are given, including system architecture, data model, access mode, index, transaction, system elasticity, load balance, replica strategy, data consistency, flash cache, MapReduce based data process and new generation data management system etc. Finally, research prospects are given.
    2006,17(7):1588-1600
    [Abstract] (13973) [HTML] (0) [PDF 808.73 K] (17476)
    Abstract:
    Routing technology at the network layer is pivotal in the architecture of wireless sensor networks. As an active branch of routing technology, cluster-based routing protocols excel in network topology management, energy minimization, data aggregation and so on. In this paper, cluster-based routing mechanisms for wireless sensor networks are analyzed. Cluster head selection, cluster formation and data transmission are three key techniques in cluster-based routing protocols. As viewed from the three techniques, recent representative cluster-based routing protocols are presented, and their characteristics and application areas are compared. Finally, the future research issues in this area are pointed out.
    2011,22(1):115-131 , DOI: 10.3724/SP.J.1001.2011.03950
    [Abstract] (13956) [HTML] (0) [PDF 845.91 K] (31282)
    Abstract:
The Internet traffic model is the key issue for network performance management, quality of service management, and admission control. The paper first summarizes the primary characteristics of Internet traffic, as well as the metrics of Internet traffic. It also illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
    2004,15(4):571-583
    [Abstract] (13914) [HTML] (0) [PDF 1005.17 K] (12399)
    Abstract:
For most peer-to-peer file-swapping applications, sharing is a voluntary action, and peers are not held responsible for their irresponsible bartering history. This indicates that trust between participants cannot be established simply by traditional trust mechanisms. A reasonable approach to trust construction comes from social network analysis, in which trust relations between individuals are built upon the recommendations of other individuals. Current P2P trust models cannot guarantee the convergence of the iterative trust computation and take no account of model security problems such as sybil attacks and slandering. This paper presents a novel recommendation-based global trust model and gives a distributed implementation method. Mathematical analyses and simulations show that, compared with current global trust models, the proposed model is more robust against trust security problems and its iterative computation of peer trust is guaranteed to converge.
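    As background for the iterative trust computation the abstract discusses, the sketch below runs an EigenTrust-style power iteration over a row-normalized local trust matrix. It is a generic illustration of global trust aggregation, not the recommendation-based model proposed in the paper, and the example matrix is hypothetical.

```python
import numpy as np

def global_trust(local_trust, tol=1e-9, max_iter=1000):
    """Iterative global trust computation from normalized local trust.

    EigenTrust-style illustration: each peer's row of `local_trust` is
    normalized to sum to 1, and the global trust vector is the fixed point
    of t <- C^T t, obtained by power iteration.
    """
    c = np.asarray(local_trust, dtype=float)
    c = c / c.sum(axis=1, keepdims=True)          # row-normalize recommendations
    t = np.full(c.shape[0], 1.0 / c.shape[0])     # start from uniform trust
    for _ in range(max_iter):
        t_new = c.T @ t
        if np.abs(t_new - t).max() < tol:
            break
        t = t_new
    return t

# Four peers; entry [i, j] is how much peer i trusts peer j from direct dealings.
local = [[0, 4, 1, 1],
         [3, 0, 2, 1],
         [2, 3, 0, 1],
         [1, 1, 1, 0]]
print(global_trust(local).round(3))
```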
    2009,20(1):11-29
    [Abstract] (13811) [HTML] (0) [PDF 787.30 K] (17807)
    Abstract:
Constrained optimization problems (COPs) are mathematical programming problems frequently encountered in the disciplines of science and engineering application. Solving COPs has become an important research area of evolutionary computation in recent years. In this paper, the state of the art of constrained optimization evolutionary algorithms (COEAs) is surveyed from the two basic aspects of COEAs (i.e., constraint-handling techniques and evolutionary algorithms). In addition, this paper discusses some important issues of COEAs. More specifically, several typical algorithms are analyzed in detail. Based on the analyses, it is concluded that, to obtain competitive results, a proper constraint-handling technique needs to be considered in conjunction with an appropriate search algorithm. Finally, the open research issues in this field are also pointed out.
    2013,24(1):50-66 , DOI: 10.3724/SP.J.1001.2013.04276
    [Abstract] (13661) [HTML] (0) [PDF 0.00 Byte] (20253)
    Abstract:
    As an important application of acceleration in the cloud, the distributed caching technology has received considerable attention in industry and academia. This paper starts with a discussion on the combination of cloud computing and distributed caching technology, giving an analysis of its characteristics, typical application scenarios, stages of development, standards, and several key elements, which have promoted its development. In order to systematically know the state of art progress and weak points of the distributed caching technology, the paper builds a multi-dimensional framework, DctAF. This framework is constituted of 6 dimensions through analyzing the characteristics of cloud computing and boundary of the caching techniques. Based on DctAF, current techniques have been analyzed and summarized; comparisons among several influential products have also been made. Finally, the paper describes and highlights the several challenges that the cache system faces and examines the current research through in-depth analysis and comparison.
    2008,19(zk):112-120
    [Abstract] (13655) [HTML] (0) [PDF 594.29 K] (17340)
    Abstract:
An ad hoc network is a collection of wireless mobile nodes that dynamically form a temporary network without any existing network infrastructure or centralized administration. Due to the bandwidth constraints and dynamic topology of mobile ad hoc networks, multipath-supported routing is a very important research issue. In this paper, we present an entropy-based metric to support stability multipath on-demand routing (SMDR). The key idea of the SMDR protocol is to construct a new entropy-based metric and use it to select stable multiple paths, so as to reduce the number of route reconstructions and provide QoS guarantees in ad hoc networks whose topology changes continuously. Simulation results show that, with the proposed multipath routing protocol, packet delivery ratio, end-to-end delay, and routing overhead ratio can be improved in most cases. It is a viable approach to multipath routing decisions.
    2002,13(10):1952-1961
    [Abstract] (13489) [HTML] (0) [PDF 570.96 K] (16059)
    Abstract:
    The crucial technologies related to personalization are introduced in this paper, which include the representation and modification of user profile, the representation of resource, the recommendation technology, and the architecture of personalization. By comparing with some existing prototype systems, the key technologies about how to implement personalization are discussed in detail. In addition, three representative personalization systems are analyzed. At last, some research directions for personalization are presented.
    2003,14(9):1621-1628
    [Abstract] (13441) [HTML] (0) [PDF 680.35 K] (23048)
    Abstract:
Recommendation systems are one of the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extremely sparse user rating data. Traditional similarity measures perform poorly in this situation, dramatically decreasing the quality of recommendation systems. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method first predicts the ratings of items that users have not rated by using item similarity, and then uses a new similarity measure to find the target users' neighbors. The experimental results show that this method can efficiently alleviate the extreme sparsity of user rating data and provides better recommendation results than traditional collaborative filtering algorithms.
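    Below is a minimal Python sketch of the abstract's first step only, under the assumptions that 0 marks an unrated entry and that plain cosine similarity over co-rating users is used: unrated entries are predicted as similarity-weighted averages of the user's other ratings. The paper's new similarity measure for choosing the target user's neighbors is not reproduced.

```python
import numpy as np

def predict_missing(r):
    """Fill unrated entries using item-item similarity (cosine over co-ratings).

    Illustrative stand-in for the first step of item-rating-prediction CF:
    a missing rating r[u, i] becomes the similarity-weighted average of
    user u's ratings on the items u has rated.  0 means "not rated".
    """
    r = np.asarray(r, dtype=float)
    n_items = r.shape[1]
    sim = np.zeros((n_items, n_items))
    for i in range(n_items):
        for j in range(n_items):
            mask = (r[:, i] > 0) & (r[:, j] > 0)       # users who rated both
            if i != j and mask.any():
                a, b = r[mask, i], r[mask, j]
                sim[i, j] = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    filled = r.copy()
    for u in range(r.shape[0]):
        for i in range(n_items):
            if r[u, i] == 0:
                rated = r[u] > 0
                w = sim[i, rated]
                if w.sum() > 0:
                    filled[u, i] = w @ r[u, rated] / w.sum()
    return filled

ratings = [[5, 3, 0, 1],
           [4, 0, 0, 1],
           [1, 1, 5, 4],
           [0, 1, 5, 4]]
print(predict_missing(ratings).round(2))
```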
    2003,14(9):1635-1644
    [Abstract] (13327) [HTML] (0) [PDF 622.06 K] (14946)
    Abstract:
Computer forensics is the technology field that attempts to provide thorough, efficient, and secure means to investigate computer crime. Computer evidence must be authentic, accurate, complete, and convincing to juries. In this paper, the stages of computer forensics are presented, and the theories and realization of forensics software are described. An example of forensic practice is also given. The deficiencies of computer forensics techniques and anti-forensics are also discussed. The conclusion is that, with the improvement of computer science and technology, forensics techniques will become more integrated and thorough.
    2008,19(7):1565-1580
    [Abstract] (13293) [HTML] (0) [PDF 815.02 K] (19500)
    Abstract:
Software defect prediction has been one of the active parts of software engineering since it was developed in the 1970s. It plays a very important role in the analysis of software quality and the balance of software cost. This paper investigates and discusses the motivation, evolvement, solutions, and challenges of software defect prediction technologies, and it also categorizes, analyzes, and compares representative prediction technologies. Some case studies of software defect distribution models are given to aid understanding.
    2012,23(1):82-96 , DOI: 10.3724/SP.J.1001.2012.04101
    [Abstract] (13174) [HTML] (0) [PDF 394.07 K] (17828)
    Abstract:
    Botnets are one of the most serious threats to the Internet. Researchers have done plenty of research and made significant progress. However, botnets keep evolving and have become more and more sophisticated. Due to the underlying security limitation of current system and Internet architecture, and the complexity of botnet itself, how to effectively counter the global threat of botnets is still a very challenging issue. This paper first introduces the evolving of botnet’s propagation, attack, command, and control mechanisms. Then the paper summarizes recent advances of botnet defense research and categorizes into five areas: Botnet monitoring, botnet infiltration, analysis of botnet characteristics, botnet detection and botnet disruption. The limitation of current botnet defense techniques, the evolving trend of botnet, and some possible directions for future research are also discussed.
    2008,19(8):1947-1964
    [Abstract] (13152) [HTML] (0) [PDF 811.11 K] (12687)
    Abstract:
    Wide-Spread deployment for interactive information visualization is difficult. Non-Specialist users need a general development method and a toolkit to support the generic data structures suited to tree, network and multi-dimensional data, special visualization techniques and interaction techniques, and well-known generic information tasks. This paper presents a model driven development method for interactive information visualization. First, an interactive information visualization interface model (IIVM) is proposed. Then, the development method for interactive information visualization based on IIVM is presented. The Daisy toolkit is introduced, which includes Daisy model builder, Daisy IIV generator and runtime framework with Daisy library. Finally, an application example is given. Experimental results show that Daisy can provide a general solution for development for interactive information visualization.
    2008,19(8):1902-1919
    [Abstract] (13069) [HTML] (0) [PDF 521.73 K] (15800)
    Abstract:
    Visual language techniques have exhibited more advantages in describing various software artifacts than one-dimensional textual languages during software development, ranging from the requirement analysis and design to testing and maintenance, as diagrammatic and graphical notations have been well applied in modeling system. In addition to an intuitive appearance, graph grammars provide a well-established foundation for defining visual languages with the power of precise modeling and verification on computers. This paper discusses the issues and techniques for a formal foundation of visual languages, reviews related practical graphical environments, presents a spatial graph grammar formalism, and applies the spatial graph grammar to defining behavioral semantics of UML diagrams and developing a style-driven framework for software architecture design.
    2006,17(9):1848-1859
    [Abstract] (12917) [HTML] (0) [PDF 770.40 K] (23612)
    Abstract:
    In recent years, there have been extensive studies and rapid progresses in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the state-of-art challenging issues and research trends for content information processing of Internet and other complex applications, this paper presents a survey on the up-to-date development in text categorization based on machine learning, including model, algorithm and evaluation. It is pointed out that problems such as nonlinearity, skewed data distribution, labeling bottleneck, hierarchical categorization, scalability of algorithms and categorization of Web pages are the key problems to the study of text categorization. Possible solutions to these problems are also discussed respectively. Finally, some future directions of research are given.
    2010,21(2):231-247
    [Abstract] (12846) [HTML] (0) [PDF 1.21 M] (19269)
    Abstract:
    In this paper, a framework is proposed for handling fault of service composition through analyzing fault requirements. Petri nets are used in the framework for fault detecting and its handling, which focuses on targeting the failure of available services, component failure and network failure. The corresponding fault models are given. Based on the model, the correctness criterion of fault handling is given to analyze fault handling model, and its correctness is proven. Finally, CTL (computational tree logic) is used to specify the related properties and enforcement algorithm of fault analysis. The simulation results show that this method can ensure the reliability and consistency of service composition.
    2017,28(1):1-16 , DOI: 10.13328/j.cnki.jos.005139
    [Abstract] (12756) [HTML] (5171) [PDF 1.75 M] (13164)
    Abstract:
The knapsack problem (KP) is a well-known combinatorial optimization problem that includes the 0-1 KP, bounded KP, multi-constraint KP, multiple KP, multiple-choice KP, quadratic KP, dynamic KP, discounted KP, and other variants. KP can be considered a mathematical model extracted from a variety of real fields and therefore has wide applications. Evolutionary algorithms (EAs) are universally considered an efficient tool for solving KP approximately and quickly. This paper presents a survey of solving KP with EAs over the past ten years. It not only discusses various KP encoding mechanisms and the handling of infeasible solutions but also provides useful guidelines for designing new EAs to solve KPs.
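    As a concrete instance of the encoding and infeasible-solution handling the survey covers, the sketch below runs a small steady-state evolutionary algorithm on a 0-1 knapsack instance with a binary encoding and a greedy repair operator. It is illustrative only and not tied to any particular algorithm from the surveyed literature.

```python
import random

def ea_knapsack(values, weights, capacity, pop=40, gens=300):
    """Binary-encoded evolutionary algorithm for the 0-1 knapsack problem.

    Infeasible offspring are repaired by greedily dropping items with the
    worst value/weight ratio, one common infeasible-solution treatment.
    """
    n = len(values)
    ratio_order = sorted(range(n), key=lambda i: values[i] / weights[i])

    def repair(x):
        x = x[:]
        w = sum(weights[i] for i in range(n) if x[i])
        for i in ratio_order:                     # drop worst-ratio items first
            if w <= capacity:
                break
            if x[i]:
                x[i], w = 0, w - weights[i]
        return x

    def fit(x):
        return sum(values[i] for i in range(n) if x[i])

    population = [repair([random.randint(0, 1) for _ in range(n)]) for _ in range(pop)]
    for _ in range(gens):
        p1, p2 = random.sample(population, 2)
        cut = random.randrange(1, n)              # one-point crossover
        child = p1[:cut] + p2[cut:]
        child[random.randrange(n)] ^= 1           # single-bit mutation
        child = repair(child)
        worst = min(range(pop), key=lambda k: fit(population[k]))
        if fit(child) >= fit(population[worst]):  # steady-state replacement
            population[worst] = child
    return max(population, key=fit)

values, weights = [60, 100, 120, 40], [10, 20, 30, 5]
best = ea_knapsack(values, weights, capacity=50)
print(best, sum(v for v, b in zip(values, best) if b))
```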
    2010,21(7):1620-1634
    [Abstract] (12653) [HTML] (0) [PDF 765.23 K] (22586)
    Abstract:
    As an application of mobile ad hoc networks (MANET) on Intelligent Transportation Information System, the most important goal of vehicular ad hoc networks (VANET) is to reduce the high number of accidents and fatal consequences dramatically. One of the most important factors that would contribute to the realization of this goal is the design of effective broadcast protocols. This paper introduces the characteristics and application fields of VANET briefly. Then, it discusses the characteristics, performance, and application areas with analysis and comparison of various categories of broadcast protocols in VANET. According to the characteristic of VANET and its application requirement, the paper proposes the ideas and breakthrough direction of information broadcast model design of inter-vehicle communication.
    2010,21(5):916-929
    [Abstract] (12612) [HTML] (0) [PDF 944.50 K] (20996)
    Abstract:
    Data deduplication technologies can be divided into two categories: a) identical data detection techniques, and b) similar data detection and encoding techniques. This paper presents a systematic survey on these two categories of data deduplication technologies and analyzes their advantages and disadvantages. Besides, since data deduplication technologies can affect the reliability and performance of storage systems, this paper also surveys various kinds of technologies proposed to cope with these two aspects of problems. Based on the analysis of the current state of research on data deduplication technologies, this paper makes several conclusions as follows: a) How to mine data characteristic information in data deduplication has not been completely solved, and how to use data characteristic information to effectively eliminate duplicate data also needs further study; b) From the perspective of storage system design, it still needs further study how to introduce proper mechanisms to overcome the reliability limitations of data deduplication techniques and reduce the additional system overheads caused by data deduplication techniques.
    2009,20(6):1393-1405
    [Abstract] (12449) [HTML] (0) [PDF 831.86 K] (22366)
    Abstract:
    Combinatorial testing can use a small number of test cases to test systems while preserving fault detection ability. However, the complexity of test case generation problem for combinatorial testing is NP-complete. The efficiency and complexity of this testing method have attracted many researchers from the area of combinatorics and software engineering. This paper summarizes the research works on this topic in recent years. They include: various combinatorial test criteria, the relations between the test generation problem and other NP-complete problems, the mathematical methods for constructing test cases, the computer search techniques for test generation and fault localization techniques based on combinatorial testing.
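    To illustrate the flavor of greedy test-case generation for combinatorial testing, the Python sketch below builds a 2-way (pairwise) covering suite in an AETG-like fashion by repeatedly keeping the random candidate test case that covers the most still-uncovered value pairs. Real generators use far more sophisticated constructions and also support higher coverage strengths; the parameter domains shown are hypothetical.

```python
import random
from itertools import combinations

def pairwise_suite(domains, candidates_per_round=50):
    """Greedy generation of a 2-way (pairwise) covering test suite.

    Simplified AETG-style sketch: each round samples random candidate test
    cases and keeps the one covering the most still-uncovered pairs of
    parameter values, until every such pair is covered.
    """
    params = range(len(domains))
    uncovered = {((p, v), (q, w))
                 for p, q in combinations(params, 2)
                 for v in domains[p] for w in domains[q]}

    def pairs_of(case):
        return {((p, case[p]), (q, case[q])) for p, q in combinations(params, 2)}

    suite = []
    while uncovered:
        best = max((tuple(random.choice(domains[p]) for p in params)
                    for _ in range(candidates_per_round)),
                   key=lambda c: len(pairs_of(c) & uncovered))
        gained = pairs_of(best) & uncovered
        if gained:                      # ignore unlucky rounds that add nothing
            suite.append(best)
            uncovered -= gained
    return suite

# Three hypothetical parameters: browser, operating system, locale.
tests = pairwise_suite([["chrome", "firefox"], ["linux", "mac", "win"], ["en", "zh"]])
print(len(tests), "test cases")
for t in tests:
    print(t)
```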
    2008,19(10):2706-2719
    [Abstract] (12293) [HTML] (0) [PDF 778.29 K] (14285)
    Abstract:
    Web search engine has become a very important tool for finding information efficiently from the massive Web data. With the explosive growth of the Web data, traditional centralized search engines become harder to catch up with the growing step of people's information needs. With the rapid development of peer-to-peer (P2P) technology, the notion of P2P Web search has been proposed and quickly becomes a research focus. The goal of this paper is to give a brief summary of current P2P Web search technologies in order to facilitate future research. First, some main challenges for P2P Web search are presented. Then, key techniques for building a feasible and efficient P2P Web search engine are reviewed, including system topology, data placement, query routing, index partitioning, collection selection, relevance ranking and Web crawling. Finally, three recently proposed novel P2P Web search prototypes are introduced.
  • 全文下载排行(总排行年度排行各期排行)
    摘要点击排行(总排行年度排行各期排行)

  • Article Search
    Search by issue
    Select AllDeselectExport
    Display Method:
    2003,14(7):1282-1291
    [Abstract] (37761) [HTML] (0) [PDF 832.28 K] (83770)
    Abstract:
    Sensor network, which is made by the convergence of sensor, micro-electro-mechanism system and networks technologies, is a novel technology about acquiring and processing information. In this paper, the architecture of wireless sensor network is briefly introduced. Next, some valuable applications are explained and forecasted. Combining with the existing work, the hot spots including power-aware routing and media access control schemes are discussed and presented in detail. Finally, taking account of application requirements, several future research directions are put forward.
    2008,19(1):48-61
    [Abstract] (28835) [HTML] (0) [PDF 671.39 K] (65258)
    Abstract:
    The research actuality and new progress in clustering algorithm in recent years are summarized in this paper. First, the analysis and induction of some representative clustering algorithms have been made from several aspects, such as the ideas of algorithm, key technology, advantage and disadvantage. On the other hand, several typical clustering algorithms and known data sets are selected, simulation experiments are implemented from both sides of accuracy and running efficiency, and clustering condition of one algorithm with different data sets is analyzed by comparing with the same clustering of the data set under different algorithms. Finally, the research hotspot, difficulty, shortage of the data clustering and some pending problems are addressed by the integration of the aforementioned two aspects information. The above work can give a valuable reference for data clustering and data mining.
    2010,21(8):1834-1848
    [Abstract] (21461) [HTML] (0) [PDF 682.96 K] (60914)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
    2011,22(1):71-83 , DOI: 10.3724/SP.J.1001.2011.03958
    [Abstract] (30442) [HTML] (0) [PDF 781.42 K] (60910)
    Abstract:
    Cloud computing is a fundamental change happening in the field of information technology; it represents a movement towards intensive, large-scale specialization. At the same time, it brings not only convenience and efficiency but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest obstacles to the development of cloud computing. This paper describes the major requirements in cloud computing with respect to security key technologies, standards, and regulations, and provides a cloud computing security framework. It argues that changes in the above aspects will result in a technical revolution in the field of information security.
    2009,20(1):54-66
    [Abstract] (19996) [HTML] (0) [PDF 1.41 M] (53983)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks: links within a community are very dense, while links between communities are quite sparse. Network clustering algorithms, which aim to discover all natural network communities from given complex networks, are fundamentally important for both theoretical research and practical applications, and can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks including social networks, biological networks, the World Wide Web, and so on. This paper reviews the background, the motivation, the state of the art, and the main issues of existing work related to discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to researchers from the communities of complex network analysis, data mining, intelligent Web, and bioinformatics.
    2009,20(5):1337-1348
    [Abstract] (28564) [HTML] (0) [PDF 1.06 M] (47840)
    Abstract:
    This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects: one is the cloud infrastructure, which is the building block for the upper-layer cloud applications, and the other is the cloud application itself. This paper focuses on the cloud infrastructure, including existing systems and current research, and also discusses some attractive cloud applications. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large-scale clusters which contain a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure so that the computing resources can be utilized maximally. Third, the reliability of the whole system is achieved by software built on top of redundant hardware, instead of by hardware alone. All these technologies serve the two important goals of distributed systems: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to a very large scale, even to thousands of nodes. Availability means that the services remain available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2009,20(2):271-289
    [Abstract] (27613) [HTML] (0) [PDF 675.56 K] (47594)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms proposed before 2003, the recent advances in EMO are discussed in detail and the current research directions are summarized. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from the traditional Pareto dominance have emerged. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints on the future research of EMO are proposed.
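    As a point of reference for the dominance schemes discussed above, the following sketch implements the standard Pareto dominance check for minimization problems; it is a generic illustration, not code from any surveyed algorithm.

```python
def pareto_dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse than b in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def non_dominated(front):
    """Filter a list of objective vectors down to its non-dominated subset."""
    return [p for p in front
            if not any(pareto_dominates(q, p) for q in front if q is not p)]

# Hypothetical two-objective example.
points = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(non_dominated(points))  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```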
    2014,25(9):1889-1908 , DOI: 10.13328/j.cnki.jos.004674
    [Abstract] (12159) [HTML] (4763) [PDF 550.98 K] (44554)
    Abstract:
    This paper first introduces the key features of big data in different processing modes and their typical application scenarios, as well as corresponding representative processing systems. It then summarizes three development trends of big data processing systems. Next, the paper gives a brief survey on system supported analytic technologies and applications (including deep learning, knowledge computing, social computing, and visualization), and summarizes the key roles of individual technologies in big data analysis and understanding. Finally, the paper lays out three grand challenges of big data processing and analysis, i.e., data complexity, computation complexity, and system complexity. Potential ways for dealing with each complexity are also discussed.
    2009,20(2):350-362
    [Abstract] (16732) [HTML] (0) [PDF 1.39 M] (43930)
    Abstract:
    This paper makes a comprehensive survey of recommender system research, aiming to help readers understand this field. First, the research background is introduced, including commercial application demands as well as the relevant academic institutions, conferences, and journals. After formally and informally describing the recommendation problem, a comparative study is conducted over the categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are presented, and the main difficulties and future directions are summarized.
    2004,15(10):1493-1504
    [Abstract] (9286) [HTML] (0) [PDF 937.72 K] (41990)
    Abstract:
    The graphics processing unit (GPU) has been developing rapidly in recent years at a speed exceeding Moore's law, and as a result various applications associated with computer graphics have advanced greatly. At the same time, the high processing power, parallelism, and programmability available on contemporary GPUs provide an ideal platform for general-purpose computation. Starting from an introduction to the development history and the architecture of the GPU, the technical fundamentals of the GPU are described in this paper. Then, in the main part of the paper, the development of various applications of general-purpose computation on the GPU is introduced, among which fluid dynamics, algebraic computation, database operations, and spectrum analysis are described in detail. Our own experience with fluid dynamics is also given, and the development of software tools in this area is introduced. Finally, a conclusion is drawn, and future developments and new challenges for both hardware and software in this area are discussed.
    2010,21(3):427-437
    [Abstract] (33348) [HTML] (0) [PDF 308.76 K] (41970)
    Abstract:
    Automatic generation of poetry has always been considered a hard problem in natural language generation. This paper reports some pioneering research on a genetic algorithm and its automatic generation of SONGCI, a form of classical Chinese poetry. In light of the characteristics of Chinese ancient poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette wheel, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system built on the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic Chinese poetry generation.
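    To make the evolutionary loop described above concrete, here is a minimal, generic genetic-algorithm skeleton with elitism, roulette-wheel selection, crossover, and mutation. The fitness function and operators for actual SONGCI generation are far richer, so everything below (population size, operators, the toy bit-string fitness) is a hypothetical stand-in rather than the authors' system.

```python
import random

def genetic_search(init_individual, fitness, crossover, mutate,
                   pop_size=50, generations=200, elite_k=2, mutation_rate=0.1):
    """Generic GA loop: elitism + roulette-wheel selection + crossover + mutation."""
    population = [init_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        elites = scored[:elite_k]                       # elitism: keep the best as-is
        weights = [fitness(ind) for ind in population]  # roulette-wheel weights
        children = []
        while len(children) < pop_size - elite_k:
            p1, p2 = random.choices(population, weights=weights, k=2)
            child = crossover(p1, p2)
            if random.random() < mutation_rate:
                child = mutate(child)
            children.append(child)
        population = elites + children
    return max(population, key=fitness)

# Toy usage: evolve a bit-string towards all ones (placeholder for a poem encoding).
best = genetic_search(
    init_individual=lambda: [random.randint(0, 1) for _ in range(20)],
    fitness=lambda ind: sum(ind) + 1e-6,                # strictly positive for roulette
    crossover=lambda a, b: [random.choice(p) for p in zip(a, b)],
    mutate=lambda ind: [1 - g if random.random() < 0.05 else g for g in ind],
)
print(sum(best), "ones out of 20")
```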
    2021,32(2):349-369 , DOI: 10.13328/j.cnki.jos.006138
    [Abstract] (9115) [HTML] (10647) [PDF 2.36 M] (40947)
    Abstract:
    Few-shot learning is defined as learning models that solve problems from a small number of samples. In recent years, under the trend of training models with big data, machine learning and deep learning have achieved success in many fields. However, in many real-world application scenarios there is no large amount of data or labeled data for model training, and labeling a large number of unlabeled samples costs a lot of manpower. Therefore, how to learn from a small number of samples has become a problem that deserves attention. This paper systematically reviews the current approaches to few-shot learning. It introduces the corresponding models in three categories: fine-tuning based, data augmentation based, and transfer learning based. The data augmentation based approaches are further subdivided into unlabeled data based, data generation based, and feature augmentation based approaches, while the transfer learning based approaches are subdivided into metric learning based, meta-learning based, and graph neural network based methods. The paper then summarizes the few-shot datasets and the experimental results of the aforementioned models, discusses the current situation and challenges of few-shot learning, and finally gives an outlook on its future technological development.
    2022,33(7):2464-2481 , DOI: 10.13328/j.cnki.jos.006585
    [Abstract] (1387) [HTML] (2955) [PDF 2.00 M] (39496)
    Abstract:
    Symbolic propagation methods based on linear abstraction play a significant role in neural network verification. This study proposes the notion of multi-path back-propagation for these methods. Existing methods are viewed as using only a single back-propagation path to calculate the upper and lower bounds of each node in a given neural network, being specific instances of the proposed notion. Leveraging multiple back-propagation paths effectively improves the accuracy of this kind of method. For evaluation, the proposed method is quantitatively compared using multiple back-propagation paths with the state-of-the-art tool DeepPoly on benchmarks ACAS Xu, MNIST, and CIFAR10. The experiment results show that the proposed method achieves significant accuracy improvement while introducing only a low extra time cost. In addition, the multi-path back-propagation method is compared with the Optimized LiRPA based on global optimization, on the dataset MNIST. The results show that the proposed method still has an accuracy advantage.
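    To give a sense of the bound-propagation setting that the multi-path method improves on, the sketch below propagates interval (box) bounds through one affine layer followed by ReLU, the simplest member of this family of abstractions (far simpler than the back-substitution used by DeepPoly); the weights and input box are made up.

```python
import numpy as np

def affine_bounds(W, b, lower, upper):
    """Interval bounds of W @ x + b when x lies in the box [lower, upper]."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    out_lower = W_pos @ lower + W_neg @ upper + b
    out_upper = W_pos @ upper + W_neg @ lower + b
    return out_lower, out_upper

def relu_bounds(lower, upper):
    """ReLU is monotone, so bounds are just clipped at zero."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Made-up 2-in / 2-out layer and an input box around the point (1, -1).
W = np.array([[1.0, -2.0], [0.5, 1.0]])
b = np.array([0.1, -0.2])
l0, u0 = np.array([0.9, -1.1]), np.array([1.1, -0.9])
l1, u1 = relu_bounds(*affine_bounds(W, b, l0, u0))
print("output box:", l1, u1)
```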
    2013,24(11):2476-2497 , DOI: 10.3724/SP.J.1001.2013.04486
    [Abstract] (10784) [HTML] (0) [PDF 1.14 M] (39460)
    Abstract:
    Probabilistic graphical models are powerful tools for compactly representing complex probability distributions, efficiently computing (approximate) marginal and conditional distributions, and conveniently learning parameters and hyperparameters in probabilistic models. As a result, they have been widely used in applications that require some sort of automated probabilistic reasoning, such as computer vision and natural language processing, as a formal approach to deal with uncertainty. This paper surveys the basic concepts and key results of representation, inference and learning in probabilistic graphical models, and demonstrates their uses in two important probabilistic models. It also reviews some recent advances in speeding up classic approximate inference algorithms, followed by a discussion of promising research directions.
    2018,29(5):1471-1514 , DOI: 10.13328/j.cnki.jos.005519
    [Abstract] (6591) [HTML] (6706) [PDF 4.38 M] (37033)
    Abstract:
    Computer aided detection/diagnosis (CAD) can improve the accuracy of diagnosis, reduce false positives, and provide decision support for doctors. The main purpose of this paper is to analyze the latest developments in computer aided diagnosis tools. Focusing on the anatomical sites of the four most fatal cancers, major recent publications on CAD applications in different medical imaging areas are reviewed in this survey according to the imaging techniques and diseases involved. Furthermore, a multidimensional analysis is made of the research in terms of image data sets, algorithms, and evaluation methods. Finally, existing problems, research trends, and development directions in the field of medical image CAD systems are discussed.
    2012,23(4):962-986 , DOI: 10.3724/SP.J.1001.2012.04175
    [Abstract] (19169) [HTML] (0) [PDF 2.09 M] (35942)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, sometimes up to millions, and stores petabytes or even exabytes of data, so computer and data failures occur easily. Such a large number of machines not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure and power costs. Therefore, fault tolerance, scalability, and power consumption of the distributed storage of a data center become key parts of cloud computing technology for ensuring data availability and reliability. This paper surveys the state of the art of the key technologies in cloud computing in the following aspects: design of the data center network, organization and placement of data, strategies to improve fault tolerance, and methods to save storage space and energy. First, many kinds of classical data center network topologies are introduced and compared. Second, current fault-tolerant storage techniques are discussed, and data replication and erasure code strategies are compared in particular. Third, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
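    To illustrate the replication versus erasure-code trade-off mentioned above, the short sketch below compares storage overhead and failure tolerance for n-way replication and a Reed-Solomon style RS(k, m) layout; the numbers are generic textbook figures, not results from the surveyed systems.

```python
def replication_overhead(n_replicas):
    """n-way replication: stores n copies, tolerates the loss of n-1 copies."""
    return {"storage_overhead": float(n_replicas), "tolerated_failures": n_replicas - 1}

def erasure_code_overhead(k_data, m_parity):
    """RS(k, m): k data blocks + m parity blocks, tolerates the loss of any m blocks."""
    return {"storage_overhead": (k_data + m_parity) / k_data, "tolerated_failures": m_parity}

print(replication_overhead(3))      # {'storage_overhead': 3.0, 'tolerated_failures': 2}
print(erasure_code_overhead(6, 3))  # {'storage_overhead': 1.5, 'tolerated_failures': 3}
```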
    2012,23(1):1-20 , DOI: 10.3724/SP.J.1001.2012.04100
    [Abstract] (14849) [HTML] (0) [PDF 1017.73 K] (35570)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2016,27(1):45-71 , DOI: 10.13328/j.cnki.jos.004914
    [Abstract] (30444) [HTML] (4856) [PDF 880.96 K] (35547)
    Abstract:
    Android is a modern and highly popular software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever, with Apple, Microsoft, Blackberry, and Firefox trailing a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2012,23(1):32-45 , DOI: 10.3724/SP.J.1001.2012.04091
    [Abstract] (18910) [HTML] (0) [PDF 408.86 K] (34656)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can be scaled out cost-effectively are needed to deal with such big data. Relational data management technology has gone through a history of nearly 40 years; now it encounters the tough obstacle of scalability, as relational techniques cannot handle very large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and expanded their applications from Web search to territories that used to be occupied by relational database systems, confronting relational technology with high availability, high scalability, and massive parallel processing capability. The relational technology community, after losing the Web search battle, has begun to learn from MapReduce, while MapReduce has also borrowed valuable ideas from the relational community to improve performance. Relational technology and MapReduce compete with each other and learn from each other; a new data analysis platform and a new data analysis ecosystem are emerging. In the end, the two camps of techniques will find their right places in the new ecosystem of big data analysis.
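    For readers unfamiliar with the MapReduce style contrasted here with relational processing, the sketch below runs the classic word-count example through a tiny in-memory map/shuffle/reduce pipeline; it is a didactic simplification with made-up input, not the API of any real MapReduce system.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (word, 1) for every word in every input record."""
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data needs parallel processing",
             "mapreduce and relational techniques learn from each other",
             "big data analysis"]
print(reduce_phase(shuffle(map_phase(documents))))
```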
    2015,26(1):62-81 , DOI: 10.13328/j.cnki.jos.004701
    [Abstract] (16829) [HTML] (5479) [PDF 1.04 M] (34097)
    Abstract:
    Network abstraction has given rise to software-defined networking (SDN). SDN decouples the data plane and the control plane and simplifies network management. This paper starts with a discussion of the background of SDN's emergence and development, outlining its architecture, which consists of the data layer, control layer, and application layer. The key technologies are then elaborated according to this hierarchical architecture, with the properties of consistency, availability, and tolerance analyzed in particular. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized at the end.
    2005,16(5):857-868
    [Abstract] (20013) [HTML] (0) [PDF 489.65 K] (33350)
    Abstract:
    Wireless sensor networks, a novel technology for acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the network, is a challenging one, yet extremely crucial for many applications. In this paper, the evaluation criteria and taxonomy of self-localization systems and algorithms for wireless sensor networks are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and directions for research in this area are introduced.
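    As a small illustration of range-based self-localization, one family among the approaches surveyed, the sketch below estimates a node's 2D position from distances to known anchor nodes by linearizing the circle equations and solving a least-squares system; the anchor positions and distances are made-up values.

```python
import numpy as np

def trilaterate(anchors, distances):
    """Least-squares position estimate from >=3 anchors and measured distances (2D)."""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    xn, yn = anchors[-1]
    dn = d[-1]
    # Linearize by subtracting the last circle equation from the others.
    A = 2.0 * (anchors[-1] - anchors[:-1])           # rows: [2(xn-xi), 2(yn-yi)]
    b = (d[:-1] ** 2 - dn ** 2
         - anchors[:-1, 0] ** 2 - anchors[:-1, 1] ** 2
         + xn ** 2 + yn ** 2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Made-up anchors and noiseless distances to the true position (2, 3).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true = np.array([2.0, 3.0])
dists = [np.linalg.norm(true - np.array(a)) for a in anchors]
print(trilaterate(anchors, dists))   # approximately [2. 3.]
```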
    2011,22(1):115-131 , DOI: 10.3724/SP.J.1001.2011.03950
    [Abstract] (13956) [HTML] (0) [PDF 845.91 K] (31282)
    Abstract:
    The Internet traffic model is a key issue for network performance management, quality of service management, and admission control. The paper first summarizes the primary characteristics and metrics of Internet traffic, and illustrates the significance and classification of traffic modeling. Next, it chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
    2013,24(1):77-90 , DOI: 10.3724/SP.J.1001.2013.04339
    [Abstract] (11388) [HTML] (0) [PDF 0.00 Byte] (30244)
    Abstract:
    The task parallel programming model is a widely used parallel programming model on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and supporting mechanisms used in task parallel programming models, and discusses issues and the latest achievements from three perspectives: parallelism expression, data management, and task scheduling. In the end, some future trends in this area are discussed.
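    As an illustration of how parallelism is expressed in a task-style model, the sketch below splits a reduction into independent chunk tasks scheduled on a thread pool via Python's concurrent.futures. It is a generic example of task expression and scheduling (not tied to any model discussed in the paper), and with CPython's GIL it demonstrates structure rather than actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked_sum(data, n_tasks=4):
    """Express a reduction as independent tasks: split, sum chunks in parallel, combine."""
    chunk = (len(data) + n_tasks - 1) // n_tasks
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        partial_sums = list(pool.map(sum, pieces))   # each chunk is one schedulable task
    return sum(partial_sums)

print(chunked_sum(list(range(1_000_000))))  # 499999500000
```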
    2017,28(4):959-992 , DOI: 10.13328/j.cnki.jos.005143
    [Abstract] (9589) [HTML] (6858) [PDF 3.58 M] (30028)
    Abstract:
    The development of mobile internet and the popularity of mobile terminals produce massive trajectory data of moving objects under the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in the city and the changes of atmospheric environment. However, trajectory data also can be exploited to disclose moving objects' privacy information (e.g., behaviors, hobbies and social relationships). Accordingly, attackers can easily access moving objects' privacy information by digging into their trajectory data such as activities and check-in locations. In another front of research, quantum computation presents an important theoretical direction to mine big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to handle trajectory big data could make some complex problem solvable and achieve higher efficiency. This paper reviews the key technologies of processing trajectory data. First the concept and characteristics of trajectory data is introduced, and the pre-processing methods, including noise filtering and data compression, are summarized. Then, the trajectory indexing and querying techniques, and the current achievements of mining trajectory data, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preserving with respect to trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing framework and data visualization, are presented in detail. Some possible ways of applying quantum computation into trajectory data processing, as well as the implementation of some core trajectory mining algorithms by quantum computation are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
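    Among the pre-processing steps mentioned above, trajectory compression is often illustrated with the Douglas-Peucker line simplification algorithm; the sketch below is a standard textbook version (not necessarily the method used in the surveyed work) applied to a hypothetical 2D trajectory.

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    if (ax, ay) == (bx, by):
        return math.hypot(px - ax, py - ay)
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    return num / math.hypot(bx - ax, by - ay)

def douglas_peucker(points, epsilon):
    """Recursively drop points whose deviation from the end-to-end line is below epsilon."""
    if len(points) < 3:
        return list(points)
    dists = [_point_line_distance(p, points[0], points[-1]) for p in points[1:-1]]
    i_max, d_max = max(enumerate(dists, start=1), key=lambda t: t[1])
    if d_max <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:i_max + 1], epsilon)
    right = douglas_peucker(points[i_max:], epsilon)
    return left[:-1] + right

trajectory = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(trajectory, epsilon=1.0))
```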
    2010,21(2):344-358
    [Abstract] (8543) [HTML] (0) [PDF 1.01 M] (28999)
    Abstract:
    In this paper, the existing intrusion tolerance and self-destruction technology are integrated into autonomic computing in order to construct an autonomic dependability model based on SM-PEPA (semi-Markov performance evaluation process algebra) which is capable of formal analysis and verification. It can hierarchically anticipate Threats to dependability (TtD) at different levels in a self-management manner to satisfy the special requirements for dependability of mission-critical systems. Based on this model, a quantification approach is proposed on the view of steady-state probability to evaluate autonomic dependability. Finally, this paper analyzes the impacts of parameters of the model on autonomic dependability in a case study, and the experimental results demonstrate that improving the detection rate of TtD as well as the successful rate of self-healing will greatly increase the autonomic dependability.
    2014,25(1):37-50 , DOI: 10.13328/j.cnki.jos.004497
    [Abstract] (10533) [HTML] (5219) [PDF 929.87 K] (27645)
    Abstract:
    This paper surveys the state of the art of speech emotion recognition (SER), and presents an outlook on the trend of future SER technology. First, the survey summarizes and analyzes SER in detail from five perspectives, including emotion representation models, representative emotional speech corpora, emotion-related acoustic features extraction, SER methods and applications. Then, based on the survey, the challenges faced by current SER research are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, and presents detailed comparison and analysis between these methods.
    2011,22(6):1299-1315 , DOI: 10.3724/SP.J.1001.2011.03993
    [Abstract] (11659) [HTML] (0) [PDF 987.90 K] (27632)
    Abstract:
    Attribute-Based encryption (ABE) scheme takes attributes as the public key and associates the ciphertext and user’s secret key with attributes, so that it can support expressive access control policies. This dramatically reduces the cost of network bandwidth and sending node’s operation in fine-grained access control of data sharing. Therefore, ABE has a broad prospect of application in the area of fine-grained access control. After analyzing the basic ABE system and its two variants, Key-Policy ABE (KP-ABE) and Ciphertext-Policy ABE (CP-ABE), this study elaborates the research problems relating to ABE systems, including access structure design for CP-ABE, attribute key revocation, key abuse and multi-authorities ABE with an extensive comparison of their functionality and performance. Finally, this study discusses the need-to-be solved problems and main research directions in ABE.
    2020,31(7):2245-2282 , DOI: 10.13328/j.cnki.jos.006037
    [Abstract] (3265) [HTML] (6312) [PDF 967.02 K] (26653)
    Abstract:
    Ultrasonography is the first choice of imaging examination and preoperative evaluation for thyroid and breast cancer. However, the ultrasonic characteristics of benign and malignant nodules commonly overlap, and diagnosis relies heavily on the operator's experience rather than on quantitative and stable methods. In recent years, medical image analysis based on computer technology has developed rapidly, and a series of landmark breakthroughs have been made, providing effective decision support for medical imaging diagnosis. This work studies the research progress of computer vision and image recognition technologies for thyroid and breast ultrasound images. The key technologies involved in the automatic diagnosis of ultrasound images form the main thread of the work: the major algorithms of recent years are summarized and analyzed, including ultrasound image preprocessing, lesion localization and segmentation, and feature extraction and classification. Moreover, a multi-dimensional analysis is made of the algorithms, data sets, and evaluation methods. Finally, existing problems in the automatic analysis of these two kinds of ultrasound images are discussed, and research trends and development directions in the field of ultrasound image analysis are outlined.
    2009,20(3):524-545
    [Abstract] (17620) [HTML] (0) [PDF 1.09 M] (25960)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of the software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2018,29(10):2966-2994 , DOI: 10.13328/j.cnki.jos.005551
    [Abstract] (10558) [HTML] (6760) [PDF 610.06 K] (25875)
    Abstract:
    In recent years, the rapid development of Internet technology and Web applications has triggered an explosion of data on the Internet, which contains a large amount of valuable knowledge. How to organize, represent, and analyze this knowledge has attracted much attention. Knowledge graphs were developed to organize such knowledge in a semantic and visualized manner. Knowledge reasoning over knowledge graphs has thus become one of the hot research topics and plays an important role in many applications such as vertical search and intelligent question answering. The goal of knowledge reasoning over knowledge graphs is to infer new facts or identify erroneous facts according to existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graphs is more diversified, due to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graphs. Starting with the basic concepts of knowledge reasoning, this paper surveys the recently developed methods for knowledge reasoning over knowledge graphs. Specifically, the research progress is reviewed in detail from two aspects, one-step reasoning and multi-step reasoning, each including rule based reasoning, distributed embedding based reasoning, neural network based reasoning, and hybrid reasoning. Finally, future research directions and an outlook for knowledge reasoning over knowledge graphs are discussed.
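    As one concrete instance of the distributed-embedding-based reasoning family mentioned above, the sketch below scores candidate triples in the TransE style (head + relation ≈ tail); the tiny embedding table is made up for illustration and no training is shown.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings for a toy knowledge graph.
entity_emb = {
    "Beijing": np.array([0.9, 0.1, 0.0]),
    "China":   np.array([1.0, 1.0, 0.0]),
    "Paris":   np.array([0.0, 0.2, 0.9]),
    "France":  np.array([0.1, 1.1, 0.9]),
}
relation_emb = {"capital_of": np.array([0.1, 0.9, 0.0])}

def transe_score(head, relation, tail):
    """TransE plausibility: smaller ||h + r - t|| means a more plausible triple."""
    return float(np.linalg.norm(entity_emb[head] + relation_emb[relation] - entity_emb[tail]))

# Rank candidate tails for the query (Beijing, capital_of, ?).
candidates = ["China", "France", "Paris"]
print(sorted(candidates, key=lambda t: transe_score("Beijing", "capital_of", t)))
```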
    2013,24(4):825-842 , DOI: 10.3724/SP.J.1001.2013.04369
    [Abstract] (9012) [HTML] (0) [PDF 1.09 M] (25332)
    Abstract:
    A honeypot is a proactive defense technology introduced by the defense side to change the asymmetric situation of the attack-defense game. Through the deployment of honeypots, i.e., security resources without any production purpose, defenders can deceive attackers into illegally using the honeypots, and can capture and analyze the attack behaviors to understand the attack tools and methods and learn the attackers' intentions and motivations. Honeypot technology has won the sustained attention of the security community, made considerable progress, and gained wide application, becoming one of the main technical means of Internet security threat monitoring and analysis. In this paper, the origin and evolution of honeypot technology are presented first. Next, the key mechanisms of honeypot technology are comprehensively analyzed, the development of honeypot deployment structures is reviewed, and the latest applications of honeypot technology in Internet security threat monitoring, analysis, and prevention are summarized. Finally, the problems of honeypot technology, its development trends, and further research directions are discussed.
    2009,20(1):124-137
    [Abstract] (17252) [HTML] (0) [PDF 1.06 M] (25253)
    Abstract:
    The appearance of plenty of intelligent devices equipped for short-range wireless communication has boosted the rapid rise of wireless ad hoc network applications. However, in many realistic application environments, nodes form a disconnected network most of the time due to node mobility, low density, lossy links, etc. The conventional communication model of mobile ad hoc networks (MANET) requires at least one path to exist from the source to the destination node, which results in communication failures in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, and implement communication between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has captured great interest from researchers. This paper first introduces the concepts and theory of opportunistic networks and some current typical applications. It then elaborates on the popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Some other interesting research points such as communication middleware, cooperation and security problems, and new applications are stated briefly. Finally, the paper concludes and looks forward to possible research foci for opportunistic networks in the future.
    2018,29(10):3068-3090 , DOI: 10.13328/j.cnki.jos.005607
    [Abstract] (9326) [HTML] (9676) [PDF 2.28 M] (24914)
    Abstract:
    Design problems are ubiquitous in scientific research and industrial applications. In recent years, Bayesian optimization, a very effective global optimization algorithm, has been widely applied to design problems. By structuring the probabilistic surrogate model and the acquisition function appropriately, the Bayesian optimization framework can obtain the optimal solution within only a few function evaluations, and is therefore very suitable for extremely complex optimization problems whose objective functions cannot be expressed analytically, or are non-convex, multimodal, and computationally expensive. This paper provides a detailed analysis of Bayesian optimization in terms of methodology and application areas, and discusses its research status and open problems for future research. This work is hopefully beneficial to researchers from the related communities.
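    The surrogate-plus-acquisition loop described above can be sketched in a few lines; the version below uses a tiny handwritten Gaussian-process surrogate and an upper-confidence-bound acquisition on a made-up 1D objective. It is a minimal illustration under those assumptions, not any surveyed system, and real implementations use more careful surrogates and acquisition optimizers.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.3):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-0.5 * d2.sum(-1) / length_scale ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Gaussian-process posterior mean and standard deviation at query points Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.einsum("ij,jk,ki->i", Ks.T, Kinv, Ks)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def objective(x):            # hypothetical expensive black-box function (1D)
    return np.sin(3 * x) + 0.5 * x

# Bayesian optimization loop: fit surrogate, maximize the acquisition, evaluate, repeat.
candidates = np.linspace(0, 2, 200)[:, None]
X = np.array([[0.2], [1.8]])             # initial design
y = np.array([objective(x[0]) for x in X])
for _ in range(10):
    mu, sigma = gp_posterior(X, y, candidates)
    ucb = mu + 2.0 * sigma               # upper confidence bound acquisition
    x_next = candidates[np.argmax(ucb)]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))
print("best x, f(x):", X[np.argmax(y)][0], y.max())
```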
    2019,30(2):440-468 , DOI: 10.13328/j.cnki.jos.005659
    [Abstract] (9376) [HTML] (7546) [PDF 3.27 M] (24875)
    Abstract:
    In recent years, deep learning (DL) has been widely applied to image semantic segmentation (ISS) due to its state-of-the-art performance and high-quality results. This paper systematically reviews the contribution of DL to the field of ISS. Different methods of ISS based on DL (ISSbDL) are summarized. These methods are divided into ISS based on Regional Classification (ISSbRC) and ISS based on Pixel Classification (ISSbPC) according to the segmentation characteristics and granularity. The methods of ISSbPC are then surveyed from two points of view: ISS based on Fully Supervised Learning (ISSbFSL) and ISS based on Weakly Supervised Learning (ISSbWSL). The representative algorithms of each method are introduced and analyzed, and the basic workflow, framework, advantages, and disadvantages of these methods are analyzed and compared in detail. In addition, the related ISS experiments are analyzed and summarized, and the common data sets and performance evaluation indexes used in ISS experiments are introduced. Finally, possible research directions and trends are given and analyzed.
    2004,15(11):1583-1594
    [Abstract] (9331) [HTML] (0) [PDF 1.57 M] (24584)
    Abstract:
    Uncertainty exists widely in the subjective and objective world. Among all kinds of uncertainty, randomness and fuzziness are the most important and fundamental. In this paper, the relationship between randomness and fuzziness is discussed. Uncertain states and their changes can be measured by entropy and hyper-entropy respectively. Taking advantage of entropy and hyper-entropy, the uncertainty of chaos, fractals, and complex networks arising from their various forms of evolution and differentiation is further studied. A simple and effective way is proposed to simulate uncertainty by means of knowledge representation, which provides a basis for the automation of both logical and image thinking with uncertainty. AI (artificial intelligence) with uncertainty is a new cross-discipline, which covers computer science, physics, mathematics, brain science, psychology, cognitive science, biology, and philosophy, and results in the automation of representation, processing, and thinking for uncertain information and knowledge.
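    The entropy and hyper-entropy mentioned above are commonly operationalized in this line of work through a forward normal cloud style generator. The sketch below is such a generator under that assumption (it may differ from the paper's exact formulation): it produces sample "drops" and their membership degrees from an expectation Ex, entropy En, and hyper-entropy He.

```python
import math
import random

def forward_normal_cloud(Ex, En, He, n_drops=1000):
    """Generate cloud drops (x, membership) from expectation Ex, entropy En, hyper-entropy He."""
    drops = []
    for _ in range(n_drops):
        En_i = random.gauss(En, He)           # hyper-entropy perturbs the entropy
        x = random.gauss(Ex, abs(En_i))       # entropy controls the spread of drops
        mu = math.exp(-(x - Ex) ** 2 / (2 * En_i ** 2 + 1e-12))
        drops.append((x, mu))
    return drops

drops = forward_normal_cloud(Ex=25.0, En=3.0, He=0.3)
xs = [x for x, _ in drops]
print(min(xs), max(xs))   # spread grows with En; fuzziness of the spread grows with He
```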
    2014,25(4):839-862 , DOI: 10.13328/j.cnki.jos.004558
    [Abstract] (15751) [HTML] (4089) [PDF 1.32 M] (24123)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. Research and discussion on batch computing in big data environments is comparatively sufficient, but how to efficiently handle stream computing so as to meet requirements such as low latency, high throughput, and continuous reliable running, and how to build efficient stream big data computing systems, remain great challenges in big data computing research. This paper studies the system architecture and the key issues of stream computing in big data environments. First, it gives a brief summary of three application scenarios of stream computing in business intelligence, marketing, and public service, and describes the distinctive features of stream computing in big data environments, such as real-time processing, volatility, burstiness, irregularity, and infinity. A well-designed stream computing system always optimizes system structure, data transmission, application interfaces, high availability, and so on. The paper then offers detailed analyses and comparisons of five typical open-source stream computing systems for big data environments. Finally, it specifically addresses some new challenges for stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2005,16(1):1-7
    [Abstract] (22718) [HTML] (0) [PDF 614.61 K] (23981)
    Abstract:
    The paper offers some reflections from the following four aspects: 1) from the law of the development of things, it reviews the development history of software engineering technology; 2) from the natural characteristics of software, it analyzes the construction of each abstraction layer of the virtual machine; 3) from the viewpoint of software development, it proposes the research content of the software engineering discipline and studies the pattern of industrialized software production; 4) based on the emergence of Internet technology, it explores the development trend of software technology.
    2006,17(9):1848-1859
    [Abstract] (12917) [HTML] (0) [PDF 770.40 K] (23612)
    Abstract:
    In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the state-of-the-art challenging issues and research trends in content information processing for the Internet and other complex applications, this paper presents a survey of up-to-date developments in text categorization based on machine learning, covering models, algorithms, and evaluation. It is pointed out that problems such as nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization. Possible solutions to these problems are also discussed. Finally, some future research directions are given.
    2012,23(8):2058-2072 , DOI: 10.3724/SP.J.1001.2012.04237
    [Abstract] (10323) [HTML] (0) [PDF 800.05 K] (23419)
    Abstract:
    The Distributed denial of service (DDoS) attack is a major threat to the current network. Based on the attack packet level, the study divides DDoS attacks into network-level DDoS attacks and application-level DDoS attacks. Next, the study analyzes the detection and control methods of these two kinds of DDoS attacks in detail, and it also analyzes the drawbacks of different control methods implemented in different network positions. Finally, the study analyzes the drawbacks of the current detection and control methods, the development trend of the DDoS filter system, and corresponding technological challenges are also proposed.
    2013,24(2):295-316 , DOI: 10.3724/SP.J.1001.2013.04336
    [Abstract] (10013) [HTML] (0) [PDF 0.00 Byte] (23403)
    Abstract:
    Under the new application mode, the traditional hierarchy data centers face several limitations in size, bandwidth, scalability, and cost. In order to meet the needs of new applications, data center network should fulfill the requirements with low-cost, such as high scalability, low configuration overhead, robustness and energy-saving. First, the shortcomings of the traditional data center network architecture are summarized, and new requirements are pointed out. Secondly, the existing proposals are divided into two categories, i.e. server-centric and network-centric. Then, several representative architectures of these two categories are overviewed and compared in detail. Finally, the future directions of data center network are discussed.
    2005,16(10):1743-1756
    [Abstract] (10474) [HTML] (0) [PDF 545.62 K] (23300)
    Abstract:
    This paper presents a survey on the theory of provable security and its applications to the design and analysis of security protocols. It clarifies what the provable security is, explains some basic notions involved in the theory of provable security and illustrates the basic idea of random oracle model. It also reviews the development and advances of provably secure public-key encryption and digital signature schemes, in the random oracle model or the standard model, as well as the applications of provable security to the design and analysis of session-key distribution protocols and their advances.
    2023,34(2):625-654 , DOI: 10.13328/j.cnki.jos.006696
    [Abstract] (3767) [HTML] (5113) [PDF 3.04 M] (23274)
    Abstract:
    Source code bug (vulnerability) detection is the process of judging whether there are unexpected behaviors in program code. It is widely used in software engineering tasks such as software testing and software maintenance, and plays a vital role in software functional assurance and application security. Traditional vulnerability detection research is based on program analysis, which usually requires strong domain knowledge and complex calculation rules, and faces the state explosion problem, resulting in limited detection performance and considerable room for improvement in the rates of false positives and false negatives. In recent years, the vigorous development of the open source community has accumulated massive amounts of data centered on open source code. In this context, the feature learning capability of deep learning can automatically learn semantically rich code representations, thereby providing a new way for vulnerability detection. This study collects the latest high-quality papers in this field and systematically summarizes the current methods from two aspects: vulnerability code datasets and deep learning vulnerability detection models. Finally, it summarizes the main challenges faced by research in this field and looks forward to possible future research directions.
    2021,32(2):496-518 , DOI: 10.13328/j.cnki.jos.006140
    [Abstract] (6289) [HTML] (9732) [PDF 2.20 M] (23237)
    Abstract:
    Deep learning has achieved great success in the field of computer vision, surpassing many traditional methods. However, in recent years, deep learning technology has been abused to produce fake videos, making fake videos represented by Deepfakes flood the Internet. This technique produces pornographic movies, fake news, and political rumors by tampering with or replacing the face information of the original videos, or by synthesizing fake speech. In order to eliminate the negative effects brought by such forgery technologies, many researchers have conducted in-depth research on the identification of fake videos and proposed a series of detection methods to help institutions or communities identify such fake videos. Nevertheless, current detection technology still has many limitations, such as dependence on specific data distributions or specific compression ratios, and lags far behind the generation technology of fake videos. In addition, different researchers approach the problem from different angles, and the data sets and evaluation indicators used are not uniform. So far, the academic community still lacks a unified understanding of deep forgery and detection technology, and the overall architecture of research in this area remains unclear. In this review, the development of deep forgery and detection technologies is reviewed, and existing research works are systematically summarized and scientifically classified. Finally, the social risks posed by the spread of Deepfakes technology are discussed, the limitations of detection technology are analyzed, and the challenges and potential research directions of detection technology are discussed, aiming to provide guidance for follow-up researchers to further promote the development and deployment of Deepfakes detection technology.
    2003,14(9):1621-1628
    [Abstract] (13441) [HTML] (0) [PDF 680.35 K] (23048)
    Abstract:
    Recommendation systems are one of the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extremely sparse user rating data. Traditional similarity measures work poorly in this situation, which dramatically degrades the quality of the recommendation system. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method first predicts the ratings of items that users have not rated by using item similarity, and then uses a new similarity measure to find the target user's neighbors. The experimental results show that this method can efficiently alleviate the extreme sparsity of user rating data and provides better recommendation results than traditional collaborative filtering algorithms.
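    The following sketch illustrates the general idea described above: fill in a user's missing ratings from item-item similarity, then use the completed profiles to pick neighbors. It is a simplified stand-in with a made-up rating matrix and a plain cosine measure, not the paper's exact algorithm or similarity measure.

```python
import numpy as np

# Made-up user-item rating matrix (0 means "not rated").
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4]], dtype=float)

def cosine(a, b):
    mask = (a > 0) & (b > 0)                 # compare only co-rated entries
    if not mask.any():
        return 0.0
    return float(a[mask] @ b[mask] / (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask]) + 1e-12))

def fill_missing(R):
    """Predict each missing rating as a similarity-weighted average over the user's rated items."""
    n_users, n_items = R.shape
    sim = np.array([[cosine(R[:, i], R[:, j]) for j in range(n_items)] for i in range(n_items)])
    filled = R.copy()
    for u in range(n_users):
        for i in range(n_items):
            if R[u, i] == 0:
                rated = np.where(R[u] > 0)[0]
                w = sim[i, rated]
                if w.sum() > 0:
                    filled[u, i] = w @ R[u, rated] / w.sum()
    return filled

completed = fill_missing(R)
target = 1                                   # find neighbors of user 1 on completed profiles
neighbors = sorted((cosine(completed[target], completed[v]), v)
                   for v in range(R.shape[0]) if v != target)
print(np.round(completed, 2))
print("nearest neighbor of user 1:", neighbors[-1][1])
```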
    2018,29(7):2092-2115 , DOI: 10.13328/j.cnki.jos.005589
    [Abstract] (11003) [HTML] (7590) [PDF 2.52 M] (22996)
    Abstract:
    Blockchain is a distributed public ledger technology that originates from the digital cryptocurrency, bitcoin. Its development has attracted wide attention in industry and academia fields. Blockchain has the advantages of de-centralization, trustworthiness, anonymity and immutability. It breaks through the limitation of traditional center-based technology and has broad development prospect. This paper introduces the research progress of blockchain technology and its application in the field of information security. Firstly, the basic theory and model of blockchain are introduced from five aspects:Basic framework, key technology, technical feature, and application mode and area. Secondly, from the perspective of current research situation of blockchain in the field of information security, this paper summarizes the research progress of blockchain in authentication technology, access control technology and data protection technology, and compares the characteristics of various researches. Finally, the application challenges of blockchain technology are analyzed, and the development outlook of blockchain in the field of information security is highlighted. This study intends to provide certain reference value for future research work.
    2010,21(7):1605-1619
    [Abstract] (10129) [HTML] (0) [PDF 856.25 K] (22721)
    Abstract:
    The rapid development of the Internet leads to an increase in system complexity and uncertainty. Traditional network management cannot meet the requirements and should evolve toward fusion-based cyberspace situational awareness (CSA). Based on an analysis of functional shortcomings and development requirements, this paper introduces CSA, including its origin, concept, objectives, and characteristics. First, a CSA research framework is proposed and the research history is surveyed, based on which the main aspects and existing issues of the research are analyzed. Assessment methods are divided into three categories: mathematical models, knowledge reasoning, and pattern recognition. The paper then discusses CSA from three aspects, models, knowledge representation, and assessment methods, going into detail about the main ideas, assessment processes, merits, and shortcomings of recent methods and comparing many typical methods. The current application research of CSA in the fields of security, transmission, survivability, system evaluation, and so on is presented. Finally, the paper points out the development directions of CSA and draws conclusions regarding the issue system, technical system, and application system.
    2016,27(11):2855-2869 , DOI: 10.13328/j.cnki.jos.004932
    [Abstract] (3257) [HTML] (2584) [PDF 1.85 M] (22710)
    Abstract:
    With the proliferation of Chinese social networks (especially the rise of Weibo), the productivity and lifestyle of society are more and more profoundly influenced by Chinese Internet public events. Due to the lack of effective technical means, the efficiency of information processing is limited. This paper proposes a method for calculating the information entropy of public events. First, a mathematical model of event information content is built. Then, the multidimensional random variable information entropy of public events is calculated based on Shannon information theory. Furthermore, a new technical index for the quantitative analysis of Internet public events is put forward, laying a foundation for further research work.
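    For reference, the sketch below computes the Shannon entropy of a discrete distribution and the joint entropy of a multidimensional (here two-dimensional) random variable from co-occurrence counts; the counts are invented for illustration, and the paper's actual event model is richer.

```python
import math
from collections import Counter

def shannon_entropy(counts):
    """H(X) = -sum p * log2(p) over a discrete distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical observations of a 2-D event variable (topic, sentiment).
observations = [("policy", "neg"), ("policy", "neg"), ("policy", "pos"),
                ("accident", "neg"), ("accident", "neg"), ("celebrity", "pos")]
joint = Counter(observations)
topic = Counter(t for t, _ in observations)

print("H(topic, sentiment) =", round(shannon_entropy(list(joint.values())), 3))
print("H(topic)            =", round(shannon_entropy(list(topic.values())), 3))
```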
    2010,21(7):1620-1634
    [Abstract] (12653) [HTML] (0) [PDF 765.23 K] (22586)
    Abstract:
    As an application of mobile ad hoc networks (MANET) on Intelligent Transportation Information System, the most important goal of vehicular ad hoc networks (VANET) is to reduce the high number of accidents and fatal consequences dramatically. One of the most important factors that would contribute to the realization of this goal is the design of effective broadcast protocols. This paper introduces the characteristics and application fields of VANET briefly. Then, it discusses the characteristics, performance, and application areas with analysis and comparison of various categories of broadcast protocols in VANET. According to the characteristic of VANET and its application requirement, the paper proposes the ideas and breakthrough direction of information broadcast model design of inter-vehicle communication.
    2017,28(1):17-34 , DOI: 10.13328/j.cnki.jos.005151
    [Abstract] (8628) [HTML] (4751) [PDF 2.02 M] (22508)
    Abstract:
    The vigorous development of positioning technology and pervasive computing has given rise to trajectory big data, i.e. the high speed trajectory data stream that originated from positioning devices. Analyzing trajectory big data timely and effectively enables us to discover the abnormal patterns that hide in trajectory data streams, and therefore to provide effective support to applications such as urban planning, traffic management, and security controlling. The traditional anomaly detection algorithms cannot be applied to outlier detection in trajectory big data directly due to the characteristics of trajectories such as uncertainty, un-limitedness, time-varying evolvability, sparsity and skewness distribution. In addition, most of trajectory outlier detection methods designed for static trajectory dataset usually assume a priori known data distribution while disregarding the temporal property of trajectory data, and thus are unsuitable for identifying the evolutionary trajectory outlier. When dealing with huge amount of low-quality trajectory big data, a series of issues need to be addressed. Those issues include coping with the concept drifts of time-varying data distribution in limited system resources, online detecting trajectory outliers, analyzing causal interactions among traffic outliers, identifying the evolutionary related trajectory outlier in larger spatial-temporal regions, and analyzing the hidden abnormal events and the root cause in trajectory anomalies by using application related multi-source heterogeneous data. Aiming at solving the problems mentioned above, this paper reviews the existing trajectory outlier detecting techniques from several categories, describes the system architecture of outlier detection in trajectory big data, and discusses the research directions such as outlier detection in trajectory stream, visualization and evolutionary analysis in trajectory outlier detection, benchmark for trajectory outlier detection system, and data fusion in semantic analysis for anomaly detection results.
    2009,20(6):1393-1405
    [Abstract] (12449) [HTML] (0) [PDF 831.86 K] (22366)
    Abstract:
    Combinatorial testing can use a small number of test cases to test systems while preserving fault detection ability. However, the test case generation problem for combinatorial testing is NP-complete. The efficiency and complexity of this testing method have attracted many researchers from the areas of combinatorics and software engineering. This paper summarizes the research work on this topic in recent years, including various combinatorial test criteria, the relations between the test generation problem and other NP-complete problems, mathematical methods for constructing test cases, computer search techniques for test generation, and fault localization techniques based on combinatorial testing.

Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190. Postal Code: 100190
Phone: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn
Technical Support: Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063