    2025,36(9):3919-3936, DOI: 10.13328/j.cnki.jos.007292, CSTR: 32375.14.jos.007292
    Abstract:
The memory consistency model defines constraints on memory access orders for parallel programs in multi-core systems and is an important architectural specification jointly followed by software and hardware. Sequential consistency (SC) per location is a classic axiom of memory consistency models, which specifies that all memory access operations with the same address in a multi-core system follow SC. Meanwhile, it has been widely employed in the memory consistency models of classic architectures such as X86/TSO, Power, and ARM, and plays an important role in chip memory consistency verification, system software, and parallel program development. RISC-V is an open-source architectural specification, and its memory model is defined by global memory orders, preserved program orders, and three axioms (the load value axiom, atomicity axiom, and progress axiom). Additionally, it does not directly include SC per location as an axiom, which poses challenges to existing memory model verification tools and system software development. This study formalizes SC per location as a theorem based on the defined axioms and rules in the RISC-V memory model. The proof process abstracts the construction of memory access sequences with the same arbitrary address into deterministic finite automata for inductive proof. This study is a theoretical supplement to the formal methods of RISC-V memory consistency.
    2025,36(9):3937-3953, DOI: 10.13328/j.cnki.jos.007357, CSTR: 32375.14.jos.007357
    Abstract:
Instruction-level parallelism is a fundamental challenge in processor architecture research. Very long instruction word (VLIW) architecture is widely used in the field of digital signal processing to enhance instruction-level parallelism. In VLIW architecture, the instruction issue order is determined by the compiler, making its performance highly dependent on the compiler’s instruction scheduling. To explore the potential of RISC-V VLIW architecture and further enrich the RISC-V ecosystem, this study focuses on optimizing instruction scheduling algorithms for RISC-V VLIW architecture. For a single scheduling region, integer linear programming (ILP) scheduling can achieve optimal solutions but suffers from high computational complexity, whereas list scheduling offers lower complexity at the cost of potentially suboptimal solutions. To leverage the strengths of both approaches, this study proposes a hybrid instruction scheduling algorithm. Scheduling regions where list scheduling has not reached the optimal solution are located with an IPC theoretical model, and the ILP scheduling algorithm then further processes the located regions. The theoretical model is based on data flow analysis, accounting for both instruction dependencies and hardware resources, and provides a theoretical upper bound for IPC with linear complexity. The accuracy of the IPC theoretical model is a critical factor for the success of hybrid scheduling and reaches 95.74% in this study. On the given benchmark, the IPC model identifies that 94.62% of scheduling regions have reached the optimal solution with list scheduling, leaving only 5.38% requiring further refinement with ILP scheduling. The proposed hybrid scheduling algorithm achieves the scheduling quality of ILP scheduling while maintaining a complexity comparable to that of list scheduling.
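The gating role the IPC bound plays in the hybrid algorithm can be illustrated with a minimal sketch (Python; the region data, issue width, and the simple critical-path/resource bound below are illustrative assumptions rather than the paper's data-flow model, and the ILP step is only marked rather than solved):

```python
from math import ceil

def ipc_upper_bound(num_insts, critical_path_len, issue_width):
    """Toy IPC upper bound: cycles are limited by the dependence chain
    (critical path) and by the issue resources, whichever is larger."""
    min_cycles = max(critical_path_len, ceil(num_insts / issue_width))
    return num_insts / min_cycles

def hybrid_schedule(regions, issue_width=4):
    """Keep the list-scheduling result for regions that already reach the
    theoretical bound; hand the remaining regions to the ILP scheduler."""
    needs_ilp = []
    for r in regions:
        bound = ipc_upper_bound(r["insts"], r["critical_path"], issue_width)
        list_ipc = r["insts"] / r["list_cycles"]
        if list_ipc + 1e-9 < bound:     # list schedule may be suboptimal
            needs_ilp.append(r["name"])
    return needs_ilp

# Toy regions: instruction count, critical path length, cycles used by list scheduling.
regions = [
    {"name": "bb0", "insts": 12, "critical_path": 3, "list_cycles": 3},  # already optimal
    {"name": "bb1", "insts": 16, "critical_path": 4, "list_cycles": 6},  # refine with ILP
]
print(hybrid_schedule(regions))   # ['bb1']
```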
    2025,36(9):3954-3969, DOI: 10.13328/j.cnki.jos.007358, CSTR: 32375.14.jos.007358
    Abstract:
    Cache simulators are indispensable tools for exploring cache architectures and researching cache side channels. Spike, the standard implementation of the RISC-V instruction set, offers a comprehensive environment for RISC-V-based cache research. However, its cache model suffers from limitations, such as low simulation granularity and notable discrepancies with the cache structures of real processors. To address these limitations, this study introduces flexible cache architectural simulator (FlexiCAS), a modified and extended version of Spike’s cache model. The modified simulator, referred to as Spike-FlexiCAS, supports a wide range of cache architectures with flexible configuration and easy extensibility. It enables arbitrary combinations of cache features, including coherence protocols and implementation methods. In addition, FlexiCAS can simulate cache behavior independently of Spike. The performance evaluations demonstrate that FlexiCAS significantly outperforms the cache model of ZSim, the fastest execution-driven simulator available.
    2025,36(9):3970-3984, DOI: 10.13328/j.cnki.jos.007359, CSTR: 32375.14.jos.007359
    Abstract:
Memory virtualization, a core component of virtualization technology, directly impacts the overall performance of virtual machines. Current memory virtualization approaches often involve a tradeoff between the overhead of two-dimensional address translation and page table synchronization. Traditional shadow paging employs an additional software-maintained page table to achieve address translation performance comparable to native systems. However, synchronization of shadow page tables relies on write protection, frequently causing VM-exits that significantly degrade system performance. In contrast, the nested paging approach leverages hardware-assisted virtualization, allowing the guest page table and nested page table to be directly loaded into the MMU. While this eliminates page table synchronization, the two-dimensional page table traversal incurs substantial performance penalties for address translation. This study proposes lazy shadow paging (LSP), which reduces page table synchronization overhead while retaining the high efficiency of shadow page tables. Leveraging the privilege model and hardware features of the RISC-V architecture, LSP analyzes the access patterns of guest OS page tables and binds synchronization with translation lookaside buffer (TLB) flushes, reducing the software overhead associated with page table updates and minimizing VM-exits by deferring costs until the first access to a relevant page. In addition, it introduces a fast path for handling VM-exits, exploiting the fine-grained TLB interception and privilege-level features of RISC-V to further optimize performance. Experimental results demonstrate that under the baseline RISC-V architecture, LSP reduces VM-exits by up to 50% compared to traditional shadow paging in micro-benchmark tests. For typical applications in the SPEC2006 benchmark suite, LSP reduces VM-exits by up to 25% compared to traditional shadow paging and decreases memory accesses per TLB miss by 12 compared to nested paging.
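The deferral idea behind LSP can be sketched as a toy event model (Python; the page-table, trap, and TLB-flush mechanics are heavily simplified and the RISC-V fast-path details are not modeled):

```python
class LazyShadowPaging:
    """Toy model: guest page-table writes are not trapped; synchronization is
    bound to the guest TLB flush, and each stale mapping is rebuilt lazily on
    its first use after the flush, costing a single VM-exit per page."""

    def __init__(self):
        self.guest_pt = {}     # guest virtual page -> guest physical page
        self.shadow_pt = {}    # guest virtual page -> host translation
        self.pending = set()   # guest PT entries modified since the last flush
        self.stale = set()     # entries whose shadow copy must be rebuilt
        self.vm_exits = 0

    def guest_pt_write(self, vpage, gpage):
        self.guest_pt[vpage] = gpage     # no write-protection fault here
        self.pending.add(vpage)

    def guest_tlb_flush(self):
        self.stale |= self.pending       # defer the actual shadow update
        self.pending.clear()

    def access(self, vpage):
        if vpage in self.stale or vpage not in self.shadow_pt:
            self.vm_exits += 1           # one exit, on the first touch only
            self.shadow_pt[vpage] = ("host", self.guest_pt[vpage])
            self.stale.discard(vpage)
        return self.shadow_pt[vpage]

mmu = LazyShadowPaging()
for v in range(4):
    mmu.guest_pt_write(v, v + 100)       # four guest PT updates, no VM-exits yet
mmu.guest_tlb_flush()
mmu.access(0); mmu.access(0)             # only the first access traps
print(mmu.vm_exits)                      # 1
```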
    2025,36(9):3985-4005, DOI: 10.13328/j.cnki.jos.007360, CSTR: 32375.14.jos.007360
    Abstract:
The performance acceleration of high-performance libraries on CPUs can be achieved by leveraging SIMD hardware through vectorization. Implementing vectorization requires programming methods tailored to the target SIMD hardware, which vary significantly across different SIMD extensions. To avoid redundant implementations of algorithm optimizations on various platforms and enhance the maintainability of algorithm libraries, a hardware abstraction layer (HAL) is often introduced. However, most existing HAL designs are based on fixed-length vector registers, aligning with the fixed-length nature of conventional SIMD extension instruction sets. This design fails to accommodate the variable-length vector registers introduced by the RISC-V vector extension. Treating RISC-V vector extensions as fixed-length vectors within traditional HAL designs results in unnecessary overhead and performance degradation. To address this problem, the study proposes a HAL design method compatible with both variable-length vector extensions and fixed-length SIMD extensions. Using this approach, the universal intrinsic functions in the OpenCV library are redesigned and optimized to better support RISC-V vector extension devices while maintaining compatibility with existing SIMD platforms. Performance comparisons between the optimized and original OpenCV libraries reveal that the redesigned universal intrinsic functions efficiently integrate the RISC-V vector extension into the HAL optimization framework, achieving a 3.93× performance improvement in core modules. These results validate the effectiveness of the proposed method, significantly enhancing the execution performance of high-performance libraries on RISC-V devices. In addition, the proposed approach has been open-sourced and integrated into the OpenCV repository, demonstrating its practicality and application value.
    2025,36(9):4006-4035, DOI: 10.13328/j.cnki.jos.007222, CSTR: 32375.14.jos.007222
    Abstract:
Smart contracts are scripts running on the Ethereum blockchain capable of handling intricate business logic, with most written in Solidity. As security concerns surrounding smart contracts intensify, a formal verification method employing the modeling, simulation, and verification language (MSVL) alongside propositional projection temporal logic (PPTL) is proposed. A SOL2M converter is developed, facilitating semi-automatic modeling from Solidity programs to MSVL programs. However, a proof of the operational semantic equivalence of Solidity and MSVL has been lacking. This study initially defines Solidity’s operational semantics using big-step semantics across four levels: semantic elements, evaluation rules, expressions, and statements. Subsequently, it establishes equivalence relations between states, expressions, and statements in Solidity and MSVL. In addition, leveraging the operational semantics of both languages, it employs structural induction to prove expression equivalence and rule induction to establish statement equivalence.
    2025,36(9):4036-4055, DOI: 10.13328/j.cnki.jos.007267, CSTR: 32375.14.jos.007267
    Abstract:
    Bug triaging is the process of assigning bug reports to developers suitable for resolving the reported bugs, ensuring timely fixes. Current research in bug triaging mainly focuses on the text classification of bug reports. However, according to the Pareto principle, the data distribution of bug reports used for classification is unbalanced, which may lead to ineffective triaging for inactive developers. Additionally, existing classification models often neglect to model developers and struggle to capture the correlations between bugs and developers, affecting the efficiency of bug triaging. To address these issues, this study proposes a collaborative bug triaging method based on multimodal fusion (CBT-MF). This method first preprocesses bug reports and constructs a bug-developer bipartite graph. To mitigate the impact of the unbalanced distribution of bug fix records, the bipartite graph data is enhanced using K-means clustering and positive-negative sampling. To represent developer information, node features are extracted from the bipartite graph using a graph convolutional network model. Finally, correlations between bugs and developers are captured by matching inner products, and Bayesian personalized ranking (BPR) is utilized for bug report recommendation and triaging. Comprehensive experiments conducted on publicly available datasets demonstrate that CBT-MF outperforms several state-of-the-art methods in bug triaging.
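The ranking objective used in the final step can be illustrated with a minimal sketch (Python/NumPy; random vectors stand in for the GCN-derived bug and developer embeddings, and the clustering and sampling steps are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
bug_emb = rng.normal(size=(100, dim))   # stand-ins for learned bug embeddings
dev_emb = rng.normal(size=(20, dim))    # stand-ins for learned developer embeddings

def bpr_loss(bug, pos_dev, neg_dev):
    """Bayesian personalized ranking: the developer who actually fixed the bug
    (positive) should score higher, by inner product, than a sampled negative."""
    pos = bug_emb[bug] @ dev_emb[pos_dev]
    neg = bug_emb[bug] @ dev_emb[neg_dev]
    return -np.log(1.0 / (1.0 + np.exp(-(pos - neg))))   # -log sigmoid(pos - neg)

def recommend(bug, k=3):
    """Rank developers for a bug report by inner-product score."""
    scores = dev_emb @ bug_emb[bug]
    return np.argsort(-scores)[:k]

print(bpr_loss(bug=0, pos_dev=3, neg_dev=7))
print(recommend(bug=0))
```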
    Available online:  September 03, 2025 , DOI: 10.13328/j.cnki.jos.007416
    Abstract:
    Existing deep learning-based point cloud registration methods primarily focus on feature extraction and feature matching. However, the exploration of local and global graph structures during the feature extraction stage remains insufficient, and the investigation of difference information during the matching process is also limited. To address these issues, this study proposes a point cloud registration method based on local-global dynamic graph learning and complementary fusion. Specifically, the dynamic offset-based local graph learning module characterizes the underlying graph structure in the feature space by constructing proxy points that contain both geometric and semantic information, leading to more discriminative local features. In addition, a dynamic attention-based global graph learning module is designed, which adaptively adjusts attention weights based on the relationships between points, effectively capturing long-range dependencies in the point cloud. To further enhance the correspondence between the two point clouds, the attention-driven complementary fusion module utilizes the cross-attention mechanism to extract similar and distinctive information, while applying the self-attention mechanism to refine the relationships between features. Experimental results demonstrate that the proposed method achieves optimal registration performance on public datasets while maintaining acceptable computational efficiency.
    Available online:  September 03, 2025 , DOI: 10.13328/j.cnki.jos.007450
    Abstract:
With the rapid development of technologies such as deep learning and significant breakthroughs in areas including computer hardware and cloud computing, increasingly mature artificial intelligence (AI) technologies are being applied to software systems across various fields. Software systems that incorporate AI models as core components are collectively referred to as intelligent software systems. Based on the application fields of AI technologies, these systems are categorized into image processing, natural language processing, speech processing, and other applications. Unlike traditional software systems, AI models adopt a data-driven programming paradigm in which all decision logic is learned from large-scale datasets. This paradigm shift renders traditional code-based test case generation methods ineffective for evaluating the quality of intelligent software systems. As a result, numerous testing methods tailored for intelligent software systems have been proposed in recent years, including novel approaches for test case generation and evaluation that address the unique characteristics of such systems. This study reviews 80 relevant publications, classifies existing methods according to the types of systems they target, and systematically summarizes test case generation methods for image processing, natural language processing, speech processing, point cloud processing, multimodal data processing, and deep learning models. Potential future directions for test case generation in intelligent software systems are also discussed to provide a reference for researchers in this field.
    Available online:  September 03, 2025 , DOI: 10.13328/j.cnki.jos.007459
    Abstract:
    Resource public key infrastructure (RPKI) is a key technology for enhancing border gateway protocol (BGP) security, using cryptographic verification to prevent attacks such as prefix hijacking. Since its formal deployment in 2012, RPKI has grown to cover over half of Internet prefixes. Ongoing research on RPKI deployment helps to provide insights into current trends and identify security issues. This study reviews existing works on RPKI measurement from three perspectives: RPKI data object measurement, ROV measurement, and RPKI infrastructure measurement. It analyzes RPKI data object and ROV coverage metrics, deployment trends, and the effectiveness of different measurement approaches. Moreover, key security vulnerabilities and data quality issues are identified, and recommendations to promote large-scale RPKI deployment are proposed.
    Available online:  September 02, 2025 , DOI: 10.13328/j.cnki.jos.007504
    Abstract:
As an emerging technique in software engineering, automatic source code summarization aims to generate natural language descriptions for given code snippets. State-of-the-art code summarization techniques utilize encoder-decoder neural models; the encoder extracts the semantic representations of the source code, while the decoder translates them into a human-readable code summary. However, many existing approaches treat input code snippets as standalone functions, often overlooking the context dependencies between the target function and its invoked subfunctions. Ignoring these dependencies can result in the omission of crucial semantic information, potentially reducing the quality of the generated summary. To this end, in this paper, we introduce DHCS, a dependency-aware hierarchical code summarization neural model. DHCS is designed to improve code summarization by explicitly modeling the hierarchical dependencies between the target function and its subfunctions. Our approach employs a hierarchical encoder consisting of both a subfunction encoder and a target function encoder, allowing us to capture both local and contextual semantic representations effectively. Meanwhile, we introduce a self-supervised task, namely masked subfunction prediction, to enhance the representation learning of subfunctions. Furthermore, we propose to mine the topic distribution of subfunctions and incorporate it into a summary decoder with a topic-aware copy mechanism. Therefore, it enables the direct extraction of key information from subfunctions, facilitating more effective summary generation for the target function. Finally, we have conducted extensive experiments on three real-world datasets constructed for the Python, Java, and Go languages, which clearly validate the effectiveness of our approach.
    Available online:  September 02, 2025 , DOI: 10.13328/j.cnki.jos.007506
    Abstract:
    The advent of the big data era has introduced massive data applications characterized by four defining attributes—Volume, Variety, Velocity, and Value (4V)—posing revolutionary challenges to conventional data acquisition methods, management strategies, and database processing capabilities. Recent breakthroughs in artificial intelligence (AI), particularly in machine learning and deep learning, have demonstrated remarkable advancements in representation learning, computational efficiency, and model interpretability, thereby offering innovative solutions to these challenges. This convergence of AI and database systems has given rise to a new generation of intelligent database management systems, which integrate AI technologies across three core architectural layers: (1) natural language interfaces for user interaction, (2) automated database administration frameworks (including parameter tuning, index recommendation, diagnostics, and workload management), and (3) machine learning-based high-performance components (such as learned indexes, adaptive partitioning, query optimization, and scheduling). Furthermore, new intelligent component application programming interfaces (APIs) have lowered the integration barrier between AI and database systems. This work systematically investigates intelligent databases through an innovative standardization-centric framework, delineating common processing paradigms across core research themes—interaction paradigms, management architectures, and kernel design. By examining standardized processes, interfaces, and collaboration mechanisms, it uncovers the core logic enabling database self-optimization, synthesizes current research advancements, and critically assesses persistent technical challenges and prospects for future development.
    Available online:  September 02, 2025 , DOI: 10.13328/j.cnki.jos.007509
    Abstract:
The CDCL algorithm for SAT solving is widely used in the field of hardware and software verification, with restart being one of its core components. Currently, mainstream CDCL solvers often employ the "warm restart" technique, which retains key search information such as variable order, assignment preferences, and learned clauses, and has a very high restart frequency. The warm restart technique tends to make CDCL solvers more inclined to revisit the search space that was explored before restarts, which may lead to being trapped in an unfavorable local search space for a long time and lacking exploration of other regions. This paper first tests the existing CDCL algorithms and confirms that under different initial search settings, the runtime of mainstream CDCL solvers on a given instance fluctuates significantly. To leverage this observation, the paper proposes the "cold restart" technique that forgets search information, specifically by periodically forgetting variable order, assignment preferences, and learned clauses. Experimental results demonstrate that this technique can effectively improve mainstream CDCL algorithms. Additionally, the paper extends the technique to a parallel version, where each thread explores different search spaces, enhancing the performance of the parallel algorithm. Moreover, the cold restart technique primarily improves the ability of sequential and parallel solvers to solve satisfiable instances, providing new insights for designing satisfiable-oriented solvers. Specifically, the parallel cold restart technique improves the PAR2 score of Pakis on satisfiable instances by 41.84%. The parallel SAT solver ParKissat-RS, which incorporates the ideas in this paper, won the parallel track of the SAT Competition by a significant margin, being 24% faster.
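The difference between warm and cold restarts can be sketched as follows (Python; the solver state is reduced to a dictionary and the restart period is an arbitrary illustrative value, not the schedule used in the paper):

```python
import random

def make_state(num_vars):
    """Minimal stand-in for the CDCL search state kept across warm restarts."""
    return {
        "activity": {v: 0.0 for v in range(num_vars)},    # variable ordering scores
        "phase":    {v: False for v in range(num_vars)},  # saved assignment preferences
        "learned":  [],                                   # learned clauses
    }

def restart(state, restarts_done, num_vars, cold_period=100, seed=None):
    """A warm restart keeps the search information; every `cold_period` restarts,
    a cold restart forgets it so the solver can explore a different region."""
    if restarts_done > 0 and restarts_done % cold_period == 0:
        rnd = random.Random(seed)
        state["activity"] = {v: rnd.random() for v in range(num_vars)}   # new variable order
        state["phase"] = {v: rnd.random() < 0.5 for v in range(num_vars)}
        state["learned"].clear()
    return state

state = make_state(50)
state["learned"].append([1, -2, 3])
for i in range(1, 201):
    state = restart(state, i, num_vars=50, seed=i)
print(len(state["learned"]))   # 0: the clauses were forgotten at the cold restarts
```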
    Available online:  August 27, 2025 , DOI: 10.13328/j.cnki.jos.007452
    Abstract:
    Blockchain, as a distributed ledger technology, ensures data security, transparency, and immutability through encryption and consensus mechanisms, offering transformative solutions across various industries. In China, blockchain-based software has attracted widespread attention and application, demonstrating considerable potential in fields such as cross-border payments, supply chain finance, and government services. These applications not only enhance the efficiency and transparency of business processes but also reduce trust costs and offer new approaches for the digital transformation of traditional industries. This study investigates the development trends and core technologies of Chinese blockchain software, focusing on key technological breakthroughs, promoting integration and innovation, and providing a foundation for the formulation of technical standards. The aim is to enhance the competitiveness of Chinese blockchain technologies, broaden application scenarios, and support the standardized development of the industry. Three core research questions are addressed: (1) What are the development trends of Chinese blockchain software? (2) What are the core technologies involved? (3) What are the differences in core technologies between Chinese and foreign blockchain software? To address these questions, 1268 blockchain software entries have been collected through three channels. Based on information regarding affiliated companies and chief technology officers (CTOs), 103 Chinese blockchain software entries are identified. A statistical analysis of basic software attributes is conducted, examining development trends from three perspectives: software development history, distribution, and interrelationships. Given the importance of technical and development documentation, 39 high-quality blockchain software entries containing detailed technical information are further selected. Subsequently, a statistical and analytical evaluation of the core technologies of these 39 software systems is conducted across six technical layers of blockchain architecture. Based on this analysis, differences in core technologies between Chinese and foreign blockchain software are compared. In total, 28 phenomena and 13 insights are identified. These findings provide researchers, developers, and practitioners with a comprehensive understanding of the current state of Chinese blockchain development and offer valuable references for future adoption and improvement of Chinese blockchain software.
    Available online:  August 27, 2025 , DOI: 10.13328/j.cnki.jos.007412
    Abstract:
    Semi-supervised semantic segmentation methods typically employ various data augmentation schemes to ensure differentiation in the input of network branches, enabling mutual self-supervision. While successful, this approach faces several issues: 1) insufficient diversity in feature extraction leads to feature signal assimilation during inference; 2) inadequate diversity in supervision signals results in the assimilation of loss learning. These issues cause network branches to converge on similar solutions, degrading the functionality of multi-branch networks. To address these issues, a cross semi-supervised semantic segmentation method based on differential feature extraction is proposed. First, a differential feature extraction strategy is employed, ensuring that branches focus on distinct information, such as texture, semantics, and shapes, thus reducing reliance on data augmentation. Second, a cross-fusion pseudo-labeling method is introduced, where branches alternately generate neighboring pixel fusion pseudo-labels, enhancing the diversity of supervision signals and guiding branches toward different solutions. Experimental results demonstrate this method achieves excellent performance on the Pascal VOC 2012 and Cityscapes validation datasets, with scores of 80.2% and 76.8%, outperforming the latest methods by 0.3% and 1.3%, respectively.
    Available online:  August 27, 2025 , DOI: 10.13328/j.cnki.jos.007414
    Abstract:
    GUI testing is one of the most important measures to ensure mobile application (App) quality. With the continuous development of the mobile ecosystem, especially the strong rise of the domestic mobile ecosystem, e.g., HarmonyOS, GUI test script recording and replay has become one of the prominent challenges in GUI testing. GUI test scripts must be migrated from traditional mobile platforms to emerging mobile platforms to ensure the reliability of App quality and consistency in user experience across diverse platforms. However, differences in underlying implementations across platforms have created substantial obstacles to the cross-platform migration of mobile App test scripts. This challenge is particularly pronounced in the testing migration for emerging domestic mobile ecosystem platforms. Cross-platform test script recording and replay is essential for maintaining consistency and a high-quality user experience across different platforms and devices. Current state-of-the-art approaches only address the “one-to-one” test event matching situations. However, due to inconsistencies in development practices across platforms, the replay of test events does not always map “one-to-one”; instead, “multiple-to-multiple” mapping situations are common. This means that some test events need to be mapped to a different number of test events to fulfill the same business logic. To address these issues and challenges, this study proposes a cross-platform mobile App test script recording and replay method based on large language model semantic matching (LLMRR). The LLMRR method integrates image matching, text matching, and large language model semantic matching technologies. During the recording phase, user operation information is captured using image segmentation algorithms and saved as recorded test scripts. During the replay phase, corresponding widgets on the replay App page are located using image matching and text matching modules to execute operations. When matching fails, the large language model semantic matching module is invoked for semantic matching, ensuring efficient operation across different platforms. This study presents the first exploration of testing for domestic HarmonyOS Apps, using 20 Apps and a total of 100 test scripts for migration testing across iOS, Android, and HarmonyOS platforms. The effectiveness of the LLMRR method is compared with the current state-of-the-art cross-platform test script recording and replay approaches, LIRAT and MAPIT. The results demonstrate that the LLMRR method exhibits significant advantages in test script recording and replay.
    Available online:  August 20, 2025 , DOI: 10.13328/j.cnki.jos.007446
    Abstract:
    Smart contracts, as automatically executed computer transaction protocols, are widely applied in blockchain networks to implement various types of business logic. However, the strict immutability of blockchain poses significant challenges for smart contract maintenance, making upgradeability a prominent research topic. This study focuses on upgradeable smart contracts, systematically reviewing their development status both domestically and internationally, and introducing seven mainstream upgradeable contract models. The research is summarized from four key perspectives: upgradeable smart contracts, application requirements, upgrade frameworks, and security oversight. It covers multiple stages, including design, implementation, testing, deployment, and maintenance. The goal is to provide insights and references for the further development of blockchain applications.
    Available online:  August 20, 2025 , DOI: 10.13328/j.cnki.jos.007448
    Abstract:
This study investigates meet-in-the-middle attacks on three types of unbalanced generalized Feistel structures and conducts quantum meet-in-the-middle attacks in the Q1 model. First, for the 3-branch Type-III generalized Feistel structure, a 4-round meet-in-the-middle distinguisher is constructed using multiset and differential enumeration techniques. By expanding one round forward and one round backward, a 6-round meet-in-the-middle attack is conducted. With the help of Grover’s algorithm and the quantum claw finding algorithm, a 6-round quantum key recovery attack is performed, requiring O(2^(3ℓ/2)·ℓ) quantum queries, where ℓ is the branch length of the generalized Feistel structure. Then, for the 3-branch Type-I structure, a 9-round distinguisher is similarly extended by one round in both directions to conduct an 11-round meet-in-the-middle attack and a quantum key recovery attack with time complexities of O(2^(2ℓ)) 11-round encryptions and O(2^(3ℓ/2)·ℓ) quantum queries. Finally, taking the 3-cell generalized Feistel structure as a representative case, this study explores a quantum meet-in-the-middle attack on an n-cell structure. A 2n-round meet-in-the-middle distinguisher is constructed, enabling a 2(n+1)-round meet-in-the-middle attack and quantum key recovery attack. The associated time complexities are O(2^(2ℓ)) 2(n+1)-round encryptions and O(2^(3ℓ/2)·ℓ) quantum queries. The results demonstrate that the time complexity in the Q1 model is significantly reduced compared with classical scenarios.
    Available online:  August 20, 2025 , DOI: 10.13328/j.cnki.jos.007449
    Abstract:
    Query optimization is a critical component in database systems, where execution costs are minimized by identifying the most efficient query execution plan. Traditional query optimizers typically rely on fixed rules or simple heuristic algorithms to refine or select candidate plans. However, with the growing complexity of relational schemas and queries in real-world applications, such optimizers struggle to meet the demands of modern applications. Learned query optimization algorithms integrate machine learning techniques into the optimization process. They capture features of query plans and complex schemas to assist traditional optimizers. These algorithms offer innovative and effective solutions in areas such as cost modeling, join optimization, plan generation, and query rewriting. This study reviews recent achievements and developments in four main categories of learned query optimization algorithms. Future research directions are also discussed, aiming to provide a comprehensive understanding of the current state of research and to support further investigation in this field.
    Available online:  August 20, 2025 , DOI: 10.13328/j.cnki.jos.007424
    Abstract:
Accurate workload forecasting is essential for effective cloud resource management. However, existing models typically employ fixed architectures to extract sequential features from different perspectives, which limits the flexibility of combining various model structures to further improve forecasting performance. To address this limitation, a novel ensemble framework, SAC-MWF, is proposed for multi-view workload forecasting based on the soft actor-critic (SAC) algorithm. A set of feature sequence construction methods is developed to generate multi-view feature sequences at low computational cost from historical windows, enabling the model to focus on workload patterns from different perspectives. Subsequently, a base prediction model and several feature prediction models are trained on historical windows and their corresponding feature sequences, respectively, to capture workload dynamics from different views. Finally, the SAC algorithm is employed to integrate these models to generate the final forecast. Experimental results on three datasets demonstrate that SAC-MWF performs well in terms of both effectiveness and computational efficiency.
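How multi-view feature sequences can be derived cheaply from a single historical window is sketched below (Python/NumPy; the concrete views shown, moving average, first difference, and coarse periodic sampling, are illustrative assumptions rather than the construction methods of the paper, and the SAC-based ensemble is not reproduced):

```python
import numpy as np

def build_views(window, ma=4, season=24):
    """Derive low-cost multi-view feature sequences from one historical window
    so that each downstream predictor can focus on a different workload pattern."""
    w = np.asarray(window, dtype=float)
    return {
        "raw": w,                                                 # original sequence
        "trend": np.convolve(w, np.ones(ma) / ma, mode="valid"),  # smoothed trend
        "delta": np.diff(w),                                      # short-term change
        "season": w[::season],                                    # coarse periodic sample
    }

window = np.sin(np.linspace(0, 8 * np.pi, 192)) + np.random.default_rng(0).normal(0, 0.1, 192)
for name, seq in build_views(window).items():
    print(name, seq.shape)
```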
    Available online:  August 20, 2025 , DOI: 10.13328/j.cnki.jos.007425
    Abstract:
In recent years, pre-trained models that take code as input have achieved significant performance gains in various critical code-based tasks. However, these models remain susceptible to adversarial attacks implemented through semantic-preserving code transformations, which can severely compromise model robustness and pose serious security issues. Although adversarial training, leveraging adversarial examples as augmented data, has been employed to enhance robustness, its effectiveness and efficiency often fall short when facing unseen attacks with varying granularities and strategies. To address these limitations, a novel adversarial defense technique based on code normalization, named CoDefense, is proposed. This method integrates a multi-granularity code normalization approach as a preprocessing module, which normalizes both the original training data during training and the input code during inference. By doing so, the proposed method mitigates the impact of potential adversarial examples and effectively defends against attacks of diverse types and granularities. To evaluate the effectiveness and efficiency of CoDefense, a comprehensive experimental study is conducted, encompassing 27 scenarios across three representative adversarial attack methods, three widely used pre-trained code models, and three code-based classification and generation tasks. Experimental results demonstrate that CoDefense significantly outperforms state-of-the-art adversarial training methods in both robustness and efficiency. Specifically, it achieves an average defense success rate of 95.33% against adversarial attacks and improves time efficiency by an average of 85.86%.
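One plausible instance of normalization-as-preprocessing is sketched below (Python; the actual CoDefense normalization rules and granularities are not specified in the abstract, so this sketch only canonicalizes identifiers and drops comments at the token level):

```python
import io
import keyword
import tokenize

def normalize(code: str) -> str:
    """Token-level normalization: rename identifiers to canonical names
    (var_0, var_1, ...) and drop comments, so identifier-renaming adversarial
    edits map back to the same normalized input."""
    mapping, out = {}, []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append((tok.type, mapping.setdefault(tok.string, f"var_{len(mapping)}")))
        elif tok.type in (tokenize.COMMENT, tokenize.NL):
            continue
        else:
            out.append((tok.type, tok.string))
    return tokenize.untokenize(out)

original    = "def add(total_sum, x):\n    # running total\n    return total_sum + x\n"
adversarial = "def add(tmp_acc, y):\n    return tmp_acc + y\n"
print(normalize(original) == normalize(adversarial))   # True: the rename is neutralized
```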
    Available online:  August 13, 2025 , DOI: 10.13328/j.cnki.jos.007444
    Abstract:
    The evolution of RFID-based passive Internet of Things (IoT) systems comprises three stages: traditional UHF RFID (also referred to as standalone or Passive 1.0), local area network-based coverage (networked or Passive 2.0), and wide-area cellular coverage (cellular or Passive 3.0). Wireless sensing in passive IoT is characterized by zero power consumption, low cost, and ease of deployment, enabling object tagging and close-proximity sensing. With the emergence of cellular passive IoT, passive IoT wireless sensing is playing an increasingly important role in enabling ubiquitous sensing within IoT systems. This study first introduces the concept and development path of passive IoT. Based on fundamental sensing principles, recent research advancements are reviewed across four representative objectives: localization and tracking, object status detection, human behavior recognition, and vital sign monitoring. Given that most existing research relies on commercial UHF RFID devices to extract signal features for data processing, the development direction of passive IoT wireless sensing technology is further examined from the perspectives of new architecture, new air interface, and new capabilities. Moreover, this study offers reflections on the integration of communication and sensing in the design of next-generation air interfaces from a sensing-oriented perspective, aiming to provide new insights into the advancements in passive IoT wireless sensing technologies.
    Available online:  August 13, 2025 , DOI: 10.13328/j.cnki.jos.007445
    Abstract:
As concerns over data privacy continue to grow, secure multi-party computation (MPC) has gained considerable research attention due to its ability to protect sensitive information. However, the communication and memory demands of MPC protocols limit their performance in privacy-preserving machine learning (PPML). Reducing interaction rounds and memory overhead in secure computation protocols remains both essential and challenging, particularly in GPU-accelerated environments. This study focuses on the design and implementation of GPU-friendly protocols for linear and nonlinear computations. To eliminate the overhead associated with integer operations, 64-bit integer matrix multiplication and convolution are implemented using CUDA extensions in PyTorch. A most significant bit (MSB) extraction protocol with low communication rounds is proposed, based on 0-1 encoding. In addition, a low-communication-complexity hybrid multiplication protocol is introduced to reduce the communication overhead of secure comparison, enabling efficient computation of ReLU activation layers. Finally, Antelope, a GPU-based 3-party framework, is proposed to support efficient privacy-preserving machine learning. This framework significantly reduces the performance gap between secure and plaintext computation and supports end-to-end training of deep neural networks. Experimental results demonstrate that the proposed framework achieves 29×–101× speedup in training and 1.6×–35× in inference compared to the widely used CPU-based FALCON (PoPETs 2020). When compared with GPU-based approaches, training performance reaches 2.5×–3× that of CryptGPU (S&P 2021) and 1.2×–1.6× that of Piranha (USENIX Security 2022), while inference is accelerated by factors of 11× and 2.8×, respectively. Notably, the proposed secure comparison protocol exhibits significant advantages when processing small input sizes.
    Available online:  August 01, 2025 , DOI: 10.13328/j.cnki.jos.007398
    Abstract:
Since currently popular deep learning models are often affected by the notorious phenomenon known as distribution shift, domain adaptation has been proposed to enhance the generalization of these models, transferring knowledge from labeled source data to unlabeled target data. Existing methods for domain adaptation primarily focus on computer vision tasks, leading to the application of models devised for image data to time series data to address the domain adaptation problem for time series data. Although these methods mitigate distribution shift to some extent, they struggle to effectively extract disentangled domain-invariant representations for time series data, resulting in suboptimal performance. To address this issue, a disentangled invariant and variant latent variable model for time series domain adaptation (DIVV) is proposed. Specifically, a causal generation process for time series data is introduced, where the latent variables are partitioned into domain-specific and domain-invariant latent variables. Based on this data generation process, the identifiability of domain-specific latent variables is established. The DIVV model, built on this identification theory, disentangles domain-specific and domain-invariant latent variables using variational inference and an orthogonal basis alignment module. Finally, the DIVV model leverages domain-invariant representations for time series classification. Experimental results demonstrate that the DIVV model outperforms existing domain adaptation methods for time series data across various benchmark datasets, highlighting its effectiveness in real-world applications.
    Available online:  July 30, 2025 , DOI: 10.13328/j.cnki.jos.007436
    Abstract:
    Test case prioritization (TCP) has gained significant attention due to its potential to reduce testing costs. Greedy algorithms based on various prioritization strategies are commonly used in TCP. However, most existing greedy algorithm-based TCP techniques rely on a single prioritization strategy and process all test cases simultaneously during each iteration, without considering the relationships between test cases. This results in excessive computational overhead when handling coverage information and performing prioritization, thus reducing overall efficiency. Among single-strategy approaches, the Additional strategy has been extensively studied but remains highly sensitive to random factors. When a tie occurs, test cases are typically selected at random, compromising prioritization effectiveness. To address these issues, a test case prioritization approach based on two-phase grouping (TPG-TCP) is proposed. In the first phase, coarse-grained grouping is conducted by mining hidden relationships among test cases, thus dividing them into a key group and an ordinary group. This lays the groundwork for applying diversity-based strategies in the next phase to enhance prioritization efficiency. In the second phase, fine-grained prioritization of test cases is performed. Key test cases are further subdivided based on the number of iterations. To mitigate the randomness inherent in the Additional strategy, a TP-Additional strategy based on test case potency is introduced to prioritize a portion of the key test cases. Meanwhile, a simple and efficient Total strategy is applied to prioritize the ordinary test cases and remaining key test cases. The results from the Total strategy are appended to those produced by the TP-Additional strategy. This method improves both the effectiveness and efficiency of test case prioritization. Experimental results on six datasets, compared with eight existing methods, demonstrate that the proposed method achieves average improvements of 1.29% in APFD and 9.54% in TETC.
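The two classic greedy strategies that the method builds on can be contrasted with a small sketch (Python, toy statement-coverage data; the TP-Additional strategy, the potency measure, and the two-phase grouping itself are not reproduced here):

```python
def total_strategy(coverage):
    """Total strategy: order test cases by how many statements each covers."""
    return sorted(coverage, key=lambda t: -len(coverage[t]))

def additional_strategy(coverage):
    """Additional strategy: repeatedly pick the test covering the most
    not-yet-covered statements, resetting coverage when nothing new is left."""
    remaining, covered, order = dict(coverage), set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            covered = set()                  # reset and continue with the rest
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order

coverage = {                  # test case -> set of covered statement ids
    "t1": {1, 2, 3, 4},
    "t2": {3, 4, 5},
    "t3": {6, 7},
    "t4": {1, 2},
}
print(total_strategy(coverage))       # ['t1', 't2', 't3', 't4']
print(additional_strategy(coverage))  # ['t1', 't3', 't2', 't4']
```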
    Available online:  July 30, 2025 , DOI: 10.13328/j.cnki.jos.007437
    Abstract:
With the rapid development of lattice-based post-quantum cryptography, algorithms for hard problems in lattices have become an essential tool for evaluating the security of post-quantum cryptographic schemes. Algorithms such as enumeration, sieve, and lattice basis reduction have been developed under the classical computing model, while quantum algorithms for solving hard problems in lattices, such as quantum sieve and quantum enumeration, are gradually attracting attention. Although lattice problems possess post-quantum properties, techniques such as quantum search can accelerate a range of lattice algorithms. Given the challenges involved in solving hard problems in lattices, this study first summarizes and analyzes the research status of quantum algorithms for such problems and organizes their design principles. Then, the quantum computing techniques applied in these algorithms are introduced, followed by an analysis and comparison of their computational complexities. Finally, potential future developments and research directions for quantum algorithms addressing lattice-based hard problems are discussed.
    Available online:  July 30, 2025 , DOI: 10.13328/j.cnki.jos.007422
    Abstract:
    Segment routing over IPv6 (SRv6), as a key enabling technology for the next-generation network architecture, introduces a flexible segment routing forwarding plane, offering revolutionary opportunities to enhance network intelligence and expand service capabilities. This study aims to provide a comprehensive review of the evolution and research status of SRv6 in recent years. First, the study systematically summarizes the applications of SRv6 in network architecture and performance, network management and operation, and emerging service support, highlighting the unique advantages of SRv6 in fine-grained scheduling, flexible programming, and service convergence. Meanwhile, the study deeply analyzes the key challenges SRv6 faces in performance and efficiency, reliability and security, and deployment and evolution strategies, and focuses on discussing the current mainstream solutions and development trends. Finally, from the perspectives of industrial ecosystem construction, artificial intelligence integration, and industry convergence innovation, the study provides forward-looking thoughts and prospects on the future development directions and challenges of SRv6. The research findings of this study will provide theoretical references and practical guidance for operators in building open, intelligent, and secure next-generation networks.
    Available online:  July 30, 2025 , DOI: 10.13328/j.cnki.jos.007423
    Abstract:
    With the development of information technology, the interaction between information networks, human society, and physical space deepens, and the phenomenon of information space risk overflow becomes more severe. Fraudulent incidents have sharply increased, making fraud detection an important research field. Fraudulent behavior has brought numerous negative impacts to society, gradually presenting emerging characteristics such as intelligence, industrialization, and high concealment. Traditional expert rules and deep graph neural network algorithms are becoming increasingly limited in addressing fraudulent activities. Current fraud detection methods often rely on local information from the nodes themselves and neighboring nodes, either focusing on individual users, analyzing the relationship between nodes and graph topology, or utilizing graph embedding technology to learn node representations. Although these approaches offer certain fraud detection capabilities, they overlook the crucial role of long-range association patterns of entities and fail to explore common patterns among massive fraudulent paths, limiting comprehensive fraud detection capabilities. In response to the limitations of existing fraud detection methods, this study proposes a graph fraud detection model called path aggregation graph neural network (PA-GNN), based on path aggregation. The model includes variable-length path sampling, position-related unified path encoding, path interaction and aggregation, and aggregation-related fraud detection. Several paths originating from a node interact globally and compare their similarities, extracting common patterns among fraudulent paths, thus more comprehensively revealing the association patterns between fraudulent behaviors, and achieving fraud detection through path aggregation. Experimental results across multiple datasets in fraud scenarios, including financial transactions, social networks, and review networks, show that the area under the curve (AUC) and average precision (AP) metrics of the proposed method have significantly improved compared to the optimal benchmark models. In addition, the proposed method uncovers potential common fraudulent path patterns for fraud detection tasks, driving nodes to learn these important patterns and obtain more expressive representations, which offers a certain level of interpretability.
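The path-centric view can be illustrated with a minimal sketch (Python/NumPy; random-walk sampling and a similarity-weighted mean stand in for the paper's path encoding, interaction, and aggregation modules, which are not reproduced here):

```python
import random
import numpy as np

def sample_paths(adj, start, num_paths=4, max_len=5, seed=0):
    """Sample several variable-length random-walk paths starting from one node."""
    rnd = random.Random(seed)
    paths = []
    for _ in range(num_paths):
        path, node = [start], start
        for _ in range(rnd.randint(2, max_len)):
            nbrs = adj.get(node, [])
            if not nbrs:
                break
            node = rnd.choice(nbrs)
            path.append(node)
        paths.append(path)
    return paths

def aggregate(paths, node_emb):
    """Mean-pool each path, then weight paths by their average pairwise
    similarity so that patterns shared by several paths dominate the result."""
    reps = np.stack([node_emb[p].mean(axis=0) for p in paths])
    sim = reps @ reps.T                      # pairwise path similarity
    weights = np.exp(sim.mean(axis=1))
    weights /= weights.sum()
    return weights @ reps

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
node_emb = np.random.default_rng(0).normal(size=(5, 8))
paths = sample_paths(adj, start=0)
print(paths)
print(aggregate(paths, node_emb).shape)   # (8,)
```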
    Available online:  July 30, 2025 , DOI: 10.13328/j.cnki.jos.007434
    Abstract:
In a (t, N) threshold multi-party private set intersection (TMP-PSI) protocol, a data element x held by a given party is output as part of the intersection only if it appears in the private sets of no fewer than t–1 other parties. TMP-PSI is widely applied in scenarios such as proposal voting, financial transaction threat identification, and security assessment. Existing threshold multi-party private set intersection protocols suffer from low efficiency, a high number of communication rounds, and a limitation that only a specific participant can obtain the intersection. To address these issues, this study proposes a threshold testing method based on robust secret sharing (RSS) and a TMP-PSI scheme combined with oblivious key-value store (OKVS), which effectively reduces both computational overhead and the number of communication rounds. To meet the demand for multiple participants to access the intersection information from their private sets, this study also proposes a second, extended threshold multi-party private set intersection (ETMP-PSI) protocol, which modifies the share distribution method. Compared to the first scheme, the secret distributor and secret reconstructor do not incur additional communication rounds or computational complexity, allowing multiple participants to obtain the intersection elements from their private sets. The proposed protocol runs in 6.4 seconds (TMP-PSI) and 8.7 seconds (ETMP-PSI) in a three-party scenario with a dataset size of n = 2^16. Compared to existing threshold multi-party private set intersection protocols, the communication complexity between the reconstructor and distributor is reduced from O(nNtlog) to O(bNλ).
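The threshold-test idea (not the paper's construction, which relies on robust secret sharing and an OKVS and hides which parties hold the element) can be pictured with a toy Shamir-sharing sketch (Python; all parameters and the reveal-if-held rule are illustrative assumptions):

```python
import random

P = 2**31 - 1   # prime modulus for the toy Shamir sharing

def share(secret, t, n, rnd):
    """Split `secret` into n Shamir shares; any t of them reconstruct it."""
    coeffs = [secret] + [rnd.randrange(P) for _ in range(t - 1)]
    return {i: sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P
            for i in range(1, n + 1)}

def reconstruct(shares):
    """Lagrange interpolation at x = 0."""
    secret = 0
    for i, yi in shares.items():
        num = den = 1
        for j in shares:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

def threshold_test(element, party_sets, t, rnd):
    """Toy threshold test: a random secret is shared among the other parties,
    each of whom reveals its share only if it also holds `element`; the element
    belongs to the (t, N)-threshold intersection iff reconstruction succeeds."""
    secret = rnd.randrange(P)
    shares = share(secret, t - 1, len(party_sets), rnd)
    revealed = {i: shares[i] for i, s in enumerate(party_sets, start=1) if element in s}
    return len(revealed) >= t - 1 and reconstruct(dict(list(revealed.items())[:t - 1])) == secret

rnd = random.Random(0)
party_sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]      # the other parties
print(threshold_test(3, party_sets, t=3, rnd=rnd))   # True: 3 is held by >= t-1 = 2 parties
print(threshold_test(5, party_sets, t=3, rnd=rnd))   # False: 5 is held by only 1 party
```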
    Available online:  July 23, 2025 , DOI: 10.13328/j.cnki.jos.007432
    Abstract:
    Edge servers provide low-latency, high-performance services for mobile intelligent applications. However, due to significant fluctuations in the load on edge servers over time, many edge servers remain idle during periods of low load, and their computational resources are not fully utilized. In contrast to the underutilization of edge servers, computing resources in cloud computing clusters remain relatively scarce for deep learning training tasks as artificial intelligence becomes more widely applied in daily life. Existing cluster scheduling strategies fail to efficiently utilize idle computing resources outside of cloud computing clusters. Effectively utilizing these idle resources can alleviate the resource constraints in cloud computing clusters, thus enabling more deadline-sensitive deep learning training tasks to be completed before their deadlines. To address this issue, this study proposes a cluster scheduling strategy for deadline-sensitive deep learning training tasks, which coordinates the scheduling of cloud computing resources and idle edge computing resources. This strategy fully leverages the performance characteristics of different deep learning tasks and the availability of idle edge server devices, allowing more deadline-sensitive tasks to be completed on time. Simulation results demonstrate that the cloud-edge collaborative scheduling method outperforms other benchmark methods in improving the deadline satisfaction ratio and effectively utilizes idle edge server devices.
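An earliest-deadline-first assignment that spills to idle edge servers is one simple way to picture the problem (Python; the completion-time estimate, per-task edge slowdown, and slot counts below are illustrative assumptions, not the strategy proposed in the paper):

```python
def schedule(tasks, cloud_slots, edge_slots, now=0):
    """Assign deadline-sensitive training jobs to cloud GPUs first, spill to
    idle edge servers when that still meets the deadline, and reject otherwise;
    jobs are taken in earliest-deadline-first order."""
    plan = []
    for task in sorted(tasks, key=lambda t: t["deadline"]):
        cloud_done = now + task["runtime"]
        edge_done = now + task["runtime"] * task["edge_slowdown"]   # edge is slower
        if cloud_slots > 0 and cloud_done <= task["deadline"]:
            cloud_slots -= 1
            plan.append((task["name"], "cloud"))
        elif edge_slots > 0 and edge_done <= task["deadline"]:
            edge_slots -= 1
            plan.append((task["name"], "edge"))
        else:
            plan.append((task["name"], "rejected"))                 # deadline cannot be met
    return plan

tasks = [
    {"name": "jobA", "runtime": 4, "deadline": 6,  "edge_slowdown": 2.0},
    {"name": "jobB", "runtime": 3, "deadline": 10, "edge_slowdown": 1.5},
    {"name": "jobC", "runtime": 5, "deadline": 7,  "edge_slowdown": 3.0},
]
print(schedule(tasks, cloud_slots=1, edge_slots=1))
# [('jobA', 'cloud'), ('jobC', 'rejected'), ('jobB', 'edge')]
```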
    Available online:  July 23, 2025 , DOI: 10.13328/j.cnki.jos.007433
    Abstract:
    To perform fine-grained vulnerability detection, an ideal model must determine whether software contains vulnerabilities and identify the type of vulnerability (i.e., perform vulnerability classification). A series of deep learning models have demonstrated strong overall performance in vulnerability classification tasks. However, a severe data imbalance exists across different vulnerability types. Many vulnerability types are represented by only a small number of samples (referred to as few-shot types in this study), resulting in poor classification performance and generalization for these few-shot types. To enhance classification performance for these types, VulFewShot is proposed. This contrastive learning-based vulnerability classification framework assigns more weight to few-shot types by bringing samples of the same type closer together while keeping samples from different types further apart. Experimental results show that VulFewShot improves classification performance across all vulnerability types. The smaller the number of samples for a given type, the more significant the improvement. Therefore, VulFewShot improves classification performance for vulnerabilities with limited samples and mitigates the impact of sample size on the learning process.
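The weighting idea can be sketched as a supervised contrastive loss whose anchors are scaled inversely to class frequency (Python/NumPy; the exact VulFewShot loss, encoder, and weighting scheme are not given in the abstract, so this is an illustrative assumption):

```python
import numpy as np

def weighted_supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss with anchors weighted inversely to class
    frequency: same-type samples are pulled together, other types pushed apart,
    and few-shot vulnerability types contribute more to the objective."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    counts = {c: int((labels == c).sum()) for c in set(labels)}
    loss, total_w = 0.0, 0.0
    for i, y in enumerate(labels):
        pos = (labels == y) & (np.arange(len(labels)) != i)
        if not pos.any():
            continue
        w = 1.0 / counts[y]                             # few-shot classes weigh more
        loss += -w * log_prob[i, pos].mean()
        total_w += w
    return loss / total_w

rng = np.random.default_rng(0)
features = rng.normal(size=(10, 32))
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])       # class 2 is a few-shot type
print(weighted_supcon_loss(features, labels))
```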
    Available online:  July 23, 2025 , DOI: 10.13328/j.cnki.jos.007421
    Abstract:
Intelligent question answering (QA) systems utilize information retrieval and natural language processing techniques to deliver automated responses to user inquiries. Like other artificial intelligence software, intelligent QA systems are prone to bugs. These bugs can degrade user experience, cause financial losses, or even trigger social panic. Therefore, it is crucial to detect and fix bugs in intelligent QA systems promptly. Automated testing approaches fall into two categories. The first approach synthesizes hypothetical facts based on questions and predicted answers, then generates new questions and expected answers to detect bugs. The second approach generates semantically equivalent test inputs by injecting knowledge from existing datasets, ensuring the answer to the question remains unchanged. However, both methods have limitations in practical use. They rely heavily on the intelligent QA system’s output or training set, which results in poor testing effectiveness and generalization, especially for large-language-model-based intelligent QA systems. Moreover, these methods primarily assess semantic understanding while neglecting the logical reasoning capabilities of intelligent QA systems. To address this gap, a logic-guided testing technique named QALT is proposed. It designs three logically related metamorphic relations and uses semantic similarity measurement and dependency parsing to generate high-quality test cases. The experimental results show that QALT detected a total of 9247 bugs in two different intelligent QA systems, which is 3150 and 3897 more bugs than the two current state-of-the-art techniques (i.e., QAQA and QAAskeR), respectively. Based on the statistical analysis of manually labeled results, QALT detects approximately 8073 true bugs, which is 2142 more than QAQA and 4867 more than QAAskeR. Moreover, the test inputs generated by QALT successfully reduce the MR violation rate from 22.33% to 14.37% when used for fine-tuning the intelligent QA system under test.
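The general metamorphic-testing pattern behind such techniques can be sketched as follows (Python; the QA function and the confirmation-style relation are hypothetical stand-ins, not the three metamorphic relations designed in QALT):

```python
def qa_system(question: str) -> str:
    """Dummy stand-in for the intelligent QA system under test."""
    knowledge = {"What is the capital of France?": "Paris",
                 "Is Paris the capital of France?": "yes"}
    return knowledge.get(question, "unknown")

def check_confirmation_mr(question: str) -> bool:
    """Hypothetical logic-guided relation: if the system answers Q with A, then
    restating A as a yes/no question about Q must be answered 'yes'.
    Returns True when the relation holds and False when a bug is reported."""
    answer = qa_system(question)
    follow_up = f"Is {answer} the {question[len('What is the '):]}"
    return qa_system(follow_up) == "yes"

print(check_confirmation_mr("What is the capital of France?"))   # True: relation holds
```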
    Available online:  July 17, 2025 , DOI: 10.13328/j.cnki.jos.007405
    Abstract:
    The tuning of database system parameters directly impacts its performance and the utilization of system resources. Relational database management systems typically offer hundreds of parameters that can be adjusted to achieve optimal performance and service capabilities. Database system performance optimization is traditionally carried out manually by experienced database administrators (DBAs). However, due to the characteristics of parameter tuning, such as the large number of parameters, their heterogeneity, and the complex correlations among them, traditional manual methods are inefficient, costly, and lack reusability. To enhance the efficiency of database system performance optimization, automated parameter tuning techniques have become a key focus in the database field. Reinforcement learning, with its ability to interact with the system environment and gradually improve through feedback, has been widely applied in the optimization of complex systems. Some related studies have applied reinforcement learning or its variants to database parameter tuning, but they have relied on single-objective optimization methods. Database system parameter tuning is a multi-objective optimization task, usually performed under resource constraints. Therefore, existing methods have several limitations: (1) transforming the multi-objective optimization problem into a single-objective optimization problem through simple linear transformations requires iterative attempts, making optimizations costly; (2) existing methods cannot adapt to the dynamic changes in database system requirements, limiting their adaptability; (3) reinforcement learning methods used in existing studies are designed for single-objective optimization, and their applications to multi-objective tasks make it difficult to effectively align preferences (the weight coefficients of current objectives) with corresponding optimal strategies, potentially leading to suboptimal solutions; (4) existing research primarily focuses on optimizing throughput and latency, while ignoring resource utilization such as memory. To address these issues, this study proposes a multi-objective deep deterministic policy gradient-based reinforcement learning algorithm (MODDPG). This method is a native multi-objective reinforcement learning approach that does not require transforming the multi-objective task of database system parameters tuning into a single-objective task, enabling it to efficiently adapt to dynamic changes in database system requirements. By improving the reward mechanism of the reinforcement learning algorithm, the alignment between preferences and optimal strategies can be quickly achieved, effectively avoiding suboptimal solutions. Consequently, the training process of the reinforcement learning model can be accelerated, and the efficiency of database system parameter tuning can be improved. To further validate the generality of the proposed method, the multi-objective optimization approach is extended to achieve a collaborative optimization goal of improving both database performance and resource utilization. Experiments using TPC-C and SYSBench benchmarks demonstrate the effectiveness and practicality of the proposed parameter tuning method. The results show significant advantages in terms of model training efficiency and the effectiveness of database parameter tuning.
    Available online:  July 17, 2025 , DOI: 10.13328/j.cnki.jos.007411
    Abstract:
    Non-uniform memory access (NUMA) is the mainstream memory access architecture for state-of-the-art multicore and multi-way processor platforms. Reducing the latency of cross-NUMA-node accesses during queries is a key issue for modern in-memory database query optimization. Due to the differences in NUMA architectures and NUMA latency across processors, NUMA optimization techniques should be combined with hardware characteristics. This study focuses on the in-memory foreign key join algorithm, which has high cost and strong locality of data dependency in in-memory databases, and explores different NUMA optimization techniques, including NUMA-conscious and NUMA-oblivious implementations, on five platforms featuring ARM, Intel CLX/ICX, and AMD Zen2/Zen3 processors. The study also compares the performance of the algorithms across processor platforms under strategies such as data storage, data partitioning, and caching of join intermediate results. Experimental results show that a NUMA-conscious optimization strategy requires the integration of both software and hardware. Radix Join demonstrates neutral sensitivity to NUMA latency, with NUMA optimization gains consistently around 30%. The NPO algorithm shows higher sensitivity to NUMA latency, with NUMA optimization gains ranging from 38% to 57%. The Vector Join algorithm is sensitive to NUMA latency, but the impact is relatively minor, with NUMA optimization gains varying from 1% to 25%. In terms of algorithm performance characteristics, cache efficiency influences Vector Join performance more than NUMA latency does. NUMA-conscious optimization techniques show significant differences on ARM platforms, while the differences are minimal on x86 platforms. The less complex NUMA-oblivious algorithms exhibit greater generality. Given hardware trends, reducing NUMA latency can effectively narrow the performance gaps among NUMA-conscious optimization techniques, simplify join algorithm complexity, and improve join operation performance.
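    The radix-partitioning step that underlies a NUMA-conscious hash join can be sketched as follows (a simplification: the actual algorithms also bind each partition to a NUMA node and may use multiple passes):
        # Sketch: split tuples by the low bits of the join key so that each
        # partition can later be processed on, and bound to, a single NUMA node.
        def radix_partition(tuples, radix_bits=6):
            fanout = 1 << radix_bits
            partitions = [[] for _ in range(fanout)]
            for key, payload in tuples:
                partitions[key & (fanout - 1)].append((key, payload))
            return partitions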
    Available online:  July 17, 2025 , DOI: 10.13328/j.cnki.jos.007431
    Abstract:
    With the widespread promotion of smart mobility, increasing attention has been paid to the application of vehicular ad hoc networks (VANETs) in data collection. However, due to the high-speed movement of vehicles and the unpredictability of their trajectories, traditional position-based greedy forwarding strategies struggle to meet the data transmission demands of highly dynamic VANETs. To address this issue, an intelligent routing algorithm driven by historical traffic data for VANET (HTD-IR) is proposed. First, an optimal forwarding table for path selection is obtained through an offline learning method based on historical traffic flow information. Then, using an online V2V transmission mechanism based on Markov prediction, the next reliable relay vehicle is selected according to the vehicle’s motion state. Finally, this study compares HTD-IR with other routing protocols in simulations. The results demonstrate that HTD-IR outperforms the compared protocols in terms of packet delivery ratio, average end-to-end delay, network yield, average successful packet transmission cost, and online computation time complexity.
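    A minimal sketch of the offline step implied by the abstract is given below, assuming trajectories are sequences of road-segment identifiers; the actual HTD-IR forwarding-table construction is more involved.
        from collections import Counter, defaultdict

        # Sketch: estimate a first-order Markov transition table from historical
        # trajectories, then predict the most likely next segment of a candidate
        # relay vehicle.
        def build_transition_table(trajectories):
            counts = defaultdict(Counter)
            for path in trajectories:                      # path = [segment ids]
                for current, nxt in zip(path, path[1:]):
                    counts[current][nxt] += 1
            return {seg: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
                    for seg, ctr in counts.items()}

        def most_likely_next(table, segment):
            options = table.get(segment, {})
            return max(options, key=options.get) if options else None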
    Available online:  July 17, 2025 , DOI: 10.13328/j.cnki.jos.007429
    Abstract:
    The performance and operational characteristics of the domain name system (DNS) protocol continue to attract significant attention from both the research community and network operators. In this study, data collected from a large-scale DNS recursive service is measured and analyzed to examine user access patterns and resolution behavior from the perspective of a major DNS operator. To handle the massive volume of DNS data, this study proposes a distributed parallel measurement mechanism and a big data-based storage and monitoring solution, enabling efficient processing and analysis. The characteristics of DNS data are systematically examined across several dimensions, including user request response rates, domain name request patterns, user distribution, and resolution outcomes. Several valuable insights are presented, offering meaningful guidance for DNS operation optimization and improved understanding of DNS behavior. Finally, based on the analysis of DNS cache hit rates, this study proposes a general framework for online anomaly detection tailored to large-scale DNS operators. The correctness and feasibility of the proposed framework are preliminarily verified.
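    A minimal sketch of cache-hit-rate anomaly detection is shown below; the rolling window and threshold are illustrative assumptions, and the paper's framework is more general.
        from collections import deque
        import statistics

        # Sketch: flag a measurement interval whose cache hit rate deviates from
        # the recent rolling window by more than k standard deviations.
        def make_detector(window=48, k=3.0):
            history = deque(maxlen=window)
            def observe(hit_rate):
                anomalous = False
                if len(history) >= 10:
                    mean = statistics.mean(history)
                    std = statistics.pstdev(history)
                    anomalous = std > 0 and abs(hit_rate - mean) > k * std
                history.append(hit_rate)
                return anomalous
            return observe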
    Available online:  July 09, 2025 , DOI: 10.13328/j.cnki.jos.007430
    Abstract:
    In the field of time series data analysis, cross-domain data distribution shifts significantly weaken model generalization performance. To address this, an end-to-end time series domain adaptation framework, called TPN, is developed. This framework creatively integrates a temporal pattern activation module (TPAM) with a Transformer encoder. TPAM captures spatial and temporal dependencies of sequence features through dual-layer spatio-temporal convolution operations, combines Sigmoid and Tanh activation functions for the non-linear fusion of extracted features, and restores the original channel dimensions via linear projection, thus enhancing the model’s ability to extract temporal features. TPN also introduces an enhanced adversarial paradigm (EAP), which strengthens generator-discriminator-based collaborative adversarial learning through domain classification loss and operation order prediction loss. This effectively reduces data distribution discrepancies between source and target domains, improving the model’s domain adaptability. Empirical results on three public human activity recognition datasets (Opportunity, WISDM, and HHAR) demonstrate that TPN improves accuracy and F1 by up to 6% compared to existing methods, with fewer parameters and shorter runtime. In-depth ablation and visualization experiments further validate the effectiveness of TPAM and EAP, showing TPN’s strong performance in feature extraction and domain alignment.
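    The gated fusion described for TPAM can be sketched as follows; the layer sizes and the single-convolution form are assumptions, not the paper's exact dual-layer spatio-temporal design.
        import torch
        import torch.nn as nn

        # Sketch: a tanh content path and a sigmoid gate path are fused by
        # element-wise product, then projected back to the original channels.
        class GatedTemporalBlock(nn.Module):
            def __init__(self, channels, hidden, kernel_size=3):
                super().__init__()
                pad = kernel_size // 2
                self.content = nn.Conv1d(channels, hidden, kernel_size, padding=pad)
                self.gate = nn.Conv1d(channels, hidden, kernel_size, padding=pad)
                self.proj = nn.Conv1d(hidden, channels, kernel_size=1)

            def forward(self, x):                  # x: (batch, channels, time)
                fused = torch.tanh(self.content(x)) * torch.sigmoid(self.gate(x))
                return self.proj(fused)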
    Available online:  July 09, 2025 , DOI: 10.13328/j.cnki.jos.007427
    Abstract:
    Blockchain, also known as a distributed ledger, is a prominent example of next-generation information technology. It has been widely applied in various fields, including finance, healthcare, energy, and government affairs. Regulatable privacy protection technologies in blockchain not only safeguard users’ privacy and enhance trust but also prevent the misuse of blockchain for illegal activities, ensuring compliance with regulations. Current privacy protection schemes for regulatable blockchains are typically based on bilinear pairing, which exhibits relatively low computational efficiency and fails to meet the demands of high-concurrency scenarios. To address these issues, this study proposes an efficient regulatable identity privacy protection scheme for blockchain. By designing a zero-knowledge proof that verifies the consistency of the receiver’s identity without bilinear pairing, along with a traceable ring signature scheme, this approach effectively protects the identity privacy of both parties in a transaction while preserving the effectiveness of supervision. The experimental results indicate that when the number of ring members is set to 16, as required by Monero, the execution time of every algorithm in the scheme is within 5 milliseconds. Compared with similar schemes, efficiency is improved by more than 14 times and the message length is reduced to 50% of that of the original scheme, demonstrating enhanced computational efficiency and a shorter message length.
    Available online:  July 09, 2025 , DOI: 10.13328/j.cnki.jos.007428
    Abstract:
    Attribute-based searchable encryption (ABSE) enables secure and fine-grained sharing of encrypted data in multi-user environments. However, it typically faces challenges such as high computational overhead for encryption and decryption, limited query efficiency, and the inability to update indexes dynamically. To address these limitations, this study proposes an efficient ABSE-based searchable scheme that supports dynamic index updates. The reuse of identical access policies minimizes redundant computation during encryption. Most decryption operations are securely outsourced to the cloud, reducing the computational load on local devices. An inverted index structure supporting multi-keyword Boolean retrieval is constructed by integrating hash tables with skip lists. BLS short signature technology is employed to verify the permissions for index updates, ensuring that data owners can manage the retrieval of their encrypted data. Formal security analysis confirms that the proposed scheme effectively defends against collusion attacks, chosen plaintext attacks, forged update tokens, and decryption key forgery. Experimental results demonstrate high efficiency in both retrieval and index update operations, along with a significant reduction in encryption overhead when access policies are reused.
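    The shape of the index can be sketched as follows, with a plain sorted list standing in for the skip list and encryption omitted; this illustrates only the retrieval structure, not the proposed scheme.
        from bisect import insort

        # Sketch: a hash table maps each keyword to an ordered posting structure,
        # and a Boolean AND query intersects the postings of all keywords.
        class InvertedIndex:
            def __init__(self):
                self.postings = {}                       # keyword -> sorted doc ids

            def add(self, doc_id, keywords):
                for kw in keywords:
                    insort(self.postings.setdefault(kw, []), doc_id)

            def boolean_and(self, keywords):
                sets = [set(self.postings.get(kw, ())) for kw in keywords]
                return sorted(set.intersection(*sets)) if sets else []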
    Available online:  July 09, 2025 , DOI: 10.13328/j.cnki.jos.007402
    Abstract:
    To address the limitation of current OWL representation learning methods, which lack the ability to jointly represent complex semantic information across both the concept layer and the instance layer, an OWL representation learning approach using multi-semantic views of concepts, properties, and instances, termed MSV-KRL, is proposed. The method adopts a three-stage architecture comprising multi-semantic view partitioning, semantic-aware self-supervised post-training, and joint multi-task representation learning. First, MSV-KRL optimizes the mapping strategy from OWL to RDF graphs based on OWL2Vec*, and five fine-grained semantic view partitioning strategies are proposed. Subsequently, serialized post-training data are generated through random walks and an annotated attribute replacement strategy, and self-supervised post-training of the pre-trained model is carried out to enhance its adaptability to multi-semantic views. Finally, by employing a multi-task learning strategy, complex semantic representation learning of concepts, properties, and instances in OWL graphs is achieved through the jointly optimized loss of multi-semantic view prediction tasks. Experimental results demonstrate that MSV-KRL outperforms baseline representation learning methods on multiple benchmarks. MSV-KRL can be adapted to multiple language models, significantly improving the knowledge representation capability for OWL’s complex semantics.
    Available online:  June 25, 2025 , DOI: 10.13328/j.cnki.jos.007400
    Abstract:
    Knowledge graphs (KGs), with their unique approach to knowledge management and representation, have been widely applied in various knowledge computing fields, including question answering. However, incomplete information is often present in KGs, which undermines their quality and limits the performance of downstream tasks. As a result, knowledge graph completion (KGC) has emerged, aiming to enhance the quality of KGs by predicting the missing information in triples using different methods. In recent years, extensive research has been conducted in the field of KGC. This study classifies KGC techniques into three categories based on the number of samples used: zero-shot KGC, few-shot KGC, and multi-shot KGC. To investigate the core concepts and current status of KGC research and to provide a first-hand reference, this study offers a comprehensive review of the latest research advancements in KGC, covering theoretical research, experimental analysis, and practical applications, such as the Huapu system. The problems and challenges faced by current KGC technologies are summarized, and potential research directions for the future are discussed.
    Available online:  June 25, 2025 , DOI: 10.13328/j.cnki.jos.007399
    Abstract:
    The use of computer technology for intelligent management of genealogy data plays a significant role in inheriting and popularizing Chinese traditional culture. In recent years, with the widespread application of retrieval-augmented large language model (LLM) in the knowledge question-answering (Q&A) field, presenting diverse genealogy scenarios to users through dialogues with LLMs has become a highly anticipated research direction. However, the heterogeneity, autonomy, complexity, and evolution (HACE) characteristics of genealogy data pose challenges for existing knowledge retrieval frameworks to perform comprehensive knowledge reasoning within complex genealogy information. To address this issue, Huaputong, a genealogy Q&A system based on LLMs with knowledge graph reasoning, is proposed. A knowledge graph reasoning framework, suitable for LLM-based genealogy Q&A, is constructed from two aspects: logic reasoning completeness and information filtering accuracy. In terms of the completeness of logic reasoning, knowledge graphs are used as the medium for genealogy knowledge, and a comprehensive set of genealogy reasoning rules based on the Jena framework is proposed to improve the retrieval recall of genealogy knowledge reasoning. For information filtering, scenarios involving name ambiguity and multiple kinship relations in genealogy are considered. A multi-condition matching mechanism based on problem-condition triples and a Dijkstra path ranking algorithm using a max heap are designed to filter redundant retrieval information, thus ensuring accurate prompting for LLMs. Huaputong has been deployed on the Huapu platform, a publicly available intelligent genealogical website, where its effectiveness has been validated using real-world genealogical data.
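    For the path-search component, a standard min-heap Dijkstra over a weighted kinship graph is sketched below as a baseline; the max-heap ranking variant described in the abstract is not reproduced.
        import heapq

        # Sketch: shortest path between two persons in a kinship graph given as
        # {person: [(neighbor, weight), ...]} with non-negative weights.
        def shortest_kinship_path(graph, source, target):
            heap, settled = [(0, source, [source])], {}
            while heap:
                cost, node, path = heapq.heappop(heap)
                if node == target:
                    return cost, path
                if settled.get(node, float("inf")) <= cost:
                    continue
                settled[node] = cost
                for nxt, w in graph.get(node, ()):
                    heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
            return None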
    Available online:  June 18, 2025 , DOI: 10.13328/j.cnki.jos.007403
    Abstract:
    Quality issues, such as errors or deficiencies in triplets, become increasingly prominent in knowledge graphs, severely affecting the credibility of downstream applications. Accuracy evaluation is crucial for building confidence in the use and optimization of knowledge graphs. An embedding-model-based method is proposed to reduce reliance on manually labeled data and to achieve scalable automatic evaluation. Triplet verification is formulated as an automated threshold selection problem, with three threshold selection strategies proposed to enhance the robustness of the evaluation. In addition, triplet importance indicators are incorporated to place greater emphasis on critical triplets, with importance scores defined based on network structure and relationship semantics. Experiments are conducted to analyze and compare the impact on performance from various perspectives, such as embedding model capacity, knowledge graph sparsity, and triplet importance definition. The results demonstrate that, compared to existing automated evaluation methods, the proposed method can significantly reduce evaluation errors by nearly 30% in zero-shot conditions, particularly on datasets of dense graphs with high error rates.
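    A minimal sketch of threshold-based triple verification with importance weighting is given below, using a TransE-style score for illustration; the paper's three threshold-selection strategies are not reproduced.
        import numpy as np

        # Sketch: a triple is judged correct if its embedding score is below the
        # threshold; accuracy is the importance-weighted fraction judged correct.
        def transe_score(h, r, t):
            return np.linalg.norm(h + r - t, ord=1)

        def estimate_accuracy(triples, entity_emb, relation_emb, threshold, weights=None):
            scores = np.array([transe_score(entity_emb[h], relation_emb[r], entity_emb[t])
                               for h, r, t in triples])
            correct = (scores <= threshold).astype(float)
            w = np.ones(len(triples)) if weights is None else np.asarray(weights, float)
            return float((correct * w).sum() / w.sum())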
    Available online:  June 18, 2025 , DOI: 10.13328/j.cnki.jos.007404
    Abstract:
    Session-based recommendation aims to predict the next item a user will interact with based on a sequence of items. Most existing session-based recommender systems do not fully utilize the temporal interval information between items within a session, which affects the accuracy of recommendations. In recent years, graph neural networks have gained significant attention in session-based recommendation due to their strong ability to model complex relationships. However, session-based recommendations that rely solely on graph neural networks overlook the hidden high-order relationships between sessions, resulting in less rich information. In addition, data sparsity is a persistent problem in recommender systems, and contrastive learning is often employed to address it. However, most contrastive learning frameworks lack strong generalization capabilities because they take a single, fixed form. On this basis, a session-based recommendation model combined with self-supervised learning is proposed. First, the model utilizes the temporal interval information between items within user sessions to perform data augmentation, enriching the information within the sessions to improve recommendation accuracy. Second, a dual-view encoder is constructed, combining a hypergraph convolutional network encoder and a Transformer encoder to capture the hidden high-order relationships between sessions from multiple perspectives, thus enhancing the diversity of recommendations. Finally, the model integrates the augmented intra-session information, the multi-view inter-session information, and the original session information for contrastive learning to strengthen the model’s generalization ability. Comparisons with 11 classic models on 4 datasets show that the proposed model is feasible and efficient, with average improvements of 5.96% and 5.89% on the HR and NDCG metrics, respectively.
    Available online:  June 11, 2025 , DOI: 10.13328/j.cnki.jos.007401
    Abstract:
    Knowledge graph completion (KGC) models require inductive ability to generalize to new entities as the knowledge graph expands. However, current approaches understand entities only from a local perspective by aggregating neighboring information, failing to capture valuable interconnections between entities across different views. This study argues that global and sequential perspectives are essential for understanding entities beyond the local view by enabling interaction between disconnected and distant entity pairs. More importantly, it emphasizes that the aggregated information must be complementary across different views to avoid redundancy. Therefore, a multi-view framework with the differentiation mechanism is proposed for inductive KGC, aimed at learning complementary entity representations from various perspectives. Specifically, in addition to aggregating neighboring information to obtain the entity’s local representation through R-GCN, an attention-based differentiation mechanism is employed to aggregate complementary information from semantically related entities and entity-related paths, thus obtaining global and sequential representations of the entities. Finally, these representations are fused and used to score the triples. Experimental results demonstrate that the proposed framework consistently outperforms state-of-the-art approaches in the inductive setting. Moreover, it retains competitive performance in the transductive setting.
    Available online:  June 11, 2025 , DOI: 10.13328/j.cnki.jos.007397
    Abstract:
    In the field of model-based diagnosis, the system description is first encoded, and all minimal conflict sets are obtained using a mature SAT solver. Finally, the minimal hitting set of the minimal conflict sets is computed as the candidate diagnosis for the equipment to be diagnosed. However, this strategy consumes a significant amount of time, as it is equivalent to solving two NP-hard problems: computing the minimal conflict set and the minimal hitting set. This study re-encodes the description of the circuit system and proposes a novel variant hitting set algorithm, HSDiag, which can directly compute the diagnosis from the encoding. Compared to state-of-the-art diagnosis algorithms that first solve conflict sets and then hitting sets, the efficiency improves by a factor of 5 to 100. As the number of circuit components increases, the encoding clauses increase linearly, while the number of diagnoses increases exponentially. Since solving all conflict sets of large-scale circuits (ISCAS-85) is impractical, the proposed HSDiag algorithm, within the same cutoff time, yields more than twice the number of solutions compared to conflict-set-based diagnosis algorithms. In addition, this study proposes an equivalence class optimization strategy, which further decomposes the conflict set by using the newly proposed set splitting rule, even if the initial conflict set is inseparable. The efficiency of the HSDiag algorithm optimized by equivalence class is improved by more than 2 times in standard Polybox and Fulladder circuits.
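    For context, the two-stage baseline that HSDiag avoids can be sketched as follows: given the minimal conflict sets, candidate diagnoses are the minimal hitting sets, enumerated here by increasing cardinality.
        from itertools import combinations

        # Sketch: brute-force minimal hitting sets of a family of conflict sets;
        # HSDiag itself computes diagnoses directly from the re-encoded system.
        def minimal_hitting_sets(conflicts, max_size=4):
            conflict_sets = [set(c) for c in conflicts]
            components = sorted(set().union(*conflict_sets))
            found = []
            for size in range(1, max_size + 1):
                for candidate in combinations(components, size):
                    cand = set(candidate)
                    if any(set(f) <= cand for f in found):
                        continue                 # contains a smaller hitting set
                    if all(cand & conflict for conflict in conflict_sets):
                        found.append(candidate)
            return found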
    Available online:  June 11, 2025 , DOI: 10.13328/j.cnki.jos.007396
    Abstract:
    Smart contracts are computer programs running on blockchain platforms; they extend the functionality of the blockchain and enable complex applications. However, the potential security vulnerabilities of smart contracts can lead to significant financial losses. Symbolic-execution-based security vulnerability detection methods offer advantages such as high accuracy and the ability to generate test cases that can reproduce vulnerabilities. Nevertheless, as code size increases, symbolic execution faces challenges such as path explosion and excessive constraint-solving overhead. To address these issues, a novel approach for detecting smart contract security vulnerabilities through target-guided symbolic execution is proposed. First, vulnerable statements identified by static analysis tools or manually are treated as targets. The statements that depend on these target statements are analyzed, and the transaction sequence is augmented with symbolic constraints for the relevant variables. Second, the control flow graph (CFG) is constructed from the bytecode of the smart contract, and the basic blocks containing the target statements and the dependent statements are located. The CFG is then pruned to generate guidance information. Third, path exploration in symbolic execution is optimized by reducing the number of basic blocks to be analyzed and the time required for solving path constraints. With the guidance information, vulnerabilities are efficiently detected, and test cases capable of reproducing the vulnerabilities are generated. Based on this approach, a prototype tool named Smart-Target is developed. Experiments conducted on the SB Curated dataset in comparison with the symbolic execution tool Mythril demonstrate that Smart-Target reduces time overheads by 60.76% and 92.16% in vulnerability detection and replication scenarios, respectively. In addition, the analysis of target statement dependencies enhances vulnerability detection capability, identifying 22.02% more security vulnerabilities.
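    The pruning idea can be sketched as a backward reachability pass over the CFG, keeping only blocks from which a target block is reachable; the tool's actual guidance information is richer than this.
        from collections import deque

        # Sketch: successors maps each basic block to its successor blocks; the
        # result is the set of blocks worth exploring for the given target.
        def blocks_reaching_target(successors, target):
            predecessors = {}
            for block, succs in successors.items():
                for s in succs:
                    predecessors.setdefault(s, set()).add(block)
            keep, queue = {target}, deque([target])
            while queue:
                for pred in predecessors.get(queue.popleft(), ()):
                    if pred not in keep:
                        keep.add(pred)
                        queue.append(pred)
            return keep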
    Available online:  June 11, 2025 , DOI: 10.13328/j.cnki.jos.007407
    Abstract:
    With the increasing adoption of heterogeneous integrated architectures in high-performance computing, it has become essential to harness their potential and explore new strategies for application development. Traditional static compilation methodologies are no longer sufficient to meet the complex computational demands. Therefore, dynamic programming languages, known for their flexibility and efficiency, are gaining prominence. Julia, a modern high-performance language characterized by its JIT compilation mechanism, has demonstrated significant performance in fields such as scientific computing. Targeting the unique features of the Sunway heterogeneous many-core architecture, the ORCJIT engine is introduced, along with an on-chip storage management approach specifically designed for dynamic modes. Based on these advancements, swJulia is developed as a Julia dynamic language compiler tailored for the new generation of the Sunway supercomputer. This compiler not only inherits the flexibility of the Julia compiler but also provides robust support for the SACA many-core programming model and runtime encapsulation. By utilizing the swJulia compilation system, the deployment of the NNQS-Transformer quantum chemistry simulator on the new generation of the Sunway supercomputer is successfully achieved. Comprehensive validation across multiple dimensions demonstrates the efficacy and efficiency of swJulia. Experimental results show exceptional performance in single-threaded benchmark tests and many-core acceleration, significantly improving ultra-large-scale parallel simulations for the NNQS-Transformer quantum chemistry simulator.
    Available online:  June 11, 2025 , DOI: 10.13328/j.cnki.jos.007409
    Abstract:
    Temporal logic has been extensively applied in domains such as formal verification and robotics control, yet it remains challenging for non-expert users to master. Therefore, the automated extraction of temporal logic formulas from natural language texts is crucial. However, existing efforts are hindered by issues such as sparse sample availability and the ambiguity of natural language semantics, which impede the accurate identification of implicit temporal semantics within natural language texts, thus leading to errors when translating the original natural language semantics into temporal logic formulas. To address this issue, a novel method for temporal logic semantic analysis based on a few-shot learning network, termed FSLNets-TLSA, is proposed. The method employs data preprocessing techniques to enhance the temporal semantic logic features of the text. The network architecture consists of an encoder, an induction module, and a relation module, which together capture the implicit temporal logic semantic information in the input text. In addition, an enhancement module is incorporated to further improve the accuracy of semantic recognition. The effectiveness of the proposed method is validated through experimental evaluations on three public datasets comprising a total of 3,533 samples, together with a comparison with similar tools. The analysis demonstrates an average accuracy, recall, and F1-score of 96.55%, 96.29%, and 96.42%, respectively.
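    The induction and relation steps of few-shot classification can be sketched as prototype averaging plus a similarity score; the actual modules of FSLNets-TLSA are learned networks, so this shows only the general idea.
        import numpy as np

        # Sketch: class prototypes are the mean of support embeddings, and a
        # query embedding is assigned to the most similar prototype.
        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

        def classify(query_emb, support_embs_by_class):
            prototypes = {c: np.mean(embs, axis=0)
                          for c, embs in support_embs_by_class.items()}
            return max(prototypes, key=lambda c: cosine(query_emb, prototypes[c]))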
    Available online:  June 04, 2025 , DOI: 10.13328/j.cnki.jos.007410
    Abstract:
    In recent years, the increasing complexity of space missions has led to an exponential growth in space-generated data. However, limited satellites-to-ground bandwidth and scarce frequency resources pose significant challenges to traditional bent-pipe architecture, which faces severe transmission bottlenecks. In addition, onboard data must wait for satellites to pass over ground stations before transmission. The large-scale construction of ground stations is not only cost-prohibitive but also carries geopolitical and economic risks. Satellite edge computing has emerged as a promising solution to these bottlenecks by integrating mobile edge computing technology into satellite edges. This approach significantly enhances user experience and reduces redundant network traffic. By enabling onboard data processing, satellite edge computing shortens data acquisition times and reduces reliance on extensive ground station infrastructure. Furthermore, the integration of artificial intelligence (AI) and edge computing technologies offers an efficient and forward-looking path to address existing challenges. This study reviews the latest progress in intelligent satellite edge computing. First, the demands and applications of satellite edge computing in various typical scenarios are discussed. Next, key challenges and recent research advancements in this field are analyzed. Finally, several open research topics are highlighted, and new ideas are proposed to guide future studies. This discussion aims to provide valuable insights to promote technological innovation and the practical implementation of satellite edge computing.
    Available online:  June 04, 2025 , DOI: 10.13328/j.cnki.jos.007408
    Abstract:
    With the rapid development of autonomous driving technology, the issue of vehicle control takeover has become a prominent research topic. A car equipped with an assisted driving system cannot fully handle all driving scenarios. When the actual driving scenario exceeds the operational design domain of the assisted driving system, human intervention is still required to control the vehicle and ensure the safe completion of the driving task. Takeover performance is an extremely important metric for evaluating a driver’s behavior during the takeover process and comprises takeover reaction time and takeover quality. The takeover reaction time refers to the time from the system’s takeover request to the driver’s taking control of the steering wheel. The length of the takeover reaction time not only reflects the driver’s current state but also affects the subsequent handling of complex scenarios. Takeover quality refers to the quality of manual vehicle operation by the driver after regaining control. This study, based on the CARLA driving simulator, constructs 6 typical driving scenarios, simulates the vehicle control takeover process, and collects physiological signals and eye movement data from 31 drivers using a multi-channel acquisition system. Based on the drivers’ takeover performance and with reference to international standards, an objective takeover performance evaluation metric is proposed, incorporating the driver’s takeover reaction time, maximum lateral and longitudinal accelerations, and minimum time to collision, derived from multiple items of vehicle data. By combining driver data, vehicle data, and scenario data, a deep neural network (DNN) model predicts takeover performance, while the SHAP model analyzes the impact of each feature, improving the model’s interpretability and transparency. The experimental results show that the proposed DNN model outperforms traditional machine learning methods in predicting takeover performance, achieving an accuracy of 92.2% and demonstrating good generalization. The SHAP analysis reveals the impact of key features such as heart rate variability, driving experience, and minimum safe distance on the prediction results. This research provides a theoretical and empirical foundation for the safety optimization and human-computer interaction design of autonomous driving systems and is of great significance for improving the efficiency and safety of human-vehicle cooperation in autonomous driving.
    Available online:  June 04, 2025 , DOI: 10.13328/j.cnki.jos.007406
    Abstract:
    The compiler is one of the most relied-upon performance tuning tools for program developers. However, due to the limited precision encoding of floating-point numbers, many compiler optimization options can alter the semantics of floating-point calculations, leading to result inconsistency. Locating the program statements that cause compilation optimization-induced result inconsistency is crucial for performance tuning and result reproducibility. The state-of-the-art approach employs precision enhancement-based binary search to locate the code snippets causing result inconsistency but suffers from insufficient support for multi-source localization and low search efficiency. This study proposes a floating-point instruction difference-guided Delta-Debugging localization method, FI3D, which utilizes the backtracking mechanism in Delta-Debugging to better support multi-source problem code localization and exploits the differences in floating-point instruction sequences under different compiler optimization options to guide the localization. FI3D is evaluated using 6 applications from the NPB benchmark, 10 programs from the GNU scientific library, and 2 programs from the floatsmith mixed-precision benchmark. Experimental results demonstrate that FI3D successfully locates the 4 applications where PLiner fails and achieves an average 26.8% performance improvement for the 14 cases successfully located by PLiner.
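    The root cause FI3D targets is easy to reproduce: floating-point addition is not associative, so optimizations that reorder or vectorize a reduction can change the computed result.
        # Sketch: the same mathematical sum evaluated in two orders.
        a, b, c = 1e16, -1e16, 1.0
        print((a + b) + c)   # 1.0
        print(a + (b + c))   # 0.0, because b + c rounds back to -1e16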
    Available online:  May 22, 2025 , DOI: 10.13328/j.cnki.jos.007372
    Abstract:
    The reproducibility of scientific research results is a fundamental guarantee for the reliability of scientific research and the cornerstone of scientific and technological advancement. However, the research community is currently facing a serious reproducibility crisis, with many research results published in top journals and conferences being irreproducible. In the field of data science, the reproducibility of research results faces challenges such as heterogeneous research data from multiple sources, complex computational processes, and intricate computational environments. To address these issues, this study proposes ReproLink, a reproducibility-oriented research data management system. ReproLink constructs a unified model of research data, abstracting it into research data objects that consist of three elements: identifier, attribute set, and data entity. Through fine-grained modeling of the reproduction process, ReproLink establishes a precise method for describing multi-step, complex reproduction processes. By integrating code and operating environment modeling, ReproLink eliminates the uncertainties caused by different environments affecting code execution. Performance tests and case studies show that ReproLink performs well with data scales up to one million records, demonstrating practical value in real-world scenarios such as paper reproduction and data provenance tracking. The technical architecture of ReproLink has been integrated into Conow Software, the only integrated comprehensive management and service platform in China specifically designed for scientific research institutes, supporting the reproducibility needs of hundreds of such institutes across the country.
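    The unified research data object can be sketched as follows; the field names are illustrative, not ReproLink's actual schema.
        from dataclasses import dataclass, field
        import hashlib

        # Sketch: identifier, attribute set, and data entity, plus a content
        # digest so a reproduction step can verify its inputs.
        @dataclass
        class ResearchDataObject:
            identifier: str
            attributes: dict = field(default_factory=dict)
            data_entity: bytes = b""

            def digest(self) -> str:
                return hashlib.sha256(self.data_entity).hexdigest()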
    Available online:  October 18, 2017
    [Abstract] (3052) [HTML] (0) [PDF 525.21 K] (6651)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 315-325. Original article: https://doi.org/10.1145/3106237.3106242. Readers who wish to cite this work should cite the original publication.
    Available online:  October 18, 2017
    [Abstract] (2984) [HTML] (0) [PDF 352.38 K] (7412)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 303-314. Original article: https://doi.org/10.1145/3106237.3106239. Readers who wish to cite this work should cite the original publication.
    Available online:  September 11, 2017
    [Abstract] (3559) [HTML] (0) [PDF 276.42 K] (4827)
    Abstract:
    GitHub, a popular social-software-development platform, has fostered a variety of software ecosystems where projects depend on one another and practitioners interact with each other. Projects within an ecosystem often have complex inter-dependencies that impose new challenges in bug reporting and fixing. In this paper, we conduct an empirical study on cross-project correlated bugs, i.e., causally related bugs reported to different projects, focusing on two aspects: 1) how developers track the root causes across projects; and 2) how the downstream developers coordinate to deal with upstream bugs. Through manual inspection of bug reports collected from the scientific Python ecosystem and an online survey with developers, this study reveals the common practices of developers and the various factors in fixing cross-project bugs. These findings provide implications for future software bug analysis in the scope of ecosystem, as well as shed light on the requirements of issue trackers for such bugs.
    Available online:  June 21, 2017
    [Abstract] (3561) [HTML] (0) [PDF 169.43 K] (4553)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper has been accepted for publication in IEEE Transactions on Software Engineering (2017, to appear). Original article: http://ieeexplore.ieee.org/document/7792694. Readers who wish to cite this work should cite the original publication.
    Available online:  June 13, 2017
    [Abstract] (4809) [HTML] (0) [PDF 174.91 K] (5021)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 39th International Conference on Software Engineering, pp. 27-37, Buenos Aires, Argentina, May 20-28, 2017, IEEE Press, Piscataway, NJ, USA, 2017, ISBN: 978-1-5386-3868-2. Original article: http://dl.acm.org/citation.cfm?id=3097373. Readers who wish to cite this work should cite the original publication.
    Available online:  January 25, 2017
    [Abstract] (3662) [HTML] (0) [PDF 254.98 K] (4555)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016), ACM, New York, NY, USA, pp. 871-882, DOI: https://doi.org/10.1145/2950290.2950364. Original article: http://dl.acm.org/citation.cfm?id=2950364. Readers who wish to cite this work should cite the original publication.
    Available online:  January 18, 2017
    [Abstract] (4162) [HTML] (0) [PDF 472.29 K] (4749)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 133-143, Seattle, WA, USA, November 2016. Original article: http://dl.acm.org/citation.cfm?id=2950327. Readers who wish to cite this work should cite the original publication.
    Available online:  January 04, 2017
    [Abstract] (3882) [HTML] (0) [PDF 293.93 K] (4240)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE'16), pp. 810-821, November 13-18, 2016. Original article: https://doi.org/10.1145/2950290.2950310. Readers who wish to cite this work should cite the original publication.
    Available online:  January 04, 2017
    [Abstract] (4209) [HTML] (0) [PDF 244.61 K] (4931)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at FSE 2016. Original article: http://dl.acm.org/citation.cfm?doid=2950290.2950313. Readers who wish to cite this work should cite the original publication.
    Available online:  December 12, 2016
    [Abstract] (3730) [HTML] (0) [PDF 358.69 K] (4638)
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at FSE'16, in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Original article: http://dl.acm.org/citation.cfm?id=2950340. Readers who wish to cite this work should cite the original publication.
    Available online:  September 30, 2016
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. The paper was published at ASE 2016 (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?id=2970366. Readers who wish to cite this work should cite the original publication.
    Available online:  September 09, 2016
    Abstract:
    Recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering. Junjie's paper was published at ASE 2016 (http://ase2016.org/). Original article: http://dl.acm.org/citation.cfm?doid=2970276.2970300. Readers who wish to cite this work should cite the original publication.
    Available online:  September 07, 2016
    Abstract:
    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published at ASE 2016, in the Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. Full text: http://dx.doi.org/10.1145/2970276.2970307. Important note: readers who cite this work should cite the original publication.
    Available online:  August 29, 2016
    Abstract:
    Recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering. The paper was published in ACM Transactions on Software Engineering and Methodology (TOSEM, Vol. 25, No. 2, Article 13, May 2016) and was invited to the ICSE 2016 main conference as a "Journal first" presentation. Full text: http://dl.acm.org/citation.cfm?id=2876443. The authors include Zhou Minghui, Ma Xiujuan, Zhang Lu, and Mei Hong of Peking University, and Audris Mockus of the University of Tennessee. Important note: readers who cite this work should cite the original publication.
  • Full-text download ranking (overall / by year / by issue)
    Abstract view ranking (overall / by year / by issue)

    2015,26(6):1356-1372 , DOI: 10.13328/j.cnki.jos.004831
    [Abstract] (130880) [HTML] (4693) [PDF 877.35 K] (16837)
    Abstract:
    Social recommender systems have recently become one of the hottest topics in the domain of recommender systems. The main task of a social recommender system is to alleviate the data sparsity and cold-start problems and to improve recommendation performance by utilizing users' social attributes. This paper presents an overview of the field of social recommender systems, including trust inference algorithms, key techniques, and typical applications. Prospects for future development and suggestions for possible extensions are also discussed.
    2015,26(1):26-39 , DOI: 10.13328/j.cnki.jos.004631
    [Abstract] (39866) [HTML] (4129) [PDF 763.52 K] (21571)
    Abstract:
    In recent years, transfer learning has attracted a vast amount of attention and research. Transfer learning is a machine learning method that applies knowledge from related but different domains to target domains. It relaxes two basic assumptions of traditional machine learning: (1) the training data (also referred to as the source domain) and the test data (also referred to as the target domain) follow the independent and identically distributed (i.i.d.) condition; (2) there are enough labeled samples to learn a good classification model. Its aim is to solve problems in which the target domain has few or even no labeled data. This paper surveys the research progress of transfer learning and introduces the authors' own work, especially work on building transfer learning models by applying generative models at the concept level. Finally, the paper introduces applications of transfer learning, such as text classification and collaborative filtering, and suggests future research directions for transfer learning.
    2003,14(7):1282-1291
    [Abstract] (38004) [HTML] (0) [PDF 832.28 K] (85365)
    Abstract:
    The sensor network, formed by the convergence of sensor, micro-electro-mechanical system (MEMS), and networking technologies, is a novel technology for acquiring and processing information. In this paper, the architecture of the wireless sensor network is briefly introduced. Next, some valuable applications are explained and forecasted. Combined with existing work, hot topics including power-aware routing and medium access control schemes are discussed and presented in detail. Finally, taking account of application requirements, several future research directions are put forward.
    2015,26(1):62-81 , DOI: 10.13328/j.cnki.jos.004701
    [Abstract] (37765) [HTML] (6512) [PDF 1.04 M] (37794)
    Abstract:
    Network abstraction brought about the naissance of software-defined networking (SDN). SDN decouples the data plane and the control plane, and simplifies network management. The paper starts with a discussion of the background of the naissance and development of SDN, and outlines its architecture, which includes the data layer, control layer, and application layer. Key technologies are then elaborated according to the hierarchical architecture of SDN, and the properties of consistency, availability, and tolerance are especially analyzed. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized at the end.
    2010,21(3):427-437
    [Abstract] (33451) [HTML] (0) [PDF 308.76 K] (44365)
    Abstract:
    Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports some pioneering research on a possible genetic algorithm and its automatic generation of SONGCI. In light of the characteristics of classical Chinese poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette wheel selection, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system constructed on the basis of the computational model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic generation of Chinese poetry.
    2011,22(1):71-83 , DOI: 10.3724/SP.J.1001.2011.03958
    [Abstract] (30627) [HTML] (0) [PDF 781.42 K] (62570)
    Abstract:
    Cloud computing is a fundamental change happening in the field of information technology, representing a movement towards intensive, large-scale specialization. On the other hand, it brings not only convenience and efficiency but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the major requirements in cloud computing in terms of security key technologies, standards, regulations, and so on, and provides a cloud computing security framework. This paper argues that changes in the above aspects will result in a technical revolution in the field of information security.
    2016,27(1):45-71 , DOI: 10.13328/j.cnki.jos.004914
    [Abstract] (30553) [HTML] (5487) [PDF 880.96 K] (41679)
    Abstract:
    Android is a modern and highly popular software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever; Apple, Microsoft, Blackberry, and Firefox trailed a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2008,19(1):48-61
    [Abstract] (29075) [HTML] (0) [PDF 671.39 K] (66864)
    Abstract:
    The current state of research and recent progress in clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and summarized from several aspects, such as algorithmic ideas, key techniques, and advantages and disadvantages. Then, several typical clustering algorithms and well-known data sets are selected, and simulation experiments are conducted from the two perspectives of accuracy and running efficiency; the clustering behavior of a given algorithm on different data sets is analyzed and compared with that of different algorithms on the same data set. Finally, the research hotspots, difficulties, and shortcomings of data clustering, as well as some open problems, are addressed by integrating the information from the two preceding parts. This work can serve as a valuable reference for data clustering and data mining.
    2009,20(5):1337-1348
    [Abstract] (28713) [HTML] (0) [PDF 1.06 M] (49436)
    Abstract:
    This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects. One is about the cloud infrastructure which is the building block for the up layer cloud application. The other is of course the cloud application. This paper focuses on the cloud infrastructure including the systems and current research. Some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large scale clusters which contain a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software building on top of redundant hardware instead of mere hardware. All these technologies are for the two important goals for distributed system: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to very large scale even to thousands of nodes. Availability means that the services are available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2009,20(2):271-289
    [Abstract] (27853) [HTML] (0) [PDF 675.56 K] (50255)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems using evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms developed before 2003, the recent advances in EMO are discussed in detail, and the current research directions are summarized. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have been proposed. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints on the future research of EMO are proposed.
    2017,28(4):959-992 , DOI: 10.13328/j.cnki.jos.005143
    [Abstract] (23219) [HTML] (8027) [PDF 3.58 M] (33071)
    Abstract:
    The development of mobile internet and the popularity of mobile terminals produce massive trajectory data of moving objects under the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in the city and the changes of atmospheric environment. However, trajectory data also can be exploited to disclose moving objects' privacy information (e.g., behaviors, hobbies and social relationships). Accordingly, attackers can easily access moving objects' privacy information by digging into their trajectory data such as activities and check-in locations. In another front of research, quantum computation presents an important theoretical direction to mine big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to handle trajectory big data could make some complex problem solvable and achieve higher efficiency. This paper reviews the key technologies of processing trajectory data. First the concept and characteristics of trajectory data is introduced, and the pre-processing methods, including noise filtering and data compression, are summarized. Then, the trajectory indexing and querying techniques, and the current achievements of mining trajectory data, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preserving with respect to trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing framework and data visualization, are presented in detail. Some possible ways of applying quantum computation into trajectory data processing, as well as the implementation of some core trajectory mining algorithms by quantum computation are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
    2005,16(1):1-7
    [Abstract] (22821) [HTML] (0) [PDF 614.61 K] (25807)
    Abstract:
    The paper offers reflections from the following four aspects: 1) from the law of the development of things, revealing the development history of software engineering technology; 2) from the perspective of the natural characteristics of software, analyzing the construction of each abstraction layer of the virtual machine; 3) from the perspective of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the emergence of Internet technology, exploring the development trend of software technology.
    2010,21(8):1834-1848
    [Abstract] (21654) [HTML] (0) [PDF 682.96 K] (62505)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
    2015,26(1):145-166 , DOI: 10.13328/j.cnki.jos.004688
    [Abstract] (21391) [HTML] (4037) [PDF 1.65 M] (11864)
    Abstract:
    The explosive growth of digital data brings great challenges to relational database management systems in addressing issues such as scalability and fault tolerance. Cloud computing techniques have been widely used in many applications and have become the standard, effective approach to managing large-scale data because of their high scalability, high availability, and fault tolerance. Existing cloud-based data management systems cannot efficiently support complex queries, such as multi-dimensional queries and join queries, because they lack index or view techniques, which limits the application of cloud computing in many respects. This paper conducts an in-depth study of index techniques for cloud data management to highlight their strengths and weaknesses. The paper also introduces the authors' preliminary work on indexes for massive IoT data in the cloud environment. Finally, it points out some challenges for index techniques for big data in the cloud environment.
    2004,15(3):428-442
    [Abstract] (21055) [HTML] (0) [PDF 1009.57 K] (20895)
    Abstract:
    With the rapid development of e-business, web applications have developed from localization to globalization, from B2C (business-to-customer) to B2B (business-to-business), and from a centralized fashion to a decentralized fashion. The web service is a new application model for decentralized computing, and it is also an effective mechanism for data and service integration on the Web. Thus, web services have become a solution to e-business. It is important and necessary to carry out research on new architectures for web services, on combinations with other proven techniques, and on the integration of services. In this paper, a survey is presented on various aspects of web services research, from the basic concepts to the principal research problems and the underlying techniques, including data integration in web services, web service composition, semantic web services, web service discovery, web service security, solutions to web services in the P2P (peer-to-peer) computing environment, and grid services, among others. This paper also summarizes the current state of the art of these techniques and discusses future research topics and the challenges of web services.
    2009,20(1):54-66
    [Abstract] (20124) [HTML] (0) [PDF 1.41 M] (55709)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks, within which the links between nodes are very dense, but between which they are quite sparse. Network clustering algorithms which aim to discover all natural network communities from given complex networks are fundamentally important for both theoretical researches and practical applications, and can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks including social networks, biological networks, World Wide Webs and so on. This paper reviews the background, the motivation, the state of arts as well as the main issues of existing works related to discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to the researchers from the communities of complex network analysis, data mining, intelligent Web and bioinformatics.
    2005,16(5):857-868
    [Abstract] (20092) [HTML] (0) [PDF 489.65 K] (34837)
    Abstract:
    Wireless Sensor Networks, a novel technology about acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the networks, is a challenging one, and yet extremely crucial for many applications. In this paper, the evaluation criterion of the performance and the taxonomy for wireless sensor networks self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and the directions of research in this area are introduced.
    2012,23(4):962-986 , DOI: 10.3724/SP.J.1001.2012.04175
    [Abstract] (19325) [HTML] (0) [PDF 2.09 M] (37317)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, often up to millions, and stores petabytes or even exabytes of data, which may easily lead to failures of computers or data. The large number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure costs and power costs. Therefore, the fault tolerance, scalability, and power consumption of the distributed storage of a data center become key parts of cloud computing technology, in order to ensure data availability and reliability. In this paper, a survey is made of the state of the art of the key technologies in cloud computing in the following aspects: design of the data center network, organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. Firstly, several classical topologies of data center networks are introduced and compared. Secondly, current fault-tolerant storage techniques are discussed, and data replication and erasure coding strategies are compared in particular. Thirdly, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed, and future research trends are predicted.
    2012,23(1):32-45 , DOI: 10.3724/SP.J.1001.2012.04091
    [Abstract] (18988) [HTML] (0) [PDF 408.86 K] (36825)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can be expanded cost-effectively are needed to deal with such big data. Relational data management technology has gone through a history of nearly 40 years, but it now faces a tough obstacle in scalability, as relational techniques cannot handle large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and expanded their applications from Web search to territories that used to be occupied by relational database systems. They confront relational technology with high availability, high scalability, and massive parallel processing capability. The relational community, after losing the Web search battle, has begun to learn from MapReduce, while MapReduce also borrows valuable ideas from the relational community to improve performance. Relational techniques and MapReduce compete with and learn from each other; a new data analysis platform and ecosystem are emerging. Eventually the two camps of techniques will find their places in the new ecosystem of big data analysis.
    2009,20(3):524-545
    [Abstract] (17696) [HTML] (0) [PDF 1.09 M] (29316)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2013,24(1):91-108 , DOI: 10.3724/SP.J.1001.2013.04292
    [Abstract] (17602) [HTML] (0) [PDF 0.00 Byte] (17534)
    Abstract:
    Mobile recommender systems have recently become one of the hottest topics in the domain of recommender systems. The main task of mobile recommender systems is to improve the performance and accuracy along with user satisfaction utilizing mobile context, mobile social network and other information. This paper presents an overview of the field of mobile recommender systems including key techniques, evaluation and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2009,20(1):124-137
    [Abstract] (17386) [HTML] (0) [PDF 1.06 M] (26833)
    Abstract:
    The proliferation of intelligent devices equipped with short-range wireless communication has boosted the rapid rise of wireless ad hoc network applications. However, in many realistic application environments, nodes form a disconnected network most of the time due to node mobility, low density, lossy links, etc. The conventional communication model of mobile ad hoc networks (MANETs) requires at least one path from the source to the destination node, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages hop by hop, and implement communication between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has attracted great interest from researchers. This paper first introduces the concepts and theories of opportunistic networks and some current typical applications. Then it elaborates on popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Other interesting research points, such as communication middleware, cooperation and security problems, and new applications, are stated briefly. Finally, the paper concludes and looks forward to possible research foci for opportunistic networks in the future.
    2010,21(5):899-915
    [Abstract] (17148) [HTML] (0) [PDF 972.65 K] (18828)
    Abstract:
    This paper firstly presents a summary of AADL (architecture analysis and design language), including its progress over the years and its modeling elements. Then, it surveys the research and practice of AADL from a model-based perspective, such as AADL modeling, AADL formal semantics, model transformation, verification and code generation. Finally, the potential research directions are discussed.
    2014,25(2):400-418 , DOI: 10.13328/j.cnki.jos.004540
    [Abstract] (17147) [HTML] (2694) [PDF 1.24 M] (8611)
    Abstract:
    Cyber-Physical Systems (CPSs) have great potentials in several application domains. Time plays an important role in CPS and should be specified in the very early phase of requirements engineering. This paper proposes a framework to model and verify timing requirements for the CPS. To begin with, a conceptual model is presented for providing basic concepts of timing and functional requirements. Guided by this model, the CPS software timing requirement specification can be obtained from CPS environment properties and constraints. To support formal verification, formal semantics for the conceptual model is provided. Based on the semantics, the consistency properties of the timing requirements specification are defined and expressed as CTL formulas. The timing requirements specification is transformed into a NuSMV model and checked by this well-known model checker.
    2017,28(4):860-882 , DOI: 10.13328/j.cnki.jos.005190
    [Abstract] (17051) [HTML] (6105) [PDF 2.49 M] (23591)
    Abstract:
    Information flow analysis is a promising approach for protecting the confidentiality and integrity of information manipulated by computing systems. Taint analysis, a practical instance of information flow analysis, is widely used in software security assurance. This survey summarizes the latest advances in taint analysis, especially solutions applied on different platforms. Firstly, the basic principle of taint analysis is introduced, along with the general techniques for taint propagation implemented by dynamic and static analyses. Then, the proposals applied in different platform frameworks, including techniques for protecting against privacy leakage on Android and finding security vulnerabilities on the Web, are analyzed. Lastly, further research directions and future work are discussed.
    2009,20(2):350-362
    [Abstract] (16883) [HTML] (0) [PDF 1.39 M] (45561)
    Abstract:
    This paper makes a comprehensive survey of recommender system research, aiming to help readers understand this field. First, the research background is introduced, including commercial application demands, academic institutes, conferences, and journals. After formally and informally describing the recommendation problem, a comparative study is conducted on categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are exhibited, and the main difficulties and future directions are concluded.
    2004,15(8):1208-1219
    [Abstract] (16804) [HTML] (0) [PDF 948.49 K] (18858)
    Abstract:
    With the explosive growth of network applications and complexity, the threat of Internet worms to network security is becoming increasingly serious. Especially in the Internet environment, the variety of propagation paths and the complexity of application environments result in worms with a much higher outbreak frequency, deeper latency, and wider coverage, and Internet worms have become a primary issue faced by malicious-code researchers. In this paper, the concept and research status of Internet worms, their function components, and their execution mechanism are first presented; then the scanning strategies and propagation models are discussed; and finally the critical techniques of Internet worm prevention are given. Some major problems and research trends in this area are also addressed.
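    As background for the propagation models this abstract mentions, the sketch below numerically iterates the classic simple-epidemic model of a random-scanning worm. It is only an illustration, not the paper's own model; the vulnerable population, scan rate, and address-space size are hypothetical parameters.

```python
# Minimal numerical sketch of the simple-epidemic (random-scanning) worm model.
# Assumptions (not from the paper): N vulnerable hosts, each infected host scans
# `scan_rate` random addresses per tick in a 2^32 IPv4 address space.
def simulate_worm(n_vulnerable=350_000, scan_rate=100, address_space=2**32, ticks=2000):
    infected = 1.0
    history = []
    for _ in range(ticks):
        # Probability that a single scan hits a not-yet-infected vulnerable host.
        p_hit = (n_vulnerable - infected) / address_space
        infected = min(infected + infected * scan_rate * p_hit, n_vulnerable)
        history.append(infected)
    return history

if __name__ == "__main__":
    curve = simulate_worm()
    for t in range(0, 2000, 400):
        print(f"tick {t}: ~{curve[t]:.0f} infected hosts")  # logistic-shaped growth
```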
    2009,20(11):2965-2976
    [Abstract] (16720) [HTML] (0) [PDF 442.42 K] (20051)
    Abstract:
    This paper studies uncertain graph data mining and especially investigates the problem of mining frequent subgraph patterns from uncertain graph data. A data model is introduced for representing uncertainties in graphs, and expected support is employed to evaluate the significance of subgraph patterns. By using the apriori property of expected support, a depth-first search-based mining algorithm is proposed with an efficient method for computing expected supports and a technique for pruning the search space, which reduces the number of subgraph isomorphism tests needed for computing expected support from exponential to linear scale. Experimental results show that the proposed algorithm is 3 to 5 orders of magnitude faster than a naïve depth-first search algorithm, and is efficient and scalable.
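    To make the expected-support semantics concrete, the following sketch estimates it by sampling "possible worlds" of uncertain graphs. The paper computes expected support exactly and prunes with its apriori property; this Monte Carlo version, with a toy edge-set notion of a pattern and made-up probabilities, only illustrates what the quantity means.

```python
import random

# A "pattern" here is just a set of required edges; real labelled subgraph
# isomorphism is far more involved. Edge probabilities are assumed independent.
def pattern_occurs(present_edges, pattern_edges):
    return pattern_edges <= present_edges

def occurrence_probability(uncertain_edges, pattern_edges, samples=2000):
    hits = 0
    for _ in range(samples):
        world = {e for e, p in uncertain_edges.items() if random.random() < p}
        hits += pattern_occurs(world, pattern_edges)
    return hits / samples

def expected_support(database, pattern_edges, samples=2000):
    # Expected support = sum over graphs of Pr[pattern is a subgraph of the graph].
    return sum(occurrence_probability(g, pattern_edges, samples) for g in database)

if __name__ == "__main__":
    g1 = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.3}
    g2 = {("a", "b"): 0.5, ("b", "c"): 0.6}
    pattern = {("a", "b"), ("b", "c")}
    print(expected_support([g1, g2], pattern))  # close to 0.9*0.8 + 0.5*0.6 = 1.02
```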
    2009,20(5):1226-1240
    [Abstract] (16646) [HTML] (0) [PDF 926.82 K] (21562)
    Abstract:
    This paper introduces the concrete details of combining automated reasoning techniques with planning methods, including planning as satisfiability using propositional logic, conformant planning using modal logic and disjunctive reasoning, planning as nonmonotonic logic, and flexible planning as fuzzy description logic. After considering the experimental results of the International Planning Competition and relevant papers, it concludes that planning methods based on automated reasoning techniques are helpful and can be adopted. It also points out the challenges and possible research hotspots.
    2003,14(10):1717-1727
    [Abstract] (16519) [HTML] (0) [PDF 839.25 K] (19909)
    Abstract:
    Sensor networks are an integration of sensor techniques, embedded computing techniques, distributed computing techniques, and wireless communication techniques. They can be used for testing, sensing, collecting, and processing information about monitored objects and transferring the processed information to users. Sensor networks are a new research area of computer science and technology with broad application prospects, and both academia and industry are very interested in them. The concepts and characteristics of sensor networks and the data in such networks are introduced, the issues of sensor networks and sensor network data management are discussed, and the progress of research on sensor networks and their data management is presented.
    2012,23(1):1-20 , DOI: 10.3724/SP.J.1001.2012.04100
    [Abstract] (16097) [HTML] (0) [PDF 1017.73 K] (39019)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2014,25(4):839-862 , DOI: 10.13328/j.cnki.jos.004558
    [Abstract] (15861) [HTML] (4592) [PDF 1.32 M] (25623)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. Research and discussion on batch computing in big data environments is comparatively sufficient, but how to efficiently handle stream computing so as to meet requirements such as low latency, high throughput, and continuously reliable running, and how to build efficient stream big data computing systems, remain great challenges in big data computing research. This paper surveys the system architecture and key issues of stream computing in big data environments. Firstly, it gives a brief summary of three application scenarios of stream computing in business intelligence, marketing, and public service, and shows the distinctive features of stream computing in big data environments, such as real-time, volatility, burstiness, irregularity, and infinity. A well-designed stream computing system should be optimized in system structure, data transmission, application interfaces, high availability, and so on. Subsequently, the paper offers detailed analyses and comparisons of five typical open-source stream computing systems for big data environments. Finally, it addresses some new challenges of stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2009,20(10):2729-2743
    [Abstract] (14659) [HTML] (0) [PDF 1.12 M] (14957)
    Abstract:
    In a multi-hop wireless sensor network (WSN), the sensors closest to the sink tend to deplete their energy faster than other sensors, which is known as the energy hole around the sink. No more data can be delivered to the sink after an energy hole appears, while a considerable amount of energy is wasted and the network lifetime ends prematurely. This paper investigates the energy hole problem and, based on an improved corona model with levels, concludes that assigning different transmission ranges to nodes in different coronas is an effective approach to achieving an energy-efficient network. It proves that finding the optimal transmission ranges for all coronas is a multi-objective optimization problem (MOP), which is NP-hard. The paper proposes an ACO (ant colony optimization)-based distributed algorithm to prolong the network lifetime, which helps nodes in different areas adaptively find approximately optimal transmission ranges based on the node distribution. Simulation results indicate that the network lifetime under this solution approximates that of the optimal assignment. Compared with existing algorithms, the ACO-based algorithm can not only extend the network lifetime by more than two times, but also perform well under non-uniform node distributions.
    2000,11(11):1460-1466
    [Abstract] (14643) [HTML] (0) [PDF 520.69 K] (15046)
    Abstract:
    Intrusion detection is a highlighted topic of network security research in recent years. In this paper, first the necessity of intrusion detection is presented, and its concepts and models are described. Then, many intrusion detection techniques and architectures are summarized. Finally, the existing problems and the future direction in this field are discussed.
    2012,23(5):1148-1166 , DOI: 10.3724/SP.J.1001.2012.04195
    [Abstract] (14625) [HTML] (0) [PDF 946.37 K] (21937)
    Abstract:
    With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
    2002,13(7):1228-1237
    [Abstract] (14430) [HTML] (0) [PDF 500.04 K] (19191)
    Abstract:
    Software architecture (SA) has recently emerged as one of the primary research areas in software engineering and one of the key technologies for developing large-scale software-intensive systems and software product lines. The history and major directions of SA research are summarized, and a concept of SA is put forward based on analyzing and comparing several classical definitions. Based on a summary of SA-related activities, two categories of SA research are extracted, and the advancements of research on SA are then introduced from seven aspects. Additionally, some disadvantages of current SA research are discussed and their causes are explained. Finally, some significant and promising trends in SA research are pointed out.
    2013,24(8):1786-1803 , DOI: 10.3724/SP.J.1001.2013.04416
    [Abstract] (14308) [HTML] (0) [PDF 1.04 M] (24712)
    Abstract:
    Many application-oriented NoSQL database systems have been developed to satisfy the new requirements of big data management. This paper surveys research on typical NoSQL databases based on the key-value data model. First, the characteristics of big data and the key technical issues in supporting big data management are introduced. Then frontier efforts and research challenges are presented, including system architecture, data model, access mode, indexing, transactions, system elasticity, load balancing, replica strategy, data consistency, flash cache, MapReduce-based data processing, and new-generation data management systems. Finally, research prospects are given.
    2006,17(7):1588-1600
    [Abstract] (14137) [HTML] (0) [PDF 808.73 K] (18640)
    Abstract:
    Routing technology at the network layer is pivotal in the architecture of wireless sensor networks. As an active branch of routing technology, cluster-based routing protocols excel in network topology management, energy minimization, data aggregation and so on. In this paper, cluster-based routing mechanisms for wireless sensor networks are analyzed. Cluster head selection, cluster formation and data transmission are three key techniques in cluster-based routing protocols. As viewed from the three techniques, recent representative cluster-based routing protocols are presented, and their characteristics and application areas are compared. Finally, the future research issues in this area are pointed out.
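    As a concrete illustration of the cluster-head selection step surveyed above, the sketch below implements a LEACH-style probabilistic election, one classic representative of this protocol family (the survey does not single out any particular protocol, and the parameters here are illustrative assumptions).

```python
import random

# LEACH-style cluster-head election: each node becomes a head with a rotating
# threshold so that roughly a fraction p of nodes serve per round and nodes that
# served recently sit out. `p`, `r` and the node set are hypothetical.
def elect_cluster_heads(node_ids, p=0.05, r=0, recently_served=None):
    recently_served = recently_served or set()
    threshold = p / (1 - p * (r % round(1 / p)))   # classic LEACH threshold T(n)
    heads = []
    for n in node_ids:
        if n in recently_served:
            continue
        if random.random() < threshold:
            heads.append(n)
    return heads

if __name__ == "__main__":
    print(elect_cluster_heads(list(range(100))))   # roughly 5 heads on average
```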
    2011,22(1):115-131 , DOI: 10.3724/SP.J.1001.2011.03950
    [Abstract] (14039) [HTML] (0) [PDF 845.91 K] (34057)
    Abstract:
    The Internet traffic model is a key issue for network performance management, quality of service management, and admission control. The paper first summarizes the primary characteristics and metrics of Internet traffic, and illustrates the significance and classification of traffic modeling. Next, it chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
    2004,15(4):571-583
    [Abstract] (13989) [HTML] (0) [PDF 1005.17 K] (13754)
    Abstract:
    For most peer-to-peer file-swapping applications, sharing is a voluntary action, and peers are not held responsible for their irresponsible bartering history. This indicates that trust between participants cannot be established simply by traditional trust mechanisms. A more reasonable approach to trust construction comes from social network analysis, in which trust relations between individuals are established upon the recommendations of other individuals. Current P2P trust models cannot guarantee the convergence of the iterative trust computation and take no account of security problems of the model itself, such as sybil attacks and slandering. This paper presents a novel recommendation-based global trust model and gives a distributed implementation method. Mathematical analyses and simulations show that, compared with current global trust models, the proposed model is more robust against trust security problems and more complete in the iterative computation of peer trust.
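    To illustrate what an iterative, recommendation-based global trust computation looks like, the sketch below runs an EigenTrust-style power iteration with pre-trusted peers. It is a generic illustration of the idea of aggregating recommendations to a fixed point, not the paper's own model or its security extensions; the local trust values and damping factor are made up.

```python
import numpy as np

def global_trust(local_trust, pretrusted=None, alpha=0.15, tol=1e-9, max_iter=1000):
    C = np.asarray(local_trust, dtype=float)
    # Row-normalize so each peer's recommendations sum to 1; peers with no
    # recommendations fall back to a uniform distribution.
    row_sums = C.sum(axis=1, keepdims=True)
    C = np.divide(C, row_sums, out=np.full_like(C, 1.0 / C.shape[1]), where=row_sums > 0)
    n = C.shape[0]
    p = np.full(n, 1.0 / n) if pretrusted is None else np.asarray(pretrusted, float)
    t = p.copy()
    for _ in range(max_iter):
        t_new = (1 - alpha) * C.T @ t + alpha * p   # blend with pre-trusted peers
        if np.linalg.norm(t_new - t, 1) < tol:
            break
        t = t_new
    return t

if __name__ == "__main__":
    local = [[0, 5, 1], [2, 0, 3], [4, 4, 0]]       # hypothetical satisfaction counts
    print(global_trust(local))                      # converged global trust vector
```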
    2009,20(1):11-29
    [Abstract] (13931) [HTML] (0) [PDF 787.30 K] (19322)
    Abstract:
    Constrained optimization problems (COPs) are mathematical programming problems frequently encountered in science and engineering applications. Solving COPs has become an important research area of evolutionary computation in recent years. In this paper, the state of the art of constrained optimization evolutionary algorithms (COEAs) is surveyed from the two basic aspects of COEAs (i.e., constraint-handling techniques and evolutionary algorithms). In addition, this paper discusses some important issues of COEAs, and several typical algorithms are analyzed in detail. Based on the analyses, it is concluded that, to obtain competitive results, a proper constraint-handling technique needs to be considered in conjunction with an appropriate search algorithm. Finally, the open research issues in this field are also pointed out.
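    As a minimal example of pairing a constraint-handling technique with a search algorithm, the sketch below uses a static penalty function (one of the simplest techniques in this family) inside a toy (mu+lambda) evolutionary loop. The problem, penalty weight, and operators are illustrative assumptions, not from the survey.

```python
import random

def objective(x):
    return (x[0] - 2) ** 2 + (x[1] - 1) ** 2        # minimize

def constraint_violation(x):
    # Inequality constraint g(x) = x0 + x1 - 2 <= 0; violation is the excess amount.
    return max(0.0, x[0] + x[1] - 2)

def penalized_fitness(x, weight=100.0):
    return objective(x) + weight * constraint_violation(x)

def evolve(pop_size=30, offspring=60, generations=200, sigma=0.2):
    pop = [[random.uniform(-5, 5), random.uniform(-5, 5)] for _ in range(pop_size)]
    for _ in range(generations):
        kids = []
        for _ in range(offspring):
            parent = random.choice(pop)
            kids.append([parent[0] + random.gauss(0, sigma),
                         parent[1] + random.gauss(0, sigma)])
        # (mu + lambda) survivor selection on the penalized fitness.
        pop = sorted(pop + kids, key=penalized_fitness)[:pop_size]
    return pop[0]

if __name__ == "__main__":
    best = evolve()
    print(best, objective(best), constraint_violation(best))
```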
    2013,24(1):50-66 , DOI: 10.3724/SP.J.1001.2013.04276
    [Abstract] (13752) [HTML] (0) [PDF 0.00 Byte] (21657)
    Abstract:
    As an important acceleration technique in the cloud, distributed caching has received considerable attention in industry and academia. This paper starts with a discussion on the combination of cloud computing and distributed caching technology, analyzing its characteristics, typical application scenarios, stages of development, standards, and the key elements that have promoted its development. In order to systematically survey the state-of-the-art progress and weak points of distributed caching technology, the paper builds a multi-dimensional framework, DctAF, which consists of six dimensions derived from analyzing the characteristics of cloud computing and the boundary of caching techniques. Based on DctAF, current techniques are analyzed and summarized, and comparisons among several influential products are made. Finally, the paper describes and highlights several challenges that cache systems face and examines current research through in-depth analysis and comparison.
    2008,19(zk):112-120
    [Abstract] (13732) [HTML] (0) [PDF 594.29 K] (19172)
    Abstract:
    An ad hoc network is a collection of wireless mobile nodes dynamically forming a temporary network without any existing network infrastructure or centralized administration. Due to the bandwidth constraints and dynamic topology of mobile ad hoc networks, multipath routing is a very important research issue. In this paper, we present an entropy-based metric to support stability multipath on-demand routing (SMDR). The key idea of the SMDR protocol is to construct a new entropy metric and to select stable multiple paths with its help, so as to reduce the number of route reconstructions and provide QoS guarantees in ad hoc networks whose topology changes continuously. Simulation results show that, with the proposed multipath routing protocol, packet delivery ratio, end-to-end delay, and routing overhead ratio can be improved in most cases. It is a viable approach to multipath routing decisions.
    2002,13(10):1952-1961
    [Abstract] (13596) [HTML] (0) [PDF 570.96 K] (18257)
    Abstract:
    The crucial technologies related to personalization are introduced in this paper, which include the representation and modification of user profile, the representation of resource, the recommendation technology, and the architecture of personalization. By comparing with some existing prototype systems, the key technologies about how to implement personalization are discussed in detail. In addition, three representative personalization systems are analyzed. At last, some research directions for personalization are presented.
    2003,14(9):1621-1628
    [Abstract] (13527) [HTML] (0) [PDF 680.35 K] (24485)
    Abstract:
    Recommendation systems are one of the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extreme sparsity of user rating data. Traditional similarity measures work poorly in this situation, degrading the quality of recommendation systems dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method first predicts the ratings of items that users have not rated by using item similarity, and then uses a new similarity measure to find the target users' neighbors. Experimental results show that this method can effectively alleviate the extreme sparsity of user rating data and provides better recommendation results than traditional collaborative filtering algorithms.
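    The first stage described in this abstract, filling unrated entries from item-item similarity before searching for user neighbors, can be sketched as follows. The cosine similarity and weighting used here are simplified assumptions, not the paper's exact measures, and the rating matrix is made up.

```python
import numpy as np

def cosine_sim(a, b):
    mask = (a > 0) & (b > 0)          # only users who rated both items
    if not mask.any():
        return 0.0
    num = float(a[mask] @ b[mask])
    den = np.linalg.norm(a[mask]) * np.linalg.norm(b[mask])
    return num / den if den else 0.0

def fill_missing_by_item_similarity(R):
    R = np.asarray(R, dtype=float)
    n_users, n_items = R.shape
    sims = np.array([[cosine_sim(R[:, i], R[:, j]) for j in range(n_items)]
                     for i in range(n_items)])
    filled = R.copy()
    for u in range(n_users):
        for i in range(n_items):
            if R[u, i] == 0:
                neigh = [j for j in range(n_items) if R[u, j] > 0 and sims[i, j] > 0]
                if neigh:
                    w = sims[i, neigh]
                    # Similarity-weighted average of the user's own ratings.
                    filled[u, i] = float(w @ R[u, neigh]) / w.sum()
    return filled

if __name__ == "__main__":
    ratings = [[5, 3, 0, 1],
               [4, 0, 0, 1],
               [1, 1, 0, 5],
               [0, 0, 5, 4]]
    print(np.round(fill_missing_by_item_similarity(ratings), 2))
```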
    2003,14(9):1635-1644
    [Abstract] (13428) [HTML] (0) [PDF 622.06 K] (18074)
    Abstract:
    Computer forensics is the technology field that attempts to provide thorough, efficient, and secure means to investigate computer crime. Computer evidence must be authentic, accurate, complete, and convincing to juries. In this paper, the stages of computer forensics are presented, and the theories and realization of forensics software are described. An example of forensic practice is also given. The deficiencies of computer forensics techniques and anti-forensics are also discussed. It is concluded that, with the advancement of computer science and technology, forensics techniques will become more integrated and thorough.
    2008,19(7):1565-1580
    [Abstract] (13398) [HTML] (0) [PDF 815.02 K] (20847)
    Abstract:
    Software defect prediction has been an active part of software engineering since it emerged in the 1970s. It plays a very important role in analyzing software quality and balancing software cost. This paper investigates and discusses the motivation, evolvement, solutions, and challenges of software defect prediction technologies, and categorizes, analyzes, and compares representative prediction technologies. Some case studies of software defect distribution models are given to aid understanding.
    2019,30(1):22-32 , DOI: 10.13328/j.cnki.jos.005648
    [Abstract] (13352) [HTML] (4463) [PDF 310.24 K] (6910)
    Abstract:
    This paper presents several new insights into system software, one of the basic concepts in the computing discipline, from three perspectives: essential features, characteristics of the times, and future development trends. The first insight is that system software stems theoretically and technically from the universal Turing machine and the idea of the stored program, with the essential feature of "manipulating the execution of a computing system"; there are two typical manipulation modes: encoding and then loading, and executing and controlling. The second insight is that, in the Internet age, system software is software that continuously provides substantial online services, which lays the foundation for the newly emerged "software-as-a-service" paradigm. The final insight concerns its development trend: system software will evolve online continuously. Driven by innovations in computing systems, the integration of cyber and physical spaces, and intelligence technologies, system software will become the core of the future software ecology.
    2012,23(1):82-96 , DOI: 10.3724/SP.J.1001.2012.04101
    [Abstract] (13263) [HTML] (0) [PDF 394.07 K] (19348)
    Abstract:
    Botnets are one of the most serious threats to the Internet. Researchers have done plenty of research and made significant progress. However, botnets keep evolving and have become more and more sophisticated. Due to the underlying security limitations of current systems and the Internet architecture, and the complexity of botnets themselves, how to effectively counter the global threat of botnets is still a very challenging issue. This paper first introduces the evolution of botnets' propagation, attack, command, and control mechanisms. Then it summarizes recent advances in botnet defense research and categorizes them into five areas: botnet monitoring, botnet infiltration, analysis of botnet characteristics, botnet detection, and botnet disruption. The limitations of current botnet defense techniques, the evolving trend of botnets, and some possible directions for future research are also discussed.
    2008,19(8):1947-1964
    [Abstract] (13213) [HTML] (0) [PDF 811.11 K] (16152)
    Abstract:
    Wide-spread deployment of interactive information visualization is difficult. Non-specialist users need a general development method and a toolkit that supports the generic data structures suited to tree, network, and multi-dimensional data, special visualization techniques and interaction techniques, and well-known generic information tasks. This paper presents a model-driven development method for interactive information visualization. First, an interactive information visualization interface model (IIVM) is proposed. Then, the development method for interactive information visualization based on the IIVM is presented. The Daisy toolkit is introduced, which includes the Daisy model builder, the Daisy IIV generator, and a runtime framework with the Daisy library. Finally, an application example is given. Experimental results show that Daisy can provide a general solution for the development of interactive information visualization.
  • Most downloaded articles (overall ranking, annual ranking, ranking by issue)
    Most viewed abstracts (overall ranking, annual ranking, ranking by issue)

    2003,14(7):1282-1291
    [Abstract] (38004) [HTML] (0) [PDF 832.28 K] (85365)
    Abstract:
    A sensor network, which is formed by the convergence of sensor, micro-electro-mechanical system, and network technologies, is a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecast. Combined with existing work, hot research topics, including power-aware routing and medium access control schemes, are discussed and presented in detail. Finally, taking account of application requirements, several future research directions are put forward.
    2008,19(1):48-61
    [Abstract] (29075) [HTML] (0) [PDF 671.39 K] (66864)
    Abstract:
    The research status and recent progress of clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and summarized in terms of their underlying ideas, key techniques, advantages, and disadvantages. Then, several typical clustering algorithms and well-known data sets are selected, and simulation experiments are conducted on both accuracy and running efficiency, analyzing how each algorithm behaves on different data sets and how different algorithms behave on the same data set. Finally, by integrating the two aspects above, the research hotspots, difficulties, and shortcomings of data clustering, as well as some open problems, are addressed. This work provides a valuable reference for research on data clustering and data mining.
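    As one concrete instance of the kind of accuracy-versus-running-time experiment this survey performs, the sketch below runs a plain k-means (a representative partition-based algorithm, chosen here only for illustration) on synthetic two-cluster data and times it. The data, k, and iteration count are assumptions.

```python
import random
import time

def kmeans(points, k, iters=50):
    centers = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            clusters[idx].append(p)
        new_centers = []
        for c, members in enumerate(clusters):
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])      # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

if __name__ == "__main__":
    data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)] + \
           [(random.gauss(5, 1), random.gauss(5, 1)) for _ in range(200)]
    start = time.perf_counter()
    centers, clusters = kmeans(data, k=2)
    print("centers:", [tuple(round(x, 2) for x in c) for c in centers],
          "sizes:", [len(c) for c in clusters],
          "time: %.3fs" % (time.perf_counter() - start))
```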
    2011,22(1):71-83 , DOI: 10.3724/SP.J.1001.2011.03958
    [Abstract] (30627) [HTML] (0) [PDF 781.42 K] (62570)
    Abstract:
    Cloud computing is a fundamental change happening in the field of information technology, representing a movement towards intensive, large-scale specialization. At the same time, it brings not only convenience and efficiency, but also great challenges to data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the major security requirements in cloud computing, key security technologies, standards, and regulations, and provides a cloud computing security framework. The paper argues that changes in these aspects will result in a technical revolution in the field of information security.
    2010,21(8):1834-1848
    [Abstract] (21654) [HTML] (0) [PDF 682.96 K] (62505)
    Abstract:
    This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
    2009,20(1):54-66
    [Abstract] (20124) [HTML] (0) [PDF 1.41 M] (55709)
    Abstract:
    Network community structure is one of the most fundamental and important topological properties of complex networks: links within a community are very dense, while links between communities are quite sparse. Network clustering algorithms, which aim to discover all natural communities in a given complex network, are fundamentally important for both theoretical research and practical applications, and can be used to analyze topological structures, understand functions, recognize hidden patterns, and predict the behaviors of complex networks such as social networks, biological networks, and the World Wide Web. This paper reviews the background, motivation, state of the art, and main issues of existing work on discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to researchers in complex network analysis, data mining, intelligent Web, and bioinformatics.
    2009,20(2):271-289
    [Abstract] (27853) [HTML] (0) [PDF 675.56 K] (50255)
    Abstract:
    Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing EMO algorithms developed before 2003, the recent advances in EMO are discussed in detail. The current research directions are concluded. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have emerged. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints for the future research of EMO are proposed.
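    The Pareto-dominance relation that most EMO algorithms (and the alternative dominance schemes mentioned above) build on can be stated in a few lines; the sketch below checks dominance and extracts the non-dominated set of a small objective-vector list. Real algorithms such as NSGA-II add selection, diversity maintenance, and variation on top of this; the sample points are made up.

```python
def dominates(a, b):
    # a dominates b (minimization): no worse in every objective, strictly better in one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

if __name__ == "__main__":
    objs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 2.5)]
    print(non_dominated(objs))   # (3.0, 4.0) is dominated by (2.0, 3.0) and drops out
```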
    2009,20(5):1337-1348
    [Abstract] (28713) [HTML] (0) [PDF 1.06 M] (49436)
    Abstract:
    This paper surveys the technologies currently adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects: one is the cloud infrastructure, which is the building block for upper-layer cloud applications; the other is, of course, the cloud applications themselves. This paper focuses on the cloud infrastructure, including existing systems and current research, and some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large-scale clusters containing a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure so that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software built on top of redundant hardware rather than by hardware alone. All these technologies serve the two important goals of distributed systems: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to a very large scale, even to thousands of nodes. Availability means that the services remain available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
    2014,25(9):1889-1908 , DOI: 10.13328/j.cnki.jos.004674
    [Abstract] (12274) [HTML] (5658) [PDF 550.98 K] (46421)
    Abstract:
    This paper first introduces the key features of big data in different processing modes and their typical application scenarios, as well as corresponding representative processing systems. It then summarizes three development trends of big data processing systems. Next, the paper gives a brief survey on system supported analytic technologies and applications (including deep learning, knowledge computing, social computing, and visualization), and summarizes the key roles of individual technologies in big data analysis and understanding. Finally, the paper lays out three grand challenges of big data processing and analysis, i.e., data complexity, computation complexity, and system complexity. Potential ways for dealing with each complexity are also discussed.
    2009,20(2):350-362
    [Abstract] (16883) [HTML] (0) [PDF 1.39 M] (45561)
    Abstract:
    This paper makes a comprehensive survey of recommender system research, aiming to help readers understand this field. First, the research background is introduced, including commercial application demands, academic institutes, conferences, and journals. After formally and informally describing the recommendation problem, a comparative study is conducted on categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are exhibited, and the main difficulties and future directions are concluded.
    2010,21(3):427-437
    [Abstract] (33451) [HTML] (0) [PDF 308.76 K] (44365)
    Abstract:
    Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports some pioneering research on a genetic algorithm for the automatic generation of SONGCI. In light of the characteristics of ancient Chinese poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette wheel, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system constructed on the basis of the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic generation of Chinese poetry.
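    The operator pipeline this abstract lists (elitism plus roulette-wheel selection, crossover, heuristic mutation) follows the standard genetic-algorithm skeleton sketched below. The real system encodes level/oblique tonal patterns and scores candidates with a syntactic/semantic weighted fitness; both are stubbed out here with a toy alphabet and a toy string-matching fitness, so this is only an illustration of the loop structure.

```python
import random

ALPHABET = "abcdefgh"           # stand-in for the real tonal/word encoding
TARGET = "cadge"                # toy fitness: similarity to a fixed string

def fitness(ind):
    return sum(c == t for c, t in zip(ind, TARGET))

def roulette(pop, fits):
    total = sum(fits) or 1e-9
    r = random.uniform(0, total)
    acc = 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]

def evolve(pop_size=60, generations=200, pm=0.1, elite=2):
    pop = ["".join(random.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(i) for i in pop]
        ranked = [i for _, i in sorted(zip(fits, pop), reverse=True)]
        next_pop = ranked[:elite]                        # elitism keeps the best as-is
        while len(next_pop) < pop_size:
            a, b = roulette(pop, fits), roulette(pop, fits)
            cut = random.randrange(1, len(TARGET))       # one-point crossover stub
            child = list(a[:cut] + b[cut:])
            if random.random() < pm:                     # mutation stub
                child[random.randrange(len(child))] = random.choice(ALPHABET)
            next_pop.append("".join(child))
        pop = next_pop
    return max(pop, key=fitness)

if __name__ == "__main__":
    print(evolve())   # usually converges to the toy target string
```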
    2021,32(2):349-369 , DOI: 10.13328/j.cnki.jos.006138
    [Abstract] (9561) [HTML] (12321) [PDF 2.36 M] (43376)
    Abstract:
    Few-shot learning refers to learning models that solve problems from small samples. In recent years, under the trend of training models with big data, machine learning and deep learning have achieved success in many fields. However, in many real-world application scenarios, there is not a large amount of data or labeled data for model training, and labeling a large number of unlabeled samples costs a lot of manpower. Therefore, how to learn from a small number of samples has become a problem that deserves attention. This paper systematically reviews current approaches to few-shot learning. It introduces the corresponding models in three categories: fine-tuning based, data augmentation based, and transfer learning based. The data augmentation based approaches are further subdivided into unlabeled data based, data generation based, and feature augmentation based approaches, and the transfer learning based approaches into metric learning based, meta-learning based, and graph neural network based methods. The paper then summarizes the few-shot datasets and the experimental results of the aforementioned models. Next, it summarizes the current situation and challenges in few-shot learning. Finally, the future technological development of few-shot learning is prospected.
    2004,15(10):1493-1504
    [Abstract] (9369) [HTML] (0) [PDF 937.72 K] (43355)
    Abstract:
    The graphics processing unit (GPU) has been developing rapidly in recent years at a speed exceeding Moore's law, and as a result, various applications associated with computer graphics have advanced greatly. At the same time, the high processing power, parallelism, and programmability available on contemporary GPUs provide an ideal platform for general-purpose computation. Starting from an introduction to the development history and architecture of the GPU, the technical fundamentals of the GPU are described in this paper. Then, in the main part of the paper, the development of various applications of general-purpose computation on the GPU is introduced, and among those applications, fluid dynamics, algebraic computation, database operations, and spectrum analysis are introduced in detail. Our own experience with fluid dynamics is also given, and the development of software tools in this area is introduced. Finally, a conclusion is made, and future developments and new challenges in both hardware and software in this field are discussed.
    2013,24(11):2476-2497 , DOI: 10.3724/SP.J.1001.2013.04486
    [Abstract] (10931) [HTML] (0) [PDF 1.14 M] (42893)
    Abstract:
    Probabilistic graphical models are powerful tools for compactly representing complex probability distributions, efficiently computing (approximate) marginal and conditional distributions, and conveniently learning parameters and hyperparameters in probabilistic models. As a result, they have been widely used in applications that require some sort of automated probabilistic reasoning, such as computer vision and natural language processing, as a formal approach to deal with uncertainty. This paper surveys the basic concepts and key results of representation, inference and learning in probabilistic graphical models, and demonstrates their uses in two important probabilistic models. It also reviews some recent advances in speeding up classic approximate inference algorithms, followed by a discussion of promising research directions.
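    For readers new to the inference problem this survey covers, the following micro-example computes an exact marginal on a tiny chain A -> B -> C by summing out variables, the simplest instance of the variable-elimination idea behind many of the surveyed algorithms. The conditional probability tables are made up for illustration.

```python
# CPTs for a chain A -> B -> C; keys are (child_value, parent_value).
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}
P_C_given_B = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}

def marginal_C():
    # Eliminate A first, then B: P(c) = sum_b P(c|b) * sum_a P(b|a) * P(a)
    P_B = {b: sum(P_B_given_A[(b, a)] * P_A[a] for a in (0, 1)) for b in (0, 1)}
    return {c: sum(P_C_given_B[(c, b)] * P_B[b] for b in (0, 1)) for c in (0, 1)}

if __name__ == "__main__":
    print(marginal_C())   # {0: 0.65, 1: 0.35}; the two values sum to 1
```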
    2022,33(7):2464-2481 , DOI: 10.13328/j.cnki.jos.006585
    [Abstract] (1476) [HTML] (3502) [PDF 2.00 M] (42185)
    Abstract:
    Symbolic propagation methods based on linear abstraction play a significant role in neural network verification. This study proposes the notion of multi-path back-propagation for these methods. Existing methods are viewed as using only a single back-propagation path to calculate the upper and lower bounds of each node in a given neural network, being specific instances of the proposed notion. Leveraging multiple back-propagation paths effectively improves the accuracy of this kind of method. For evaluation, the proposed method is quantitatively compared using multiple back-propagation paths with the state-of-the-art tool DeepPoly on benchmarks ACAS Xu, MNIST, and CIFAR10. The experiment results show that the proposed method achieves significant accuracy improvement while introducing only a low extra time cost. In addition, the multi-path back-propagation method is compared with the Optimized LiRPA based on global optimization, on the dataset MNIST. The results show that the proposed method still has an accuracy advantage.
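    To give a feel for bound propagation in neural network verification, the sketch below performs plain interval bound propagation through affine and ReLU layers, the coarsest member of this family. DeepPoly-style linear relaxations and the multi-path back-propagation proposed in the paper tighten these bounds considerably; the weights and input box here are illustrative only.

```python
import numpy as np

def affine_bounds(lo, up, W, b):
    # Sound output bounds of x -> Wx + b when x lies in the box [lo, up].
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    new_lo = W_pos @ lo + W_neg @ up + b
    new_up = W_pos @ up + W_neg @ lo + b
    return new_lo, new_up

def relu_bounds(lo, up):
    return np.maximum(lo, 0), np.maximum(up, 0)

if __name__ == "__main__":
    lo, up = np.array([-1.0, -1.0]), np.array([1.0, 1.0])         # input box
    W1, b1 = np.array([[1.0, -2.0], [0.5, 1.0]]), np.array([0.0, 0.5])
    W2, b2 = np.array([[1.0, 1.0]]), np.array([0.0])
    lo, up = relu_bounds(*affine_bounds(lo, up, W1, b1))
    lo, up = affine_bounds(lo, up, W2, b2)
    print("output bounds:", lo, up)   # every concrete output is guaranteed inside
```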
    2016,27(1):45-71 , DOI: 10.13328/j.cnki.jos.004914
    [Abstract] (30553) [HTML] (5487) [PDF 880.96 K] (41679)
    Abstract:
    Android is a modern and highly popular software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever, with Apple, Microsoft, Blackberry, and Firefox trailing a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
    2012,23(1):1-20 , DOI: 10.3724/SP.J.1001.2012.04100
    [Abstract] (16097) [HTML] (0) [PDF 1017.73 K] (39019)
    Abstract:
    Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
    2018,29(5):1471-1514 , DOI: 10.13328/j.cnki.jos.005519
    [Abstract] (6831) [HTML] (7284) [PDF 4.38 M] (38576)
    Abstract:
    Computer-aided detection/diagnosis (CAD) can improve the accuracy of diagnosis, reduce false positives, and provide decision support for doctors. The main purpose of this paper is to analyze the latest developments in computer-aided diagnosis tools. Focusing on the incidence sites of the top four fatal cancers, major recent publications on CAD applications in different medical imaging areas are reviewed according to different imaging techniques and diseases. Furthermore, a multidimensional analysis is made of the research in terms of image data sets, algorithms, and evaluation methods. Finally, the existing problems, research trends, and development directions in the field of medical image CAD systems are discussed.
    2020,31(7):2245-2282 , DOI: 10.13328/j.cnki.jos.006037
    [Abstract] (3372) [HTML] (7068) [PDF 967.02 K] (38247)
    Abstract:
    Ultrasonography is the first choice of imaging examination and preoperative evaluation for thyroid and breast cancer. However, the ultrasonic characteristics of benign and malignant nodules commonly overlap, so the diagnosis heavily relies on the operator's experience rather than quantitative and stable methods. In recent years, medical image analysis based on computer technology has developed rapidly, and a series of landmark breakthroughs have been made, providing effective decision support for medical imaging diagnosis. This work studies the research progress of computer vision and image recognition technologies for thyroid and breast ultrasound images. The key technologies involved in the automatic diagnosis of ultrasound images form the main line of the work: the major algorithms of recent years are summarized and analyzed, including ultrasound image preprocessing, lesion localization and segmentation, and feature extraction and classification. Moreover, a multi-dimensional analysis is made of the algorithms, data sets, and evaluation methods. Finally, the existing problems in the automatic analysis of these two kinds of ultrasound images are discussed, and research trends and development directions in the field of ultrasound image analysis are outlined.
    2015,26(1):62-81 , DOI: 10.13328/j.cnki.jos.004701
    [Abstract] (37765) [HTML] (6512) [PDF 1.04 M] (37794)
    Abstract:
    Network abstraction brings about the birth of software-defined networking (SDN). SDN decouples the data plane and the control plane, and simplifies network management. The paper starts with a discussion of the background of the emergence and development of SDN, outlining its architecture, which includes the data layer, control layer, and application layer. Then the key technologies are elaborated according to the hierarchical architecture of SDN; the characteristics of consistency, availability, and tolerance are especially analyzed. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized in the end.
    2012,23(4):962-986 , DOI: 10.3724/SP.J.1001.2012.04175
    [Abstract] (19325) [HTML] (0) [PDF 2.09 M] (37317)
    Abstract:
    Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, so failures of computers or data are common. The huge number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure cost and power cost. Therefore, fault tolerance, scalability, and power consumption of the distributed storage of a data center become key issues in cloud computing technology for ensuring data availability and reliability. In this paper, a survey is made of the state of the art of the key technologies in cloud computing in the following aspects: design of the data center network, organization and placement of data, strategies to improve fault tolerance, and methods to save storage space and energy. Firstly, several classical topologies of data center networks are introduced and compared. Secondly, current fault-tolerant storage techniques are discussed, and data replication and erasure-code strategies are compared in particular. Thirdly, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
    2012,23(1):32-45 , DOI: 10.3724/SP.J.1001.2012.04091
    [Abstract] (18988) [HTML] (0) [PDF 408.86 K] (36825)
    Abstract:
    In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can be expanded cost-effectively are needed to deal with such big data. Relational data management technology has gone through a history of nearly 40 years, but it now faces a tough obstacle in scalability, as relational techniques cannot handle large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and expanded their applications from Web search to territories that used to be occupied by relational database systems. They confront relational technology with high availability, high scalability, and massive parallel processing capability. The relational community, after losing the Web search battle, has begun to learn from MapReduce, while MapReduce also borrows valuable ideas from the relational community to improve performance. Relational techniques and MapReduce compete with and learn from each other; a new data analysis platform and ecosystem are emerging. Eventually the two camps of techniques will find their places in the new ecosystem of big data analysis.
    2005,16(5):857-868
    [Abstract] (20092) [HTML] (0) [PDF 489.65 K] (34837)
    Abstract:
    Wireless sensor networks, a novel technology for acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the network, is challenging and yet extremely crucial for many applications. In this paper, the evaluation criteria and taxonomy for wireless sensor network self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed, and directions for future research in this area are introduced.
    2013,24(1):77-90 , DOI: 10.3724/SP.J.1001.2013.04339
    [Abstract] (11469) [HTML] (0) [PDF 0.00 Byte] (34149)
    Abstract:
    Task parallel programming model is a widely used parallel programming model on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and the supporting mechanism used in task parallel programming models and discusses issues and the latest achievements from three perspectives: Parallelism expression, data management and task scheduling. In the end, some future trends in this area are discussed.
    2011,22(1):115-131 , DOI: 10.3724/SP.J.1001.2011.03950
    [Abstract] (14039) [HTML] (0) [PDF 845.91 K] (34057)
    Abstract:
    The Internet traffic model is a key issue for network performance management, quality of service management, and admission control. The paper first summarizes the primary characteristics and metrics of Internet traffic, and illustrates the significance and classification of traffic modeling. Next, it chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
    2017,28(4):959-992 , DOI: 10.13328/j.cnki.jos.005143
    [Abstract] (23219) [HTML] (8027) [PDF 3.58 M] (33071)
    Abstract:
    The development of the mobile Internet and the popularity of mobile terminals produce massive trajectory data of moving objects in the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in cities, and the changes of the atmospheric environment. However, trajectory data can also be exploited to disclose moving objects' private information (e.g., behaviors, hobbies, and social relationships); accordingly, attackers can easily access moving objects' private information by digging into their trajectory data, such as activities and check-in locations. On another front of research, quantum computation presents an important theoretical direction for mining big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to handling trajectory big data could make some complex problems solvable and achieve higher efficiency. This paper reviews the key technologies for processing trajectory data. First, the concept and characteristics of trajectory data are introduced, and the preprocessing methods, including noise filtering and data compression, are summarized. Then, trajectory indexing and querying techniques, and the current achievements in mining trajectory data, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preservation for trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing frameworks and data visualization, are presented in detail. Some possible ways of applying quantum computation to trajectory data processing, as well as quantum implementations of some core trajectory mining algorithms, are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
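    As an example of the trajectory-compression preprocessing step mentioned above, the sketch below implements Douglas-Peucker line simplification, one widely used choice for that step (the survey itself does not single out a specific algorithm). The tolerance and the sample trajectory are illustrative.

```python
import math

def point_line_distance(p, a, b):
    # Perpendicular distance from point p to the segment's supporting line a-b.
    if a == b:
        return math.dist(p, a)
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    num = abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1))
    return num / math.dist(a, b)

def douglas_peucker(points, epsilon):
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]          # drop all interior points
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right                    # avoid duplicating the split point

if __name__ == "__main__":
    traj = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
    print(douglas_peucker(traj, epsilon=0.5))   # keeps only the shape-defining points
```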
    2010,21(2):344-358
    [Abstract] (8622) [HTML] (0) [PDF 1.01 M] (31314)
    Abstract:
    In this paper, existing intrusion tolerance and self-destruction technologies are integrated into autonomic computing in order to construct an autonomic dependability model based on SM-PEPA (semi-Markov performance evaluation process algebra), which is capable of formal analysis and verification. It can hierarchically anticipate threats to dependability (TtD) at different levels in a self-management manner to satisfy the special dependability requirements of mission-critical systems. Based on this model, a quantification approach is proposed from the view of steady-state probability to evaluate autonomic dependability. Finally, this paper analyzes the impacts of the model parameters on autonomic dependability in a case study, and the experimental results demonstrate that improving the detection rate of TtD as well as the success rate of self-healing will greatly increase autonomic dependability.
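    The steady-state probabilities underlying such an evaluation can be illustrated on a much simpler model than SM-PEPA: the sketch below solves the stationary distribution of a small continuous-time Markov chain with hypothetical healthy / threatened / self-healing states and made-up rates. The paper's semi-Markov model is considerably richer; this only shows the kind of quantity being computed.

```python
import numpy as np

# Generator matrix Q (rows sum to zero); states: healthy, threatened, self-healing.
Q = np.array([[-0.10,  0.10,  0.00],   # healthy -> threatened (a TtD appears)
              [ 0.00, -0.50,  0.50],   # threatened -> self-healing (TtD detected)
              [ 0.90,  0.00, -0.90]])  # self-healing -> healthy (repair succeeds)

def steady_state(Q):
    n = Q.shape[0]
    # Solve pi Q = 0 together with sum(pi) = 1 by replacing one balance equation.
    A = np.vstack([Q.T[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

if __name__ == "__main__":
    pi = steady_state(Q)
    print("steady-state probabilities:", np.round(pi, 4))   # long-run time in each state
```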
    2011,22(6):1299-1315 , DOI: 10.3724/SP.J.1001.2011.03993
    [Abstract] (11855) [HTML] (0) [PDF 987.90 K] (30897)
    Abstract:
    An attribute-based encryption (ABE) scheme takes attributes as the public key and associates the ciphertext and the user's secret key with attributes, so that it can support expressive access control policies. This dramatically reduces the network bandwidth and the sending node's operation cost in fine-grained access control of data sharing. Therefore, ABE has broad application prospects in the area of fine-grained access control. After analyzing the basic ABE system and its two variants, key-policy ABE (KP-ABE) and ciphertext-policy ABE (CP-ABE), this study elaborates on the research problems relating to ABE systems, including access structure design for CP-ABE, attribute key revocation, key abuse, and multi-authority ABE, with an extensive comparison of their functionality and performance. Finally, this study discusses the problems that remain to be solved and the main research directions in ABE.
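    To make "expressive access control policies over attributes" concrete, the conceptual sketch below evaluates a CP-ABE style AND/OR policy tree against a user's attribute set; it performs no cryptography, and the policy encoding and names are illustrative assumptions rather than any ABE library's API.

```python
# Conceptual policy-satisfaction check only: in CP-ABE, decryption succeeds exactly when the
# user's attribute set satisfies the access policy attached to the ciphertext.
def satisfies(policy, attributes):
    """policy is a nested ('AND'|'OR', [children]) tree whose leaves are attribute strings."""
    if isinstance(policy, str):                       # leaf: a single required attribute
        return policy in attributes
    op, children = policy
    results = (satisfies(child, attributes) for child in children)
    return all(results) if op == "AND" else any(results)

# "(doctor AND cardiology) OR auditor"
policy = ("OR", [("AND", ["doctor", "cardiology"]), "auditor"])
print(satisfies(policy, {"doctor", "cardiology"}))    # True
print(satisfies(policy, {"doctor", "neurology"}))     # False
```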
    2009,20(3):524-545
    [Abstract] (17696) [HTML] (0) [PDF 1.09 M] (29316)
    Abstract:
    Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, the research on software process modeling and analysis is to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process. Thus, the enforcement of the process model can directly contribute to the improvement of software quality. In this paper, a systematic review is carried out to survey the recent development in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in the current research? After providing the systematic review, we present our software process modeling method based on a multi-dimensional and integration methodology that is intended to address several core issues facing the community.
    2014,25(1):37-50 , DOI: 10.13328/j.cnki.jos.004497
    [Abstract] (10789) [HTML] (6079) [PDF 929.87 K] (29149)
    Abstract:
    This paper surveys the state of the art of speech emotion recognition (SER) and presents an outlook on the trend of future SER technology. First, the survey summarizes and analyzes SER in detail from five perspectives, including emotion representation models, representative emotional speech corpora, emotion-related acoustic feature extraction, SER methods, and applications. Then, based on the survey, the challenges faced by current SER research are summarized. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, and presents a detailed comparison and analysis of these methods.
    2018,29(10):2966-2994 , DOI: 10.13328/j.cnki.jos.005551
    [Abstract] (10765) [HTML] (7748) [PDF 610.06 K] (28048)
    Abstract:
    In recent years, the rapid development of Internet technology and Web applications has triggered the explosion of various data on the Internet, which generates a large amount of valuable knowledge. How to organize, represent, and analyze this knowledge has attracted much attention. The knowledge graph was thus developed to organize this knowledge in a semantic and visual manner. Knowledge reasoning over knowledge graphs then becomes one of the hot research topics and plays an important role in many applications such as vertical search and intelligent question answering. The goal of knowledge reasoning over knowledge graphs is to infer new facts or identify erroneous facts according to existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graphs is more diversified, due to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graphs. Starting with the basic concept of knowledge reasoning, this paper presents a survey on the recently developed methods for knowledge reasoning over knowledge graphs. Specifically, the research progress is reviewed in detail from two aspects: one-step reasoning and multi-step reasoning, each including rule-based reasoning, distributed-embedding-based reasoning, neural-network-based reasoning, and hybrid reasoning. Finally, future research directions and an outlook of knowledge reasoning over knowledge graphs are discussed.
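    As an editorial illustration of one family mentioned above, distributed-embedding-based reasoning, the sketch below scores candidate triples with a TransE-style distance ||h + r - t||; the entities, relation, and random (untrained) vectors are assumptions for demonstration only.

```python
# TransE-style triple scoring (illustrative; embeddings here are random, not trained).
import numpy as np

rng = np.random.default_rng(0)
dim = 16
entities = {name: rng.normal(size=dim) for name in ["Beijing", "China", "Paris", "France"]}
relations = {"capital_of": rng.normal(size=dim)}

def transe_score(h, r, t):
    """Lower score means the triple (h, r, t) is considered more plausible."""
    return np.linalg.norm(entities[h] + relations[r] - entities[t])

# Rank candidate tails for the query (Beijing, capital_of, ?)
candidates = sorted(entities, key=lambda t: transe_score("Beijing", "capital_of", t))
print(candidates)   # with trained embeddings, "China" would be expected to rank first
```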
    2018,29(10):3068-3090 , DOI: 10.13328/j.cnki.jos.005607
    [Abstract] (9767) [HTML] (10414) [PDF 2.28 M] (27398)
    Abstract:
    Design problems are ubiquitous in scientific research and industrial applications. In recent years, Bayesian optimization, a very effective global optimization algorithm, has been widely applied to design problems. By structuring the probabilistic surrogate model and the acquisition function appropriately, the Bayesian optimization framework can obtain the optimal solution within a small number of function evaluations, and thus it is very suitable for solving extremely complex optimization problems whose objective functions cannot be expressed analytically, or are non-convex, multimodal, and computationally expensive. This paper provides a detailed analysis of Bayesian optimization in terms of methodology and application areas, and discusses its research status and the problems for future research. This work is hopefully beneficial to researchers from the related communities.
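    The loop sketched below is an editorial illustration of the framework described above: a Gaussian-process surrogate is refit after every evaluation, and an expected-improvement acquisition function picks the next point. The library choice (scikit-learn), the toy objective, and the grid of candidates are our assumptions, not details from the surveyed paper.

```python
# Minimal Bayesian optimization sketch: GP surrogate + expected improvement (minimization).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):                        # expensive black-box function (toy stand-in)
    return np.sin(3 * x) + 0.5 * x

def expected_improvement(x_cand, gp, y_best):
    mu, sigma = gp.predict(x_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3, 1))      # a few initial evaluations
y = objective(X).ravel()
grid = np.linspace(-2, 2, 200).reshape(-1, 1)

for _ in range(10):                      # each iteration spends exactly one function evaluation
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    ei = expected_improvement(grid, gp, y.min())
    x_next = grid[np.argmax(ei)].reshape(1, 1)
    X, y = np.vstack([X, x_next]), np.append(y, objective(x_next).ravel())

print("best x, f(x):", X[np.argmin(y)].item(), y.min())
```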
    2013,24(4):825-842 , DOI: 10.3724/SP.J.1001.2013.04369
    [Abstract] (9124) [HTML] (0) [PDF 1.09 M] (27242)
    Abstract:
    Honeypot is a proactive defense technology introduced by the defense side to change the asymmetric situation of the network attack and defense game. Through the deployment of honeypots, i.e. security resources without any production purpose, defenders can deceive attackers into illegally taking advantage of the honeypots, capture and analyze the attack behaviors to understand the attack tools and methods, and learn the attackers' intentions and motivations. Honeypot technology has won the sustained attention of the security community, made considerable progress, and gained wide application, and it has become one of the main technical means of Internet security threat monitoring and analysis. In this paper, the origin and evolution process of honeypot technology are presented first. Next, the key mechanisms of honeypot technology are comprehensively analyzed, the development process of honeypot deployment structures is reviewed, and the latest applications of honeypot technology in the directions of Internet security threat monitoring, analysis, and prevention are summarized. Finally, the problems of honeypot technology, its development trends, and further research directions are discussed.
    2004,15(11):1583-1594
    [Abstract] (9491) [HTML] (0) [PDF 1.57 M] (26971)
    Abstract:
    Uncertainty exists widely in the subjective and objective world. Among all kinds of uncertainty, randomness and fuzziness are the most important and fundamental. In this paper, the relationship between randomness and fuzziness is discussed. Uncertain states and their changes can be measured by entropy and hyper-entropy respectively. Taking advantage of entropy and hyper-entropy, the uncertainty of chaos, fractals, and complex networks through their various evolution and differentiation is further studied. A simple and effective way is proposed to simulate uncertainty by means of knowledge representation, which provides a basis for the automation of both logical and image thinking with uncertainty. AI (artificial intelligence) with uncertainty is a new cross-discipline, which covers computer science, physics, mathematics, brain science, psychology, cognitive science, biology, and philosophy, and results in the automation of representation, processing, and thinking for uncertain information and knowledge.
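    One common way to picture entropy and hyper-entropy as measures of uncertainty is a forward normal cloud generator, sketched below as an editorial illustration; attributing this exact construction to the paper is our assumption, and all parameter values are made up.

```python
# Forward normal cloud generator (illustrative): entropy (En) measures the spread of a concept,
# hyper-entropy (He) measures the uncertainty of that spread itself.
import numpy as np

def forward_cloud(ex, en, he, n, rng=np.random.default_rng(0)):
    """Generate n cloud drops (x, membership) for a concept with expectation ex."""
    en_prime = rng.normal(en, he, size=n)               # uncertain entropy: hyper-entropy at work
    x = rng.normal(ex, np.abs(en_prime))                # drops scattered around the expectation
    mu = np.exp(-(x - ex) ** 2 / (2 * en_prime ** 2))   # certainty degree of each drop
    return x, mu

drops, certainty = forward_cloud(ex=25.0, en=3.0, he=0.3, n=1000)
print(drops[:3], certainty[:3])
```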
    2009,20(1):124-137
    [Abstract] (17386) [HTML] (0) [PDF 1.06 M] (26833)
    Abstract:
    The emergence of numerous intelligent devices equipped with short-range wireless communication boosts the rapid rise of wireless ad hoc network applications. However, in many realistic application environments, nodes form a disconnected network most of the time due to nodal mobility, low density, lossy links, etc. The conventional communication model of mobile ad hoc networks (MANETs) requires that at least one path exist from the source to the destination node, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages in a hop-by-hop way, and implement communication between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has captured great interest from researchers. This paper first introduces the concepts and theories of opportunistic networks and some current typical applications. Then it elaborates on the popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Some other interesting research points such as communication middleware, cooperation and security problems, and new applications are stated briefly. Finally, the paper concludes and looks forward to possible research focuses for opportunistic networks in the future.
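    The "store-carry-forward" pattern can be illustrated with a toy simulation in which mobile nodes buffer a message and hand it to every node that drifts into contact range; the mobility model, ranges, and epidemic forwarding rule below are editorial assumptions, not the paper's protocols.

```python
# Toy store-carry-forward (epidemic) delivery in a disconnected mobile network.
import numpy as np

rng = np.random.default_rng(1)
num_nodes, contact_range, steps = 20, 0.1, 500
pos = rng.random((num_nodes, 2))                  # node positions in the unit square
carriers = {0}                                    # node 0 stores the message; node 19 is the destination

for t in range(steps):
    pos = np.clip(pos + rng.normal(0, 0.02, pos.shape), 0, 1)   # small random movement each step
    for a in list(carriers):
        near = np.linalg.norm(pos - pos[a], axis=1) < contact_range
        carriers |= set(np.flatnonzero(near))     # forward on every contact opportunity
    if 19 in carriers:
        print(f"delivered at step {t} via {len(carriers)} carriers")
        break
else:
    print("not delivered within the simulated time")
```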
    2010,21(7):1605-1619
    [Abstract] (10196) [HTML] (0) [PDF 856.25 K] (26619)
    Abstract:
    The rapid development of the Internet leads to an increase in system complexity and uncertainty. Traditional network management cannot meet the requirements, and it should evolve toward fusion-based cyberspace situational awareness (CSA). Based on the analysis of functional shortcomings and development requirements, this paper introduces CSA as well as its origin, concept, objectives, and characteristics. First, a CSA research framework is proposed and the research history is investigated, based on which the main aspects and the existing issues of the research are analyzed. Meanwhile, assessment methods are divided into three categories: mathematical models, knowledge reasoning, and pattern recognition. Then, this paper discusses CSA from three aspects, namely models, knowledge representation, and assessment methods, and goes into detail about the main ideas, assessment processes, merits, and shortcomings of novel methods. Many typical methods are compared. The current application research of CSA in the fields of security, transmission, survivability, system evaluation, and so on is presented. Finally, this paper points out the development directions of CSA and offers conclusions from the perspectives of the issue system, technical system, and application system.
    2019,30(2):440-468 , DOI: 10.13328/j.cnki.jos.005659
    [Abstract] (9627) [HTML] (8818) [PDF 3.27 M] (26504)
    Abstract:
    In recent years, deep learning (DL) has been widely applied to image semantic segmentation (ISS) due to its state-of-the-art performance and high-quality results. This paper systematically reviews the contribution of DL to the field of ISS. Different methods of ISS based on DL (ISSbDL) are summarized. These methods are divided into ISS based on regional classification (ISSbRC) and ISS based on pixel classification (ISSbPC) according to the image segmentation characteristics and segmentation granularity. Then, the methods of ISSbPC are surveyed from two points of view: ISS based on fully supervised learning (ISSbFSL) and ISS based on weakly supervised learning (ISSbWSL). The representative algorithms of each method are introduced and analyzed, and the basic workflow, framework, advantages, and disadvantages of these methods are analyzed and compared in detail. In addition, the related experiments of ISS are analyzed and summarized, and the common data sets and performance evaluation metrics in ISS experiments are introduced. Finally, possible research directions and trends are given and analyzed.
    2011,22(3):381-407 , DOI: 10.3724/SP.J.1001.2011.03934
    [Abstract] (10711) [HTML] (0) [PDF 614.69 K] (26359)
    Abstract:
    The popularity of the Internet and the boom of the World Wide Web foster innovative changes in software technology that give birth to a new form of software, networked software, which delivers diversified and personalized on-demand services to the public. With the ever-increasing expansion of applications and users, the scale and complexity of networked software are growing beyond the information processing capability of human beings, which brings software engineers a series of challenges. In order to come to a scientific understanding of this kind of ultra-large-scale artificial complex system, a survey on the infrastructure, application services, and social interactions of networked software is conducted from a three-dimensional perspective of cyberization, servicization, and socialization. Interestingly enough, most of them have been found to share the same global characteristics of complex networks, such as "small world" and "scale free". Next, the impact of the empirical study on software engineering research and practice and its implications for further investigations are systematically set forth. The convergence of software engineering and other disciplines will put forth new ideas and thoughts, breed a new way of thinking, and supply new methodologies for the study of networked software. This convergence is also expected to achieve innovations in the theories, methods, and key technologies of software engineering and to promote the rapid development of the software service industry in China.
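    As an editorial aside, the two global characteristics mentioned above ("small world" and "scale free") can be checked on any software network with standard graph tooling; the sketch below uses a synthetic Barabasi-Albert graph as a stand-in for real dependency data.

```python
# Small-world and scale-free indicators on a synthetic graph (illustrative stand-in for real data).
import networkx as nx

G = nx.barabasi_albert_graph(n=1000, m=3, seed=0)           # stand-in for a software network
print("average clustering:", nx.average_clustering(G))
print("average shortest path:", nx.average_shortest_path_length(G))
degrees = sorted((d for _, d in G.degree()), reverse=True)
print("top-5 hub degrees:", degrees[:5])                     # heavy tail hints at scale-free structure
```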
    2018,29(7):2092-2115 , DOI: 10.13328/j.cnki.jos.005589
    [Abstract] (11144) [HTML] (8102) [PDF 2.52 M] (26201)
    Abstract:
    Blockchain is a distributed public ledger technology that originates from the digital cryptocurrency bitcoin. Its development has attracted wide attention in industry and academia. Blockchain has the advantages of decentralization, trustworthiness, anonymity, and immutability. It breaks through the limitations of traditional center-based technology and has broad development prospects. This paper introduces the research progress of blockchain technology and its application in the field of information security. Firstly, the basic theory and model of blockchain are introduced from five aspects: basic framework, key technologies, technical features, application modes, and application areas. Secondly, from the perspective of the current research situation of blockchain in the field of information security, this paper summarizes the research progress of blockchain in authentication technology, access control technology, and data protection technology, and compares the characteristics of the various research efforts. Finally, the application challenges of blockchain technology are analyzed, and the development outlook of blockchain in the field of information security is highlighted. This study intends to provide a certain reference value for future research work.
    2005,16(1):1-7
    [Abstract] (22821) [HTML] (0) [PDF 614.61 K] (25807)
    Abstract:
    The paper offers some thoughts on the following four aspects: 1) from the law of the development of things, revealing the development history of software engineering technology; 2) from the perspective of software's natural characteristics, analyzing the construction of each abstraction layer of the virtual machine; 3) from the perspective of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the appearance of Internet technology, exploring the development trend of software technology.
    2014,25(4):839-862 , DOI: 10.13328/j.cnki.jos.004558
    [Abstract] (15861) [HTML] (4592) [PDF 1.32 M] (25623)
    Abstract:
    Batch computing and stream computing are two important forms of big data computing. The research and discussion on batch computing in big data environments are comparatively sufficient. However, how to efficiently deal with stream computing so as to meet many requirements, such as low latency, high throughput, and continuously reliable running, and how to build efficient stream big data computing systems, are great challenges in big data computing research. This paper provides a study of the data computing architecture and the key issues in stream computing in big data environments. Firstly, it gives a brief summary of three application scenarios of stream computing in business intelligence, marketing, and public service. It also shows the distinctive features of stream computing in big data environments, such as real-time behavior, volatility, burstiness, irregularity, and infinity. A well-designed stream computing system is always optimized in system structure, data transmission, application interfaces, high availability, and so on. Subsequently, the research offers detailed analyses and comparisons of five typical open-source stream computing systems in big data environments. Finally, the research specifically addresses some new challenges of stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
    2021,32(2):496-518 , DOI: 10.13328/j.cnki.jos.006140
    [Abstract] (6539) [HTML] (10880) [PDF 2.20 M] (25177)
    Abstract:
    Deep learning has achieved great success in the field of computer vision, surpassing many traditional methods. However, in recent years, deep learning technology has been abused in the production of fake videos, making fake videos represented by Deepfakes flood the Internet. This technique produces pornographic movies, fake news, and political rumors by tampering with or replacing the face information of the original videos and synthesizing fake speech. In order to eliminate the negative effects brought by such forgery technologies, many researchers have conducted in-depth research on the identification of fake videos and proposed a series of detection methods to help institutions or communities identify such fake videos. Nevertheless, the current detection technology still has many limitations, such as reliance on data of a specific distribution or a specific compression ratio, and it lags far behind the generation technology of fake videos. In addition, different researchers handle the problem from different angles, and the data sets and evaluation indicators used are not uniform. So far, the academic community still lacks a unified understanding of deep forgery and detection technology, and the architecture of research on deep forgery and detection is not clear. In this review, the development of deep forgery and detection technologies is reviewed. Besides, existing research works are systematically summarized and scientifically classified. Finally, the social risks posed by the spread of Deepfakes technology are discussed, the limitations of detection technology are analyzed, and the challenges and potential research directions of detection technology are discussed, aiming to provide guidance for follow-up researchers to further promote the development and deployment of Deepfakes detection technology.
    2006,17(9):1848-1859
    [Abstract] (13073) [HTML] (0) [PDF 770.40 K] (25055)
    Abstract:
    In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the state-of-the-art challenging issues and research trends for content information processing of the Internet and other complex applications, this paper presents a survey of the up-to-date development in text categorization based on machine learning, including models, algorithms, and evaluation. It is pointed out that problems such as nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization. Possible solutions to these problems are also discussed respectively. Finally, some future directions of research are given.
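    For readers new to the area, the sketch below is an editorial illustration of the minimal machine-learning text categorization pipeline the survey assumes (feature representation plus a classifier); the toy corpus and the particular model choices are our assumptions.

```python
# Minimal text categorization pipeline: TF-IDF features + a linear classifier (illustrative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["stock prices fell sharply", "the team won the final match",
               "central bank raised interest rates", "the striker scored twice"]
train_labels = ["finance", "sports", "finance", "sports"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)
print(clf.predict(["interest rates and inflation", "a late goal decided the match"]))
```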
    2013,24(2):295-316 , DOI: 10.3724/SP.J.1001.2013.04336
    [Abstract] (10103) [HTML] (0) [PDF 0.00 Byte] (24974)
    Abstract:
    Under new application modes, traditional hierarchical data centers face several limitations in size, bandwidth, scalability, and cost. In order to meet the needs of new applications, data center networks should fulfill requirements such as high scalability, low configuration overhead, robustness, and energy saving at low cost. First, the shortcomings of the traditional data center network architecture are summarized, and new requirements are pointed out. Secondly, the existing proposals are divided into two categories, i.e. server-centric and network-centric. Then, several representative architectures of these two categories are reviewed and compared in detail. Finally, the future directions of data center networks are discussed.
    2016,27(11):2855-2869 , DOI: 10.13328/j.cnki.jos.004932
    [Abstract] (3307) [HTML] (3212) [PDF 1.85 M] (24927)
    Abstract:
    With the proliferation of Chinese social networks (especially the rise of Weibo), the productivity and lifestyle of the country's society are more and more profoundly influenced by Chinese Internet public events. Due to the lack of effective technical means, the efficiency of information processing is limited. This paper proposes a public event information entropy calculation method. First, a mathematical model of event information content is built. Then, the multidimensional random variable information entropy of public events is calculated based on Shannon information theory. Furthermore, a new technical index for quantitative analysis of Internet public events is put forward, laying a foundation for further research work.
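    The entropy computation referred to above reduces, for a single discrete attribute, to Shannon's H(X) = -Σ p(x) log2 p(x); the sketch below is an editorial illustration with made-up counts, not data or the exact multidimensional formulation from the paper.

```python
# Shannon entropy of a discrete distribution (illustrative single-attribute case).
import numpy as np

def shannon_entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

# e.g. counts of posts about one public event across regions (illustrative numbers)
print(shannon_entropy([120, 80, 40, 10]))   # higher entropy = attention spread more evenly
```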
    2023,34(2):625-654 , DOI: 10.13328/j.cnki.jos.006696
    [Abstract] (3970) [HTML] (6243) [PDF 3.04 M] (24923)
    Abstract:
    Source code bug (vulnerability) detection is the process of judging whether there are unexpected behaviors in program code. It is widely used in software engineering tasks such as software testing and software maintenance, and plays a vital role in software functional assurance and application security. Traditional vulnerability detection research is based on program analysis, which usually requires strong domain knowledge and complex calculation rules and faces the problem of state explosion, resulting in limited detection performance and leaving considerable room for improvement in the rates of false positives and false negatives. In recent years, the vigorous development of the open source community has accumulated massive amounts of data with open source code at its core. In this context, the feature learning capability of deep learning can automatically learn semantically rich code representations, thereby providing a new way for vulnerability detection. This study collected the latest high-quality papers in this field and systematically summarized and explained the current methods from two aspects: vulnerability code datasets and deep learning vulnerability detection models. Finally, it summarizes the main challenges faced by research in this field and looks forward to possible future research focuses.
    2012,23(8):2058-2072 , DOI: 10.3724/SP.J.1001.2012.04237
    [Abstract] (10444) [HTML] (0) [PDF 800.05 K] (24883)
    Abstract:
    The distributed denial of service (DDoS) attack is a major threat to the current network. Based on the attack packet level, the study divides DDoS attacks into network-level DDoS attacks and application-level DDoS attacks. Next, the study analyzes the detection and control methods of these two kinds of DDoS attacks in detail, and it also analyzes the drawbacks of different control methods implemented at different network positions. Finally, the study analyzes the drawbacks of the current detection and control methods and the development trend of DDoS filtering systems, and corresponding technological challenges are also proposed.
    2005,16(10):1743-1756
    [Abstract] (10582) [HTML] (0) [PDF 545.62 K] (24807)
    Abstract:
    This paper presents a survey on the theory of provable security and its applications to the design and analysis of security protocols. It clarifies what provable security is, explains some basic notions involved in the theory of provable security, and illustrates the basic idea of the random oracle model. It also reviews the development and advances of provably secure public-key encryption and digital signature schemes, in the random oracle model or the standard model, as well as the applications of provable security to the design and analysis of session-key distribution protocols and their advances.
    2013,24(8):1786-1803 , DOI: 10.3724/SP.J.1001.2013.04416
    [Abstract] (14308) [HTML] (0) [PDF 1.04 M] (24712)
    Abstract:
    Many NoSQL database systems oriented to specific applications have been developed to satisfy the new requirements of big data management. This paper surveys research on typical NoSQL databases based on the key-value data model. First, the characteristics of big data and the key technical issues in supporting big data management are introduced. Then frontier efforts and research challenges are presented, including system architecture, data models, access modes, indexing, transactions, system elasticity, load balancing, replica strategies, data consistency, flash caches, MapReduce-based data processing, and new-generation data management systems. Finally, research prospects are given.
    2020,31(7):2127-2156 , DOI: 10.13328/j.cnki.jos.006052
    [Abstract] (6806) [HTML] (8428) [PDF 802.56 K] (24614)
    Abstract:
    Machine learning has become a core technology in areas such as big data, the Internet of Things, and cloud computing. Training machine learning models requires a large amount of data, which is often collected by means of crowdsourcing and contains a large amount of private data, including personally identifiable information (such as phone numbers and ID numbers) and sensitive information (such as financial data and health care records). How to protect these data at low cost and with high efficiency is an important issue. This paper first introduces the concept of machine learning, explains the various definitions of privacy in machine learning, and demonstrates all kinds of privacy threats encountered in machine learning, then continues to elaborate on the working principles and outstanding features of the mainstream technologies for machine learning privacy protection. The research achievements in the field of machine learning privacy protection are then summarized according to differential privacy, homomorphic encryption, and secure multi-party computation, respectively. On this basis, the paper comparatively analyzes the main advantages and disadvantages of different mechanisms for privacy preservation in machine learning. Finally, the development trend of privacy preservation for machine learning is discussed, and possible research directions in this field are proposed.
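    As an editorial illustration of one of the three mechanisms named above, the sketch below applies the Laplace mechanism of differential privacy to a counting query, adding noise scaled to sensitivity/epsilon; the toy records and parameters are assumptions, and homomorphic encryption and secure multi-party computation are not shown.

```python
# Laplace mechanism for differential privacy on a simple counting query (illustrative).
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=np.random.default_rng(0)):
    """Release true_value with epsilon-differential privacy for a query of the given sensitivity."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

ages = np.array([34, 29, 41, 52, 38])             # toy private records
count_over_35 = float((ages > 35).sum())          # counting query: sensitivity = 1
print(laplace_mechanism(count_over_35, sensitivity=1.0, epsilon=0.5))
```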
    2003,14(9):1621-1628
    [Abstract] (13527) [HTML] (0) [PDF 680.35 K] (24485)
    Abstract:
    Recommendation systems are among the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extreme sparsity of user rating data. Traditional similarity measure methods work poorly in this situation, making the quality of the recommendation system decrease dramatically. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. This method predicts the ratings of items that users have not rated by the similarity of items, then uses a new similarity measure to find the target users' neighbors. The experimental results show that this method can efficiently alleviate the extreme sparsity of user rating data and provides better recommendation results than traditional collaborative filtering algorithms.
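    The prediction step described above can be illustrated with a minimal item-based collaborative filtering sketch: unknown ratings are estimated from the ratings of similar items. The tiny rating matrix and the plain cosine similarity below are editorial assumptions; the paper's own similarity measure differs.

```python
# Item-based collaborative filtering with cosine similarity (illustrative toy data).
import numpy as np

R = np.array([[5, 3, 0, 1],      # rows = users, columns = items, 0 = not rated
              [4, 0, 0, 1],
              [1, 1, 5, 4],
              [0, 1, 5, 4]], dtype=float)

def item_similarity(R):
    norms = np.linalg.norm(R, axis=0, keepdims=True)
    return (R.T @ R) / (norms.T @ norms + 1e-12)   # cosine similarity between item columns

def predict(R, user, item):
    sim = item_similarity(R)[item]
    rated = R[user] > 0                            # only items this user has rated contribute
    weights = sim[rated]
    return float((weights @ R[user, rated]) / (np.abs(weights).sum() + 1e-12))

print(predict(R, user=1, item=1))                  # estimate user 1's rating of item 1
```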
