2024, 35(9):4013-4037. DOI: 10.13328/j.cnki.jos.007126
Abstract:Safety cases provide clear, comprehensive, and reliable arguments which mean that a system’s operation under a specific environment meets acceptable safety levels. In safety-critical sectors subject to regulations such as automotive, aviation, and nuclear industries, certification authorities often require the system to undergo a rigorous safety assessment process and thus demonstrate that the system complies with one or more safety standards. The safety case utilization in system development is an emerging technical means to express the safety attributes of safety-critical systems in a structured and comprehensive way. This study briefly introduces the four basic steps of building a safety case, including determining the goal, gathering evidence, constructing arguments, and evaluating the case, and then focuses on the key step of constructing arguments. Meanwhile, eight existing forms of safety case expressions are introduced in detail, containing goal structuring notation (GSN), claim-argument-evidence (CAE), and structured assurance case metamodel (SACM), with their strengths and weaknesses analyzed. Given the significant complexity of the materials required for safety cases, software tools are often adopted as practical methods for constructing and evaluating safety cases. Additionally, seven tools for developing and evaluating safety cases are compared, including astah system safety, gsn2x, NOR-STA, Socrates, ASCE, D-Case Editor, and AdvoCATE. Furthermore, this study delves into multiple challenges in building safety cases. These challenges include data reliability and integrity, complexity and uncertainty management, inconsistencies in regulations and standards, human factor engineering, rapid technological advancements, and challenges in team and interdisciplinary collaboration. Finally, a prospect is provided for the future development of safety cases to reveal their potential utilization and relevant research problems.
LIU Zong-Xin , YANG Peng-Fei , ZHANG Li-Jun , WU Zhi-Lin , HUANG Xiao-Wei
2024, 35(9):4038-4068. DOI: 10.13328/j.cnki.jos.007127
Abstract:Artificial intelligence (AI) has been widely applied to various aspects of lives. However, as the primary technique of realizing AI, neural networks can exhibit undefined behavior when faced with inputs outside of their training data or under adversarial attacks. In safety-critical fields such as autonomous driving and intelligent healthcare, undefined behavior can pose significant threats to human safety. Therefore, it is particularly crucial to employ complete verification methods to verify the properties of neural networks and ensure the correctness of the behavior. To enhance the verification efficiency, various complete neural network verification tools have proposed their optimization methods. However, the true influence of these methods has not been thoroughly explored, making it challenging for researchers to identify the most effective optimization directions. This paper introduces the common techniques in neural network verification and presents a universal framework for complete neural network verification. Within this framework, it focuses on discussing the optimization methods employed by state-of-the-art tools for constraint solving, branch selection, and boundary computation. Meanwhile, a series of experiments are designed to investigate the contributions of various acceleration techniques to the performance of each tool and to explore the most effective acceleration strategies and more promising optimization directions. Finally, valuable references can be provided for researchers in this field.
WANG Zhong-Ye , WU Shu-Shu , CAO Qin-Xiang
2024, 35(9):4069-4099. DOI: 10.13328/j.cnki.jos.007138
Abstract:Concurrent programs and systems are usually highly efficient and respond faster than serial systems, making them widely used in practice. However, concurrent programs and systems are often prone to error, which could bring fatal consequences in real world applications. Moreover, the non-determinism brought by concurrency is a major difficulty in the verification of concurrent programs. With formal verification, people could use interactive theorem provers to rigorously prove the correctness of a concurrent program. This study presents several correctness criteria for concurrent programs, which can be verified by interactive theorem proof techniques. The criteria include Hoare triple, linearizability, contextual refinement, and logical atomicity. Researchers usually use program logic to verify programs in an interactive theorem prover. This study summarizes the usage of concurrent separation logic, rely-guarantee-based logic, and relational Hoare logic in concurrent program verifications. It also surveys existing foundational verification tools and verification results by these techniques.
JIN Zhao , JIN Lu , ZHANG Bo-Wen , WU Qing-Shun , FENG Shuo , LI Guan-Feng , XU Ming-Liang
2024, 35(9):4100-4122. DOI: 10.13328/j.cnki.jos.007128
Abstract:As an efficient scheduling method, the intelligent planning of ammunition support operation of carrier-borne air craft is an important way to boost the development of advanced technology of aircraft carrier engineering. Ensuring the correctness of operation planning schemes under the high safety-critical attribute has gradually become the key technical bottleneck restricting the security of their practical application deployments. Aiming at the challenges posed by the difficulties in modeling ammunition support systems, describing operation execution behavior, and realizing formal verification tools, this study proposes a behavior model for ammunition support systems based on the separation logic idea, where the theorem prover Coq is employed to conduct formal verification on the operation planning scheme. First, a serialized two-tier resource heap model that conforms to the characteristics of ammunition support operation is proposed. Subsequently, a set of modeling language and operational semantics are constructed based on this model to describe job execution behavior. Finally, a proof assistant tool is implemented in Coq. The usability and engineering practicability of the tool are verified by interactive demonstrations of several typical ammunition support operation planning schemes.
ZHANG Kai-Wen , LIU Guan-Jun , SUN Yan-Tao , LI Xiao-Feng , GUAN Jian , XIE Yi , GU Bin
2024, 35(9):4123-4140. DOI: 10.13328/j.cnki.jos.007129
Abstract:Existing work on the analysis of real-time embedded multi-core systems using point-interval prioritized time Petri nets has the following limitations. (1) Point-interval prioritized time Petri nets only consider the case where the execution time of each task is a fixed value, but in many practical applications, the execution time of a task is generally within a range so that this kind of model cannot be used to analyze these applications. (2) There is a lack of automatic transformation from task dependency graphs to this point-interval prioritized time Petri nets, and thus it is inconvenient for engineering designers. (3) The case of mutually exclusive access to shared variables has not been considered. To address these issues, this study defines prioritized time Petri nets (Pri-TPN) to overcome the first limitation and introduces a task dependency graph with resource allocation and priority (TDG-RAP) to overcome the third limitation. It develops algorithms based on Pri-TPN for analyzing the worst-case execution time (WCET) and system deadlocks of tasks. Additionally, a tool software is developed to facilitate its use by engineering designers.
WANG Chao , JIA Qiao-Wen , Lü Yi , WU Peng
2024, 35(9):4141-4159. DOI: 10.13328/j.cnki.jos.007130
Abstract:Linearizability is universally accepted as a correctness criterion for concurrent objects. However, it has been shown that linearizability cannot be adopted as a correctness criterion for concurrent objects with random sentences. Thus, Golab et al. proposed the concept of strong linearizability, which adds prefix preservation properties based on the linearizability definition and has more constraints for concurrent objects. The research on strong linearizability focuses on the feasibility of generating strongly linearizable objects with certain basic objects, while only a few studies are about checking and verification of strong linearizability. This study investigates strong linearizability from two aspects including the verification algorithm and approach for proving non-strong linearizability of concurrent objects. First, it divides strong linearizability into fixed effective points and pure help and proves that the notion of fixed effective points is an extension of that of fixed linearizability points. Then, two verification algorithms for strong linearizability are put forward. One algorithm is based on checking the fixed linearizability points, and the other is based on the fixed effective points. Finally, an approach is provided for proving that the concurrent objects violate strong linearizability, and it helps verify that the Herlihy&Wing queue, a single-reader single-write register, and a snapshot object violate strong linearizability.
XIE Guo-Jun , YANG Huan-Huan , SHI Zheng-Pu , CHEN Gang
2024, 35(9):4160-4178. DOI: 10.13328/j.cnki.jos.007131
Abstract:The DH coordinate system plays a vital role in analyzing robot kinematics. In the robot control system built upon the DH coordinate system, the robot structure complexity poses challenges to developing a secure control system. Depending solely on manual methods can introduce system vulnerabilities and security hazards, thereby endangering the overall safety of the robot. The formal method becomes a promising direction to design, develop, and verify hardware and software systems by deductive reasoning and code extraction. Based on this, this study designs a formal verification framework for robot forward kinematics based on the DH calibration, during which the robot kinematics theory is rigorously proven and the correctness of the control algorithm in Coq is verified to ensure the motion safety of the robot. First, it formally models the DH coordinate system, defines the transformation matrix among adjacent coordinate systems, and verifies the equivalence of this transformation matrix with the composite helical motion. Then, the forward kinematics of the robotic arm is formally defined, with its motion detachability verified. Subsequently, this study formally models the common connecting rod structures and robots in industrial robots and verifies their forward kinematics. Finally, the code extraction from Coq to OCaml is implemented, and the extracted code is analyzed and verified.
ZHANG Lin-Yan , LI Xi-Meng , SHI Zhi-Ping , GUAN Yong , CAO Qin-Xiang , ZHANG Qian-Ying
2024, 35(9):4179-4192. DOI: 10.13328/j.cnki.jos.007132
Abstract:Operating systems are the key foundational components of the software stacks employed in many safety-critical scenarios. A tiny error or loophole in the operating system may cause major failures of the overall software system, resulting in huge economic losses or endangering human lives. Thus, the correctness of the operating system should be verified to reduce the number of such accidents. Traditional testing methods cannot guarantee the exhaustive detection of potential errors in the target system. Therefore, it is necessary to adopt formal methods based on strict mathematical theories for verifying operating systems. In an operating system, mutexes are utilized to coordinate the access of shared resources by tasks and they are a typical means of task synchronization. The functional correctness of mutexes is the key to the correct functioning of multi-task applications. Based on the theorem proof method, this study conducts formal verification on the code of the mutex module of a preemptive microkernel in an interactive theorem prover Coq, gives the formal specifications of the interface functions of this module, and formally proves the functional correctness of these interface functions.
CHEN Jin-Fu , FENG Qiao-Wei , CAI Sai-Hua , SHI Deng-Zhou , Rexford Nii Ayitey SOSU
2024, 35(9):4193-4217. DOI: 10.13328/j.cnki.jos.007133
Abstract:As blockchain technology is widely employed in all walks of life, the architecture of blockchain systems becomes increasingly more complex, which raises the number of security issues. At present, traditional vulnerability detection methods such as fuzz testing and symbol execution are adopted in blockchain systems, but these techniques cannot detect unknown vulnerabilities effectively. To improve the security of blockchain systems, this study proposes a vulnerability detection model for blockchain systems (VDMBS) based on the formal method. This model integrates multiple security factors including system migration state, security property and trust relationship among nodes, and provides a vulnerability model building method based on business process execution language (BPEL). Finally, the effectiveness of the proposed vulnerability detection model is verified on a blockchain-based e-voting election system by NuSMV, and the experimental results show that compared with five existing formal testing tools, the proposed VDMBS model can detect more blockchain system logic vulnerabilities and smart contract vulnerabilities.
ZUO Zheng-Kang , SUN Huan , WANG Chang-Jing , YOU Zhen , HUANG Qing , WANG Chang-Chang
2024, 35(9):4218-4241. DOI: 10.13328/j.cnki.jos.007134
Abstract:As a recursive method for finding the optimal solution to a problem, dynamic programming mainly solves the original problem by first solving the subproblems and then combining their solutions. Due to a large number of dependencies and constraints among its subproblems, the validation procedure is laborious, and especially the correctness verification of imperative dynamic programming algorithms is a challenge. Based on the functional modeling and verification of dynamic programming algorithms Isabelle/HOL, this study avoids dealing with complex dependencies and constraints in proving correctness by verifying the equivalence of imperative dynamic programming algorithms and their programs. Meanwhile, a framework for the design of imperative dynamic programming algorithmic programs and their mechanized verification are proposed. First, according to the optimization method (memo method) and properties (optimal substructure property and subproblems overlapping property) of dynamic programming algorithms, the problem specification is described, the recursive relations are inductively derived, and the loop invariants are formally constructed. Then, the IMP (minimalistic imperative programming language) code is generated based on the recursive relations. Second, the problem specification, loop invariants, and generated IMP code are fed into VCG (verification condition generator) to generate the verification condition for correctness automatically. Additionally, the verification condition is then mechanically verified in the Isabelle/HOL theorem prover. The algorithm is initially designed in the general form of an imperative dynamic programming algorithm and further instantiated to obtain specific algorithms. Finally, the effectiveness of the proposed framework is validated by case studies to provide references for automated derivation and verification of dynamic programming algorithms.
ZUO Zheng-Kang , KE Yu-Han , HUANG Qing , WANG Yue-Kun , ZENG Zhi-Cheng , WANG Chang-Jing
2024, 35(9):4242-4264. DOI: 10.13328/j.cnki.jos.007135
Abstract:A Trie structure is a type of search tree that organizes information by search keywords and can be employed to efficiently store and search a collection of strings. Meanwhile, Nipkow et al. provided Isabelle modeling and verification for Trie implementation. However, there is a significant amount of redundancy in the Trie’s storage and operation, resulting in poor space utilization, and only the English single-mode lookup is considered. Therefore, based on the idea that the index is the key value, this study proposes the Trie+ structure, which can reduce storage space by 50% compared to the traditional structure of storing the index and key value separately, and greatly improve space utilization. Furthermore, the Trie+ structure’s lookup, insertion, and deletion operations are modeled as functions and rigorously mechanized to ensure their correctness and reliability. Additionally, a generalized verification protocol for matching algorithms is proposed to solve the correctness verification and problems of a series of matching algorithms. Finally, a functional Chinese-English hybrid multi-pattern matching algorithm is modeled and verified by the Trie+ structure and the matching algorithm’s universal verification protocol, and the Bug of prefix termination of pattern strings in multi-pattern matching algorithms of existing research based on the full hash Trie is discovered and solved. The proposed Trie+ structure and verification protocol have theoretical and application significance in improving the space utilization of the Trie structure and verifying the matching algorithm.
XU Heng , HUANG Zhi-Qiu , HU Jun , TAO Chuan-Qi , WANG Jin-Yong , SHI Fan
2024, 35(9):4265-4286. DOI: 10.13328/j.cnki.jos.007136
Abstract:During the automatic flight of civil aircraft, the transition of automatic flight system modes is an important factor affecting safety. With the rapid growth of functions and complexity of modern civil aircraft airborne systems, the safety analysis and verification of automatic flight system mode transition in the requirement phase has become an important challenge. The complexity of flight mode transition is not only reflected in the interaction among multiple flight modes necessary during the automatic flight but also in the complex data and control cross-linking relationships between the mode transition process and the external environment. Additionally, these cross-linking relationships imply the safety properties of the flight mode transition process, which increases the application difficulty of formal methods. This study proposes a domain specific modeling and verification framework. First, a modeling language MTRDL for transition requirements of automatic flight system modes and a modeling method based on extended SysML language are put forward. Secondly, the safety property-assisted protocol method based on safety requirement templates is proposed. Finally, the effectiveness of the method in the requirement verification of automatic flight system mode transition is proven by a case study of a certain aircraft’s itemized requirements.
WEI Xiao-Min , DONG Yun-Wei , SUN Cong , LI Xing-Hua , MA Jian-Feng
2024, 35(9):4287-4309. DOI: 10.13328/j.cnki.jos.007137
Abstract:Many complex embedded systems are mixed-criticality systems (MCSs). MCSs are often required to operate with the specified criticality level, but they may be subject to hazards that can induce random errors and burst errors, which may result in the abortion of an executing thread or even system failures. Current research only concentrates on schedulability analysis for MCSs and fails to further analyze system safety and consider the dependency relationship among threads. Taking random errors and burst errors as the research objects, this study proposes an architecture-based MCS safety analysis method with the integration of fault propagation analysis. Meanwhile, architecture analysis and design language (AADL) is employed to characterize the dependency relations among components. To compensate for the shortcomings of AADL, this study creates new AADL properties (AADL burst error properties) and proposes new thread state machine (burst error-based thread state machine) semantics to describe the thread execution process with burst errors. Additionally, model transformation rules and assembly methods are proposed to apply probabilistic model checking for safety analysis, and PRISM models are derived from AADL models. Two formulae are also formulated to obtain quantitative safety properties for verifying occurrence probabilities of failures, and qualitative safety properties for generating corresponding witnesses to figure out propagation paths for fault propagation analysis respectively. Finally, the effectiveness of the proposed method is verified by adopting a power boat autopilot (PBA) system.
MA Run-Zhe , TIAN Cong , WANG Wen-Sheng , DUAN Zhen-Hua
2024, 35(9):4310-4323. DOI: 10.13328/j.cnki.jos.007139
Abstract:The ω automata determinization is an important part of theoretical computer research and has important applications in formal verification, sequential logic, and model checking. Since the Büchi automata was proposed for half a century, its deterministic algorithm has always been the basis. Different from the exploration of the upper and lower bounds of its size in theory at the beginning, the use of ever-changing high-performance computer technology is an effective auxiliary means. To deeply study the implementation process of the deterministic algorithm of non-deterministic Büchi automata, this study focuses on the question of whether the index can continue to be optimized and realizes the deterministic research tool NB2DR. The non-deterministic Büchi automata can be determined efficiently, and the determinization algorithm can be improved by analyzing the determinization process provided by the tool. A theoretical upper bound on correlation indexing is explored through an in-depth analysis of the indexing of generated deterministic ω automata. The tool also realizes that the deterministic Rabin automaton family can be generated according to the size of the required Büchi automaton and the alphabet parameters. It can also be reversed to generate all the Büchi automata families according to the size of the specified index required and test the equivalence of ω functions.
YANG Peng , ZHA Xian-Yu , ZHAO Guang-Zhen , LIN Xi
2024, 35(9):4324-4345. DOI: 10.13328/j.cnki.jos.006951
Abstract:Fact verification is intended to check whether a textual statement is supported by a given piece of evidence. Due to the structural dependence and implicit content of tables, the task of fact verification with tables as the evidence still faces many challenges. Existing literature has either used logical expressions to parse statements based on tabular evidence or designed table-aware neural networks to encode statement-table pairs and thereby accomplish table-based fact verification tasks. However, these approaches fail to fully utilize the implicit tabular information behind the statements, which leads to the degraded inference performance of the model. Moreover, Chinese statements based on tabular evidence have more complex syntax and semantics, which also adds to the difficulties in model inference. For this reason, the study proposes a method of fact verification with Chinese tabular data based on the capsule heterogeneous graph attention network (CapsHAN). This method can fully understand the structure and semantics of statements. On this basis, the tabular information implied by the statements is mined and utilized to effectively improve the accuracy of table-based fact verification tasks. Specifically, a heterogeneous graph is constructed by performing syntactic dependency parsing and named entity recognition of statements. Subsequently, the graph is learned and understood by the heterogeneous graph attention network and the capsule graph neural network, and the obtained textual representation of the statements is sliced together with the textual representation of the encoded tables. Finally, the result is predicted. Further, this study also attempts to address the problem that the datasets of fact verification based on Chinese tables are scarce and thus unable to support the performance evaluation of table-based fact verification methods. For this purpose, the study transforms the mainstream English table-based fact verification datasets TABFACT and INFOTABS into Chinese and constructs a dataset that is based on the uniform content label (UCL) national standard and specifically tailored to the characteristics of Chinese tabular data. This dataset, namely, UCLDS, takes Wikipedia infoboxes as evidence of manually annotated natural language statements and labels them into three classes: entailed, contradictory, and neutral. UCLDS outperforms the traditional datasets TABFACT and INFOTABS in supporting both single-table and multi-table inference. The experimental results on the above three Chinese benchmark datasets show that the proposed model outperforms the baseline model invariably, demonstrating its superiority for Chinese table-based fact verification tasks.
GUO Zheng-Xin , DAI Yu-Hao , GUI Lin-Qing , SHENG Bi-Yun , XIAO Fu
2024, 35(9):4346-4364. DOI: 10.13328/j.cnki.jos.006955
Abstract:The detection of the human respiration waveform in the sleep state is crucial for applications in intelligent health care as well as medical and healthcare in that different respiration waveform patterns can be examined to analyze sleep quality and monitor respiratory diseases. Traditional respiration sensing methods based on contact devices cause various inconveniences to users. In contrast, contactless sensing methods are more suitable for continuous monitoring. However, the randomness of the device deployment, sleep posture, and human motion during sleep severely restrict the application of contactless respiration sensing solutions in daily life. For this reason, the study proposes a detection method for the human respiration waveform in the sleep state based on impulse radio-ultra wide band (IR-UWB). On the basis of the periodic changes in the propagation path of the wireless pulse signal caused by the undulation of the human chest during respiration in the sleep state, the proposed method generates a fine-grained human respiration waveform and thereby achieves the real-time output of the respiration waveform and high-precision respiratory rate estimation. Specifically, to obtain the position of the human chest during respiration from the received wireless radio-frequency (RF) signals, this study proposes the indicator respiration energy ratio based on IR-UWB signals to estimate the target position. Then, it puts forward a vector projection method based on the in-phase/quadrature (I/Q) complex plane and a method of projection signal selection based on the circumferential position of the respiration vector to extract the characteristic human respiration waveform from the reflected signal. Finally, a variational encoder-decoder network is leveraged to achieve the fine-grained recovery of the respiratory waveform in the sleep state. Extensive experiments and tests are conducted under different conditions, and the results show that the human respiration waveforms monitored by the proposed method in the sleep state are highly similar to the actual waveforms captured by commercial respiratory belts. The average error of the proposed method in estimating the human respiratory rate is 0.229 bpm, indicating that the method can achieve high-precision detection of the human respiration waveform in the sleep state.
OUYANG Ya-Wen , GAO Yuan , ZONG Shi , BAO Yu , DAI Xin-Yu
2024, 35(9):4365-4376. DOI: 10.13328/j.cnki.jos.006956
Abstract:It is essential to detect out-of-distribution (OOD) training set samples for a safe and reliable machine learning system. Likelihood-based generative models are popular methods to detect OOD samples because they do not require sample labels during training. However, recent studies show that likelihoods sometimes fail to detect OOD samples, and the failure reason and solutions are under explored, especially for text data. Therefore, this study investigates the text failure reason from the views of the model and data: insufficient generalization of the generative model and prior probability bias of the text. To tackle the above problems, the study proposes a new OOD text detection method, namely Pobe. To address insufficient generalization of the generative model, the study increases the model generalization via KNN retrieval. Next, to address the prior probability bias of the text, the study designs a strategy to calibrate the bias and improve the influence of probability bias on OOD detection by a pre-trained language model and demonstrates the effectiveness of the strategy according to Bayes’ theorem. Experimental results over a wide range of datasets show the effectiveness of the proposed method. Specifically, the average AUROC is over 99%, and FPR95 is below 1% under eight datasets.
YE Jing , XIANG Lu , ZONG Cheng-Qing
2024, 35(9):4377-4389. DOI: 10.13328/j.cnki.jos.006963
Abstract:Aspect-level sentiment classification task, which aims to determine the sentiment polarity of a given aspect, has attracted increasing attention due to its broad applications. The key to this task is to identify contextual descriptions relevant to the given aspect and predict the aspect-related sentiment orientation of the author according to the context. Statistically, it is found that close to 30% of reviews convey a clear sentiment orientation without any explicit sentiment description of the given aspect, which is called implicit sentiment expression. Recent attention mechanism-based neural network methods have gained great achievement in sentiment analysis. However, this kind of method can only capture explicit aspect-related sentiment descriptions but fails to effectively explore and analyze implicit sentiment, and it often models aspect words and sentence contexts separately, which makes the expression of aspect words lack contextual semantics. To solve the above two problems, this study proposes an aspect-level sentiment classification method that integrates local aspect information and global sentence context information and improves the classification performance of the model by curriculum learning according to different classification difficulties of implicit and explicit sentiment sentences. Experimental results show that the proposed method not only has a high accuracy in identifying the aspect-related sentiment orientation of explicit sentiment sentences but also can effectively learn the sentiment categories of implicit sentiment sentences.
SHAO Hao , WANG Lun-Wen , ZHU Ran-Gang , LIU Hui
2024, 35(9):4390-4407. DOI: 10.13328/j.cnki.jos.006970
Abstract:Existing hypergraph network representation methods need to analyze the full batch nodes and hyperedges to recursively extend the neighbors across layers, which brings huge computational costs and leads to lower generalization accuracy due to over-expansion. To solve this problem, this study proposes a hypergraph network representation method based on importance sampling. First, the method treats nodes and hyperedges as two sets of independent identically distributed samples that satisfy specific probability measures and interprets the structural feature interactions of the hypergraph in an integral form. Second, it designs a neighbor importance sampling rule with learnable parameters and calculates sampling probabilities based on the physical relations and features of nodes and hyperedges. A fixed number of objects are recursively acquired layer by layer to construct a smaller sampled adjacency matrix. Finally, the spatial features of the entire hypergraph are approximated using Monte Carlo methods. In addition, with the advantage of physically informed neural networks, the sampling variance that needs to be reduced is added to the hypergraph neural network as a physical constraint to obtain sampling rules with better generalization capability. Extensive experiments on multiple datasets show that the method proposed in this study can obtain more accurate hypergraph representation results with a faster convergence rate.
ZHAO Xing-Wang , WANG Shu-Jun , LIU Xiao-Lin , LIANG Ji-Ye
2024, 35(9):4408-4424. DOI: 10.13328/j.cnki.jos.006995
Abstract:Multi-view clustering has attracted more and more attention in the fields of image processing, data mining, and machine learning. Existing multi-view clustering algorithms have two shortcomings. One is that in the process of graph construction, only the pairwise relationship between each view data is considered to generate an affinity matrix, which lacks the characterization of neighborhood relationships; the second is that existing methods separate the process of multi-view information fusion and clustering, thereby reducing the clustering performance of the algorithm. Therefore, this study proposes a more accurate and robust joint spectral embedding multi-view clustering algorithm based on bipartite graphs. Firstly, based on the multi-view subspace clustering idea,bipartite graphs are constructed, and similar graphs are generated.Then the spectral embedding matrix of similar graphs is used to perform graph fusion. Secondly, by considering the importance of each view during the fusion process, weight constraints are applied, and an indicator matrix is introduced to obtain the final clustering result. A model is proposed to optimize the bipartite graph, embedding matrix, and clustering indicator matrix within a single framework. In addition, a fast optimization strategy for solving the model is provided, which decomposes the optimization problem into small module subproblems and efficiently solves them through iterative steps. The proposed algorithm and existing multi-view clustering algorithms have been experimentally analyzed on real data sets. Experimental results show that the proposed algorithm is more effective and robust in dealing with multi-view clustering problems compared with existing methods.
ZHOU Yang-Tao , LI Qing-Shan , CHU Hua , LI Jia-Nan , GAO Ming-Biao , WEI Biao-Biao
2024, 35(9):4425-4447. DOI: 10.13328/j.cnki.jos.006962
Abstract:With the rapid development of Internet information technologies, the explosive growth of online learning resources has caused the problem of “information overload” and “learning disorientation”. In the absence of expert guidance, it is difficult for users to identify their learning demands and select the appropriate content from the vast amount of learning resources. Educational domain recommendation methods have received a lot of attention from researchers in recent years because they can provide personalized recommendations of learning resources based on the historical learning behaviors of users. However, the existing educational domain recommendation methods ignore the modeling of complex relationships among knowledge points in learning demand perception and fail to consider the dynamic changes of users’ learning demands, which leads to inaccurate learning resource recommendations. To address the above problems, this study proposes a knowledge point recommendation method based on static and dynamic learning demand perception, which models users’ learning behaviors under complex knowledge association by combining static perception and dynamic perception. For static learning demand perception, this study innovatively designs an attentional graph convolutional network based on the first-course-following meta-path guidance of knowledge points, which can accurately capture users’ static learning demands at the fine-grained knowledge point level by modeling the complex constraints of the first-course-following relationship between knowledge points and eliminating the interference of other non-learning demand factors. For dynamic learning demand perception, the method aggregates knowledge point embeddings to characterize users’ knowledge levels at different moments by taking courses as units and then uses a recurrent neural network to encode users’ knowledge level sequences, which can effectively explore the dynamic learning demands hidden in users’ knowledge level changes. Finally, this study fuses the obtained static and dynamic learning demands, models the compatibility between static and dynamic learning demands in the same framework, and promotes the complementarity of these two learning demands to achieve fine-grained and personalized knowledge point recommendations. Experiments show that the proposed method can effectively perceive users’ learning demands, provide personalized knowledge point recommendations on two publicly available datasets, and outperform the mainstream recommendation methods in terms of various evaluation metrics.
ZHANG An-Zhen , SI Jia-Yu , LIANG Tian-Yu , ZHU Rui , QIU Tao
2024, 35(9):4448-4468. DOI: 10.13328/j.cnki.jos.006972
Abstract:Subset repair for inconsistent data is an important research problem in the field of data cleaning. Most of the existing methods are based on integrity constraint rules and adopt the principle of the minimum number of deleted tuples for subset repair. However, these methods take no account of the quality of deleted tuples, and the repair accuracy is low. Therefore, this study proposes a subset repair method combining rules and probabilities. The probability of inconsistent tuples is modeled so that the average probability of correct tuples is greater than that of wrong tuples, and the optimal subset repair with the smallest sum of the probability of deleted tuples is calculated. In addition, in order to reduce the time overhead of calculating the probability of inconsistent tuples, this study proposes an efficient error detection method to reduce the size of inconsistent tuples. Experimental results on real data and synthetic data verify that the proposed method outperforms the state-of-the-art subset repair method in terms of accuracy.
ZHU Xiao , SHAO Xin-Yue , ZHANG Yan , WANG Hong-Zhi
2024, 35(9):4469-4492. DOI: 10.13328/j.cnki.jos.006977
Abstract:The database performance is affected by the database configuration parameters. The quality of parameter settings will directly affect the performance of the database. Therefore, the quality of the database parameter tuning method is important. However, traditional database parameter tuning methods have many limitations, such as the inability to make full use of historical parameter tuning data, wasting time and human resources, and so on. The counterfactual interpretation methods aim to change the original prediction to the expected prediction by making small modifications to the original data. The method plays a role of suggestion, and this can be used for database configuration optimization, namely, making small modifications to the database configuration to optimize the performance of the database. Therefore, this study proposes a counterfactual interpretation method for database configuration optimization. For databases with poor performance under specific load conditions, this method can modify the database configuration and generate corresponding database configuration counterfactuals to optimize database performance. This study conducts two kinds of experiments to evaluate the counterfactual interpretation method and verify the effect of optimizing the database. The experimental results show that the counterfactual interpretation methods proposed in this study are better than other typical counterfactual interpretation methods in terms of various evaluation indicators, and the generated counterfactuals can effectively improve database performance.