knowledge graph

1 Lifelong Representation Learning of Multi-sourced Knowledge Graphs via Linked Entity Replay

SUN Ze-Qun, CUI Yuan-Ning, HU Wei

2023, 34(10):4501-4517. DOI: 10.13328/j.cnki.jos.006887

Abstract:
Knowledge graphs (KGs) store a great amount of structured knowledge and semantic information. They have been widely used by many knowledge-powered intelligent applications. With the rapid development of these applications, their requirements for knowledge also change. A single KG usually suffers from incompleteness and therefore cannot meet these changing requirements. This suggests an urgent demand for supporting new data sources and fusing multi-sourced knowledge. The conventional paradigm for KG representation learning and application considers only a single KG and ignores knowledge transfer between different sources. Joint representation learning on multi-sourced KGs can improve performance, but it cannot support extending representation learning to new KGs. To resolve these issues, this paper presents a new paradigm, i.e., lifelong representation learning on multi-sourced KGs. Given a sequence of multi-sourced KGs, lifelong representation learning aims to benefit from the previously learned KG and embedding model when learning a new KG. To this end, this study proposes a lifelong learning framework based on linked entity replay. First, it designs a Transformer-based KG embedding model that leverages relation correlations for link prediction between entities. Second, it proposes a linked subgraph generation method that leverages the entity alignment between different sources to build subgraphs and replays the linked entities to enable lifelong learning and knowledge transfer. Finally, it uses a dynamic model structure that stores model parameters and embeddings for each KG to avoid catastrophic forgetting. Experiments on benchmarks show that the proposed KG embedding model achieves state-of-the-art performance in link prediction, and that the lifelong representation learning framework is effective and efficient for multi-sourced knowledge transfer compared with baselines.
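
The abstract does not detail how linked subgraphs are generated. As a rough illustration of the replay idea only, the sketch below assumes that each KG is a list of (head, relation, tail) triples and that the entity alignment is a dict from previously learned entities to new-KG entities; it collects the previously learned triples that lie within a few hops of a linked entity. The function name `linked_replay_triples` and the hop-based neighbourhood are hypothetical choices, not the authors' method.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def linked_replay_triples(old_kg: List[Triple],
                          alignment: Dict[str, str],
                          hops: int = 1) -> List[Triple]:
    """Hypothetical helper: collect old-KG triples within `hops` hops of a
    linked (aligned) entity, as a replay set for training on a new KG."""
    frontier: Set[str] = set(alignment)  # old-KG entities linked to the new KG
    visited: Set[str] = set(frontier)
    selected: List[Triple] = []
    seen: Set[Triple] = set()
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for h, r, t in old_kg:
            if h not in frontier and t not in frontier:
                continue
            if (h, r, t) not in seen:
                seen.add((h, r, t))
                selected.append((h, r, t))
            # Entities reached for the first time are expanded in the next hop.
            for e in (h, t):
                if e not in visited:
                    visited.add(e)
                    next_frontier.add(e)
        frontier = next_frontier
    return selected
```

Restricting replay to the neighbourhood of linked entities keeps the replay set much smaller than the whole previous KG; in this sketch, `hops` controls that trade-off.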

2 Survey on Representation Learning Methods of Knowledge Graph for Link Prediction

DU Xue-Ying, LIU Ming-Wei, SHEN Li-Wei, PENG Xin

2024, 35(1):87-117. DOI: 10.13328/j.cnki.jos.006902

Abstract:
As an important cornerstone of artificial intelligence, knowledge graphs can extract and represent a priori knowledge from massive data on the Internet. They greatly alleviate the bottleneck of poor interpretability in the cognitive decisions of intelligent systems and play a key role in the construction and application of intelligent systems. As knowledge graph technology is applied ever more deeply, knowledge graph completion, which aims to address the incompleteness of graphs, has become an urgent need. Link prediction is the task of predicting missing entities and relations in a knowledge graph and is indispensable for constructing and completing knowledge graphs. Fully exploiting the hidden relations in a knowledge graph and computing over its massive entities and relations requires converting symbolic representations of information into numerical form, i.e., knowledge graph representation learning. Hence, link prediction-oriented knowledge graph representation learning has become a popular research topic in the field of knowledge graphs. Starting from the basic concepts of link prediction and representation learning, this study systematically introduces the latest research progress of link prediction-oriented knowledge graph representation learning methods. Specifically, the research progress is discussed in detail in terms of knowledge representation forms and algorithmic modeling methods. Following the development of knowledge representation forms, the study introduces the mathematical modeling of link prediction tasks under binary-relation, multi-relation, and hyper-relation representations. In terms of representation learning modeling, existing methods are grouped into four types of models: translation distance models, tensor decomposition models, traditional deep learning models, and graph neural network models.
The implementation methods of each type are described in detail, together with representative models that solve link prediction tasks with different relational metrics. Common datasets and evaluation criteria for link prediction are then introduced, and on this basis, the link prediction performance of the four types of knowledge representation learning models under binary-relation, multi-relation, and hyper-relation representations is compared and analyzed. Finally, future development trends are discussed in terms of model optimization, knowledge representation forms, and problem scope.
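
To make two of the four model families concrete, the sketch below gives the standard scoring functions of TransE, a well-known translation distance model, and DistMult, a well-known tensor decomposition model. These two models are chosen here only as familiar representatives; plain Python sequences stand in for learned embedding vectors.

```python
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float],
                 p: int = 1) -> float:
    """TransE (translation distance): a true triple should satisfy h + r ~ t,
    so the score is the negative L_p norm of h + r - t (higher = more plausible)."""
    dist = sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1.0 / p)
    return -dist


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult (tensor decomposition): trilinear product sum_i h_i * r_i * t_i,
    i.e. a bilinear form with a diagonal relation matrix (higher = more plausible)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))
```

Because multiplication is commutative, `distmult_score(h, r, t)` equals `distmult_score(t, r, h)`, so DistMult cannot distinguish a relation from its inverse; TransE's score has no such symmetry.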

3 FS-Net: Frequency Statistical Network for Temporal Knowledge Graph Reasoning

LIU Kang-Zheng, ZHAO Feng, JIN Hai

2023, 34(10):4518-4532. DOI: 10.13328/j.cnki.jos.006885

Abstract:
Temporal knowledge graph (TKG) reasoning has attracted significant attention from researchers. Existing TKG reasoning methods have made great progress by modeling historical information. However, the time-variability problem and the unseen entity (relation) problem remain two major challenges that hinder further progress in this field. Moreover, because the structural information and temporal dependencies of the historical subgraph sequence must be modeled, traditional embedding-based methods are often time-consuming in training and prediction, which greatly limits the application of reasoning models in real-world scenarios. To address these issues, this study proposes a frequency statistical network for TKG reasoning, namely FS-Net. On the one hand, FS-Net continuously generates time-varying scores for predictions at changing timestamps based on the latest short-term historical fact frequency statistics. On the other hand, based on the fact frequency statistics at the current timestamp, FS-Net supplements predictions with historically unseen entities (relations). Notably, FS-Net requires no training and is highly time-efficient. Experiments on two TKG benchmark datasets demonstrate that FS-Net substantially improves on the baseline models.
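
The idea of scoring candidates from recent fact frequencies, with no training, can be illustrated in a few lines. The sketch below is a simplified, hypothetical illustration and not FS-Net itself: for a query (subject, relation, ?) at timestamp t, it counts how often each object appeared with the same subject and relation in the last `window` timestamps and normalises the counts into scores. The window size and the normalisation are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, int]  # (subject, relation, object, timestamp)


def frequency_scores(history: List[Quad], subject: str, relation: str,
                     t: int, window: int = 3) -> Dict[str, float]:
    """Score candidate objects for (subject, relation, ?, t) by their relative
    frequency among facts observed at timestamps t - window .. t - 1.
    Nothing is learned: the scores come purely from counting."""
    counts = Counter(
        o for s, r, o, ts in history
        if s == subject and r == relation and t - window <= ts < t
    )
    total = sum(counts.values())
    return {o: c / total for o, c in counts.items()} if total else {}
```

Because the scores are computed by counting rather than from learned parameters, advancing `t` automatically yields time-varying scores.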

4 Multi-source Chinese Medical Knowledge Graph Entity Alignment via Entity Semantics and Ontology Information

DING Rui-Qing, ZHAO Jun-Feng, WANG Le-Ye

2025, 36(11):5178-5196. DOI: 10.13328/j.cnki.jos.007370

Abstract:
Knowledge graphs (KGs), as structured representations of knowledge, have a wide range of applications in the medical field. Entity alignment, which identifies equivalent entities across different KGs, is a fundamental step in constructing large-scale KGs. Although extensive research has addressed this issue, most of it concentrates on aligning pairs of KGs, typically by capturing the semantic and structural information of entities to generate embeddings and then calculating embedding similarity to identify equivalent entities. This study identifies the problem of alignment error propagation when aligning multiple KGs. Given the high accuracy requirements for entity alignment in medical contexts, we propose MSOI-Align, a multi-source Chinese medical knowledge graph entity alignment method that integrates entity semantics and ontology information. Our method pairs multiple KGs and uses representation learning to generate entity embeddings. It also incorporates both entity name similarity and ontology consistency constraints, leveraging a large language model to filter a set of candidate entities. Subsequently, based on triadic closure theory and the large language model, MSOI-Align automatically identifies and corrects propagated alignment errors among the candidate entities. Experimental results on four Chinese medical knowledge graphs show that MSOI-Align significantly enhances the precision of entity alignment, increasing Hits@1 from 0.42 to 0.92 compared with the state-of-the-art baseline. The fused knowledge graph, CMKG, contains 13 types of ontologies, 190,000 entities, and approximately 700,000 triples. Due to copyright restrictions on one of the KGs, we release the fusion of the other three KGs, named OpenCMKG.
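
Triadic closure implies that alignments among three KGs should be transitive: if entity e1 in KG1 is aligned to e2 in KG2, and e2 is aligned to e3 in KG3, then e1 should also be aligned to e3. The sketch below flags triangles where the composed alignment disagrees with the direct one, as candidates for propagated errors. It is a hypothetical simplification that assumes one-to-one alignments stored as dicts; MSOI-Align additionally relies on a large language model for identification and correction, which is not modelled here.

```python
from typing import Dict, List, Tuple


def closure_violations(a12: Dict[str, str], a23: Dict[str, str],
                       a13: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """Return (e1, e2, e3_via_kg2, e3_direct) for every entity whose
    KG1->KG2->KG3 alignment path disagrees with its direct KG1->KG3 alignment."""
    violations = []
    for e1, e2 in a12.items():
        e3_via = a23.get(e2)      # e3 reached by composing two pairwise alignments
        e3_direct = a13.get(e1)   # e3 from the direct pairwise alignment
        # Only triangles where both sides exist are checked in this sketch.
        if e3_via is not None and e3_direct is not None and e3_via != e3_direct:
            violations.append((e1, e2, e3_via, e3_direct))
    return violations
```

Missing direct alignments are not flagged here; closure could also be used to propose them as new candidate pairs.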

5 Knowledge Collaborative Fine-tuning for Low-resource Knowledge Graph Completion

ZHANG Ning-Yu, XIE Xin, CHEN Xiang, DENG Shu-Min, YE Hong-Bin, CHEN Hua-Jun

2022, 33(10):3531-3545. DOI: 10.13328/j.cnki.jos.006628

Abstract:
Knowledge graph completion infers missing facts to make knowledge graphs more complete. Unfortunately, most existing knowledge graph completion methods assume that the entities or relations in the knowledge graph have sufficient triple instances. However, general domains contain a large number of long-tail triples. Furthermore, it is challenging to obtain large amounts of high-quality annotated data in vertical domains. To address these issues, a knowledge collaborative fine-tuning approach is proposed for low-resource knowledge graph completion. Structured knowledge is leveraged to construct the initial prompt template, and the optimal templates, labels, and model parameters are learned through a collaborative fine-tuning algorithm. The proposed method leverages both the explicit structured knowledge in the knowledge graph and the implicit triple knowledge from the language model, and can be applied to link prediction and relation extraction. Experimental results show that the proposed approach achieves state-of-the-art performance on three knowledge graph reasoning datasets and five relation extraction datasets.

6 Knowledge Reasoning Over Knowledge Graph: A Survey

GUAN Sai-Ping, JIN Xiao-Long, JIA Yan-Tao, WANG Yuan-Zhuo, CHENG Xue-Qi

2018, 29(10):2966-2994. DOI: 10.13328/j.cnki.jos.005551

Abstract:
In recent years, the rapid development of Internet technology and Web applications has triggered an explosion of data on the Internet, which yields a large amount of valuable knowledge. How to organize, represent, and analyze this knowledge has attracted much attention. Knowledge graphs were thus developed to organize this knowledge in a semantic and visualized manner. Knowledge reasoning over knowledge graphs has then become one of the hot research topics and plays an important role in many applications such as vertical search and intelligent question answering. The goal of knowledge reasoning over knowledge graphs is to infer new facts or identify erroneous facts based on existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graphs is more diversified, owing to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graphs. Starting with the basic concept of knowledge reasoning, this paper presents a survey of recently developed methods for knowledge reasoning over knowledge graphs. Specifically, the research progress is reviewed in detail from two aspects, one-step reasoning and multi-step reasoning, each including rule-based reasoning, distributed embedding-based reasoning, neural network-based reasoning, and hybrid reasoning. Finally, future research directions and the outlook for knowledge reasoning over knowledge graphs are discussed.
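
Rule-based reasoning, one of the families listed above, can be illustrated with a single path rule of the form r3(x, z) <- r1(x, y) AND r2(y, z). The sketch below applies such a rule once over a set of triples and returns only the facts not already present. Relation names in any usage are hypothetical, and real rule-based reasoners also mine rules and weigh their confidence, which is omitted here.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def apply_chain_rule(kg: Set[Triple], r1: str, r2: str, r3: str) -> Set[Triple]:
    """Infer r3(x, z) whenever r1(x, y) and r2(y, z) both hold; return only new facts."""
    # Index r2 facts by head entity so the join on y is a dictionary lookup.
    by_head: Dict[str, Set[str]] = {}
    for h, r, t in kg:
        if r == r2:
            by_head.setdefault(h, set()).add(t)
    inferred = {(x, r3, z) for x, r, y in kg if r == r1 for z in by_head.get(y, ())}
    return inferred - kg
```

Rules like this are heuristics: a rule such as "born in a city located in country C implies nationality C" produces plausible, not guaranteed, facts.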

7 Temporal Knowledge Graph Reasoning Based on Diffusion Probability Distribution

ZHOU Guang-You, LI Peng-Fei, XIE Peng-Hui, LUO Chang-Yin

2024, 35(11):5083-5097. DOI: 10.13328/j.cnki.jos.007002

Abstract:
Temporal knowledge graph reasoning aims to fill in missing links or facts in knowledge graphs where each fact is associated with a specific timestamp. The dynamic variational framework based on the variational autoencoder is particularly effective for this task. By jointly modeling entities and relations with Gaussian distributions, this method not only offers high interpretability but also handles complex probability distributions. However, traditional variational autoencoder-based methods often overfit during training, which limits their ability to accurately capture the semantic evolution of entities over time. To address this challenge, this study proposes a new temporal knowledge graph reasoning model based on a diffusion probability distribution. Specifically, the model uses a bi-directional iterative process to divide entity semantic modeling into multiple sub-modules. Each sub-module uses a forward noising transformation and a backward Gaussian sampling step to model a small-scale evolution of entity semantics. By jointly modeling multiple sub-modules, the model learns the dynamic representation of entity semantics in the metric space over time and thus models it more accurately than the variational autoencoder-based method. Compared with the variational autoencoder-based method, the model improves MRR by 4.18% and 1.87% on the Yago11k and Wikidata12k datasets, and by 1.63% and 2.48% on the ICEWS14 and ICEWS05-15 datasets, respectively.
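
MRR, the metric behind the improvements reported above, and Hits@k are the usual ranking metrics for link prediction. The minimal sketch below assumes each test query has already been reduced to the 1-based rank of the correct entity among all candidates; filtered ranking and tie handling are left out.

```python
from typing import Sequence


def mrr(ranks: Sequence[int]) -> float:
    """Mean reciprocal rank: the average of 1/rank over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test queries whose correct answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For ranks [1, 2, 4, 10], MRR is (1 + 0.5 + 0.25 + 0.1) / 4 = 0.4625 and Hits@3 is 0.5.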

8 Task Knowledge Fusion for Multimodal Knowledge Graph Completion

CHEN Qiang, ZHANG Dong, LI Shou-Shan, ZHOU Guo-Dong

2025, 36(4):1590-1603. DOI: 10.13328/j.cnki.jos.007213

Abstract:
Knowledge graph completion aims to reveal missing fact triples (head entity, relation, tail entity) in a knowledge graph based on existing ones. Existing research primarily focuses on utilizing the structural information within the knowledge graph, overlooking that information from other modalities contained in the knowledge graph may also help completion. In addition, since task-specific knowledge is typically not integrated into general pre-trained models, incorporating task-related knowledge into modal information extraction becomes crucial. Moreover, because different modal features contribute differently to knowledge graph completion, effectively preserving useful multimodal information poses a significant challenge. To address these issues, this study proposes a multimodal knowledge graph completion method that incorporates task knowledge. It uses a multimodal encoder fine-tuned for the current task to acquire entity vector representations across different modalities. Subsequently, a modal fusion-filtering module based on recurrent neural networks eliminates task-independent multimodal features. Finally, a simple isomorphic graph network represents and updates all features, thus accomplishing multimodal knowledge graph completion. Experimental results demonstrate the effectiveness of our approach in extracting information from different modalities. They further show that our method enhances entity representation capability through additional multimodal filtering and fusion, thereby improving performance on multimodal knowledge graph completion tasks.

9 Method for Complex Question Answering Based on Global and Local Features of Knowledge Graph

CHEN Yue-He, JIA Yong-Hui, TAN Chuan-Yuan, CHEN Wen-Liang, ZHANG Min

2023, 34(12):5614-5628. DOI: 10.13328/j.cnki.jos.006799

Abstract:
Several methods have been proposed to address complex questions in knowledge base question answering (KBQA). However, complex semantic composition and possibly missing inference paths lead to poor reasoning performance on complex questions. To this end, this study proposes CGL-KBQA, a method based on the global and local features of knowledge graphs. The method employs knowledge embedding techniques to extract the topological structure and semantic features of knowledge graphs as global features of candidate entity nodes, and models complex questions as a composite triple classification task based on entity representations and question composition. Meanwhile, the core inference paths generated during graph search are used as local features, which are combined with question semantic similarity to construct multi-dimensional features of candidate entities, finally forming a hybrid feature scorer. Since final inference paths may be missing, this study also designs a clustering module with unsupervised multi-clustering methods that selects final answer clusters directly from the feature representations of candidate entities, thereby making reasoning over incomplete KGs possible. Experimental results show that the proposed method performs well on two common KBQA datasets, especially when the KG is incomplete.

10 Automatic Code Semantic Tag Generation Approach Based on Software Knowledge Graph

XING Shuang-Shuang, LIU Ming-Wei, PENG Xin

2022, 33(11):4027-4045. DOI: 10.13328/j.cnki.jos.006369

Abstract:
Code snippets in open-source and enterprise software projects and those posted on software development websites are important software development resources. However, developers' code search needs often reflect high-level intentions and topics, which are difficult to satisfy with code search techniques based on information retrieval. It is thus highly desirable for code snippets to be accompanied by semantic tags reflecting their high-level intentions and topics to facilitate code search and understanding. Existing tag generation techniques mainly target text content or rely on historical data, and cannot meet the needs of large-scale code semantic annotation or support code search and understanding. To address this issue, this study proposes KGCodeTagger, an approach based on a software knowledge graph that automatically generates semantic tags for code snippets. KGCodeTagger constructs a software knowledge graph from concepts and relations extracted from API documentation and software development Q&A text, and uses this knowledge graph as the basis for code semantic tag generation. Given a code snippet, KGCodeTagger identifies and extracts API invocations and concept mentions, and then links them to the corresponding concepts in the software knowledge graph. On this basis, the approach identifies other concepts related to the linked concepts as candidates and selects semantic tags from these relevant concepts based on diversity and representativeness. Both the software knowledge graph construction steps of KGCodeTagger and the quality of the generated code tags are evaluated. The results show that KGCodeTagger produces a high-quality, meaningful software knowledge graph and code semantic tags, which can help developers quickly understand the intention of code.
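
The abstract does not specify how diversity and representativeness are combined. One common way to trade them off is maximal marginal relevance (MMR), sketched below as an assumption: each candidate concept is assumed to carry a precomputed representativeness score and a set of related concepts in the software knowledge graph, and Jaccard overlap of those sets serves as a hypothetical similarity. Names such as `select_tags` and `lam` are illustrative.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two related-concept sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def select_tags(candidates: Dict[str, float], neighbours: Dict[str, Set[str]],
                k: int, lam: float = 0.7) -> List[str]:
    """Greedy MMR-style selection: repeatedly pick the candidate maximising
    lam * representativeness - (1 - lam) * (max similarity to chosen tags)."""
    selected: List[str] = []
    remaining = dict(candidates)

    def redundancy(c: str) -> float:
        return max((jaccard(neighbours.get(c, set()), neighbours.get(s, set()))
                    for s in selected), default=0.0)

    while remaining and len(selected) < k:
        best = max(remaining, key=lambda c: lam * remaining[c] - (1 - lam) * redundancy(c))
        selected.append(best)
        del remaining[best]
    return selected
```

With `lam=1.0` the selection reduces to ranking by representativeness alone; lowering `lam` increasingly penalises tags whose related concepts overlap with tags already chosen.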

11 Reinforcement Learning Inference Techniques for Knowledge Graph Constrained Question Answering

BI Xin, NIE Hao-Jie, ZHAO Xiang-Guo, YUAN Ye, WANG Guo-Ren

2023, 34(10):4565-4583. DOI: 10.13328/j.cnki.jos.006889

Abstract:
Knowledge graph based question answering (KGQA) analyzes natural language questions, performs reasoning over knowledge graphs, and ultimately returns accurate answers. It has been widely used in intelligent information services such as modern search engines and personalized recommendation. Considering the high cost of manually labeling reasoning steps as supervision in relation-supervised learning methods, researchers have begun to explore weakly supervised learning methods, such as reinforcement learning, to design KGQA models. Nevertheless, for complex questions with constraints, existing reinforcement learning-based KGQA methods face two major challenges: (1) multi-hop long-path reasoning leads to sparse and delayed rewards; (2) existing methods cannot handle reasoning path branches that involve constraint information. To address these challenges in constrained question answering, a reward shaping strategy with constraint information is designed to alleviate sparse and delayed rewards. In addition, a reinforcement learning-based constrained path reasoning model named COPAR is proposed. COPAR consists of an action determination strategy based on an attention mechanism and an entity determination strategy based on constraint information. It is capable of selecting the correct relations and entities according to the question's constraint information, reducing the reasoning search space, and ultimately solving the reasoning path branching problem. Moreover, an ambiguity constraint processing strategy is proposed to effectively resolve reasoning path ambiguity. The performance of COPAR is verified and compared on benchmark KGQA datasets.
The experimental results indicate that, compared with existing methods, COPAR improves performance on multi-hop question datasets by a relative 2%-7%, and outperforms rival models on constrained question datasets, improving accuracy by at least 7.8%.

12 QA-KGNet: Language Model-driven Knowledge Graph Question-answering Model

QIAO Shao-Jie, YANG Guo-Ping, YU Yong, HAN Nan, QIN Xiao, QU Lu-Lu, RAN Li-Qiong, LI He

2023, 34(10):4584-4600. DOI: 10.13328/j.cnki.jos.006882

Abstract:
Question-answering systems based on knowledge graphs analyze user questions and have become an effective way to retrieve relevant knowledge and automatically answer questions. Such systems usually use a neural program induction model to convert a natural language question into a logical form, and the answer is obtained by executing the logical form on the knowledge graph. However, knowledge question-answering systems that use pre-trained language models and knowledge graphs face two challenges: (1) given the QA (question-answering) context, relevant knowledge must be identified from a large KG (knowledge graph); (2) joint reasoning must be performed over the QA context and the KG. To address these challenges, a language model-driven knowledge graph question-answering model, QA-KGNet, is proposed. It connects the QA context and the KG to form a joint graph, uses a language model to calculate the relevance between QA context nodes and KG nodes, and employs a multi-head graph attention network to update node representations. Extensive experiments on the real-world CommonsenseQA, OpenBookQA, and MedQA-USMLE datasets show that QA-KGNet outperforms existing benchmark models and exhibits excellent structured reasoning capability.

13 Cross-project Prediction Method of Security Bug Reports Based on Knowledge Graph

ZHENG Wei, LIU Cheng-Yuan, WU Xiao-Xue, CHEN Xiang, CHENG Jing-Yuan, SUN Xiao-Bing, SUN Rui-Yang

2024, 35(3):1257-1279. DOI: 10.13328/j.cnki.jos.006812

Abstract:
Security bug reports (SBRs) describe critical security vulnerabilities in software products. SBR prediction has attracted increasing attention from researchers as a way to eliminate security attack risks in software products. However, in actual software development, a new company or new project may need security bug prediction without enough labeled SBRs to build SBR prediction models. A simple solution is to employ a transfer (migration) model, i.e., to build the prediction model from labeled data of other projects. Inspired by two recent studies in this field, this study puts forward a cross-project SBR prediction method integrating knowledge graphs, i.e., knowledge graph of security bug report prediction (KG-SBRP), based on the idea of security keyword filtering. The text fields of SBRs are combined with common weakness enumeration (CWE) and common vulnerabilities and exposures (CVE) Details to build triple rule entities. These entities are then used to build a knowledge graph of security bugs and to identify SBRs through entity and relationship recognition. Finally, the data is divided into training and test sets for model fitting and performance evaluation. Empirical research is conducted on seven SBR datasets of different scales. The results show that compared with the current main methods FARSEC and Keyword matrix, the proposed method increases the F1-score by an average of 11% in cross-project SBR prediction scenarios. In addition, the F1-score also increases by an average of 30% in within-project SBR prediction scenarios.
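
Security keyword filtering and the F1-score used in the evaluation can be sketched together. In the sketch below, a report is flagged as security-related if it contains any term from a keyword set. The keyword set is a hypothetical input (KG-SBRP builds its entities from CWE and CVE Details, which is not reproduced here), keywords are assumed to be lowercase single words, and F1 is the harmonic mean of precision and recall.

```python
import re
from typing import Iterable, List, Set, Tuple


def flag_security(reports: Iterable[str], keywords: Set[str]) -> List[bool]:
    """Flag a report as security-related if any (lowercase) keyword occurs as a whole word."""
    flags = []
    for text in reports:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        flags.append(bool(tokens & keywords))
    return flags


def precision_recall_f1(pred: List[bool], gold: List[bool]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1 with the security class as positive."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A pure keyword filter favours precision over recall: reports that describe a vulnerability without using any listed term are missed.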

14 KGDB: Knowledge Graph Database System with Unified Model and Query Language

LIU Bao-Zhu, WANG Xin, LIU Peng-Kai, LI Si-Zhuo, ZHANG Xiao-Wang, YANG Ya-Jun

2021, 32(3):781-804. DOI: 10.13328/j.cnki.jos.006181

Abstract:
Knowledge graphs are an important cornerstone of artificial intelligence and currently have two main data models: the RDF graph and the property graph. Several query languages exist for these two data models: the query language for RDF graphs is SPARQL, and the main query language for property graphs is Cypher. Over the last decade, different communities have developed different data management methods for RDF graphs and property graphs. Inconsistent data models and query languages hinder the wider application of knowledge graphs. KGDB is a knowledge graph database system with a unified data model and query language. (1) Based on the relational model, a unified storage scheme is proposed that efficiently stores both RDF graphs and property graphs and meets the storage and query load requirements of knowledge graph data. (2) Using a clustering method based on characteristic sets, KGDB handles the storage of untyped triples. (3) It realizes interoperability between SPARQL and Cypher, two different knowledge graph query languages, enabling them to operate on the same knowledge graph. Extensive experiments are carried out on real-world and synthetic datasets. The results show that, compared with existing knowledge graph database management systems, KGDB provides not only more efficient storage management but also higher query efficiency. KGDB saves 30% of the storage space on average compared with gStore and Neo4j. On basic graph pattern matching queries over the real-world dataset, KGDB's query efficiency is generally higher than that of gStore and Neo4j, with improvements of up to two orders of magnitude.
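
Characteristic sets group subjects by the exact set of predicates they use, which gives untyped triples an implicit schema. The minimal sketch below computes them from (subject, predicate, object) triples; how KGDB clusters these sets and maps them onto relational tables is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def characteristic_sets(triples: List[Tuple[str, str, str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each characteristic set (the predicates a subject uses) to its subjects."""
    preds_of: Dict[str, Set[str]] = defaultdict(set)
    for s, p, _ in triples:
        preds_of[s].add(p)
    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for s, preds in preds_of.items():
        groups[frozenset(preds)].add(s)  # frozenset so the set can be a dict key
    return dict(groups)
```

One plausible layout, assumed here rather than taken from KGDB, stores each group as a table whose columns are the predicates of its characteristic set.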

15 Knowledge Graph Embedding Combining with Hierarchical Type Information

ZHANG Jin-Dou, LI Jing

2022, 33(9):3331-3346. DOI: 10.13328/j.cnki.jos.006295

Abstract:
Knowledge graph embedding aims to embed entities and relations into a low-dimensional continuous vector space. Due to the data sparsity of knowledge graphs, the quality of the learned vector representations is poor. Since the type information of entities carries rich semantic information, it has been introduced to improve performance. However, existing methods either do not support the hierarchical structure of type information or the type constraints of relations, or model the hierarchical structure in an overly complex way. This study proposes a novel knowledge graph embedding method that incorporates hierarchical type information. Specifically, types are embedded into different vector spaces, and the hierarchical structure of types is modeled by a partial order relation. Moreover, entity vectors are mapped into the type vector space so that entities and their types are required to satisfy the partial order relation. The entities in triples and the type constraints of their relations are likewise required to satisfy the partial order relation. Finally, experimental results on link prediction, triple classification, and entity typing over four datasets show that the proposed method outperforms state-of-the-art baseline methods in vector representation performance.
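
The abstract models type hierarchies with a partial order relation but does not give the exact formulation. One well-known way to encode a partial order in a vector space is the order-embedding penalty of Vendrov et al. (2016), used below purely as an illustrative assumption: a more specific element should be coordinate-wise greater than or equal to a more general one, and any shortfall is penalised quadratically.

```python
from typing import Dict, Iterable, Sequence, Tuple


def order_violation(specific: Sequence[float], general: Sequence[float]) -> float:
    """Order-embedding penalty ||max(0, general - specific)||^2.
    It is zero exactly when specific >= general in every coordinate,
    i.e. when the pair satisfies the assumed partial order."""
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


def hierarchy_loss(pairs: Iterable[Tuple[str, str]],
                   emb: Dict[str, Sequence[float]]) -> float:
    """Sum of violations over (subtype or entity, supertype) pairs."""
    return sum(order_violation(emb[a], emb[b]) for a, b in pairs)
```

Minimising `hierarchy_loss` alongside a link prediction loss would push embeddings to respect the hierarchy; how the paper actually combines its objectives is not stated in the abstract.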

16 Accurate and Efficient Method for Constructing Domain Knowledge Graph

YANG Yu-Ji, XU Bin, HU Jia-Wei, TONG Mei-Han, ZHANG Peng, ZHENG Li

2018, 29(10):2931-2947. DOI: 10.13328/j.cnki.jos.005552

Abstract:
By supporting the semantic Web, knowledge graphs have played a vital role in many areas such as knowledge QA and semantic search, and have therefore become a hot topic in research and engineering. However, building a large-scale knowledge graph with high accuracy is often costly. How to balance accuracy and efficiency to quickly build a high-quality domain knowledge graph is a major challenge in knowledge engineering. This paper conducts a systematic study on the construction of domain knowledge graphs and puts forward an accurate and efficient "four-step" method for constructing them. This method has been applied to build knowledge graphs for nine subjects in K-12 education in China, and the nine subject knowledge graphs achieve high accuracy, which demonstrates the effectiveness of the method. For example, the geography knowledge graph constructed with the "four-step" method has 670 thousand instances and 14.21 million triples, and the knowledge coverage and knowledge accuracy of its annotated data are both above 99%.

17 Knowledge Graph Question Answering Based on Relevance Prompts

MA Jie, SUN Wang-Chun, WANG Ping-Hui, ZHANG Ruo-Fei, LI Shuai-Peng, SU Zhou

2025, 36(9):4056-4071. DOI: 10.13328/j.cnki.jos.007247

Abstract:
As large language models (LLMs) continue to evolve, they have shown impressive performance in open-domain tasks. However, they exhibit limited effectiveness in domain-specific question-answering due to a lack of domain-specific knowledge. This limitation has attracted widespread attention from researchers in the field. Current research attempts to infuse domain knowledge into LLMs through a retrieve-answer approach to enhance their performance. However, this method often retrieves additional, irrelevant data, leading to a degradation in LLM effectiveness. Therefore, this study proposes a method for knowledge graph question answering based on the relevance of knowledge. This method focuses on distinguishing essential knowledge required for specific questions from noisy data. Under a framework of retrieval-relevance assessment-answering, this method guides LLMs to select appropriate knowledge for accurate answers. Moreover, this study introduces a dataset named Mecha-QA for question-answering using a mechanical domain knowledge graph, covering traditional machinery manufacturing and additive manufacturing, to promote research that integrates LLMs with knowledge graph question answering in this field. To validate the effectiveness of the proposed method, experiments are conducted on the Aero-QA dataset in the aerospace domain and the Mecha-QA dataset. Results demonstrate that the proposed method significantly improves the performance of LLMs in knowledge graph question answering in vertical domains.

18 Knowledge Graph Construction Method via Internet-based Collective Intelligence

JIANG Yi , ZHANG Wei , WANG Pei , ZHANG Xin-Yue , MEI Hong

2022, 33(7):2646-2666. DOI: 10.13328/j.cnki.jos.006313

[Abstract](3211) [HTML](2851) [PDF 2.44 M](5943)

Abstract:
A knowledge graph is a graph-based structural representation of knowledge. A key problem for knowledge graphs in both research and practice is how to construct large-scale, high-quality knowledge graphs. This paper presents an approach to constructing knowledge graphs based on Internet-based human collective intelligence. The core of the approach is a continuously executing loop, called the EIF loop (EIFL), which consists of three activities: free exploration, automatic integration, and proactive feedback. In free exploration, each participant independently constructs an individual knowledge graph. In automatic integration, all participants' current individual knowledge graphs are integrated in real time into a collective knowledge graph. In proactive feedback, each participant receives personalized feedback derived from the current collective knowledge graph, which improves the participant's efficiency in constructing an individual knowledge graph. In particular, a hierarchical knowledge graph representation mechanism is proposed, a knowledge graph merging algorithm is designed with the goal of minimizing the general entropy of the collective knowledge graph, and two ways of providing feedback, context-dependent and context-independent, are introduced. To investigate the feasibility of the approach, three kinds of experiments are designed and carried out: (1) merging experiments on simulated graphs with structural information only; (2) merging experiments on real large-scale knowledge graphs; (3) knowledge graph construction experiments with different numbers of participants.
The experimental results show that (1) the proposed knowledge graph merging algorithm finds high-quality merging solutions by using both the structural information of knowledge graphs and the semantic information of their elements; (2) EIFL-based collective collaboration improves both participants' efficiency in constructing individual knowledge graphs and the scale of the collective knowledge graph merged from them, and it scales well with the number of participants.
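
One pass of the EIF loop can be sketched as integrate-then-feedback. This is a deliberately simplified illustration, not the paper's algorithm: integration here is a plain union of triples with support counts rather than the entropy-minimizing merge, and feedback is a hypothetical context-dependent rule that suggests collective triples touching entities the participant already uses. The helper names `integrate`, `feedback`, and `eif_round` are invented for the sketch.

```python
from collections import Counter

Triple = tuple[str, str, str]  # (head, relation, tail)


def integrate(individual_kgs: dict[str, set[Triple]]) -> Counter:
    """Automatic integration (simplified): union of all individual graphs,
    counting how many participants assert each triple. This stands in for the
    entropy-driven merging algorithm, which the sketch does not implement."""
    support = Counter()
    for triples in individual_kgs.values():
        support.update(triples)
    return support


def feedback(participant_kg: set[Triple], collective: Counter, top_n: int = 3) -> list[Triple]:
    """Proactive feedback (simplified, context-dependent): suggest collective triples
    that touch an entity the participant already uses but that the participant lacks."""
    entities = {e for h, _, t in participant_kg for e in (h, t)}
    candidates = [
        tr for tr in collective
        if tr not in participant_kg and (tr[0] in entities or tr[2] in entities)
    ]
    # Most widely supported suggestions first; ties broken lexicographically.
    candidates.sort(key=lambda tr: (-collective[tr], tr))
    return candidates[:top_n]


def eif_round(individual_kgs: dict[str, set[Triple]]):
    """One loop iteration: integrate all graphs, then give every participant feedback.
    Free exploration (participants editing their own graphs) happens between rounds."""
    collective = integrate(individual_kgs)
    return collective, {p: feedback(kg, collective) for p, kg in individual_kgs.items()}
```

Because the loop runs continuously, each participant's next round of free exploration starts from the suggestions returned here, and the collective graph is recomputed from the updated individual graphs.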

19 Bidirectional Imitation Distillation for Efficient Incremental Pre-training of E-commerce Social Knowledge Graph

ZHU Yu-Shan , ZHANG Wen , WANG Xiao-Ke , LI Zhi-Yu , CHEN Ming-Yang , YAO Zhen , CHEN Hui , CHEN Hua-Jun

2025, 36(3):1218-1239. DOI: 10.13328/j.cnki.jos.007170

[Abstract](327) [HTML](647) [PDF 10.60 M](2189)

Abstract:
Pre-trained knowledge graph (KG) models facilitate various downstream tasks in e-commerce applications. However, large-scale social KGs are highly dynamic, and pre-trained models need to be updated regularly to reflect changes in node features caused by user interactions. This study proposes an efficient incremental update framework for pre-trained KG models. The framework has three main components: a bidirectional imitation distillation method that makes full use of the different types of facts in new data; a sampling strategy based on the normality and abnormality of samples, which selects the most valuable facts from all new facts to reduce the training data size; and a reverse replay mechanism that generates high-quality negative facts better suited to incremental training on e-commerce social KGs. Experimental results on real-world e-commerce datasets and related downstream tasks demonstrate that the proposed framework updates pre-trained KG models incrementally more effectively and efficiently than state-of-the-art methods.
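
The idea of shrinking the incremental training set by sampling along normality and abnormality can be sketched under loud assumptions. Here each new fact is assumed to come with a plausibility score from the old model (higher means more "normal"); the sketch keeps the least plausible facts as "abnormal" samples and the most plausible as "normal" ones, split by a hypothetical `abnormal_ratio`. The paper's actual criteria for normality and abnormality may be quite different, and `sample_new_facts` is an illustrative name.

```python
def sample_new_facts(scored_facts, budget, abnormal_ratio=0.5):
    """Select a reduced training set from new facts.

    scored_facts: list of (fact, score) pairs, where score is assumed to be the
    old model's plausibility for the fact (higher = more 'normal').
    Assumption: 'abnormal' facts are the least plausible ones (likely changes
    the old model has not absorbed) and 'normal' facts the most plausible ones.
    """
    if budget >= len(scored_facts):
        return [f for f, _ in scored_facts]  # nothing to prune
    ranked = sorted(scored_facts, key=lambda fs: fs[1])  # ascending plausibility
    n_abnormal = round(budget * abnormal_ratio)
    n_normal = budget - n_abnormal
    abnormal = [f for f, _ in ranked[:n_abnormal]]
    # budget < len(ranked), so the two slices never overlap.
    normal = [f for f, _ in ranked[len(ranked) - n_normal:]] if n_normal > 0 else []
    return abnormal + normal
```

Mixing both ends of the score distribution is one plausible reading of "normality and abnormality": abnormal facts carry the new signal, while a share of normal facts helps the updated model stay consistent with what it already knows.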

20 Cross-language Bilayer Knowledge Graph with Large-scale Medical Facts

WANG Chu-Tong , LI Ming-Da , SUN Meng-Xuan , WANG Jing , YANG Xue-Bing , NIU Jing-Hao , HE Zhi-Yang , ZHANG Wen-Sheng

2025, 36(3):1240-1253. DOI: 10.13328/j.cnki.jos.007173

[Abstract](452) [HTML](777) [PDF 8.22 M](3083)

Abstract:
Benefiting from the rapid development of information technology and the widespread adoption of medical information systems, medical databases have accumulated a vast amount of medical knowledge, including patient clinical treatment events and medical expert consensus. Extracting knowledge from these medical facts and managing and using it effectively is crucial for advancing the automation and intelligence of diagnosis and treatment. Knowledge graphs, as a novel knowledge representation tool, can effectively mine and organize information from abundant medical facts and have received extensive attention in the medical field. However, existing medical knowledge graphs often suffer from limitations such as small scale, numerous restrictions, and poor scalability, which limit their ability to express knowledge from medical facts. To address these issues, this study proposes a bilayer medical knowledge graph architecture and applies information extraction techniques to both English patient clinical treatment events and Chinese medical expert consensus to construct a billion-scale medical knowledge graph that is cross-lingual, multimodal, dynamically updated, and highly scalable, aiming to provide more accurate and intelligent medical services.
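
The abstract does not define the two layers, so the following is only one possible reading, stated as an assumption: a concept layer holding consensus knowledge with labels in several languages, an event layer holding patient-level facts, and cross-layer links from event entities to concepts. The class `BilayerKG`, its fields, and the sample identifiers are hypothetical. The sketch shows why such a split can help: a patient event expressed in one language can be lifted to consensus facts whose concepts also carry labels in another.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BilayerKG:
    """Illustrative two-layer store (an assumed interpretation of 'bilayer'):
    a concept layer for expert-consensus knowledge and an event layer for
    patient facts, joined by cross-layer instance-of links."""
    concept_labels: dict = field(default_factory=dict)   # concept id -> {lang: label}
    concept_edges: set = field(default_factory=set)      # (concept, relation, concept)
    event_edges: set = field(default_factory=set)        # (entity, relation, entity)
    instance_of: dict = field(default_factory=lambda: defaultdict(set))  # entity -> concept ids

    def add_concept(self, cid, **labels):
        """Register a concept with labels keyed by language code, e.g. en=..., zh=..."""
        self.concept_labels.setdefault(cid, {}).update(labels)

    def link(self, entity, cid):
        """Cross-layer link: an event-layer entity is an instance of a concept."""
        self.instance_of[entity].add(cid)

    def concept_facts_for(self, entity):
        """Lift an event-layer entity to the concept-layer facts about its concepts."""
        cids = self.instance_of.get(entity, set())
        return sorted(e for e in self.concept_edges if e[0] in cids)
```

Usage: after `kg.link("dx_903", "C01")`, calling `kg.concept_facts_for("dx_903")` returns the consensus-layer edges whose head is `C01`, and `kg.concept_labels["C01"]` gives that concept's English and Chinese labels.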

21 Research on Knowledge Graph Data Management: A Survey

WANG Xin , ZOU Lei , WANG Chao-Kun , PENG Peng , FENG Zhi-Yong

2019, 30(7):2139-2174. DOI: 10.13328/j.cnki.jos.005841

[Abstract](8122) [HTML](6537) [PDF 3.44 M](17452)

Abstract:
Knowledge graphs have become the cornerstone of artificial intelligence. The construction and publication of large-scale knowledge graphs in various domains have posed new challenges for knowledge graph data management. Organized by the structural and operational elements of a data model, this paper surveys the current theories, methods, technologies, and systems of knowledge graph data management. First, it introduces knowledge graph data models, namely the RDF graph model and the property graph model, as well as five knowledge graph query languages: SPARQL, Cypher, Gremlin, PGQL, and G-CORE. Second, it presents storage management schemes for knowledge graphs, including relational-based and native approaches. Third, it discusses three kinds of query operations: graph pattern matching, navigational, and analytical queries. Fourth, it introduces mainstream knowledge graph database management systems, which are categorized into RDF triple stores and native graph databases. It also describes state-of-the-art distributed systems and frameworks for processing knowledge graphs and presents benchmarks for knowledge graphs. Finally, it puts forward future research directions for knowledge graph data management.
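
Graph pattern matching, the first of the three query operations, can be illustrated with a minimal evaluator for a conjunctive triple pattern, similar in spirit to a SPARQL basic graph pattern such as `SELECT ?x ?y WHERE { ?x :knows ?y . ?y :worksAt :acme }`. The sketch uses naive nested-loop joins over an in-memory triple list; real triple stores rely on indexes and join ordering. The function names `match_bgp` and `_unify` and the sample data are hypothetical.

```python
def _unify(term, value, binding):
    """Match one pattern term against a triple component, extending the binding."""
    if term.startswith("?"):          # variable
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value              # constant must match exactly


def match_bgp(triples, patterns):
    """Evaluate a conjunction of triple patterns by nested-loop joins.
    Terms beginning with '?' are variables; returns a list of variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)  # fresh copy so failed matches leave no trace
                if all(_unify(p, v, candidate) for p, v in zip(pattern, triple)):
                    extended.append(candidate)
        solutions = extended
    return solutions
```

With the triples (alice, knows, bob), (bob, knows, carol), (alice, worksAt, acme), and (carol, worksAt, acme), the pattern pair above yields the single solution `{"?x": "bob", "?y": "carol"}`, because only carol both is known by someone and works at acme.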

22 Overview on Knowledge Graph Embedding Technology Research

ZHANG Tian-Cheng , TIAN Xue , SUN Xiang-Hui , YU Ming-He , SUN Yan-Hong , YU Ge

2023, 34(1):277-311. DOI: 10.13328/j.cnki.jos.006429

[Abstract](8029) [HTML](6931) [PDF 5.78 M](13046)

Abstract:
A knowledge graph (KG) is a technology that uses a graph model to describe knowledge and model the relationships between things. Knowledge graph embedding (KGE) is a widely adopted knowledge representation method whose main idea is to embed the entities and relations of a KG into a continuous vector space, which simplifies operations while preserving the intrinsic structure of the KG. It benefits a variety of downstream tasks, such as KG completion and relation extraction. First, existing KGE technologies are comprehensively reviewed, covering not only techniques that embed the facts observed in a KG, but also dynamic KG embedding methods that add a time dimension and KG embedding technologies that integrate multi-source information. The relevant models are analyzed, compared, and summarized from the perspectives of entity embedding, relation embedding, and scoring functions. Then, typical applications of KGE technologies in downstream tasks, including question answering systems, recommender systems, and relation extraction, are briefly introduced. Finally, the challenges of KGE are discussed, and future research directions are outlined.
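
As a concrete instance of a scoring function, the sketch below implements the TransE score, which treats a relation as a translation so that a plausible triple satisfies h + r ≈ t and is scored as the negative distance -‖h + r - t‖. The two-dimensional embeddings and the entity names are made up for illustration; trained models learn much higher-dimensional vectors.

```python
import math


def transe_score(h, r, t, norm=1):
    """TransE plausibility: -||h + r - t||_p (p = 1 or 2). Closer to 0 = more plausible."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        dist = sum(abs(d) for d in diffs)
    else:
        dist = math.sqrt(sum(d * d for d in diffs))
    return -dist


def rank_tails(h, r, candidates):
    """Link prediction (h, r, ?): order candidate tail entities by descending score."""
    return sorted(candidates, key=lambda name: transe_score(h, r, candidates[name]), reverse=True)


# Toy embeddings, chosen by hand so that paris + capital_of lands exactly on france.
paris, capital_of = [1.0, 0.0], [0.0, 1.0]
entities = {"france": [1.0, 1.0], "germany": [3.0, 1.0]}
```

Here `transe_score(paris, capital_of, entities["france"])` is 0 (a perfect translation) while the score for germany is -2.0, so `rank_tails` places france first, which is exactly how such a scoring function drives KG completion.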

23 Survey on Knowledge Graph Embedding Learning

YANG Dong-Hua , HE Tao , WANG Hong-Zhi , WANG Jin-Bao

2022, 33(9):3370-3390. DOI: 10.13328/j.cnki.jos.006426

[Abstract](3646) [HTML](4703) [PDF 8.84 M](7564)

Abstract:
Knowledge graphs (KGs) serve as a kind of knowledge base that stores facts in a network structure, representing each fact as a triple (head, relation, tail). Because KGs are widely applied in various fields, knowledge graph embedding learning has quickly attracted considerable attention. This study classifies existing embedding algorithms into five types: translation-based models, tensor factorization-based models, traditional deep learning-based models, graph neural network-based models, and models that fuse extra information. It then introduces and analyzes the key ideas, algorithmic features, advantages, and disadvantages of the different embedding models, providing a guideline that helps newcomers to the field get started quickly.
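
To make the difference between the model families tangible, here is a minimal sketch of DistMult, a well-known tensor factorization-based model (chosen here as a representative, not taken from the survey text). Its score is the trilinear product Σᵢ hᵢ·rᵢ·tᵢ, which amounts to using a diagonal relation matrix. A known consequence, shown by the symmetry check, is that DistMult scores (h, r, t) and (t, r, h) identically, so it cannot distinguish asymmetric relations; translation-based models such as TransE do not have this property. The vectors are arbitrary examples.

```python
def distmult_score(h, r, t):
    """DistMult: sum_i h_i * r_i * t_i, i.e. h^T diag(r) t. Higher = more plausible."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def is_symmetric_pair(h, r, t):
    """True when swapping head and tail leaves the score unchanged (always the case here)."""
    return distmult_score(h, r, t) == distmult_score(t, r, h)


h, r, t = [1.0, 2.0], [0.5, -1.0], [2.0, 1.0]
```

For these vectors the score is 1.0·0.5·2.0 + 2.0·(-1.0)·1.0 = -1.0 in both directions, which illustrates the symmetry limitation discussed when comparing embedding models.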

24 Text-oriented Construction for CPS Resource Capability Knowledge Graph

LI Zheng-Jie , SHEN Li-Wei , LI Yi , PENG Xin

2023, 34(5):2268-2285. DOI: 10.13328/j.cnki.jos.006410

[Abstract](1523) [HTML](2061) [PDF 6.70 M](4081)

Abstract:
Cyber-physical systems (CPS) play an increasingly important role in social life. On-demand choreography of CPS resources relies on software-defined CPS resources, and the definition of their software interfaces depends on a full description of the resources' capabilities. At present, the CPS field lacks both a knowledge base that describes resources and their capabilities and an effective way to construct such a knowledge base. Targeting textual descriptions of CPS resources, this study proposes constructing a CPS resource capability knowledge graph and designs a bottom-up automatic construction method. Given CPS resources, the method first extracts textual descriptions of the resources' capabilities from code and text, and generates normalized capability phrases based on a predefined representation pattern. Then, capability phrases are divided, aggregated, and abstracted based on the key components of the verb-object structure to generate hierarchical abstract descriptions of capabilities for different resource categories. Finally, the CPS knowledge graph is constructed. Based on the Home Assistant platform, this study constructs a knowledge graph containing 32 resource categories and 957 resource capabilities. In the construction experiment, the results of manual construction and of automatic construction with the proposed method are compared and analyzed along different dimensions. Experimental results show that this study provides a feasible method for automatically constructing a CPS resource capability knowledge graph. The method helps reduce the workload of manual construction, supplements the description of resource services and capabilities in the CPS field, and improves knowledge completeness.
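
The verb-object aggregation step can be sketched as grouping normalized capability phrases by their verb. This assumes phrases have already been normalized to a "verb object" form; the small `PARTICLES` set used to keep phrasal verbs such as "turn on" intact is a hypothetical simplification, and the paper's representation pattern and abstraction rules are not reproduced. The helper names are illustrative.

```python
from collections import defaultdict

# Hypothetical particle list so that phrasal verbs like "turn on" stay together.
PARTICLES = {"on", "off", "up", "down"}


def split_verb_object(phrase):
    """Split a normalized capability phrase into (verb, object); None if no object."""
    words = phrase.lower().split()
    if len(words) >= 3 and words[1] in PARTICLES:
        return " ".join(words[:2]), " ".join(words[2:])
    if len(words) >= 2:
        return words[0], " ".join(words[1:])
    return None


def build_capability_hierarchy(phrases):
    """Aggregate phrases into a verb -> sorted objects mapping, a two-level
    stand-in for the hierarchical capability descriptions."""
    hierarchy = defaultdict(set)
    for phrase in phrases:
        parts = split_verb_object(phrase)
        if parts is None:
            continue  # e.g. "reboot": no object to attach under a verb
        verb, obj = parts
        hierarchy[verb].add(obj)
    return {verb: sorted(objs) for verb, objs in sorted(hierarchy.items())}
```

For example, "Turn on light", "turn on fan", "set brightness", and "set temperature" aggregate into `{"set": ["brightness", "temperature"], "turn on": ["fan", "light"]}`, and identical phrases collected from different devices collapse into one entry, which is where the reduction in manual effort comes from.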
