一种元路径下基于频繁模式的实体集扩展方法
作者:
作者单位:

作者简介:

郑玉艳(1992-),女,山东诸城人,博士生,主要研究领域为异质信息网络数据挖掘;田莹(1996-),女,本科生,主要研究领域为数据挖掘;田莹(1996-),女,本科生,主要研究领域为数据挖掘

通讯作者:

石川,E-mail:shichuan@bupt.edu.cn

中图分类号:

基金项目:

国家重点研究和发展计划(973)(2017YFB0803304);国家自然科学基金(61772082,61375058);北京市自然科学基金(4182043)


Method of Entity Set Expansion Based on Frequent Pattern Under Meta Path
Author:
Affiliation:

Fund Project:

National Key Research and Development Program of China (973) (2017YFB0803304); National Natural Science Foundation of China (61772082, 61375058); Beijing Municipal Natural Science Foundation of China (4182043)

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    实体集扩展是指已知某个特定类别的几个种子实体,根据一定的规则得到该类别的更多实体.作为一种经典的数据挖掘任务,实体集扩展已经有很多的应用,诸如字典建立、查询建议等.现有的实体集扩展主要是基于文本或网页信息,即实体之间的关系从其在文本或者网页中的共现来推断.随着知识图谱研究的兴起,根据知识图谱中知识的共现来研究实体集扩展也成为了一种可能.主要研究知识图谱中的实体集扩展问题,即:给定几个种子实体,利用知识图谱来得到更多的同类别的实体.首先,把知识图谱建模成一个异质信息网络,即含有多种实体类型或者关系类型的网络,提出了一种新的元路径下基于频繁模式的实体集扩展方法,称为FPMP_ESE.FPMP_ESE采用异质信息网络中的元路径来捕捉种子实体之间的潜在共同特征.为了找到种子实体之间重要的元路径,设计了一种新的基于频繁模式的元路径自动产生算法FPMPG.之后,为了更好地给每条元路径分配相应的权重,设计了启发式的方法和PU learning的方法.最后,在真实数据集Yago上的实验结果表明,所提出方法较其他方法在实体集扩展任务上具有更好的性能和更高的效率.

    Abstract:

    Entity set expansion (ESE) refers to getting a more complete set according to some rules, given several seed entities with specific semantic meaning. As a popular data mining task, ESE has many applications, such as dictionary construction and query suggestion. Contemporary ESE mainly utilizes text or Web information. That is, the intrinsic relations among entities are inferred from theirco-occurrences in text or Web. With the surge of knowledge graph in recent years, it is possible to extend entities according to their co-occurrences in knowledge graph. This paper studies the problem of the entity set expansion in knowledge graph. That is, given several seed entities, how to obtain more entities by leveraging knowledge graph. Firstly, the knowledge graph is modeled as a heterogeneous information network (HIN), which contains multiple types of entities or relationships. Next, a novel method of entity set expansion based on frequent pattern under Meta path, called FPMP_ESE, is proposed. FPMP_ESE employs Meta paths to capture the implicit common traits of seed entities. In order to find the important Meta paths between entities, an automatic Meta path generation method is designed based on frequent pattern called FPMPG. Then, two kinds of heuristic and PU learning methods are developed to distribute the weights of Meta paths. Finally, experiments on real dataset Yago demonstrate that the proposed method has better effectiveness and higher efficiency compared to other methods.

    参考文献
    相似文献
    引证文献
引用本文

郑玉艳,田莹,石川.一种元路径下基于频繁模式的实体集扩展方法.软件学报,2018,29(10):2915-2930

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2017-07-20
  • 最后修改日期:2017-11-08
  • 录用日期:
  • 在线发布日期: 2018-02-08
  • 出版日期:
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号