Journal of Software, 2020, 31(12): 3772-3786

Recognition Method Based on Deep Learning for Chinese Textual Entailment Chunks and Labels
YU Dong, JIN Tian-Hua, XIE Wan-Ying, ZHANG Yi, XUN En-Dong
(College of Information Science, Beijing Language and Culture University, Beijing 100083, China)
Received: April 02, 2019; Revised: June 05, 2019
Abstract: Recognizing textual entailment (RTE) is the task of determining whether an entailment relationship holds between the meanings of two sentences. In recent years, RTE for English has made substantial progress, but existing research focuses mainly on judging the entailment type. Few studies precisely locate the language chunks in the data that give rise to entailment, so entailment type recognition offers little interpretability. This study selects 12 000 Chinese entailment sentence pairs from the Chinese Natural Language Inference (CNLI) data, manually annotates the chunks that cause the entailment, and, drawing on an analysis of the chunks' linguistic features, summarizes 7 specific entailment types. On this basis, Chinese entailment recognition is recast as two tasks: a 7-class entailment type recognition task and an entailment chunk boundary-and-type recognition task. Deep learning models reach accuracies of 69.19% and 62.09% on the two tasks, respectively. The experimental results show that the proposed method can effectively find the boundaries of Chinese entailment chunks together with their corresponding entailment types, providing a reliable baseline for further research.
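The abstract does not state how chunk boundaries and types are encoded for a model. One common framing for a joint boundary-and-type task is character-level BIO tagging: the first character of a chunk gets `B-<type>`, the remaining characters get `I-<type>`, and everything else gets `O`. The sketch below assumes that framing purely for illustration; the paper's actual scheme may differ, and the label `TYPE_A` and the helper names `chunks_to_bio` and `bio_to_chunks` are hypothetical, not one of the seven entailment types or the authors' code.

```python
from typing import List, Optional, Tuple

# A chunk is (start, end, type): character offsets [start, end) plus an
# entailment-type label. Labels such as "TYPE_A" are hypothetical
# placeholders, not the seven types defined in the paper.
Chunk = Tuple[int, int, str]


def chunks_to_bio(sentence: str, chunks: List[Chunk]) -> List[str]:
    """Encode typed chunks as one BIO tag per character (assumed scheme)."""
    tags = ["O"] * len(sentence)
    for start, end, label in chunks:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_chunks(tags: List[str]) -> List[Chunk]:
    """Decode BIO tags back into (start, end, type) chunks."""
    chunks: List[Chunk] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current chunk extends by one character
        if start is not None:
            chunks.append((start, i, label))  # close the open chunk at i
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            # a stray I- tag with no matching open chunk starts a new chunk
            start, label = i, tag[2:]
    return chunks


if __name__ == "__main__":
    sentence = "他买了一辆新车"  # "He bought a new car"
    tags = chunks_to_bio(sentence, [(5, 7, "TYPE_A")])  # chunk "新车"
    print(tags)
    print(bio_to_chunks(tags))
```

Under this framing, a predicted chunk counts as correct only if both its boundaries and its type match the annotation, which is what makes the second task harder than assigning a single type to the whole sentence pair.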
CLC number: TP18
Foundation item: National Key Research and Development Program of China (2018YFB1005105)
Citation:

YU Dong, JIN Tian-Hua, XIE Wan-Ying, ZHANG Yi, XUN En-Dong. Recognition Method Based on Deep Learning for Chinese Textual Entailment Chunks and Labels. Journal of Software, 2020, 31(12): 3772-3786