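[Keywords]
[Abstract]
Identifying key case elements is one of the main challenges in judicial artificial intelligence. Existing methods treat case-element identification merely as a named entity recognition task, so most of the information they extract is irrelevant. They also make little effective use of the global information of a text or the local information of its words, which leads to poor recognition of element boundaries. To address these problems, this study proposes a key case element recognition method that integrates global and local information. The method first uses a BERT model as a shared input layer that extracts features from judicial texts. On top of this shared layer, it then builds a joint learning model over three subtasks: judicial case element recognition, judicial text classification (global information), and judicial Chinese word segmentation (local information). Finally, the method is tested on two public datasets. The results show that its F1 scores exceed those of existing state-of-the-art methods on both datasets, that it improves the classification accuracy of element entities, and that it reduces boundary recognition errors.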
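The shared-encoder, three-head design described above can be sketched in plain Python. This is a minimal, framework-free illustration of hard parameter sharing, not the authors' implementation. Several assumptions are made: `toy_encoder` is a hypothetical deterministic stand-in for BERT, element recognition and word segmentation are assumed to be per-token tagging heads, and mean pooling is assumed as the text representation for the classification head (the paper may instead use BERT's [CLS] vector). The names `JointModel`, `toy_encoder`, the tag-set sizes, and the sample sentence are all illustrative.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def toy_encoder(text: str, hidden: int = 8) -> List[Vector]:
    """Hypothetical stand-in for BERT: one deterministic vector per character."""
    feats = []
    for ch in text:
        rng = random.Random(ch)  # str seeds are hashed deterministically (SHA-512)
        feats.append([rng.uniform(-1.0, 1.0) for _ in range(hidden)])
    return feats


def init_linear(out_dim: int, in_dim: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Small random weight matrix (out_dim x in_dim) and a zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x: Vector, layer: Tuple[Matrix, Vector]) -> Vector:
    """Compute W x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class JointModel:
    """Three task heads reading one shared encoder output (hypothetical sketch)."""

    def __init__(self, hidden: int, n_element_tags: int, n_classes: int,
                 n_seg_tags: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.element_head = init_linear(n_element_tags, hidden, rng)  # per token
        self.class_head = init_linear(n_classes, hidden, rng)         # per text
        self.seg_head = init_linear(n_seg_tags, hidden, rng)          # per token

    def forward(self, token_feats: List[Vector]) -> Dict[str, object]:
        # All heads read the same features. In a real model with a trainable
        # encoder such as BERT, losses from all three heads would backpropagate
        # into these shared features.
        elements = [softmax(linear(h, self.element_head)) for h in token_feats]
        segments = [softmax(linear(h, self.seg_head)) for h in token_feats]
        pooled = [sum(col) / len(token_feats) for col in zip(*token_feats)]
        label = softmax(linear(pooled, self.class_head))
        return {"element": elements, "class": label, "segmentation": segments}


if __name__ == "__main__":
    text = "被告人张某盗窃手机一部"
    model = JointModel(hidden=8, n_element_tags=5, n_classes=3, n_seg_tags=4)
    out = model.forward(toy_encoder(text, hidden=8))
    print(len(out["element"]), len(out["class"]), len(out["segmentation"]))  # prints: 11 3 11
```

The token-level heads return one distribution per character, while the classification head returns a single distribution for the whole text; this is the sense in which classification contributes global information and segmentation contributes local, word-level information.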
[Keywords]
[Abstract]
One of the main challenges in judicial artificial intelligence is the identification of key case elements. Existing methods treat case-element identification merely as a named entity recognition task, so most of the recognized information is irrelevant. In addition, because they make little effective use of the global information of texts and the local information of words, their element boundary recognition is poor. To solve these problems, this study proposes a method for recognizing key case elements that integrates global and local information. Specifically, a BERT model serves as the shared input layer for judicial texts and extracts text features. Then, three subtask networks, namely judicial case element recognition, judicial text classification (global information), and judicial Chinese word segmentation (local information), are built on the shared layer for joint learning. Finally, the method is evaluated on two public datasets. The results show that its F1 scores exceed those of existing state-of-the-art methods on both datasets, that it improves the classification accuracy of element entities, and that it reduces boundary recognition errors.
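The abstract says the three subtasks are learned jointly but does not state the training objective. A common choice, assumed here, is a weighted sum of per-task cross-entropy losses, L = L_elem + alpha * L_cls + beta * L_seg, where element recognition is the main task and the other two are auxiliary. The weights `alpha` and `beta` and the function names are hypothetical, and the paper may weight the tasks or decode tags (for example with a CRF) differently.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: Sequence[float], gold: int, eps: float = 1e-12) -> float:
    """Negative log-likelihood of the gold label under a probability vector."""
    return -math.log(max(probs[gold], eps))


def token_loss(prob_seq: List[Sequence[float]], gold_seq: List[int]) -> float:
    """Mean cross-entropy over the tokens of one sentence."""
    assert len(prob_seq) == len(gold_seq)
    return sum(cross_entropy(p, g) for p, g in zip(prob_seq, gold_seq)) / len(gold_seq)


def joint_loss(element_probs, element_gold, class_probs, class_gold,
               seg_probs, seg_gold, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical joint objective: main task plus weighted auxiliary tasks."""
    l_elem = token_loss(element_probs, element_gold)  # case element tagging
    l_cls = cross_entropy(class_probs, class_gold)    # text classification (global)
    l_seg = token_loss(seg_probs, seg_gold)           # word segmentation (local)
    return l_elem + alpha * l_cls + beta * l_seg


if __name__ == "__main__":
    # Uniform predictions: 2 tokens x 3 element tags, 2 classes, 2 tokens x 4 seg tags.
    elem = [[1 / 3] * 3, [1 / 3] * 3]
    cls = [0.5, 0.5]
    seg = [[0.25] * 4, [0.25] * 4]
    loss = joint_loss(elem, [0, 2], cls, 1, seg, [0, 3])
    print(round(loss, 4))  # ln 3 + ln 2 + ln 4 = ln 24; prints: 3.1781
```

Setting `alpha` and `beta` to 0 reduces the objective to single-task element recognition, which gives a convenient baseline for checking whether the auxiliary tasks actually help.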
[CLC number]
TP18
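[Funding]
National Key Research and Development Program of China (2020YFC0832700); National Natural Science Foundation of China (62172449, 62006251); Natural Science Foundation of Hunan Province (2022JJ30211, 2021JJ30870, 2021JJ40783); Natural Science Foundation of Changsha (kq2202300); Changsha Science and Technology Plan (kq2107004)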