Journal of Software, 2020, Vol. 31, Issue (4): 1025-1038


Distant Supervision Neural Network Relation Extraction Based on Noisy Observation
YE Yu-Xin1,2 , XUE Huan1 , WANG Lu3 , OUYANG Dan-Tong1,2
1. School of Computer Science and Technology, Jilin University, Changchun 130012, China;
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education(Jilin University), Changchun 130012, China;
3. Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China
Abstract: A major advantage of distant supervision relation extraction is that it generates labeled data automatically by aligning knowledge bases with natural language texts. This simple automatic alignment mechanism frees people from heavy labeling work, but it inevitably produces incorrectly labeled data as well, which degrades the construction of high-quality relation extraction models. To handle noisy labels in distant supervision relation extraction, this study assumes that the final label of a sentence is a noisy observation generated from the real label by some unknown factors. Based on this assumption, a new relation extraction model is constructed, consisting of an encoder layer, an attention layer based on the noise distribution, a real label output layer, and a noisy observation layer. In the training phase, the transition probabilities from real labels to noisy labels are learned from the automatically labeled data; in the testing phase, the real label is obtained through the real label output layer. This study combines the noisy observation model with a deep neural network: an attention mechanism based on the noise distribution is built on top of the network, and unbalanced samples are denoised within the same framework, with the aim of further improving the performance of distant supervision relation extraction under noisy observation. To examine its performance, the proposed method is applied to public datasets, and the distant supervision relation extraction model is evaluated under different noise distribution families. The experimental results show that the proposed method achieves higher precision and recall than existing methods.
Key words: distant supervision; relation extraction; noise label

1 Related Work

2 Distant Supervision Relation Extraction Model with Noisy Observation

 Fig. 1 Structure of the proposed model

3 Noisy Observation and Noise Distribution

3.1 Noisy Observation and Real Label Output

 $p(y' = j|x, w, {w_{noise}}) = \sum\nolimits_{i = 1}^k {p(y' = j|y = i;{w_{noise}})p(y = i|x, w)}$ (1)

 $p(y = i|x;w) = \frac{{\exp (u_i^Th + {b_i})}}{{\sum\nolimits_{l = 1}^k {\exp (u_l^Th + {b_l})} }}$ (2)

 $p(y' = j|y = i, x) = \frac{{\exp (u_{ij}^Th + {b_{ij}})}}{{\sum\nolimits_{l = 1}^k {\exp (u_{il}^Th + {b_{il}})} }}$ (3)
 $p(y' = j|x) = \sum\nolimits_{i = 1}^k {p(y' = j|y = i, x)p(y = i|x)}$ (4)
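Equations (1)–(4) can be illustrated with a minimal numpy sketch. Note that this is an illustrative sketch, not the paper's implementation: the sizes (4 relation labels, hidden dimension 8) and the variable names `U`, `b`, `B`, `h` are hypothetical, the sentence encoding `h` is random rather than produced by an encoder, and the transition term follows the simplified Eq. (6) form (bias only, no dependence on `h`).

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
k, h_dim = 4, 8                    # hypothetical: 4 relation labels, hidden size 8
U = rng.normal(size=(k, h_dim))    # weights u_i of Eq. (2)
b = rng.normal(size=k)             # biases b_l of Eq. (2)
B = rng.normal(size=(k, k))        # biases b_il, as in the simplified Eq. (6)
h = rng.normal(size=h_dim)         # stand-in for the encoder's sentence encoding

p_true = softmax(U @ h + b)        # p(y = i | x), Eq. (2)
T = softmax(B, axis=1)             # T[i, j] = p(y' = j | y = i)
p_noisy = p_true @ T               # p(y' = j | x), the mixture of Eqs. (1)/(4)
```

Because each row of `T` and the vector `p_true` are probability distributions, the mixture `p_noisy` is again a valid distribution over the noisy labels.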

 Fig. 2 Structure of the training-phase model

 $S(w, {w_{noise}}) = \sum\nolimits_t {\log p({{y'}_t}|{x_t})} = \sum\nolimits_t {\log \left( {\sum\nolimits_i {p({{y'}_t}|{y_t} = i, {x_t};{w_{noise}})p({y_t} = i|{x_t};w)} } \right)}$ (5)

 $p(y' = j|y = i) = \frac{{\exp ({b_{ij}})}}{{\sum\nolimits_l {\exp ({b_{il}})} }}$ (6)
 $p(y' = j|x) = \sum\nolimits_i {p(y' = j|y = i)p(y = i|x)}$ (7)

 $S(w, {w_{noise}}) = \sum\nolimits_t {\log p({{y'}_t}|{x_t})} = \sum\nolimits_t {\log \left( {\sum\nolimits_i {p({{y'}_t}|{y_t} = i;{w_{noise}})p({y_t} = i|{x_t};w)} } \right)}$ (8)

 $p(y' = j|y = i) = \theta (i, j) = \frac{{\exp ({b_{ij}})}}{{\sum\nolimits_l {\exp ({b_{il}})} }}$ (9)
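The training objective of Eq. (8), with the simplified transition of Eqs. (6)/(9), can be sketched as follows. This is an assumption-laden illustration: `noisy_log_likelihood` is a hypothetical name, the inputs are random synthetic data rather than model outputs, and in practice this score would be maximized by gradient ascent over both `P_true` (via the network weights `w`) and `B` (the noise weights `w_noise`).

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def noisy_log_likelihood(P_true, B, y_noisy):
    """Eq. (8): sum_t log( sum_i p(y'_t|y_t=i; w_noise) p(y_t=i|x_t; w) ).

    P_true  : (n, k) predicted real-label distributions p(y_t = i | x_t)
    B       : (k, k) noise biases b_ij; a row softmax gives theta(i, j), Eq. (9)
    y_noisy : (n,)   observed (automatically generated) labels y'_t
    """
    T = softmax(B, axis=1)                     # theta(i, j), Eq. (9)
    mix = P_true @ T                           # p(y'_t = j | x_t) for every j
    return float(np.log(mix[np.arange(len(y_noisy)), y_noisy]).sum())

rng = np.random.default_rng(1)
n, k = 6, 4
P_true = softmax(rng.normal(size=(n, k)), axis=1)
B = rng.normal(size=(k, k))
y_noisy = rng.integers(0, k, size=n)
score = noisy_log_likelihood(P_true, B, y_noisy)
```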

 Fig. 3 Structure of the testing-phase model

3.2 Initialization of the Noise Distribution

 ${b_{ij}} = \sum\nolimits_t {{1_{\{ {{y'}_t} = j\} }}p({y_t} = i|{x_t})}$ (10)

 ${b'_{ij}} = \log \left( {\frac{{{b_{ij}}}}{{\sum\nolimits_t {p({y_t} = i|{x_t})} }}} \right)$ (11)
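A minimal sketch of the initialization in Eqs. (10)–(11), assuming `P_true` holds the model's predicted real-label distributions and `y_noisy` the automatically generated labels; the function name and the small `eps` guard against `log(0)` are additions of this sketch, not part of the paper's formulas.

```python
import numpy as np

def init_noise_bias(P_true, y_noisy, k, eps=1e-12):
    """Eqs. (10)-(11): b[i, j] accumulates p(y_t = i | x_t) over the samples
    whose observed label is j (Eq. 10); Eq. (11) row-normalises by
    sum_t p(y_t = i | x_t) and moves to log space, so that a row softmax
    over the result recovers the empirical transition p(y' = j | y = i)."""
    b = np.zeros((k, k))
    for t, j in enumerate(y_noisy):
        b[:, j] += P_true[t]                  # Eq. (10)
    row = P_true.sum(axis=0)                  # sum_t p(y_t = i | x_t)
    return np.log(b / row[:, None] + eps)     # Eq. (11)

rng = np.random.default_rng(3)
n, k = 10, 3
P = rng.random((n, k))
P_true = P / P.sum(axis=1, keepdims=True)     # synthetic predicted distributions
y_noisy = rng.integers(0, k, size=n)
b_prime = init_noise_bias(P_true, y_noisy, k)
```

By construction, each row of `exp(b_prime)` sums to (almost exactly) 1, i.e. the initialization already is a valid transition distribution before any training.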

4 Attention Mechanism Based on the Noise Distribution

4.1 Encoder Layer

 Fig. 4 Location information of words relative to entities

 $A \otimes B = \sum\nolimits_{i = 1}^m {\sum\nolimits_{j = 1}^n {{a_{ij}}{b_{ij}}} }$ (12)
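The element-wise product sum of Eq. (12) is the basic operation of the convolutional encoder: one filter slides over the sentence matrix and produces one value per window. The following sketch, with the hypothetical name `conv_features` and toy all-ones inputs, shows this under the assumption that `S` stacks the word-plus-position embeddings row by row.

```python
import numpy as np

def conv_features(S, W):
    """Eq. (12): slide one convolution filter W (window m x embedding dim n)
    over the sentence matrix S; each output value is the element-wise
    product sum A (x) B of the filter and the current window."""
    m, n = W.shape
    assert S.shape[1] == n
    return np.array([(S[i:i + m] * W).sum() for i in range(S.shape[0] - m + 1)])

S = np.ones((5, 3))        # toy sentence: 5 tokens, 3-dim embeddings
W = np.ones((2, 3))        # filter covering a window of 2 tokens
feats = conv_features(S, W)   # 5 - 2 + 1 = 4 window positions, each 2*3 = 6.0
```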

 Fig. 5 Encoder layer

 Fig. 6 Attention with noise distribution

4.2 Attention Mechanism Based on the Noise Distribution

 ${\alpha _i} = \frac{{\exp ({e_i})}}{{\sum\nolimits_{j = 1}^n {\exp ({e_j})} }}$ (13)

 $s = \sum\nolimits_{i = 1}^n {{\alpha _i}} {p_i}$ (14)

 $o = Ms + d$ (15)

 $p(y = i|s, w) = \frac{{\exp ({o_i})}}{{\sum\nolimits_{j = 1}^k {\exp ({o_j})} }}$ (16)
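The attention pipeline of Eqs. (13)–(16) can be sketched end to end. This is a schematic: `bag_predict` is a hypothetical name, and the attention scores `e` are passed in as random numbers here, whereas in the proposed model they are derived from the noise distribution.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bag_predict(P, e, M, d):
    """Eqs. (13)-(16): attention over the n sentence encodings of one bag.

    P : (n, h) sentence encodings p_i
    e : (n,)   attention scores e_i
    M : (k, h) relation matrix, d : (k,) bias
    """
    alpha = softmax(e)       # Eq. (13): attention weights over sentences
    s = alpha @ P            # Eq. (14): weighted bag representation
    o = M @ s + d            # Eq. (15): scores over the k relations
    return softmax(o)        # Eq. (16): p(y = i | s, w)

rng = np.random.default_rng(2)
n, h, k = 3, 8, 4
probs = bag_predict(rng.normal(size=(n, h)), rng.normal(size=n),
                    rng.normal(size=(k, h)), rng.normal(size=k))
```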

5 Denoising of Unbalanced Samples

 ${b_{ij}} = \begin{cases} 1, & {b_{ij}} = 0{\rm{\ and\ }}i = j\\ 0, & i = 0, j \ne 0{\rm{\ or\ }}i \ne 0, j = 0\\ \sum\nolimits_t {{1_{\{ {{y'}_t} = j\} }}p({y_t} = i|{x_t})}, & {\rm{otherwise}} \end{cases}$ (17)
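The three cases of Eq. (17) can be sketched as follows, under the assumption that label index 0 denotes the NA (no-relation) class; the function name and the synthetic one-hot inputs are illustrative only.

```python
import numpy as np

def init_noise_bias_unbalanced(P_true, y_noisy, k):
    """Eq. (17): initialisation of b_ij for unbalanced samples. Mass between
    the NA label (index 0) and the real relations is pinned to 0, and an
    empty diagonal entry is set to 1 so that no row of the noise
    distribution collapses to all zeros."""
    b = np.zeros((k, k))
    for t, j in enumerate(y_noisy):            # "otherwise" branch, as Eq. (10)
        b[:, j] += P_true[t]
    b[0, 1:] = 0.0                             # i = 0, j != 0
    b[1:, 0] = 0.0                             # i != 0, j = 0
    for i in range(k):                         # b_ii = 0 and i = j
        if b[i, i] == 0.0:
            b[i, i] = 1.0
    return b

# synthetic check: the model never predicts label 0, one noisy label is 0
P_true = np.eye(3)[[1, 1, 2, 2]]               # rows are one-hot p(y_t | x_t)
y_noisy = np.array([1, 0, 2, 2])
b = init_noise_bias_unbalanced(P_true, y_noisy, 3)
```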
6 Experiments

6.1 Datasets and Evaluation Metrics

6.2 Parameter Settings

Table 1 Parameter settings

6.3 Analysis of the Noise Distribution

 Fig. 7 Confusion matrix

6.4 Comparison with Baselines

 Fig. 8 NYT experimental results

 Fig. 9 Wiki-KBP experimental results

Table 2 Evaluation of performance

Table 3 Model parameters

 Fig. 10 Experimental results of cnn_att

Table 4 Evaluation of performance

 Fig. 11 Confusion matrix

 Fig. 12 Experimental results

 Fig. 13 Comparative experimental results

7 Conclusion
