Journal of Software (软件学报), 2020, Vol. 31, Issue 4: 1025-1038


Distant Supervision Neural Network Relation Extraction Based on Noisy Observation
YE Yu-Xin1,2, XUE Huan1, WANG Lu3, OUYANG Dan-Tong1,2
1. School of Computer Science and Technology, Jilin University, Changchun 130012, China;
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun 130012, China;
3. Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China
Abstract: The main advantage of distant supervision relation extraction is that labeled data are generated automatically by aligning knowledge bases with natural language text. This simple alignment mechanism frees people from heavy manual labeling, but it inevitably produces incorrectly labeled data, which hampers the construction of high-quality relation extraction models. To handle noisy labels in distant supervision relation extraction, this study assumes that the final label of a sentence is a noisy observation generated by some unknown factors. Based on this assumption, a new relation extraction model is constructed, consisting of an encoder layer, an attention layer based on the noise distribution, a real label output layer, and a noisy observation layer. In the training phase, the transition probabilities from real labels to noisy labels are learned from the automatically labeled data; in the testing phase, the real label is obtained from the real label output layer. The study combines the noisy observation model with a deep neural network: it develops an attention mechanism based on the noise distribution and denoises unbalanced samples within the deep neural network framework, aiming to further improve the performance of distant supervision relation extraction based on noisy observation. The proposed method is evaluated on a public dataset, and the performance of the relation extraction model is examined under different distribution families. The experimental results show that the proposed method achieves higher precision and recall than existing methods.
Key words: distant supervision; relation extraction; noise label

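1 Related Work

2 Distant Supervision Relation Extraction Model with Noisy Observation

Fig. 1 Overall structure of the proposed model

3 Noisy Observation and Noise Distribution

3.1 Noisy Observation and Real Label Output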

 $p(y' = j|x, w, {w_{noise}}) = \sum\nolimits_{i = 1}^k p (y' = j|y = i;{w_{noise}})p(y = i|x, w)$ (1)

 $p(y = i|x;w) = \frac{{\exp (u_i^Th + {b_i})}}{{\sum\nolimits_{l = 1}^k {\exp (u_l^Th + {b_l})} }}$ (2)

 $p(y' = j|y = i, x) = \frac{{\exp (u_{ij}^Th + {b_{ij}})}}{{\sum\nolimits_{l = 1}^k {\exp (u_{il}^Th + {b_{il}})} }}$ (3)
 $p(y' = j|x) = \sum\nolimits_{i = 1}^k {p(y' = j|y = i, x)p(y = i|x)}$ (4)
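
Eqs. (2)-(4) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the array names (`U`, `b`, `U_noise`, `B_noise`), their shapes, and the random values are assumptions made only for illustration. The sketch computes the softmax over real labels, an input-dependent transition matrix from real to noisy labels, and the marginal over the real label.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def true_label_probs(h, U, b):
    # Eq. (2): p(y = i | x) = softmax_i(u_i^T h + b_i); U is (k, d), b is (k,).
    return softmax(U @ h + b)

def noise_transition(h, U_noise, B_noise):
    # Eq. (3): row i holds p(y' = j | y = i, x), normalized over j.
    # U_noise is (k, k, d) and B_noise is (k, k) (assumed shapes).
    logits = U_noise @ h + B_noise
    return softmax(logits, axis=1)

def noisy_label_probs(h, U, b, U_noise, B_noise):
    # Eq. (4): p(y' = j | x) = sum_i p(y' = j | y = i, x) p(y = i | x).
    p_true = true_label_probs(h, U, b)
    T = noise_transition(h, U_noise, B_noise)
    return p_true @ T

# Illustrative random parameters: k = 4 labels, d = 8 hidden units.
rng = np.random.default_rng(0)
k, d = 4, 8
h = rng.normal(size=d)
U, b = rng.normal(size=(k, d)), rng.normal(size=k)
U_noise, B_noise = rng.normal(size=(k, k, d)), rng.normal(size=(k, k))
p_noisy = noisy_label_probs(h, U, b, U_noise, B_noise)
```

Because every row of the transition matrix sums to one, the resulting vector is again a probability distribution over the k noisy labels.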

Fig. 2 Structure of the training phase model

 $S(w, {w_{noise}}) = \sum\nolimits_t {\log p({{y'}_t}|{x_t})} = \sum\nolimits_t {\log \left( {\sum\nolimits_i {p({{y'}_t}|{y_t} = i, {x_t};{w_{noise}})p({y_t} = i|{x_t};w)} } \right)}$ (5)

 $p(y' = j|y = i) = \frac{{\exp ({b_{ij}})}}{{\sum\nolimits_l {\exp ({b_{il}})} }}$ (6)
 $p(y' = j|x) = \sum\nolimits_i {p(y' = j|y = i)p(y = i|x)}$ (7)

 $S(w, {w_{noise}}) = \sum\nolimits_t {\log p({{y'}_t}|{x_t})} = \sum\nolimits_t {\log \left( {\sum\nolimits_i {p({{y'}_t}|{y_t} = i;{w_{noise}})p({y_t} = i|{x_t};w)} } \right)}$ (8)

 $p(y' = j|y = i) = \theta (i, j) = \frac{{\exp ({b_{ij}})}}{{\sum\nolimits_l {\exp ({b_{il}})} }}$ (9)
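
The simplified model of Eqs. (6)-(9), in which the transition probabilities depend only on the bias terms, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: `p_true` stands for the real label output p(y_t = i | x_t) of any network, and the toy values are hypothetical.

```python
import numpy as np

def row_softmax(B):
    # Eq. (9): theta(i, j) = exp(b_ij) / sum_l exp(b_il), one distribution per row.
    B = B - B.max(axis=1, keepdims=True)
    E = np.exp(B)
    return E / E.sum(axis=1, keepdims=True)

def log_likelihood(p_true, noisy_labels, B):
    # Eq. (8): S = sum_t log( sum_i theta(i, y'_t) p(y_t = i | x_t) ).
    # p_true is (T, k); noisy_labels is (T,) with integer labels; B is (k, k).
    theta = row_softmax(B)
    p_noisy = p_true @ theta  # Eq. (7), one row per sample
    picked = p_noisy[np.arange(len(noisy_labels)), noisy_labels]
    return float(np.sum(np.log(picked)))

# Toy example (hypothetical values): a near-identity noise matrix,
# so the noisy labels are treated as almost always correct.
p_true = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
noisy_labels = np.array([0, 1])
B = 50.0 * np.eye(3)
S = log_likelihood(p_true, noisy_labels, B)
```

With a near-identity noise matrix the objective reduces to the usual log-likelihood of the observed labels, which is why such a matrix is a natural starting point before the noise parameters are trained.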

Fig. 3 Structure of the testing phase model

3.2 Initialization of the Noise Distribution

 ${b_{ij}} = \sum\nolimits_t {{1_{\{ {{y'}_t} = j\} }}p({y_t} = i|{x_t})}$ (10)

 ${b'_{ij}} = \log \left( {\frac{{{b_{ij}}}}{{\sum\nolimits_t {p({y_t} = i|{x_t})} }}} \right)$ (11)
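
A minimal sketch of the initialization in Eqs. (10) and (11), assuming the real label probabilities come from a model trained beforehand without the noise layer. The function name, the small constant `eps` (added here only to avoid taking the logarithm of zero), and the toy values are assumptions for illustration.

```python
import numpy as np

def init_noise_bias(p_true, noisy_labels, k, eps=1e-12):
    # p_true: (T, k) rows of p(y_t = i | x_t); noisy_labels: (T,) integer labels.
    one_hot = np.eye(k)[noisy_labels]   # (T, k) indicator 1{y'_t = j}
    counts = p_true.T @ one_hot         # Eq. (10): b_ij, shape (k, k)
    totals = p_true.sum(axis=0)         # sum_t p(y_t = i | x_t), shape (k,)
    # Eq. (11): b'_ij = log(b_ij / totals_i); eps is an assumption, not in the paper.
    return np.log((counts + eps) / (totals[:, None] + eps))

# Toy example (hypothetical values) with k = 2 labels and T = 3 samples.
p_true = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.4]])
noisy_labels = np.array([0, 1, 1])
b_init = init_noise_bias(p_true, noisy_labels, k=2)
```

Exponentiating each row of `b_init` gives a soft confusion matrix between predicted real labels and observed noisy labels, so the softmax of Eq. (9) starts from an empirical estimate of the noise distribution.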

4 Attention Mechanism Based on Noise Distribution

4.1 Encoder Layer

Fig. 4 Position information of words relative to entities

 $A \otimes B = \sum\nolimits_{i = 1}^m {\sum\nolimits_{j = 1}^n {{a_{ij}}{b_{ij}}} }$ (12)
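
The operation in Eq. (12) is the sum of element-wise products of two matrices of equal size. The sketch below shows it in NumPy, together with one common way of using it in a convolutional sentence encoder: sliding a filter over consecutive word vectors. The sliding-window usage and the names `X` and `W` are assumptions for illustration, not details recovered from the paper.

```python
import numpy as np

def frobenius_inner(A, B):
    # Eq. (12): sum over i, j of a_ij * b_ij for two m x n matrices.
    assert A.shape == B.shape
    return float(np.sum(A * B))

def conv_over_sentence(X, W):
    # Assumed usage: slide a (window, dim) filter W over the rows of the
    # (n_words, dim) input X and apply Eq. (12) at each position.
    window = W.shape[0]
    n_positions = X.shape[0] - window + 1
    return np.array([frobenius_inner(X[i:i + window], W) for i in range(n_positions)])

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
value = frobenius_inner(A, B)         # 1*5 + 2*6 + 3*7 + 4*8

X = np.arange(8, dtype=float).reshape(4, 2)  # 4 "words" with 2 features each
W = np.ones((2, 2))                          # a filter spanning 2 words
features = conv_over_sentence(X, W)
```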

Fig. 5 Encoder layer

Fig. 6 Attention mechanism based on noise distribution

4.2 Attention Mechanism Based on Noise Distribution

 ${\alpha _i} = \frac{{\exp ({e_i})}}{{\sum\nolimits_{j = 1}^k {\exp ({e_j})} }}$ (13)

 $s = \sum\nolimits_{i = 1}^n {{\alpha _i}} {p_i}$ (14)

 $o = Ms + d$ (15)

 $p(y = i|s, w) = \frac{{\exp ({o_i})}}{{\sum\nolimits_{j = 1}^k {\exp ({o_j})} }}$ (16)
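
The chain from Eqs. (13) to (16) can be sketched as follows. How the scores e_i are derived from the noise distribution is not recoverable from the equations alone, so the sketch takes them as an input; it also assumes that the attention weights are normalized over the same n items that are summed in Eq. (14). All names and shapes are illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def bag_prediction(P, e, M, d):
    # P: (n, dim) vectors p_i; e: (n,) attention scores e_i (given as input);
    # M: (k, dim) and d: (k,) are the parameters of the output layer.
    alpha = softmax(e)   # Eq. (13), assumed to normalize over the n items
    s = alpha @ P        # Eq. (14): weighted sum of the p_i
    o = M @ s + d        # Eq. (15)
    return alpha, softmax(o)  # Eq. (16): distribution over k labels

# Illustrative values: n = 3 items, dim = 5, k = 4 labels.
rng = np.random.default_rng(1)
P = rng.normal(size=(3, 5))
e = np.zeros(3)          # equal scores give uniform attention
M, d = rng.normal(size=(4, 5)), np.zeros(4)
alpha, probs = bag_prediction(P, e, M, d)
```

With equal scores the attention collapses to a plain average of the p_i; larger scores shift weight toward the corresponding items.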

5 Denoising of Unbalanced Samples

 ${b_{ij}} = \left\{ \begin{array}{ll} 1, & {b_{ij}} = 0 \text{ and } i = j\\ 0, & (i = 0, j \ne 0) \text{ or } (i \ne 0, j = 0)\\ \sum\nolimits_t {{1_{\{ {{y'}_t} = j\} }}p({y_t} = i|{x_t})}, & \text{otherwise} \end{array} \right.$ (17)
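
A minimal sketch of the case analysis in Eq. (17). It assumes that label index 0 denotes a special class (for example, the dominant "no relation" class in an unbalanced dataset); this interpretation, the function name, and the toy values are assumptions and are not stated in the equation itself.

```python
import numpy as np

def init_counts_denoised(p_true, noisy_labels, k):
    # Default case of Eq. (17): b_ij = sum_t 1{y'_t = j} p(y_t = i | x_t).
    one_hot = np.eye(k)[noisy_labels]
    B = p_true.T @ one_hot
    # Second case: no transitions between label 0 and any other label.
    B[0, 1:] = 0.0
    B[1:, 0] = 0.0
    # First case: a diagonal entry that is still zero is set to 1.
    diag = np.diag_indices(k)
    B[diag] = np.where(B[diag] == 0, 1.0, B[diag])
    return B

# Toy example (hypothetical values) with k = 3 labels; label 2 is never observed.
p_true = np.array([[0.5, 0.5, 0.0],
                   [0.2, 0.8, 0.0]])
noisy_labels = np.array([0, 1])
B_init = init_counts_denoised(p_true, noisy_labels, k=3)
```

Setting empty diagonal entries to 1 keeps every label mapped to itself by default, so a label with no supporting samples does not end up with an all-zero row.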

6 Experiments

6.1 Datasets and Evaluation Metrics

6.2 Parameter Settings

Table 1 Parameter settings

6.3 Analysis of the Noise Distribution

Fig. 7 Confusion matrix

6.4 Baseline Comparison

Fig. 8 NYT experimental results

Fig. 9 Wiki-KBP experimental results

Table 2 Evaluation of performance

Table 3 Model parameters

Fig. 10 Experimental results of cnn_att

Table 4 Evaluation of performance

Fig. 11 Confusion matrix

Fig. 12 Experimental results

Fig. 13 Comparative experimental results

7 Conclusion
