Journal of Software, 2017, Vol. 28, Issue (11): 2879-2890


Convolution Neural Network Feature Importance Analysis and Feature Selection Enhanced Model
LU Hong-Yu1,2, ZHANG Min1,2, LIU Yi-Qun1,2, MA Shao-Ping1,2
1. State Key Laboratory of Intelligent Technology and System(Tsinghua University), Beijing 100084, China;
2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Foundation item: National Natural Science Foundation of China (61622208, 61532011, 61672311); National Program on Key Basic Research Project of China (973) (2015CB358700)
Abstract: Because of their strong expressive power and outstanding classification performance, deep neural networks (DNNs) such as convolutional neural networks (CNNs) are widely used in various fields. When faced with high-dimensional features, DNNs are usually considered robust because they can implicitly select relevant features. However, because of their huge number of parameters, a network trained on insufficient data is under-trained and its feature selection is unsatisfactory. Moreover, a DNN is a black box, which makes it difficult to observe which features it selects and to evaluate its feature selection ability. This paper proposes a feature contribution analysis method based on the neuron receptive field, with which the feature importance of a neural network such as a CNN can be obtained explicitly. Further, the study finds that the neural network is weaker than traditional feature evaluation methods at recognizing both relevant features and noise features. To enhance its feature selection ability, a feature-selection-enhanced CNN model is proposed, which improves classification accuracy by incorporating a traditional feature evaluation method into the network's learning process. Experimental results on text-based user attribute modeling in social media demonstrate the effectiveness of the proposed model.
Key words: convolution neural network; feature importance analysis; feature selection; text categorization

1 Related Work

1.1 Sample Feature Analysis for Neural Networks

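 $S_{x_{ij}} = \partial L\left(\tilde{y}, x\right) / \partial x_{ij}$,

where $S_{x_{ij}}$ is the gradient of the loss $L(\tilde{y}, x)$ with respect to input element $x_{ij}$, i.e., how sensitive the loss is to that element.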
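The sensitivity score above is the gradient of the loss with respect to each input element. The sketch below computes it for a hypothetical linear softmax classifier, not the paper's CNN, with the input flattened to a single index $j$. It computes the gradient twice, once analytically and once by central finite differences. The model, the weights, and the function names (`loss`, `sensitivity_analytic`, `sensitivity_numeric`) are assumptions for illustration only.

```python
import math


def softmax(z):
    # Numerically stable softmax.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, x):
    # Hypothetical linear model: z_c = sum_j W[c][j] * x[j].
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def loss(W, x, y):
    # Cross-entropy loss L(y, x) for label index y.
    return -math.log(softmax(logits(W, x))[y])


def sensitivity_analytic(W, x, y):
    # S_j = dL/dx_j = sum_c W[c][j] * (p_c - [c == y]) for this linear model.
    p = softmax(logits(W, x))
    return [sum(W[c][j] * (p[c] - (1.0 if c == y else 0.0)) for c in range(len(W)))
            for j in range(len(x))]


def sensitivity_numeric(W, x, y, eps=1e-6):
    # Central finite differences; usable for any black-box loss.
    grads = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        grads.append((loss(W, xp, y) - loss(W, xm, y)) / (2 * eps))
    return grads
```

For a real network, the analytic path would come from automatic differentiation. The finite-difference version is shown because it treats the model as a black box.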

1.2 Evaluating Sample Feature Analysis Methods

 $x_{MF}^{\left( 0 \right)} = x;\forall 1 \le k \le L:x_{MF}^{\left( k \right)} = g\left( {x_{MF}^{\left( {k-1} \right)}, {r_k}} \right)$ (1)

 $AOPC = \frac{1}{{L + 1}}{\left\langle {\sum\nolimits_{k = 0}^L {f\left( {x_{MF}^{\left( 0 \right)}} \right)-f\left( {x_{MF}^{\left( k \right)}} \right)} } \right\rangle _x}$ (2)
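The following is a minimal sketch of the MoRF perturbation in Eq. (1) and the AOPC score in Eq. (2). It makes three assumptions:

- The perturbation $g$ sets the chosen feature to 0.
- The relevance ranking $r_k$ comes from a caller-supplied `explain` function.
- `f` returns the classifier's score for the class of interest.

All function names are hypothetical. A better explanation removes the truly important features first, so the score drops faster and the AOPC is larger.

```python
def zero_out(x, idx):
    # Assumed perturbation g: copy x and set feature idx to 0.
    x = list(x)
    x[idx] = 0.0
    return x


def morf_sequence(x, relevance, L, g=zero_out):
    # Eq. (1): x^(0) = x, x^(k) = g(x^(k-1), r_k), where r_k is the k-th
    # most relevant feature index. Requires L <= len(x).
    order = sorted(range(len(x)), key=lambda i: relevance[i], reverse=True)
    seq = [list(x)]
    for k in range(1, L + 1):
        seq.append(g(seq[-1], order[k - 1]))
    return seq


def aopc(samples, f, explain, L, g=zero_out):
    # Eq. (2): average over samples of (1/(L+1)) * sum_k [f(x^(0)) - f(x^(k))].
    total = 0.0
    for x in samples:
        seq = morf_sequence(x, explain(x), L, g)
        f0 = f(seq[0])
        total += sum(f0 - f(xk) for xk in seq) / (L + 1)
    return total / len(samples)
```

Take `f` as the sum of the features and use the feature values themselves as relevance. Removing features in decreasing order of value then yields a higher AOPC than removing them in increasing order.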

1.3 Traditional Feature Selection Methods

 $\chi^2 = \sum\nolimits_{i = 1}^2 \sum\nolimits_{j = 1}^k \frac{\left(A_{ij} - E_{ij}\right)^2}{E_{ij}}$ (3)
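Eq. (3) can be computed from a 2×k contingency table. The sketch makes these assumptions:

- Row 0 counts documents that contain the term, and row 1 counts documents that do not.
- There is one column per class.
- $E_{ij}$ is the expected count under independence, row total × column total / N.

The function name `chi_square` is hypothetical.

```python
def chi_square(A):
    # A: 2 x k table of observed counts A_ij.
    # Assumed layout: row 0 = documents containing the term, row 1 = documents
    # without it; column j = class j.
    n = sum(sum(row) for row in A)
    row_tot = [sum(row) for row in A]
    col_tot = [sum(A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
    chi2 = 0.0
    for i in range(len(A)):
        for j in range(len(A[0])):
            e = row_tot[i] * col_tot[j] / n  # expected count E_ij under independence
            if e > 0:
                chi2 += (A[i][j] - e) ** 2 / e
    return chi2
```

A term spread evenly across classes scores 0. A term that occurs only in one class scores high.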

 ${\rho _{X, C}} = \frac{{{\mathop{\mathit cov}} \left( {X, C} \right)}}{{{\sigma _X}{\sigma _C}}}$ (4)
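Here is a plain-Python sketch of Eq. (4). It assumes `X` holds one feature value per document (for example, term frequency) and `C` a numeric class label (for example, 0/1). The coefficient is undefined when either variable is constant. The normalization constant `n` cancels between the covariance and the standard deviations, so population and sample versions give the same ρ.

```python
import math


def pearson(X, C):
    # rho_{X,C} = cov(X, C) / (sigma_X * sigma_C)
    n = len(X)
    mx = sum(X) / n
    mc = sum(C) / n
    cov = sum((a - mx) * (b - mc) for a, b in zip(X, C)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in X) / n)
    sc = math.sqrt(sum((b - mc) ** 2 for b in C) / n)
    return cov / (sx * sc)
```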

2 Feature Importance Analysis of Neural Networks

2.1 Receptive-Field-Based Feature Contribution Analysis for Neural Networks

 Fig. 1 Sketch map of the feature contribution analysis based on the receptive field

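1. The contribution of output-layer neuron $y_j$ is initialized as $C_{y_j} = \delta_{jc}$, where $\delta$ is the Kronecker delta and $c$ is the class under observation (e.g., the true class of the sample).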

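2. The output neuron $y_j$ is computed from the pooling-layer neurons $p$ through one fully connected layer, so the contribution $C_{p_i}$ of $p_i$ can be computed from $C_{y_j}$ and the corresponding fully connected weight $w_{p_i y_j}$: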

 ${C_{{p_i}}} = {w_{{p_i}{y_j}}}{C_{{y_j}}}$ (5)

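3. The max-pooling neuron $p_j$ keeps only the largest entry of its feature map $fm_j$ (winner takes all). The entire contribution $C_{p_j}$ of the pooling neuron is therefore back-propagated to the maximally activated convolutional neuron $conv_{j,k_{\max}}$ of $fm_j$:

 $C_{conv_{j,k}} = I_{k = k_{\max}} C_{p_j}$ (6)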

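4. The activation of convolutional neuron $conv_{j,k}$ is obtained by convolving the features $w_i$ in its receptive field with the kernel parameters. Therefore, the contribution $C_{w_i}$ of $w_i$ is obtained from the dot product of its word vector $x_{w_i}$ with the kernel parameter vector at the corresponding position. This dot product is weighted by the contribution of each convolutional neuron whose receptive field $RF(k)$ contains $w_i$:

 $C_{w_i} = \sum\nolimits_j \sum\nolimits_k I_{i \in RF(k)} \left( conv\_kernel_{i-k+kernel\_size/2} \cdot x_{w_i} \right) C_{conv_{j,k}}$ (7)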
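Steps 1 to 4 can be sketched end to end on a toy single-layer text CNN. The following assumptions are not taken from the paper:

- The model uses pure-Python lists.
- The kernel size is odd, with "same" zero padding.
- Each feature map $j$ has its own kernel `kernels[j]`.
- The fully connected output layer has no bias.
- Ties in max pooling go to the first position.

All names are hypothetical. Under these assumptions the word contributions sum exactly to the output score $y_c$, which makes a convenient sanity check.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def forward(x, kernels, W):
    # x: n word vectors x_{w_i}; kernels[j]: ks vectors for feature map j
    # (ks odd, "same" zero padding); W[c][j]: fully connected weight w_{p_j y_c}.
    n, half = len(x), len(kernels[0]) // 2
    conv = []
    for ker in kernels:
        conv.append([
            sum(dot(ker[i - k + half], x[i])  # words i in receptive field RF(k)
                for i in range(max(0, k - half), min(n, k + half + 1)))
            for k in range(n)
        ])
    pooled = [max(row) for row in conv]  # max pooling over positions
    y = [dot(w_c, pooled) for w_c in W]  # bias-free output layer (assumption)
    return conv, pooled, y


def word_contributions(x, kernels, W, c):
    n, half = len(x), len(kernels[0]) // 2
    conv, pooled, _ = forward(x, kernels, W)
    # Step 1: C_{y_j} = delta_{jc}.
    C_y = [1.0 if j == c else 0.0 for j in range(len(W))]
    # Step 2, Eq. (5): C_{p_i} = sum_j w_{p_i y_j} C_{y_j}.
    C_p = [sum(W[j][i] * C_y[j] for j in range(len(W))) for i in range(len(pooled))]
    # Step 3, Eq. (6): winner takes all; only position k_max of feature map j
    # inherits C_{p_j}.
    C_conv = []
    for j, row in enumerate(conv):
        k_max = row.index(max(row))
        C_conv.append([C_p[j] if k == k_max else 0.0 for k in range(n)])
    # Step 4, Eq. (7): each word i in RF(k) receives
    # (kernel slice . x_{w_i}) * C_{conv_{j,k}}.
    C_w = [0.0] * n
    for j, ker in enumerate(kernels):
        for k in range(n):
            for i in range(max(0, k - half), min(n, k + half + 1)):
                C_w[i] += dot(ker[i - k + half], x[i]) * C_conv[j][k]
    return C_w
```

If convolution or output biases were added, the sum of word contributions would no longer equal $y_c$ exactly, because the bias terms are not attributed to any word.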

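The importance of word $w_i$ over the whole corpus is then obtained by averaging its per-document contributions $imp_{w_{ij}}$ over the documents $doc(w_i)$ in which it appears:

 $imp_{w_i} = \frac{1}{N}\sum\nolimits_{j \in doc(w_i)} imp_{w_{ij}}$ (8)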
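Here is a sketch of the aggregation in Eq. (8). It assumes the following, and the paper's exact choices may differ:

- Each document supplies (word, contribution) pairs, for example from the receptive-field procedure.
- Repeated occurrences of a word within one document are summed into $imp_{w_{ij}}$.
- $N$ is the number of documents that contain $w_i$.

The function name `global_importance` is hypothetical.

```python
from collections import defaultdict


def global_importance(doc_contribs):
    # doc_contribs: list of documents, each a list of (word, contribution) pairs.
    total = defaultdict(float)  # sum over documents of imp_{w_ij}
    ndocs = defaultdict(int)    # N: number of documents containing the word (assumed)
    for doc in doc_contribs:
        per_doc = defaultdict(float)
        for word, c in doc:
            per_doc[word] += c  # assumed: merge repeated occurrences within a document
        for word, c in per_doc.items():
            total[word] += c
            ndocs[word] += 1
    return {w: total[w] / ndocs[w] for w in total}
```

Sorting the returned dictionary by value gives a global keyword ranking of the kind compared across methods in Table 1.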

2.2 Comparative Experiments on the Effectiveness of Sample Feature Importance Analysis Methods

2.2.1 Experimental Data and Model

 Fig. 2 A convolutional neural network model for text categorization tasks

2.2.2 Effectiveness Experiments and Result Analysis

 Fig. 3 Visualization of feature contribution analysis and feature sensitivity analysis

 Fig. 4 Effectiveness experiments of the feature analysis methods

2.3 Feature Selection Results of the Neural Network

Table 1 Top-10 keywords under different feature importance evaluation methods

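3 Comparative Analysis of the Feature Selection Ability of Neural Networks and Traditional Feature Selection Methods

3.1 Evaluation of Feature Selection Ability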

3.2 Experimental Comparison of the Ability to Identify High-Importance Features (Positive Selection)

 Fig. 5 Experimental results of positive selection

3.3 Experimental Comparison of the Ability to Identify Noise Features (Reverse Occlusion)

 Fig. 6 Experimental results of reverse occlusion

4 Feature Selection Enhanced Model for Convolutional Neural Networks

4.1 Feature Selection Layer

 Fig. 7 Sketch map of the feature selection layer

 $x' = \mathit{ReLU}\left( {x \odot w + b} \right)$ (9)
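Here is a minimal sketch of Eq. (9) for fixed-length feature vectors, with one weight and one bias per feature. It also shows one hypothetical way to inject a traditional, non-negative feature score, such as χ² from Eq. (3): using the score to initialize `w`. This initialization scheme (`init_weights_from_scores`) is an illustrative assumption, not necessarily how the paper's model uses the scores.

```python
def feature_selection_layer(x, w, b):
    # Eq. (9): x' = ReLU(x ⊙ w + b), applied elementwise per feature.
    return [max(0.0, xi * wi + bi) for xi, wi, bi in zip(x, w, b)]


def init_weights_from_scores(scores):
    # Hypothetical initialization (assumption): scale non-negative feature
    # scores to [0, 1] so that low-scoring features start close to being masked.
    top = max(scores)
    return [s / top if top > 0 else 0.0 for s in scores]
```

Because the layer is elementwise, a weight driven toward zero, together with a non-positive bias, effectively removes that feature before the rest of the network sees it.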

 Fig. 8 Feature selection enhanced model applied to a convolutional neural network with an embedding layer

 Fig. 9 Feature selection enhanced model applied to neural networks with fixed-length features

4.2 Model Effectiveness Verification

Table 2 Experimental results of the feature-selection-enhanced convolutional neural network model

5 Conclusion and Future Work

