Journal of Software:2020.31(12):3733-3752

Research on Feature Selection Algorithm Based on Natural Evolution Strategy
ZHANG Xin, LI Zhan-Shan
(College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China)
Received:December 07, 2018    Revised:June 17, 2019
Abstract: Feature selection is an NP-hard problem that aims to remove irrelevant and redundant features from a dataset in order to reduce model training time and improve model accuracy. It is therefore an important data preprocessing technique in machine learning, data mining, and pattern recognition. This study proposes MCC-NES, a new feature selection algorithm based on the natural evolution strategy. First, the algorithm adopts a natural evolution strategy that models the search distribution with a diagonal covariance matrix and adapts its parameters using gradient information. Second, so that the algorithm can handle feature selection effectively, a feature encoding scheme is introduced in the initialization phase. Third, a fitness function is defined that combines classification accuracy with dimensionality reduction. In addition, to cope with high-dimensional data, the idea of cooperative co-evolution is introduced: the original problem is decomposed into relatively small sub-problems to lessen the impact of problem scale, each sub-problem is solved independently, and the sub-problems are then linked together to optimize the solution to the original problem. Furthermore, the concept of distributed population evolution is introduced, in which multiple populations evolve competitively to strengthen the algorithm's exploration ability, and a population restart strategy is designed to keep populations from becoming trapped in local optima. Finally, the proposed algorithm is compared with several traditional feature selection algorithms on a number of UCI public datasets. The experimental results show that the proposed algorithm can solve the feature selection problem effectively and is competitive with classical feature selection algorithms, performing especially well on high-dimensional data.
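The abstract names the main ingredients (a diagonal-covariance natural evolution strategy, a feature encoding, and a fitness function mixing accuracy with dimensionality reduction) but gives no formulas. The Python sketch below shows one way these pieces could fit together and should not be read as the authors' MCC-NES. Everything specific in it is an assumption: the separable NES update with rank-based utilities, the 0.5 threshold used in `decode`, the weight `alpha = 0.9`, the nearest-centroid stand-in classifier, and all function names. Cooperative co-evolution, competing populations, and the restart strategy are not covered.

```python
import numpy as np


def decode(z, threshold=0.5):
    """Assumed encoding: feature i is selected when z[i] exceeds the threshold."""
    return np.asarray(z) > threshold


def nearest_centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier (stand-in classifier)."""
    if not mask.any():
        return 0.0  # no features selected: treat as the worst case
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every sample to every class centroid.
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())


def mask_fitness(mask, X, y, alpha=0.9):
    """Assumed fitness: weighted sum of accuracy and fraction of features removed."""
    reduction = 1.0 - mask.sum() / mask.size
    acc = nearest_centroid_accuracy(X, y, mask)
    return float(alpha * acc + (1.0 - alpha) * reduction)


def fitness(z, X, y, alpha=0.9):
    return mask_fitness(decode(z), X, y, alpha)


def snes_select(X, y, iters=60, pop=20, seed=0):
    """Separable NES: one mean and one standard deviation per feature."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mu = np.full(d, 0.5)      # start on the selection threshold
    sigma = np.full(d, 0.3)   # diagonal of the search distribution
    lr_mu = 1.0
    lr_sigma = (3 + np.log(d)) / (5 * np.sqrt(d))
    # Rank-based utilities: best sample gets the largest weight, sum is zero.
    ranks = np.arange(1, pop + 1)
    u = np.maximum(0.0, np.log(pop / 2 + 1) - np.log(ranks))
    u = u / u.sum() - 1.0 / pop
    best_z, best_f = mu.copy(), fitness(mu, X, y)
    for _ in range(iters):
        s = rng.standard_normal((pop, d))   # standard-normal perturbations
        z = mu + sigma * s                  # candidate encodings
        f = np.array([fitness(zi, X, y) for zi in z])
        order = np.argsort(-f)              # best candidate first
        if f[order[0]] > best_f:
            best_f, best_z = float(f[order[0]]), z[order[0]].copy()
        s_sorted = s[order]
        # Natural-gradient estimates for the mean and the per-feature scale.
        grad_mu = u @ s_sorted
        grad_sigma = u @ (s_sorted ** 2 - 1.0)
        mu = mu + lr_mu * sigma * grad_mu
        sigma = sigma * np.exp(0.5 * lr_sigma * grad_sigma)
    return decode(best_z), best_f
```

Calling `snes_select(X, y)` on a NumPy feature matrix `X` and label vector `y` returns a boolean mask over the columns of `X` together with the best fitness found. Keeping one standard deviation per feature means the cost of each update grows linearly with the number of features, which is the usual motivation for a diagonal covariance model in high-dimensional search.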
Article ID:     CLC number: TP18    Document code:
Foundation items: National Natural Science Foundation of China (61672261); Natural Science Foundation of Jilin Province (20180101043JC); Industrial Technology R&D Project of Jilin Province Development and Reform Commission (2019C053-9)
Reference text:

ZHANG Xin, LI Zhan-Shan. Research on Feature Selection Algorithm Based on Natural Evolution Strategy. Journal of Software, 2020, 31(12): 3733-3752