Journal of Software:2020.31(5):1511-1524

Enhancement and Extension of Feature Selection Using Forest Optimization Algorithm
LIU Zhao-Geng, LI Zhan-Shan, WANG Li, WANG Tao, YU Hai-Hong
(College of Software, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China; College of Software, Jilin University, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China; College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China)
Received:July 12, 2018    Revised:August 05, 2018
Abstract: Feature selection is an important data preprocessing method that not only alleviates the curse of dimensionality but also improves the generalization ability of learning algorithms. Many methods have been applied to feature selection; among them, approaches based on evolutionary computation have recently attracted growing attention and achieved some success. Recent studies show that feature selection using forest optimization algorithm (FSFOA) offers better classification performance and dimensionality reduction ability. However, the randomness of its initialization phase and the manually set parameters of its global seeding phase limit the accuracy and dimensionality reduction ability of the algorithm, and the algorithm itself is inherently weak at handling high-dimensional data. This study proposes an initialization strategy based on the information gain ratio, generates the global seeding parameter automatically by borrowing the idea of the temperature control function in simulated annealing, defines a fitness function that incorporates the dimensionality reduction rate, and applies a greedy algorithm to the resulting high-quality forest to select the best tree. Together these form a feature selection algorithm named EFSFOA (enhanced feature selection using forest optimization algorithm). In addition, for high-dimensional data, an ensemble feature selection framework suited to EFSFOA is built so that it can handle high-dimensional feature selection effectively. Comparative experiments verify that EFSFOA clearly improves on FSFOA in both classification accuracy and dimensionality reduction rate, and that it extends the manageable data dimensionality to 100 000. Compared with efficient evolutionary computation based feature selection methods proposed in recent years, EFSFOA remains highly competitive.
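As a rough illustration of two ingredients named in the abstract, the sketch below computes the information gain ratio of a discrete feature and a fitness value that combines classification accuracy with the dimensionality reduction rate. The abstract gives no formulas, so the linear form of `fitness`, the weight `alpha`, and all function names here are assumptions made for illustration rather than the definitions used in EFSFOA.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain ratio of one discrete feature with respect to the labels."""
    n = len(labels)
    # Group the labels by feature value to get the conditional entropy H(labels | feature).
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information is the entropy of the feature itself.
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


def fitness(accuracy, n_selected, n_total, alpha=0.9):
    """Hypothetical fitness: weighted sum of accuracy and dimensionality reduction rate.

    alpha is an assumed trade-off weight, not a value taken from the paper.
    """
    reduction_rate = 1 - n_selected / n_total
    return alpha * accuracy + (1 - alpha) * reduction_rate
```

Under these assumptions, features could be ranked by `gain_ratio` to bias the initial forest toward informative features, and `fitness` would reward trees that keep accuracy high while selecting fewer features.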
CLC number: TP18
Foundation items: National Natural Science Foundation of China (61672261); Natural Science Foundation of Jilin Province (20180101043JC); Industrial Technology Research and Development Special Project of Jilin Province Development and Reform Commission (2019C053-9)
Reference text:

LIU Zhao-Geng, LI Zhan-Shan, WANG Li, WANG Tao, YU Hai-Hong. Enhancement and Extension of Feature Selection Using Forest Optimization Algorithm. Journal of Software, 2020, 31(5): 1511-1524