A PCA Feature Extraction Algorithm Based on Anisotropic Gaussian Kernel Penalty
Corresponding author:

LIU Jun (刘俊), E-mail: junliu@cqupt.edu.cn

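Fund projects:

National Natural Science Foundation of China (61772099, 61772098); Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0530); Chongqing "Three Hundred" Science and Technology Innovation Leading Talent Support Program (CSTCCXLJRC201917); Chongqing Innovation and Entrepreneurship Demonstration Team Cultivation Program (CSTC2017kjrc-cxcytd0063)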


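    Abstract:

    This paper proposes a feature extraction algorithm based on principal component analysis (PCA) with an anisotropic Gaussian kernel penalty, which differs from the traditional kernel principal component analysis (KPCA) algorithm. In nonlinear dimensionality reduction, traditional KPCA ignores the nondimensionalization of the raw data. In addition, a traditional kernel function is controlled mainly by a single kernel-width parameter shared by all dimensions, so it cannot accurately reflect the differing importance of the features in each dimension, which leads to low accuracy after dimensionality reduction. To address these problems, this paper first proposes an averaging (mean-scaling) algorithm for nondimensionalizing the raw data, which noticeably increases the total variance contribution rate of the original data. Second, it introduces an anisotropic Gaussian kernel function in which each dimension has its own kernel-width parameter, so that each width accurately reflects the importance of the data features in its dimension. Third, it builds a feature-penalty objective function for KPCA on the anisotropic Gaussian kernel, so that the original data can be represented with fewer features while the importance of the information carried by each principal component is reflected. Finally, to search for the best features, it uses gradient descent to update the kernel widths in the feature-penalty objective and to control the iterations of the feature extraction algorithm. To verify the effectiveness of the proposed algorithm, the algorithms are compared on public UCI datasets and on the KDDCUP99 dataset. The experimental results show that, compared with traditional PCA, the proposed algorithm improves accuracy by 4.49% on average across nine public UCI datasets and by 8% on the KDDCUP99 dataset.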
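The first two steps above, scaling the raw data and building a Gaussian kernel with one width per dimension, can be illustrated with a minimal Python/NumPy sketch. It assumes that the averaging step divides each column by its column mean (a common way to nondimensionalize data before PCA; the paper's exact formula is not given here) and then applies standard kernel PCA with double centering to report the cumulative variance contribution of the top k components. The function names mean_ratio_scale, anisotropic_rbf and kernel_pca are hypothetical.

```python
import numpy as np

def mean_ratio_scale(X):
    """Nondimensionalize by dividing each column by its mean (assumed form of
    the averaging step). Unlike z-scoring, each column keeps its coefficient
    of variation, so columns that vary more still carry more variance.
    Assumes no column has a zero mean."""
    X = np.asarray(X, dtype=float)
    return X / X.mean(axis=0)

def anisotropic_rbf(X, Y, sigma):
    """k(x, y) = exp(-sum_d (x_d - y_d)^2 / (2 * sigma_d^2)), one width per dimension."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    diff = (X[:, None, :] - Y[None, :, :]) / sigma      # (n, m, d), scaled per dimension
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

def kernel_pca(X, sigma, k):
    """Standard kernel PCA with the anisotropic kernel. Returns the projections
    of the training points on the top-k components and their cumulative
    variance contribution rate."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    K = anisotropic_rbf(X, X, sigma)
    H = np.eye(n) - np.ones((n, n)) / n                 # centering matrix
    Kc = H @ K @ H                                      # center in feature space
    w, V = np.linalg.eigh(Kc)                           # eigenvalues in ascending order
    w, V = np.clip(w[::-1], 0.0, None), V[:, ::-1]      # descending; clip round-off negatives
    Z = V[:, :k] * np.sqrt(w[:k])                       # projection of point i on component j: sqrt(lambda_j) * v_ij
    return Z, w[:k].sum() / w.sum()
```

Setting every width to the same value reproduces the ordinary isotropic Gaussian kernel, which is the single-width case the abstract criticizes. A smaller sigma_d makes differences along dimension d count more in the kernel distance, which is why per-dimension widths can be read as feature importance.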

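The final step, updating the kernel widths by gradient descent, can be sketched as follows. The paper's feature-penalty objective is not reproduced here; as a stand-in, this sketch maximizes the share of kernel variance captured by the top k components (that is, it runs gradient descent on the negative ratio) and updates log sigma_d so that every width stays positive. The gradient uses the standard first-order derivative of a simple eigenvalue, d(lambda_i) = v_i^T d(Kc) v_i, which assumes lambda_k and lambda_(k+1) are distinct. The objective, the log parameterization, and the names contribution_ratio_and_grad and fit_widths are all assumptions for illustration.

```python
import numpy as np

def contribution_ratio_and_grad(X, log_sigma, k):
    """Stand-in objective (hypothetical, not the paper's penalty):
    J = sum of the top-k eigenvalues of the centered anisotropic Gaussian
    kernel matrix divided by its trace, plus dJ/d(log sigma_d).
    Assumes lambda_k > lambda_(k+1) so the top-k sum is differentiable."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    sigma = np.exp(log_sigma)
    sq = (X[:, None, :] - X[None, :, :]) ** 2            # per-dimension squared gaps, (n, n, d)
    K = np.exp(-0.5 * np.sum(sq / sigma ** 2, axis=-1))  # anisotropic Gaussian kernel
    H = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
    Kc = H @ K @ H
    w, V = np.linalg.eigh(Kc)                            # ascending eigenvalues
    top = w[-k:].sum()                                   # k largest eigenvalues
    Vk = V[:, -k:]                                       # their eigenvectors
    total = np.trace(Kc)
    grad = np.empty(d)
    for j in range(d):
        # dK/d(log sigma_j) = K * (x_ij - x_lj)^2 / sigma_j^2, then centered
        Gc = H @ (K * sq[:, :, j] / sigma[j] ** 2) @ H
        d_top = np.sum((Gc @ Vk) * Vk)                   # sum_i v_i^T Gc v_i
        d_total = np.trace(Gc)
        grad[j] = (d_top * total - top * d_total) / total ** 2
    return top / total, grad

def fit_widths(X, k, steps=100, lr=0.1):
    """Gradient descent on -J over log widths; returns widths and the J history."""
    log_sigma = np.zeros(np.asarray(X).shape[1])         # start at sigma_d = 1
    history = []
    for _ in range(steps):
        J, g = contribution_ratio_and_grad(X, log_sigma, k)
        history.append(J)
        log_sigma = log_sigma + lr * g                   # step along -grad(-J)
    return np.exp(log_sigma), history
```

Optimizing log sigma instead of sigma is a design choice for this sketch: it keeps widths positive without clipping and makes step sizes relative to the current width.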

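Cite this article:

LIU Jun, LI Wei, CHEN Shuyu, XU Guangxia. A PCA Feature Extraction Algorithm Based on Anisotropic Gaussian Kernel Penalty. Journal of Software (软件学报),,():0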

History
  • Received: 2021-04-09
  • Last revised: 2021-09-12
  • Published online: 2021-11-24
Copyright: Institute of Software, Chinese Academy of Sciences