Journal of Software, 2020, 31(10): 3216-3237

Time Series Discriminative Feature Dictionary Construction Algorithm
ZHANG Wei (张伟), WANG Zhi-Hai (王志海), YUAN Ji-Dong (原继东), HAO Shi-Lei (郝石磊)
(School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
Received: October 23, 2018; Revised: January 1, 2019
Abstract: Time series data are generated in many fields of science, technology, and the economy. Fixed-length word extraction based on Symbolic Fourier Approximation (SFA) and a sliding window is one of the most effective feature generation algorithms for constructing time series feature dictionaries, but it has two shortcomings. First, it cannot dynamically select the optimal number of Fourier values to retain for different sliding window lengths. Second, dictionary construction lacks an effective algorithm for selecting discriminative features from the massive set of generated features. To address these issues, this study proposes a discriminative feature dictionary construction algorithm. First, a variable-length word extraction method based on SFA is proposed, which learns the optimal word length for each sliding window length. Second, a new evaluation indicator of feature discriminability is designed, and the generated features are selected according to its dynamic threshold. Experimental results show that a logistic regression model built on the constructed feature dictionary not only achieves high classification accuracy but also effectively identifies the discriminative features used in prediction.
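For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of sliding-window word extraction in the spirit of SFA. It is not the authors' algorithm: the function names (`znorm`, `dft_features`, `to_word`, `bag_of_words`), the four-letter alphabet, and the fixed breakpoints are all assumptions made for illustration. Real SFA learns its breakpoints from training data, and the paper's contribution, learning the word length per window length, is represented here only by the `n_coeffs` parameter that a caller would have to choose.

```python
import cmath
from bisect import bisect_right
from collections import Counter


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = (sum((x - mean) ** 2 for x in window) / len(window)) ** 0.5
    if std < 1e-12:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def dft_features(window, n_coeffs):
    """Real and imaginary parts of the first n_coeffs non-DC Fourier coefficients."""
    n = len(window)
    feats = []
    for k in range(1, n_coeffs + 1):
        c = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(window))
        feats.extend([c.real / n, c.imag / n])
    return feats


def to_word(feats, breakpoints, alphabet="abcd"):
    """Map each value to a letter by the interval it falls into.

    Assumption: fixed, hand-picked breakpoints. SFA proper learns them
    per coefficient from the training set.
    """
    return "".join(alphabet[bisect_right(breakpoints, f)] for f in feats)


def bag_of_words(series, window_len, n_coeffs, breakpoints=(-0.25, 0.0, 0.25)):
    """Slide a window over the series and count the resulting words."""
    bag = Counter()
    cuts = list(breakpoints)
    for start in range(len(series) - window_len + 1):
        window = znorm(series[start:start + window_len])
        bag[to_word(dft_features(window, n_coeffs), cuts)] += 1
    return bag
```

Calling `bag_of_words(series, window_len=8, n_coeffs=2)` yields words of four letters (two per retained coefficient). In a fixed-length scheme the same `n_coeffs` is used for every `window_len`; the abstract's first shortcoming is that this value is not adapted to the window length.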
CLC number: TP311
Foundation items: Fundamental Research Funds for the Central Universities (2018JBM014); National Natural Science Foundation of China (61702030, 61672086); Beijing Natural Science Foundation (4182052); Beijing Excellent Talents Program (2017000020124G056)
Reference text:

ZHANG Wei, WANG Zhi-Hai, YUAN Ji-Dong, HAO Shi-Lei. Time Series Discriminative Feature Dictionary Construction Algorithm. Journal of Software, 2020, 31(10): 3216-3237