Journal of Software, 2017, 28(11): 2891-2904

Bayesian Classifier Algorithm Based on Emerging Pattern for Data Stream
DU Chao, WANG Zhi-Hai, JIANG Jing-Jing, SUN Yan-Ge
(School of Computer Science and Technology, Beijing Jiaotong University, Beijing 100044, China)
Received: May 15, 2017    Revised: June 16, 2017
Abstract: Pattern-based Bayesian classification is an effective approach to the classification problem in data mining. However, most pattern-based Bayesian classifiers consider only the support of a pattern in the dataset of its home (target) class and ignore its support in the counterpart class. In addition, pattern-based Bayesian classifiers designed for static datasets cannot be applied to unbounded data streams that change dynamically at high speed. To overcome these problems, EPDS (Bayesian classifier algorithm based on emerging pattern for data stream) is proposed. EPDS is a Bayesian classification model built on the emerging patterns discovered over a data stream. It uses a simple hybrid forest (HYF) data structure to maintain the itemsets of the transactions held in memory, and a fast pattern extraction mechanism to speed up the algorithm. EPDS adopts a partially lazy learning strategy to update the emerging patterns continuously, and builds a local classification model in each class for the transaction to be classified. Extensive experiments on real and synthetic data streams show that EPDS achieves higher classification accuracy than other data stream classifiers.
Keywords: data stream; emerging pattern; Bayesian; data mining
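The abstract contrasts a pattern's support in its home class with its support in the counterpart class. The sketch below illustrates that idea using the growth rate commonly used to define emerging patterns (support in the home class divided by support in the counterpart class). It is an illustration only, not the EPDS implementation: the function names `support` and `growth_rate`, the toy transactions, and the handling of zero support are assumptions, and the abstract does not state which measure or threshold EPDS uses.

```python
from typing import FrozenSet, Sequence

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: Sequence[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(
    itemset: FrozenSet[str],
    home: Sequence[Transaction],
    counterpart: Sequence[Transaction],
) -> float:
    """Support in the home class divided by support in the counterpart class.

    Assumption: a pattern absent from the counterpart class gets an infinite
    growth rate if it occurs in the home class, and 0.0 if it occurs in neither.
    """
    s_home = support(itemset, home)
    s_other = support(itemset, counterpart)
    if s_other == 0.0:
        return float("inf") if s_home > 0.0 else 0.0
    return s_home / s_other


# Toy data (hypothetical): each transaction is a set of single-letter items.
home = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
counterpart = [frozenset("a"), frozenset("c"), frozenset("bc"), frozenset("ab")]
pattern = frozenset("ab")

print(support(pattern, home))                   # 0.5
print(support(pattern, counterpart))            # 0.25
print(growth_rate(pattern, home, counterpart))  # 2.0
```

A classifier that looked only at the home-class support (0.5) would miss that the same pattern also appears in a quarter of the counterpart transactions; the ratio captures how strongly the pattern discriminates between the two classes.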
Foundation item: National Natural Science Foundation of China (61672086)
Reference text:

DU Chao, WANG Zhi-Hai, JIANG Jing-Jing, SUN Yan-Ge. Bayesian Classifier Algorithm Based on Emerging Pattern for Data Stream. Journal of Software, 2017, 28(11): 2891-2904