具有回忆和遗忘机制的数据流挖掘模型与算法
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

国家自然科学基金(61272141,60905032,61120106005,61273232)


Ensemble Model and Algorithm with Recalling and Forgetting Mechanisms for Data Stream Mining
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    集成式数据流挖掘是对存在概念漂移的数据流进行学习的重要方法.针对传统集成式数据流挖掘存在的缺陷,将人类的回忆和遗忘机制引入到数据流挖掘中,提出基于记忆的数据流挖掘模型MDSM(memorizing based data stream mining).该模型将基分类器看作是系统获得的知识,通过"回忆与遗忘"机制,不仅使历史上有用的基分类器因记忆强度高而保存在"记忆库"中,提高预测的稳定性,而且从"记忆库"中选取当前分类效果好的基分类器参与集成预测,以提高对概念变化的适应能力.基于MDSM模型,提出了一种集成式数据流挖掘算法MAE(memorizing based adaptive ensemble),该算法利用Ebbinghaus遗忘曲线对系统的遗忘机制进行设计,并利用选择性集成来模拟人类的"回忆"机制.与4种典型的数据流挖掘算法进行比较,结果表明:MAE算法分类精度高,对概念漂移的整体适应能力强,尤其对重复出现的概念漂移以及实际应用中存在的复杂概念漂移具有很好的适应能力.不仅能够快速适应新的概念变化,并且能够有效抵御随机的概念波动对系统性能的影响.

    Abstract:

    Using ensemble of classifiers on sequential chunks of training instances is a popular strategy for data stream mining with concept drifts. Aiming at the limitations of existing approaches, this paper introduces human recalling and forgetting mechanisms into a data stream mining system, and proposes a memorizing based data stream mining (MDSM) model. The model considers base classifiers as learned knowledge. Through "recalling and forgetting" mechanism, most useful classifiers in the past will be reserved in a "memory repository", which improves the stability under random concept drifts. The best classifiers for the current data chunk are selected for prediction, which achieves high adaptability for different concept drifts. Based on MSDM, the paper puts forward a new algorithm MAE (memorizing based adaptive ensemble). MAE uses Ebbinghaus forgetting curve as forgetting mechanism and adopts ensemble pruning to emulate the "recalling" mechanism. Compared with four traditional data stream mining approaches, the results show that MAE achieves high and stable accuracy with moderate training time. The results also proved that MAE has good adaptability for different kinds of concept drifts, especially for the applications with recurring or complex concept drifts.

    参考文献
    相似文献
    引证文献
引用本文

赵强利,蒋艳凰,卢宇彤.具有回忆和遗忘机制的数据流挖掘模型与算法.软件学报,2015,26(10):2567-2580

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2014-07-31
  • 最后修改日期:2014-09-03
  • 录用日期:
  • 在线发布日期: 2015-10-10
  • 出版日期:
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号