Concept Drift Detection Method Based on Online Performance Test
Author:
Affiliation:

Clc Number:

TP181

Fund Project:

National Natural Science Foundation of China (61503229, 61673249, U1805263); Natural Science Foundation of Shanxi Province of China (201901D111033); Key R&D Program of Shanxi Province (International Cooperation) (201903D421050)

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Concept drift is a common problem in dynamic streaming data mining, but the false concept drift generated by the mixed noise data or too small scale size training data will cause similar results to the concept drift, that is, the instability fluctuation of model online testing performance, which leads to confusion between them, and the false alarm of concept drift. To address the problem which is easy to confuse the authenticity of concept drift, concept drift detection method based on online performance test, namely CDPT, is presented. With CDPT, the latest acquired data are evenly divided into groups, and online learning is performed on each group sub sets. At the same time, the classification accuracy vectors obtained by training and testing of each group sub sets are recorded, and the accuracy difference between adjacent learning time units is calculated. The effective fluctuation points are obtained according to the testing accuracy decline threshold. Then, the effective fluctuation points in different groups are integrated by cross checking to eliminate the detection interference caused by the instability of the model due to the small training samples in the online learning process of streaming data, and the consistent fluctuation points are obtained according to the consistency of accuracy fluctuation. Finally, by tracking the classification accuracy of online learning, the change of testing accuracy can be achieved of neighborhood reference points of consistent fluctuation points, and the decline and convergence of model testing accuracy can be compared of neighborhood reference points of consistent fluctuation points, so as to effectively detect the true concept drift points of the consistent fluctuation points. The experimental results demonstrate that the proposed CDPT method can effectively identify the true concept drift occurring in the online learning process of streaming data, effectively avoid the negative impact of too small training samples or noise on the detection results, and improve the generalization performance of the model.

    Reference
    Related
    Cited by
Get Citation

郭虎升,张爱娟,王文剑.基于在线性能测试的概念漂移检测方法.软件学报,2020,31(4):932-947

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:March 08,2019
  • Revised:July 11,2019
  • Adopted:
  • Online: January 14,2020
  • Published: April 06,2020
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063