Journal of Software, 2006, 17(8):1796-1803

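A Fast Outlier Detection Algorithm for Data Streams Based on Dynamic Grids
YANG Yi-Dong, SUN Zhi-Hui, ZHU Yu-Quan, YANG Ming, ZHANG Bo-Li
(Department of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu 210096; School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013; Department of Computer Science, Nanjing Normal University, Nanjing, Jiangsu 210000)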
Received: September 30, 2004    Revised: October 11, 2005
Abstract: Outlier detection is an important data mining task with applications in many fields. In recent years, mining algorithms for data streams have attracted growing attention. To address outlier detection in data streams, this paper proposes a fast algorithm based on dynamic grid partitioning of the data space. The algorithm uses dynamic grids to separate dense regions from sparse ones and filters out the large body of data that falls in dense regions, which greatly reduces the number of objects it has to examine. For candidate outliers in sparse regions, outlierness is computed approximately, and data with high outlierness are output as outliers. The algorithm's running efficiency improves substantially while a certain level of accuracy is preserved. Experiments on synthetic and real data sets confirm that the algorithm is applicable and effective.
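The filtering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how the grid is adapted dynamically or how outlierness is approximated, so the sketch assumes a fixed uniform grid over one window of the stream, a hypothetical density threshold `dense_min`, and a hypothetical score equal to the reciprocal of the point count in a cell and its adjacent cells. The names `cell_of`, `detect_outliers`, `width`, and `score_min` are likewise illustrative.

```python
from collections import Counter
from itertools import product
from math import floor


def cell_of(point, width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(floor(x / width) for x in point)


def detect_outliers(window, width=1.0, dense_min=3, score_min=0.5):
    """Return the points of `window` whose approximate outlierness is high.

    Assumptions (not taken from the paper): a uniform grid of side `width`,
    a cell is dense when it holds at least `dense_min` points, and
    outlierness is 1 / (points in the cell and its adjacent cells).
    """
    counts = Counter(cell_of(p, width) for p in window)
    outliers = []
    for p in window:
        cell = cell_of(p, width)
        if counts[cell] >= dense_min:
            continue  # dense region: filtered without further work
        # Sparse region: approximate outlierness from neighbouring cells.
        nearby = sum(
            counts.get(tuple(c + d for c, d in zip(cell, offset)), 0)
            for offset in product((-1, 0, 1), repeat=len(cell))
        )
        score = 1.0 / nearby  # nearby >= 1 because p itself is counted
        if score >= score_min:
            outliers.append(p)
    return outliers


window = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.5, 0.5), (1.2, 0.3), (5.5, 5.5)]
print(detect_outliers(window))
```

In this toy window the four points in the cell at the origin are discarded as dense-region data, the point at (1.2, 0.3) sits in a sparse cell but borders the dense one and so scores low, and only the isolated point at (5.5, 5.5) is reported. Only sparse-cell points incur the neighbourhood lookup, which is where the reduction in examined objects comes from.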
Foundation items: Supported by the National Natural Science Foundation of China under Grant No. 60572112
Reference text:

YANG Yi-Dong, SUN Zhi-Hui, ZHU Yu-Quan, YANG Ming, ZHANG Bo-Li. A Fast Outlier Detection Algorithm for Data Streams Based on Dynamic Grids. Journal of Software (软件学报), 2006, 17(8):1796-1803