Journal of Software, 2015, 26(5): 1113-1128

Density-Based Cluster Structure Mining Algorithm for High-Volume Data Streams
YU Yan-Wei,WANG Huan,WANG Qin,ZHAO Jin-Dong
(School of Computer and Control Engineering, Yantai University, Yantai 264005, China;Department of Computer Science, University of California, San Diego, USA;School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China)
Received: May 16, 2014; Revised: September 12, 2014
Abstract: This paper proposes MCluStream (mining density-based clustering structure over data streams), a density-based cluster-structure mining algorithm that addresses two problems in density-based clustering of data streams: the difficulty of choosing input parameters and the identification of overlapping clusters. First, a tree-topology index structure, the CR-Tree, is designed; it maps each pair of directly core-reachable data points to a parent-child relationship in the tree. Because it records the dependency relationships among data points, the CR-Tree covers the density-based cluster structures under a series of subEps parameter settings. Second, MCluStream updates the CR-Tree in a sliding-window fashion and maintains the cluster structure of the current window online, which enables fast evolving cluster analysis over massive data streams. Third, a method is designed for quickly extracting the cluster structure from the CR-Tree, so that users can select a reasonable clustering result from the visualized cluster structure. Finally, experiments on massive real and synthetic data show that MCluStream achieves effective mining results, higher clustering efficiency, and lower space overhead than state-of-the-art methods. MCluStream is suitable for self-adaptive, evolving density-based clustering analysis in high-volume data stream applications.
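The idea of turning directly core-reachable pairs into parent-child links can be illustrated with a small static sketch. This is not the paper's CR-Tree or its sliding-window update: the function names `build_cr_forest` and `extract_clusters`, the parameters `eps` and `min_pts`, and the choice of storing an OPTICS-style reachability value on each link are all assumptions made for illustration. Because each point keeps only one parent, the extraction can miss border points that a full density-based clustering at the same subEps would include, so the result should be read as an approximation.

```python
import heapq
import math


def build_cr_forest(points, eps, min_pts):
    """Build a parent-link forest over points (hypothetical helper, not the paper's CR-Tree)."""
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Neighbours within eps; a point counts as its own neighbour.
    nbrs = [[j for j in range(n) if dist(i, j) <= eps] for i in range(n)]

    # Core distance: distance to the min_pts-th nearest neighbour, inf if not core under eps.
    core = []
    for i in range(n):
        d = sorted(dist(i, j) for j in nbrs[i])
        core.append(d[min_pts - 1] if len(d) >= min_pts else math.inf)

    parent = [None] * n
    weight = [math.inf] * n  # smallest sub_eps at which the link to the parent holds
    visited = [False] * n
    for root in range(n):
        if visited[root] or core[root] == math.inf:
            continue
        heap = [(0.0, root, None)]
        while heap:
            w, i, p = heapq.heappop(heap)
            if visited[i]:
                continue
            visited[i] = True
            parent[i], weight[i] = p, w
            if core[i] == math.inf:
                continue  # border point: it cannot reach other points
            for j in nbrs[i]:
                if not visited[j]:
                    # j is directly core-reachable from i once sub_eps >= this value.
                    heapq.heappush(heap, (max(core[i], dist(i, j)), j, i))
    return parent, weight, core


def extract_clusters(parent, weight, core, sub_eps):
    """Label points for one sub_eps <= eps by cutting links that need a larger radius."""
    labels, ids = [], {}
    for i in range(len(parent)):
        r = i
        while parent[r] is not None and weight[r] <= sub_eps:
            r = parent[r]
        if core[r] > sub_eps:
            labels.append(-1)  # isolated and not core under sub_eps: noise
        else:
            labels.append(ids.setdefault(r, len(ids)))
    return labels


# Two groups of three points separated by a gap of 3, plus one far outlier.
points = [(0, 0), (1, 0), (2, 0), (5, 0), (6, 0), (7, 0), (20, 0)]
parent, weight, core = build_cr_forest(points, eps=3.5, min_pts=3)
for sub_eps in (3.5, 2.5):
    print(sub_eps, extract_clusters(parent, weight, core, sub_eps))
```

The forest is built once for the largest radius and then queried for several smaller subEps values without recomputing neighbourhoods, which is the property the abstract attributes to the CR-Tree. In the toy data, the two groups form one cluster at the full radius and split into two once the link across the gap is cut.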
Foundation items: National Natural Science Foundation of China (61403328, 61302065, 61172049); Natural Science Foundation of Shandong Province (ZR2013FM011); Science and Technology Program of Shandong Higher Education Institutions (J14LN24); Open Fund of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K13)
Reference text:

YU Yan-Wei,WANG Huan,WANG Qin,ZHAO Jin-Dong.Density-Based Cluster Structure Mining Algorithm for High-Volume Data Streams.Journal of Software,2015,26(5):1113-1128