Journal of Software, 2015, 26(5): 1079-1095

Related-Subspace-Based Local Outlier Detection Algorithm Using MapReduce
ZHANG Ji-Fu, LI Yong-Hong, QIN Xiao, XUN Ya-Ling
(School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China; Department of Computer Science and Software Engineering, Auburn University, Auburn, USA)
Received: December 19, 2013; Revised: May 21, 2014
Abstract: This paper proposes a local outlier detection algorithm based on relevant subspaces for high-dimensional, massive data sets under the MapReduce programming model. First, the relevant subspace is redefined using the local sparseness of each attribute dimension, so that it can effectively characterize the distribution of various local data sets. Second, a local outlier factor in the relevant subspace is defined from the probability density of the local data set; it reflects the degree to which a data object deviates from the distribution of its local data set within the relevant subspace, and the N data objects with the greatest outlierness are taken as local outliers. On this basis, a local outlier detection algorithm for the MapReduce programming model is constructed using a locality-sensitive hashing (LSH) distribution strategy. Finally, experiments on synthetic data sets and stellar spectral data sets validate the effectiveness, scalability, and extensibility of the algorithm.
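The pipeline outlined in the abstract (LSH-based distribution, a per-partition relevant subspace, a local outlier score, then top-N selection) can be illustrated with a small single-process Python sketch. This is not the paper's algorithm: the abstract does not give the definitions of local sparseness or of the outlier factor, so the sketch substitutes simple stand-ins and labels them as assumptions. A dimension is treated as "relevant" when its variance within the bucket is well below the bucket's average variance, and the score is the mean squared z-score over those dimensions. All function names and parameters (`make_lsh`, `var_ratio`, `width`, and so on) are hypothetical.

```python
import heapq
import math
import random
from collections import defaultdict


def make_lsh(dim, num_hashes=2, width=4.0, seed=0):
    """Build a random-projection LSH key function (hypothetical parameters)."""
    rng = random.Random(seed)
    planes = [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, width))
              for _ in range(num_hashes)]

    def lsh_key(x):
        # Quantize each random projection; nearby points tend to share a key.
        return tuple(math.floor((sum(a * v for a, v in zip(proj, x)) + b) / width)
                     for proj, b in planes)

    return lsh_key


def map_phase(points, lsh_key):
    """Map: emit (bucket key, (index, point)) pairs."""
    for idx, x in enumerate(points):
        yield lsh_key(x), (idx, x)


def reduce_phase(bucket, var_ratio=0.5, eps=1e-12):
    """Reduce: score every point of one bucket (one local data set).

    Assumption: a dimension is 'relevant' if its variance is below
    var_ratio times the bucket's mean per-dimension variance. The score
    is the mean squared z-score over the relevant dimensions.
    """
    n = len(bucket)
    if n < 2:  # too few points to estimate a local distribution
        return [(idx, 0.0) for idx, _ in bucket]
    dim = len(bucket[0][1])
    means = [sum(x[d] for _, x in bucket) / n for d in range(dim)]
    varis = [sum((x[d] - means[d]) ** 2 for _, x in bucket) / n
             for d in range(dim)]
    cutoff = var_ratio * sum(varis) / dim
    relevant = [d for d in range(dim) if eps < varis[d] < cutoff]
    if not relevant:
        return [(idx, 0.0) for idx, _ in bucket]
    return [(idx,
             sum((x[d] - means[d]) ** 2 / varis[d] for d in relevant) / len(relevant))
            for idx, x in bucket]


def top_n_outliers(points, n_top, lsh_key):
    """Group by LSH key (the 'shuffle'), score per bucket, keep the N largest."""
    buckets = defaultdict(list)
    for key, value in map_phase(points, lsh_key):
        buckets[key].append(value)
    scored = [pair for bucket in buckets.values() for pair in reduce_phase(bucket)]
    return heapq.nlargest(n_top, scored, key=lambda pair: pair[1])
```

A call such as `top_n_outliers(points, 10, make_lsh(dim=len(points[0])))` returns `(index, score)` pairs. In this simplified form, a point that lands alone in its bucket receives a score of zero, which a real distributed design would need to handle, for example by hashing with several independent LSH functions.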
Foundation item: National Natural Science Foundation of China (61272263)
Reference text:

ZHANG Ji-Fu, LI Yong-Hong, QIN Xiao, XUN Ya-Ling. Related-Subspace-Based Local Outlier Detection Algorithm Using MapReduce. Journal of Software, 2015, 26(5): 1079-1095