Density-Based Distributed Clustering Method

doi:10.13328/j.cnki.jos.005343

Volume 28, Issue 11, 2017, pp. 2836-2850

Fund Project: National Natural Science Foundation of China (60903098); Industry Technology Research and Development Projects of Jilin Province Development and Reform Commission (2015Y055); Key Scientific Research Project of Jilin Province Department of Science (20150204040GX); Graduate Innovation Fund of Jilin University (2016183)

Abstract:

Clustering is an important data analysis method in data mining: it divides unlabeled data into groups according to their similarity. CSDP is a density-based clustering method, but its efficiency drops when the data set is large or high-dimensional. To improve clustering efficiency, this paper proposes MRCSDP, a density-based distributed clustering method that uses MapReduce to cluster text data. The method introduces the notions of an independent calculation unit and an independent calculation block. First, the data are split into several data blocks, which are used to construct independent calculation units and independent calculation blocks, and a task is assigned to each independent calculation block. Next, a distributed computation obtains the local density of each data block, and the local densities are combined into the global density. A center value is then calculated from the global density, and the global density and the center value together determine the candidate cluster centers of each data block. Finally, the global cluster centers are obtained by computing the density of all candidate cluster centers. By reducing time complexity, MRCSDP achieves better clustering performance. Experimental results show that, compared with CSDP, MRCSDP processes large-scale data more efficiently while keeping the load balanced across computing nodes.
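The abstract describes the MRCSDP pipeline only at a high level. The following is a minimal single-process Python sketch of that flow, not the authors' implementation, under several loudly stated assumptions: points are numeric vectors with Euclidean distance (rather than text); density is a density-peaks-style cutoff count (number of neighbours closer than `d_c`), which may differ from CSDP's actual definition; an "independent calculation unit" is interpreted as a pair of blocks whose partial counts can be computed independently; the "center value" is taken to be the mean global density; and the final selection ranks candidates by `rho * delta`. The map and reduce phases are simulated with plain Python, with no Hadoop API involved, and all function names are hypothetical.

```python
import math
from collections import defaultdict

def split_blocks(points, n_blocks):
    # Assumed round-robin split of the data set into n_blocks data blocks.
    return [points[b::n_blocks] for b in range(n_blocks)]

def map_unit(blocks, i, j, d_c):
    # Map task for one hypothetical calculation unit (block i, block j):
    # for each point of block i, count points of block j closer than d_c,
    # skipping the point itself when i == j. Emits ((block, index), count).
    for k, p in enumerate(blocks[i]):
        count = sum(1 for m, q in enumerate(blocks[j])
                    if (i, k) != (j, m) and math.dist(p, q) < d_c)
        yield (i, k), count

def reduce_density(pairs):
    # Reduce task: sum the partial (local) counts into a global density per point.
    rho = defaultdict(int)
    for key, count in pairs:
        rho[key] += count
    return dict(rho)

def block_candidates(blocks, rho, b, d_c, center_value):
    # Per-block candidates: global density above center_value and not beaten
    # (by density, ties broken by key) by any neighbour within d_c in block b.
    out = []
    for k, p in enumerate(blocks[b]):
        key = (b, k)
        if rho[key] <= center_value:
            continue
        if all((rho[(b, m)], (b, m)) < (rho[key], key)
               for m, q in enumerate(blocks[b])
               if m != k and math.dist(p, q) < d_c):
            out.append(key)
    return out

def global_centers(blocks, rho, candidates, k):
    # Global step over candidates only: delta is the distance to the nearest
    # candidate with higher (rho, key); the top candidate uses the largest
    # candidate distance. Keep the k candidates with the largest rho * delta.
    pts = {c: blocks[c[0]][c[1]] for c in candidates}
    scored = []
    for c in candidates:
        higher = [math.dist(pts[c], pts[o]) for o in candidates
                  if (rho[o], o) > (rho[c], c)]
        delta = min(higher) if higher else max(
            math.dist(pts[c], pts[o]) for o in candidates)
        scored.append((rho[c] * delta, c))
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]

def mrcsdp_sketch(points, n_blocks, d_c, k):
    blocks = split_blocks(points, n_blocks)
    # "Map phase": every (i, j) block pair is an independent unit of work.
    pairs = [kv for i in range(n_blocks) for j in range(n_blocks)
             for kv in map_unit(blocks, i, j, d_c)]
    rho = reduce_density(pairs)
    # Hypothetical center value: the mean global density. If every point has
    # the same density, no candidate passes and the result is empty.
    center_value = sum(rho.values()) / len(rho)
    candidates = [c for b in range(n_blocks)
                  for c in block_candidates(blocks, rho, b, d_c, center_value)]
    return [blocks[b][m] for b, m in global_centers(blocks, rho, candidates, k)]
```

For two well-separated "plus"-shaped groups centred at (0, 0) and (5, 5), calling `mrcsdp_sketch(points, n_blocks=2, d_c=0.15, k=2)` returns the two centre points. Only the per-pair counting in `map_unit` touches all pairs of blocks; the candidate and selection steps work on much smaller sets, which is the kind of saving the abstract attributes to per-block candidate centers.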

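Citation: Wang Yan (王岩), Peng Tao (彭涛), Han Jiayu (韩佳育), Liu Lu (刘露). A density-based distributed clustering method. Journal of Software (软件学报), 2017, 28(11): 2836-2850.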

History

Received: April 14, 2017
Revised: June 16, 2016
Online: November 3, 2017

Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563; Fax: 010-62562533; Email: jos@iscas.ac.cn