A Hierarchical Method for Determining the Number of Clusters
DOI:
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    A fundamental and difficult problem in cluster analysis is the determination of the "true" number of clusters in a dataset. The common trail-and-error method generally depends on certain clustering algorithms and is inefficient when processing large datasets. In this paper, a hierarchical method is proposed to get rid of repeatedly clustering on large datasets. The method firstly obtains the CF (clustering feature) via scanning the dataset and agglomerative generates the hierarchical partitions of dataset, then a curve of the clustering quality w.r.t the varying partitions is incrementally constructed. The partitions corresponding to the extremum of the curve is used to estimate the number of clusters finally. A new validity index is also presented to quantify the clustering quality, which is independent of clustering algorithm and emphasis on the geometric features of clusters, handling efficiently the noisy data and arbitrary shaped clusters. Experimental results on both real world and synthesis datasets demonstrate that the new method outperforms the recently published approaches, while the efficiency is significantly improved.

    Reference
    Related
    Cited by
Get Citation

陈黎飞,姜青山,王声瑞.基于层次划分的最佳聚类数确定方法.软件学报,2008,19(1):62-72

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:April 01,2007
  • Revised:October 09,2007
  • Adopted:
  • Online:
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063