###
DOI:
Journal of Software:2008.19(3):663-673

基于Tri-Training和数据剪辑的半监督聚类算法
邓超,郭茂祖
(哈尔滨工业大学 计算机科学与技术学院,黑龙江 哈尔滨 150001)
Tri-Training and Data Editing Based Semi-Supervised Clustering Algorithm
DENG Chao,GUO Mao-Zu
()
Abstract
Chart / table
Reference
Similar Articles
Article :Browse 3606   Download 5078
Received:June 21, 2006    Revised:March 07, 2007
> 中文摘要: 提出一种半监督聚类算法,该算法在用seeds集初始化聚类中心前,利用半监督分类方法Tri-training的迭代训练过程对无标记数据进行标记,并加入seeds集以扩大规模;同时,在Tri-training训练过程中结合基于最近邻规则的Depuration数据剪辑技术对seeds集扩大过程中产生的误标记噪声数据进行修正、净化,以提高seeds集质量.实验结果表明,所提出的基于Tri-training和数据剪辑的DE-Tri-training半监督聚类新算法能够有效改善seeds集对聚类中心的初始化效果,提高聚类性能.
Abstract:In this paper, a algorithm named DE-Tri-training semi-supervised K-means is proposed, which could get a seeds set of larger scale and less noise. In detail, prior to using the seeds set to initialize cluster centroids, the training process of a semi-supervised classification approach named Tri-training is used to label unlabeled data and add them into the initial seeds set to enlarge the scale. Meanwhile, to improve the quality of the enlarged seeds set, a nearest neighbor rule based data editing technique named Depuration is introduced into Tri-training process to eliminate and correct the mislabeled noise data in the enlarged seeds. Experimental results show that the novel semi-supervised clustering algorithm could effectively improve the cluster centroids initialization and enhance clustering performance.
文章编号:     中图分类号:    文献标志码:
基金项目:Supported by the National Natural Science Foundation of China under Grant Nos.60702033,60772076(国家自然科学基金);the National High-Tech Researth and Development Plan of China under Grant No.2007AA012171(国家高技术研究发展计划(863));the Science Fund for Distinguished Young Scholars of Heilongjiang Province of China under Grant No.JC200611(黑龙江省杰出青年科学基金);the Natural Science Foundation of Heilongjiaag Province of China under Grant No.ZJG0705(黑龙江省自然科学重点基金);the Foundation of Harbin Institute of Technology of China under Grant No.HIT.2003.53(哈尔滨工业大学校基金) Supported by the National Natural Science Foundation of China under Grant Nos.60702033,60772076(国家自然科学基金);the National High-Tech Researth and Development Plan of China under Grant No.2007AA012171(国家高技术研究发展计划(863));the Science Fund for Distinguished Young Scholars of Heilongjiang Province of China under Grant No.JC200611(黑龙江省杰出青年科学基金);the Natural Science Foundation of Heilongjiaag Province of China under Grant No.ZJG0705(黑龙江省自然科学重点基金);the Foundation of Harbin Institute of Technology of China under Grant No.HIT.2003.53(哈尔滨工业大学校基金)
Foundation items:
Reference text:

邓 超,郭茂祖.基于Tri-Training和数据剪辑的半监督聚类算法.软件学报,2008,19(3):663-673

DENG Chao,GUO Mao-Zu.Tri-Training and Data Editing Based Semi-Supervised Clustering Algorithm.Journal of Software,2008,19(3):663-673