Journal of Software, 2014, 25(5): 929-938

Heuristic Clustering Method Based on Neighbor-Seeds for 454 Sequencing Data
CHEN Wei, CHENG Yong-Mei, ZHANG Shao-Wu, PAN Quan
(College of Automation, Northwestern Polytechnical University, Xi'an 710072, China; Department of Biostatistics, Yale University, USA)
Received:July 10, 2013    Revised:December 03, 2013
Abstract: With the development of next-generation sequencing technology, massive amounts of 16S rRNA gene sequence data have been generated. How to mine the genomic information hidden in these data effectively is both a focus and a difficulty of current research. Sequence clustering studies how to group together sequences that originate from the same species, and it forms the basis for studying the species, structural, and functional diversity of microbial communities. Targeting the characteristics of the error sources in 454 sequencing, this work proposes a heuristic sequence clustering algorithm based on neighbor-seeds, named NbHClust. Experimental results show that the algorithm is robust. Compared with traditional heuristic sequence clustering algorithms, it reduces the overestimation of operational taxonomic units (OTUs), improves clustering accuracy, and computes OTUs effectively.
Foundation items: National Natural Science Foundation of China (61170134, 61135001); Aeronautical Science Foundation of China (20100853010); Xi'an Science and Technology Program (CXY1350(2)); Northwestern Polytechnical University Doctoral Innovation Fund (cx201017)
Reference text:


CHEN Wei, CHENG Yong-Mei, ZHANG Shao-Wu, PAN Quan. Heuristic Clustering Method Based on Neighbor-Seeds for 454 Sequencing Data. Journal of Software, 2014, 25(5): 929-938