Parallel Optimization Strategy on Tianhe-2 Supercomputer for a Method of DNA Sequence de novo Assembly
ZHANG Feng, LIAO Xiang-Ke, PENG Shao-Liang, ZHU Xiao-Qian, WANG Bing-Qiang, CUI Ying-Bo
(College of Computer, National University of Defense Technology, Changsha 410073, China; Shenzhen Huada Gene Research Institute, Shenzhen 518083, China)
Received:August 05, 2013    Revised:March 13, 2014
Abstract: SGA is a recent de novo DNA sequence assembler based on string graph theory. This paper first formally proves that the sequence assembly problem solved by SGA is NP-complete, and then analyzes SGA's assembly efficiency in detail. Compared with similar assemblers, SGA has an advantage in memory consumption but takes considerably more time, 60%–70% of which is spent on index construction. To address this bottleneck, the paper designs a parallel optimization strategy and implements it for the Tianhe-2 architecture. Performance tests were run on an ordinary cluster and on Tianhe-2. For small data sets, the optimized index construction is 3.06 times as fast as the previous best result; for medium-sized data sets, it is 1.60 times as fast. The results show a clear overall improvement, and the parallel construction of local FM-indexes scales linearly. The optimization methods and strategies used here may serve as a reference for research on related problems, and the study also shows that the computing power of Tianhe-2 can effectively support research in the life sciences.
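The abstract identifies index construction as SGA's bottleneck and parallel construction of local FM-indexes as the remedy. The following is a minimal sketch of that general idea under assumptions not taken from the paper. Reads are split round-robin into batches, and each batch gets an independent BWT/FM-index. The number of times a pattern occurs across the whole read set is then the sum of the per-batch counts. The names `LocalFMIndex`, `build_local_index`, `build_indexes_in_parallel` and `count_all` are hypothetical. This is not SGA's code or the Tianhe-2 implementation, and the thread pool only stands in for independent compute nodes (CPython threads give no CPU speedup for this work).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical illustration of per-batch ("local") FM-indexes; not SGA's API.


@dataclass
class LocalFMIndex:
    bwt: str
    c_table: dict  # C[c]: number of text characters that sort before c

    def occ(self, c: str, i: int) -> int:
        # Occurrences of c in bwt[:i]; real indexes use sampled rank tables.
        return self.bwt.count(c, 0, i)

    def count(self, pattern: str) -> int:
        """Backward search: occurrences of pattern (without '$') in this batch."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.c_table:
                return 0
            lo = self.c_table[c] + self.occ(c, lo)
            hi = self.c_table[c] + self.occ(c, hi)
            if lo >= hi:
                return 0
        return hi - lo


def build_local_index(reads: list[str]) -> LocalFMIndex:
    # Concatenate the batch as "read1$read2$...$"; '$' sorts before A, C, G, T.
    text = "".join(r + "$" for r in reads)
    # Naive suffix sort for clarity; production tools use linear-time algorithms.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    c_table, total = {}, 0
    for ch in sorted(set(text)):
        c_table[ch] = total
        total += text.count(ch)
    return LocalFMIndex(bwt, c_table)


def build_indexes_in_parallel(reads, n_batches, max_workers=4):
    # Partition the read set; each batch is indexed independently of the others.
    batches = [reads[k::n_batches] for k in range(n_batches)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(build_local_index, batches))


def count_all(indexes, pattern):
    # No read spans two batches, so global counts are sums of local counts.
    return sum(idx.count(pattern) for idx in indexes)


reads = ["ACGT", "CGTA", "GTAC", "TTAC", "ACGTAC"]
indexes = build_indexes_in_parallel(reads, n_batches=3)
print(count_all(indexes, "GT"))
```

Summing the per-batch counts is valid only because each read lies entirely inside one batch. Merging the local BWTs into a single global index is a separate step and is not shown here.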
Foundation items: National Natural Science Foundation of China (U1435222, 61272056, 61070041); Guangzhou Supercomputing Application R&D Fund (1488064512003, 7411694292900)
