SWAP-Assembler 2: Scalable Genome Assembler towards Millions of Cores - Practice and Experience
Jintao Meng; Yanjie Wei; Sangmin Seo; Pavan Balaji
2015
Conference name: CCGRid2015
Conference location: Shenzhen
Abstract: There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze the resulting sequencing data, which can reach terabytes or even petabytes. Our previous assembly tool, SWAP-Assembler, scaled to 2048 cores on TianHe 1A for the human Yanhuang genome. This work scales SWAP-Assembler further, towards millions of cores on Mira. SWAP-Assembler can be divided into 5 steps, of which the most time-consuming are input parallelization, kmer graph construction, and graph simplification (edge merging). We optimize these three steps so that the percentage of time spent in each step stays constant as the number of cores increases. In the input parallelization step, the input data is divided into virtual fragments of almost equal size, and the begin and end positions of each fragment are automatically adjusted to fall on the beginning symbol of a read. This data blocking strategy plays a central role in adjusting the data size to maintain communication and memory efficiency in the subsequent steps. In kmer graph construction, to prevent communication efficiency from degrading, the message size between any two processes is kept constant (about 8k bytes) by increasing the number of nucleotides processed in each round of the input parallelization step in proportion to the number of processes. Memory usage also benefits, as only a small part of the input data is processed in each round. Within graph simplification, the major improvement is to combine message sending and receiving with the two neighbors into one loop in the communication protocol. With these optimizations integrated, the new assembly tool is denoted SWAP-Assembler 2, or SWAP2 for short. In our experiment on the 1k human genome dataset, SWAP-Assembler 2 scales to 16k cores with a parallel efficiency of 70%.
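The data blocking strategy described in the abstract can be illustrated with a minimal sketch. It is not the SWAP2 implementation. It assumes FASTA-style input in which every read begins with `>` at the start of a line, and the names `fragment_bounds`, `next_record_start`, and `nucleotides_per_round` are hypothetical. The second function shows one possible reading of the constant message size rule, assuming each process spreads its kmers evenly over all processes and that a kmer occupies a fixed number of bytes (the value 8 is an assumption, not a figure from the paper).

```python
def next_record_start(data: bytes, pos: int, marker: bytes = b">") -> int:
    """Return the first offset >= pos at which a read begins.

    A read is assumed to begin where `marker` is the first byte of a line.
    Returns len(data) if no read starts at or after pos.
    """
    if pos <= 0:
        return 0
    # Search from pos - 1 so that a read starting exactly at pos is found.
    idx = data.find(b"\n" + marker, pos - 1)
    return len(data) if idx == -1 else idx + 1


def fragment_bounds(data: bytes, num_fragments: int, marker: bytes = b">"):
    """Split data into num_fragments byte ranges of almost equal size.

    Each tentative cut (an equal share of the bytes) is moved forward to the
    next read start, so no read is split between two fragments.
    """
    size = len(data)
    cuts = [0]
    for i in range(1, num_fragments):
        tentative = i * size // num_fragments
        cuts.append(next_record_start(data, tentative, marker))
    cuts.append(size)
    # Cuts are non-decreasing; an empty range means the fragment holds no read.
    return [(cuts[i], cuts[i + 1]) for i in range(num_fragments)]


def nucleotides_per_round(num_procs: int, msg_bytes: int = 8192,
                          bytes_per_kmer: int = 8) -> int:
    """Nucleotides one process reads per round so that the message sent to
    each peer stays near msg_bytes (hypothetical model, see lead-in)."""
    return msg_bytes * num_procs // bytes_per_kmer


if __name__ == "__main__":
    reads = b">r1\nACGT\n>r2\nGGCC\n>r3\nTTAA\n>r4\nCCGG\n"
    for begin, end in fragment_bounds(reads, 3):
        print(begin, end, reads[begin:end])
    print(nucleotides_per_round(1024))
```

Under this model the per-round input grows linearly with the number of processes, which matches the abstract's statement that the nucleotide count is increased in proportion to the process count while the pairwise message size stays fixed.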
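Indexed by: EI
Language: English
Content type: Conference paper
Source URL: [http://ir.siat.ac.cn:8080/handle/172644/6968]
Collection: Shenzhen Institutes of Advanced Technology_Digital Institute (数字所)
Author affiliation: 2015
Recommended citation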
GB/T 7714
Jintao Meng, Yanjie Wei, Sangmin Seo, et al. SWAP-Assembler 2: Scalable Genome Assembler towards Millions of Cores - Practice and Experience[C]. In: CCGRid2015. Shenzhen.