A Cross-Platform SpMV Framework on Many-Core Architectures | |
Zhang, Yunquan (3); Li, Shigang (3); Yan, Shengen (2); Zhou, Huiyang (1) | |
Journal | ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION |
2016-12-01 | |
Volume | 13; Issue: 4; Pages: 25 |
Keywords | SpMV; segmented scan; BCCOO; OpenCL; CUDA; GPU; Intel MIC; parallel algorithms |
ISSN | 1544-3566 |
DOI | 10.1145/2994148 |
Abstract | Sparse Matrix-Vector multiplication (SpMV) is a key operation in engineering and scientific computing. Although previous work has shown impressive progress in optimizing SpMV on many-core architectures, load imbalance and high memory bandwidth requirements remain the critical performance bottlenecks. We present our novel solutions to these problems for both GPUs and Intel MIC many-core architectures. First, we devise a new SpMV format, called Blocked Compressed Common Coordinate (BCCOO). BCCOO extends the blocked Common Coordinate (COO) format by using bit flags to store the row indices, which alleviates the bandwidth problem. We further improve this format by partitioning the matrix into vertical slices for better data locality. Then, to address the load imbalance problem, we propose a highly efficient matrix-based segmented sum/scan algorithm for SpMV, which eliminates global synchronization. Finally, we introduce an autotuning framework to choose optimization parameters. Experimental results show that our proposed framework has a significant advantage over existing SpMV libraries. In single precision, our proposed scheme outperforms the clSpMV COCKTAIL format by 255% on average on AMD FirePro W8000, and outperforms CUSPARSE V7.0 by 73.7% on average and CSR5 by 53.6% on average on GeForce Titan X; in double precision, our proposed scheme outperforms CUSPARSE V7.0 by 34.0% on average and CSR5 by 16.2% on average on Tesla K20, and performs on par with CSR5 on Intel MIC. |
Funding | National Natural Science Foundation of China[61502450] ; National Natural Science Foundation of China[61432018] ; National Natural Science Foundation of China[61521092] ; National Natural Science Foundation of China[61272136] ; National Key Research and Development Program of China[2016YFB0200803] ; NSF project[1216569] ; AMD Inc. |
WOS Research Area | Computer Science |
Language | English |
Publisher | ASSOC COMPUTING MACHINERY |
WOS Accession Number | WOS:000392416400002 |
Content Type | Journal article |
Source URL | [http://119.78.100.204/handle/2XEOYT63/7660] |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding Author | Li, Shigang; Yan, Shengen |
Author Affiliations | 1. North Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC 27695 USA; 2. Chinese Univ Hong Kong, Dept Informat Engn, SenseTime Grp Ltd, Hong Kong, Hong Kong, Peoples R China; 3. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China |
Recommended Citation (GB/T 7714) | Zhang, Yunquan; Li, Shigang; Yan, Shengen; et al. A Cross-Platform SpMV Framework on Many-Core Architectures[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2016, 13(4): 25. |
APA | Zhang, Yunquan, Li, Shigang, Yan, Shengen, & Zhou, Huiyang. (2016). A Cross-Platform SpMV Framework on Many-Core Architectures. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 13(4), 25. |
MLA | Zhang, Yunquan, et al. "A Cross-Platform SpMV Framework on Many-Core Architectures". ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 13.4 (2016): 25. |
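
The abstract's central idea, replacing explicit COO row indices with one bit per nonzero and computing each row's result as a segment sum, can be illustrated with a small sketch. This is not the authors' implementation: it assumes 1x1 blocks (no blocking or vertical slicing), a row-sorted COO input in which every row has at least one nonzero, and a flag polarity chosen here for readability (1 marks the last nonzero of a row). The segmented sum is written sequentially, whereas the paper targets a parallel version. The names `coo_rows_to_flags` and `spmv_bitflag` are hypothetical.

```python
def coo_rows_to_flags(rows):
    """Return one bit per nonzero: 1 if it is the last nonzero of its row.

    Assumes `rows` is sorted and no row is empty (assumptions of this sketch).
    """
    n = len(rows)
    return [1 if i == n - 1 or rows[i + 1] != rows[i] else 0 for i in range(n)]


def spmv_bitflag(flags, cols, vals, x):
    """Compute y = A * x with a sequential segmented sum driven by the flags."""
    y = []
    acc = 0.0
    for flag, c, v in zip(flags, cols, vals):
        acc += v * x[c]      # accumulate the current row's partial sum
        if flag:             # the row ends here: emit the segment sum
            y.append(acc)
            acc = 0.0
    return y


# Example matrix (illustrative data):
# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 5, 6]]
rows = [0, 0, 1, 2, 2, 2]
cols = [0, 2, 1, 0, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0]

flags = coo_rows_to_flags(rows)         # one bit per nonzero replaces a full row index
y = spmv_bitflag(flags, cols, vals, x)  # one output entry per row
print(flags, y)
```

Storing a single bit instead of a full integer row index for each nonzero is what reduces the memory traffic. Because the work is divided by nonzeros rather than by rows, a parallel segmented sum can give each thread an equal share regardless of how unevenly the rows are populated.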