AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs


	AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs
	Wang, Qian (1) ; Zhang, Xianyi (1) ; Zhang, Yunquan (2) ; Yi, Qing (3)
	2013
Conference name	2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
Conference date	November 17, 2013 - November 22, 2013
Conference location	Denver, CO, United States
Abstract	Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In this paper, we present a template-based optimization framework, AUGEM, which can automatically generate fully optimized assembly code for several dense linear algebra (DLA) kernels, such as GEMM, GEMV, AXPY and DOT, on varying multi-core CPUs without requiring any manual intervention from developers. In particular, based on domain-specific knowledge about the algorithms of the DLA kernels, we use a collection of parameterized code templates to formulate a number of commonly occurring instruction sequences within the optimized low-level C code of these DLA kernels. Then, our framework uses a specialized low-level C optimizer to identify instruction sequences that match the pre-defined code templates and thereby translates them into extremely efficient SSE/AVX instructions. The DLA kernels generated by our template-based approach surpass the implementations of the Intel MKL and AMD ACML BLAS libraries on both Intel Sandy Bridge and AMD Piledriver processors. Copyright 2013 ACM.
Indexed by	EI
Proceedings publisher	IEEE Computer Society
Language	English
ISSN	2167-4329
ISBN	9781450323789
Content type	Conference paper
Source URL	[http://ir.iscas.ac.cn/handle/311060/16662]
Collection	Institute of Software_Institute of Software Library_Conference Papers
Recommended citation (GB/T 7714)	Wang, Qian, Zhang, Xianyi, Zhang, Yunquan, et al. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs[C]. In: 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013. Denver, CO, United States. November 17, 2013 - November 22, 2013.
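
For readers unfamiliar with the kernels named in the abstract, the following is a minimal C sketch of two of them, AXPY and DOT, written from their standard BLAS definitions. It is not taken from the paper and is not AUGEM's input or output. The function names `axpy_ref` and `dot_unrolled`, and the unroll factor of 4, are assumptions chosen for illustration. The unrolled DOT loop only illustrates the sort of repeated multiply-accumulate statement sequence in low-level C that a template-based optimizer could plausibly recognize and map to SIMD instructions.

```c
#include <assert.h>
#include <stddef.h>

/* AXPY (standard BLAS definition): y[i] = y[i] + alpha * x[i] for i in [0, n).
 * Hypothetical reference version, not code from the paper. */
void axpy_ref(size_t n, double alpha, const double *x, double *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i) {
        y[i] += alpha * x[i];
    }
}

/* DOT (standard BLAS definition): returns the sum of x[i] * y[i].
 * Unrolled by 4 with independent accumulators (an assumed factor), so the
 * loop body is a fixed, repeated multiply-accumulate pattern. */
double dot_unrolled(size_t n, const double *x, const double *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i]     * y[i];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i) {
        acc0 += x[i] * y[i];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Because the unrolled version sums in a different order than a simple loop, its floating-point result can differ slightly from a straightforward implementation for general inputs.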
