Towards Compact and Fast Neural Machine Translation Using a Combined Method
Xiaowei Zhang¹,²; Wei Chen¹; Feng Wang¹,²; Shuang Xu¹; Bo Xu¹
2017-09
Conference Date: 2017-09
Conference Location: Copenhagen, Denmark
Keywords: Machine Translation; Neural Network; Model Compression; Decoding Speedup
Pages: 1475–1481
Abstract: Neural Machine Translation (NMT) places a heavy burden on computation and memory, making it challenging to deploy NMT models on devices with limited computation and memory budgets. This paper presents a four-stage pipeline to compress the model and speed up decoding for NMT. Our method first introduces a compact architecture based on a convolutional encoder and weight-shared embeddings. Weight pruning is then applied to obtain a sparse model. Next, we propose a fast sequence interpolation approach that enables greedy decoding to achieve performance on par with beam search, so the time-consuming beam search can be replaced by simple greedy decoding. Finally, vocabulary selection is used to reduce the computation of the softmax layer. Our final model achieves a 10× speedup, a 17× reduction in parameters, and a storage size under 35 MB, with performance comparable to the baseline model.
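Of the four stages summarized above, weight pruning is the easiest to illustrate in isolation. Below is a minimal sketch of magnitude-based pruning, a common way to obtain a sparse model; the abstract does not specify the paper's exact pruning criterion, so the function name, the NumPy implementation, and the 80% sparsity target are illustrative assumptions, not the authors' code.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries of `weights`.

    `sparsity` is the fraction of weights to remove; e.g. 0.8 keeps only
    the largest 20% of weights by absolute value. (Illustrative sketch,
    not the paper's implementation.)
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    # Keep only weights whose magnitude exceeds the threshold.
    return weights * (np.abs(weights) > threshold)

# Usage: prune a toy weight matrix to roughly 80% sparsity.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 5))
W_sparse = prune_by_magnitude(W, sparsity=0.8)
print(f"nonzero fraction: {np.count_nonzero(W_sparse) / W_sparse.size:.2f}")
```

In a full compression pipeline like the one described, pruning is typically interleaved with retraining so the remaining weights can recover accuracy; the abstract does not state whether the authors do so.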

Language: English
Content Type: Conference Paper
Source URL: http://ir.ia.ac.cn/handle/173211/21185
Collection: Research Center for Brain-Inspired Intelligence, Neural Computation and Brain-Computer Interaction
Author Affiliations:
1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
Recommended Citation (GB/T 7714):
Xiaowei Zhang, Wei Chen, Feng Wang, et al. Towards Compact and Fast Neural Machine Translation Using a Combined Method[C]. Copenhagen, Denmark, 2017-09: 1475–1481.