Towards Compact and Fast Neural Machine Translation Using a Combined Method
Xiaowei Zhang¹,²; Wei Chen¹; Feng Wang¹,²; Shuang Xu¹; Bo Xu¹
2017-09
Conference date | 2017-09
Conference venue | Copenhagen, Denmark
Keywords | Machine Translation; Neural Network; Model Compression; Decoding Speedup
Pages | 1475–1481
Abstract | Neural Machine Translation (NMT) places a heavy burden on computation and memory, making it challenging to deploy NMT models on devices with limited budgets for either. This paper presents a four-stage pipeline to compress the model and speed up decoding for NMT. Our method first introduces a compact architecture based on a convolutional encoder and weight-shared embeddings. Weight pruning is then applied to obtain a sparse model. Next, we propose a fast sequence-interpolation approach that enables greedy decoding to perform on par with beam search, so the time-consuming beam search can be replaced by simple greedy decoding. Finally, vocabulary selection is used to reduce the computation of the softmax layer. Our final model achieves a 10× speedup, a 17× reduction in parameters, a storage size under 35 MB, and performance comparable to the baseline model.
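The abstract does not specify the pruning criterion used in the second stage; a common choice for obtaining a sparse model is magnitude-based pruning, sketched below under that assumption (the `magnitude_prune` helper and the 80% sparsity level are illustrative, not taken from the paper):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that roughly
    `sparsity` fraction of the matrix becomes zero (illustrative
    sketch; the paper's actual pruning scheme may differ)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

# Example: prune a 256x256 weight matrix to ~80% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
pruned = magnitude_prune(w, sparsity=0.8)
print(1.0 - np.count_nonzero(pruned) / pruned.size)  # close to 0.8
```

In practice the surviving weights would be stored in a sparse format (e.g. CSR) to realize the memory savings the paper reports.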
Language | English
Content type | Conference paper
Source URL | http://ir.ia.ac.cn/handle/173211/21185
Collection | Research Center for Brain-Inspired Intelligence, Neural Computation and Brain-Computer Interaction
Author affiliations | 1. Institute of Automation, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences
Recommended citation (GB/T 7714) | Xiaowei Zhang, Wei Chen, Feng Wang, et al. Towards Compact and Fast Neural Machine Translation Using a Combined Method[C]. In: . Copenhagen, Denmark. 2017-09.