The Application of Chinese Language Models for Statistical Machine Translation System


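Title	汉语语言模型在统计机器翻译系统中的应用
Author	Wang Weihua (王韦华)
Degree type	Master of Engineering
Defense date	2009-12-21
Degree-granting institution	Graduate University of the Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisor	Xu Bo (徐波)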
Keywords	statistical machine translation; training data preprocessing; Chinese language model; data post-processing
Alternative title	The Application of Chinese Language Models for Statistical Machine Translation System
Major	Pattern Recognition and Intelligent Systems
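Chinese abstract (translated)	Most machine translation systems today use statistical methods, the mainstream of which comprises phrase-based systems and hierarchical phrase-based systems. The language model plays a very important role in a statistical translation system: it makes the translation output conform more closely to the grammar of the target language. What effect n-gram language models of different scales and different orders have on different systems, however, is an open question, which this thesis examines through a large number of experiments with comparison and analysis. The main work of the thesis is summarized as follows: 1. It introduces the overall architecture of a phrase-based statistical machine translation system and the implementation and optimization of each functional module, chiefly language model training, translation model training, the phrase-based decoder, minimum error rate training, and post-processing. 2. It describes the principles and implementation of a hierarchical phrase-based statistical machine translation system. 3. It introduces how to preprocess Chinese-English parallel corpora to meet the needs of a machine translation system, covering the initial and in-depth processing required to get from the raw corpus to the corpora needed for training the translation model and the statistical model, and implements a Chinese-English corpus preprocessing platform. 4. It analyzes the effect of Chinese language model scale on statistical machine translation systems, specifically studying the effect of the size and n-gram order of Chinese language models in two English-Chinese statistical machine translation systems: a phrase-based system and a hierarchical phrase-based system. In summary, the thesis carries out a large number of experiments and fairly in-depth research on training corpus preprocessing, system implementation and optimization, and the effect of language model scale on the system, and improves the performance of the existing experimental systems.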
English abstract	Statistical methods currently dominate machine translation, the main approaches being phrase-based and hierarchical phrase-based systems. The language model plays an important role in a statistical translation system: it makes the output conform more closely to the grammar of the target language. This thesis asks how the scale and n-gram order of Chinese language models affect English-Chinese machine translation systems, and reports a large number of experiments on the question. The main contributions are as follows: 1. A study of the framework of the phrase-based system and each of its functional modules: language model training, translation model training, the decoder, minimum error rate training, and post-processing. 2. A description of the implementation of the hierarchical phrase-based statistical translation system. 3. A study of how to preprocess Chinese-English parallel corpora for machine translation, covering the steps that take a corpus from its raw form to a form ready for training, together with a platform developed for corpus preprocessing. 4. A study of the effects of the scale and n-gram order of Chinese language models on English-Chinese machine translation systems. Experiments show that, given the same language model, the hierarchical phrase-based MT system outperforms the phrase-based MT system, while for a given MT system the scale and order of the language model clearly affect the BLEU score. A larger, higher-order language model does not necessarily give a better result. Overall, the thesis focuses on preprocessing of the training data, implementation of the machine translation systems, and the scale of Chinese language models for statistical machine translation, work that greatly improved the translation results.
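Language	Chinese
Other identifier	200628014628047
Content type	Thesis
Source URL	[http://ir.ia.ac.cn/handle/173211/7503]
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	Wang Weihua. The Application of Chinese Language Models for Statistical Machine Translation System [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences. 2009.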
