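Title: 口语对话语音识别解码策略与置信度研究
Author: 付跃文 (Fu Yuewen)
Degree type: Doctoral
Defense date: 2005
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Place of conferral: Institute of Acoustics, Chinese Academy of Sciences
Keywords: speech recognition; dialog system; search algorithm; tree-structured lexicon; confidence measure; predictor; decision tree; word graph
Alternative title: Decoding Strategy and Confidence Measures for Speech Recognition in Spoken Dialog System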
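Abstract (translated from the Chinese): Speech recognition for spoken dialog systems has been one of the research focuses of the speech recognition field in recent years. The overall goal of this thesis is to build a practical speech recognition decoding platform for spoken dialog systems. Two key requirements of such a decoder are studied in depth: real-time decoding and confidence estimation for the recognition results. The main work and contributions are as follows. 1. Starting from the basic theory of speech recognition, a method is established for classifying and statistically analyzing recognition errors. Each error region is assigned to one of four classes: decoding errors, acoustic errors, language model errors, and mixed errors. Statistical analysis of the classified errors then diagnoses the system's main error characteristics and provides a basis for improving the decoding system. 2. An effective fast decoding architecture is built for speech recognition in query-oriented dialog systems. To address the sparse corpus and the real-time decoding requirement of query-oriented dialog tasks, this part proposes treating the sparse-data problem and the fast-decoding problem as a whole. A class-based language model is adopted to cope with data sparsity. On that basis, the thesis exploits the fact that in query-oriented dialog systems a few word classes account for a large share of the vocabulary, and uses a word-class-based multi-tree structure together with a flexible language model look-ahead method in decoding. Large classes containing many words get their own trees, which removes the look-ahead computation for them during decoding and speeds decoding up, while the structure of the vocabulary keeps the growth of the search space small. The method is an important improvement over traditional single-tree decoding algorithms. 3. The third part estimates confidence for the recognizer's output; confidence estimation is an indispensable component of current dialog systems. Local word posterior probabilities are computed from the local word graph produced during the second decoding pass, a stack decoding search. On this basis, a decision tree fuses predictors such as word length and a new predictor proposed in this thesis, the local word posterior probabilities of adjacent words, which significantly improves the performance of confidence estimation based on local word posterior probability. In addition, to meet the needs of dialog-system speech decoders that output a word lattice, the thesis studies word-posterior-based confidence estimation on the lattice structure (specifically the HTK lattice structure). A forward-backward algorithm over the connecting arcs computes the posterior probabilities, and the output words are then scored for confidence from these posteriors, giving an efficient lattice-based confidence computation.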
Abstract (English): Speech recognition for spoken dialog systems has been one of the important subjects in the speech recognition area in recent years. This thesis focuses on building a practical speech recognition decoder for spoken dialog systems. Two important issues for such a decoder, fast decoding and confidence measures, are studied. The main contributions are: 1. Based on the fundamental theory of speech recognition, the thesis presents a set of procedures that classify each recognition error into one of four classes: decoding errors, acoustic model errors, language model errors, and combined acoustic and language model errors. Statistical analysis of the classified errors then diagnoses the main causes of errors and provides guidance for improving the system. 2. A fast decoding framework is built for information query dialog systems. The thesis treats the sparse data problem and the fast decoding problem as a whole and gives a solution that addresses both. A class-based language model is used to deal with sparse data. Taking advantage of the vocabulary structure of many information query dialog systems, in which very few classes take up a large proportion of the vocabulary, the thesis constructs multiple lexicon trees based on word class as the search space and adopts a flexible look-ahead technique in decoding. Big word classes (classes that contain many words) are separated out to build individual trees, which dispenses with look-ahead computation for them in decoding, while the search space grows only slightly owing to the structure of the vocabulary. This method is an important improvement on the traditional single-tree search algorithm. 3. The third piece of work concerns confidence measures for the output of the recognition system, an important part of present-day dialog systems. The local word posterior probability, computed from word expansion during stack decoding search, is taken as an important predictor of confidence under real-time conditions, and its performance is improved by using a decision tree to combine it with other real-time predictors. A series of other predictors are constructed, and experiments on different combinations of predictors using decision trees are carried out. The experimental results show that a confidence measure based on local word posterior probability can be improved significantly, and that the local posterior probabilities of adjacent words proposed in this thesis are relatively effective predictors. Confidence measures on word lattices (HTK lattices in this thesis) are also studied, to meet the needs of speech decoders that output lattices. The thesis first uses the forward-backward algorithm to compute word posterior probabilities efficiently on lattice arcs; the confidence value of each word is then computed from the posterior probabilities of the arcs.
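The lattice-based step in the abstract (forward-backward over arcs, then posteriors per arc) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `arc_posteriors`, the tuple layout of an arc, and the toy lattice `EXAMPLE_ARCS` are all hypothetical. It assumes integer node ids in topological order (every arc goes from a lower id to a higher id) and one combined log score per arc; an HTK lattice stores acoustic and language model scores separately, so they would have to be scaled and summed beforehand.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def arc_posteriors(arcs, start, end):
    """Return the posterior probability of each arc in a word lattice.

    arcs: list of (from_node, to_node, word, log_score) tuples (assumed layout).
    Node ids are assumed to be integers in topological order.
    """
    nodes = sorted({n for u, v, _, _ in arcs for n in (u, v)})
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for i, (u, v, _, _) in enumerate(arcs):
        outgoing[u].append(i)
        incoming[v].append(i)

    # Forward pass: alpha[n] = log of the summed score of all paths start -> n.
    alpha = {n: -math.inf for n in nodes}
    alpha[start] = 0.0
    for n in nodes:
        if n != start and incoming[n]:
            alpha[n] = logsumexp(
                [alpha[arcs[i][0]] + arcs[i][3] for i in incoming[n]])

    # Backward pass: beta[n] = log of the summed score of all paths n -> end.
    beta = {n: -math.inf for n in nodes}
    beta[end] = 0.0
    for n in reversed(nodes):
        if n != end and outgoing[n]:
            beta[n] = logsumexp(
                [arcs[i][3] + beta[arcs[i][1]] for i in outgoing[n]])

    total = alpha[end]  # log of the summed score of all complete paths
    return [math.exp(alpha[u] + s + beta[v] - total) for u, v, _, s in arcs]


# Toy lattice (made-up scores): three paths from node 0 to node 2.
EXAMPLE_ARCS = [
    (0, 1, "fly", math.log(0.6)),
    (0, 1, "buy", math.log(0.4)),
    (1, 2, "tickets", math.log(0.5)),
    (0, 2, "flights", math.log(0.5)),
]

if __name__ == "__main__":
    for arc, p in zip(EXAMPLE_ARCS, arc_posteriors(EXAMPLE_ARCS, 0, 2)):
        print(arc[2], round(p, 3))
```

Each arc posterior is the share of total path score that passes through that arc, so it costs one forward and one backward sweep regardless of how many paths the lattice encodes. Turning arc posteriors into a word confidence (for example by pooling arcs that carry the same word over overlapping time spans) needs arc timing information and is left out of this sketch.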
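Language: Chinese
Release date: 2011-05-07
Pages: 110
Content type: Thesis
Source URL: http://159.226.59.140/handle/311008/890
Collection: Institute of Acoustics / Institute of Acoustics Doctoral and Master's Theses / Doctoral and Master's Theses 1981-2009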
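Recommended citation
GB/T 7714
付跃文 (Fu Yuewen). 口语对话语音识别解码策略与置信度研究 (Decoding Strategy and Confidence Measures for Speech Recognition in Spoken Dialog System) [D]. Institute of Acoustics, Chinese Academy of Sciences. 2005.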
 
