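Title: 汉语人机对话话语信息形式化表示研究 (Research on the Formal Representation of Utterance Information in Chinese Human-Computer Dialogue)
Author: Fang Zhiwei (方志炜)
Degree type: Doctoral
Defense date: 2005
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Place of conferral: Institute of Acoustics, Chinese Academy of Sciences
Keywords: human-computer dialogue system; utterance formalization; speech act; illocutionary act; semantic frame
Alternative title: Research on Formalizing Chinese Utterance Information for Human-Computer Interactions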
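Abstract (Chinese): In spoken human-computer dialogue, the core problem is how a computer dialogue system can understand the meaning of human natural language and use that language to converse with the user in a targeted way. Developing domain-independent human-computer dialogue systems currently faces many technical bottlenecks, while developing applications for a specific task domain suffers from long development cycles and high cost. Focusing on this core problem, and aiming to make research on human-computer dialogue systems more general, this thesis builds a formal expression system for utterance information suited to human-computer dialogue. The main work is as follows: (1) Building on Grice's conversational maxims, a Response Appropriate Maxim (RAM) for human-computer dialogue systems is proposed. It sets normative guidelines for harmonious, cooperative human-computer dialogue in terms of three aspects of the system response: accuracy of information, appropriate amount of information, and naturalness of expression. RAM serves both as a test standard for evaluating dialogue system performance and as a guiding principle for designing and developing such systems. Optimizing the Beijing ticketing information system (BEST) accordingly reduced the average number of turns per transaction by 0.92 and raised the transaction success rate by 11.47%, effectively making the dialogue system more human-like. (2) Starting from the communicative function of utterances, utterance information is formalized in two categories, illocutionary force F (pragmatic information) and propositional content P, yielding the formal expression system F(P). Experiments on expressive coverage show that the F(P) coverage is 56.24% for dialogue not constrained by the cooperative principle and 98.79% for cooperative dialogue. (3) Combining rule-based and statistical methods, and taking Searle's classification of illocutionary acts as the initial template, a clustering distribution of illocutionary acts in human-computer dialogue (CS_SDS) is obtained from real human-computer dialogue corpora and used as the indicator for F in F(P). (4) Drawing on the phrase-based nature of Chinese grammar, a Two Hierarchy Semantic Frame Grammar (THSFG) is proposed as the set of mapping rules between actual Chinese utterances and their F(P) expressions. This sidesteps the difficulty that arises in defining productions because Chinese word classes carry no markers of syntactic function. By limiting the derivation depth of productions, it also reduces the total number of productions needed to model the language of a task domain and lessens conflicts between production definitions and derivation rules. (5) The use of THSFG for converting in both directions between utterances and their F(P) expressions is studied experimentally, application workflows for THSFG in the response generator and the language processor of an F(P)-based Chinese human-computer dialogue system are given, and the response generator F(P)_NLG is implemented. Subjective evaluation of F(P)_NLG output shows a satisfaction rate of 79.87% for Chinese simple sentences without context and 77.01% with context.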
Abstract (English): The challenge in research on Chinese Spoken Dialog Systems (SDSs) is to build intelligent systems that understand and use natural language in a closely human-like way. At present, however, research and development of general-purpose SDSs faces many bottlenecks. To improve the portability of SDSs, it is most important to build a general expression system for formalizing utterance information in human-computer interaction. Starting from the information-exchanging nature of human-computer interaction, this thesis therefore focuses on formalizing Chinese utterance information. Its main contributions are: (1) With reference to the Gricean conversational maxims, and in order to standardize harmonious and cooperative human-computer interaction, a Response Appropriate Maxim (RAM) for SDSs is defined as follows: feed back information correctly, offer an appropriate amount of suggestive information, neither more nor less, and express it in speech as naturally as possible. RAM can serve both as a test standard in SDS evaluation and as a guiding standard in SDS design and development. Applying RAM to optimize BEST (a Chinese SDS for Beijing railway information) reduced the average number of turns per transaction (ANT) by 0.92 and raised the transaction success rate (TS) by 11.47%, indicating that using RAM to guide SDS design and development makes the system more human-like overall. (2) Based on the information-exchanging nature of utterances, this thesis decomposes utterance information into illocutionary force (F) and propositional content (P), indicating F by classifying the utterance's illocutionary act and expressing P as a predicate logic expression. The thesis further treats the transformation rules of Chinese sentence patterns and the ordinary habits of Chinese dialogue, and lays down the corresponding rules for F(P) expressions. Integrating the form of F(P) expressions with these rules for Chinese dialogue, it builds an expression system for formalizing Chinese utterance information. An experiment testing the descriptive capability of the F(P) expression system gave the following results: for free human-computer dialog, the descriptive capability of F(P) is 86.24%; for cooperative dialog, it is 98.79%. (3) Taking Searle's classification of illocutionary acts as a starting point, this thesis combines rule-based and statistics-based approaches to analyze human-computer dialog records (about 300 dialogues) and extracts a classification of illocutionary acts in human-computer interaction (CS_SDS). CS_SDS can be used as the illocutionary-force-indicating mechanism for human-computer interaction. (4) Based on the phrase-based grammar system of Chinese, this thesis designs a Two Hierarchy Semantic Frame Grammar (THSFG) to link textual Chinese utterances with the F(P) expressions that formalize their information. THSFG is built on semantic parsing rather than syntactic parsing, so it avoids the parsing difficulty caused by the lack of syntactic-function markers in Chinese. Moreover, THSFG restricts the definition of extended rewriting rules to two hierarchies (the sentence hierarchy and the phrase hierarchy), which reduces the total number of rewriting rules and thereby sharply decreases conflicts among them. (5) The thesis discusses the application of THSFG to transforming textual Chinese utterances into F(P) expressions and F(P) expressions into textual Chinese utterances. A generator for an F(P)-based Chinese SDS (F(P)_NLG) is designed and implemented using THSFG. To evaluate the performance of F(P)_NLG, a subjective assessment by questionnaire was conducted, with the following results: without regard to context, the satisfaction rate for simple sentence generation is 79.87%; with regard to context, it is 77.01%.
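Language: Chinese
Release date: 2011-05-07
Pages: 122
Content type: Dissertation
Source URL: http://159.226.59.140/handle/311008/1060
Collection: Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses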
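Recommended citation (GB/T 7714): Fang Zhiwei (方志炜). 汉语人机对话话语信息形式化表示研究 [Research on Formalizing Chinese Utterance Information for Human-Computer Interactions] [D]. Institute of Acoustics, Chinese Academy of Sciences, 2005.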
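
The F(P) decomposition summarized in the abstract, an illocutionary force F applied to a propositional content P written as a predicate expression, can be illustrated with a minimal Python sketch. The class names, the force label `ask`, the predicate `departure_time`, and the argument `train_T15` are hypothetical and chosen only for illustration; this record does not give the thesis's actual notation or its inventory of illocutionary acts.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Proposition:
    """Propositional content P as a predicate applied to arguments (assumed form)."""
    predicate: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.predicate}({', '.join(self.args)})"


@dataclass(frozen=True)
class Utterance:
    """An utterance represented as F(P): a force label wrapping a proposition."""
    force: str            # illocutionary force F (hypothetical label)
    content: Proposition  # propositional content P

    def __str__(self) -> str:
        return f"{self.force}({self.content})"


# Hypothetical ticketing-domain query: the user asks for a train's departure time.
query = Utterance("ask", Proposition("departure_time", ("train_T15",)))
print(query)
```

Keeping F and P in separate fields reflects the abstract's point that the force is determined by classifying the illocutionary act, while the content is expressed independently in predicate logic.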