Keywords: HNC theory; sentence category analysis; sentence group; domain sentence category; concept connection expression
Alternative title: The Design and Knowledge Representation of the Professional Activity Domain Sentence Category
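Chinese abstract (English translation): Chinese is a language that relies mainly on parataxis (意合). Take the sentence group as an example: a sentence group is discourse that unfolds around a particular topic. Its sentences necessarily express the features of that topic and carry a certain body of knowledge, which HNC calls domain knowledge. Domain knowledge must both reflect the semantic roles of the main components of the topic knowledge and identify its grammatical constituents. These requirements point to the sentence category expression as a formal notation, and this knowledge can be formalized by means of sentence category expressions.
This dissertation uses the formal method of sentence category expressions to organize domain knowledge effectively, forming domain sentence category knowledge that can be used for computer processing of sentence groups. Taking the HNC concept primitive symbol system as the starting point, it induces the domain knowledge implied by concept extension structures, organizes it, represents it as domain sentence category expressions, and gives the associated concept connection expressions.
Methodologically, the dissertation emphasizes analysis and induction: analysis appears in the description of concept nodes and their extension structures, and induction in the further reorganization of the knowledge obtained from node analysis.
In response to the varied design content of HNC concept extension structures, the dissertation proposes four corresponding design principles for domain sentence categories and refines the design process into four steps: analysis of concept nodes, induction of domain knowledge, design of domain sentence category expressions, and design of concept connection expressions. The resulting domain sentence category knowledge base can ultimately support upgrading the sentence category analysis system toward context unit extraction.
The contributions of the dissertation are as follows:
(1) A knowledge representation method for domain sentence categories is derived on the basis of the HNC concept primitive symbol system. The concept primitive symbol system reveals, at the semantic level, the primitive and systematic nature of concepts and describes the associations among them. For domain concepts it defines the corresponding concept extension structures and their concept connection knowledge, while sentence category expressions effectively capture the deep semantic structure of sentences. Starting from the domain concepts in the concept primitive symbol system and taking sentence category expressions as the framework, the dissertation arrives at the domain sentence category expression, a new formal representation of world knowledge.
(2) A general method for designing domain sentence category knowledge is developed on the basis of the HNC concept primitive symbol system, with concrete design steps. Taking the extension structures in the concept primitive symbol system as the starting point, a detailed analysis of the extension structure yields the overall knowledge design for high-level concept nodes. The low-level concept nodes are then analyzed and, centered on the action-effect chain, the domain knowledge of the lower-level extension structures is induced. Guided by sentence category knowledge, semantic chunk content is assigned to each semantic role of the domain knowledge and its sentence category code is determined, yielding the domain sentence category expression. Within the overall framework of the domain sentence category, the concept node itself and the semantic chunk content are analyzed for associations, and the resulting concept connection knowledge is formalized with HNC mapping symbols to give the concept connection expressions.
(3) Design principles for domain sentence category expressions are proposed. There are four: the hierarchical system principle resolves the problem of knowledge boundaries that arises once the domain knowledge of domain concepts has been induced and condensed; the action-effect principle ensures that the action-effect chain criterion is observed throughout the design of domain knowledge; the extension structure principle provides representation patterns tailored to the node design characteristics of different extension structures; and the sentence integration principle addresses the application of domain sentence category knowledge in the sentence category analysis system.
(4) The domain knowledge of the four major domain concept forests in the professional activity domain is induced and formally represented. According to a statistical analysis of real news corpora, the four domain concept forests (commonality, politics, economy, culture) cover about 56% of the domain space. A domain sentence category expression was designed for every domain concept in each of these forests, together with concept connection expressions. These results will form the main body of the domain sentence category knowledge base and lay a solid foundation for its completion.
(5) The application of domain sentence category knowledge in the sentence category analysis system is explored. Domain sentence category knowledge ultimately serves the sentence category analysis system: it can improve the system's ability to process sentence groups, and it can also help the system handle difficult cases such as new words and ambiguous semantic segmentation. It will support the system's eventual leap from the first intermediate level (第一介层) to the second intermediate level (第二介层).
In summary, within the framework of HNC theory, this dissertation systematically studies the design of domain sentence categories, proposes the corresponding design steps and principles, and carries out the concrete design of domain sentence categories within the four major domain concept forests of professional activity. The results will help deepen research on sentence group processing in sentence category analysis.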
English abstract: Chinese is a language in which sentences are held together mainly by meaning. Take the sentence group as an example: a sentence group is a set of sentences that share a central topic, and each sentence conveys features of, and knowledge about, that topic. HNC theory calls this knowledge domain knowledge. Domain knowledge must both reflect the semantic role of each main constituent and identify its grammatical components. The sentence category expression is used to formalize this knowledge. This dissertation uses the formal method of the sentence category expression to organize domain knowledge into domain sentence category knowledge that computers can use in sentence group processing. Taking the concept primitive symbol system as its starting point, it summarizes the domain knowledge contained in concept extended structures, organizes it, expresses it as domain sentence category expressions, and gives the related concept connection expressions. Methodologically, the dissertation focuses on analysis and induction: analysis is embodied in the description of concept nodes and their extended structures, and induction in the further reorganization of the knowledge obtained from node analysis. In view of the varied design content of the HNC concept extended structure, the dissertation proposes four corresponding design principles for domain sentence categories and refines the design process into four steps: analysis of the concept node, induction of the domain knowledge, design of the domain sentence category expression, and design of the concept connection expression.
The resulting knowledge base of domain sentence categories can serve to upgrade the sentence category analysis system toward context unit extraction. The main contributions of this dissertation are as follows: (1) A method for describing domain sentence category knowledge is formed on the basis of the HNC concept primitive symbol system. The symbol system reveals the primitive and systematic character of concepts and describes the connections between them. For domain concepts, it defines the corresponding extended structures and their concept connection knowledge, while the sentence category expression effectively captures the deep semantic structure of a sentence. Starting from the domain concepts in the concept primitive symbol system and taking the sentence category expression as the framework, the dissertation forms the domain sentence category expression as a new formal representation of world knowledge. (2) A general method for designing domain sentence category knowledge is formed on the basis of the HNC concept primitive symbol system, and concrete design steps are proposed.
Taking the extended structures in the concept primitive symbol system as the starting point, the dissertation analyzes each extended structure to obtain the overall knowledge design for the high-level concept node; it then analyzes the low-level concept nodes and, centered on the Action-Effect chain, induces the domain knowledge of the lower-level extended structures; guided by sentence category knowledge, it assigns semantic chunk content to each semantic role of the domain knowledge and determines the sentence category code, which yields the domain sentence category expression; finally, within the overall framework of the domain sentence category, it analyzes the connections among the concept node itself and the semantic chunk content, and formalizes the resulting concept connection knowledge with HNC mapping symbols to produce the concept connection expressions. (3) Design principles for the domain sentence category expression are proposed. The hierarchical system principle resolves the problem of knowledge boundaries after domain knowledge has been induced and condensed; the Action-Effect principle ensures that the Action-Effect chain criterion is kept in view throughout the design of domain knowledge; the extended structure principle provides representation patterns tailored to the node design of different extended structures; and the sentence integration principle addresses how domain sentence category knowledge is applied in the sentence category analysis system. (4) The domain knowledge of the four major domain concept forests in the professional activity domain is induced and formally expressed.
According to a statistical analysis of real news corpora, these four domain concept forests (commonality, politics, economy, culture) cover approximately 56% of the domain space. A domain sentence category expression was designed for each domain concept, together with its concept connection expressions. These results will form the main body of the domain sentence category knowledge base and lay a solid foundation for its completion. (5) The application of domain sentence category knowledge in the sentence category analysis system is explored. Domain sentence category knowledge ultimately serves the sentence category analysis system: on the one hand it can improve the system's ability to process sentence groups, and on the other it can help the system handle difficult cases such as new words and ambiguous semantic segmentation. It will help the sentence category analysis system advance from the first referral level to the second referral level. In summary, within the framework of HNC theory, this dissertation studies the design of domain sentence categories, proposes the corresponding design steps and principles, and carries out the concrete design of domain sentence categories for the four major domain concept forests of professional activity. The results will help deepen research on sentence group processing in sentence category analysis.
GB/T 7714
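Miao Jianming (缪建明). The Design and Knowledge Representation of the Professional Activity Domain Sentence Category (专业活动领域句类的设计与知识表示) [D]. Institute of Acoustics, Chinese Academy of Sciences, 2007.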