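Title: 基于概念网络的信息检索研究与开发实践 (Research and Development Practice of Information Retrieval Based on Conceptual Network)
Author: Cheng Shengyuan (程盛远)
Degree: Master of Engineering
Defense date: 2004-06-01
Degree-granting institution: Graduate School of the Chinese Academy of Sciences
Place of award: Institute of Automation, Chinese Academy of Sciences
Supervisor: Yang Yiping (杨一平)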
Keywords: Conceptual Network; Information Retrieval; Natural Language Processing; Lexical Analysis; Similarity
Other title: Information Retrieval Based on Conceptual Network
Major: Pattern Recognition and Intelligent Systems
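Chinese abstract (in translation): With the expansion of online information, improving the ability of information retrieval systems to process natural language has become a research focus. The traditional approach, based on literal keyword matching, cannot resolve complex linguistic associations. Some newer natural-language representation models attempt to describe semantic relations and use natural language processing techniques to understand and retrieve textual information. Building on an analysis of existing models and methods, this thesis proposes information retrieval based on a conceptual network. The aim is to use natural language processing techniques to solve the lexical and semantic problems of text analysis in retrieval systems: taking the meaning (concept) of linguistic units as the core, an English conceptual network is constructed as a representation of linguistic knowledge and used for lexical analysis, associative search, semantic matching and similarity computation. The thesis consists of three main parts:
1. Research on and explanation of the representation system of the conceptual network, and its use to represent English natural-language knowledge. The concept (word sense) becomes the basic unit for describing linguistic association phenomena. The composition of a concept is analyzed in three parts: attributes, relations and behaviors. Different concepts are linked through relations and behaviors (production rules) to form the conceptual network. The feasibility of information retrieval based on the conceptual network is analyzed, and a strategy for natural language processing on this basis is outlined, concluding that processing must proceed in stages and levels: lexical, syntactic, semantic and pragmatic. An experimental knowledge base that includes an English conceptual network has been built in preliminary form.
2. A knowledge-based multilingual lexical analyzer. Using expert-system techniques, the thesis organizes lexical-analysis knowledge into three parts (data, knowledge base and control) and establishes a mechanism for representing and interpreting rules. The inference engine retrieves knowledge from the database when needed, which separates the content of specific analysis rules from the program: knowledge can be added, deleted or updated directly in the data without modifying program code, which makes system updates convenient. For multilingual lexical analysis, this means lexical knowledge with different content can be used within the same framework. The framework is expected to be extensible at low cost into lexical analyzers for other languages of the same family, such as German and French.
3. A word-similarity computation model based on the conceptual network. The structure of the model is analyzed in depth, and the concrete implementation of its components (part-of-speech, context, word-form and word-sense similarity) is studied, together with its core conceptual network search algorithm. The model design incorporates dynamic weight adjustment and different handling strategies for different relations.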
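Part 2 of the abstract describes keeping lexical knowledge in a database, separate from the program that applies it, so that rules can be changed by database operations alone. The thesis does not publish its rule format, so the following Python sketch only illustrates that separation under stated assumptions: the SQLite schema, the suffix-rewrite rule form, the tags, and the function names `build_kb`, `lookup` and `analyze` are all hypothetical. `analyze` plays the role of the inference engine and holds no language-specific knowledge, so supporting German needs only new rows.

```python
import sqlite3

# Hypothetical schema: the thesis does not publish its rule format.
# rules: (language, suffix to strip, string to append, inflection tag, priority)
SCHEMA = """
CREATE TABLE rules (lang TEXT, suffix TEXT, replacement TEXT, tag TEXT, priority INTEGER);
CREATE TABLE lexicon (lang TEXT, lemma TEXT, pos TEXT);
"""


def build_kb():
    """Create an in-memory knowledge base holding toy English knowledge."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO rules VALUES (?, ?, ?, ?, ?)",
        [
            ("en", "ies", "y", "PLURAL", 2),
            ("en", "ing", "", "PROGRESSIVE", 1),
            ("en", "ed", "", "PAST", 1),
            ("en", "ed", "e", "PAST", 1),
            ("en", "s", "", "PLURAL", 1),
        ],
    )
    conn.executemany(
        "INSERT INTO lexicon VALUES (?, ?, ?)",
        [
            ("en", "query", "NOUN"),
            ("en", "concept", "NOUN"),
            ("en", "search", "VERB"),
            ("en", "retrieve", "VERB"),
        ],
    )
    return conn


def lookup(conn, lang, lemma):
    """Return the parts of speech recorded for a lemma in the given language."""
    rows = conn.execute(
        "SELECT pos FROM lexicon WHERE lang = ? AND lemma = ?", (lang, lemma)
    )
    return [pos for (pos,) in rows]


def analyze(conn, word, lang="en"):
    """Generic control loop: every language-specific fact comes from the tables."""
    word = word.lower()
    results = [(word, pos, "BASE") for pos in lookup(conn, lang, word)]
    rules = conn.execute(
        "SELECT suffix, replacement, tag FROM rules WHERE lang = ? "
        "ORDER BY priority DESC, length(suffix) DESC",
        (lang,),
    ).fetchall()
    for suffix, replacement, tag in rules:
        if word.endswith(suffix) and len(word) > len(suffix):
            # Strip the suffix, append the replacement, and keep only known lemmas.
            stem = word[: len(word) - len(suffix)] + replacement
            results.extend((stem, pos, tag) for pos in lookup(conn, lang, stem))
    return results


if __name__ == "__main__":
    kb = build_kb()
    print(analyze(kb, "queries"))  # [('query', 'NOUN', 'PLURAL')]
    # Extending to German touches only the data, not analyze():
    kb.execute("INSERT INTO rules VALUES ('de', 'en', '', 'PLURAL', 1)")
    kb.execute("INSERT INTO lexicon VALUES ('de', 'frau', 'NOUN')")
    print(analyze(kb, "Frauen", lang="de"))  # [('frau', 'NOUN', 'PLURAL')]
```

The rule set is deliberately tiny; a real analyzer would need far richer rules (irregular forms, consonant doubling, compounding), but those would also be added as data rather than code.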
English abstract: With the expansion of information on the Internet, performing natural language processing (NLP) in information retrieval (IR) systems has become a research focus. The conventional text representation scheme is keyword-based and cannot handle the complex correlations found in natural-language text. Newer models have been developed that represent semantic relations, in order to understand and retrieve natural-language text with the help of NLP techniques. Building on an analysis of existing models and methods, I propose applying the Conceptual Network (Connet) to IR systems to deal with lexical and semantic problems. Based on word meaning (concept), a Connet for English is built and used in lexical analysis, searching, semantic matching and similarity computation. The dissertation makes three main contributions:
1. Research on Connet and its application to knowledge representation for English. In Connet, the basic representational unit is the concept, which is composed of three elements: attributes, relations and invoking (behaviors expressed as production rules). The Connet links all interrelated concepts. I study the feasibility and procedures of IR with the help of Connet and divide NLP into four phases or levels: lexical, syntactic, semantic and pragmatic. A simple experimental knowledge base embodying the Connet for English has been built.
2. A knowledge-based lexical analyzer for multiple languages. Following expert-system principles, lexical knowledge is divided into three parts (data, knowledge base and control), and a mechanism for representing and interpreting lexical rules is established. The specific knowledge is stored in a database separately from the code and is retrieved by the inference engine when needed; it can be added, deleted or updated directly through database operations without changing code, which makes the analyzer easy to upgrade. Different lexical knowledge can be plugged into the same framework, so the analyzer is expected to be extensible at low cost to other inflecting languages such as French and German.
3. A model for word similarity computation based on Connet. Flow charts are given, and the specific components are analyzed: part-of-speech similarity, context similarity, morphology similarity and semantic similarity, of which the most important is the Connet search algorithm. The model incorporates dynamic weight adjustment and relation-specific handling strategies.
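Contribution 3 names four similarity components, a Connet search algorithm, dynamic weights and relation-specific handling, but the abstract gives no formulas. The following Python sketch is one possible reading, not the author's model: the toy network `CONNET`, `RELATION_FACTOR`, `WEIGHTS` and all function names are hypothetical. Semantic similarity is taken as the highest product of per-relation factors along a path, found with a Dijkstra-style best-first search; morphology uses `difflib.SequenceMatcher.ratio`; context uses Jaccard overlap of context words; and "dynamic" weighting is modelled as renormalizing the weights over whichever components are available for a given word pair.

```python
import heapq
from difflib import SequenceMatcher

# Hypothetical toy Connet: concept -> outgoing (relation, concept) edges.
# Only relations are modelled; attributes and invoking rules are omitted.
CONNET = {
    "car": [("synonym", "automobile"), ("is_a", "vehicle")],
    "automobile": [("synonym", "car")],
    "bicycle": [("is_a", "vehicle")],
    "vehicle": [("is_a", "artifact")],
    "artifact": [],
}
# Hypothetical fraction of similarity kept when crossing one edge of each relation.
RELATION_FACTOR = {"synonym": 1.0, "is_a": 0.8}
# Hypothetical base weights of the four components.
WEIGHTS = {"pos": 0.1, "context": 0.2, "morphology": 0.1, "semantic": 0.6}


def neighbours(concept):
    """Yield edges in both directions (linear scan; fine for a toy graph)."""
    yield from CONNET.get(concept, [])
    for source, edges in CONNET.items():
        for relation, target in edges:
            if target == concept:
                yield relation, source


def semantic_sim(a, b, max_depth=4):
    """Best-first search for the path with the highest product of relation
    factors; expansion simply stops at max_depth edges."""
    if a == b:
        return 1.0
    best = {a: 1.0}
    heap = [(-1.0, 0, a)]  # negated scores turn heapq's min-heap into a max-heap
    while heap:
        neg_score, depth, node = heapq.heappop(heap)
        score = -neg_score
        if node == b:
            return score
        if depth == max_depth or score < best[node]:
            continue  # depth cutoff reached, or a stale heap entry
        for relation, nxt in neighbours(node):
            candidate = score * RELATION_FACTOR.get(relation, 0.5)
            if candidate > best.get(nxt, 0.0):
                best[nxt] = candidate
                heapq.heappush(heap, (-candidate, depth + 1, nxt))
    return 0.0


def jaccard(ctx1, ctx2):
    """Overlap of two non-empty collections of context words."""
    s1, s2 = set(ctx1), set(ctx2)
    return len(s1 & s2) / len(s1 | s2)


def word_similarity(w1, w2, pos1, pos2, ctx1=None, ctx2=None):
    """Weighted mean of the available components; weights renormalized per pair."""
    scores = {
        "pos": 1.0 if pos1 == pos2 else 0.0,
        "morphology": SequenceMatcher(None, w1, w2).ratio(),
        "context": jaccard(ctx1, ctx2) if ctx1 and ctx2 else None,
        "semantic": semantic_sim(w1, w2) if w1 in CONNET and w2 in CONNET else None,
    }
    available = {name: s for name, s in scores.items() if s is not None}
    total = sum(WEIGHTS[name] for name in available)
    return sum(WEIGHTS[name] * s for name, s in available.items()) / total
```

For example, `semantic_sim("car", "bicycle")` follows car, vehicle, bicycle through two `is_a` edges and yields 0.8 × 0.8 ≈ 0.64. For a pair such as `word_similarity("car", "cart", "NOUN", "NOUN", ["road", "drive"], ["road", "shop"])`, "cart" is absent from the toy network, so the semantic component is dropped and the remaining weights are rescaled; the thesis's own weighting strategy may well differ.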
Language: Chinese
Other identifier: 768
Content type: Thesis
Source URL: http://ir.ia.ac.cn/handle/173211/6758
Collection: Graduates / Master's Theses
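Recommended citation
GB/T 7714
程盛远 (Cheng Shengyuan). 基于概念网络的信息检索研究与开发实践 [Research and Development Practice of Information Retrieval Based on Conceptual Network][D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences. 2004.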