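Title: An Integrated Computer-Aided Retrieval System for Large Chemical Engineering Reference Books (化工大型工具书计算机辅助检索集成系统)
Author: Wang Xinyu (王新宇)
Degree: Master's
Defense date: 1993-06-30
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place: Beijing
Advisor: Yang Zhangyun (杨章运)
Degree discipline: Chemical Technology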
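Chinese abstract (translated): With the continuing development of computer science and technology, the ability of computers to process Chinese-character information is steadily improving. Chemistry and chemical engineering involve a vast body of information, so using computers for information retrieval in these fields is particularly important. Taking the Encyclopedia of Chemical Industry (《化工百科全书》) and the Dictionary of Chemical Industry (《化工辞典》) as its subjects, this thesis applies computer technology to the information processing of large chemical engineering reference books, proposes a method for building an integrated computer-aided retrieval system for such books, and develops full-text database system software for encyclopedia-type works, laying a foundation for an online full-text database retrieval system. Researchers often rely on an index to locate information in the main text by hand, yet most publications issued in China lack an index. Manual indexing, as practiced previously, involves a heavy workload, is inefficient, varies in quality from indexer to indexer, and yields results that are hard to standardize; the hope of solving these problems lies in developing computer-aided automatic indexing. Large reference works such as the Encyclopedia of Chemical Industry contain a huge volume of information covering a very broad range of knowledge. In view of this, the work studies the horizontal cross-relationships among pieces of information, seeks their inherent regularities, and builds a computer-aided retrieval system to meet readers' multi-directional, multi-level retrieval needs. The thesis proposes a computer-aided fast inverted full-text retrieval method based on partial indexing of single Chinese characters, and establishes a two-level index mechanism. The system can search for first-level keywords across the full text and extract the related context, achieving a degree of intelligence. The first-level keyword index provides vertical information, while the second-level explanatory-clause index provides complex horizontal cross-information. Using concepts from fuzzy mathematics, a fuzzy rule base and an inference mechanism are built to compile the second-level explanatory-clause index. The principles behind the fuzzy rule base and inference mechanism are broadly applicable and can be used for information retrieval in other disciplines. The system also provides an integrated editor that helps users edit, select, and construct second-level explanatory clauses. The system can condense and filter the extracted context, removing passages with low relevance to the keyword. It determines the page numbers of keywords in the main text through semi-automatic preprocessing and combines sequential retrieval with inverted retrieval; its time and space complexity are low and its running efficiency is high, which improves indexing quality, recall, and precision.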
English abstract: In this thesis, software is developed for indexing the Encyclopedia of Chemical Industry, the Dictionary of Chemical Industry, and other large-scale reference books. Most books published in Chinese so far lack indices because of the special and complex features of Chinese compared with Latin-script languages. As a large-scale reference book, the Encyclopedia of Chemical Industry contains a great deal of information spanning many scientific fields. A computer-aided retrieval system is needed to meet users' multi-level, multi-directional retrieval requirements by studying the horizontal and cross relationships among the information and looking for its internal regularities. In this thesis, a method of creating an inverted index file through computer-aided indexing of single Chinese characters is proposed to achieve fast full-text retrieval. First-level keywords can be retrieved from the full text of the book, and the corresponding contents can be extracted. The algorithm has a degree of intelligence: it can eliminate irrelevant information from the context surrounding the keywords. Based on the concept of fuzzy logic, fuzzy inference on the basis of fuzzy rules is introduced into the system. For ease of use, a user-friendly editor that integrates editing, screening, and construction of the second-level explanatory sentences is presented. A method of pre-processing the text file is suggested for determining page numbers when building the indices. By combining sequential and inverted retrieval methods, the time and space complexity is kept low, and the indexing quality, recall ratio, and precision ratio are improved. The methodology of computer-aided information processing developed in this thesis is not confined to chemistry and the chemical industry; it can also be extended to other fields. Furthermore, the retrieval algorithm lays a foundation for building an online full-text database.
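The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of a single-character inverted index with keyword lookup and context extraction. It assumes every non-space character is indexed (the thesis speaks of partial indexing, whose selection rule is not stated here), and the names `build_char_index`, `find_keyword`, and `extract_context`, as well as the sample sentence, are hypothetical.

```python
from collections import defaultdict


def build_char_index(text):
    """Map each non-space character to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, ch in enumerate(text):
        if not ch.isspace():
            index[ch].append(pos)
    return index


def find_keyword(index, keyword):
    """Return sorted start positions of `keyword`, using only the character index."""
    if not keyword:
        return []
    # Start from occurrences of the first character, then keep only those
    # starts where each later character occurs at the expected offset.
    candidates = set(index.get(keyword[0], []))
    for offset, ch in enumerate(keyword[1:], start=1):
        positions = set(index.get(ch, []))
        candidates = {s for s in candidates if s + offset in positions}
        if not candidates:
            break
    return sorted(candidates)


def extract_context(text, start, length, window=10):
    """Return the keyword plus up to `window` characters on each side."""
    lo = max(0, start - window)
    hi = min(len(text), start + length + window)
    return text[lo:hi]


# Hypothetical sample text, not taken from the reference books.
text = "催化剂用于加速反应。催化剂的活性取决于温度。"
index = build_char_index(text)
hits = find_keyword(index, "催化剂")
print(hits)  # [0, 10]
for start in hits:
    print(extract_context(text, start, len("催化剂"), window=2))
```

Because the index is keyed on single characters, no word segmentation of the Chinese text is needed before lookup; the fuzzy-rule filtering of context that the abstract mentions is not modeled in this sketch.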
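Language: Chinese
Date available: 2013-10-31
Pages: 71
Content type: Thesis
Source URL: [http://ir.ipe.ac.cn/handle/122111/4656]
Collection: Institute of Process Engineering_Institute (batch import)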
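Recommended citation
GB/T 7714
Wang Xinyu (王新宇). 化工大型工具书计算机辅助检索集成系统 [An integrated computer-aided retrieval system for large chemical engineering reference books] [D]. Beijing. Graduate School of the Chinese Academy of Sciences. 1993.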