Study on the Post-processing Methods of Chinese Character Recognition (汉字识别后处理方法研究)

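Title	Study on the Post-processing Methods of Chinese Character Recognition (汉字识别后处理方法研究)
Author	Liu Duanzheng (刘端正)
Degree type	Master of Engineering
Defense date	1991-06-01
Degree-granting institution	Institute of Automation, Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisor	Dai Ruwei (戴汝为)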
Keywords	Chinese character recognition (CCR); post-processing; word matching method; relaxation method; fuzzy lexical connection table; fuzzy semantic connection table; syntax; lexical functional grammar; artificial neural network; associative memory; semantic analysis
Alternative title	Study on the Post-processing Methods of Chinese Character Recognition
Degree discipline	Pattern Recognition and Intelligent Systems
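Abstract (translated from Chinese)	Chinese character recognition is an important link in Chinese information processing and matters greatly for the application and spread of computers in China. Over the course of research on Chinese character recognition, it has become increasingly clear that the recognition rate is hard to raise further using only the information in individual characters; higher-level information about the Chinese language, such as lexical, syntactic, and semantic information, must also be used. Post-processing, as the stage where this information is actually applied, has therefore become more important. A complete Chinese character recognition system has three main parts: preprocessing, recognition, and post-processing. These parts are not sharply separated; in some systems, preprocessing and recognition, or recognition and post-processing, are already tightly integrated. By degree of user involvement, post-processing methods fall into three classes: manual processing, interactive processing, and automatic processing by computer. In manual processing, the text file produced by recognition is passed to a standard text editor, such as WordStar or PE, and the user corrects each misrecognized character and supplies each rejected character one by one. In interactive processing, the text file is passed to a program that offers candidate characters for each misrecognized or rejected character, and the errors are corrected through interaction with the user. In automatic processing, a program corrects the errors in the text file automatically. By the methods applied, post-processing can also be divided into three classes: methods based on word information, methods based on syntactic and semantic analysis, and the recently emerged artificial neural network methods. Against the twin backgrounds of knowledge-based pattern recognition and natural language processing, this thesis gives the first systematic examination of post-processing methods for Chinese character recognition, in both theory and practice. The main contents are: ① it implements a post-processing system based on a synthetic matching method; ② it applies the relaxation method to Chinese character recognition post-processing for the first time and proposes a post-processing method based on a nonlinear probabilistic relaxation process; ③ it proposes a representation of syntactic and semantic information, namely the fuzzy lexical connection table and the fuzzy semantic connection table, and describes a post-processing method based on this representation; ④ it proposes the basic idea of using lexical functional grammar for syntactic analysis of the preliminary recognition results; ⑤ starting from several commonly used artificial neural network (ANN) models, it discusses the information processing principles of ANNs and their connections to and differences from traditional methods; ⑥ it gives a representation of Chinese words in an ANN and, based on this representation, builds NETpocer, a post-processing system that combines supervised and unsupervised learning.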
English abstract	Chinese character recognition (CCR) is an important part of Chinese information processing and plays a significant role in the application and popularization of computers in China. As research has progressed, it has become increasingly clear that the recognition rate cannot improve much if only the information in the characters themselves is used; high-level information about Chinese, such as morphological, syntactic, and semantic information, must be used as well. As a result, CCR post-processing, which makes use of this high-level information, has become more and more important. By degree of user participation, CCR post-processing methods can be divided into three classes: manual correction, interactive correction, and automatic correction by computer. In manual correction, the user processes the recognized text file in a standard text editor, such as WordStar or PE, correcting the misrecognized characters and supplying the unrecognized ones. In interactive correction, the recognized text file is passed to a program that offers candidates for each misrecognized or unrecognized character. In automatic correction, a program corrects the mistakes in the recognized text file automatically. By the methods used, post-processing can also be divided into three classes: methods based on word information, methods based on syntactic and semantic analysis, and methods based on artificial neural networks (ANNs). Against the background of knowledge-based pattern recognition and natural language processing, this thesis makes a systematic study of CCR post-processing methods in both theory and practice. The main contents are: (1) We built a CCR post-processing system based on a synthetic word matching method. (2) We applied the relaxation method to CCR post-processing for the first time and put forward a post-processing method based on a nonlinear probabilistic relaxation process. (3) We proposed a method for representing lexical and semantic information, the fuzzy lexical connection table and the fuzzy semantic connection table, and describe a post-processing method based on this representation. (4) We put forward the basic idea of using lexical functional grammar for syntactic analysis of the initial recognition results. (5) Starting from some concrete ANN models, we discuss the connections and differences between ANN information processing and traditional methods. (6) We proposed a representation of Chinese words in an ANN and, based on this representation, built NETpocer, a CCR post-processing system that uses both supervised and unsupervised learning.
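Language	Chinese
Other identifier	208
Content type	Thesis
Source URL	[http://ir.ia.ac.cn/handle/173211/6996]
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	Liu Duanzheng. Study on the Post-processing Methods of Chinese Character Recognition [D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences. 1991.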
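
The abstract describes post-processing that uses word information to choose among the recognizer's candidate characters. The record does not give the thesis's actual matching algorithm, so the following is only a minimal Python sketch of the general idea, not the author's synthetic matching method. The function name `correct_with_lexicon`, the `word_bonus` parameter, the confidence values, and the tiny lexicon are all hypothetical. It assumes every character position has at least one candidate.

```python
from typing import Dict, List, Set, Tuple

# One dict per character position: candidate character -> recognizer confidence.
Candidates = List[Dict[str, float]]


def correct_with_lexicon(candidates: Candidates, lexicon: Set[str],
                         word_bonus: float = 0.5) -> str:
    """Pick one character per position, preferring readings that form lexicon words.

    Hypothetical scoring: sum of candidate confidences, plus `word_bonus`
    per character for every multi-character lexicon word that is matched.
    """
    n = len(candidates)
    # best[i] = (score, text) of the best reading of positions [0, i).
    best: List[Tuple[float, str]] = [(0.0, "")] + [(float("-inf"), "")] * n
    for i in range(n):
        base_score, base_text = best[i]
        # Fallback: accept the top-ranked candidate as a single character.
        top = max(candidates[i], key=candidates[i].get)
        single = (base_score + candidates[i][top], base_text + top)
        if single[0] > best[i + 1][0]:
            best[i + 1] = single
        # Try every lexicon word whose characters all appear among the
        # candidates at the positions it would occupy, starting at i.
        for word in sorted(lexicon):
            end = i + len(word)
            if len(word) < 2 or end > n:
                continue
            if all(ch in candidates[i + k] for k, ch in enumerate(word)):
                score = base_score + word_bonus * len(word)
                score += sum(candidates[i + k][ch] for k, ch in enumerate(word))
                if score > best[end][0]:
                    best[end] = (score, base_text + word)
    return best[n][1]


if __name__ == "__main__":
    # Made-up recognizer output: the top candidate at position 1 is wrong.
    cands = [
        {"中": 0.9, "申": 0.1},
        {"困": 0.5, "国": 0.4},
        {"科": 0.8, "料": 0.2},
        {"学": 0.6, "字": 0.4},
        {"院": 0.7, "完": 0.3},
    ]
    lexicon = {"中国", "科学", "科学院", "学院"}
    print(correct_with_lexicon(cands, lexicon))  # prints 中国科学院
```

Taking only the top candidate at each position would give 中困科学院; the lexicon match on 中国 overrides the recognizer's slightly higher confidence in 困. A real system would need a much larger lexicon and a principled way to weigh recognizer confidence against word evidence.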
