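Title: 矢量笔迹混排文本的分割与识别方法研究
Author: Zhang Kun (张堃)
Degree type: Master's
Defense date: 2008-06-02
Degree-granting institution: Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Place conferred: Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所)
Advisor: Zhang Xiwen (张习文)
Keywords: digital ink text; ink segmentation; isolated character recognition; continuous character recognition; visualization; human-computer interaction
Alternative title: Research on Methods of Segmentation and Recognition toward Ink Mixed Text Document
Degree discipline: Computer Software and Theory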
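Chinese abstract (translated): Digital ink is captured with pen-based computer input devices such as digital pens, and consists of strokes. Each stroke contains time-ordered sampling points, and each sampling point carries coordinates, a timestamp, pressure, and similar attributes. The characters that make up Chinese digital ink text are complex: for example, they are of many types and are closely spaced. Structuring and symbolization are the foundation for intelligent processing of Chinese digital ink text, so this thesis studies segmentation and recognition techniques in depth. The specific contributions are: (1) to handle the complexity of characters in mixed-script Chinese digital ink text, an iterative extraction method is proposed; (2) to handle overlapping elements in segmentation results and to reduce the user's burden of finding errors, an adaptive visualization and a corresponding interactive correction method are proposed; (3) for recognition of mixed-script text as a whole, multiple features are combined for classification, several classifiers are compared, and a classification method based on support vector machines is adopted, which automatically determines the detailed language category of each element, including Chinese characters, English words, English letters, digits, and punctuation marks; (4) for isolated character recognition, a database of the radical composition of Chinese characters is built, and a recognition post-processing method based on the principle of consistency between components and the whole is proposed; (5) based on continuous word recognition results, a continuous-recognition post-processing method that uses lexicon information is built with a "mechanical" (dictionary-matching) dictionary, and on this basis a visual presentation and a context-based interactive correction method are implemented; (6) a prototype system is designed and developed, and it is tested and evaluated in depth on several data sets.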
English abstract: Digital ink can be captured by computer input devices such as Anoto pen and paper and the Tablet PC. It consists of strokes. The sampling points in each stroke are ordered by sampling time, and each sampling point contains coordinates, a sampling time, and pressure. Digital ink text in Chinese contains characters with complex structures, multiple languages, and small gaps between characters. It needs structurization and symbolization before it can be put to advanced use. The thesis therefore focuses on segmentation and recognition of digital ink text in Chinese. The details are as follows: 1. Because ink characters are complex, they are extracted from digital ink text in Chinese in multiple steps. The text can contain both Chinese and English. 2. Components in segmented digital ink text in Chinese are adaptively visualized because some of them overlap; this also reduces the user's correction burden. Wrongly extracted components are corrected interactively, based on the visualized results. 3. Ink characters are classified into detailed recognition types using a support vector machine with many features. 4. Isolated ink characters are recognized based on both their components and the whole character. 5. Ink characters are recognized continuously based on words and word pairs, and the recognized results are then visualized. Wrongly recognized sentences, words, and characters are corrected. 6. A software prototype is developed, and many digital ink texts in Chinese are segmented and recognized with it. The processed results are evaluated in detail.
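The ink representation described in the abstracts (strokes made of time-ordered sampling points, each with coordinates, a sampling time, and pressure) could be modeled roughly as below. This is only an illustrative sketch: the class names `SamplePoint`, `Stroke`, and `InkDocument`, the field names, and the `bounding_box` helper are assumptions made here, not taken from the thesis or its prototype system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SamplePoint:
    """One pen sample (hypothetical field names)."""
    x: float
    y: float
    t: float         # sampling time
    pressure: float  # pen pressure


@dataclass
class Stroke:
    """A stroke is a sequence of sampling points ordered by sampling time."""
    points: List[SamplePoint]

    def __post_init__(self) -> None:
        # Enforce the time ordering that the abstract describes.
        self.points = sorted(self.points, key=lambda p: p.t)

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Return (min_x, min_y, max_x, max_y) over the stroke's points."""
        xs = [p.x for p in self.points]
        ys = [p.y for p in self.points]
        return min(xs), min(ys), max(xs), max(ys)


@dataclass
class InkDocument:
    """Digital ink text: a collection of strokes (assumed container)."""
    strokes: List[Stroke]
```

A structure like this is the kind of input that segmentation and classification steps would consume; for instance, bounding boxes are a common basis for measuring gaps between strokes, although the abstracts do not say which features the thesis actually uses.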
Date made public: 2011-03-17
Content type: Thesis
Source URL: http://124.16.136.157/handle/311060/6844
Collection: Institute of Software / Institute of Software Library / Early period
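Recommended citation
GB/T 7714
Zhang Kun. 矢量笔迹混排文本的分割与识别方法研究 [Research on Methods of Segmentation and Recognition toward Ink Mixed Text Document][D]. Institute of Software, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences. 2008.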