Research on Techniques for Extracting Text Information from Video (视频文字信息抽取技术研究)

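Title	视频文字信息抽取技术研究 (Research on Techniques for Extracting Text Information from Video)
Author	杨武夷 (Yang Wuyi)
Degree	Doctor of Engineering
Defense date	2009-05-31
Degree-granting institution	Graduate University of Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Supervisor	张树武 (Zhang Shuwu)
Keywords	Text localization; text image binarization; character segmentation; character recognition; integrated segmentation and recognition
Other title	Text Extraction in Video
Major	Pattern Recognition and Intelligent Systems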
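Chinese abstract	Text in video directly carries high-level semantic information. Extracting it effectively would therefore greatly help the efficient retrieval, understanding, and reuse of rapidly growing video content. Traditional OCR technology cannot fully solve the problem of extracting text from video, especially text on complex backgrounds, so effective solutions are needed in both theory and technique. The technical difficulties of video text extraction come mainly from five sources: (1) localizing text on complex backgrounds and localizing degraded text; (2) binarizing the many different kinds of character images; (3) segmenting characters on complex backgrounds; (4) segmenting touching characters; (5) recognizing degraded characters. This dissertation studies several of these problems. Its main contributions are:
1. To binarize character images of all kinds, a binarization algorithm based on fusing multiple binary images is proposed. The algorithm first extracts image information from several different perspectives to obtain different binary images, then fuses these binary images into the final binary image. Compared with other character image binarization algorithms, this multi-binary-image fusion algorithm greatly improves the performance of the character recognition system.
2. For character segmentation, the characteristics of character images and the difficulties of segmenting them are analyzed, and a character segmentation algorithm based on heuristics and recognition is proposed. It segments both touching characters and characters on complex backgrounds fairly accurately, and it also removes "noise" components from the segmentation units, overcoming some shortcomings of heuristic character segmentation algorithms.
3. To recognize degraded characters, a character recognition algorithm based on a fused image and a post-processing algorithm based on a language model are proposed. Compared with the character's binary image or gray-scale image, the fused image retains the useful gray-level information of the character strokes while removing useless background information, which improves the performance of the character recognition system. The recognition algorithm also gives fairly accurate confidence values for its results. Combined with word-based bigram and trigram statistical language models that exploit the context of several consecutive character segmentation units, it further improves the character recognition rate.
4. An integrated character segmentation and recognition algorithm is proposed. Sequential (segment-then-recognize) methods lack effective feedback: segmentation cannot use recognition information, so some complex character images cannot be segmented and recognized accurately. To overcome this, the integrated algorithm splits wide connected components in the binary character image based on image analysis or character recognition, combines connected components based on character recognition to obtain candidate recognition results, and selects the final recognition result with a language model. Compared with the sequential method, it recognizes touching characters and characters on complex backgrounds more accurately.
5. An algorithm for text localization in images and text extraction in video is proposed. The image text localization algorithm first obtains candidate text regions from a double-edge model of character strokes, then decomposes the candidate regions into precisely localized text blocks, and finally verifies the text blocks using heuristics and character recognition. The video text extraction algorithm applies image-based text localization to one frame out of every several frames to obtain text objects, tracks the text objects forward and backward through the frame sequence, and finally recognizes the text objects to produce the extraction result.
Addressing the technical difficulties of video text extraction, especially the segmentation and recognition of degraded characters and of characters on complex backgrounds, this dissertation proposes several solutions and reports research progress on them.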
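The multi-binary-image fusion in contribution 1 can be illustrated with a minimal sketch. It assumes a simple per-pixel majority vote as the fusion rule. The abstract does not say how the dissertation actually fuses the images, so this rule is a stand-in. Only one input binarizer is shown: a plain mean-based locally adaptive threshold that assumes dark text on a light background. The function names `local_mean_threshold` and `fuse_majority` are hypothetical.

```python
from typing import List

# A binary image is a list of rows of 0/1 values (1 = text pixel).
Image = List[List[int]]


def local_mean_threshold(gray: Image, radius: int = 1, offset: float = 0.0) -> Image:
    """Locally adaptive thresholding (simple mean variant, an assumption).

    Assumes dark text on a light background: a pixel becomes text (1) when it
    is darker than the mean of its (2*radius+1) x (2*radius+1) window minus
    `offset`. Windows are clipped at the image border.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] < mean - offset else 0
    return out


def fuse_majority(binaries: List[Image]) -> Image:
    """Fuse an odd number of same-sized binary images by per-pixel majority vote."""
    if not binaries or len(binaries) % 2 == 0:
        raise ValueError("need an odd, non-zero number of binary images")
    h, w = len(binaries[0]), len(binaries[0][0])
    votes_needed = len(binaries) // 2 + 1
    return [[1 if sum(b[y][x] for b in binaries) >= votes_needed else 0
             for x in range(w)]
            for y in range(h)]
```

In use, the three inputs would be the outputs of the three binarizers. For example, `fuse_majority([b_seed, local_mean_threshold(gray), b_stroke])`, where `b_seed` and `b_stroke` are hypothetical results of binarizers not shown here. A vote needs an odd number of inputs to avoid ties. A weighted or region-level rule would be a natural refinement.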
English abstract	Text in videos is a powerful source of high-level semantics. If it could be detected, segmented, and recognized automatically, it would be a valuable source of high-level semantics for indexing and retrieving the explosively growing volume of digital video. Traditional character extraction methods were developed specifically for scanned images and cannot effectively extract text from videos, especially text on complex backgrounds. New methods are therefore needed. Extracting text from videos poses five challenges: (1) how to localize text that may lie on complex backgrounds; (2) how to binarize different kinds of text images; (3) how to segment characters on complex backgrounds; (4) how to segment merged characters; (5) how to recognize degraded characters. To address these problems, this dissertation covers the following aspects:
1. A novel method for binarizing different kinds of text images is proposed, based on fusing several binary images. First, a locally adaptive seed-fill method, a locally adaptive thresholding method, and a stroke-model-based method are each applied to obtain three binary images. The final binary image is then obtained by fusing these three binary images. Compared with other methods, the proposed method greatly improves character recognition accuracy.
2. For character segmentation, the characteristics of text images are analyzed and a novel heuristic method based on character recognition is proposed. The method can segment merged characters and characters on complex backgrounds. It also removes "noise" components from the segments, which overcomes drawbacks of heuristic segmentation methods.
3. To recognize degraded characters precisely, a novel character recognition method based on a fused image is proposed. Compared with the binary or gray-scale image, the fused image preserves the useful information of the character strokes while removing the noise of complex backgrounds. The method first fuses the binary image and the gray-scale image of a character. The character recognition engine then produces several candidates from the fused image. A post-processing approach based on a statistical language model selects the best character sequence for the text image. The proposed method greatly improves character recognition accuracy.
4. To overcome the drawbacks of se...
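The language-model post-processing described in item 3 can be sketched as a Viterbi-style search over the recognizer's candidate lists. This is a simplified illustration under stated assumptions. First, it uses a character bigram table, whereas the dissertation's Chinese abstract describes word-based bigram and trigram models. Second, it treats each candidate's recognition confidence as a probability. Third, it backs off to a fixed `floor` probability for unseen bigrams. The function `best_sequence` and its parameters are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One segmentation unit: the recognizer's candidates as (character, confidence),
# with each confidence assumed to lie in (0, 1].
Candidates = List[Tuple[str, float]]


def best_sequence(segments: List[Candidates],
                  bigram: Dict[Tuple[str, str], float],
                  lm_weight: float = 1.0,
                  floor: float = 1e-6) -> str:
    """Pick one candidate per segment, maximising
    sum(log confidence) + lm_weight * sum(log P(c_i | c_{i-1})).

    Unseen bigrams get probability `floor`. Every segment must have at
    least one candidate.
    """
    if not segments:
        return ""
    # Each entry: (best log-score of a path ending in this candidate, that path).
    paths = [(math.log(conf), ch) for ch, conf in segments[0]]
    for cands in segments[1:]:
        extended = []
        for ch, conf in cands:
            # Best predecessor path for this candidate under the bigram model.
            score, path = max(
                (s + lm_weight * math.log(bigram.get((p[-1], ch), floor)), p)
                for s, p in paths)
            extended.append((score + math.log(conf), path + ch))
        paths = extended
    return max(paths)[1]
```

Suppose the bigram table favours 中国人. The language model can then overturn a recognizer that slightly prefers the look-alike 入 for the last segment. Setting `lm_weight=0.0` recovers the recognizer-only choice. Keeping the whole path string per candidate is simple and adequate for short text lines. A trigram version would instead condition on the last two characters of each path.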
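Language	Chinese
Other identifier	200618014628045
Content type	Dissertation
Source URL	[http://ir.ia.ac.cn/handle/173211/6203]
Collection	Graduates / Doctoral Dissertations
Recommended citation (GB/T 7714)	Yang Wuyi. Research on Techniques for Extracting Text Information from Video[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences. 2009.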
