Analysis of book documents' table of content based on clustering
Gao, Liangcai ; Tang, Zhi ; Lin, Xiaofan ; Tao, Xin ; Chu, Yimin
2009
Abstract (English): Table of contents (TOC) recognition has attracted a great deal of attention in recent years. After reviewing the merits and drawbacks of the existing TOC recognition methods, we have observed that book documents are multi-page documents with intrinsic local format consistency. Based on this finding we introduce an automatic TOC analysis method through clustering. This method first detects the decorative elements in TOC pages. Then it learns a layout model used in the TOC pages through clustering. Finally, it generates TOC entries and extracts their hierarchical structure under the guidance of the model. More specifically, broken lines are taken into account in the method. Experimental results show that this method achieves high accuracy and efficiency. In addition, this method has been successfully applied in a commercial E-book production software package. © 2009 IEEE.
Indexed by: EI
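The three stages named in the abstract (detect decorative elements, learn a layout model by clustering, generate entries with their hierarchy) can be illustrated with a small sketch. This is not the authors' algorithm, which the record does not describe in detail. It assumes each TOC line arrives with its text and a left indent, treats runs of dots as the only decoration, and uses a simple gap-based grouping of indents in place of the paper's clustering step. The names `TocLine`, `strip_decoration`, `cluster_indents`, `analyze_toc`, and the tolerance `tol` are all hypothetical, and broken (wrapped) lines are not handled.

```python
import re
from dataclasses import dataclass


@dataclass
class TocLine:
    text: str      # raw text of one line on a TOC page
    indent: float  # left x-coordinate of the line (assumed layout feature)


# Assumption: dot leaders such as " ..... " are the only decoration.
LEADER = re.compile(r"[. ]{3,}")


def strip_decoration(text):
    """Return (title, page); page is None when no trailing number exists."""
    cleaned = LEADER.sub(" ", text).strip()
    match = re.search(r"\s(\d+)$", cleaned)
    if match is None:
        return cleaned, None
    return cleaned[:match.start()].strip(), int(match.group(1))


def cluster_indents(indents, tol=5.0):
    """Group indent values; a gap larger than tol starts a new level."""
    levels = {}
    level = -1
    prev = None
    for x in sorted(set(indents)):
        if prev is None or x - prev > tol:
            level += 1
        levels[x] = level
        prev = x
    return levels


def analyze_toc(lines, tol=5.0):
    """Return (level, title, page) tuples in reading order."""
    levels = cluster_indents([line.indent for line in lines], tol)
    entries = []
    for line in lines:
        title, page = strip_decoration(line.text)
        entries.append((levels[line.indent], title, page))
    return entries


sample = [
    TocLine("1 Introduction ..... 1", 72.0),
    TocLine("1.1 Background ..... 3", 90.0),
    TocLine("1.2 Related work .... 7", 91.0),
    TocLine("2 Method ........... 12", 72.5),
]
entries = analyze_toc(sample)
for level, title, page in entries:
    print("  " * level + title, page)
```

Grouping by indent alone relies on the local format consistency the abstract mentions: lines at the same hierarchy level are assumed to share nearly the same left margin. A fuller layout model would likely also use features such as font size or numbering style.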
Language: English
DOI: 10.1109/ICDAR.2009.143
Content type: Other
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/162034
Collection: Institute of Computer Science and Technology
Recommended citation
GB/T 7714
Gao, Liangcai, Tang, Zhi, Lin, Xiaofan, et al. Analysis of book documents' table of content based on clustering. 2009-01-01.