Mathematical formula identification and performance evaluation in PDF documents
Lin, Xiaoyan ; Gao, Liangcai ; Tang, Zhi ; Baker, Josef ; Sorge, Volker
Journal: International Journal on Document Analysis and Recognition
Year: 2014
Keywords: Mathematical formula identification; Machine learning; PDF documents; Performance evaluation; EXPRESSIONS RECOGNITION
DOI: 10.1007/s10032-013-0216-1
Abstract: An important initial step of mathematical formula recognition is to correctly identify the location of formulae within documents. Previous work in this area has traditionally focused on image-based documents; however, given the prevalence and popularity of the PDF format for dissemination, alternatives to image-based approaches are increasingly being explored. In this paper, we investigate the use of both machine learning techniques and heuristic rules to locate the boundaries of both isolated and embedded formulae within documents, based upon data extracted directly from PDF files. We propose four new features along with preprocessing and post-processing techniques for isolated formula identification. Furthermore, we compare, analyse and extensively tune nine state-of-the-art learning algorithms for a comprehensive evaluation of our proposed methods. The evaluation is carried out over a ground-truth dataset, which we have made publicly available, together with an application-adaptable fine-grained evaluation metric. Our experimental results demonstrate that the overall accuracies of isolated and embedded formula identification are increased by 11.52 % and 10.65 %, respectively, compared with our previously proposed formula identification approach.
Web of Science record: http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000340610000003&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=8e1609b174ce4e31116a60747a720701
Other record fields: Computer Science, Artificial Intelligence; SCI(E); EI; 2; ARTICLE; linxiaoyan@pku.edu.cn; glc@pku.edu.cn; tangzhi@pku.edu.cn; J.Baker@cs.bham.ac.uk; V.Sorge@cs.bham.ac.uk; 3; 239-255; 17
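Language: English
Content type: Journal article
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/161737
Collection: School of Electronics Engineering and Computer Science (信息科学技术学院), Peking University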
Recommended citation
GB/T 7714: Lin, Xiaoyan, Gao, Liangcai, Tang, Zhi, et al. Mathematical formula identification and performance evaluation in PDF documents[J]. International Journal on Document Analysis and Recognition, 2014.
APA: Lin, Xiaoyan, Gao, Liangcai, Tang, Zhi, Baker, Josef, & Sorge, Volker. (2014). Mathematical formula identification and performance evaluation in PDF documents. International Journal on Document Analysis and Recognition.
MLA: Lin, Xiaoyan, et al. "Mathematical formula identification and performance evaluation in PDF documents". International Journal on Document Analysis and Recognition (2014).