Untagged Table Extraction in Semi-structured Documents (半结构化文档中非标记化表格的抽取)


	Untagged Table Extraction in Semi-structured Documents
	SONG Qiang (宋强) ; XU Peng (徐鹏) ; LI Juanzi (李涓子)
	2010-06-09
Keywords	Untagged table ; Information extraction ; Hierarchical clustering ; TP311.11
Alternative title	Untagged Table Extraction in Semi-structured Documents
Abstract	This paper builds a data model of the untagged table and, using the structural distribution features of untagged tables in documents, proposes an algorithm for extracting them. The algorithm first splits the untagged table into rows and columns, then infers headers and merges cells. Experimental results indicate that the accuracy of the algorithm is satisfactory.
Language	Chinese
Content type	Journal article
Source URL	[http://hdl.handle.net/123456789/55086]
Collection	Tsinghua University
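Recommended citation GB/T 7714	SONG Qiang, XU Peng, LI Juanzi, et al. Untagged Table Extraction in Semi-structured Documents[J], 2010, 2010.
APA	SONG Qiang, XU Peng, & LI Juanzi. (2010). Untagged Table Extraction in Semi-structured Documents.
MLA	SONG Qiang, et al. "Untagged Table Extraction in Semi-structured Documents." (2010).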
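
The abstract describes a pipeline of row and column splitting followed by header inference and cell merging. The sketch below illustrates that general shape on a whitespace-aligned plain-text table. It is not the authors' algorithm: the record gives no details of their method, so a simple blank-column heuristic stands in for it (no hierarchical clustering is used), and the function names `split_columns`, `merge_continuations`, and `extract_table` are hypothetical. It also assumes that the first row is the header and that a row with an empty first cell continues the row above it.

```python
from typing import List, Tuple


def split_columns(lines: List[str]) -> List[List[str]]:
    """Split aligned plain-text rows into cells at all-blank character columns."""
    # Row splitting (assumption): every non-blank line is one table row.
    rows = [ln.rstrip() for ln in lines if ln.strip()]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    # A character position is a gutter if it holds a space in every row.
    gutter = [all(r[i] == " " for r in padded) for i in range(width)]
    # Collect maximal runs of non-gutter positions as column spans.
    spans: List[Tuple[int, int]] = []
    start = None
    for i, is_gutter in enumerate(gutter + [True]):  # sentinel closes the last span
        if not is_gutter and start is None:
            start = i
        elif is_gutter and start is not None:
            spans.append((start, i))
            start = None
    return [[r[a:b].strip() for a, b in spans] for r in padded]


def merge_continuations(table: List[List[str]]) -> List[List[str]]:
    """Cell merging (assumption): fold a row with an empty first cell into the row above."""
    merged: List[List[str]] = []
    for row in table:
        if merged and row[0] == "":
            prev = merged[-1]
            merged[-1] = [(p + " " + c).strip() for p, c in zip(prev, row)]
        else:
            merged.append(row)
    return merged


def extract_table(lines: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Header inference (assumption): the first merged row is the header."""
    cells = merge_continuations(split_columns(lines))
    return cells[0], cells[1:]


sample = [
    "Name    City       Score",
    "Alice   Beijing    90",
    "Bob     New York   85",
    "        (USA)",
]
header, body = extract_table(sample)
print(header)
print(body)
```

The gutter test treats any character position that is blank in every row as a column boundary, so it only works on monospaced, consistently aligned text. A clustering of cell positions, as the keywords suggest the paper uses, would presumably tolerate looser alignment.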
