A Method for Automatically Extracting the Semantic Hierarchical Structure of HTML Tables (自动获取HTML表格语义层次结构方法)

Authors	范莉娅 (FAN Liya); 肖田元 (XIAO Tianyuan)
Date	2010-06-09
Keywords	row-wise table; column-wise table; row-column-wise table; content tree
CLC number	TP312.2
Alternative title	Automatically extraction of semantic hierarchical structures from HTML tables
Abstract	Existing approaches for extracting information from hypertext markup language (HTML) tables cannot process complex or nested tables. This paper presents a method for automatically extracting the semantic hierarchical structure of HTML tables. The method builds on the four basic types of tables and uses a content tree to represent a table's semantic hierarchical structure. It consists of three steps: (1) identify the attribute cells and the value cells of the HTML table; (2) split the table into basic tables; (3) construct a content tree for each resulting basic table, which yields the semantic hierarchical structure of the table. Experiments show that the method automatically processes nested and complex tables with low complexity and good accuracy.
Funding	National High-Tech R&D Program of China ("863" Program), grant 2004AA414020
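The three steps above can be illustrated with a small Python sketch. It is not the authors' algorithm: this record does not describe their cell-classification rules or their four basic table types in detail. The sketch assumes that the HTML has already been parsed into a rectangular grid of strings, with every rowspan/colspan cell copied into each position it covers, and that the table-splitting step has already produced one basic table whose header rows are on top and header columns are on the left. Attribute cells are found by a hypothetical positional heuristic, and all names (`Node`, `classify_cells`, `content_tree`, `paths`) and the toy table are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a content tree: an attribute label or a cell value."""
    label: str
    children: list[Node] = field(default_factory=list)

    def child(self, label: str) -> Node:
        """Return the child with this label, creating it if absent (keeps insertion order)."""
        for c in self.children:
            if c.label == label:
                return c
        node = Node(label)
        self.children.append(node)
        return node


def classify_cells(grid: list[list[str]], header_rows: int = 1,
                   header_cols: int = 1) -> list[list[str]]:
    """Hypothetical positional heuristic: cells in the first `header_rows` rows or the
    first `header_cols` columns are attribute cells; every other cell is a value cell."""
    return [
        ["attr" if r < header_rows or c < header_cols else "value" for c in range(len(row))]
        for r, row in enumerate(grid)
    ]


def content_tree(grid: list[list[str]], header_rows: int = 1,
                 header_cols: int = 1, title: str = "table") -> Node:
    """Build a content tree for one basic table. Each value cell is attached under the
    path: row attribute labels (left to right), then column attribute labels (top to bottom)."""
    kinds = classify_cells(grid, header_rows, header_cols)
    root = Node(title)
    for r, row in enumerate(grid):
        for c, text in enumerate(row):
            if kinds[r][c] != "value":
                continue  # attribute cells only label paths, they are not leaves
            row_path = [grid[r][k] for k in range(header_cols)]
            col_path = [grid[k][c] for k in range(header_rows)]
            node = root
            for label in row_path + col_path:
                node = node.child(label)
            node.child(text)
    return root


def paths(node: Node, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """List every root-to-leaf path (excluding the root label) in insertion order."""
    if not node.children:
        return [prefix]
    result: list[tuple[str, ...]] = []
    for c in node.children:
        result.extend(paths(c, prefix + (c.label,)))
    return result


# Toy table with a two-level column header; "2009" was a colspan=2 cell, already expanded.
toy = [
    ["", "2009", "2009", "2010"],
    ["", "Q1", "Q2", "Q1"],
    ["Beijing", "10", "11", "12"],
]
for p in paths(content_tree(toy, header_rows=2)):
    print(" > ".join(p))
# Beijing > 2009 > Q1 > 10
# Beijing > 2009 > Q2 > 11
# Beijing > 2010 > Q1 > 12
```

Attaching each value under its row labels and then its column labels is one simple way to encode a row-column-wise table; stacked headers, such as a year spanning several quarters, become nested levels of the tree rather than flat attribute names.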
Language	Chinese
Content type	Journal article
Source URL	[http://hdl.handle.net/123456789/57362]
Collection	Tsinghua University
Recommended citation (GB/T 7714)	范莉娅, 肖田元. 自动获取HTML表格语义层次结构方法[J]. 2010.
APA	Fan, L., & Xiao, T. (2010). 自动获取HTML表格语义层次结构方法.
MLA	Fan, Liya, and Tianyuan Xiao. "自动获取HTML表格语义层次结构方法." 2010.
