A WebPage Content Block Detection Method Based on Layout Features and Languages Features
Han Xianpei; Liu Kang; Zhao Jun
Journal: Chinese Journal of Computers
Year: 2008
Issue: 22; Pages: 15-21
Keywords: Web-page Cleaning
Abstract: This paper analyzes the different feature types of web-page blocks and presents a web-page content block detection method based on layout features and language features, which effectively resolves the trade-off (the "seesaw problem") between detection accuracy and model generality across different types of web pages. The method represents a web page as a vision-block tree, builds two separate classifiers, one for the page's layout features and one for its language features, and combines the two classifiers using different strategies. The experimental results show that, with content block detection recall held above 90%, the combined classifiers' accuracy reaches 85 percent, 5 percent higher than the classifier using only the layout features and 15 percent higher than the classifier using only the language features. The results also show that the combined classifiers achieve good detection performance across five selected websites, which indicates that the method generalizes well.
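The two-classifier idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which features, classifiers, or combination strategies the authors used, so everything below is an assumption made for illustration: the `Block` fields, the two toy scoring functions, and the weighted-average combination are hypothetical stand-ins, not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical page block (one node of a vision-block tree)."""
    text: str        # visible text of the block
    link_chars: int  # number of characters that sit inside hyperlinks
    width: int       # rendered width in pixels
    height: int      # rendered height in pixels


def layout_score(block: Block) -> float:
    """Toy layout classifier (assumption): larger rendered area scores higher."""
    area = block.width * block.height
    return min(1.0, area / 200_000)


def language_score(block: Block) -> float:
    """Toy language classifier (assumption): low link density scores higher."""
    n = len(block.text)
    if n == 0:
        return 0.0
    return 1.0 - min(1.0, block.link_chars / n)


def combine_weighted(p_layout: float, p_language: float, w: float = 0.5) -> float:
    """One possible combination strategy (assumption): a weighted average."""
    return w * p_layout + (1.0 - w) * p_language


def is_content(block: Block, threshold: float = 0.5, w: float = 0.5) -> bool:
    """Label a block as content when the combined score reaches the threshold."""
    score = combine_weighted(layout_score(block), language_score(block), w)
    return score >= threshold


if __name__ == "__main__":
    article = Block(text="x" * 400, link_chars=0, width=600, height=400)
    navbar = Block(text="Home About Contact", link_chars=18, width=600, height=30)
    print(is_content(article), is_content(navbar))
```

Lowering `threshold` trades accuracy for recall, which mirrors the way the abstract reports accuracy at a fixed recall level; the weight `w` controls how much each classifier contributes.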
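Content type: Journal article
Source URL: [http://ir.ia.ac.cn/handle/173211/40979]
Collection: National Laboratory of Pattern Recognition_Natural Language Processing
Recommended citation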
GB/T 7714
Han Xianpei,Liu Kang,Zhao Jun. A WebPage Content Block Detection Method Based on Layout Features and Languages Features[J]. Chinese Journal of Computers,2008(22):15-21.
APA Han Xianpei,Liu Kang,&Zhao Jun.(2008).A WebPage Content Block Detection Method Based on Layout Features and Languages Features.Chinese Journal of Computers(22),15-21.
MLA Han Xianpei,et al."A WebPage Content Block Detection Method Based on Layout Features and Languages Features".Chinese Journal of Computers .22(2008):15-21.