Learning element similarity matrix for semi-structured document analysis
Yang, Jianwu ; Cheung, William K. ; Chen, Xiaoou
Journal: Knowledge and Information Systems
2009
Keywords: Semi-structured document analysis; Learning similarity matrix; Similarity-based clustering; Extended Vector Space Model; INFORMATION; XML; MODEL
DOI: 10.1007/s10115-008-0138-2
Abstract: Capturing latent structural and semantic properties in semi-structured documents (e.g., XML documents) is crucial for improving the performance of related document analysis tasks. The Structured Link Vector Model (SLVM) is a representation recently proposed for modeling semi-structured documents. It uses an element similarity matrix to capture the latent relationships between XML elements, the constituent components of an XML document. In this paper, instead of applying heuristics to define the element similarity matrix, we propose to compute the matrix using a machine learning approach. In addition, we incorporate term semantics into SLVM using latent semantic indexing to enhance the model accuracy, while preserving the learnability of the element similarity matrix. For performance evaluation, we applied the similarity learning to k-nearest neighbors search and similarity-based clustering, and tested the performance on two different XML document collections. The SLVM obtained via learning was found to significantly outperform the conventional Vector Space Model and the edit-distance-based methods. Also, the similarity matrix, obtained as a by-product, can provide higher-level knowledge on the semantic relationships between the XML elements.
Web of Science record: http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000264610200003&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=8e1609b174ce4e31116a60747a720701
Computer Science, Artificial Intelligence; Computer Science, Information Systems; SCI(E); 9; ARTICLE; 1; 53-78; 19
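The role of the element similarity matrix described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each document is stored as one term-weight vector per XML element and that document similarity is the sum of element-pair dot products weighted by the matrix entries. The element names, term weights, and matrix values are hypothetical, and normalization, latent semantic indexing, and the learning of the matrix itself are omitted.

```python
from typing import Dict, List

# Hypothetical toy representation: a document maps an XML element name
# to a term-weight vector over a shared vocabulary.
Doc = Dict[str, List[float]]


def dot(u: List[float], v: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slvm_similarity(x: Doc, y: Doc, elements: List[str],
                    m_e: List[List[float]]) -> float:
    """Sum m_e[i][j] * <x[e_i], y[e_j]> over all element pairs (assumed form)."""
    total = 0.0
    for i, ei in enumerate(elements):
        for j, ej in enumerate(elements):
            if ei in x and ej in y:
                total += m_e[i][j] * dot(x[ei], y[ej])
    return total


# Made-up data: two elements, three vocabulary terms.
elements = ["title", "abstract"]
doc_a = {"title": [1.0, 0.0, 0.0], "abstract": [0.0, 1.0, 1.0]}
doc_b = {"title": [0.0, 1.0, 0.0], "abstract": [1.0, 0.0, 1.0]}

# Identity matrix: only the same element is compared across documents.
identity = [[1.0, 0.0], [0.0, 1.0]]
# Off-diagonal weights let "title" terms partially match "abstract" terms.
cross = [[1.0, 0.5], [0.5, 1.0]]

sim_identity = slvm_similarity(doc_a, doc_b, elements, identity)
sim_cross = slvm_similarity(doc_a, doc_b, elements, cross)
print(sim_identity, sim_cross)
```

With the identity matrix only terms under the same element contribute, whereas non-zero off-diagonal entries let terms under different elements contribute as well, which is the kind of cross-element relationship the abstract proposes to learn rather than set by heuristics.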
Language: English
Content type: Journal article
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/161883
Collection: School of Information Science and Technology (信息科学技术学院)
Recommended citation
GB/T 7714
Yang, Jianwu, Cheung, William K., Chen, Xiaoou. Learning element similarity matrix for semi-structured document analysis[J]. Knowledge and Information Systems, 2009.
APA: Yang, Jianwu, Cheung, William K., & Chen, Xiaoou. (2009). Learning element similarity matrix for semi-structured document analysis. Knowledge and Information Systems.
MLA: Yang, Jianwu, et al. "Learning element similarity matrix for semi-structured document analysis." Knowledge and Information Systems (2009).