Heuristic Chinese sentence compression algorithm based on hot word (original title: 基于词语热度的启发式中文句子压缩算法)
韩静 (Han Jing); 张东站 (Zhang Dongzhan)
2014-02-15
Keywords: Chinese sentence compression; hot word; linguistics; parse tree
Abstract: Traditional sentence compression methods mostly rely on parallel corpora of aligned original and compressed sentences, which are hard to obtain. This paper therefore proposes a Chinese sentence compression algorithm that does not depend on such a corpus. By studying human-produced compressions and drawing on linguistic knowledge, two sets of compression rules are proposed, one at the word level and one at the clause level. The algorithm applies both rule sets on top of the parse tree of the original sentence and the dependency relations between its words. To make the algorithm more adaptable and accurate, word heat (how "hot" a word is) is introduced to enhance the compression. In the last step, the result is cleaned up and its grammar repaired to produce the final compressed sentence. Three methods are compared: human compression, rule-only compression, and compression enhanced with word heat. The results are evaluated on compression rate, grammaticality, informativeness, and heat. The experiments show that the heat-based heuristic algorithm raises the heat of the compressed sentences with little loss in compression rate, grammaticality, or informativeness.
Funding: National Natural Science Foundation of China (No. 50604012)
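The abstract describes the pipeline only in outline, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes each word already carries a coarse role label ("core" or "modifier", standing in for what the parse tree and dependency relations would provide) and a heat score in [0, 1]. The `Token` type, the role labels, the threshold, the heat values, and the metric definitions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    role: str    # hypothetical label: "core" or "modifier"
    heat: float  # hypothetical heat score in [0, 1]


def compress(tokens, heat_threshold=0.5):
    """Drop modifiers, but keep a modifier whose heat reaches the threshold."""
    return [t for t in tokens
            if t.role != "modifier" or t.heat >= heat_threshold]


def compression_rate(original, compressed):
    """Assumed definition: kept characters divided by original characters."""
    return sum(len(t.text) for t in compressed) / sum(len(t.text) for t in original)


def mean_heat(tokens):
    """Assumed definition: average heat of the words in a sentence."""
    return sum(t.heat for t in tokens) / len(tokens) if tokens else 0.0


# Illustrative sentence with made-up role labels and heat values.
sentence = [
    Token("著名", "modifier", 0.2), Token("的", "modifier", 0.0),
    Token("歌手", "core", 0.6), Token("昨天", "modifier", 0.1),
    Token("发布", "core", 0.4), Token("了", "core", 0.0),
    Token("新", "modifier", 0.8), Token("专辑", "core", 0.7),
]

# An infinite threshold disables the heat exception (rule-only behaviour).
rule_only = compress(sentence, heat_threshold=float("inf"))
heat_based = compress(sentence, heat_threshold=0.5)

print("".join(t.text for t in rule_only))   # 歌手发布了专辑
print("".join(t.text for t in heat_based))  # 歌手发布了新专辑
```

In this toy case the heat-aware variant keeps the hot modifier 新, so the output is one character longer but has a higher mean heat than the rule-only output. That is the kind of trade-off the abstract reports, although the real rules and heat measure are defined in the paper itself.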
Language: zh_CN
Content type: Journal article
Source URL: [http://dspace.xmu.edu.cn/handle/2288/123134]
Collection: Information Technology - Published Papers
Recommended citation
GB/T 7714
韩静,张东站. 基于词语热度的启发式中文句子压缩算法, Heuristic Chinese sentence compression algorithm based on hot word[J],2014.
APA 韩静,&张东站.(2014).基于词语热度的启发式中文句子压缩算法..
MLA 韩静,et al."基于词语热度的启发式中文句子压缩算法".(2014).