Empirical Exploring Word-Character Relationship for Chinese Sentence Representation
Wang, Shaonan (1); Zhang, Jiajun (1); Zong, Chengqing (2)
Journal: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
2018-05-01
Volume: 17, Issue: 3, Pages: 14
Keywords: Sentence Representation; Composition Model; Inner-word Character; Mixed Character-word Representation; Mask Gate; Max Pooling
DOI: 10.1145/3156778
Document subtype: Article
English abstract: This article addresses the problem of learning compositional Chinese sentence representations, which represent the meaning of a sentence by composing the meanings of its constituent words. Unlike an English word, a Chinese word is composed of characters that themselves carry rich semantic information, yet existing methods have not fully exploited this information. In this work, we introduce a novel mixed character-word architecture that improves Chinese sentence representations by using the semantic information of inner-word characters. We propose two strategies to this end. The first applies a mask gate to characters to learn the relations among the characters in a word. The second applies a max-pooling operation to words to adaptively find the optimal mixture of the atomic and compositional word representations. We apply the proposed architecture to various sentence composition models and obtain substantial performance gains over baseline models on a sentence similarity task. To further verify the generalization ability of the model, we use the learned sentence representations as features in sentence classification, question classification, and sentence entailment tasks. The results show that the proposed mixed character-word sentence representation models outperform both character-based and word-based models.
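The two strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the gate's exact form, so the sigmoid gate over the concatenated character and word vectors, the weight matrix `weights`, and all function names below are assumptions chosen for illustration. The sketch gates each character vector, sums the gated characters into a compositional word vector, and then takes an element-wise maximum with the atomic word vector.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mask_gate(char_vec, word_vec, weights):
    """Hypothetical gate: one value in (0, 1) per dimension,
    computed from the concatenated [character; word] input."""
    inputs = char_vec + word_vec  # list concatenation
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def compose_characters(char_vecs, word_vec, weights):
    """Sum the gated character vectors into a compositional word vector."""
    composed = [0.0] * len(word_vec)
    for char_vec in char_vecs:
        gate = mask_gate(char_vec, word_vec, weights)
        for k, (g, c) in enumerate(zip(gate, char_vec)):
            composed[k] += g * c
    return composed


def mixed_word_representation(char_vecs, word_vec, weights):
    """Element-wise max over the atomic and compositional word vectors."""
    composed = compose_characters(char_vecs, word_vec, weights)
    return [max(a, b) for a, b in zip(word_vec, composed)]


# Toy example: a two-character word with 3-dimensional embeddings.
# All-zero weights make every gate value exactly 0.5.
chars = [[2.0, 0.0, -2.0], [2.0, 4.0, 0.0]]
word = [1.0, 3.0, 0.5]
zero_weights = [[0.0] * 6 for _ in range(3)]

mixed = mixed_word_representation(chars, word, zero_weights)
print(mixed)  # [2.0, 3.0, 0.5]
```

In the toy example the compositional vector is `[2.0, 2.0, -1.0]`, so the first dimension of the result comes from the characters and the other two from the atomic word embedding. In a trained model the gate weights would be learned jointly with the sentence composition model rather than fixed.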
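WOS keywords: Sentence representation; composition model; inner-word character; mixed character-word representation; mask gate; max pooling
WOS research area: Computer Science
Language: English
WOS accession number: WOS:000433090800001
Funding organization: Natural Science Foundation of China (61673380; 61403379)
Content type: Journal article
Source URL: [http://ir.ia.ac.cn/handle/173211/20678]
Collection: Institute of Automation_National Laboratory of Pattern Recognition_Natural Language Processing Team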
Author affiliations:
1. Univ Chinese Acad Sci, Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Intelligence Bldg,498 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Chinese Acad Sci, CAS Ctr Excellence Brain Sci & Intelligence Techn, Natl Lab Pattern Recognit,Inst Automat, Intelligence Bldg,498 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Recommended citation
GB/T 7714
Wang, Shaonan,Zhang, Jiajun,Zong, Chengqing. Empirical Exploring Word-Character Relationship for Chinese Sentence Representation[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING,2018,17(3):14.
APA Wang, Shaonan,Zhang, Jiajun,&Zong, Chengqing.(2018).Empirical Exploring Word-Character Relationship for Chinese Sentence Representation.ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING,17(3),14.
MLA Wang, Shaonan,et al."Empirical Exploring Word-Character Relationship for Chinese Sentence Representation".ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING 17.3(2018):14.