Pre-trained multi-view Word embedding using two-side neural network
Luo, Yong ; Tang, Jian ; Yan, Jun ; Xu, Chao ; Chen, Zheng
2014
Abstract: Word embedding aims to learn a continuous representation for each word. It has attracted increasing attention due to its effectiveness in tasks such as named entity recognition and language modeling. Most existing word embeddings are trained on a single data source, such as news pages or Wikipedia articles. However, when they are applied to other tasks such as web search, performance suffers. To obtain a word embedding that is robust across applications, multiple data sources can be leveraged. In this paper, we propose a two-side multimodal neural network that learns a robust word embedding from multiple data sources, including free text, user search queries, and search click-through data. The framework takes the word embeddings learned from the different data sources as pre-training and then uses a two-side neural network to unify them. The pre-trained embeddings are obtained by adapting the recently proposed CBOW algorithm. Because the proposed network does not need to re-train word embeddings for a new task, it is highly scalable in real-world problem solving. In addition, the network allows different sources to be weighted differently for different application tasks. Experiments on two real-world applications, web search ranking and word similarity measurement, show that our neural network with multiple sources outperforms a state-of-the-art word embedding algorithm trained on each individual source. It also outperforms other competitive baselines that use multiple sources. Copyright © 2014, Association for the Advancement of Artificial Intelligence.
Indexed by: EI
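Language: English
Content type: Other
Source URL: [http://ir.pku.edu.cn/handle/20.500.11897/295556]
Collection: School of Electronics Engineering and Computer Science (信息科学技术学院)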
Recommended citation
GB/T 7714
Luo, Yong, Tang, Jian, Yan, Jun, et al. Pre-trained multi-view Word embedding using two-side neural network. 2014-01-01.