Unsupervised Feature Learning for Chinese Lexicon Based on Auto-Encoder


	基于自动编码器的中文词汇特征无监督学习; Unsupervised Feature Learning for Chinese Lexicon Based on Auto-Encoder
	Zhang Kaixu (张开旭); Zhou Changle (周昌乐)
	2013-09-15
Keywords	unsupervised feature learning; Chinese word segmentation; part-of-speech tagging
Abstract	Large-scale unlabeled corpora contain abundant lexical information that can help Chinese word segmentation and part-of-speech (POS) tagging. This work extracts the distributional information of words from a large-scale unlabeled Chinese corpus and represents it as high-dimensional vectors. An auto-encoder neural network then learns, without supervision, an encoding of these vectors, yielding low-dimensional lexical features that can be used directly as additional features in a joint Chinese word segmentation and POS tagging model. Experiments on the Penn Chinese Treebank 5.0 corpus show that the learned features improve segmentation and tagging performance considerably and, on POS tagging, outperform features learned without supervision by combining principal component analysis (PCA) with k-means clustering.
Funding	National Natural Science Foundation of China (61273338); Specialized Research Fund for the Doctoral Program of Higher Education of China, New Teachers category (20120121120046); Natural Science Foundation of Fujian Province (2010J01351); China Postdoctoral Science Foundation (2013M541861)
Language	zh_CN
Content type	Journal article
Source URL	[http://dspace.xmu.edu.cn/handle/2288/123054]
Collection	Information Technology: Published Papers (Xiamen University)
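Recommended citation (GB/T 7714)	ZHANG Kaixu, ZHOU Changle. 基于自动编码器的中文词汇特征无监督学习 [Unsupervised Feature Learning for Chinese Lexicon Based on Auto-Encoder][J]. 2013.
APA	Zhang, K., & Zhou, C. (2013). 基于自动编码器的中文词汇特征无监督学习 [Unsupervised feature learning for Chinese lexicon based on auto-encoder].
MLA	Zhang, Kaixu, and Changle Zhou. "基于自动编码器的中文词汇特征无监督学习 [Unsupervised Feature Learning for Chinese Lexicon Based on Auto-Encoder]." 2013.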
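The pipeline summarized in the abstract (distributional vectors from unlabeled text, compressed by an auto-encoder into low-dimensional lexical features) can be illustrated with a small sketch. This record does not give the paper's lexical units, context window, vector weighting, network size, or training procedure, so every choice below is an assumption, not the authors' method: single characters serve as the lexical units, the distributional vector of a character holds its left and right neighbour counts normalized to relative frequencies, and the auto-encoder has one sigmoid hidden layer and a linear decoder trained by plain SGD on squared reconstruction error. The names `context_vectors`, `AutoEncoder`, and `train` and the toy corpus are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def context_vectors(sentences):
    """Map each character to a normalized left/right-neighbour count vector.

    Hypothetical feature design: dimension k < V counts "left neighbour is
    vocab[k]", dimension V + k counts "right neighbour is vocab[k]".
    """
    vocab = sorted({ch for s in sentences for ch in s} | {"<s>", "</s>"})
    index = {ch: i for i, ch in enumerate(vocab)}
    size = len(vocab)
    counts = defaultdict(Counter)
    for s in sentences:
        padded = ["<s>"] + list(s) + ["</s>"]
        for i in range(1, len(padded) - 1):
            counts[padded[i]][index[padded[i - 1]]] += 1
            counts[padded[i]][size + index[padded[i + 1]]] += 1
    vectors = {}
    for ch, c in counts.items():
        total = sum(c.values())
        vectors[ch] = [c[k] / total for k in range(2 * size)]
    return vectors


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class AutoEncoder:
    """One sigmoid hidden layer (the code) and a linear reconstruction layer."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_in)
        self.w1 = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-scale, scale) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h):
        return [sum(w * hk for w, hk in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def train_step(self, x, lr):
        """One SGD step on the loss 0.5 * ||decode(encode(x)) - x||^2."""
        h = self.encode(x)
        err = [yj - xj for yj, xj in zip(self.decode(h), x)]
        # Back-propagate to the hidden pre-activations before w2 is modified.
        dz = [sum(err[j] * self.w2[j][k] for j in range(len(err))) * h[k] * (1.0 - h[k])
              for k in range(len(h))]
        for j, e in enumerate(err):
            for k, hk in enumerate(h):
                self.w2[j][k] -= lr * e * hk
            self.b2[j] -= lr * e
        for k, d in enumerate(dz):
            for i, xi in enumerate(x):
                self.w1[k][i] -= lr * d * xi
            self.b1[k] -= lr * d
        return 0.5 * sum(e * e for e in err)


def train(vectors, n_hidden=4, epochs=200, lr=0.5, seed=0):
    """Train on all vectors; return the model and the mean loss per epoch."""
    keys = sorted(vectors)
    model = AutoEncoder(len(vectors[keys[0]]), n_hidden, seed)
    rng = random.Random(seed)
    history = []
    for _ in range(epochs):
        rng.shuffle(keys)
        history.append(sum(model.train_step(vectors[k], lr) for k in keys) / len(keys))
    return model, history


if __name__ == "__main__":
    corpus = ["我们在大学学习中文", "他们在大学里学习", "我们学习中文分词"]  # toy data
    vecs = context_vectors(corpus)
    model, losses = train(vecs)
    print(f"mean reconstruction loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
    for ch in "学习大":
        print(ch, [round(v, 3) for v in model.encode(vecs[ch])])
```

In this sketch the `encode` output (four values per character by default) stands in for the low-dimensional lexical features that the abstract adds to a segmentation and POS tagging model; a realistic setup would use a far larger corpus, richer contexts, and a larger hidden layer.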
