Structure Preserving Convolutional Attention for Image Captioning
Lu, Shichen [1,2,5]; Hu, Ruimin [1,2]; Liu, Jing [3]; Guo, Longteng [3]; Zheng, Fei [4]
Journal: APPLIED SCIENCES-BASEL
2019-07-02
Volume: 9, Issue: 14, Pages: 10
Keywords: image captioning; attention; spatial structure; deep learning; computer vision
DOI: 10.3390/app9142888
Corresponding author: Hu, Ruimin (hrm@whu.edu.cn)
Abstract: In image captioning, learning which image regions to attend to is necessary for focusing adaptively and precisely on the object semantics relevant to each decoded word. In this paper, we propose a convolutional attention module that preserves the spatial structure of the image by performing the convolution operation directly on the 2D feature maps. The proposed attention mechanism contains two components, convolutional spatial attention and cross-channel attention, which determine the regions to describe along the spatial and channel dimensions, respectively. Both attentions are calculated at each decoding step. To preserve the spatial structure, both components are computed directly on the entire feature maps with convolution operations, instead of operating on the vector representation of each image grid. Experiments on two large-scale datasets (MSCOCO and Flickr30K) demonstrate the strong performance of the proposed method.
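The two attention components described in the abstract can be illustrated with a minimal sketch. This record does not include the paper's equations, so everything below is an assumption made for illustration: the function names, the projection matrices w_h and w_c, the tanh fusion of the decoder state with the feature maps, the single 3x3 scoring kernel, and the use of weighted pooling for the channel step (a simplification of the convolutional cross-channel attention the abstract mentions). The point it shows is that the spatial scores are computed by a convolution over the full C x H x W feature maps, so neighbouring grid cells influence each other, instead of scoring each grid vector independently.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a flat list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def conv2d_same(fmap, kernel):
    """Slide one C x k x k kernel over a C x H x W map (zero padding, odd k).

    As in most deep learning libraries this is a cross-correlation.
    Returns an H x W score map.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for c in range(C):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            out[i][j] += fmap[c][y][x] * kernel[c][di][dj]
    return out


def project(weights, hidden):
    """Linear map of the decoder state to one value per channel (assumed)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


def weighted_pool(plane, alpha):
    """Sum one H x W channel plane weighted by the spatial attention map."""
    return sum(a * v for a_row, v_row in zip(alpha, plane)
               for a, v in zip(a_row, v_row))


def spatial_attention(fmap, hidden, w_h, kernel):
    """H x W weights that sum to 1, scored by a convolution on the 2D maps."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    bias = project(w_h, hidden)  # broadcast over the whole grid
    fused = [[[math.tanh(fmap[c][i][j] + bias[c]) for j in range(W)]
              for i in range(H)] for c in range(C)]
    scores = conv2d_same(fused, kernel)
    flat = softmax([s for row in scores for s in row])
    return [flat[i * W:(i + 1) * W] for i in range(H)]


def channel_attention(fmap, alpha, hidden, w_c):
    """One weight per channel (sums to 1); pooling-based simplification."""
    bias = project(w_c, hidden)
    pooled = [weighted_pool(plane, alpha) for plane in fmap]
    return softmax([p + b for p, b in zip(pooled, bias)])


def attend(fmap, hidden, w_h, w_c, kernel):
    """One decoding step: both attentions, then a C-dimensional context."""
    alpha = spatial_attention(fmap, hidden, w_h, kernel)
    beta = channel_attention(fmap, alpha, hidden, w_c)
    context = [b * weighted_pool(plane, alpha) for b, plane in zip(beta, fmap)]
    return alpha, beta, context


def make_random_inputs(C=3, H=4, W=5, D=2, seed=0):
    """Hypothetical helper: random feature maps, state, and weights."""
    rng = random.Random(seed)
    u = lambda: rng.uniform(-1.0, 1.0)
    fmap = [[[u() for _ in range(W)] for _ in range(H)] for _ in range(C)]
    hidden = [u() for _ in range(D)]
    w_h = [[u() for _ in range(D)] for _ in range(C)]
    w_c = [[u() for _ in range(D)] for _ in range(C)]
    kernel = [[[u() for _ in range(3)] for _ in range(3)] for _ in range(C)]
    return fmap, hidden, w_h, w_c, kernel
```

Calling attend(*make_random_inputs()) returns a spatial map alpha, a channel weight vector beta, and a context vector with one entry per channel. In a trained captioner the weights would be learned and attend would be called once per decoded word with the current decoder state; the random weights here only exercise the shapes.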
Funding project: National Natural Science Foundation of China [U1736206]
WOS research area: Chemistry; Materials Science; Physics
Language: English
Publisher: MDPI
WOS accession number: WOS:000479026900115
Funding organization: National Natural Science Foundation of China
Content type: Journal article
Source URL: http://ir.ia.ac.cn/handle/173211/27613
Collection: Institute of Automation, Chinese Academy of Sciences
Corresponding author: Hu, Ruimin
Author affiliations:
1. Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp, Wuhan 430072, Hubei, Peoples R China
2. Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430072, Hubei, Peoples R China
3. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
4. China Gen Technol Res Inst, Beijing 100190, Peoples R China
5. Wuhan Univ, Informat Dept, Dormitory 8, Room 617, Wuhan 430072, Hubei, Peoples R China
Recommended citation
GB/T 7714: Lu, Shichen, Hu, Ruimin, Liu, Jing, et al. Structure Preserving Convolutional Attention for Image Captioning[J]. APPLIED SCIENCES-BASEL, 2019, 9(14): 10.
APA: Lu, Shichen, Hu, Ruimin, Liu, Jing, Guo, Longteng, & Zheng, Fei. (2019). Structure Preserving Convolutional Attention for Image Captioning. APPLIED SCIENCES-BASEL, 9(14), 10.
MLA: Lu, Shichen, et al. "Structure Preserving Convolutional Attention for Image Captioning". APPLIED SCIENCES-BASEL 9.14 (2019): 10.