Unsupervised and Pseudo-Supervised Vision-Language Alignment in Visual Dialog | |
Feilong Chen1,2; Duzhen Zhang2; Xiuyi Chen2; Jing Shi2; Shang Xu2; Bo Xu2 | |
2022 | |
Conference date | October 10–14, 2022 |
Conference location | Lisboa, Portugal |
Abstract | Visual dialog requires models to give reasonable answers according to a series of coherent questions and related visual concepts in images. However, most current work either focuses on attention-based fusion or pre-training on large-scale image-text pairs, ignoring the critical role of explicit vision-language alignment in visual dialog. To remedy this defect, we propose a novel unsupervised and pseudo-supervised vision-language alignment approach for visual dialog (AlignVD). First, AlignVD uses the visual and dialog encoders to represent images and dialogs. Then, it explicitly aligns visual concepts with textual semantics via unsupervised and pseudo-supervised vision-language alignment (UVLA and PVLA). Specifically, UVLA uses a graph autoencoder, while PVLA uses dialog-guided visual grounding to conduct alignment. Finally, based on the aligned visual and textual representations, AlignVD gives a reasonable answer to the question via the cross-modal decoder. Extensive experiments on two large-scale visual dialog datasets demonstrate the effectiveness of vision-language alignment, and our proposed AlignVD achieves new state-of-the-art results. In addition, our single model won first place on the visual dialog challenge leaderboard with an NDCG of 78.70, surpassing the previous best ensemble model by about 1 point. |
Content type | Conference paper |
Source URL | [http://ir.ia.ac.cn/handle/173211/51892] |
Collection | Research Center for Digital Content Technology and Services_Auditory Models and Cognitive Computing |
Corresponding author | Xiuyi Chen |
Author affiliations | 1. School of Future Technology, University of CAS; 2. Institute of Automation, Chinese Academy of Sciences (CAS) |
Recommended citation (GB/T 7714) | Feilong Chen, Duzhen Zhang, Xiuyi Chen, et al. Unsupervised and Pseudo-Supervised Vision-Language Alignment in Visual Dialog[C]. In: . Lisboa, Portugal. October 10–14, 2022. |