Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval

doi:10.1145/3478642


Authors	Zhang, Feifei (1,2,6); Xu, Mingliang (5); Xu, Changsheng (2,3,4)
Journal	ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
Publication date	2022-05-01
Volume	18	Issue: 2	Pages: 23
Keywords	Composing text and image to image retrieval; end-to-end; image generation; generative adversarial network; global-local
ISSN	1551-6857
DOI	10.1145/3478642
Corresponding author	Zhang, Feifei (feifeizhang1231@gmail.com)
Abstract	Composing Text and Image to Image Retrieval (CTI-IR) is an emerging task in computer vision that retrieves images relevant to a query image combined with text describing desired modifications to that image. Conventional cross-modal retrieval approaches usually take data from one modality as the query to retrieve relevant data of another modality. Unlike these methods, in this article we propose an end-to-end trainable network for simultaneous image generation and CTI-IR. The proposed model is based on a Generative Adversarial Network (GAN) and offers several merits. First, it can learn a generative and discriminative feature for the query (a query image with a text description) by jointly training a generative model and a retrieval model. Second, our model can automatically manipulate the visual features of the reference image according to the text description through adversarial learning between the synthesized image and the target image. Third, global-local collaborative discriminators and attention-based generators are exploited, allowing our approach to focus on both the global and local differences between the query image and the target image. As a result, our model better enhances the semantic consistency and fine-grained details of the generated images. The generated images can also be used to interpret and strengthen our retrieval model. Quantitative and qualitative evaluations on three benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
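The abstract describes CTI-IR only at a high level. As a rough illustration of the task itself, the Python sketch below ranks a gallery of images against a composed (image + text) query by cosine similarity and scores the result with Recall@K. Everything here is an assumption for illustration, not the authors' implementation: features are taken to be precomputed vectors, `compose_query` is a hypothetical weighted-sum fusion standing in for the paper's GAN-based generator, and `joint_loss` is a hypothetical weighted sum of a retrieval loss with global and local adversarial losses, meant only to suggest what "jointly training a generative model and a retrieval model" could look like.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def compose_query(image_feat: Sequence[float], text_feat: Sequence[float],
                  alpha: float = 0.5) -> List[float]:
    """HYPOTHETICAL fusion: weighted sum of reference-image and text features.

    A placeholder for the paper's generator; not the authors' method.
    """
    return [alpha * i + (1.0 - alpha) * t for i, t in zip(image_feat, text_feat)]


def rank_gallery(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda idx: scores[idx], reverse=True)


def recall_at_k(ranking: Sequence[int], target_idx: int, k: int) -> float:
    """1.0 if the ground-truth target appears in the top-k results, else 0.0."""
    return 1.0 if target_idx in ranking[:k] else 0.0


def joint_loss(retrieval_loss: float, adv_global: float, adv_local: float,
               lam: float = 1.0) -> float:
    """HYPOTHETICAL joint objective: retrieval loss plus weighted global and
    local adversarial losses. The weight `lam` is an assumption, not from the paper."""
    return retrieval_loss + lam * (adv_global + adv_local)


if __name__ == "__main__":
    image_feat = [1.0, 0.0, 0.0]  # toy reference-image feature
    text_feat = [0.0, 1.0, 0.0]   # toy modification-text feature
    gallery = [[1.0, 0.0, 0.0], [0.6, 0.4, 0.0], [0.0, 0.0, 1.0]]
    query = compose_query(image_feat, text_feat)
    ranking = rank_gallery(query, gallery)
    print(ranking)                                  # [1, 0, 2]
    print(recall_at_k(ranking, target_idx=1, k=1))  # 1.0
```

In this toy example the fused query [0.5, 0.5, 0.0] is most similar to gallery item 1, so that target is retrieved at rank 1. In the model described above, the query feature would presumably come from the jointly trained generation and retrieval network rather than from a fixed weighted sum.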
Funding	National Key Research and Development Program of China [2018AAA0100604]; National Natural Science Foundation of China [62036012, 61720106006, 62002355, 61721004, 61832002, 62072455, U1705262, U1836220]; Key Research Program of Frontier Sciences of CAS [QYZDJ-SSW-JSC039]; National Postdoctoral Program for Innovative Talents [BX20190367]; Beijing Natural Science Foundation [L201001]
WoS research area	Computer Science
Language	English
Publisher	ASSOC COMPUTING MACHINERY
WoS accession number	WOS:000773689400012
Funding agencies	National Key Research and Development Program of China; National Natural Science Foundation of China; Key Research Program of Frontier Sciences of CAS; National Postdoctoral Program for Innovative Talents; Beijing Natural Science Foundation
Content type	Journal article
Source URL	http://ir.ia.ac.cn/handle/173211/48169
Collection	Institute of Automation / National Laboratory of Pattern Recognition / Multimedia Computing and Graphics Team
Author affiliations
	1. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
	2. Chinese Acad Sci, Inst Automat, NLPR, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
	3. Peng Cheng Lab, 2 Xingke 1st St, Shenzhen 518000, Peoples R China
	4. Univ Chinese Acad Sci, Sch Artificial Intelligence, 19 Yuquan Rd, Beijing 100049, Peoples R China
	5. Zhengzhou Univ, Sch Informat Engn, 100 Sci Ave, Zhengzhou 450001, Henan, Peoples R China
	6. Tianjin Univ Technol, Sch Comp Sci & Engn, 391 Bin Shui Xi Dao Rd, Tianjin 300384, Peoples R China
Recommended citation (GB/T 7714)	Zhang, Feifei, Xu, Mingliang, Xu, Changsheng. Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval[J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18(2): 23.
APA	Zhang, Feifei, Xu, Mingliang, & Xu, Changsheng. (2022). Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 18(2), 23.
MLA	Zhang, Feifei, et al. "Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval." ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS 18.2 (2022): 23.