Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition


	Zitong, Yu 2; Benjia, Zhou 6; Jun, Wan 1; Pichao, Wang 3; Haoyu, Chen 2; Xin, Liu 4; Stan, Z., Li 5; Guoying, Zhao 2
Journal	IEEE Transactions on Image Processing
	2021
Volume	30 Pages: 5626-5640
Abstract	Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning, existing methods still lack effective integration to fully exploit synergies among spatio-temporal modalities for gesture recognition. This is partly because existing manually designed network architectures are inefficient at joint learning across multiple modalities. In this paper, we propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the proposed 3D Central Difference Convolution (3D-CDC) family, which captures rich temporal context by aggregating temporal difference information; and 2) optimized backbones for multi-sampling-rate branches and lateral connections among varied modalities. The resultant multi-modal multi-rate network provides a new perspective for understanding the relationship between RGB and depth modalities and their temporal dynamics. Comprehensive experiments on three benchmark datasets (IsoGD, NvGesture, and EgoGesture) demonstrate state-of-the-art performance in both single- and multi-modality settings. The code is available at https://github.com/ZitongYu/3DCDC-NAS.
Language	English
Content type	Journal article
Source URL	[http://ir.ia.ac.cn/handle/173211/57115]
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding authors	Jun, Wan; Guoying, Zhao
Author affiliations	1. Institute of Automation, Chinese Academy of Sciences 2. University of Oulu 3. Alibaba Group 4. Tianjin University 5. Westlake University 6. Macau University of Science and Technology
Recommended citation GB/T 7714	Zitong, Yu,Benjia, Zhou,Jun, Wan,et al. Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition[J]. IEEE Transactions on Image Processing,2021,30:5626-5640.
APA	Zitong, Yu.,Benjia, Zhou.,Jun, Wan.,Pichao, Wang.,Haoyu, Chen.,...&Guoying, Zhao.(2021).Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition.IEEE Transactions on Image Processing,30,5626-5640.
MLA	Zitong, Yu,et al."Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition".IEEE Transactions on Image Processing 30(2021):5626-5640.