Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition
Authors | Zitong, Yu²; Benjia, Zhou⁶; Jun, Wan¹; Pichao, Wang³; Haoyu, Chen²; Xin, Liu⁴; Stan, Z., Li⁵; Guoying, Zhao² |
Journal | IEEE Transactions on Image Processing |
Year | 2021 |
Volume | 30; Pages: 5626-5640 |
English Abstract | Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has recently been made in multi-modal learning methods, existing methods still lack an effective way to integrate spatio-temporal modalities and fully exploit the synergies among them for gesture recognition. This is partly because existing manually designed network architectures are inefficient at jointly learning multiple modalities. In this paper, we propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the proposed 3D Central Difference Convolution (3D-CDC) family, which captures rich temporal context by aggregating temporal difference information; and 2) optimized backbones for multi-sampling-rate branches and lateral connections among the different modalities. The resulting multi-modal, multi-rate network provides a new perspective for understanding the relationship between the RGB and depth modalities and their temporal dynamics. Comprehensive experiments on three benchmark datasets (IsoGD, NvGesture, and EgoGesture) demonstrate state-of-the-art performance in both single- and multi-modality settings. The code is available at https://github.com/ZitongYu/3DCDC-NAS. |
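The abstract says 3D-CDC captures temporal context by aggregating temporal difference information. A minimal single-channel sketch of one temporal variant (called 3D-CDC-T here) follows, assuming the formulation y(p0) = Σ_{p_n∈R} w(p_n)·x(p0+p_n) − θ·x(p0)·Σ_{p_n∈R'} w(p_n), where R is the full 3×3×3 neighbourhood and R' holds only the kernel taps in the previous and next frames. That formulation is not stated in this record, and the function name `cdc_t_3d`, the stride of 1, the zero padding, and the plain-list data layout are all illustrative assumptions. Consult the linked repository for the authors' actual implementation.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed [t][h][w]


def cdc_t_3d(x: Volume, w: Volume, theta: float) -> Volume:
    """Hypothetical single-channel 3D-CDC-T sketch: 3x3x3 kernel, stride 1, zero padding 1.

    Assumed formulation:
        y(p0) = sum_{p_n in R} w(p_n) * x(p0 + p_n)
                - theta * x(p0) * sum_{p_n in R'} w(p_n)
    where R' contains only the kernel taps in the previous and next frames.
    """
    T, H, W = len(x), len(x[0]), len(x[0][0])

    def at(t: int, i: int, j: int) -> float:
        # Zero padding: positions outside the volume contribute nothing.
        if 0 <= t < T and 0 <= i < H and 0 <= j < W:
            return x[t][i][j]
        return 0.0

    # Sum of weights on the adjacent temporal slices (kernel index 0 = t-1, 2 = t+1).
    w_adjacent = sum(w[kt][a][b] for kt in (0, 2) for a in range(3) for b in range(3))

    y = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for i in range(H):
            for j in range(W):
                # Plain 3D convolution (cross-correlation) over the full 3x3x3 neighbourhood.
                vanilla = sum(
                    w[kt][a][b] * at(t + kt - 1, i + a - 1, j + b - 1)
                    for kt in range(3)
                    for a in range(3)
                    for b in range(3)
                )
                # Temporal central-difference term: subtract theta * x(p0) * sum of weights on R'.
                y[t][i][j] = vanilla - theta * x[t][i][j] * w_adjacent
    return y
```

With θ = 0 the sketch reduces to a plain 3D convolution; with θ = 1 the adjacent-frame taps respond only to the temporal differences x(p0+p_n) − x(p0), while the centre-frame taps stay unchanged. For example, with a constant 3×3×3 clip of ones and an all-ones kernel, the interior output `cdc_t_3d(x, k, theta)[1][1][1]` is 27.0 for θ = 0 and 9.0 for θ = 1, because the 18 adjacent-frame taps then see no temporal change.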
Language | English |
Content Type | Journal article |
Source URL | [http://ir.ia.ac.cn/handle/173211/57115] |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding Authors | Jun, Wan; Guoying, Zhao |
Author Affiliations | 1. Institute of Automation, Chinese Academy of Sciences; 2. University of Oulu; 3. Alibaba Group; 4. Tianjin University; 5. Westlake University; 6. Macau University of Science and Technology |
Recommended Citation (GB/T 7714) | Zitong, Yu, Benjia, Zhou, Jun, Wan, et al. Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition[J]. IEEE Transactions on Image Processing, 2021, 30: 5626-5640. |
APA | Zitong, Yu., Benjia, Zhou., Jun, Wan., Pichao, Wang., Haoyu, Chen., ... & Guoying, Zhao. (2021). Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition. IEEE Transactions on Image Processing, 30, 5626-5640. |
MLA | Zitong, Yu, et al. "Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition." IEEE Transactions on Image Processing 30 (2021): 5626-5640. |
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.