DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation | |
Zhenyu Li2 | |
Journal | Machine Intelligence Research |
2023 | |
Volume | 20; Issue: 6; Pages: 837-854 |
Keywords | Autonomous driving, 3D reconstruction, monocular depth estimation, Transformer, convolution |
ISSN | 2731-538X |
DOI | 10.1007/s11633-023-1458-0 |
Abstract | This paper aims to address the problem of supervised monocular depth estimation. We start with a meticulous pilot study to demonstrate that the long-range correlation is essential for accurate depth estimation. Moreover, the Transformer and convolution are good at long-range and close-range depth estimation, respectively. Therefore, we propose to adopt a parallel encoder architecture consisting of a Transformer branch and a convolution branch. The former can model global context with the effective attention mechanism and the latter aims to preserve the local information as the Transformer lacks the spatial inductive bias in modeling such contents. However, independent branches lead to a shortage of connections between features. To bridge this gap, we design a hierarchical aggregation and heterogeneous interaction module to enhance the Transformer features and model the affinity between the heterogeneous features in a set-to-set translation manner. Due to the unbearable memory cost introduced by the global attention on high-resolution feature maps, we adopt the deformable scheme to reduce the complexity. Extensive experiments on the KITTI, NYU, and SUN RGB-D datasets demonstrate that our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins. The effectiveness of each proposed module is elaborately evaluated through meticulous and intensive ablation studies. |
Content type | Journal article |
Source URL | [http://ir.ia.ac.cn/handle/173211/54170] |
Collection | Institute of Automation_Academic Journals_International Journal of Automation and Computing |
Author affiliations | 1.Department of Automation, University of Science and Technology of China, Hefei 230026, China 2.Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China |
Recommended citation (GB/T 7714) | Zhenyu Li. DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation[J]. Machine Intelligence Research,2023,20(6):837-854. |
APA | Zhenyu Li.(2023).DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation.Machine Intelligence Research,20(6),837-854. |
MLA | Zhenyu Li."DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation".Machine Intelligence Research 20.6(2023):837-854. |
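
The parallel-encoder idea summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it works on a 1D list of scalar features, uses no learned weights, and omits the hierarchical aggregation and the deformable attention scheme entirely. All function names (`global_branch`, `local_branch`, `interact`, `parallel_encoder`) are hypothetical and chosen only for illustration. The sketch shows that an attention branch mixes every position with every other one, a convolution branch sees only close neighbours, and a cross-attention step lets one set of features query the other.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_branch(tokens):
    """Toy self-attention (assumption: scalar tokens, no learned projections).

    Each output is a weighted sum over ALL positions, so a change at a
    distant position affects every output.
    """
    out = []
    for query in tokens:
        weights = softmax([query * key for key in tokens])
        out.append(sum(w * v for w, v in zip(weights, tokens)))
    return out


def local_branch(tokens, kernel=(0.25, 0.5, 0.25)):
    """Toy 1D convolution with zero padding (assumption: fixed kernel).

    Each output depends only on the positions covered by the kernel.
    """
    radius = len(kernel) // 2
    out = []
    for i in range(len(tokens)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + j - radius
            if 0 <= idx < len(tokens):
                acc += weight * tokens[idx]
        out.append(acc)
    return out


def interact(global_feats, local_feats):
    """Toy cross-attention: global features query local features.

    A residual connection keeps the original global feature in the result.
    """
    out = []
    for query in global_feats:
        weights = softmax([query * key for key in local_feats])
        out.append(query + sum(w * v for w, v in zip(weights, local_feats)))
    return out


def parallel_encoder(tokens):
    """Run both branches on the same input, then let them interact."""
    return interact(global_branch(tokens), local_branch(tokens))


if __name__ == "__main__":
    base = [1.0, 0.0, 0.0, 0.0, 2.0]
    far_change = [1.0, 0.0, 0.0, 0.0, 3.0]  # only the last position differs
    print("local[0]  :", local_branch(base)[0], local_branch(far_change)[0])
    print("global[0] :", global_branch(base)[0], global_branch(far_change)[0])
    print("fused     :", parallel_encoder(base))
```

Changing only the last input leaves the first output of `local_branch` untouched but shifts the first output of `global_branch`, which is the long-range versus close-range contrast the abstract relies on.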