arXiv:2403.08556 [cs.CV]

SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model

Yihao Liu, Feng Xue, Anlong Ming, Mingshuai Zhao, Huadong Ma, Nicu Sebe

Published 2024-03-13, updated 2024-08-15 (Version 2)

In the last year, universal monocular metric depth estimation (universal MMDE) has gained considerable attention, serving as a foundation model for various multimedia tasks, such as video and image editing. Nonetheless, current approaches struggle to maintain consistent accuracy across diverse scenes without scene-specific parameters and pre-training, which hinders the practicality of MMDE. Furthermore, these methods rely on extensive datasets comprising millions, if not tens of millions, of training samples, leading to significant time and hardware expenses. This paper presents SM$^4$Depth, a model that works seamlessly for both indoor and outdoor scenes without needing extensive training data or GPU clusters. Firstly, to obtain consistent depth across diverse scenes, we propose a novel metric scale modeling approach, i.e., variation-based unnormalized depth bins. It reduces the ambiguity of conventional metric bins and enables better adaptation to the large depth gaps between scenes during training. Secondly, we propose a "divide and conquer" solution to reduce reliance on massive training data. Instead of estimating directly from the vast solution space, the metric bins are estimated from multiple solution sub-spaces to reduce complexity. Additionally, we introduce an uncut depth dataset, BUPT Depth, to evaluate depth accuracy and consistency across various indoor and outdoor scenes. Trained on a consumer-grade GPU using just 150K RGB-D pairs, SM$^4$Depth achieves outstanding performance on most never-before-seen datasets, in particular maintaining consistent accuracy across indoor and outdoor scenes. The code can be found at https://github.com/mRobotit/SM4Depth.
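
For readers unfamiliar with the "conventional metric bins" the abstract contrasts against, the following is a minimal sketch of generic bin-based depth regression, in which normalized bin widths are scaled to a fixed depth range and a pixel's depth is the probability-weighted sum of bin centers. It is an illustration only and is not taken from the SM$^4$Depth code: the function names `bin_centers` and `expected_depth`, the depth range, and the toy numbers are all assumptions, and the paper's variation-based unnormalized bins are not reproduced here.

```python
from typing import List


def bin_centers(widths: List[float], d_min: float, d_max: float) -> List[float]:
    """Turn relative bin widths into metric bin centers on [d_min, d_max].

    Hypothetical helper: widths are normalized to sum to 1 and then
    scaled to the fixed depth range, as in conventional metric bins.
    """
    total = sum(widths)
    centers = []
    edge = d_min  # left edge of the current bin, in metres
    for w in widths:
        span = (d_max - d_min) * w / total  # metric width of this bin
        centers.append(edge + span / 2.0)
        edge += span
    return centers


def expected_depth(probs: List[float], centers: List[float]) -> float:
    """Per-pixel depth as the probability-weighted sum of bin centers."""
    if len(probs) != len(centers):
        raise ValueError("probs and centers must have the same length")
    total = sum(probs)  # renormalize in case probs do not sum to exactly 1
    return sum(p * c for p, c in zip(probs, centers)) / total


# Toy example (assumed values): three bins over an 8 m range.
centers = bin_centers([1.0, 1.0, 2.0], d_min=0.0, d_max=8.0)
depth = expected_depth([0.2, 0.3, 0.5], centers)
print(centers, depth)
```

Because the widths are normalized against a fixed range, the same relative widths map to very different metric depths for indoor and outdoor ranges; this is one reading of the ambiguity that the abstract says unnormalized bins are meant to reduce.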

Comments: Accepted by ACM MultiMedia 24, Project Page: xuefeng-cvr.github.io/SM4Depth
Categories: cs.CV, cs.AI
Related articles:
arXiv:2403.18913 [cs.CV] (Published 2024-03-27)
UniDepth: Universal Monocular Metric Depth Estimation
arXiv:2407.14620 [cs.CV] (Published 2024-07-19)
The Research of Group Re-identification from Multiple Cameras
arXiv:2410.18511 [cs.CV] (Published 2024-10-24)
A Note on Geometric Calibration of Multiple Cameras and Projectors