arXiv Analytics

arXiv:2003.08933 [cs.CV]

Depth Estimation by Learning Triangulation and Densification of Sparse Points for Multi-view Stereo

Ayan Sinha, Zak Murez, James Bartolozzi, Vijay Badrinarayanan, Andrew Rabinovich

Published 2020-03-19, Version 1

Multi-view stereo (MVS) is the golden mean between the accuracy of active depth sensing and the practicality of monocular depth estimation. Cost-volume-based approaches employing 3D convolutional neural networks (CNNs) have considerably improved the accuracy of MVS systems. However, this accuracy comes at a high computational cost, which impedes practical adoption. Distinct from cost-volume approaches, we propose an efficient depth estimation approach that (a) detects interest points and evaluates their descriptors, then (b) learns to match and triangulate a small set of interest points, and finally (c) densifies this sparse set of 3D points using CNNs. An end-to-end network efficiently performs all three steps within a deep learning framework and is trained with intermediate 2D image and 3D geometric supervision, along with depth supervision. Crucially, our first step complements pose estimation using interest point detection and descriptor learning. We demonstrate state-of-the-art results on depth estimation with lower compute for different scene lengths. Furthermore, our method generalizes to newer environments, and the descriptors output by our network compare favorably to strong baselines.
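
To make step (b) concrete, here is a minimal sketch of classical two-view triangulation by the midpoint method. It is not the learned triangulation the abstract describes, only the geometric idea behind it: once a point is matched in two views with known poses, its 3D position lies near the intersection of the two viewing rays. The function name `triangulate_midpoint`, its arguments, and the parallel-ray tolerance are all hypothetical and chosen for illustration.

```python
def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint of the shortest segment between two viewing rays.

    c1, c2: camera centers (3-tuples); d1, d2: ray directions through the
    matched interest point in each view (3-tuples, need not be unit length).
    Assumption: poses are known, so rays are already in world coordinates.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)

    # Determinant of the 2x2 normal equations; zero when rays are parallel.
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel; depth is unobservable")

    # Ray parameters minimizing |(c1 + s*d1) - (c2 + t*d2)|^2.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom

    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))


# Two cameras one unit apart, both observing the same 3D point.
point = triangulate_midpoint(
    (0.0, 0.0, 0.0), (1.0, 2.0, 5.0),
    (1.0, 0.0, 0.0), (0.0, 2.0, 5.0),
)
```

With noisy matches the two rays no longer intersect, and the midpoint serves as a simple estimate. A sparse set of such points is what step (c) would then densify.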

Related articles:
arXiv:2410.11610 [cs.CV] (Published 2024-10-15)
Depth Estimation From Monocular Images With Enhanced Encoder-Decoder Architecture
arXiv:2205.14320 [cs.CV] (Published 2022-05-28)
RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo
arXiv:2401.11673 [cs.CV] (Published 2024-01-22)
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo