arXiv:2104.02972 [cs.CV]

Self-supervised Learning of Depth Inference for Multi-view Stereo

Jiayu Yang, Jose M. Alvarez, Miaomiao Liu

Published 2021-04-07, Version 1

Recent supervised multi-view depth estimation networks have achieved promising results. Like all supervised approaches, these networks require ground-truth data during training. However, collecting a large amount of multi-view depth data is very challenging. Here, we propose a self-supervised learning framework for multi-view stereo that exploits pseudo labels from the input data. We start by learning to estimate depth maps as initial pseudo labels under an unsupervised learning framework that relies on an image reconstruction loss as supervision. We then refine the initial pseudo labels using a carefully designed pipeline that leverages depth information inferred from higher-resolution images and neighboring views. We use these high-quality pseudo labels as the supervision signal to train the network and iteratively improve its performance through self-training. Extensive experiments on the DTU dataset show that our proposed self-supervised learning framework outperforms existing unsupervised multi-view stereo networks by a large margin and performs on par with its supervised counterpart. Code is available at https://github.com/JiayuYANG/Self-supervised-CVP-MVSNet.
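
The iterative scheme described in the abstract (infer pseudo labels, refine them, retrain on the refined labels, repeat) can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the function `self_train` and its parameters `infer_labels`, `refine_labels`, and `train_on_labels` are hypothetical placeholders, and the sketch assumes the starting model has already been trained without ground truth, for example with an image reconstruction loss.

```python
from typing import Callable, List, Tuple, TypeVar

Model = TypeVar("Model")
Labels = TypeVar("Labels")


def self_train(
    model: Model,
    infer_labels: Callable[[Model], Labels],
    refine_labels: Callable[[Labels], Labels],
    train_on_labels: Callable[[Model, Labels], Model],
    rounds: int,
) -> Tuple[Model, List[Labels]]:
    """Generic self-training loop (illustrative, all callables are assumptions).

    model           -- a model assumed to be pre-trained without ground truth
    infer_labels    -- hypothetical: predicts depth maps used as pseudo labels
    refine_labels   -- hypothetical: stands in for the refinement pipeline
    train_on_labels -- hypothetical: trains the model on the refined labels
    rounds          -- number of self-training iterations
    """
    history: List[Labels] = []
    for _ in range(rounds):
        # 1. Use the current model to produce initial pseudo labels.
        initial = infer_labels(model)
        # 2. Refine them before they are used as supervision.
        refined = refine_labels(initial)
        # 3. Retrain the model with the refined labels as the training signal.
        model = train_on_labels(model, refined)
        history.append(refined)
    return model, history
```

In a real multi-view stereo setting, the three callables would wrap depth inference, the label refinement step, and a supervised training pass respectively; the loop itself only fixes the order in which they are applied.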

Related articles:
arXiv:2101.05456 [cs.CV] (Published 2021-01-14)
Self-Supervised Learning for Segmentation
arXiv:2105.04553 [cs.CV] (Published 2021-05-10)
Self-Supervised Learning with Swin Transformers
arXiv:2104.09866 [cs.CV] (Published 2021-04-20)
Distill on the Go: Online knowledge distillation in self-supervised learning