{ "id": "2203.04895", "version": "v1", "published": "2022-03-09T17:20:18.000Z", "updated": "2022-03-09T17:20:18.000Z", "title": "Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction", "authors": [ "Xiaoqi Zhao", "Youwei Pang", "Lihe Zhang", "Huchuan Lu" ], "comment": "Manuscript Version", "categories": [ "cs.CV" ], "abstract": "Thanks to its color independence, illumination invariance and location discrimination, the depth map can provide important supplemental information for extracting salient objects in complex environments. However, high-quality depth sensors are expensive and cannot be widely applied, while general depth sensors produce noisy and sparse depth information, which introduces irreversible interference into depth-based networks. In this paper, we propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD). Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks. In this way, the depth information can be completed and purified. Moreover, we introduce a multi-modal filtered transformer (MFT) module, which is equipped with three modality-specific filters to generate the transformer-enhanced feature for each modality. The proposed model works in a depth-free style during the testing phase. Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time. 
In addition, the resulting depth map can help existing RGB-D SOD methods obtain significant performance gains.", "revisions": [ { "version": "v1", "updated": "2022-03-09T17:20:18.000Z" } ], "analyses": { "keywords": [ "depth estimation", "contour extraction", "depth map", "joint learning", "multi-modal filtered transformer" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }