arXiv:2307.12591 [cs.CV]

SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

Yiqing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan Yuille, Cihang Xie, Yuyin Zhou

Published 2023-07-24, Version 1

Recent advancements in large-scale Vision Transformers have made significant strides in improving pre-trained models for medical image segmentation. However, these methods face a notable challenge in acquiring a substantial amount of pre-training data, particularly within the medical field. To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for enabling accurate and data-efficient self-supervised medical image analysis. Our strategy harnesses the potential of multi-view information by incorporating two principal components. In the pre-training phase, we deploy a masked multi-view encoder devised to concurrently train masked multi-view observations through a range of diverse proxy tasks. These tasks span image reconstruction, rotation, contrastive learning, and a novel task that employs a mutual learning paradigm. This new task capitalizes on the consistency between predictions from various perspectives, enabling the extraction of hidden multi-view information from 3D medical data. In the fine-tuning stage, a cross-view decoder is developed to aggregate the multi-view information through a cross-attention block. Compared with the previous state-of-the-art self-supervised learning method Swin UNETR, SwinMM demonstrates a notable advantage on several medical image segmentation tasks. It allows for a smooth integration of multi-view information, significantly boosting both the accuracy and data-efficiency of the model. Code and models are available at https://github.com/UCSC-VLAA/SwinMM/.
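
The pre-training ideas in the abstract (multi-view observations of one 3D volume, masking, and a task that rewards consistent predictions across views) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the released code at the URL above is the reference. Everything named here is an assumption made for illustration: the three axis permutations in `VIEW_AXES`, the patch-wise zero masking in `mask_patches`, and the symmetric KL divergence used in `mutual_consistency_loss`, which is swapped in as one plausible consistency measure because the abstract does not specify the loss.

```python
import numpy as np

# Hypothetical axis orders that turn one (D, H, W) volume into three "views".
VIEW_AXES = {"axial": (0, 1, 2), "coronal": (1, 0, 2), "sagittal": (2, 1, 0)}


def _permute(array, spatial_axes):
    """Permute the first three (spatial) axes; trailing axes stay in place."""
    extra = tuple(range(3, array.ndim))
    return np.transpose(array, tuple(spatial_axes) + extra)


def make_view(array, view):
    """Re-orient an array for the named view (a pure axis permutation)."""
    return _permute(array, VIEW_AXES[view])


def undo_view(array, view):
    """Map a view-oriented array back to the original orientation."""
    return _permute(array, np.argsort(VIEW_AXES[view]))


def mask_patches(volume, patch, ratio, rng):
    """Zero out roughly `ratio` of the non-overlapping cubic patches.

    Assumes every side of `volume` is divisible by `patch`.
    """
    grid = tuple(s // patch for s in volume.shape)
    keep = rng.random(grid) >= ratio
    # Expand the per-patch decision to voxel resolution.
    mask = keep.repeat(patch, 0).repeat(patch, 1).repeat(patch, 2)
    return volume * mask, mask


def softmax(logits):
    """Numerically stable softmax over the last (class) axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mutual_consistency_loss(logits_a, view_a, logits_b, view_b, eps=1e-8):
    """Symmetric KL between two views' predictions (assumed loss form).

    Both logit arrays have shape (spatial..., classes) in their own view
    orientation; they are aligned to a common orientation before comparison.
    """
    p = softmax(undo_view(logits_a, view_a))
    q = softmax(undo_view(logits_b, view_b))
    log_ratio = np.log(p + eps) - np.log(q + eps)
    kl_pq = np.sum(p * log_ratio, axis=-1)
    kl_qp = np.sum(-q * log_ratio, axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))


rng = np.random.default_rng(0)
volume = rng.random((4, 4, 4))
masked_view, mask = mask_patches(make_view(volume, "coronal"), 2, 0.5, rng)
```

In this sketch the loss is zero when two views yield the same per-voxel prediction once aligned, and it grows as the predictions diverge, which is the behaviour a consistency-based proxy task needs.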

Comments: MICCAI 2023; project page: https://github.com/UCSC-VLAA/SwinMM/
Categories: cs.CV
Related articles:
arXiv:2307.12004 [cs.CV] (Published 2023-07-22)
COLosSAL: A Benchmark for Cold-start Active Learning for 3D Medical Image Segmentation
Han Liu et al.
arXiv:2302.05615 [cs.CV] (Published 2023-02-11)
Anatomical Invariance Modeling and Semantic Alignment for Self-supervised Learning in 3D Medical Image Segmentation
arXiv:2003.07923 [cs.CV] (Published 2020-03-17)
3D medical image segmentation with labeled and unlabeled data using autoencoders at the example of liver segmentation in CT images