arXiv:2406.13002 [cs.CV]

Recurrence over Video Frames (RoVF) for the Re-identification of Meerkats

Mitchell Rogers, Kobe Knowles, Gaël Gendron, Shahrokh Heidari, David Arturo Soriano Valdez, Mihailo Azhar, Padriac O'Leary, Simon Eyre, Michael Witbrock, Patrice Delmas

Published 2024-06-18, Version 1

Deep learning approaches for animal re-identification have had a major impact on conservation, significantly reducing the time required for many downstream tasks, such as well-being monitoring. We propose a method called Recurrence over Video Frames (RoVF), which uses a recurrent head based on the Perceiver architecture to iteratively construct an embedding from a video clip. RoVF is trained using a triplet loss based on the co-occurrence of individuals in the video frames, for settings where individual IDs are unavailable. We tested this method and various models based on the DINOv2 transformer architecture on a dataset of meerkats collected at Wellington Zoo. Our method achieves a top-1 re-identification accuracy of $49\%$, which is higher than that of the best DINOv2 model ($42\%$). We found that the model can match observations of individuals in cases where humans cannot, and that RoVF outperforms the comparison models with minimal fine-tuning. In future work, we plan to improve these models by using pretext tasks, apply them to animal behaviour classification, and perform a hyperparameter search to optimise the models further.
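The two quantities named in the abstract, the triplet loss and top-1 re-identification accuracy, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the Euclidean distance, the margin value, and the nearest-neighbour gallery lookup are all assumptions made for illustration. One plausible reading of the co-occurrence idea, also an assumption, is that two animals visible in the same frame must be different individuals, so a clip of one can serve as the negative for a clip of the other without any ID labels.

```python
import math

# Hypothetical helper names; the paper's code may differ entirely.

def l2_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    Assumption: anchor and positive are embeddings of two clips of the
    same tracked animal; negative is a clip of an animal that co-occurs
    in the same frames and is therefore a different individual.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def top1_accuracy(query_embs, query_ids, gallery_embs, gallery_ids):
    """Fraction of queries whose nearest gallery embedding has the same ID."""
    correct = 0
    for emb, true_id in zip(query_embs, query_ids):
        nearest = min(
            range(len(gallery_embs)),
            key=lambda i: l2_distance(emb, gallery_embs[i]),
        )
        if gallery_ids[nearest] == true_id:
            correct += 1
    return correct / len(query_embs)

# Toy usage with made-up 2-D embeddings.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative is far away
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 0.5])  # negative is closer than positive
acc = top1_accuracy(
    [[1.0, 0.0], [9.0, 9.0], [6.0, 6.0]], ["a", "b", "a"],
    [[0.0, 0.0], [10.0, 10.0]], ["a", "b"],
)
```

In this toy setup the loss is zero once the negative is farther from the anchor than the positive by at least the margin, and positive otherwise. Top-1 accuracy, as sketched here, counts only the single closest gallery match for each query.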

Comments: Presented as a poster at the CV4Animals Workshop, CVPR 2024

Categories: cs.CV

Keywords: video frames, re-identification, recurrence, best dinov2 model, dinov2 transformer architecture

Related articles:

arXiv:1707.07150 [cs.CV] (Published 2017-07-22)

Multi-Oriented Text Detection and Verification in Video Frames and Scene Images

Aneeshan Sain, Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal

arXiv:2211.12627 [cs.CV] (Published 2022-11-22)

$β$-Multivariational Autoencoder for Entangled Representation Learning in Video Frames

Fatemeh Nouri, Robert Bergevin

arXiv:2305.01443 [cs.CV] (Published 2023-05-02)

Scalable Mask Annotation for Video Text Spotting