arXiv Analytics

arXiv:1710.01292 [cs.CV]

Visual speech recognition: aligning terminologies for better understanding

Helen L Bear, Sarah Taylor

Published 2017-10-03 (Version 1)

We are at an exciting time for machine lipreading. Traditional research stemmed from the adaptation of audio recognition systems, but now the computer vision community is also participating. This joining of two previously disparate areas with different perspectives on computer lipreading is creating opportunities for collaboration, but the literature is experiencing challenges in knowledge sharing due to multiple uses of terms and phrases and the range of methods for scoring results. In particular, we highlight three areas with the intention of improving communication between those researching lipreading: the effects of interchanging between speech reading and lipreading; speaker dependence across train, validation, and test splits; and the use of accuracy, correctness, errors, and varying units (phonemes, visemes, words, and sentences) to measure system performance. We make recommendations as to how we can be more consistent.
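The accuracy/correctness distinction the abstract points to is a common source of confusion. A minimal sketch of the conventional HTK-style definitions used in speech recognition scoring (the function name and example counts here are illustrative, not taken from the paper):

```python
def correctness_and_accuracy(n, d, s, i):
    """HTK-style recognition scores over any unit (phoneme, viseme, word).

    n: number of units in the reference transcription
    d: deletions, s: substitutions, i: insertions (from the alignment)
    Correctness ignores insertions; accuracy penalises them, so accuracy
    can be negative and is always <= correctness.
    """
    h = n - d - s              # hits: units recognised correctly
    correctness = h / n        # %Corr = H / N
    accuracy = (h - i) / n     # %Acc  = (H - I) / N
    return correctness, accuracy

# Example: 100 reference words, 5 deleted, 10 substituted, 8 inserted.
c, a = correctness_and_accuracy(100, 5, 10, 8)
# c = 0.85, a = 0.77
```

Because the two scores diverge whenever a recogniser inserts spurious units, reporting one while calling it the other makes systems incomparable, which is part of the paper's argument for consistent terminology.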

Journal: Helen L Bear and Sarah Taylor. Visual speech recognition: aligning terminologies for better understanding. British Machine Vision Conference (BMVC) Deep learning for machine lip reading workshop. 2017
Categories: cs.CV
Related articles:
arXiv:1301.4558 [cs.CV] (Published 2013-01-19)
Lip Localization and Viseme Classification for Visual Speech Recognition
arXiv:2403.18843 [cs.CV] (Published 2024-03-04)
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
arXiv:2303.17200 [cs.CV] (Published 2023-03-30)
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Xubo Liu et al.