arXiv Analytics


arXiv:2306.07650 [cs.CL]

Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

Yuchen Han, Chen Xu, Tong Xiao, Jingbo Zhu

Published 2023-06-13 (Version 1)

Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonplace "modality gap" between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning but does not have a major impact on the final performance. On the other hand, we find that there is another gap, which we call the "capacity gap": high-resource tasks (such as ASR and MT) always require a large model to fit; when that model is reused for a low-resource task (E2E ST), it yields sub-optimal performance due to over-fitting. In a case study, we find that regularization plays a more important role than the well-designed modality adaption method, achieving 29.0 (en-de) and 40.3 (en-fr) BLEU on the MuST-C dataset. Code and models are available at https://github.com/hannlp/TAB.
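
To make the capacity-gap argument concrete, below is a minimal PyTorch sketch of fine-tuning a pre-trained encoder-decoder for low-resource ST with stronger regularization (dropout, label smoothing, weight decay). The ToySTModel class, model sizes, and hyper-parameters are illustrative assumptions for this sketch, not the authors' TAB implementation.

# Minimal sketch: regularized fine-tuning for low-resource E2E ST.
# ToySTModel and all hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn

class ToySTModel(nn.Module):
    """Encoder-decoder reused from high-resource pre-training (e.g. ASR/MT)."""
    def __init__(self, vocab_size=1000, d_model=256, dropout=0.3):
        super().__init__()
        # Raising dropout is one simple regularizer against over-fitting
        # when a large pre-trained model is fine-tuned on small ST data.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, dropout=dropout,
                                       batch_first=True),
            num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, dropout=dropout,
                                       batch_first=True),
            num_layers=2)
        self.src_proj = nn.Linear(80, d_model)   # 80-dim speech features
        self.tgt_emb = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, speech, tgt_tokens):
        memory = self.encoder(self.src_proj(speech))
        dec = self.decoder(self.tgt_emb(tgt_tokens), memory)
        return self.out(dec)

model = ToySTModel()
# Label smoothing is another common regularizer in low-resource fine-tuning.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
# Weight decay further constrains the over-parameterized pre-trained model.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

speech = torch.randn(8, 50, 80)        # (batch, frames, features), dummy data
tgt = torch.randint(0, 1000, (8, 20))  # dummy target token ids
logits = model(speech, tgt)
loss = criterion(logits.view(-1, 1000), tgt.view(-1))
loss.backward()
optimizer.step()

The point of the sketch is only that such generic regularizers act on the capacity gap directly, independently of any speech/text modality adaption module.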
