arXiv Analytics

arXiv:2309.15826 [cs.CL]

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe

Published 2023-09-27 (Version 1)

Recent works in end-to-end speech-to-text translation (ST) have proposed multi-tasking methods with soft parameter sharing, which leverage machine translation (MT) data via secondary encoders that map text inputs to an eventual cross-modal representation. In this work, we instead propose an ST/MT multi-tasking framework with hard parameter sharing, in which all model parameters are shared cross-modally. Our method reduces the speech-text modality gap via a pre-processing stage that converts speech and text inputs into two discrete token sequences of similar length -- this allows models to process both modalities indiscriminately using a single joint vocabulary. With experiments on MuST-C, we demonstrate that our multi-tasking framework improves attentional encoder-decoder, Connectionist Temporal Classification (CTC), transducer, and joint CTC/attention models by an average of +0.5 BLEU without any external MT data. Further, we show that this framework can incorporate external MT data, yielding +0.8 BLEU, and also improves transfer learning from pre-trained textual models, yielding +1.8 BLEU.
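
To illustrate the hard-parameter-sharing idea described in the abstract, the short Python sketch below (not the authors' implementation) shows how discretized speech units and text subwords can be mapped into one joint vocabulary, so that a single set of model parameters can consume either modality. The unit inventory, subword list, and encode helper are hypothetical placeholders, not the paper's actual tokenization or vocabulary.

    # Minimal sketch (not the paper's code) of cross-modal hard parameter sharing:
    # speech and text are pre-processed into discrete token sequences drawn from
    # one joint vocabulary, so a single shared model can train on both.

    # Hypothetical discrete speech units (e.g. cluster indices from a
    # self-supervised speech model) and hypothetical text subwords.
    SPEECH_UNITS = [f"<unit_{i}>" for i in range(8)]                      # placeholder unit inventory
    TEXT_SUBWORDS = ["_we", "_are", "_stu", "dents", "_nous", "_sommes"]  # placeholder subwords
    JOINT_VOCAB = {tok: idx for idx, tok in enumerate(SPEECH_UNITS + TEXT_SUBWORDS)}

    def encode(tokens):
        """Map any discrete token sequence (speech units or subwords) to joint-vocabulary IDs."""
        return [JOINT_VOCAB[t] for t in tokens]

    # An ST example: discretized source speech on the input side.
    st_source = ["<unit_3>", "<unit_3>", "<unit_5>", "<unit_1>"]
    # An MT example: source-language subwords on the input side.
    mt_source = ["_nous", "_sommes"]

    # Both map into the same ID space, so an encoder-decoder with fully shared
    # (hard-shared) parameters can be trained on interleaved ST and MT batches.
    print(encode(st_source))   # [3, 3, 5, 1]
    print(encode(mt_source))   # [12, 13]

Because both modalities arrive as ID sequences over the same vocabulary and of broadly similar length, no modality-specific encoder is needed; this is the contrast with the soft-parameter-sharing approaches mentioned at the start of the abstract.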

Related articles:
arXiv:1702.03856 [cs.CL] (Published 2017-02-13)
Towards speech-to-text translation without speech recognition
arXiv:2407.03169 [cs.CL] (Published 2024-07-03)
Investigating Decoder-only Large Language Models for Speech-to-text Translation
arXiv:1912.07240 [cs.CL] (Published 2019-12-16)
Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding
Yuchen Liu et al.