
arXiv:1811.00350 [cs.SD]

End-to-end Models with auditory attention in Multi-channel Keyword Spotting

Published 2018-11-01, Version 1

In this paper, we propose an attention-based end-to-end model for multi-channel keyword spotting (KWS), which is trained to optimize the KWS result directly. As a result, our model outperforms the baseline model with signal pre-processing techniques on both clean and noisy test data. We also found that multi-task learning yields better performance when the training and testing data are similar. Transfer learning and multi-target spectral mapping can dramatically improve robustness to noisy environments. At 0.1 false alarms (FA) per hour, the model with transfer learning and multi-target mapping gains an absolute 30% improvement in wake-up rate on noisy data with an SNR of about -20.
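The operating point quoted in the abstract (wake-up rate at a fixed number of false alarms per hour) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `wake_up_rate_at_fa`, its parameters, and the assumption that each negative score corresponds to one potential false trigger are all hypothetical and chosen only for illustration.

```python
def wake_up_rate_at_fa(pos_scores, neg_scores, neg_hours, fa_per_hour):
    """Return (wake-up rate, threshold) at a false-alarm budget.

    Hypothetical helper, not taken from the paper.
    pos_scores: detector scores for utterances that contain the keyword.
    neg_scores: detector scores for candidate triggers in keyword-free audio
                (assumed: one score per potential false alarm).
    neg_hours:  total duration of the keyword-free audio, in hours.
    """
    # Largest number of false alarms the budget allows over the negative audio.
    max_fa = int(fa_per_hour * neg_hours + 1e-9)

    # Rank negative scores from highest to lowest. A detection requires a
    # score strictly above the threshold, so placing the threshold at the
    # (max_fa + 1)-th highest negative score lets at most max_fa through.
    ranked = sorted(neg_scores, reverse=True)
    if max_fa >= len(ranked):
        threshold = float("-inf")  # budget is loose enough to accept everything
    else:
        threshold = ranked[max_fa]

    # Wake-up rate: fraction of keyword utterances scoring above the threshold.
    detected = sum(1 for s in pos_scores if s > threshold)
    return detected / len(pos_scores), threshold


# Toy data: 4 keyword utterances, 4 negative candidates over 10 hours of audio.
rate, threshold = wake_up_rate_at_fa(
    pos_scores=[0.9, 0.8, 0.4, 0.95],
    neg_scores=[0.1, 0.85, 0.3, 0.5],
    neg_hours=10.0,
    fa_per_hour=0.1,
)
print(rate, threshold)
```

With this toy data the budget of 0.1 FA per hour over 10 hours allows one false alarm, so the threshold falls at the second-highest negative score. Comparing two models at the same FA budget, as the abstract does, then amounts to comparing the first value returned for each model.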

Comments: Submitted to ICASSP 2019

Categories: cs.SD, eess.AS

Keywords: multi-channel keyword spotting, end-to-end model, auditory attention, testing data, kws result

Related articles:

arXiv:1901.00295 [cs.SD] (Published 2019-01-02)

End-to-End Model for Speech Enhancement by Consistent Spectrogram Masking

Xingjian Du, Mengyao Zhu, Xuan Shi, Xinpeng Zhang, Wen Zhang, Jingdong Chen

arXiv:2005.12412 [cs.SD] (Published 2020-05-25)

InfantNet: A Deep Neural Network for Analyzing Infant Vocalizations

Mohammad K. Ebrahimpour, Sara Schneider, David C. Noelle, Christopher T. Kello

arXiv:2102.03957 [cs.SD] (Published 2021-02-08)

Extracting the Locus of Attention at a Cocktail Party from Single-Trial EEG using a Joint CNN-LSTM Model