arXiv:cs/0001020 [cs.CL]

Exploiting Syntactic Structure for Natural Language Modeling

Ciprian Chelba

Published 2000-01-24 (Version 1)

The thesis presents an attempt at using the syntactic structure of natural language to build improved language models for speech recognition. The structured language model merges techniques from automatic parsing and language modeling by means of an original probabilistic parameterization of a shift-reduce parser. A maximum-likelihood reestimation procedure belonging to the class of expectation-maximization algorithms is employed for training the model. Experiments on the Wall Street Journal, Switchboard, and Broadcast News corpora show improvements in both perplexity and word error rate (measured through word lattice rescoring) over the standard 3-gram language model. The significance of the thesis lies in presenting an original approach to language modeling that exploits the hierarchical (syntactic) structure of natural language to improve on current 3-gram modeling techniques for large vocabulary speech recognition.
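For concreteness, the factorization below is a hedged sketch of how such a shift-reduce parameterization can assign a joint probability to a word sequence W and a parse T. The notation, in particular the conditioning of each component model on the two most recently exposed head words h_0 and h_{-1}, is drawn from the structured language model literature associated with this thesis, not from the abstract itself:

% Assumed notation: w_k is the k-th word, t_k its POS tag, p_i^k the
% i-th parser action (shift/reduce) taken after word k, and h_0, h_{-1}
% the two most recently exposed syntactic heads at that point.
P(W, T) = \prod_{k=1}^{n+1}
    \Big[ P(w_k \mid h_0, h_{-1}) \,
          P(t_k \mid w_k, h_0.\mathrm{tag}, h_{-1}.\mathrm{tag}) \,
          \prod_i P(p_i^k \mid h_0, h_{-1}) \Big],
\qquad
P(W) = \sum_{T} P(W, T).

Since the exact sum over parses is intractable, models of this kind typically retain only a beam of high-probability partial parses; the EM reestimation mentioned above then updates the component models (word predictor, tagger, parser) from expected counts collected over that beam.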

Comments: Ph.D. thesis (advisor: Frederick Jelinek), 122 pages; removed unused .eps file
Categories: cs.CL
Subjects: G.3, I.2.7, I.5.1, I.5.4
Related articles:
arXiv:1610.09975 [cs.CL] (Published 2016-10-31)
Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition
arXiv:2010.16368 [cs.CL] (Published 2020-10-30)
Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition
arXiv:2208.03067 [cs.CL] (Published 2022-08-05)
Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning