arXiv:1712.05652 [cs.LG]

Keywords: pre-training attention mechanisms, recurrent neural networks, work draws inspiration, parallels results, differentiable attention mechanisms