arXiv:2301.13310 [cs.LG]
Keywords: alternating updates, efficient transformers, language tasks demonstrate, deep transformer networks, sparse mixture-of-experts models