arXiv:2311.17043 [cs.CV]
Keywords: large language models, context token, content token encapsulates visual cues, long videos, llama-vid empowers existing frameworks
Tags: github project