arXiv:1912.01192 [cs.LG]
Keywords: learning adversarial MDPs, bandit feedback, episodic finite-horizon Markov decision processes, ensure sub-linear regret, optimistic loss estimator