arXiv Analytics

arXiv:1710.10755 [cs.CV]

Modeling Attention in Panoramic Video: A Deep Reinforcement Learning Approach

Mai Xu, Yuhang Song, Jianyi Wang, Minglang Qiao, Liangyu Huo, Zulin Wang

Published 2017-10-30, Version 1

Panoramic video provides an immersive and interactive experience by enabling humans to control the field of view (FoV) through head movement (HM). Thus, HM plays a key role in modeling human attention on panoramic video. This paper establishes a database collecting subjects' HM positions on panoramic video sequences. From this database, we find that the HM data are highly consistent across subjects. Furthermore, we find that deep reinforcement learning (DRL) can be applied to predict HM positions, via maximizing the reward of imitating human HM scanpaths through the agent's actions. Based on our findings, we propose a DRL based HM prediction (DHP) approach with offline and online versions, called offline-DHP and online-DHP, respectively. In offline-DHP, multiple DRL workflows are run to determine some potential HM positions at each panoramic frame. Then, a heat map of the potential HM positions, called the HM map, is generated as the output of offline-DHP. In online-DHP, the next HM position of one subject is estimated given the currently observed HM position. Online-DHP is achieved by developing a DRL algorithm, upon the learned model of offline-DHP. Finally, the experimental results validate that the offline-DHP and online-DHP are effective in offline and online prediction of HM positions for panoramic video, and that the learned offline-DHP model can improve the performance of online-DHP.

Related articles:
arXiv:1901.00979 [cs.CV] (Published 2019-01-04)
Unsupervised Learning of Depth and Ego-Motion from Panoramic Video
arXiv:2403.17708 [cs.CV] (Published 2024-03-26)
Panonut360: A Head and Eye Tracking Dataset for Panoramic Video
arXiv:2012.12104 [cs.CV] (Published 2020-12-09)
A Deep Reinforcement Learning Approach for Ramp Metering Based on Traffic Video Data