arXiv:1812.03283 [cs.CV]

Attend More Times for Image Captioning

Jiajun Du, Yu Qin, Hongtao Lu, Yonghua Zhang

Published 2018-12-08 (Version 1)

Most attention-based image captioning models attend to the image once per word. However, attending only once per word is rigid and can easily miss information. Attending more times allows the model to adjust the attention position, recover the missed information, and avoid generating a wrong word. In this paper, we show that attending more times per word yields improvements on the image captioning task. We propose a flexible two-LSTM merge model that makes it convenient to encode more attention steps than words. Our captioning model uses two LSTMs to encode the word sequence and the attention sequence, respectively. The information from the two LSTMs and the image feature is combined to predict the next word. Experiments on the MSCOCO caption dataset show that our method outperforms the state of the art. Using bottom-up features and self-critical training, our method achieves BLEU-4, METEOR, ROUGE-L and CIDEr scores of 0.381, 0.283, 0.580 and 1.261 on the Karpathy test split.
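The abstract describes a decoder in which one LSTM runs over the word sequence, a second LSTM runs over the attention sequence (stepped more than once per word), and both hidden states are merged with the image feature to predict the next word. The sketch below is a rough PyTorch illustration of that idea only; the class name, layer sizes, number of attention sub-steps, the additive attention, and the concatenation-based merge are all assumptions made for the example, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of a two-LSTM "merge" decoder step.
# All names, dimensions, and the attend_steps default are illustrative assumptions.
import torch
import torch.nn as nn

class TwoLSTMMergeDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512, feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # LSTM over the word sequence (one step per generated word)
        self.word_lstm = nn.LSTMCell(embed_dim, hidden_dim)
        # LSTM over the attention sequence (may be stepped several times per word)
        self.attn_lstm = nn.LSTMCell(feat_dim, hidden_dim)
        # Simple additive attention over image region features
        self.att_q = nn.Linear(hidden_dim, hidden_dim)
        self.att_k = nn.Linear(feat_dim, hidden_dim)
        self.att_v = nn.Linear(hidden_dim, 1)
        # Merge both LSTM states with a global image feature to predict the word
        self.classifier = nn.Linear(2 * hidden_dim + feat_dim, vocab_size)

    def attend(self, h, regions):
        # regions: (batch, num_regions, feat_dim); h: (batch, hidden_dim)
        scores = self.att_v(torch.tanh(self.att_q(h).unsqueeze(1) + self.att_k(regions)))
        weights = torch.softmax(scores, dim=1)          # (batch, num_regions, 1)
        return (weights * regions).sum(dim=1)           # attended feature

    def step(self, word_ids, regions, word_state, attn_state, attn_steps=2):
        # Advance the word LSTM once per generated word.
        h_w, c_w = self.word_lstm(self.embed(word_ids), word_state)
        # Advance the attention LSTM several times per word ("attend more times"),
        # refining the attended feature at each sub-step.
        h_a, c_a = attn_state
        for _ in range(attn_steps):
            context = self.attend(h_a, regions)
            h_a, c_a = self.attn_lstm(context, (h_a, c_a))
        # Merge both hidden states with the mean image feature to predict the next word.
        img_feat = regions.mean(dim=1)
        logits = self.classifier(torch.cat([h_w, h_a, img_feat], dim=-1))
        return logits, (h_w, c_w), (h_a, c_a)
```

At decoding time one would call `step` once per word, passing the previously generated token and the detected-region features (e.g. bottom-up features), and sample the next word from `logits`; increasing `attn_steps` corresponds to attending more times per word.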

Related articles:
arXiv:1602.01228 [cs.CV] (Published 2016-02-03)
Image and Information
arXiv:1812.02524 [cs.CV] (Published 2018-12-06)
Towards Leveraging the Information of Gradients in Optimization-based Adversarial Attack
arXiv:2108.03852 [cs.CV] (Published 2021-08-09)
Complementary Patch for Weakly Supervised Semantic Segmentation