arXiv Analytics

arXiv:2102.11506 [cs.CV]

Comparative evaluation of CNN architectures for Image Caption Generation

Sulabh Katiyar, Samir Kumar Borgohain

Published 2021-02-23, Version 1

Aided by recent advances in Deep Learning, Image Caption Generation has seen tremendous progress over the last few years. Most methods use transfer learning to extract visual information, in the form of image features, with pre-trained Convolutional Neural Network (CNN) models, and then transform that information with a Caption Generator module to produce the output sentences. Different methods have used different CNN architectures and, to the best of our knowledge, no systematic study compares the relative efficacy of different CNN architectures for extracting the visual information. In this work, we evaluate 17 different CNNs on two popular Image Caption Generation frameworks: the first based on the Neural Image Caption (NIC) generation model and the second based on the Soft-Attention framework. We observe that the model complexity of a CNN, as measured by its number of parameters, and its accuracy on the Object Recognition task do not necessarily correlate with its efficacy at feature extraction for the Image Caption Generation task.
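The two frameworks named in the abstract differ in how the decoder consumes the CNN features: NIC-style models condition on a single global image vector, while Soft-Attention models re-weight a grid of region features at every decoding step. The following is a minimal sketch of that difference, not the authors' implementation. The random vectors stand in for real CNN outputs, mean pooling is assumed as the global feature, and the scoring function (a linear projection followed by `tanh`) along with the names `global_feature`, `soft_attention`, `w_region`, and `w_hidden` are hypothetical simplifications.

```python
import math
import random


def global_feature(regions):
    """NIC-style input: collapse all region vectors into one global vector.

    Mean pooling is an assumption here; it stands in for the CNN's final
    pooled or fully connected output.
    """
    dim = len(regions[0])
    return [sum(r[d] for r in regions) / len(regions) for d in range(dim)]


def soft_attention(regions, hidden, w_region, w_hidden):
    """Soft-Attention-style input for one decoding step.

    Scores each region against the decoder state, normalises the scores
    with a softmax, and returns (weights, context), where context is the
    weighted sum of the region vectors.
    """
    hidden_term = sum(w * h for w, h in zip(w_hidden, hidden))
    scores = [
        math.tanh(sum(w * a for w, a in zip(w_region, r)) + hidden_term)
        for r in regions
    ]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(regions[0])
    context = [
        sum(wt * r[d] for wt, r in zip(weights, regions)) for d in range(dim)
    ]
    return weights, context


# Toy stand-ins: 6 image regions with 4-dimensional features, and a
# 3-dimensional decoder state. Real CNN feature maps are much larger.
rng = random.Random(0)
NUM_REGIONS, FEAT_DIM, HIDDEN_DIM = 6, 4, 3
regions = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(NUM_REGIONS)]
hidden = [rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)]
w_region = [rng.uniform(-1, 1) for _ in range(FEAT_DIM)]
w_hidden = [rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)]

nic_input = global_feature(regions)
weights, context = soft_attention(regions, hidden, w_region, w_hidden)
```

In this sketch `nic_input` is computed once per image, whereas `soft_attention` would be called again with each new decoder state, so the attention-based decoder relies on the spatial layout of the CNN features and not only on their pooled summary.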

Comments: Article published in International Journal of Advanced Computer Science and Applications (IJACSA), Volume 11 Issue 12, 2020
Journal: International Journal of Advanced Computer Science and Applications, 11(12), 2020
Categories: cs.CV, cs.AI, cs.LG, cs.MM, cs.NE
Related articles:
arXiv:2211.03854 [cs.CV] (Published 2022-11-07)
Exploration of Convolutional Neural Network Architectures for Large Region Map Automation
arXiv:1110.2053 [cs.CV] (Published 2011-10-10, updated 2017-12-27)
Steps Towards a Theory of Visual Information: Active Perception, Signal-to-Symbol Conversion and the Interplay Between Sensing and Control
arXiv:2201.05545 [cs.CV] (Published 2022-01-14, updated 2022-04-05)
Multimodal registration of FISH and nanoSIMS images using convolutional neural network models