arXiv Analytics

arXiv:2001.05853 [cs.CV]

Identifying Table Structure in Documents using Conditional Generative Adversarial Networks

Nataliya Le Vine, Claus Horn, Matthew Zeigenfuse, Mark Rowan

Published 2020-01-13, Version 1

In many industries, as well as in academic research, information is primarily transmitted in the form of unstructured documents (this article, for example). Hierarchically related data is rendered as tables, and extracting information from tables in such documents presents a significant challenge. Many existing methods take a bottom-up approach, first integrating lines into cells, then cells into rows or columns, and finally inferring a structure from the resulting 2-D layout. But such approaches neglect the available prior information relating to table structure, namely that the table is merely an arbitrary representation of a latent logical structure. We propose a top-down approach, first using a conditional generative adversarial network to map a table image into a standardised 'skeleton' table form denoting approximate row and column borders without table content, then deriving latent table structure using xy-cut projection and Genetic Algorithm optimisation. The approach is easily adaptable to different table configurations and requires only small data sets for training.
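The projection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the skeleton is available as a binary 2-D array (1 on a border pixel, 0 elsewhere) and runs a single pass of projection-profile analysis, the building block of xy-cut. The function names `find_borders` and `projection_borders`, the `frac` threshold, and the toy image are all hypothetical and not taken from the paper. The recursive cutting and the Genetic Algorithm optimisation are not shown.

```python
import numpy as np


def find_borders(profile, threshold):
    """Return the centre index of each run where profile >= threshold."""
    hits = profile >= threshold
    borders = []
    start = None
    for i, hit in enumerate(hits):
        if hit and start is None:
            start = i  # a new run of border pixels begins
        elif not hit and start is not None:
            borders.append((start + i - 1) // 2)  # centre of the finished run
            start = None
    if start is not None:
        # the last run extends to the edge of the image
        borders.append((start + len(hits) - 1) // 2)
    return borders


def projection_borders(skeleton, frac=0.5):
    """Locate row and column borders in a binary skeleton image.

    skeleton: 2-D array, 1 where a border pixel is drawn, 0 elsewhere.
    frac: assumed fraction of the image width (or height) that a line
          must cover to count as a border.
    """
    skeleton = np.asarray(skeleton)
    height, width = skeleton.shape
    row_profile = skeleton.sum(axis=1)  # border pixels in each image row
    col_profile = skeleton.sum(axis=0)  # border pixels in each image column
    row_borders = find_borders(row_profile, frac * width)
    col_borders = find_borders(col_profile, frac * height)
    return row_borders, col_borders


# Toy skeleton: a 2 x 2 grid of cells drawn with one-pixel borders.
img = np.zeros((9, 13), dtype=int)
img[[0, 4, 8], :] = 1   # horizontal borders
img[:, [0, 6, 12]] = 1  # vertical borders
rows, cols = projection_borders(img)
print(rows, cols)  # [0, 4, 8] [0, 6, 12]
```

In this sketch, each pair of adjacent borders delimits one row or column of the grid. A real skeleton produced by a generative model would be noisy, so the threshold would likely need tuning.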

Comments: arXiv admin note: substantial text overlap with arXiv:1904.01947
Categories: cs.CV, cs.NE
Related articles:
arXiv:2210.14392 [cs.CV] (Published 2022-10-26)
Zero-Shot Learning of a Conditional Generative Adversarial Network for Data-Free Network Quantization
arXiv:1701.05957 [cs.CV] (Published 2017-01-21)
Image De-raining Using a Conditional Generative Adversarial Network
arXiv:2301.08067 [cs.CV] (Published 2023-01-19)
Interpreting CNN Predictions using Conditional Generative Adversarial Networks