arXiv:1606.09636 [cs.CL]

Mesoscopic representation of texts as complex networks

Henrique F. de Arruda, Filipi N. Silva, Vanessa Q. Marinho, Diego R. Amancio, Luciano da F. Costa

Published 2016-06-30, Version 1

Texts are complex structures emerging from an intricate system consisting of syntactical constraints and semantical relationships. While the complete modeling of such structures is impractical owing to the high level of complexity inherent to linguistic constructions, under a limited domain, certain tasks can still be performed. Recently, statistical techniques aimed at the analysis of texts, referred to as text analytics, have departed from the use of simple word count statistics towards a new paradigm. Text mining now hinges on a more sophisticated set of methods, including the representation of texts as complex networks. In this perspective, nodes represent textual elements, typically words, and links are established via adjacency relationships. While current word-adjacency (co-occurrence) methods successfully grasp syntactical and stylistic features of written texts, they are unable to represent important aspects of textual data, such as its topical structure. As a consequence, the mesoscopic structure of texts is often overlooked by current methodologies. In order to grasp mesoscopic characteristics of semantical content in written texts, we devised a network approach which is able to analyze documents in a multi-scale, mesoscopic fashion. In the proposed model, a limited number of adjacent paragraphs are represented as nodes, which are connected whenever they share a minimum semantical content. To illustrate the capabilities of our model, we present, as a use case, a qualitative analysis of "Alice's Adventures in Wonderland", a novel by Lewis Carroll. We show that the mesoscopic structure of documents modeled as networks reveals many semantic traits of texts, a feature that could be explored in a myriad of semantic-based applications.
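
The model described in the abstract (groups of adjacent paragraphs as nodes, linked when they share enough semantic content) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the window size, or the linking criterion, so cosine similarity over raw word counts, the `window` and `threshold` parameters, and all function names below are assumptions made for illustration only.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def window_vectors(paragraphs, window=2):
    """Merge each run of `window` adjacent paragraphs into one bag of words.

    Each merged run becomes one node of the network (assumed sliding window).
    """
    vectors = []
    for start in range(len(paragraphs) - window + 1):
        chunk = " ".join(paragraphs[start:start + window])
        vectors.append(Counter(tokenize(chunk)))
    return vectors


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mesoscopic_network(paragraphs, window=2, threshold=0.3):
    """Return (number of nodes, {(i, j): similarity}) for the paragraph network.

    Two nodes are linked when their similarity reaches `threshold`,
    an assumed stand-in for "sharing a minimum semantical content".
    """
    vectors = window_vectors(paragraphs, window)
    edges = {}
    for i, j in combinations(range(len(vectors)), 2):
        similarity = cosine(vectors[i], vectors[j])
        if similarity >= threshold:
            edges[(i, j)] = similarity
    return len(vectors), edges
```

With `window` greater than 1, consecutive nodes overlap in the paragraphs they cover and are therefore similar by construction; a practical analysis would likely need to account for this, for example by weighting terms or by filtering links between overlapping windows.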

Related articles:
arXiv:1303.0350 [cs.CL] (Published 2013-03-02)
Structure-semantics interplay in complex networks and its effects on the predictability of similarity in texts
arXiv:2108.11810 [cs.CL] (Published 2021-08-26)
A Computational Approach to Measure Empathy and Theory-of-Mind from Written Texts
arXiv:1507.07826 [cs.CL] (Published 2015-07-28)
Classifying informative and imaginative prose using complex networks