Network-based approaches for authorship attribution

15 January 2019

The problem of authorship attribution (AA) involves matching a text of unknown authorship with its creator, found among a pool of candidate authors. In this work, we examine in detail authorship attribution methods that rely on networks of function words to detect an “authorial fingerprint” of literary works. Previous studies interpreted these word adjacency networks (WANs) as Markov chains, giving transition rates between function words, and compared them using information-theoretic measures. Here, we apply a variety of network flow-based tools, such as role-based similarity and community detection, to compare the WANs directly. These tools reveal a relation between communities of function words and grammatical categories. Moreover, we propose two new criteria for attribution, based on the comparison of connectivity patterns and on the similarity of network partitions. The results are positive, but we observe that the attribution context is a limiting factor that is often overlooked in the field's literature. Finally, we outline new directions that deserve further consideration.
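
To make the WAN construction concrete, the following is a minimal sketch of how a text might be turned into a directed network of function words and then row-normalised into a Markov chain. It is an illustration under stated assumptions, not the procedure used in this work or in the previous studies: the word list `FUNCTION_WORDS`, the `window` size, the unweighted counting within the window, and the function names `build_wan` and `to_markov_chain` are all hypothetical choices made for the example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny function-word list (assumption for illustration).
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "that", "but"]


def build_wan(text, function_words=FUNCTION_WORDS, window=5):
    """Count directed co-occurrences of function words.

    An edge (source, target) gains weight 1 each time `target` appears
    within `window` tokens after `source`. The window size and the
    uniform weighting are assumptions, not taken from the source.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    vocab = set(function_words)
    counts = defaultdict(float)
    for i, source in enumerate(tokens):
        if source not in vocab:
            continue
        for target in tokens[i + 1 : i + 1 + window]:
            if target in vocab:
                counts[(source, target)] += 1.0
    return counts


def to_markov_chain(counts, function_words=FUNCTION_WORDS):
    """Row-normalise edge weights so each non-empty row sums to 1."""
    index = {word: k for k, word in enumerate(function_words)}
    n = len(function_words)
    matrix = [[0.0] * n for _ in range(n)]
    for (source, target), weight in counts.items():
        matrix[index[source]][index[target]] += weight
    for row in matrix:
        total = sum(row)
        if total > 0:  # rows for unseen words are left as all zeros
            for j in range(n):
                row[j] /= total
    return matrix


if __name__ == "__main__":
    wan = build_wan("the cat and the dog of the house", window=2)
    chain = to_markov_chain(wan)
    for word, row in zip(FUNCTION_WORDS, chain):
        print(word, row)
```

Each row of the resulting matrix can be read as the transition distribution out of one function word. Two texts could then be compared either through such distributions or, as proposed here, by analysing the networks themselves.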