Language & Literature

Keywords: computational linguistics, intertextuality, sound, stylistics, classics, authorship attribution
Spring 2009 - Present

Description

The study of intertextuality, the shaping of a text’s meaning by other texts, remains a labor-intensive process for the literary critic. Julia Kristeva, who coined the term intertext, suggested, “Any text is constructed as a mosaic of quotations; any text is the absorption and transformation of another”. Such transformations range from direct quotations, representing a simple and overt intertextuality, to more complex references that are intentionally or subconsciously absorbed into a text. In the years since Kristeva first drew attention to the phenomenon, its study has become increasingly, and in some cases debilitatingly, complex. As this theoretical complexity grows, so does the burden upon the practicing literary critic to verify suspected instances of intertextuality. The critic must command a large corpus of possible contributing works, yet objective criteria for measuring intertext are lacking. Since, in many cases, the problem is one of pattern recognition, the task of locating new relationships between texts and validating suspected ones is a good candidate for automated assistance by computers.

In this work, we propose the use of machine learning and related statistical methods to improve the process by which intertextuality is studied. Specifically, we bring to bear computational techniques from the field of stylistics in order to examine instances where an author who is familiar with a particular corpus deliberately or subconsciously reflects this familiarity in discrete passages within their own work. As features, we are particularly interested in the repetitive stylistic nature of sound-oriented texts. Through our analysis, we have established that authors make extensive use of repetitive sound to emphasize ideas or phrases, or to construct poetic forms.
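
As a rough illustration of what a sound-based stylistic feature might look like, the sketch below builds a relative-frequency profile of character bigrams for a passage and compares two profiles by cosine similarity. This is a minimal sketch under stated assumptions, not the feature set used in the publications listed below: plain character bigrams stand in as a crude proxy for repeated sound, and the function names `ngram_profile` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def ngram_profile(text, n=2):
    """Relative frequency of each character n-gram found inside words.

    Assumption: character n-grams are used as a crude proxy for sound;
    a phonetic transcription would be a closer fit for real analysis.
    """
    # Lowercase and keep runs of letters only (Unicode-aware, so Greek works too).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(
        word[i:i + n] for word in words for i in range(len(word) - n + 1)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two n-gram profiles (dicts of frequencies)."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


if __name__ == "__main__":
    a = ngram_profile("arma virumque cano")
    b = ngram_profile("arma gravi numero violentaque bella parabam")
    print(round(cosine_similarity(a, b), 3))
```

Profiles like these could serve as input vectors to a classifier or a nearest-neighbor comparison between passages; the choice of `n` and of letters rather than phonemes is illustrative only.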

This work is supported by NEH Digital Humanities Start-Up Grant Award No. HD-51570-12.

Publications

  • "Authorship Attribution for Social Media Forensics,"
    Anderson Rocha, Walter J. Scheirer, Christopher W. Forstall, Thiago Cavalcante, Antonio Theophilo, Bingyu Shen, Ariadne R. B. Carvalho, Efstathios Stamatatos,
    IEEE Transactions on Information Forensics and Security (T-IFS),
    Accepted August 2016.
  • "The Sense of a Connection: Automatic Tracing of Intertextuality by Meaning,"
    Walter J. Scheirer, Christopher W. Forstall, Neil Coffee,
    Digital Scholarship in the Humanities (DSH),
    April 2016.
  • "Evidence of Intertextuality: Investigating Paul the Deacon's Angustae Vitae,"
    Christopher W. Forstall, Sarah Jacobson, Walter J. Scheirer,
    Literary & Linguistic Computing (LLC),
    September 2011.
  • "Features from Frequency: Authorship and Stylistic Analysis Using Repetitive Sound,"
    Christopher W. Forstall, Walter J. Scheirer,
    Proceedings of the 4th Annual Chicago Colloquium on Digital Humanities and Computer Science (DHCS),
    November 2009.

Abstracts

  • "Euterpe's Hidden Song: Patterns in Elegy,"
    Walter J. Scheirer, Christopher W. Forstall,
    Digital Humanities 2014 (DH),
    July 2014.
  • "Modelling the Interpretation of Literary Allusion with Machine Learning Techniques,"
    Neil Coffee, James Gawley, Christopher W. Forstall, Walter J. Scheirer, David Johnson, Jason J. Corso, Brian Parks,
    Digital Humanities 2013 (DH),
    July 2013.
  • "Revealing Hidden Patterns in the Meter of Homer's Iliad,"
    Christopher W. Forstall, Walter J. Scheirer,
    The 7th Annual Chicago Colloquium on Digital Humanities and Computer Science (DHCS),
    November 2012.
  • "Visualizing Sound as Functional n-grams in Homeric Greek Poetry,"
    Christopher W. Forstall, Walter J. Scheirer,
    Digital Humanities 2011 (DH),
    June 2011.
  • "A Statistical Study of Latin Elegiac Couplets,"
    Christopher W. Forstall, Walter J. Scheirer,
    The 5th Annual Chicago Colloquium on Digital Humanities and Computer Science (DHCS),
    November 2010.

Presentations

Demos

Tesserae: Intertextual Phrase Matching

Tesserae is a freely available tool for detecting allusions in literary text. It is a collaboration with the Department of Classics at UB.

Code

  • The Tesserae code is available on GitHub