Language & Literature

Keywords: computational linguistics, intertextuality, sound, stylistics, classics, historical document analysis
Spring 2009 - Present

Description

The study of intertextuality, the shaping of a text’s meaning by other texts, remains a labor-intensive process for the literary critic. Julia Kristeva, who coined the term intertext, suggested, "Any text is constructed as a mosaic of quotations; any text is the absorption and transformation of another." Such transformations range from direct quotations, representing a simple and overt intertextuality, to more complex references that are intentionally or subconsciously absorbed into a text. In the years since Kristeva first drew attention to the phenomenon, the field of its study has become increasingly, and in some cases debilitatingly, complex. As this theoretical complexity grows, so does the burden upon the practicing literary critic to verify suspected instances of intertextuality. The critic must command a large corpus of possible contributing works; meanwhile, objective criteria by which intertext may be measured are lacking. Since, in many cases, the problem is one of pattern recognition, the task of locating new relationships between texts and validating suspected ones is a good candidate for automated assistance by computers.

In this work, we propose the use of machine learning and related statistical methods to improve the process by which intertextuality is studied. Specifically, we bring to bear computational techniques from the field of stylistics in order to examine instances where an author who is familiar with a particular corpus deliberately or subconsciously reflects this in discrete passages within their own work. When choosing features, we are particularly interested in the repetitive stylistic nature of sound-oriented texts. Through our analysis, we have established that authors make extensive use of repetitive sound to emphasize ideas or phrases, or to construct poetic forms.
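As a rough illustration of what a sound-oriented stylistic feature might look like, the sketch below counts character n-grams inside words and reports their relative frequencies. It assumes that written letters are an acceptable stand-in for sound, and the function names (char_ngram_frequencies, top_ngrams) are hypothetical; it is not the feature extraction used in the publications listed below.

```python
from collections import Counter


def char_ngram_frequencies(text: str, n: int = 2) -> dict[str, float]:
    """Relative frequency of each within-word character n-gram in `text`.

    Assumption: letters approximate sounds; no phonetic transcription is done.
    """
    counts: Counter[str] = Counter()
    for word in text.lower().split():
        # Keep alphabetic characters only, so punctuation does not form n-grams.
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def top_ngrams(text: str, n: int = 2, k: int = 5) -> list[tuple[str, float]]:
    """The k most frequent n-grams, ties broken alphabetically."""
    freqs = char_ngram_frequencies(text, n)
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    line = "arma virumque cano, Troiae qui primus ab oris"
    for gram, freq in top_ngrams(line, n=2, k=5):
        print(f"{gram}\t{freq:.3f}")
```

Frequency vectors of this kind can be compared across passages or authors with any standard classifier or distance measure; which n-gram length and which comparison method work best is left open here.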

A second avenue of research is the application of AI-based historical document analysis to produce machine-readable representations of text from digital images. In archives scattered around the globe, old manuscripts can be found piled up to the ceiling and spread out as far as the eye can see. The amount of writing produced on physical media since antiquity is staggering, and very little of it has been digitized and transcribed into plain text for researchers to study using modern data mining tools. Work in the digital humanities has sought to address this problem by deploying everything from off-the-shelf optical character recognition (OCR) tools to state-of-the-art convolutional neural network-based transcription pipelines. However, such work has been underpinned by the long-standing, yet incorrect, belief that computer vision has solved handwritten document transcription. The open nature of this problem, coupled with a difficult data domain that has largely remained the realm of specialist scholars, makes it a fascinating case study for testing the capabilities of artificial intelligence.

This work is supported by NEH Digital Humanities Start-Up Grant Award No. HD-51570-12, NEH Digital Humanities Advancement Grant No. HAA-258767-18, and the Andrew W. Mellon Foundation.

Publications

  • "The Paleographer’s Eye ex machina:
    Using Computer Vision to Assist Humanists in Scribal Hand Identification,"
    Samuel Grieggs, Cai Henderson, Sebastian Sobecki, Alexandra Gillespie, Walter Scheirer,
    Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision
    (WACV),
    January 2024.
  • "Automated Transcription of Gə'əz Manuscripts Using Deep Learning,"
    Samuel Grieggs, Jessica Lockhart, Alexandra Atiya, Suzanne Akbari,
    Eyob Derillo, Jarod Jacobs, Christine Kwon, Michael Gervers, Steve Delamarter,
    Alexandra Gillespie, Walter J. Scheirer,
    Digital Humanities Quarterly,
    August 2023.
  • "The Tesserae Intertext Service,"
    Nozomu Okuda, Jeffery Kinnison, Patrick Burns, Neil Coffee, Walter J. Scheirer,
    Digital Humanities Quarterly,
    April 2022.
  • "Measuring Human Perception to Improve Handwritten Document Transcription,"
    Samuel Grieggs, Bingyu Shen, Greta Rauch, Pei Li, Jiaqi Ma, David Chiang,
    Brian Price, Walter J. Scheirer,
    IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI),
    Accepted for Publication in June 2021.
  • "Practical Text Phylogeny for Real-World Settings,"
    Bingyu Shen, Christopher W. Forstall, Anderson Rocha, Walter J. Scheirer,
    IEEE Access,
    December 2018.
  • "Coupling Story to Visualization: Using Textual Analysis as a Bridge Between Data
    and Interpretation,"
    Ronald Metoyer, Qiyu Zhi, Bart Janczuk, Walter J. Scheirer,
    Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI),
    March 2018.
  • "Authorship Attribution for Social Media Forensics,"
    Anderson Rocha, Walter J. Scheirer, Christopher W. Forstall, Thiago Cavalcante, Antonio Theophilo,
    Bingyu Shen, Ariadne R. B. Carvalho, Efstathios Stamatatos,
    IEEE Transactions on Information Forensics and Security (T-IFS),
    January 2017.
  • "The Sense of a Connection: Automatic Tracing of Intertextuality by Meaning,"
    Walter J. Scheirer, Christopher W. Forstall, Neil Coffee,
    Digital Scholarship in the Humanities (DSH),
    April 2016.
  • "Evidence of Intertextuality: Investigating Paul the Deacon's Angustae Vitae,"
    Christopher W. Forstall, Sarah Jacobson, Walter J. Scheirer,
    Literary & Linguistic Computing (LLC),
    September 2011.
  • "Features from Frequency: Authorship and Stylistic Analysis Using Repetitive Sound,"
    Christopher W. Forstall, Walter J. Scheirer,
    Proceedings of the 4th Annual Chicago Colloquium on Digital Humanities and Computer Science (DHCS),
    November 2009.

Abstracts

  • "Verba Volant, Scripta Manent: Approaching the Automatic Transcription of Medieval Manuscript,"
    Samuel Grieggs, Bingyu Shen, Hildegund Müller, Christine Ascik, Erik Ellis, Mihow McKenny, Nikolas Churik,
    Emily Mahan, Walter J. Scheirer,
    Digital Humanities 2018 (DH),
    July 2018.
  • "Euterpe's Hidden Song: Patterns in Elegy,"
    Walter J. Scheirer, Christopher W. Forstall,
    Digital Humanities 2014 (DH),
    July 2014.
  • "Modelling the Interpretation of Literary Allusion with Machine Learning Techniques,"
    Neil Coffee, James Gawley, Christopher W. Forstall, Walter J. Scheirer, David Johnson, Jason J. Corso, Brian Parks,
    Digital Humanities 2013 (DH),
    July 2013.
  • "Revealing Hidden Patterns in the Meter of Homer's Iliad,"
    Christopher W. Forstall, Walter J. Scheirer,
    The 7th Annual Chicago Colloquium on Digital Humanities and Computer Science (DHCS),
    November 2012.
  • "Visualizing Sound as Functional n-grams in Homeric Greek Poetry,"
    Christopher W. Forstall, Walter J. Scheirer,
    Digital Humanities 2011 (DH),
    June 2011.
  • "A Statistical Study of Latin Elegiac Couplets,"
    Christopher W. Forstall, Walter J. Scheirer,
    The 5th Annual Chicago Colloquium on Digital Humanities and Computer Science (DHCS),
    November 2010.

Demos

Tesserae: Intertextual Phrase Matching

Tesserae is a freely available tool for detecting allusions in literary text. It is a collaboration with the Department of Classics at UB.
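To give a sense of what phrase matching between two texts can involve, here is a minimal sketch that flags pairs of phrases sharing at least two distinct words. This is an illustrative simplification and not the Tesserae implementation: the punctuation-based phrase splitting, the exact word-form comparison (no lemmatization), the min_shared threshold, and all function names are assumptions made for the example.

```python
import re
from itertools import product


def split_phrases(text: str) -> list[str]:
    """Split text into phrases at sentence-level punctuation (an assumption)."""
    parts = re.split(r"[.;:!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokens(phrase: str) -> set[str]:
    """Lowercased set of alphabetic word forms in a phrase."""
    return set(re.findall(r"[^\W\d_]+", phrase.lower()))


def match_phrases(source: str, target: str, min_shared: int = 2):
    """Return (source phrase, target phrase, shared words) for candidate pairs.

    A pair is a candidate when the two phrases share at least `min_shared`
    distinct word forms. Exact forms are compared; inflected variants
    of the same word are NOT matched in this sketch.
    """
    matches = []
    for s, t in product(split_phrases(source), split_phrases(target)):
        shared = tokens(s) & tokens(t)
        if len(shared) >= min_shared:
            matches.append((s, t, sorted(shared)))
    return matches


if __name__ == "__main__":
    src = "arma virumque cano. Troiae qui primus ab oris"
    tgt = "arma gravi numero; cano arma virum"
    for s, t, shared in match_phrases(src, tgt):
        print(f"{s!r} ~ {t!r} via {shared}")
```

Candidate pairs found this way still need a human reader, or a scoring step, to decide which ones are meaningful allusions rather than coincidental overlap.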

Code

  • The Tesserae code is available on GitHub