Computational Analysis of Multilingual Book Reviews

Workshop Description

Aims and Setup

Cultural practices are increasingly relevant in the digital sphere, and the proposed workshop addresses this with a multilingual investigation of fiction book reviewing. The idea is to give participants hands-on experience of the full workflow involved in researching reader response, using book reviews and online comments as a proxy for reception. Data will be provided, and each part of the workshop will cover a different stage of the research workflow, addressing theoretical, methodological, and interpretative challenges. Participants will learn from experienced scholars how to conduct computational reader response studies with advanced NLP methods, statistical modelling, and a sensibility for handling both literary works and social media data. We propose a full-day workshop consisting of 4 sessions. The workshop is aimed particularly at early career researchers who want an end-to-end overview of a computational humanities project or are interested in reader response. However, senior researchers are very welcome to attend and discuss theory, the operationalization of concepts, and methodological choices.

Expected Outcomes

  • Participants will develop a deeper understanding of the theoretical and methodological issues related to the computational analysis of literary texts and reader response data.
  • Participants will have a collection of Jupyter notebooks that they can reuse for their own research.
  • The organizers will receive feedback on the perceived validity of the tools and methods used.

Background

Reading books and communicating about them are among the ways people have long reflected on life and values, played with alternative ideas, regulated emotions, and enjoyed aesthetic pleasure. They also relate to each other through these practices: intellectually, communicatively and emotionally. Today, book reception practices have become digital at an unprecedented scale: millions of readers interact with each other on digital reviewing platforms [1]. While collections of printed books have been developed and maintained for centuries, archives of ordinary readers’ responses to books have rarely been preserved. Due to a lack of empirical and historical research evidence, many aspects of readership and reception history remained theoretical and/or anecdotal until the 21st century. In the last two decades, researchers have been uncovering historical archives [2] and developing contemporary book reception corpora in support of empirical investigations into reading behavior, reader response, reception, literary appreciation, etc. [3, 4, 5]. User-generated book reviews on social reading/reviewing websites such as Amazon and Goodreads meet this need, and have opened up unprecedented research possibilities to study what people feel and think about the books they read and how they share their reading experiences with others. This information available online offers several opportunities: readers can more easily socialize with people who have similar interests, libraries can make better informed decisions when providing services to their patrons, publishers can gain insight into their readership and tailor their editorial and marketing strategies, booksellers can develop more accurate recommender systems, and new business models have been created by platforms that offer successful content to media industries like Netflix.

During this workshop, on the one hand, we will discuss theoretical approaches to modelling book reviewing and book reception across different global and cultural spheres; on the other hand, we will examine which methodologies need to be developed between ‘close reading’ of individual reviews and computational analysis of thousands of reviews.

GitHub Repository: IGELsociety/CHR2024-book-reviews-workshop

Instructors

  • Federico Pianzola - is Assistant Professor of Computational Humanities at the University of Groningen (Netherlands), where he coordinates the Master’s in Digital Humanities. He is the Principal Investigator of the ERC Starting Grant project GOLEM (Graphs and Ontologies for Literary Evolution Models). He is the author of the book Digital Social Reading: Sharing Fiction in the 21st Century (MIT Press, 2024).

  • Berenike Herrmann - is professor of Newer German Literature with a specialization in Literary Theory and Digital Humanities at Bielefeld University (Germany). She is head of the Working Group Corpus Literary Studies and PI at two Collaborative Research Centres: ‘Practices of Comparing: Ordering and Changing the World’ (SFB 1288), and ‘Linguistic Creativity in Communication’ (SFB 1646). Her research focuses on literary style, spatial and affective literary studies, and on questions of historical change from a praxeological perspective. To this end, she uses and develops computational, empirical and mixed-methods approaches. Berenike is known for her interdisciplinary literary-linguistic perspective. Berenike is Acting Chair of the Scientific Coordination Committee ‘Collections’ of the national research data infrastructure NFDI Text+, Speaker of the Community of Practice “Data Literacy” of BiLinked at Bielefeld University, and board member of the international ADHO Special Interest Group “Digital Literary Stylistics” (SIG DLS). She is currently chair of the local organization team of the DHd (Digital Humanities in the German-speaking countries) at Bielefeld.

  • Joris van Zundert - is senior researcher and developer in humanities computing in the department of literary studies and the Digital Humanities Lab at the Huygens Institute of the Royal Netherlands Academy of Arts and Sciences (KNAW) in Amsterdam. His research focuses on computational algorithms to analyze literary and historical texts, and on aspects of humanities information and data modeling. His computational analytic work focuses on the correlation between text-immanent features and sociological processes around the concept of literature. He is also involved in developing computational approaches to stemmatology, narratology, and scholarly editions.

  • Katja Tereshko - is Postdoctoral Researcher at the Huygens Institute in the Netherlands. Within the Impact and Fiction project, she works on stylistic aspects of books that can affect readers. She currently focuses on tense choice in books, the conventions governing it in different genres, and readers’ responses to unconventional tense use. Katja is also actively involved in creating an annotation scheme for online Dutch book reviews. With a background in Dutch language and literature, she is also interested in learners of Dutch as a foreign language as readers, and in their reactions to books in Dutch. She has also published, in Russian, the monograph Animal imagery in Dutch phraseology and culture (2022).

  • Simone Rebora - is an associate professor of comparative literature at the University of Verona. As a postdoc, he worked at the Universities of Mainz, Bielefeld, Göttingen, and Basel. His main research interests are theory and history of literary historiography, reader response studies, and computational literary studies. His essays have been published in journals such as “PLOS ONE”, “Digital Scholarship in the Humanities”, and “Modern Language Notes”. In Italian, he published the monographs Claudio Magris (2015) and History/Histoire e Digital Humanities (2018).

  • Yuri Bizzoni - is a postdoc researcher in literary computing at Aarhus University. He has also worked as a postdoc at the University of Saarland, Saarbrücken, Germany, on translation and scientific language. His main research interests are literary reception and the concept of literary quality, creative writing, and the perception of canonical literature. His work has been published in journals such as Digital Humanities Quarterly and the Journal of Computational Literary Studies, as well as at conferences such as CHR, NLP4DH, LREC, and several ACL venues. He published the monograph Detection and Aptness (2019) and the thesis The Italian Homer (2015).

  • Kristoffer Nielbo - is professor of Humanities Computing at Aarhus University, Denmark. He has conducted basic research in natural language and developed research infrastructures for more than a decade. His expertise lies in developing AI tools designed to analyze extensive text databases. Notably, his contributions to applied natural language processing have been recognized with awards and have resulted in creating multiple tools for large-scale text data analysis, particularly in Danish. Furthermore, he has developed computational and data infrastructures that are used widely in Denmark and Scandinavia. Prof. Nielbo leads a large research center and infrastructure service at Aarhus University.

  • Yuerong Hu - is an assistant professor at the Department of Information and Library Science of the Luddy School of Informatics, Computing, and Engineering, at Indiana University Bloomington. Her primary research areas are digital humanities and cultural analytics. Specifically, she investigates (1) critical scholarly usage of digitized books; and (2) complexities of online book reviews. Her work has been published in journals such as the International Journal on Digital Libraries, College and Research Libraries, and Information Development, as well as conferences including CHR, ADHO annual conference, ACM/IEEE JCDL, and iConference.

Tutorial Content

Session 0: Introduction

Facilitators: Berenike Herrmann, Federico Pianzola, Yuerong Hu
Topic:
Computational reader response studies with a multilingual perspective and learning objectives of the workshop.

Session 1: Annotation

Facilitators: Simone Rebora, Peter Boot, Katja Tereshko, Yuerong Hu
Topics:
- Inter-annotator agreement.
- Discussion of the annotation scheme.
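
As a rough illustration of what measuring inter-annotator agreement can involve, the sketch below computes Cohen's kappa for two annotators who labelled the same review sentences. The labels ("impact", "none") and the toy annotations are made up for this example and are not taken from the workshop's annotation scheme; the workshop notebooks may use a different agreement measure.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations of six review sentences.
annotator_1 = ["impact", "impact", "none", "none", "impact", "none"]
annotator_2 = ["impact", "none", "none", "none", "impact", "none"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label is much more frequent than the others.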

Session 2: Textual Feature Extraction with Computational Methods for Books and Reviews in Different Languages

Facilitators: Yuri Bizzoni, Kristoffer Nielbo, Joris van Zundert, Katja Tereshko, Berenike Herrmann
Topics:
- Using NLP tools to extract features of reader response.
- Using rule-based NLP (ImpFic heuristic model) to identify impact statements in reviews.
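
To give a sense of what a rule-based approach to impact statements might look like, here is a minimal sketch that matches a few hand-written patterns against the sentences of a review. The patterns, the category names, and the English example are hypothetical and chosen only for illustration; they are not the rules of the ImpFic heuristic model, which is presented in the session itself.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the ImpFic rules.
IMPACT_PATTERNS = {
    "emotional": re.compile(r"\b(?:moved|cried|tears|heartbroken|touched)\b", re.IGNORECASE),
    "absorption": re.compile(
        r"\b(?:couldn['’]t put (?:it|this book) down|page[- ]turner|gripping)\b",
        re.IGNORECASE,
    ),
    "reflection": re.compile(r"\b(?:made me think|thought[- ]provoking)\b", re.IGNORECASE),
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_impact_statements(review):
    """Return (category, sentence) pairs for every rule that fires."""
    hits = []
    for sentence in split_sentences(review):
        for label, pattern in IMPACT_PATTERNS.items():
            if pattern.search(sentence):
                hits.append((label, sentence))
    return hits


review = (
    "I couldn't put it down. "
    "The ending made me think about my own family. "
    "The cover is blue."
)
hits = find_impact_statements(review)
for label, sentence in hits:
    print(f"{label}: {sentence}")
```

Rules like these are transparent and easy to inspect, but they have to be written and validated separately for each language.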

Session 3: Hypothesis Testing

Facilitators: Joris van Zundert, Katja Tereshko, Yuri Bizzoni, Kristoffer Nielbo, Federico Pianzola
Topics:
- Hypothesis formulation: correlation between textual features and reader response (in terms of genre, style, language, platform, etc.).
- Using machine learning to predict reception based on textual features.
- Hands-on experience with mixed-effects models and their interpretation.
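
As a simple starting point for relating a textual feature to reader response, the sketch below computes Spearman's rank correlation in plain Python. It is a deliberately simpler stand-in for the mixed-effects models covered in the session, and the feature (mean sentence length per book) and the ratings are made-up toy values, not workshop data.

```python
from math import sqrt


def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Correlation is undefined for a constant sequence.")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Made-up toy values: one entry per book.
mean_sentence_length = [12.0, 18.5, 9.3, 22.1, 15.0]
mean_rating = [4.1, 3.6, 4.4, 3.2, 3.9]
rho = spearman(mean_sentence_length, mean_rating)
print(round(rho, 3))
```

A rank correlation ignores grouping in the data, such as many reviews per book or per reviewer, which is one reason to move on to mixed-effects models.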

Session 4: Reflection

Facilitator: Federico Pianzola
Topic:
Discussion of the aptness of computational methods and the methodological choices to study reader response.

Quick Start Guide

The notebook will be updated before the workshop (Google Colab)

When/Where

Time zone: Europe/Copenhagen

2024-12-03 09:00 - 17:30

CHR 2024

FIFTH CONFERENCE ON COMPUTATIONAL HUMANITIES RESEARCH

Aarhus University, Denmark

Footnotes

  1. S. Rebora, P. Boot, F. Pianzola, B. Gasser, J. B. Herrmann, M. Kraxenberger, M. Kuijpers, G. Lauer, P. Lendvai, T. C. Messerli, P. Sorrentino, Digital Humanities and Digital Social Reading, Digital Scholarship in the Humanities 36 (2021) ii230–ii250. doi:10.1093/llc/fqab020.

  2. J. Kotin, R. S. Koeser, C. Adair, S. Alagappan, P. Allen, J. Bauer, O. J. Browne, N. Budak, H. Calver, J. Y. Chow, I. Davis, G. Doroudian, C. Engel, V. Gautreau, A. Gjaja, E. Green, I. Hart, B. Hicks, M. E. Joelson, C. Kelly, S. Krolewski, X. Li, E. Maag, E. Macksey, C. Mahoney, F. Mancino, J. D. McCarthy, M. Naydan, S. Root, I. Ruehl, S. Thode, K. Vandermel, C. VanSant, C. E. Wulfman, Shakespeare and Company Project Dataset: Lending Library Members, Books, Events, 2021. URL: https://dataspace.princeton.edu/handle/88435/dsp01fx719q532. doi:10.34770/39sq-bm51.

  3. P. Boot, The desirability of a corpus of online book responses, in: Proceedings of the Workshop on Computational Linguistics for Literature, 2013, pp. 32–40.

  4. L. Dai, From history of the book to history of reading: theories and methods for historical studies of reading, Xinxin, 2017.

  5. J. F. English, A Future for Empirical Reader Studies, 2021. URL: https://culturalanalytics.org/post/1208-a-future-for-empirical-reader-studies.