- Instructor: Tom van Nuenen
- Email: [email protected]
This repo contains a number of notebooks for the Digital Humanities course at SISU (May - June 2021), which investigates the possibilities and pitfalls of computational text research for humanities students.
For centuries, the humanities have operated through the close reading of cultural objects: reading to uncover layers of meaning that lead to deep comprehension. Such ‘close’ approaches are increasingly replaced by ‘distant’ methods that rely on programmatic modeling and corpus linguistics. These methods allow researchers to focus on units that are much smaller or much larger than the singular case study, text, or author – words, topics, genres, themes, and so on. Close and distant reading are especially relevant in the context of social media, which are marked by the spread of disinformation – captured in terms such as post-truth, filter bubbles, and clickbait. The visibility of online content often seems to be informed more by virality and controversy than by truthfulness and dialogue.

How can we understand the large quantities of social data on online platforms in order to reveal ideologies, biases, and controversies? In this course, we will engage with social media data in order to uncover such patterns of meaning-making. Using a variety of strategies of textual and data analysis (e.g. tf-idf, topic modeling and word embeddings), students will learn to apply corpus-linguistic methods and to reflect on them with a critical and explorative mindset. We will focus on the discursive ways in which facts and opinions are negotiated within communities, and the patterns and biases that appear in natural language.
This course will realize the following learning outcomes:
- Attain knowledge and understanding of the epistemological potentials and pitfalls of several popular quantitative approaches to text analysis.
- Apply textual and language analysis methods to contemporary datasets taken from social media.
- Demonstrate an awareness of the norms and presuppositions in quantitative methodological frameworks.
- Apply quantitative methods from the Digital Humanities with a critical and explorative mindset.
Introduction to Jupyter Notebooks, class repositories; working through some programming fundamentals in Python.
Exploring basic operations on Pandas DataFrames when dealing with social data. Preprocessing data and comparing datasets using tf-idf.
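To give a feel for what comparing datasets with tf-idf involves, here is a minimal sketch in plain Python. It is not the code from the course notebooks: the function name `tf_idf` and the two toy "datasets" are invented for illustration, and it uses one common weighting (relative term frequency times the natural log of inverse document frequency). Libraries such as scikit-learn apply slightly different smoothing.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Hypothetical helper: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Two tiny made-up "datasets", already lowercased and tokenized.
corpus_a = "the vaccine news is fake news".split()
corpus_b = "the vaccine trial is promising".split()

scores_a, scores_b = tf_idf([corpus_a, corpus_b])

# The highest-scoring term is the one most distinctive for corpus_a.
top_a = max(scores_a, key=scores_a.get)
print(top_a)
```

Terms that occur in both datasets (such as "the") score zero, so the ranking highlights the vocabulary that sets one dataset apart from the other.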
Introduction to distant reading using NLTK and Pandas.
Exploring topic modeling as one way to move beyond the author and explore discursive patterns in our data. Using topic modeling findings to engage in a close reading.
Introducing word embeddings through Word2Vec in Python. Critical discussion of the bias implicit in word embedding models.
Exploring how to analyze language biases using word embedding methods.
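One simple way to think about bias in word embeddings is to check whether a word sits closer to one reference word than to another. The sketch below illustrates that idea with cosine similarity. It is an assumption about the general approach, not the method used in the notebooks, and the three-dimensional vectors are invented for illustration; a real analysis would use vectors from a trained model such as Word2Vec.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, NOT taken from any trained model.
vectors = {
    "he": [1.0, 0.0, 0.2],
    "she": [0.0, 1.0, 0.2],
    "engineer": [0.9, 0.1, 0.5],
    "nurse": [0.1, 0.9, 0.5],
}


def bias(word, a="he", b="she"):
    """Positive if `word` is closer to `a`, negative if closer to `b`."""
    return cosine(vectors[word], vectors[a]) - cosine(vectors[word], vectors[b])


for word in ("engineer", "nurse"):
    print(word, round(bias(word), 3))
```

With these made-up vectors, "engineer" leans toward "he" and "nurse" toward "she". Whether a trained model shows the same pattern depends on the data it was trained on, which is why such scores call for critical interpretation.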
Note that there are two optional notebooks for those who are interested: one on linear regression, and one on Naive Bayes classification and sentiment analysis.