"You think that's funny?" Topic modelling and text generation using Amazon and Netflix stand-up comedy scripts
This project explores topic extraction techniques and the construction of two language generation models on an atypical collection of documents: stand-up comedy scripts.
The goal is two-fold:
- a) The first is to use standard Natural Language Processing (NLP) techniques to investigate and summarize the topics discussed in a corpus of 143 scripts of stand-up comedy shows released by Amazon and Netflix between 2013 and 2021;
- b) The second is to build on the knowledge acquired in the first part to create two Recurrent Neural Network models with different architectures and test the language capabilities of the better-performing one in generating new text.
The project is structured in five parts:
- 1) Construction of the dataset
- 2) Exploratory Data Analysis
- 3) Topic modelling
- 4) Text generation
- 5) Conclusions
The notebook "You_think_thats_funny_SINGLE_FILE.ipynb" combines all five sections into a single file.