Historical Word Cloud Generator

This project was developed by Tim Chen at MongoDB, Inc. as part of their Teacher Fellowship Program. It uses the MERN stack: MongoDB, Express, React, and Node.

The application aims to be an educational tool for anyone who wants to explore American newspapers from the past. On the surface it is only a word cloud generator, but users can also load historical newspapers published from 1836 to 1924 and archived by the Library of Congress.

Demo

The live version can be seen here: historical-word-cloud.herokuapp.com

Features

Word Cloud Generation

The application uses the d3-cloud library by jasondavies to generate word clouds from any text, regardless of context.
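
As a rough illustration of the step before layout, the sketch below scales word counts to font sizes. It assumes words are passed to d3-cloud as objects with `text` and `size` fields, as in that library's published examples. The function name `toCloudWords` and the size range are hypothetical and not taken from this project.

```javascript
// Map raw word counts to font sizes with a simple linear scale.
// Hypothetical helper: the name and the default size range are assumptions.
function toCloudWords(counts, minSize = 12, maxSize = 60) {
  const values = Object.values(counts);
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  const span = hi - lo || 1; // avoid dividing by zero when all counts match
  return Object.entries(counts).map(([text, count]) => ({
    text,
    size: minSize + ((count - lo) / span) * (maxSize - minSize),
  }));
}

const cloudWords = toCloudWords({ congress: 9, railroad: 5, election: 1 });
console.log(cloudWords);
```

The most frequent word gets the largest size and the least frequent the smallest. A square-root or logarithmic scale may look better when a few words dominate the text.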

Load Data from Past Newspapers

The application loads newspapers from 1836 to 1924 from the Library of Congress when the user selects a date. It keeps a database of every previously queried date. If the selected date is already in the database, the application loads the stored data and presents the user with options. If the date is not in the database, the application scrapes the relevant data from the Library of Congress, saves it to the database, and then proceeds as usual. Scraping is done with cheerio and may take up to 30 seconds because the application applies mild rate limiting.
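
The lookup flow described above can be sketched as follows. This is a minimal illustration, not the project's actual code. A `Map` stands in for the MongoDB collection, and `getNewspapers`, `fakeScrape`, and the `delayMs` parameter are hypothetical names. The real scraper would fetch pages and parse them with cheerio.

```javascript
// In-memory stand-in for the database (assumption: the real app uses MongoDB).
const cache = new Map();

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Return stored data for a date, scraping and saving it only on a miss.
// `scrape` is a hypothetical async function supplied by the caller.
async function getNewspapers(date, scrape, delayMs = 0) {
  if (cache.has(date)) {
    return { source: "database", data: cache.get(date) };
  }
  await sleep(delayMs); // crude rate limiting before contacting the remote site
  const data = await scrape(date);
  cache.set(date, data);
  return { source: "scraped", data };
}

// Usage with a fake scraper that records how often it is called.
let calls = 0;
const fakeScrape = async (date) => {
  calls += 1;
  return [`Example Gazette, ${date}`];
};

const demo = (async () => {
  const first = await getNewspapers("1900-01-01", fakeScrape);
  const second = await getNewspapers("1900-01-01", fakeScrape);
  console.log(first.source, second.source, calls);
  return { first, second };
})();
```

The second request for the same date is served from the stored copy, so the scraper runs only once per date.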

Because the Library of Congress produces article text with Optical Character Recognition (OCR), many words and characters may be misrecognized, and accuracy tends to be worse for older newspapers. As a result, some stop words may go undetected, or irrelevant words may rise to the top of the word cloud. The application has an option, enabled by default, that attempts to remove these words automatically.
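
One way such a cleanup could work is sketched below. The heuristics and the tiny stop-word list are assumptions chosen for illustration. The project's actual rules are not described here and may differ.

```javascript
// A small illustrative stop-word list; a real list would be much longer.
const STOP_WORDS = new Set(["the", "a", "an", "and", "of", "to", "in", "is", "it"]);

// Heuristic check for OCR debris. These rules are assumptions, not the app's own.
function looksLikeOcrNoise(word) {
  if (word.length < 3) return true;        // stray fragments (also drops short real words)
  if (!/^[a-z]+$/.test(word)) return true; // digits or symbols mixed into the word
  if (!/[aeiouy]/.test(word)) return true; // no vowel at all, e.g. "thc"
  if (/(.)\1\1/.test(word)) return true;   // same letter three times in a row
  return false;
}

// Lowercase every word, then drop stop words and likely OCR noise.
function cleanWords(words) {
  return words
    .map((w) => w.toLowerCase())
    .filter((w) => !STOP_WORDS.has(w) && !looksLikeOcrNoise(w));
}

const cleaned = cleanWords(["The", "President", "thc", "of", "ii1", "Congress", "wwwhat", "xz"]);
console.log(cleaned);
```

Simple rules like these trade some real words for less noise. A misread such as "tbe" still passes, which is why an automatic cleanup can only attempt to remove such words.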

Computer Science Concepts

The application also explains several computer science concepts for students who want to explore further. Three concepts are covered:

A simple algorithm for counting words (including an interactive step-by-step demo)
Web Scraping
Databases
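
A word-counting algorithm of the kind listed above might look like the following sketch. The function names and the tokenization rule are assumptions and not necessarily what the interactive demo uses.

```javascript
// Count how often each word appears in a piece of text.
// Hypothetical helper: the name and tokenization rule are assumptions.
function countWords(text) {
  const counts = new Map();
  // Lowercase, then extract runs of letters (apostrophes allowed inside a word).
  const words = text.toLowerCase().match(/[a-z]+(?:'[a-z]+)*/g) || [];
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Sort by descending count, breaking ties alphabetically.
function topWords(counts, limit) {
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([text, count]) => ({ text, count }));
}

const counts = countWords("The quick fox. The lazy dog. The end!");
console.log(topWords(counts, 3));
```

Each word is visited once, so the counting step takes time proportional to the length of the text. Sorting is only needed to pick out the most frequent words.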

timmyichen/historical-word-cloud-generator
