Minimal normalization of a JSON file containing organization affiliation data from https://arxiv.org/
The data was initially explored in a Jupyter notebook. I compared the total number of entries with the number of unique organization entries, and looked at the most common words and the most common abbreviations (defined as words with all characters capitalized). Though I could have explored further and made more use of these findings, I decided to normalize the data along the lines of the example given in the problem statement and to expand abbreviations of common words, such as "U." or "U" for "University".
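
The exploration and the abbreviation fix described above could look roughly like the sketch below. It is only an illustration, not the code used in this project: the function names `explore` and `normalize`, the `ABBREVIATIONS` map, and the choice to work on a plain list of affiliation strings are all assumptions, since the layout of the JSON file is not described here.

```python
import re
from collections import Counter

# Hypothetical abbreviation map (an assumption, not the project's actual list).
ABBREVIATIONS = {"u": "University", "univ": "University"}

# A word, an optional period, then an optional trailing comma or semicolon.
TOKEN = re.compile(r"([A-Za-z]+)(\.?)([,;]?)")


def explore(names):
    """Return entry count, unique-entry count, word counts, and all-caps word counts."""
    words = Counter(w for name in names for w in re.findall(r"[A-Za-z]+", name))
    # "Abbreviation" here means a word whose characters are all capitalized.
    abbreviations = Counter({w: c for w, c in words.items() if w.isupper()})
    return len(names), len(set(names)), words, abbreviations


def normalize(name):
    """Collapse whitespace and expand whole-token abbreviations such as "U." or "U"."""
    out = []
    for token in name.split():
        match = TOKEN.fullmatch(token)
        if match and match.group(1).lower() in ABBREVIATIONS:
            # Drop the abbreviation's period but keep a trailing comma or semicolon.
            out.append(ABBREVIATIONS[match.group(1).lower()] + match.group(3))
        else:
            out.append(token)
    return " ".join(out)
```

With this sketch, `normalize("U. of Tokyo")` and `normalize("U of Tokyo")` both give `"University of Tokyo"`, so the two spellings count as one organization. Tokens such as "U.S." are left alone because they do not match the whole-token pattern.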