eto_swe_interview: Option 1

Minimal normalization of a JSON file containing organization affiliation data from arXiv (https://arxiv.org/).

The data was first explored in a Jupyter notebook. I compared the total number of entries with the number of unique organization entries, and looked at the most common words and the most common abbreviations (defined as words whose characters are all capitalized). I could have explored further and made more use of these findings. Instead, I chose to normalize the data in a way closer to the example given in the problem statement, and to expand abbreviations of common words, such as "U." or "U" for "University".
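The exploration statistics and the abbreviation fix described above can be sketched in Python roughly as follows. This is a minimal illustration, not the code in this repository. It assumes the JSON file has already been loaded into a plain list of affiliation strings, because the file's schema is not described here. The function names `explore` and `normalize` are hypothetical, as is every abbreviation entry other than "U"/"U." -> "University".

```python
import re
from collections import Counter

# Hypothetical abbreviation table. Only "U"/"U." -> "University" comes from
# the README; the other entries are illustrative assumptions.
ABBREVIATIONS = {
    "U": "University",
    "Univ": "University",
    "Inst": "Institute",
    "Dept": "Department",
}

# Match a whole abbreviation, with or without a trailing period, only when it
# is followed by whitespace, a comma/semicolon, or the end of the string.
# The lookahead prevents tokens such as "U.S." from being expanded.
ABBREV_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b\.?(?=[\s,;]|$)"
)


def normalize(org: str) -> str:
    """Expand known abbreviations, then collapse runs of whitespace."""
    expanded = ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], org)
    return " ".join(expanded.split())


def explore(orgs: list[str], top: int = 5) -> dict:
    """Summary statistics like those described above (assumed input: list of strings)."""
    words = [w for org in orgs for w in re.findall(r"[A-Za-z]+", org)]
    return {
        "entries": len(orgs),
        "unique": len(set(orgs)),
        "common_words": Counter(words).most_common(top),
        # An "abbreviation" here is a word whose letters are all uppercase.
        "common_abbreviations": Counter(
            w for w in words if w.isupper()
        ).most_common(top),
    }


if __name__ == "__main__":
    print(normalize("Stanford U."))              # Stanford University
    print(normalize("Dept. of Physics, MIT"))    # Department of Physics, MIT
    print(normalize("U.S. Naval Research Lab"))  # U.S. Naval Research Lab
```

Keeping the abbreviations in a table means new entries surfaced by `explore` (for example, frequent all-caps tokens) can be added without touching the matching logic. The lookahead is a deliberately conservative choice: it skips ambiguous forms like "U.S." rather than risk a wrong expansion.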