I worked on Tahir Hemphill’s Hip Hop Word Count Project from 2014 to 2017. The project seeks to create a comprehensive collection of all hip hop lyrics, regardless of language; to analyze the data; and to build an API for accessing both the data and the analyses.
I imported and analyzed over 300,000 songs using a variety of natural language processing libraries. The analyses include language level, sentiment, sophistication, and rhyme type. Perhaps the most difficult part of the project was cleaning up the data so that lyrics in dozens of languages could coexist harmoniously.
Photo of Tupac graffiti by Cat Branchman from Seattle, U$A – 2Pac, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=45941417