Building a Corpus of American Song Lyrics


Nathan Kahl

Granting Agency

Digital Humanities Faculty Fellowship, Seton Hall University


This project will construct a large, searchable database of American popular song lyrics. Specifically, the database will contain the lyrics of the Billboard Top 100 songs of each year from 1960 to the present (i.e., the lyrics to 5,600 songs, almost 300 hours' worth of music). This will represent a lyrical corpus larger than any other by an order of magnitude. A stripped-down version of the database will be "Ngrammed" in the manner of Google's Ngram Viewer for books (https://books.google.com/ngrams), creating a web-based tool that anyone can use to visualize and measure lyrical word use over time. Because this version exposes only the stripped-down n-gram data rather than the full lyrics, it will also avoid the copyright issues that would arise from publishing the lyrics online. The full database will be available to SHU faculty and students for academic study and analysis. Topics previously investigated using small sets of lyrics include sentiment analysis, the use of narcissistic language, and the contextual use of the word "love".
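To make the "Ngrammed" idea concrete, the following is a minimal sketch of how per-year n-gram frequencies might be derived from a set of lyrics. It is an illustration only, not the project's actual pipeline: the function names (tokenize, ngrams, yearly_ngram_frequencies), the simple tokenization rule, and the sample lines (invented placeholders, not real lyrics) are all assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_ngram_frequencies(songs, n=1):
    """Map each year to {ngram: share of all n-grams counted for that year}.

    `songs` is an iterable of (year, lyrics) pairs (a hypothetical input format).
    """
    counts = defaultdict(Counter)
    for year, lyrics in songs:
        counts[year].update(ngrams(tokenize(lyrics), n))

    freqs = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        freqs[year] = {gram: c / total for gram, c in counter.items()} if total else {}
    return freqs


# Invented placeholder lines standing in for lyrics; not taken from any real song.
songs = [
    (1960, "love me love me now"),
    (1961, "dance all night"),
]

unigram_table = yearly_ngram_frequencies(songs, n=1)
bigram_table = yearly_ngram_frequencies(songs, n=2)
print(unigram_table[1960]["love"])    # share of 1960 tokens that are "love"
print(bigram_table[1960]["love me"])  # share of 1960 bigrams that are "love me"
```

A table of this kind holds only aggregate counts per year, which is what allows word use to be charted over time without reproducing any song's full text.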

