For this project I compared the most used words in the lyrics of the top songs of 2024 with the most used words in the lyrics of the top songs of the 1950s. I did this by using the Spotify API to get the song and artist list for today’s top 100 songs, and then running that list through a lyrics website to get the lyrics for each song. For the 1950s songs I found a CSV file of the top 100 songs for multiple decades and ran the 1950s entries through the same website. I then parsed the text to get just the words and wrote a for loop that goes through all the words and adds them to a dictionary, counting how many times each word is used across all the lyrics. For my visualizations I used seaborn to make a bar plot of the top 10 words used in the lyrics. The project takes a long time to run because it goes through thousands of words to count them up.
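The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `count_words` and `top_words` are hypothetical, the regular expression is just one possible way to split lyrics into words, and the two sample strings are made up.

```python
import re

def count_words(lyrics_list):
    """Count how often each word appears across a list of lyric strings."""
    counts = {}
    for lyrics in lyrics_list:
        # Lowercase the text and treat runs of letters/apostrophes as words.
        for word in re.findall(r"[a-z']+", lyrics.lower()):
            counts[word] = counts.get(word, 0) + 1
    return counts

def top_words(counts, n=10):
    """Return the n most frequent (word, count) pairs, highest count first."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]

# Made-up sample text standing in for real lyrics.
sample = ["I love you, I do", "You and I"]
counts = count_words(sample)
print(top_words(counts, 2))
```

The list returned by `top_words` is the kind of data a bar plot of the top 10 words would be built from.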
For my process, I started the project by grabbing the Spotify API and then tried to run the song list through Genius to get the lyrics, but I couldn’t figure out how to grab the lyrics from the Genius API without importing external modules. I then looked around the internet until I found the website lyrics.ovh. I thought it was an API, but it turned out I was just searching and scraping the website, so the results were hard to dig through. I went to office hours and Laurie helped me find a way to parse and clean up the lyrics so I could get the individual words. From there I ran all the words through a for loop and counted them in a dictionary. I then found a CSV on the internet that had the top 100 songs for multiple decades, narrowed it down to just the songs from the 1950s, and then parsed, cleaned up, and counted those lyrics the same way I did for the current songs. After I had all my data, I grabbed the top 10 most used words from the current top songs and the 1950s top songs and made bar charts using seaborn.
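Narrowing a multi-decade CSV down to one decade might look something like the sketch below. The column names `title`, `artist`, and `year` are assumptions, since the layout of the real file isn't described here, and the rows are placeholders rather than real chart data.

```python
import csv
import io

def songs_from_decade(csv_file, decade_start):
    """Return (title, artist) pairs whose year falls within the given decade."""
    songs = []
    for row in csv.DictReader(csv_file):
        year = int(row["year"])  # "year" is an assumed column name
        if decade_start <= year < decade_start + 10:
            songs.append((row["title"], row["artist"]))
    return songs

# Small in-memory stand-in for the real CSV file (placeholder rows).
sample_csv = io.StringIO(
    "title,artist,year\n"
    "Song A,Artist A,1956\n"
    "Song B,Artist B,1990\n"
    "Song C,Artist C,1958\n"
)
fifties = songs_from_decade(sample_csv, 1950)
print(fifties)
```

With a real file, `open("songs.csv", newline="")` (a hypothetical filename) could be passed in place of the `io.StringIO` object.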
As you can see, most of the words are the same, except that in the 1950s songs the word ‘love’ is in the top 10. This could be because most of the popular songs in the 1950s were love songs or about love, and since then popular music has branched out into more genres that don’t revolve around love. This comparison isn’t completely fair, though, because the 1950s data has fewer words overall, which allows less common words to come through. The smaller word count is due either to some of the 1950s songs not being on the lyrics website or to fewer words being used in songs from that era.
Since I didn’t get all the data for the 1950s test, I decided to also compare today’s lyrics against the 1990s. There the data was mostly the same: both had the same top 5 words, and the only difference in the top 10 is that the 1990s has ‘that’ in tenth place and 2024 has ‘it’ in tenth place. This could be due to phrases like ‘all that’ being popular in the 1990s, or it could just be random chance.
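One way to check which words two top-10 lists share, and which appear in only one of them, is sketched below. The function name `compare_top_words` is hypothetical, and the short lists are shortened, made-up stand-ins, not the project's actual results.

```python
def compare_top_words(words_a, words_b):
    """Split two ranked word lists into shared words and words unique to each."""
    shared = [w for w in words_a if w in words_b]
    only_a = [w for w in words_a if w not in words_b]
    only_b = [w for w in words_b if w not in words_a]
    return shared, only_a, only_b

# Made-up, shortened lists used only to show the shape of the comparison.
nineties = ["the", "you", "i", "that"]
today = ["the", "you", "i", "it"]
shared, only_nineties, only_today = compare_top_words(nineties, today)
print(shared, only_nineties, only_today)
```

List comprehensions are used instead of sets so that each result keeps the ranking order of the original list.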
Sadly, my results came out lackluster. I kind of knew that the top words would be simple, common English words, but I still wanted to test it out and see. My favorite part of this project was digging through the APIs and trying to find new angles to retrieve the lyrics.
Even though the top words for this test were lackluster, I still believe there is value in the project, both in being able to experiment with the code and in finding other ways to look at lyric data. The code takes a long while to run, which makes it hard to produce results quickly, but if I were to explore this project further I would look across all the decades and use the lyrics to see how language in music has changed over time. Specifically, I would look at the increase of curse words in music over the decades and compare it to the history of music and the regulations around it.
Overall, as frustrating and painstaking as this project was, I enjoyed figuring it out and found it extremely rewarding. This was a good final project because it connected everything we have learned over the semester and showed me how it applies in the real world. If I were to do it over again, I would probably pick a different topic or a different angle on this topic. But I do have a strong affinity for music, and digging through the Spotify API was very interesting: I got to see how they sort their data and even how they have a hidden popularity rating for artists and songs.