Comparing Taylor Swift's Six Albums Based On Sentiment, Pronoun Use and Genre

With almost 600 award nominations and over 300 awards won, including 10 Grammys, Taylor Swift is one of the most recognized and prolific female artists of all time. She is the youngest artist ever to win Album of the Year at the Grammys, the most awarded female artist and second most awarded artist overall in American Music Awards history, the artist with the most Billboard Music Awards, the most awarded female artist at the iHeartRadio Music Awards, and the most awarded solo artist at the Teen Choice Awards. In 2010, she was named one of Time's 100 most influential people, and today she has a net worth of over $320 million.

This report analyzes the sentiment, pronoun use and genre of Swift's six albums, from her self-titled debut to her latest release, "reputation." Sentiment analysis can show whether Swift's albums became more positive or more negative over time, and pronoun use can show whether she focused more on herself or on other people. Additionally, unique word count and average character count can help determine whether the albums became more pop and less country. I would argue that pop albums a) have fewer unique words, because pop songs usually repeat a chorus, so their unique word count is lower than that of a non-pop song, and b) use shorter words, which makes them easier to dance to.

To complete this analysis, I used the genius package in R to access lyrics as text data, as well as the spotifyr package to pull data on sentiment.

Hypothesis

Based on my knowledge of Taylor Swift, I hypothesized that an analysis of her albums would reveal that they became more negative over time, more focused on herself instead of other people, and more pop and less country (i.e., unique word count and average character count would decrease over time).

Results

Before starting my analysis, I installed and loaded the packages I needed using the code below.

# CRAN packages (textdata supplies the AFINN lexicon used by tidytext::get_sentiments())
install.packages(c("devtools", "tidyverse", "tidytext", "textdata", "wordcloud2"))

# genius and spotifyr are installed from GitHub
devtools::install_github("josiahparry/genius")

devtools::install_github("charlie86/spotifyr")

# tidyverse also attaches ggplot2 and dplyr
library(tidyverse)

library(tidytext)

library(genius)

library(wordcloud2)

library(spotifyr)

I also needed to request access to Spotify's Web API to use the spotifyr package. I signed up for an account on Spotify's developer website and then used the following code to access the data (I removed my personal credentials for privacy).

Sys.setenv(SPOTIFY_CLIENT_ID = 'xxxxxxxxxxxxxxxxxxxxx')

Sys.setenv(SPOTIFY_CLIENT_SECRET = 'xxxxxxxxxxxxxxxxxxxxx')

access_token <- get_spotify_access_token()

Taylor Swift

First, I analyzed Swift's self-titled debut album using the genius package. With it, I pulled the lyrics from each song on the album, split them into individual words, counted how many times each word appears in the album, and sorted the list from highest to lowest count. I also removed stop words (the, and, for, is, etc.), as well as common lyric fillers that wouldn't be useful to the analysis (e.g., na, la, ooh, mm, woah).

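# Download the album's lyrics (one row per lyric line)
TS <- genius_album(artist = "Taylor Swift", album = "Taylor Swift")

# One row per word, stop words removed, then counted and sorted by frequency
TS %>% unnest_tokens(word, lyric) %>% anti_join(stop_words, by = "word") %>% count(word, sort = TRUE) -> TSCount

# Drop the filler word "na"
TSCount %>% filter(!word %in% "na") -> TSCountFiltered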
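The tokenize, filter and count steps above can also be sketched without any packages, which makes each step easy to inspect. This is a minimal base-R illustration on made-up lyrics, not the tidytext implementation: tokenize() is a hypothetical helper that only approximates unnest_tokens(), and stop_words_demo is a tiny stand-in for tidytext's stop_words table. Note that the code above removes only "na"; in the tidytext pipeline, dropping all the fillers listed in the text would take a vector, e.g. filter(!word %in% c("na", "la", "ooh", "mm", "woah")).

```r
# Hypothetical stand-ins: a tiny stop-word list (NOT tidytext's stop_words)
# and the filler words named in the text
stop_words_demo <- c("the", "and", "i", "you", "a", "to", "is")
filler_words    <- c("na", "la", "ooh", "mm", "woah")

# Lowercase and split on anything that is not a letter or apostrophe,
# a rough approximation of tidytext::unnest_tokens()
tokenize <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^a-z']+"))
  words[words != ""]
}

# Tokenize, drop stop words and fillers, then count and sort by frequency
count_words <- function(lines, drop = c(stop_words_demo, filler_words)) {
  words <- tokenize(lines)
  words <- words[!words %in% drop]
  sort(table(words), decreasing = TRUE)
}

# Made-up lyrics for illustration only
lyrics <- c("Ooh, na na na, I love the rain",
            "And you love the rain, la la la")
counts <- count_words(lyrics)
print(counts)  # "love" and "rain" remain, each counted twice
```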

I then kept only the 100 most frequent words and displayed them in a word cloud. Next, I took the 50 most frequent words from the same filtered list and plotted them on a bar chart.

TSCountFiltered %>% head(100) -> TSCountFiltered100

wordcloud2(TSCountFiltered100, size = 0.75)

TSCountFiltered %>% head(50) -> TSCountFiltered50

ggplot(TSCountFiltered50, aes(reorder(word, -n), n)) + geom_col(fill = "#ADD8E6") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Word") + ylab("Count") + ggtitle('Frequency of the 50 Most Popular Words in "Taylor Swift"') + theme(plot.title = element_text(hjust = 0.5))

Then, I took the top 150 words from the filtered list and ran a sentiment analysis using the AFINN lexicon, which assigns each word an integer score between -5 and 5. Negative scores indicate negative sentiment and positive scores indicate positive sentiment; the further a score is from zero, the stronger the sentiment. Although I passed 150 words to the AFINN lexicon, only words in the lexicon's dictionary receive a score, which is why fewer than 150 words appear on the chart.

TSCountFiltered %>% head(150) -> TSCountFiltered150

# Keep only words found in the AFINN lexicon; in current tidytext versions the score column is named "value" (older versions called it "score")
# The frequency column n is kept so the chart can be ordered by it
TSCountFiltered150 %>% inner_join(get_sentiments("afinn"), by = "word") -> TSafinn

ggplot(TSafinn, aes(reorder(word, n), value)) + geom_col(fill = "#ADD8E6") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Word") + ylab("Score") + ggtitle('Sentiment Analysis of Words in "Taylor Swift"') + theme(plot.title = element_text(hjust = 0.5))

As you can see from the diagram, there is an almost even spread of positive (18) and negative (19) words in this album.
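The positive and negative tallies above count each scored word once, no matter how often it appears. A minimal base-R sketch of that tally, alongside a frequency-weighted net score (each word's score multiplied by its count), might look like this. The word counts are made up, and lexicon_demo holds placeholder scores rather than values taken from the real AFINN file; merge() plays the role of inner_join(), so words missing from the lexicon drop out.

```r
# Made-up word counts for illustration
word_counts <- data.frame(
  word = c("love", "rain", "hate", "cry", "happy"),
  n    = c(10, 7, 4, 3, 2)
)

# Placeholder lexicon: illustrative scores, NOT copied from AFINN
lexicon_demo <- data.frame(
  word  = c("love", "hate", "cry", "happy"),
  value = c(3, -3, -1, 3)
)

# merge() keeps only words present in both tables (an inner join),
# so "rain", which has no score, is dropped
scored <- merge(word_counts, lexicon_demo, by = "word")

# Unweighted tally: each scored word counts once
n_positive <- sum(scored$value > 0)
n_negative <- sum(scored$value < 0)

# Frequency-weighted net sentiment: score times number of occurrences
net_score <- sum(scored$value * scored$n)

print(c(positive = n_positive, negative = n_negative, net = net_score))
# positive 2, negative 2, net 21
```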

Next, I went back to the original lyric data, this time without removing stop words (pronouns are themselves stop words), and again counted each word and sorted the list from highest to lowest count. From this list I filtered out the first-, second- and third-person pronouns and combined them into a single table to plot.

TS %>% unnest_tokens(word, lyric) %>% count(word, sort = TRUE) -> TSPronoun

# unnest_tokens() lowercases words by default, so "I" must be written as "i" to match
TSPronoun %>% filter(word %in% c("i", "me", "my", "mine", "we", "us", "our", "ours")) -> TSPronounFirstPerson

TSPronounFirstPerson %>% mutate(album="Taylor Swift") -> TSPronounFirstPerson

TSPronoun %>% filter(word %in% c("you", "your", "yours")) -> TSPronounSecondPerson

TSPronounSecondPerson %>% mutate(album="Taylor Swift") -> TSPronounSecondPerson

TSPronoun %>% filter(word %in% c("he", "she", "it", "him", "her", "his", "its", "hers", "they", "them", "their", "theirs")) -> TSPronounThirdPerson

TSPronounThirdPerson %>% mutate(album="Taylor Swift") -> TSPronounThirdPerson

TSPronounFirstPerson %>% rbind(TSPronounSecondPerson) -> TSAllPronouns1

TSAllPronouns1 %>% rbind(TSPronounThirdPerson) -> TSAllPronouns2

ggplot(TSAllPronouns2, aes(reorder(word, n), n)) + geom_col(fill = "#ADD8E6") + coord_flip() + xlab("Word") + ylab("Count") + ggtitle('Frequency of Pronouns in "Taylor Swift"') + theme(plot.title = element_text(hjust = 0.5))

As you can see from the diagram, the pronouns Swift uses most in her debut album are "you", "me" and "my", though she uses "you" overwhelmingly more than either "me" or "my".
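Because the hypothesis asks whether Swift sings more about herself or about other people, it may help to collapse the individual pronoun counts into first-, second- and third-person totals. A minimal base-R sketch, using made-up counts and a hypothetical lookup vector named pronoun_person, is below. Since unnest_tokens() lowercases words by default, the first-person pronoun appears as "i" in the mapping.

```r
# Hypothetical lookup: pronoun -> grammatical person
pronoun_person <- c(
  i = "first", me = "first", my = "first", mine = "first",
  we = "first", us = "first", our = "first", ours = "first",
  you = "second", your = "second", yours = "second",
  he = "third", she = "third", it = "third", him = "third", her = "third",
  his = "third", its = "third", hers = "third", they = "third",
  them = "third", their = "third", theirs = "third"
)

# Made-up pronoun counts for one album
pronoun_counts <- c(you = 120, i = 90, me = 40, my = 35, it = 30, her = 10)

# Look up each pronoun's person, then sum the counts within each group
person    <- pronoun_person[names(pronoun_counts)]
by_person <- sapply(split(pronoun_counts, person), sum)

# Share of all pronouns used by each person
shares <- round(by_person / sum(by_person), 3)
print(shares)
```

Comparing these shares album by album would show directly whether first-person pronouns gain ground over time.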

The code for the remaining albums is very similar to, if not an exact replica of, the code I used for Swift's first album, so I won't display it for each album. The remaining measures, unique word count and average character count, are presented at the end of the report so the values for all six albums can be compared side by side.

Fearless

As you can see from the diagram, there are more positive words than negative words in "Fearless."

As you can see from the diagram, the pronouns Swift uses most in "Fearless" are "you", "me" and "it", though she uses "you" overwhelmingly more than either "me" or "it".

Speak Now

As you can see from the diagram, there are more negative words than positive words in "Speak Now."

As you can see from the diagram, the pronouns Swift uses most in "Speak Now" are "you", "me" and "your", though she uses "you" overwhelmingly more than either "me" or "your".

Red

As you can see from the diagram, there are more negative words than positive words in "Red."

As you can see from the diagram, the pronouns Swift uses most in "Red" are "you", "me" and "it", though she uses "you" overwhelmingly more than either "me" or "it".

1989

As you can see from the diagram, there are more negative words than positive words in "1989."

As you can see from the diagram, the pronouns Swift uses most in "1989" are "you", "we" and "it", though she uses "you" overwhelmingly more than either "we" or "it".

reputation

As you can see from the diagram, there is an even spread of positive and negative words in "reputation" (13 of each).

As you can see from the diagram, the pronouns Swift uses most in "reputation" are "you", "it" and "me", though she uses "you" far more than either "it" or "me".

Word and Character Count

As I previously stated, I would argue that pop albums a) have fewer unique words, because pop songs usually repeat a chorus, so their unique word count is lower than that of a non-pop song, and b) use shorter words, which makes them easier to dance to.

Unique Word Count

To calculate the number of unique words in Swift's albums, I created a list of every word in each album and then used the n_distinct() function to count the distinct words in that list. Because unnest_tokens() lowercases text by default, "Love" and "love" count as the same word. The code is similar for every album, so one instance is displayed below.

TS %>% unnest_tokens(word, lyric) -> TSWords

n_distinct(TSWords$word)
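Rather than repeating the code for every album, the unique word counts could be computed in one pass. A minimal base-R sketch, assuming a hypothetical named list called albums that holds each album's lyric lines (made-up text here; the real data would come from genius_album()). length(unique(x)) is the base-R counterpart of n_distinct(x), and tokenize() is a hypothetical helper approximating unnest_tokens().

```r
# Rough approximation of unnest_tokens(): lowercase, split on non-letters
tokenize <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^a-z']+"))
  words[words != ""]
}

# Hypothetical albums with made-up lyric lines
albums <- list(
  album_a = c("Rain on the road, rain on the road", "Rain on the road tonight"),
  album_b = c("Call me when the night is over", "Call me in the morning light")
)

# Number of distinct words per album
unique_counts <- sapply(albums, function(lines) length(unique(tokenize(lines))))
print(unique_counts)  # album_a: 5, album_b: 10
```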

The number of unique words per album is:

Taylor Swift: 637

Fearless: 810

Speak Now: 1,020

Red: 920

1989: 637

reputation: 931

Average Character Count

To calculate the average character count in Swift's albums, I again created a list of every word in each album and then took the mean number of characters per word using mean() and nchar(). The code is similar for every album, so one instance is displayed below.

TS %>% unnest_tokens(word, lyric) -> TSWords

mean(nchar(TSWords$word))
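Because mean(nchar(TSWords$word)) averages over every token, a short word repeated in a chorus is counted once per repetition, so this measure mixes the two pop traits the report treats separately (repetition and word length). A minimal base-R sketch on a made-up lyric contrasts the token-weighted mean with a mean over distinct words; tokenize() is a hypothetical helper approximating unnest_tokens().

```r
# Rough approximation of unnest_tokens(): lowercase, split on non-letters
tokenize <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^a-z']+"))
  words[words != ""]
}

# Made-up lyric with a repeated short word
lyrics <- c("Oh oh oh, remember everything", "Oh oh oh")
words  <- tokenize(lyrics)

# Every occurrence counts, so the six "oh" tokens pull the mean down
token_mean <- mean(nchar(words))          # 30 / 8 = 3.75

# Each distinct word counts once ("oh", "remember", "everything")
type_mean  <- mean(nchar(unique(words)))  # 20 / 3, about 6.67

print(c(token_mean = token_mean, type_mean = type_mean))
```

Reporting both averages would help separate "shorter words" from "more repetition" when comparing albums.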

The average character count (rounded to the nearest thousandth) per album is

Taylor Swift: 3.826

Fearless: 3.837

Speak Now: 3.913

Red: 3.873

1989: 3.643

reputation: 3.652

Discussion

It's widely accepted that Swift's first two albums, "Taylor Swift" and "Fearless," are her main country albums. "Speak Now" and "Red" have a bit of country in them but are more pop than her first two albums. Her last two albums, "1989" and "reputation," are full-blown pop albums. But do the data back that up? And does she become more negative as time goes on? What about pronoun use? Does that change?

Starting with sentiment, based on the top 150 words from each album, "1989" is the most negative album, followed by "Red" and "Speak Now." The most positive album is "Fearless," followed by "reputation" and "Taylor Swift." There does seem to be a negative trend over time: the three albums after "Fearless" are all more negative, and "1989," the second-to-last album, is the most negative. There is a bump in positivity with "reputation," though that album is evenly split between positive and negative words. The limitation of this analysis is that I only used the top 150 words in each album, and of those, only the words in the AFINN lexicon's dictionary were scored. So while a negative trend is visible over time, would that trend still hold if every word in each album were analyzed for sentiment?
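One way to probe this limitation would be to score every token in an album rather than only the top 150 words, and to report how much of the album the lexicon actually covers. A minimal base-R sketch, assuming made-up lyrics and a placeholder lexicon (not real AFINN scores); album_sentiment() is a hypothetical helper that returns the mean score of the scored tokens and the share of tokens that received a score.

```r
# Rough approximation of unnest_tokens(): lowercase, split on non-letters
tokenize <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^a-z']+"))
  words[words != ""]
}

# Placeholder lexicon: illustrative scores, NOT copied from AFINN
lexicon_demo <- data.frame(
  word  = c("love", "hate", "cry", "happy"),
  value = c(3, -3, -1, 3)
)

album_sentiment <- function(lines, lexicon) {
  words  <- tokenize(lines)
  # match() gives NA for tokens the lexicon does not score
  scores <- lexicon$value[match(words, lexicon$word)]
  c(mean_score = mean(scores, na.rm = TRUE),  # average over scored tokens
    coverage   = mean(!is.na(scores)))        # share of tokens scored
}

# Hypothetical albums with made-up lyrics
albums <- list(
  album_a = c("I love the rain, I love the sun", "Happy in the rain"),
  album_b = c("I hate the cold, I cry at night", "Love is gone")
)

res <- sapply(albums, album_sentiment, lexicon = lexicon_demo)
print(round(t(res), 3))
```

Because every occurrence is scored, this version also reflects how often a sentiment-bearing word is repeated, which the top-150 tally does not.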

Looking at the results for pronoun use, it's clear that Swift uses "you" the most in every album, and in some albums she uses the second-person pronoun overwhelmingly more than any other pronoun. The other top pronouns change little from album to album, as "it" and "me" are almost always near the top. There's no evidence of a major shift toward first-person pronouns in Swift's albums over time.

When it comes to unique word count, the album with the most unique words is "Speak Now," a mix of country and pop. The fewest unique words are found in "Taylor Swift" and "1989," which have 637 each. Interestingly, the latter album is pop, while the former is country. For average character count, the highest value is in "Speak Now," while the lowest values are in "1989" and "reputation," the two pure pop albums. The average character count suggests that Swift's albums became more pop over time, but the unique word counts don't support that conclusion. A limitation of this analysis is that the number of songs per album wasn't taken into account: albums with more songs could have more unique words than albums with fewer songs.
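A simple way to account for album length is to divide the unique word count by the number of songs, or by the total number of words (the type-token ratio). The sketch below is a minimal base-R illustration under stated assumptions: the albums and lyrics are made up, each album is a hypothetical list with one character vector per song, and lexical_stats() is a hypothetical helper.

```r
# Rough approximation of unnest_tokens(): lowercase, split on non-letters
tokenize <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^a-z']+"))
  words[words != ""]
}

# Hypothetical albums: one character vector of lyric lines per song
albums <- list(
  short_album = list(c("Rain on the road", "Rain on the road"),
                     c("Call me tonight")),
  long_album  = list(c("Rain on the road"),
                     c("Call me tonight"),
                     c("Dance until the morning light"))
)

lexical_stats <- function(songs) {
  words    <- tokenize(unlist(songs))
  n_unique <- length(unique(words))
  c(unique_words = n_unique,
    per_song     = n_unique / length(songs),  # unique words per song
    type_token   = n_unique / length(words))  # unique words / total words
}

res <- sapply(albums, lexical_stats)
print(round(t(res), 3))
```

Neither normalization is perfect: the type-token ratio tends to fall as a text gets longer, so it should be compared with care across albums of very different lengths.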

Conclusion

I'm unable to prove any aspect of my hypothesis with 100 percent certainty using the given evidence. The sentiment aspect is the one that can be supported with the highest certainty, as the data do show that Swift's albums get more negative over time, but I'm not comfortable saying that will hold true if every word, and not just the words in the top 150 that appear in the AFINN lexicon's dictionary, were analyzed for sentiment. Further study is needed to confirm this aspect of my hypothesis.

The given evidence rejects the pronoun aspect of my hypothesis, as pronoun use remains relatively consistent throughout Swift's six albums, and there's no major shift to first-person pronouns to indicate the albums become more about her and less about others over time. There also isn't enough data to prove with 100 percent certainty that Swift's albums become more pop over time, based on my criteria for a pop album. It should be noted that not everyone might agree with the parameters I used to measure whether the albums became more pop over time, and others might have their own criteria for deciding whether an album is pop. Further study is needed to confirm this aspect of my hypothesis as well.

For these reasons, I'm unable to confidently say the results back up my hypothesis, and I must therefore say the results disprove it.