The Evolution of Miley Cyrus

Ryan Walker

Miley Cyrus is a famous singer, songwriter, and actress. From 2006 to 2011, she starred on the extremely popular Disney Channel show Hannah Montana, and she has been acting and writing music ever since. You can find more information about Cyrus and her life here.

From the release of her first studio album to her most recent, Cyrus has gone through significant changes in her style, appearance, and lyrics. Her music has transformed considerably, from songs about friendship in her days as a child star to songs about drinking and drugs as a grown woman.

What I want to examine is how her lyrics shifted from her first studio album to her most recent. I thought it would be interesting to study the lyrics, measure how "positive" or "negative" each album is, and see whether that correlates with her shift from a clean teen popstar to an adult musician who has more control over her expression through music.

Hypothesis: Miley Cyrus's albums released closer to her time as a Disney Channel actress will be more positive than her later works as an older teen and woman.


Methodology

For my analysis, I restricted my data to Cyrus's studio albums. These are Meet Miley Cyrus (2007), Breakout (2008), Can't Be Tamed (2010), Bangerz (2013), Miley Cyrus and Her Dead Petz (2015), and Younger Now (2017). I will be using the genius package by GitHub user JosiahParry, which pulls song lyrics from Genius, to retrieve my data.



The Cleanup Process

To begin my analysis, I installed the genius and tidyverse packages in RStudio, along with tidytext, which provides the tokenizing and stop-word tools used below. Then, I created a variable for each album and assigned it the downloaded lyrics for that album. I then piped these variables through a filter that removes "stop words," or some of the most common words in a language. These include "the," "is," "at," and so forth. I filtered out these stop words because they are used frequently but do not contribute to the nature or meaning of a lyric – they are simply there for grammatical reasons. I also told R to count the remaining words in each album and to arrange them in descending order.

Note: For the sake of saving space in this report, I will only be including the code I wrote for Miley's first album Meet Miley Cyrus. The code is the same for each album with the exception of the variable names and the data being analyzed.

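library(genius)     # genius_album()
library(tidyverse)  # pipes, dplyr verbs, ggplot2
library(tidytext)   # unnest_tokens(), stop_words, get_sentiments()

# Download the lyrics for one album
genius_album(artist = "Miley Cyrus", album = "Hannah Montana 2: Meet Miley Cyrus") -> meet

# One word per row, drop stop words, then count what remains
meet %>%
  unnest_tokens(word, lyric) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE) -> meetCount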

I then took a look at the lists my new variables produced, and I noticed the frequent use of "da," "ooh," and "la." Because these are sounds rather than actual words, I filtered my data further to exclude them.

meetCount %>%
filter(!word %in% c("la", "da", "ooh")) -> meetCountClean
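Since the same cleanup is repeated for every album, the steps above could be wrapped in a small function. The following is only a sketch: `count_album_words` is a hypothetical helper that does not appear in my original code, and the toy lyrics are invented placeholder lines (not real lyrics) so that the example runs without downloading anything.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper (an assumption, not part of the original analysis):
# tokenize a lyrics data frame, drop stop words and filler sounds,
# and count the remaining words in descending order.
count_album_words <- function(lyrics, sounds = c("la", "da", "ooh")) {
  lyrics %>%
    unnest_tokens(word, lyric) %>%
    anti_join(stop_words, by = "word") %>%
    filter(!word %in% sounds) %>%
    count(word, sort = TRUE)
}

# Invented placeholder lines standing in for an album's "lyric" column
toy <- tibble(lyric = c("La la la the party is starting",
                        "The party never ends",
                        "Ooh we dance at the party"))

toyCount <- count_album_words(toy)
print(toyCount)
```

With the real data, the call would presumably look like `count_album_words(meet)`, assuming the downloaded data frame keeps its `lyric` column.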


Word Frequency

With my album data sets all clean, I then used them as my data frames from which to create bar graphs. The six graphs below display the top twenty most frequently used words in each album.

Note: Some of these graphs include more than twenty words. This is because the 20th word on the list may be tied with other words. For example, in her album Bangerz, the 20th word on the list was "feel" with a count of 13. However, the words "day," "gon," "gotta," "hands," and "movie" also had a count of 13, so they were automatically included in the graph, listed in reverse-alphabetical order.

# Keep the 20 most frequent words (words tied at the cutoff are kept too)
meetCountClean %>%
  top_n(20, n) %>%
  mutate(word = reorder(word, n)) -> meetPlot

meetPlot %>%
ggplot(aes(word, n, fill = word)) +
geom_col(show.legend = FALSE) +
labs(y = "Count", x = "Words") +
ggtitle("Top 20 Words — 'Meet Miley Cyrus' (2007)") +
coord_flip() -> meetImage
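The tie behavior described in the note above can be reproduced on a tiny data frame. This is a sketch under assumptions: the six words with a count of 13 are the ones named in the note, while `wordA`, `wordB`, and their counts are invented placeholders.

```r
library(dplyr)

# Six tied words from the note, plus two invented placeholder rows
counts <- tibble(
  word = c("wordA", "feel", "day", "gon", "gotta", "hands", "movie", "wordB"),
  n    = c(40, 13, 13, 13, 13, 13, 13, 9)
)

# top_n() keeps every row tied with the cutoff value,
# so asking for 2 rows returns 7 here
withTies <- counts %>% top_n(2, n)

# slice_max() with with_ties = FALSE returns exactly 2 rows instead
noTies <- counts %>% slice_max(n, n = 2, with_ties = FALSE)

nrow(withTies)
nrow(noTies)
```

Keeping the ties, as I did, avoids arbitrarily cutting off words that are just as frequent as the 20th one; the alternative trades that fairness for graphs of a fixed size.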


Possibly the most drastic change in Miley's lyrics can be seen in her fifth studio album, Miley Cyrus and Her Dead Petz. In her other albums, there is not a single swear word that appears in their top twenty most frequently used words. However, in Miley Cyrus and Her Dead Petz, we can see that five swear words have found themselves in the top twenty list.

It is also interesting to note the change in her pronunciation of words between her first three albums and her three most recent. In her earlier albums, only three informal words appear among the top twenty: "wanna," "gonna" and "til." In her later albums, however, she uses eight informal words. In addition to "wanna" and "gonna," she also frequently uses "i'ma," "thang," "gotta," "gon," "bout" and "nuff." She also starts to drop the final "g" from several words in her later three albums, as we can see with "struttin," "thinkin," and "livin."

The apparent change in the "maturity" of the words Cyrus uses in her lyrics may in large part be due to her own change in maturity. Cyrus was only 14 years old when she released her first studio album, 15 for her second, and 17 for her third. She was still a minor when she released her first three studio albums but a legal adult by the time her next three were released (at 20, 22 and 24). It is quite possible that as Cyrus grew older, she was given more freedom to write what she wanted and felt more comfortable using swear words.


Album Sentiments

—   Individual   —

Next, I wanted to discover the sentiments of each album, or the overall positivity or negativity of their lyrics. To do this, I used the AFINN lexicon, which rates words on a scale from -5 to +5. For example, "beautiful" has a score of 3 and "mess" has a score of -2.

I first had to join the lyrics with the AFINN lexicon to assign each word a score. I used my cleaned data frames for this because I still did not want to include stop words nor lyrical sounds in my analysis. Then, I assigned this new data frame (with the additional column "score") a variable name.

# Attach an AFINN score to each word; words not in the lexicon are dropped
meetCountClean %>%
  inner_join(get_sentiments("afinn"), by = "word") -> meetSentiment
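To make the join step concrete, here is a minimal sketch that uses a four-word stand-in for the lexicon instead of the full AFINN download. The scores are the ones quoted in this report; the word counts are invented. Note also that, as far as I am aware, some versions of tidytext name the AFINN score column `value` rather than `score`, so the column name may need adjusting.

```r
library(dplyr)

# Hand-typed stand-in for the AFINN lexicon (an assumption for illustration)
miniLexicon <- tibble(
  word  = c("beautiful", "mess", "love", "hopeless"),
  score = c(3, -2, 3, -2)
)

# Invented word counts in the same shape as meetCountClean
toyCounts <- tibble(
  word = c("love", "party", "mess", "dance"),
  n    = c(10, 6, 9, 4)
)

# inner_join() keeps only the words that have a score in the lexicon
toySentiment <- toyCounts %>%
  inner_join(miniLexicon, by = "word")

print(toySentiment)
```

Here "party" and "dance" disappear because the stand-in lexicon has no score for them, which is the same reason many lyric words never reach the sentiment graphs.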

Then, I further modified my data frame to create a new column, "sentiment" (a word's count multiplied by its score), sorted the rows by the absolute value of that sentiment in descending order so that strongly negative words rank alongside strongly positive ones, and kept only the top twenty words. I then used this new data frame to create my graphs.

meetSentiment %>%
mutate(sentiment = n * score) %>%
arrange(desc(abs(sentiment))) %>%
head(20) %>%
mutate(word = reorder(word, sentiment)) -> meetPlot

ggplot(meetPlot, aes(word, n * score, fill = n * score > 0)) +
geom_col(show.legend = FALSE) +
xlab("Words") +
ylab("Sentiment") +
ggtitle("'Meet Miley Cyrus' Album Sentiment") +
coord_flip()
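The ranking used for these graphs sorts by the size of the sentiment, not its sign. A small sketch with invented counts (the scores are the ones quoted in this report) shows how a negative word can outrank a positive one:

```r
library(dplyr)

# Scores as quoted in the report; the counts n are invented for illustration
scored <- tibble(
  word  = c("love", "beautiful", "mess", "hopeless"),
  n     = c(10, 2, 9, 4),
  score = c(3, 3, -2, -2)
)

# sentiment = count * score, ranked by absolute value
ranked <- scored %>%
  mutate(sentiment = n * score) %>%
  arrange(desc(abs(sentiment)))

top3 <- head(ranked, 3)

# How many of the top words are positive vs. negative
positives <- sum(top3$sentiment > 0)
negatives <- sum(top3$sentiment < 0)

print(top3)
```

Counting the positive and negative rows this way is one option for checking the "ten each" style comparisons without reading them off the graph.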


These graphs display the twenty words from each album with the largest sentiment scores, both positive and negative. As we can see in the graphs, Cyrus's first two albums are evenly divided between high-scoring positive words and high-scoring negative words, with ten each. The same goes for her most recent album, Younger Now.

However, in her other three albums, there are far more negative words than positive ones. Can't Be Tamed and Bangerz in particular seem to follow the same pattern, with only five of the top twenty words being positive and the rest negative. It is interesting to note, though, that the word "love" has such a high sentiment score that it seems to overwhelm the rest of the words.

Her album Miley Cyrus and Her Dead Petz seems right off the bat to be the most negative of all her albums, especially since it is the only graph that displays a word with a sentiment score below -50.


Album Sentiments

—   Overall   —

After seeing the sentiment analyses of each individual album, I was curious to see how they compared to one another on a single scale based on their total summed sentiment. While it is easy to see which albums are more positive or negative than others when looking at the graphs separately, it's important to take into consideration the different scales of each graph. For example, the sentiment analysis for Meet Miley Cyrus is graphed on a scale from around -30 to +60 while Miley Cyrus and Her Dead Petz is on a scale from -100 to +100.

To see all of my albums on one graph, I had to go through a couple of steps. Because I wanted to only see the total summed sentiment of each album, I had to create a new column in each data set, "total," which multiplied the number of a word's occurrences in the albums by the word's sentiment score. For example, the word "hopeless" has a score of -2. If Cyrus used hopeless three times in one album, the total sentiment score for that word would be -6.

meetSentiment %>%
mutate(total = (score * n)) -> meetTotal

Now that I had my calculated sentiments, I wanted to see the total summed sentiment for each album. To do this, I used the "$" operator, which extracts a single column from a data frame, to select the column "total," and then used the "sum" function to add up all of the sentiment scores per album. Then, I printed out the results to my console. These summed totals can be seen in the screenshot to the right.
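The summing step described above is not shown in my code, so here is a minimal sketch of it on invented data. The "hopeless" row mirrors the worked example (three uses at a score of -2); the other rows and the name `toyTotal` are assumptions made for illustration.

```r
library(dplyr)

# Invented counts; scores as quoted in the report
toyTotal <- tibble(
  word  = c("hopeless", "beautiful", "mess"),
  n     = c(3, 2, 1),
  score = c(-2, 3, -2)
) %>%
  mutate(total = score * n)

# "$" pulls out the total column, sum() adds it up: -6 + 6 - 2
albumSum <- sum(toyTotal$total)
print(albumSum)
```

For the real data, the equivalent call would presumably be `sum(meetTotal$total)`, repeated once per album.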

Then, I created an entirely new data frame. I did this by first creating two new variables, "albumTitle" and "sentimentSum" and assigned all of the album titles and their sentiment totals to these variables. Then, I created a data frame, or "tibble," with these variables. The most important part of doing this was to make sure the titles and their respective numbers were arranged in the same order. If they were written in random order, the numbers would be assigned to the wrong album name. Finally, I created a bar graph displaying the total sentiments for each album.

c("Meet Miley Cyrus", "Breakout", "Can't Be Tamed", "Bangerz", "Dead Petz", "Younger Now") -> albumTitle

c(160, 74, -50, 106, -273, 1) -> sentimentSum

tibble(albumTitle, sentimentSum) -> allAlbumsSentiment

ggplot(allAlbumsSentiment, aes(albumTitle, sentimentSum, fill = albumTitle)) +
geom_col(show.legend = FALSE) +
labs(x = "", y = "Sentiment") +
ggtitle("Total Sentiment by Album") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) -> totalSentImage
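Typing the titles and sums by hand works, but it depends on keeping the two vectors in the same order. One possible alternative, sketched here under assumptions, is to stack the per-album data frames and let R compute the sums. The album names and totals below are invented placeholders, not my real results.

```r
library(dplyr)

# Hypothetical per-album frames, listed in release order
albumTotals <- list(
  "Debut"        = tibble(word = c("love", "mess"),          total = c(30, -18)),
  "Breakthrough" = tibble(word = c("hopeless", "beautiful"), total = c(-6, 6)),
  "Comeback"     = tibble(word = c("mess"),                  total = c(-4))
)

# Stack the frames, label each row with its album, and sum per album
allSums <- bind_rows(albumTotals, .id = "albumTitle") %>%
  group_by(albumTitle) %>%
  summarise(sentimentSum = sum(total)) %>%
  # Store titles as a factor in release order so a plot keeps that order
  mutate(albumTitle = factor(albumTitle, levels = names(albumTotals))) %>%
  arrange(albumTitle)

print(allSums)
```

The factor step matters for plotting: ggplot2 arranges a character axis alphabetically, so without it the bars would not appear in release order. Because each sum stays attached to its title throughout, the numbers cannot end up under the wrong album.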

As you can see from the resulting graph, one album is clearly more negative than the rest. Miley Cyrus and Her Dead Petz is by far Cyrus's most negative album, with a total sentiment score of -273. This is most likely due to her heavy use of swear words throughout the album, each of which has a score of -4 or -5.

Her first album, Meet Miley Cyrus, is unsurprisingly her most positive with a total score of 160. It is unsurprising because she was only 14 when she released it and was still working as a Disney Channel star. Her songs at the time were very clean and revolved around friendship which certainly helped bring up that positive score.

However, one result I found surprising was how Bangerz compared to the rest of her albums, ranking as the second most positive. I expected it to be the second most negative because this album was released when Cyrus began to change her public appearance, cutting her hair short and wearing much more revealing clothing. It was a time when she was showing the world that she was no longer an innocent child star but rather a grown woman who embraces her sexuality. While she does start to use more negative language in this album, including swear words, she also repeats the word "love," which has a sentiment score of 3, over 100 times. Her heavy usage of "love" is most likely the reason the album has such a high total sentiment score.

Another interesting thing about this graph is that the bar for Younger Now is barely visible. With a total score of 1, it is certainly Cyrus's most "balanced" album when it comes to her word usage. Looking back at the graph for the album's individual sentiment analysis, it makes sense that it has such a neutral score since the results for the top twenty words seem almost mirrored.

Conclusion

Overall, I believe that my results support my hypothesis. While her albums did not become steadily more negative over time, they were generally more positive during her time as a Disney Channel actress than they were later on in her career. Meet Miley Cyrus and Breakout were both released while she was still starring in Hannah Montana, and both had high positive scores for their total sentiment. Her albums didn't start reaching negative total scores until Can't Be Tamed was released in 2010, around the time the show came to an end.

While I certainly discovered some very interesting information from my analysis, I believe that the "evolution" of Miley Cyrus and her music would be a good topic for further research. I would be interested in learning if and how her change in appearance and musical styles had an impact on her fan approval.