This is an analysis of some of Eminem's work. It is based on this database from Kaggle. However, I had to add release dates to the albums, label which songs belong to which album, and fill in missing songs from the two most recent albums in the dataset. I do not include any of the miscellaneous work or songs from the deluxe/bonus versions of the albums. The albums included are: The Slim Shady LP, The Marshall Mathers LP, The Eminem Show, Encore, Relapse, Recovery, The Marshall Mathers LP2, and Revival.
Eminem has had a prolific career, which led to him being the top-selling artist of the 2000s. After the success of The Slim Shady LP, his next two albums, The Marshall Mathers LP and The Eminem Show, both went on to become Diamond certified. He has won numerous Grammys over the years, and was named "The King Of Hip Hop" by Rolling Stone.
However, the artist also has a lot of controversy following him. He has been divorced twice from his first wife, Kim, a relationship that is very present in his early albums. He has also been arrested numerous times on assault charges, faced addiction to sleeping medication, been accused of homophobia, and has had many lawsuits filed against him over some of his controversial lyrics, including one for allegedly slandering his mother on The Slim Shady LP.
As such, this study attempts to display Eminem's core themes throughout his albums and see whether his infamous past is reflected in the data. My hypothesis is that Eminem's lyrics resemble the persona he has presented to the general public for the duration of his career.
This webpage was coded, and all of its data gathered and visualized, while listening to Eminem.
The dataset from Kaggle was filled with nearly all of Eminem's work, including a lot of his B-side songs. However, I only wanted his albums for this comparison. To get them, I ran this code in R.
library(tidyverse)
library(tidytext)
library(wordcloud)
library(wordcloud2)
library(ggplot2)
library(spotifyr)
library(ggjoy)
These are all the packages I had loaded while conducting this analysis.
target <- c("The Slim Shady LP", "The Marshall Mathers LP", "The Eminem Show", "Encore", "Relapse", "Recovery", "Marshall Mathers LP 2", "Revival")
eminemDF <- eminem %>% filter(Album %in% target)
This takes the whole dataset, 'eminem', and keeps only the rows whose 'Album' column matches one of the album names. I stored the album names in 'target' instead of listing them individually in the second line of code, then assigned the filtered result to eminemDF.
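As a small illustration of the filtering step, here is a sketch on made-up data. The toy data frame, its 'Song' column, and the names 'toy', 'keep', 'toyDF', and 'missing_albums' are assumptions for this example, not part of the Kaggle dataset. It also shows one way to check whether any album name in the list failed to match, which can happen if a title is spelled differently in the data.

```r
library(dplyr)

# Toy stand-in for the Kaggle data; column names and rows are assumptions.
toy <- tibble(
  Song  = c("My Name Is", "Stan", "Some B-Side", "Without Me"),
  Album = c("The Slim Shady LP", "The Marshall Mathers LP",
            "Miscellaneous", "The Eminem Show")
)

# Albums to keep, stored once so the filter call stays short.
keep <- c("The Slim Shady LP", "The Marshall Mathers LP", "The Eminem Show")

# %in% is TRUE for every row whose Album appears in `keep`.
toyDF <- toy %>% filter(Album %in% keep)

# Album names in the list that matched no rows (helps spot spelling mismatches).
missing_albums <- setdiff(keep, unique(toyDF$Album))
```

If 'missing_albums' is not empty, the corresponding title in the list probably does not match the spelling used in the dataset.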
For this analysis, we have several tools at our disposal. I will use valence levels from Spotify's API, frequency charts of lyrics, wordclouds, and sentiment levels.
The chart below shows Eminem's albums charted by valence. Valence is a measure defined by the Spotify API, which describes it as follows:
A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
# Needs Spotify API credentials in SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET
eminem9 <- get_artist_audio_features('Eminem')
This line allows me to get all of Eminem's artist data from Spotify.
library(ggjoy)
ggplot(eminem9, aes(x = valence, y = album_name)) + geom_joy() + theme_joy() + ggtitle("Joyplot of Eminem's joy distributions")
This plots his valence levels. It makes a graph from the data frame eminem9 with valence on the x axis and album name on the y axis, draws one density ridge per album with geom_joy(), and gives the chart a title.
From the distribution chart above, The Marshall Mathers LP2 had one of the most 'joyful' distributions, while The Eminem Show was one of the lowest on the scale, containing many low-valence songs.
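To put numbers next to a reading like this, one option is to summarise valence per album. The sketch below uses made-up values and placeholder album names, so it says nothing about the real albums. It only assumes that the data has 'album_name' and 'valence' columns, as used in the plot above.

```r
library(dplyr)

# Toy stand-in for the Spotify audio features; all values are made up.
toy_features <- tibble(
  album_name = c("A", "A", "A", "B", "B", "B"),
  valence    = c(0.20, 0.30, 0.40, 0.60, 0.70, 0.80)
)

# One row per album, ordered from lowest to highest median valence.
valence_summary <- toy_features %>%
  group_by(album_name) %>%
  summarise(
    tracks         = n(),
    median_valence = median(valence),
    mean_valence   = mean(valence)
  ) %>%
  arrange(median_valence)
```

The median is less sensitive than the mean to one unusually upbeat or downbeat track, which matters for albums with few songs.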
Not surprisingly, Eminem's songs do not have the most cheerful subject matter. But what could that be attributed to?
One of the factors could, of course, be his lyric choice. I took all of his lyrics from his albums, put them together, and found the most frequently used words.
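# Split each song's lyrics into sentences, keeping the original capitalization
documentLines <- unnest_tokens(eminemDF, input = text, output = line, token = "sentences", to_lower = F)
documentLines$lineNo <- seq_along(documentLines$line)
# Split each sentence into lower-cased words, one word per row
eminemWords <- documentLines %>% unnest_tokens(output = word, input = line, token = "words")
# Drop common stop words such as "the" and "and"
eminemWords2 <- eminemWords %>% anti_join(stop_words, by = "word")
# Count how often each remaining word occurs, most frequent first
eminemWords3 <- eminemWords2 %>% count(word, sort = TRUE)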
The above code breaks the lyrics in the dataset down into individual words, removes common stop words, and counts how often each remaining word appears.
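The same pipeline can be tried on a couple of made-up lines to see what each step does. This is only a sketch: the toy text is invented, and the names 'toy_lyrics' and 'toy_counts' are assumptions for the example. It skips the sentence step and goes straight to words.

```r
library(dplyr)
library(tidytext)

# Two invented lines standing in for the lyrics column (`text` in the dataset).
toy_lyrics <- tibble(
  Song = c("song one", "song two"),
  text = c("The rain keeps falling on the city",
           "Rain and thunder over the city tonight")
)

toy_counts <- toy_lyrics %>%
  unnest_tokens(output = word, input = text) %>%  # one lower-cased word per row
  anti_join(stop_words, by = "word") %>%          # drop stop words such as "the"
  count(word, sort = TRUE)                        # frequency table, most common first
```

Because unnest_tokens() lower-cases by default, "Rain" and "rain" are counted as the same word.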
eminemWords3 %>% top_n(10, n) %>% ggplot(aes(reorder(word, n), n)) + geom_bar(stat = "identity") + coord_flip()
This takes the word counts we just created and draws a bar chart of the ten most frequent words.
As the chart shows, some of his most used words are curse words. He is also very colloquial, often referring to himself ('slim', 'em') and using terms such as 'baby' or 'bitch' to refer to others.
When put into a wordcloud, the results are as expected:
All this does is use the 'wordcloud2' library we loaded earlier to make a wordcloud from the data frame eminemWords3 (which lists the most frequent words), with the size of each word determined by its frequency.
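The exact call is not shown here, so the following is only a sketch of what such a call might look like, using made-up counts. wordcloud2() reads the first column of its input as the word and the second as the frequency. The 'size' value and the names 'toy_counts' and 'cloud' are assumptions for this example.

```r
library(wordcloud2)

# Toy counts; in the analysis this would be a data frame like eminemWords3.
toy_counts <- data.frame(
  word = c("rain", "city", "thunder", "tonight"),
  freq = c(12, 9, 4, 2)
)

# First column = word, second column = frequency; size scales the whole cloud.
cloud <- wordcloud2(data = toy_counts, size = 1)
```

Printing 'cloud' in an interactive session or an R Markdown page renders the widget.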
Another attribute worth noting is his use of major and minor keys within his songs. I listed his most common keys below.
eminem9 <- get_artist_audio_features('Eminem')
count(eminem9, key_mode, sort = T)
This pulls the artist's audio features from the Spotify API, then counts the tracks in each key and mode, sorted from most to least common.
Half of the top 10 list (even though 4 of those are tied for 2nd) are major keys and half are minor keys, suggesting he should have a fairly even distribution of valence across his songs. However, the musical aspect of valence is only one side of the story, and sad lyrics are often set to an upbeat beat.
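A more direct way to compare major and minor than reading the top 10 list is to count every track by mode. The sketch below uses made-up tracks and assumes that 'key_mode' values end in "major" or "minor" (for example "C major"). The names 'toy_keys' and 'mode_share' are hypothetical.

```r
library(dplyr)

# Toy stand-in; key_mode strings are assumed to look like "C major" / "A minor".
toy_keys <- tibble(
  track_name = paste("track", 1:6),
  key_mode   = c("C major", "A minor", "C major",
                 "F# minor", "G major", "A minor")
)

mode_share <- toy_keys %>%
  # Label each track by whether its key_mode string ends in "minor"
  mutate(mode = if_else(grepl("minor$", key_mode), "minor", "major")) %>%
  count(mode) %>%
  mutate(share = n / sum(n))   # fraction of all tracks in each mode
```

This counts every track once, so a key that is tied in the ranking does not distort the major/minor split.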
If we were to list the most negative songs (skits and duplicates removed) by valence level (lowest is most negative), the list looks something like this:
listem <- eminem9 %>% arrange(valence) %>% select(track_name, valence) %>% head(20)
This lists the 20 most negative songs in the artist's library.
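The line above does not itself remove skits or duplicates. One possible way to do that, sketched here on made-up tracks, is to drop rows whose title contains "skit" and keep the first copy of each title. This is an assumption about how the cleaning could be done, not necessarily how the list was actually produced, and the names 'toy_tracks' and 'most_negative' are hypothetical.

```r
library(dplyr)

# Toy stand-in for the audio features; titles and values are made up.
toy_tracks <- tibble(
  track_name = c("Song A", "Song A", "Public Service (Skit)", "Song B", "Song C"),
  valence    = c(0.10, 0.10, 0.05, 0.40, 0.90)
)

most_negative <- toy_tracks %>%
  filter(!grepl("skit", track_name, ignore.case = TRUE)) %>%  # drop skits by title
  distinct(track_name, .keep_all = TRUE) %>%                  # keep first copy of each title
  arrange(valence) %>%                                        # lowest valence first
  select(track_name, valence) %>%
  head(2)
```

Matching on the title only catches skits that are labelled as such, and songs that appear on several releases under slightly different titles would still need manual cleanup.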
However, if we were to flip that list, and order them by most positive, then it would look like this:
listem2 <- eminem9 %>% arrange(-valence) %>% select(track_name, valence) %>% head(20)
This is the same as above, but '-valence' reverses the sort order, so the list shows the most positive songs in the artist's library.
Even with a positive rating, some of these songs are rather negative in theme, such as One Shot 2 Shot, which is about a shooting at a place where Eminem was performing.
Given the frequency of negative words in Eminem's lyrics, the overall valence levels of his albums, and even his 'positive' songs having negative connotations, it's fair to say that Eminem lives up to his public persona of a vulgar rapper. Does he deserve all the negative labels, such as 'homophobic'? I don't believe so, because even though he may say violent things in his lyrics, he uses them as an outlet for his emotions, and many of his rhymes tend to be hyperbolic versions of his experiences, or things he sometimes wishes he could do but does not always act upon.