twilight: a sentiment analysis

introduction

Stephenie Meyer's young adult series Twilight is composed of four books — Twilight, New Moon, Eclipse and Breaking Dawn — which detail the romantic relationship of a human, Bella Swan, and a vampire, Edward Cullen.

As of 2011, more than 120 million copies of the books had been sold worldwide. The series' success prompted Summit Entertainment to adapt the books to the silver screen in a five-part film series titled The Twilight Saga.

But the series is arguably best known not for its sales, numerous awards or films, but for its fanatical readers. Twi-hards, Twilighters, Fanpires, Twerds and Twi-Moms came from all walks of life, were not limited to one gender, and were not ashamed of their adoration.

This "literary phenomenon" earned Meyer high praise. Entertainment Weekly said she was "the world's most popular vampire novelist since Anne Rice." Lev Grossman in Time magazine wrote, "People do not want to just read Meyer's books; they want to climb inside them and live there... There's no literary term for the quality Twilight and Harry Potter (and The Lord of the Rings) share, but you know it when you see it: their worlds have a freestanding internal integrity that makes you feel as if you should be able to buy real estate there."

hypothesis

This series is all about steamy, supernatural, sexually-repressed teenage romance, but having read the books, I can tell you that overall, they're actually quite dark. Bella is a self-deprecating character constantly wondering how anyone could ever love her pale, clumsy self, and Edward is a masochistic 104-year-old convinced that he is a soulless monster.

Bella almost dies at the end of the first book. In the second book, a love triangle develops with Bella's angry and prideful werewolf best friend Jacob Black after Edward dumps her and runs off, only to reappear and try to kill himself. In the third book, the girlfriend of the vampire who almost killed Bella in the first book creates a young army of vampires with the sole purpose of wiping out the Cullens. And last but not least, in the final book, Bella almost dies during childbirth, becomes a vampire, and then almost immediately has to go to war against a larger, stronger clan of vampire law enforcers who think the Cullens have broken their rules.

Although this series is touted as the story of a fanciful romance, a sentiment analysis will reveal that these books are actually filled with negative language.

process

The four books were downloaded in .rtf format from the Internet Archive and exported as .txt files from Microsoft Word. Each file was imported into RStudio, made into a data frame, divided into sentences and then words. Below is the code for this process with the example of the first book, Twilight.

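library(dplyr)     # %>% and the data verbs used below
library(tidytext)  # unnest_tokens()

# Read the plain-text export of the book, one element per line
twilight <- readLines("twilight.txt")
str(twilight)
twilight_df <- data.frame(text = twilight, stringsAsFactors = FALSE)
# Split the text into sentences (case preserved), then into lowercase words
twilight_dl <- unnest_tokens(twilight_df, input = text, output = line, token = "sentences", to_lower = FALSE)
twilight_dl2 <- twilight_dl %>%
  unnest_tokens(output = word, input = line, token = "words")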
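To see what the two tokenizing passes do, here is a minimal sketch on a made-up two-sentence string rather than the actual book text. The `toy_` names are hypothetical and exist only for this illustration.

```r
library(dplyr)
library(tidytext)

# Made-up sample text, NOT taken from the books
toy_df <- data.frame(
  text = "Bella sighed. Edward smiled at her.",
  stringsAsFactors = FALSE
)

# First pass: one row per sentence, original capitalization kept
toy_sentences <- unnest_tokens(
  toy_df, input = text, output = line,
  token = "sentences", to_lower = FALSE
)

# Second pass: one row per word; unnest_tokens() lowercases by default
toy_words <- toy_sentences %>%
  unnest_tokens(output = word, input = line, token = "words")

print(toy_sentences)
print(toy_words)
```

Because the word pass lowercases everything, any later join against stop words, names or a lexicon needs lowercase entries to match.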

Next, stop words were removed as were the names of Twilight characters (e.g. Bella, Edward, Jacob, etc.), and words were sorted by the number of times they appeared.

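# Drop common stop words (stop_words ships with tidytext)
twilight_dl3 <- twilight_dl2 %>%
  anti_join(stop_words, by = "word")
# Drop character names; `names` is a data frame with a `word` column of names
twilight_dl4 <- twilight_dl3 %>%
  anti_join(names, by = "word")
# Count the remaining words, most frequent first
twilight_words <- twilight_dl4 %>%
  count(word, sort = TRUE)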
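The table of character names used in that step is not defined in the code. A minimal sketch of how such a table could be built and applied, assuming a hypothetical `character_names` tibble (named that way to avoid shadowing base R's `names()` function) and a handful of made-up tokens:

```r
library(dplyr)

# Hypothetical lookup table of character names, lowercase to match
# the lowercase output of the word tokenizer
character_names <- tibble::tibble(word = c("bella", "edward", "jacob"))

# Made-up tokens standing in for one book's words
tokens <- tibble::tibble(
  word = c("bella", "eyes", "edward", "smile", "jacob", "eyes")
)

# anti_join() keeps only the rows with no match in character_names
tokens_no_names <- tokens %>%
  anti_join(character_names, by = "word")

# Count what is left, most frequent first
word_counts <- tokens_no_names %>%
  count(word, sort = TRUE)

print(word_counts)
```

The real list would need every name and nickname that should be excluded; the three above are only the ones mentioned in the text.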

The bing lexicon was used to calculate the sentiment of words, negative or positive.

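# Assumption: `bing` holds the Bing lexicon, loaded here via tidytext
bing <- get_sentiments("bing")
# Keep only the words that appear in the lexicon, tagged positive or negative
twilight_senti <- twilight_dl3 %>%
  inner_join(bing, by = "word")
# Tally positive vs. negative words
twilight_senti_count <- twilight_senti %>%
  count(sentiment, sort = TRUE)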
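For anyone unfamiliar with the lexicon step, here is a small sketch of what the join does. It assumes the Bing lexicon is loaded with tidytext's `get_sentiments("bing")`, and it uses a few made-up tokens rather than the book data.

```r
library(dplyr)
library(tidytext)

# The Bing lexicon: one row per word, with a `sentiment` column
# that is either "positive" or "negative"
bing <- get_sentiments("bing")

# Made-up tokens, NOT counts from the books
tokens <- tibble::tibble(word = c("dark", "hard", "smile", "eyes", "love"))

# inner_join() keeps only the tokens that appear in the lexicon,
# so words the lexicon does not score are silently dropped
tokens_senti <- tokens %>%
  inner_join(bing, by = "word")

# Tally how many scored tokens fall under each sentiment
senti_count <- tokens_senti %>%
  count(sentiment, sort = TRUE)

print(tokens_senti)
print(senti_count)
```

The important design consequence is that the result only describes words the lexicon knows about; everything else is ignored rather than treated as neutral.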

To create the graphs of the top 20 words used in each book, the following code was implemented.

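library(ggplot2)   # plotting
library(ggthemes)  # theme_tufte()

# Horizontal bar chart of the 20 most frequent words
twilight_words %>%
  top_n(20, n) %>%
  ggplot(aes(reorder(word, n), n)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_tufte(ticks = FALSE) +
  ggtitle("Twilight") +
  labs(y = "Word Count", x = "")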

results

As predicted, the language of the Twilight series by Stephenie Meyer was ruled overwhelmingly negative by this sentiment analysis.
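One way to put a number on "overwhelmingly negative" is to turn the sentiment tally into shares. This is only a sketch: the `toy_senti` table below is made up for illustration and does not contain results from the books.

```r
library(dplyr)

# Made-up sentiment-tagged tokens, NOT results from the books
toy_senti <- tibble::tibble(
  word = c("dark", "hard", "smile", "pain", "love"),
  sentiment = c("negative", "negative", "positive", "negative", "positive")
)

# Count each sentiment, then express each count as a share of the total
toy_share <- toy_senti %>%
  count(sentiment, sort = TRUE) %>%
  mutate(share = n / sum(n))

print(toy_share)
```

Running the same two steps on a book's sentiment table would give the negative and positive shares for that book.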

But just to go one step further, what are these words that are so negative?

Wow. Meyer, er, Bella, really likes to talk about eyes, doesn't she? But most of the words on these lists are probably not being scored by the sentiment lexicon at all, because they're generic nouns, contractions or pronouns. For example, in Twilight the most common word that also appears in the sentiment lexicon is "smile," but "smile" does not make an appearance in the top 20 words overall. In New Moon, only the words "dark" and "hard" appear on both lists.
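The overlap between a top-words list and the lexicon can be checked directly. Here is a minimal sketch, again assuming the Bing lexicon from `get_sentiments("bing")` and using made-up counts in a hypothetical `toy_counts` table:

```r
library(dplyr)
library(tidytext)

bing <- get_sentiments("bing")

# Made-up word counts, NOT taken from the books
toy_counts <- tibble::tibble(
  word = c("eyes", "looked", "smile", "dark", "time"),
  n = c(50, 40, 30, 20, 10)
)

# Keep the most frequent words (top 3 here; the post uses the top 20)
top_words <- toy_counts %>%
  slice_max(order_by = n, n = 3)

# Top words the lexicon scores, with their sentiment attached
top_in_lexicon <- top_words %>%
  inner_join(bing, by = "word")

# Top words the lexicon does not score at all
top_not_in_lexicon <- top_words %>%
  anti_join(bing, by = "word")

print(top_in_lexicon)
print(top_not_in_lexicon)
```

Comparing the two outputs shows how much of a frequency list actually contributes to the sentiment score.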

the end