Popularity of streaming shows


Do their ratings increase or decrease over time?

I looked at Netflix's three longest-running shows to see whether their ratings increased over time.

THE SHOWS:

1. Orange is the New Black (4 seasons and counting)

2. House of Cards (4 seasons and counting)

3. Hemlock Grove (3 seasons, complete)




The Process

To do this, I installed the omdbapi package within R. It is a client for the OMDb API (Open Movie Database), which lets users load movie and TV data, including IMDb ratings and vote counts, and analyze it for trends. First, though, I installed the devtools package, which provides the install_github() function used to install omdbapi from GitHub.

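# devtools provides install_github(); omdbapi is installed from GitHub
install.packages("devtools")
devtools::install_github("hrbrmstr/omdbapi")
# load omdbapi so that find_by_title() is available
library(omdbapi)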

After this, I created a data frame for each show listing the episode title, season, episode number, IMDb rating and IMDb vote count (imdbVotes) of each episode. It's important to note that the ratings used in this analysis are the user ratings on IMDb's website, not Nielsen ratings, which measure viewership. IMDb ratings accumulate as users rate the series, so they reflect purely how the hyper-engaged audience feels about each episode. This process involved quite a few steps.

First, I created a variable for each episode, then merged those variables into one data frame per season, and finally cleaned up the data. To create a variable for each episode from the OMDb API data, I used the following code, then repeated it for every single episode of each show:

oitnbs1e1 <- find_by_title("Orange is the New Black", type="series", season=1, episode=1)
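Writing one find_by_title() call per episode works, but it repeats the same line dozens of times. A minimal sketch of a loop-based alternative, assuming find_by_title() accepts the type, season and episode arguments exactly as used above; fetch_episodes() is a hypothetical helper, and its fetch argument exists only so a stub can stand in for the live API:

```r
# Hypothetical helper (not part of omdbapi): fetch several episodes of one
# season by looping over episode numbers instead of one variable per episode.
# `fetch` defaults to omdbapi::find_by_title; pass a stub to work offline.
fetch_episodes <- function(title, season, episodes,
                           fetch = omdbapi::find_by_title) {
  lapply(episodes, function(ep) {
    fetch(title, type = "series", season = season, episode = ep)
  })
}

# Usage (needs network access; current OMDb also requires an API key):
# oitnbs4_list <- fetch_episodes("Orange is the New Black", season = 4, episodes = 1:13)
```

The result is a list with one element per episode, which can then be stacked into a single season data frame with do.call(rbind, ...).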

Then, once I created a variable for each episode, I merged them together to create one data frame for each season of each show using the following code:

OITNBs4 <- rbind(oitnbs4e1, oitnbs4e2, oitnbs4e3, oitnbs4e4, oitnbs4e5, oitnbs4e6, oitnbs4e7, oitnbs4e8, oitnbs4e9, oitnbs4e10, oitnbs4e11, oitnbs4e12, oitnbs4e13)
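Typing all thirteen names into rbind() works but is easy to get wrong. A minimal sketch, assuming the per-episode objects follow the oitnbs4e1 ... oitnbs4e13 naming used above; bind_episodes() is a hypothetical helper:

```r
# Hypothetical helper: stack objects named <prefix>1, <prefix>2, ... without
# typing every name. paste0() builds the names, mget() looks the objects up in
# the caller's environment, and do.call(rbind, ...) binds them in order.
bind_episodes <- function(prefix, n_episodes, envir = parent.frame()) {
  obj_names <- paste0(prefix, seq_len(n_episodes))
  do.call(rbind, unname(mget(obj_names, envir = envir)))
}

# Usage, assuming oitnbs4e1 ... oitnbs4e13 already exist:
# OITNBs4 <- bind_episodes("oitnbs4e", 13)
```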

I then cleaned up this data by keeping only the columns I wanted to look at, using the following code:

columns <- c("Title", "Season", "Episode", "imdbRating", "imdbVotes")
OITNBseason1 <- OITNBs1[columns]
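OMDb's raw JSON returns every field as text, including vote counts written with a thousands separator such as "1,234". Assuming the omdbapi columns can arrive as text too, a hypothetical clean_episodes() helper can select the same five columns and convert their types in one step:

```r
# Hypothetical helper: keep the five columns used in this analysis and convert
# them from text to numbers so charts sort and scale correctly.
clean_episodes <- function(df) {
  columns <- c("Title", "Season", "Episode", "imdbRating", "imdbVotes")
  out <- df[columns]
  out$Season <- as.integer(out$Season)
  out$Episode <- as.integer(out$Episode)
  # "N/A" ratings become NA (with a coercion warning)
  out$imdbRating <- as.numeric(out$imdbRating)
  # drop thousands separators such as "1,234" before converting
  out$imdbVotes <- as.numeric(gsub(",", "", out$imdbVotes))
  out
}

# OITNBseason1 <- clean_episodes(OITNBs1)
```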

To track how ratings progressed, I kept an individual chart for each season and also created one cumulative chart of all the data over the course of the series.
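For a cumulative chart, the season data frames need to be stacked and given one running episode number for the x-axis. A minimal sketch; combine_seasons(), the SeriesEpisode column, and OITNBseason2 through OITNBseason4 are hypothetical names that assume each season was cleaned the same way as OITNBseason1:

```r
# Hypothetical helper: stack season data frames (same columns) and number the
# episodes 1..N across the whole series, in season/episode order.
combine_seasons <- function(...) {
  all_eps <- do.call(rbind, list(...))
  all_eps$Season <- as.integer(all_eps$Season)
  all_eps$Episode <- as.integer(all_eps$Episode)
  all_eps <- all_eps[order(all_eps$Season, all_eps$Episode), ]
  all_eps$SeriesEpisode <- seq_len(nrow(all_eps))
  all_eps
}

# OITNBall <- combine_seasons(OITNBseason1, OITNBseason2, OITNBseason3, OITNBseason4)
```

Plotting SeriesEpisode instead of Episode then gives one continuous x-axis for the whole run of the show.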




The Findings


1. Orange is the New Black

"The story of Piper Chapman, a woman in her thirties who is sentenced to fifteen months in prison after being convicted of a decade-old crime of transporting money to her drug-dealing girlfriend."

I expected Orange is the New Black to increase in ratings over time, mainly because the show gained more and more notoriety with each season.

I visualized the first season's ratings to see whether the show's popularity increased even from the start. One issue I repeatedly ran into, though, was that ggplot ordered the episode numbers as text rather than as numbers (1, 10, 11, 12, 13, 2, 3, and so on, instead of 1 through 13). To fix this, I converted the 'Episode' column from character to integer using the following code:

season1$Episode <- as.integer(season1$Episode)
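The ordering problem is easy to reproduce in plain R: a character x-axis in ggplot is ordered the way sort() orders text, character by character. A small illustration, separate from the original analysis:

```r
# Character vectors sort text-wise, digit by digit, so "10" lands before "2".
episodes_as_text <- as.character(1:13)
sort(episodes_as_text)
# "1" "10" "11" "12" "13" "2" "3" "4" "5" "6" "7" "8" "9"

# After as.integer(), the same values sort by number.
sort(as.integer(episodes_as_text))
# 1 2 3 4 5 6 7 8 9 10 11 12 13
```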

Then, to visualize the ratings, I created a bar chart using our favorite ggplot and the following code:

library(ggplot2)  # geom_col() is shorthand for geom_bar(stat = "identity")
ggplot(season1, aes(x = Episode, y = imdbRating, fill = imdbRating)) + geom_col() + ggtitle("OITNB Season 1 Ratings")


Ratings stayed fairly consistent from season to season. Episode ratings remain fairly steady throughout the second season, and although the highest-rated episode comes near the end, it doesn't necessarily indicate a spike in ratings. Therefore, I also visualized the number of IMDb votes for each episode. Seen below is the number of votes for each episode in season three. It's clear that people engage with IMDb's user voting system predominantly during the premiere and the finale.
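The premiere-and-finale pattern can also be put into a single number per season. A minimal sketch; premiere_finale_share() and OITNBseason3 are hypothetical names, and the helper assumes imdbVotes may still contain thousands separators:

```r
# Hypothetical helper: fraction of a season's IMDb votes that went to its first
# and last episodes. Values well above 2 / (number of episodes) mean voters
# concentrate on the premiere and finale.
premiere_finale_share <- function(df) {
  ep <- as.integer(df$Episode)
  votes <- as.numeric(gsub(",", "", df$imdbVotes))
  is_edge <- ep == min(ep) | ep == max(ep)
  sum(votes[is_edge]) / sum(votes)
}

# premiere_finale_share(OITNBseason3)
```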




House of Cards

"A Congressman works with his equally conniving wife to exact revenge on the people who betrayed him."

As with Orange is the New Black, I expected House of Cards to increase in ratings over time because the show gained more and more notoriety with each season. I used the same ggplot code and simply swapped in a different data frame for each plot.
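Since only the data frame and title change between charts, the chart code can be wrapped in a function. A minimal sketch; plot_ratings() and HOCseason4 are hypothetical names, and the type conversions assume the columns may still be text:

```r
library(ggplot2)

# Hypothetical wrapper around the bar chart used throughout this report:
# only the data frame and the title change from plot to plot.
plot_ratings <- function(df, title) {
  df$Episode <- as.integer(df$Episode)       # numeric order on the x-axis
  df$imdbRating <- as.numeric(df$imdbRating) # continuous fill scale
  ggplot(df, aes(x = Episode, y = imdbRating, fill = imdbRating)) +
    geom_col() +
    ggtitle(title)
}

# plot_ratings(HOCseason4, "HOC Season 4 Ratings")
```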

I wasn't surprised to find that once again, my hypothesis was a bit off. I was surprised, however, to see a lot more variation in House of Cards' ratings across the seasons than in Orange is the New Black's ratings. For example, in the above graph, season 4's ratings jumped around quite a bit.

The jump at episode four is likely due to escalating tension with Russia (House of Cards is a political show), and the jump at episode ten is likely due to the dramatic resolution of a character's death. One thing House of Cards and Orange is the New Black had in common, though, is that people engage most frequently with premieres and finales.

One last comment about House of Cards: from season to season, the number of user ratings is all over the place. Season two has the most user ratings per episode, and the count drops drastically in both seasons three and four.
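Comparing seasons by votes per episode, rather than total votes, keeps seasons of different lengths comparable. A minimal sketch; votes_per_season() and HOCall are hypothetical names, assuming a data frame with Season and imdbVotes columns for every episode of the show:

```r
# Hypothetical helper: mean IMDb votes per episode for each season.
votes_per_season <- function(df) {
  df$Season <- as.integer(df$Season)
  # drop thousands separators such as "1,234" before converting
  df$imdbVotes <- as.numeric(gsub(",", "", df$imdbVotes))
  aggregate(imdbVotes ~ Season, data = df, FUN = mean)
}

# votes_per_season(HOCall)
```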




Hemlock Grove

"A teenage girl is brutally murdered, sparking a hunt for her killer. But in a town where everyone hides a secret, will they find the monster among them?"

Because I had never heard of this show, I changed my hypothesis to "ratings will be roughly the same across the seasons." To test this, I once again ran the same ggplot code and swapped in the Hemlock Grove data frame.


Correct! I realize now that ratings are generally pretty even across seasons, probably because the people who vote tend to like the show and are unlikely to give it negative ratings. A better indicator of how engaged people are is the number of votes, and here Hemlock Grove follows the same trend seen with both House of Cards and Orange is the New Black.
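Whether ratings are "roughly the same across the seasons" can be checked with season averages instead of by eye. A minimal sketch; season_rating_summary() is a hypothetical helper, assuming one data frame with Season and imdbRating columns for the whole series:

```r
# Hypothetical helper: mean rating per season plus the gap between the highest
# and lowest season means. A small gap supports "ratings stay roughly even".
season_rating_summary <- function(df) {
  ratings <- as.numeric(df$imdbRating)
  means <- tapply(ratings, as.integer(df$Season), mean, na.rm = TRUE)
  list(season_means = means, spread = diff(range(means)))
}
```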

Audience votes on IMDb are especially high for Hemlock Grove premieres, most likely because they resolve the cliffhangers left at the end of the previous season.




The Conclusions

Because IMDb's ratings are user-based, they aren't the purest measure of popularity, and they stayed fairly consistent from season to season. A better metric within the available dataset is how many votes each episode got. It doesn't necessarily matter whether people liked an episode or not; if they're talking about the show, it is popular. People engage the most with the user ratings for premieres, finales, and episodes where Big Things happen, like character deaths or huge plot twists. Furthermore, House of Cards got way more user votes than Orange is the New Black and Hemlock Grove. If we were to use that as the one mark of success (which we aren't), House of Cards would be considered the most successful of these three Netflix shows.
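The vote comparison between shows reduces to a sum per show. A minimal sketch; total_votes_by_show() and the OITNBall, HOCall and HGall data frames are hypothetical names, assuming each holds every episode of one show:

```r
# Hypothetical helper: total IMDb votes per show, ranked from most to fewest.
# `shows` is a named list of per-show data frames with an imdbVotes column.
total_votes_by_show <- function(shows) {
  totals <- vapply(shows, function(df) {
    sum(as.numeric(gsub(",", "", df$imdbVotes)), na.rm = TRUE)
  }, numeric(1))
  sort(totals, decreasing = TRUE)
}

# total_votes_by_show(list(OITNB = OITNBall, HOC = HOCall, HemlockGrove = HGall))
```

Because longer-running shows have more episodes, dividing each total by the number of episodes is a reasonable variant if the goal is to compare engagement rather than raw volume.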




Report compiled by Jane Seidel for Brian Walsh's Applied Media Analytics class at Elon University. If you are a future employer looking at this, please feel free to hire me. Or just follow me on Twitter @jane_seidel.