What Qualities Determine Star Ratings?

A Text Analysis of Yelp Reviews

----------------------------------------------------------------------------------------------------

by Natalie Wright | May 9, 2018


Topic Background

Yelp is an online service, created in 2004, that helps people find local businesses to meet their needs. Businesses range from restaurants to dentists and salons. These businesses can set up a free account where they can post pictures and information about their business (location, hours of operation, website). They can also respond to their customers.

Yelp users can write reviews for any business in the database and score their overall experience with a five-star rating. Yelp uses automated software to surface the reviews it considers most helpful and reliable.

"Yelp has a monthly average of 29 million unique visitors who visited Yelp via the Yelp app and 64 million unique visitors who visited Yelp via mobile web in Q4 2017." (Source)

[Image: Yelp logo and stars]

HYPOTHESIS: One-star and five-star reviews are the most common, because people are more likely to share an experience they feel strongly about. Also, five-star reviews will have more positive words than one-star reviews when referring to aspects of a business like “service”, “atmosphere”, “taste”, “quality”, and “experience.”

Yelp Open Dataset

The Yelp Open Dataset is provided by Yelp as an "all-purpose dataset for learning". It is free for anyone to use, and Yelp encourages students to use it for practice and to learn about business, review, and user data.

What does it include?
5,261,668 reviews
174,000 businesses
200,000 pictures
11 metropolitan areas

Files:
business.json
review.json
user.json
checkin.json
tip.json
photos

Notice that the files provided by Yelp are all .json files. If this is not the format you are looking for, you can go to Kaggle to find the files in a csv format (Source).
For this analysis I used only the review dataset, because I was most interested in the text.
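
If you would rather start from Yelp's own file than the Kaggle csv, the sketch below shows one possible way to load it. It assumes review.json is newline-delimited JSON (one review object per line) and uses the jsonlite package, which is not part of the original analysis. The two sample reviews are made up.

```r
library(jsonlite)

# Write two made-up reviews to a temporary file, one JSON object per line,
# to stand in for review.json (assumed to be newline-delimited JSON).
samplePath <- tempfile(fileext = ".json")
writeLines(c(
  '{"review_id":"a1","stars":5,"text":"Amazing food and friendly service."}',
  '{"review_id":"b2","stars":1,"text":"Worst experience ever."}'
), samplePath)

# stream_in() parses the file line by line into a data frame.
reviews <- stream_in(file(samplePath), verbose = FALSE)

# Keep only the two columns the analysis needs.
reviews <- reviews[, c("stars", "text")]
print(reviews)
```

For the full dataset you would point `file()` at the real review.json instead of the temporary file.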

Analysis

Required R Packages:
tidyverse
tidytext
scales
wordcloud

Datasets (loaded from tidytext):
get_sentiments("bing")
data("stop_words")

Steps:
1. Gain an understanding of the dataset (How many reviews for each star rating?)
2. Clean the dataset so it's separated by star rating and only includes text
3. Understand the relationship with the words “service”, “atmosphere”, “taste”, “quality”, and “experience”

#----load packages
library(tidyverse)
library(tidytext)
library(scales)
library(wordcloud)
#----read in dataset
yelpReviewDF <- read_csv("yelp_review.csv")
#----cleaning dataset; keep only the "stars" and "text" columns
#(drops "review_id", "user_id", "business_id", "date", "useful", "funny", "cool")
yelpReviewDF <- yelpReviewDF %>% select(stars, text)
#so now our yelpReviewDF just has columns for "stars" and "text"

#----basic understanding of data set
#adds a column called "count" which shows the number of reviews for each star
yelpReviewDFCount <- transform(yelpReviewDF, count = ave(stars, stars, FUN = length))
#create a dataframe for the count of each star review
stars <- c(1, 2, 3, 4, 5)
count <- c(731363, 438161, 615481, 1223316, 2253347)
starsCount <- data.frame(stars, count)
#bar chart of review counts by star rating
starsCount %>%
  ggplot(aes(x = stars, y = count)) +
  geom_bar(stat = "identity", fill = "#df5f5f") +
  labs(title = "Count of Reviews by Star Rating") +
  geom_text(aes(label = count), vjust = 1.5)

[Figure: bar chart, Count of Reviews by Star Rating]

This bar chart shows the distribution of reviews within the Yelp dataset based on star rating. Five-star ratings are the most common (2,253,347 reviews), followed by four-star ratings (1,223,316).
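
The counts behind this chart can also be computed directly instead of being typed in by hand. The following is a minimal sketch on a made-up ten-row data frame. `toyReviews` and `toyStarsCount` are illustrative names, not objects from the original analysis.

```r
library(dplyr)

# Toy stand-in for yelpReviewDF: one row per review.
toyReviews <- tibble(
  stars = c(5, 5, 4, 1, 5, 3, 1, 4, 2, 5),
  text  = paste("review", 1:10)
)

# One row per star rating, with the number of reviews and their share of the total.
toyStarsCount <- toyReviews %>%
  count(stars, name = "count") %>%
  mutate(share = count / sum(count))

print(toyStarsCount)
```

Applied to the real counts, the five-star share works out to 2,253,347 / 5,261,668, or roughly 43% of all reviews.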

Later in the analysis, you will notice that only 731,363 of the five-star reviews were used. There are two reasons for this decision. The first was to make the five-star set comparable in size to the one-star reviews. The second was that RStudio had trouble analyzing such a large amount of data, specifically in the step that breaks sentences into words.
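
One way to cut the five-star reviews down to the size of the one-star set is to draw a random sample. The sketch below does this on toy data. Random sampling is an assumption here (the subset could just as well be the first rows after sorting), and the seed value is arbitrary.

```r
library(dplyr)

set.seed(2018)  # arbitrary seed so the sample is reproducible

# Toy stand-in for yelpReviewDF.
toyReviews <- tibble(
  stars = c(1, 1, 5, 5, 5, 5, 5, 3),
  text  = paste("review", 1:8)
)

# The number of one-star reviews sets the sample size.
nOneStar <- sum(toyReviews$stars == 1)

# Draw that many five-star reviews at random.
fiveStarSample <- toyReviews %>%
  filter(stars == 5) %>%
  slice_sample(n = nOneStar)

print(fiveStarSample)
```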

Example code for one-star reviews (same technique used for five-star):

#organize the yelpReviewDF by putting stars in ascending order
yelpReviewDF <- yelpReviewDF %>% arrange(stars)
#make data set for one star reviews that just includes the text
oneStarReviewDF <- yelpReviewDF[-c(731364:5261668), ]
#remove the stars column (column 1)
oneStarReviewDF[1] <- NULL
#separate text column into sentences
oneStarReviewSent <- unnest_tokens(oneStarReviewDF, input = text, output = line, token = "sentences", to_lower = FALSE)
#separate into words
oneStarReviewWord <- oneStarReviewSent %>% unnest_tokens(output = word, input = line, token = "words")
#get rid of stop words
oneStarReviewWord2 <- oneStarReviewWord %>% anti_join(stop_words, by = "word")
oneStarReviewWord2 <- oneStarReviewWord2 %>% mutate(word = gsub("\\d", "", word))
oneStarReviewWord2 <- oneStarReviewWord2 %>% mutate(word = gsub("\\s", "", word))
#create a count of the words used in our collection of text
oneStarReviewWord3 <- oneStarReviewWord2 %>% count(word, sort = TRUE)

#----word cloud
#wordcloud for top 50 words in one-star Reviews
wordcloud(oneStarReviewWord3$word, oneStarReviewWord3$n,
color="#df5f5f", random.order=FALSE, max.words=50, scale=c(2,0.25))
#dataframe for the top 20 words (the top word was a space so got rid of that as well)
oneStarReviewTop20 <- oneStarReviewWord3[-c(1, 22:247397), ]
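
The row-number indexing above depends on the exact size of this dataset. As an alternative, here is a minimal sketch that selects reviews by rating and drops empty tokens by value, so no row positions are hard-coded. It runs on three made-up reviews. `toyReviews` and `oneStarTopWords` are illustrative names.

```r
library(dplyr)
library(tidytext)

# Three made-up reviews standing in for yelpReviewDF.
toyReviews <- tibble(
  stars = c(1, 1, 5),
  text  = c("Bad service. Waited 45 minutes for bad food.",
            "Worst service in town.",
            "Amazing food!")
)

oneStarTopWords <- toyReviews %>%
  filter(stars == 1) %>%                          # select by rating, not row position
  unnest_tokens(output = word, input = text) %>%  # one lowercase word per row
  anti_join(stop_words, by = "word") %>%          # drop stop words
  mutate(word = gsub("\\d", "", word)) %>%        # strip digits, as in the code above
  filter(word != "") %>%                          # drop tokens that were only digits
  count(word, sort = TRUE) %>%
  slice_max(n, n = 20, with_ties = FALSE)         # keep at most the top 20 words

print(oneStarTopWords)
```

Filtering out the empty string removes the blank entry that would otherwise top the list once digits are stripped.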

Most Common Words -- One and Five-Star Reviews

One-Star Reviews

[Figures: one-star word cloud and table of popular one-star words]

The two word clouds show the 50 most common words in one-star and five-star reviews, and the tables next to them give the counts behind each cloud.

The most obvious finding is that "food", "service", and "time" are the most common words in both types of review, which suggests that these three qualities are central to the customer's experience.

Another observation is that one-star reviews commonly include the negative words "bad" and "worst". Conversely, five-star reviews commonly include the positive words "nice", "love", "amazing", and "pretty".

Notice that "service" and "experience", two words listed in my hypothesis, made their way into the top 20 words. Later in the analysis, I look at the words that most often precede them.

Five-Star Reviews

[Figures: word cloud and table of popular words]

#----n-grams
oneStarReviewBigram <- oneStarReviewSent %>% unnest_tokens(output = bigram, input = line, token = "ngrams", n = 2)
oneStarReviewBigram %>% count(bigram, sort = TRUE)
oneStarReviewBigramSeperated <- oneStarReviewBigram %>% separate(bigram, c("word1", "word2"), sep = " ")
oneStarReviewBigramFiltered <- oneStarReviewBigramSeperated %>% filter(!word1 %in% stop_words$word) %>% filter(!word2 %in% stop_words$word)
#new bigram counts
oneStarReviewBigramCounts <- oneStarReviewBigramFiltered %>% count(word1, word2, sort = TRUE)

This is an example for the word "service." The same process was used for the other words.

#what word most commonly precedes 'service'
oneStarPrecedeService <- oneStarReviewBigramFiltered %>% filter(word2 == "service") %>% count(word1, sort = TRUE)
#the word "customer" dominated (97448), so I removed it and created a new barchart
oneStarPrecedeServiceWOCustomer <- oneStarPrecedeService[-c(1), ]
oneStarPrecedeServiceWOCustomer %>%
  top_n(10) %>%
  mutate(word1 = reorder(word1, n)) %>%
  ggplot(aes(x = word1, y = n)) +
  geom_bar(stat = "identity", fill = "#df5f5f") +
  coord_flip() +
  labs(title = "Most Popular Words to Precede 'Service' (w/o 'customer')", x = "word", y = "count") +
  geom_text(aes(label = n), hjust = 1.25)
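
Since the same steps are repeated for each word, they can be wrapped in a small function. This is only a sketch on made-up sentences. `countPreceding` is a hypothetical helper, not something from tidytext or the original analysis.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Made-up review text standing in for the one-star sentences.
toyReviews <- tibble(text = c(
  "Terrible service and poor service.",
  "Terrible service again.",
  "Excellent atmosphere, friendly atmosphere."
))

# Same bigram steps as above: tokenize, split, drop stop words.
bigramsFiltered <- toyReviews %>%
  unnest_tokens(output = bigram, input = text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word, !word2 %in% stop_words$word)

# Hypothetical helper: count the words that come right before `target`.
countPreceding <- function(bigrams, target) {
  bigrams %>%
    filter(word2 == target) %>%
    count(word1, sort = TRUE)
}

# Run the helper once per target word; the result is a named list of tables.
targets <- c("service", "atmosphere")
precedingByTarget <- lapply(setNames(targets, targets), countPreceding,
                            bigrams = bigramsFiltered)

print(precedingByTarget$service)
```

Each element of `precedingByTarget` can then be passed to the same bar chart code.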


Words to Precede "Service"

[Figures: bar charts of the words preceding "service" in one-star and five-star reviews]

For both one-star and five-star reviews, the word that most commonly precedes "service" was "customer". It was so common that it skewed the data: in a bar chart, the rest of the top 10 words looked extremely small and insignificant. That is why I decided to remove "customer" from these visualizations.

Notice that the words preceding "service" in one-star reviews were almost all negative. The only non-negative words in the top-ten list are "food" and "costumer". "Costumer" is probably a common misspelling of "customer".

For five-star reviews, a lot of the words in the top-ten list preceding "service" were also negative. The top two words ("friendly" and "excellent") were positive, but words like "bad" and "poor" followed behind. This is an interesting observation, because I assumed the five-star list would be all positive, just as the one-star list was almost all negative. It may indicate that service is not a major factor in why customers rate a business highly.

Words to Precede "Atmosphere"

[Figures: bar charts of the words preceding "atmosphere" in one-star and five-star reviews]

The words that precede "atmosphere" are mostly positive for both star ratings. Also notice that the counts are significantly lower than the counts of words preceding "service." This could mean that adjectives aren't attached to this word as often, or that a business's atmosphere isn't commonly mentioned in reviews.

Words to Precede "Taste"

[Figures: bar charts of the words preceding "taste" in one-star and five-star reviews]

Looking at the bar charts for the word "taste," it is very obvious that the word is most commonly associated with the word "bad." Even for five-star reviews! One follow-up question would be: does "bad" appear by itself, or do the five-star comments usually say "not bad"? Also notice how skewed the one-star bar chart is. The word "bad" precedes "taste" more than twice as often as the next most common word, which is "food".
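
One way to approach that follow-up question is to look one word further back with trigrams. The sketch below is not part of the original analysis: the three sentences are made up and `negationWords` is a short illustrative list. Stop words are deliberately left in, because the negation words themselves are stop words.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Made-up five-star sentences that all contain "bad taste".
toyFiveStar <- tibble(text = c(
  "The coffee had a bad taste at first.",
  "Honestly not bad taste for the price.",
  "No bad taste at all."
))

# Illustrative list of negation words; extend as needed.
negationWords <- c("not", "no", "never", "without")

badTasteContext <- toyFiveStar %>%
  unnest_tokens(output = trigram, input = text, token = "ngrams", n = 3) %>%
  separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
  filter(word2 == "bad", word3 == "taste") %>%        # keep "<word1> bad taste"
  mutate(negated = word1 %in% negationWords) %>%      # was "bad" negated?
  count(negated)

print(badTasteContext)
```

A high share of negated cases in the real five-star reviews would suggest that "bad taste" is often part of a positive comment.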

Words to Precede "Quality"

[Figures: bar charts of the words preceding "quality" in one-star and five-star reviews]

For both types of ratings, "food" most commonly precedes "quality". This is not a surprise, but it does support the idea that food quality is a factor in how a customer rates a business on Yelp. For five-star ratings, strongly positive words do not stand out. There were even middle-ground terms like "decent" and "average."

Words to Precede "Experience"

[Figures: bar charts of the words preceding "experience" in one-star and five-star reviews]

The last word to analyze is "experience." This must be a common factor in reviews, considering the large counts associated with the top words that precede it. Strong negative words ("horrible", "terrible", "awful") are common in one-star reviews. Again, five-star reviews also have negative words preceding "experience", including "bad", "horrible", and "terrible".

Conclusion

Key Findings
The first key finding was that five-star reviews were the most common type of review.

The second key finding was that the words "food", "service", and "time" were all popular words in reviews. It is interesting to see that "food" is a top word, but I believe it is telling of the type of businesses that are commonly reviewed on Yelp: restaurants and food services. "Service" is one of the words that I listed in my hypothesis because I believed it was a common quality to affect a review, and it is also pertinent to a range of business types.

Finally, in the n-grams section of the analysis, the most notable observations were the number of negative and positive words that precede each specific term, and how they compare between one-star and five-star ratings. Also, some bar charts had counts in the hundreds of thousands, while others topped out at a couple hundred.
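
The positive and negative labels in this analysis were judged by eye. A rough way to tally them is to join the preceding words to the bing lexicon, as in the sketch below. The counts are made up for illustration, and `precedeExperience` is a hypothetical table in the shape the bigram step produces.

```r
library(dplyr)
library(tidytext)

# Made-up counts of words preceding "experience", by rating.
precedeExperience <- tibble(
  rating = c("one-star", "one-star", "one-star",
             "five-star", "five-star", "five-star"),
  word1  = c("horrible", "terrible", "awful", "amazing", "bad", "dining"),
  n      = c(50, 40, 30, 60, 10, 20)
)

sentimentShare <- precedeExperience %>%
  # Keep only words the bing lexicon labels as positive or negative.
  inner_join(get_sentiments("bing"), by = c("word1" = "word")) %>%
  group_by(rating, sentiment) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  group_by(rating) %>%
  mutate(share = n / sum(n)) %>%   # share of labeled words per rating
  ungroup()

print(sentimentShare)
```

Words that are not in the lexicon, like "dining" here, drop out of the join, so the shares describe only the labeled words.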

Results and Reflection
Regarding my hypothesis, my first statement, that one-star and five-star reviews would be the most common, was partially incorrect. Five-star reviews were definitely the most common (2,253,347), but one-star reviews (731,363) were not the next most common; four-star reviews were (1,223,316). I had imagined the distribution would look similar to an inverted bell curve, with one-star and five-star review counts significantly greater than the others. Instead, the distribution was skewed left. Next, I was very surprised to see how many of the words preceding "service", “atmosphere”, “taste”, “quality”, and “experience” in five-star reviews were negative. The word "experience" in particular was preceded by multiple negative words. This is an interesting observation because it suggests that even a negative experience might not affect the rating given to a business.

So, what qualities determine star ratings on Yelp? This is challenging to answer definitively, but this analysis suggests that a positive atmosphere and good-quality items are common in five-star reviews. Considering the number of negative words attached to "experience" in five-star reviews, overall experience doesn't seem to be a defining factor.

Additional Resources

yelp.com/
dummies.com/programming/r/how-to-create-a-data-frame-from-scratch-in-r/
r4ds.had.co.nz/data-visualisation.html
yelp.com/styleguide
tidytextmining.com/tidytext.html