My hypothesis is that the most frequently reviewed wines will come from the US, because that is where the magazine originated, and that the highest-ranking wines will be from the United States as a result. If that holds, the descriptive language across all wine reviews should be very consistent with that of the top-rated wines.
The wine reviews come from a dataset published by Wine Enthusiast Magazine and uploaded to Kaggle as an open dataset. I downloaded the dataset as a .csv file for RStudio to read. That dataset's link can be found here.
winemag-data-130k-v2.csv
From here, I renamed the dataset "Whine Dataset.csv". I noticed that each wine was labeled with its country of origin, so I plotted the number of wine reviews per country. I installed "rworldmap" to achieve this. The other necessary packages follow:
library(tidyverse)
library(dplyr)
library(magrittr)
library(ggplot2)
library(rworldmap)
library(readxl)
library(tidytext)
library(tm)
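As a rough illustration of the loading step, here is a minimal sketch in base R. The file name comes from the text above. The column names `country`, `variety`, and `points` are my assumption about how the file is laid out, and the fallback rows are made-up placeholders so the snippet still runs without the file.

```r
# File name taken from the text; adjust the path to wherever the file is saved.
csv_path <- "Whine Dataset.csv"

if (file.exists(csv_path)) {
  # Read the downloaded reviews, keeping text columns as plain character vectors.
  wine <- read.csv(csv_path, stringsAsFactors = FALSE)
} else {
  # Made-up placeholder rows (NOT real reviews), used only when the file is absent.
  # Column names are assumed, not confirmed by the original post.
  wine <- data.frame(
    country = c("Country A", "Country B", "Country C"),
    variety = c("Variety X", "Variety Y", "Variety Z"),
    points  = c(90L, 95L, 92L),
    stringsAsFactors = FALSE
  )
}

# Quick look at the structure before doing any analysis.
str(wine)
```

Checking `str(wine)` first is a cheap way to confirm the column names before relying on them later.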
Afterward, I created a map that displayed the number of wine reviews by country. I used the "joinCountryData2Map" function to match the dataset's country names to the countries on the map. The code used to display the visual is as follows:
Screenshot of code.
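As a text stand-in for that step, here is a minimal sketch of how a per-country count might be joined to a world map with rworldmap. It is not necessarily the code in the screenshot. The toy rows are made up, the `country` column name is an assumption, and recoding "US" to "United States" is a precaution on my part, since I am not certain the name-based join recognizes the abbreviation.

```r
# Made-up stand-in rows (NOT the real dataset); the column name is assumed.
wine <- data.frame(
  country = c("US", "US", "US", "France", "Italy", "Italy"),
  stringsAsFactors = FALSE
)

# Count how many reviews each country has.
country_counts <- as.data.frame(
  table(country = wine$country),
  responseName = "reviews",
  stringsAsFactors = FALSE
)

# Precaution (assumption): spell out the abbreviation before a name-based join.
country_counts$country[country_counts$country == "US"] <- "United States"

# Only draw the map when rworldmap is installed.
if (requireNamespace("rworldmap", quietly = TRUE)) {
  library(rworldmap)

  # Match the country names in the counts to the map's country polygons.
  map_data <- joinCountryData2Map(
    country_counts,
    joinCode = "NAME",
    nameJoinColumn = "country"
  )

  # Shade each country by its number of reviews.
  mapCountryData(
    map_data,
    nameColumnToPlot = "reviews",
    catMethod = "pretty",
    mapTitle = "Wine reviews by country (toy data)"
  )
}
```

Counting first and joining second keeps the table handed to the map small, with one row per country.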
The above visual shows that the largest share of reviewed wines in the dataset comes from the U.S., accounting for more than half. While I correctly predicted this outcome in my hypothesis, I didn't expect Australia and Canada to have as many reviews as they did in comparison to Europe. While Italy and France represented Europe well, they couldn't hold a candle to the United States.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
Knowing that most of the reviewed wines are American, it would be understandable to assume that the top-ranking wine is also from the U.S. To be certain, I imported the dataset into an Excel spreadsheet and sorted the wines by points to find the 10 highest-scoring ones. I also included their country and variety to narrow down specific types.
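For anyone who would rather stay in R than switch to a spreadsheet, the same sort could be sketched roughly as below. This is an alternative to the Excel step, not what I actually ran. The rows are made-up placeholders, and the column names `country`, `variety`, and `points` are assumptions.

```r
# Made-up placeholder rows (NOT real wines); column names are assumed.
wine <- data.frame(
  country = c("Country A", "Country B", "Country C", "Country D"),
  variety = c("Variety W", "Variety X", "Variety Y", "Variety Z"),
  points  = c(91L, 99L, 88L, 95L),
  stringsAsFactors = FALSE
)

# Sort by points, highest first, keeping only the columns of interest.
ranked <- wine[order(wine$points, decreasing = TRUE),
               c("country", "variety", "points")]

# Keep at most the first 10 rows of the ranking.
top_wines <- head(ranked, 10)
print(top_wines, row.names = FALSE)
```

With the real data, ties at the top score are likely, so the cut-off at 10 rows may be somewhat arbitrary.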
As we can see above, the United States does not hold the most spots for finest wine, proving the second aspect of my hypothesis incorrect. While America is represented twice, France holds the highest-rated wine that money can buy.
It is also important to note that Italy is represented the most, with 4 spots on the top 10 list, and France is just behind with 3 spots. Europe can proudly say that it accounts for 70% of the world's best-reviewed wines.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
Since we now understand that you should be drinking French Bordeaux-Style White Wine, I decided to analyze how the descriptive language compared.
The above is how the top 25 words graph was generated.
Above is how the top 10 words graph was generated.
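For readers without the original code in front of them, here is a minimal base R sketch of how a word-frequency count like this could be computed. It is not necessarily the code I used. The review snippets are made up, and the stop-word list is a tiny hand-written one for illustration. With the packages loaded earlier, tidytext's `unnest_tokens()` and its `stop_words` table would be the more usual route.

```r
# Made-up review snippets (NOT from the dataset) standing in for the review text.
reviews <- c(
  "Ripe cherry fruit with a long finish.",
  "Bright cherry aromas and a smooth palate.",
  "Dark fruit on the palate, with a firm finish."
)

# Tiny hand-written stop-word list, for illustration only.
stop_words_small <- c("a", "and", "the", "with", "on")

# Lowercase, replace anything that is not a letter or space, then split on spaces.
clean <- gsub("[^a-z ]", " ", tolower(reviews))
words <- unlist(strsplit(clean, " +"))

# Drop empty strings and stop words.
words <- words[nzchar(words) & !(words %in% stop_words_small)]

# Count each word and sort from most to least frequent.
word_counts <- sort(table(words), decreasing = TRUE)

# Keep the 25 most frequent words (fewer if the text is short) and plot them.
top_words <- head(word_counts, 25)
print(top_words)
barplot(top_words, las = 2, main = "Most frequent words (toy data)")
```

Running the same steps on only the rows for the top 10 wines would give the second graph's counts.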
Let's take a look at the most frequently used words across all the reviews. Additionally, let's compare them to the most frequent words describing the top 10 wines.
Above, it is clear that the most frequently used words across all the reviews follow a similar pattern to those for the top 10 wines, which mostly rules out any standout language in the best-rated wines. However, the reviews of the best wines do tend to mention "cherry" and "fruit" and to evoke an "aroma". Two other frequently used words are "finished" and "palate", which are more overarching than descriptive.
Additionally, I created a word cloud to illustrate the top 10 words across the various wines in a more visually appealing way.
This was the code used.
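As a rough idea of what such a step can look like, here is a minimal sketch that draws a word cloud from a named vector of counts. The counts are made up, and the use of the "wordcloud" package is my assumption for this sketch. It is not in the package list above, so the drawing step is skipped when it is not installed.

```r
# Made-up word counts (NOT results from the dataset).
word_counts <- c(cherry = 5, fruit = 4, finish = 4, palate = 3, aromas = 2)

# Sort from most to least frequent and keep at most the top 10 words.
top_counts <- head(sort(word_counts, decreasing = TRUE), 10)

# Draw the cloud only when the wordcloud package (an assumption) is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(
    words = names(top_counts),
    freq = top_counts,
    min.freq = 1,          # show every word, even rare ones
    random.order = FALSE   # place the most frequent words first, near the centre
  )
}
```

Setting `random.order = FALSE` makes the most frequent words easier to spot.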