Hypothesis:

The popularity Shirley Temple gained as a box office draw from 1934 to 1938 directly affected the growth rate of the name “Shirley” during that time period.

R Packages

library(babynames)
library(tidyverse)
library(ggplot2)
library(dplyr) 

Analysis & Code

For the first analysis, the babynames dataset was filtered by sex (female) and name (Shirley). It was then plotted with year (1880 to 2017) on the x-axis and prop on the y-axis. prop is n divided by the total number of applicants in that year, so each value is the proportion of people of that sex born in that year who were given that name. This graph therefore displays the proportion of all female births each year that were named Shirley.

babynames %>%
  filter(name == "Shirley", sex == "F") %>%
  ggplot(aes(year, prop)) +
  geom_line(color = "orange") +
  ggtitle("Full History of The Name Shirley") +
  xlab("\nYear") +
  ylab("\nProportion of Females Named Shirley\n")
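The prop column can be sanity-checked against the raw counts. The sketch below is only an approximation and is not how the babynames package computes prop: it divides each count by the sum of n over all listed names of the same year and sex, whereas prop divides by the total number of applicants, which also includes the rare names (fewer than five births) that the SSA leaves out of the name list. The helper `share_within_year()` and the toy table are hypothetical, made-up illustrations.

```r
# Hypothetical helper: each name's share of all listed births
# in its year and sex, computed from the n column alone.
# This approximates prop; it is not the babynames definition.
share_within_year <- function(df) {
  # Sum n over all rows that share the same year and sex
  totals <- ave(df$n, df$year, df$sex, FUN = sum)
  df$share <- df$n / totals
  df
}

# A tiny made-up table (illustrative values, not real data)
toy <- data.frame(
  year = c(1935, 1935, 1935, 1936),
  sex  = c("F", "F", "M", "F"),
  name = c("Shirley", "Mary", "James", "Shirley"),
  n    = c(25, 75, 50, 40),
  stringsAsFactors = FALSE
)
share_within_year(toy)$share
```

Applied to babynames itself, share should come out slightly larger than prop, because its denominator omits the unlisted rare names.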

In a similar fashion, the second analysis filters the babynames dataset by sex (female) and name (Shirley), then plots year (1880 to 2017) on the x-axis and n (the number of females named Shirley born that year) on the y-axis. This graph displays the number of girls named Shirley each year.

babynames %>%
  filter(name == "Shirley", sex == "F") %>%
  ggplot(aes(year, n)) +
  geom_line(color = "orange") +
  ggtitle("Full History of The Name Shirley") +
  xlab("\nYear") +
  ylab("\nName Count by Year\n")
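To check which decade the name was most popular in, yearly counts can be summed by decade. This is a minimal sketch; `births_by_decade()` is a hypothetical helper, the toy rows are made-up, and decades are binned as 1930 to 1939 rather than the 1930 to 1940 window used in the plots.

```r
# Hypothetical helper: total births per decade for one name and sex,
# sorted from the busiest decade to the quietest.
births_by_decade <- function(df, which_name, which_sex = "F") {
  rows <- df[df$name == which_name & df$sex == which_sex, ]
  # Floor each year to the start of its decade, e.g. 1937 -> 1930
  rows$decade <- (rows$year %/% 10) * 10
  totals <- aggregate(n ~ decade, data = rows, FUN = sum)
  totals[order(-totals$n), ]
}

# Made-up rows for illustration only
toy <- data.frame(
  year = c(1929, 1931, 1937, 1941, 1935),
  sex  = "F",
  name = c("Shirley", "Shirley", "Shirley", "Shirley", "Mary"),
  n    = c(10, 20, 30, 5, 999),
  stringsAsFactors = FALSE
)
births_by_decade(toy, "Shirley")
```

On the real data, `births_by_decade(babynames, "Shirley")` puts the decade with the most births in the first row.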

For the third analysis, we look at Shirley from 1930 to 1940, the decade in which the name was most popular. This period also coincides with Shirley Temple’s peak popularity: she ranked 1st in the Top Ten Money Making Stars Poll four years in a row (1935 to 1938), and ranked 8th (1934) and 5th (1939) within the same decade.

(The Top Ten Money Making Stars Poll measures the bankability of movie stars. Polls of this kind began early in movie history; at first they were popularity polls and contests run by film magazines, in which readers voted for their favorite stars.)
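The poll rankings quoted above can be kept in a small table, so that the plot labels are generated from data instead of typed by hand. A minimal sketch; `poll` and `ordinal()` are hypothetical names, and the ranks are the ones stated in this section.

```r
# Shirley Temple's Top Ten Money Making Stars Poll ranks, as stated above
poll <- data.frame(
  year = 1934:1939,
  rank = c(8, 1, 1, 1, 1, 5)
)

# Hypothetical helper: English ordinal suffix, e.g. 1 -> "1st", 8 -> "8th"
ordinal <- function(k) {
  suffix <- ifelse(k %% 100 %in% 11:13, "th",
            ifelse(k %% 10 == 1, "st",
            ifelse(k %% 10 == 2, "nd",
            ifelse(k %% 10 == 3, "rd", "th"))))
  paste0(k, suffix)
}

# Labels in the same "8th(1934)" style as the annotations
poll$label <- paste0(ordinal(poll$rank), "(", poll$year, ")")
poll$label
```

One possible design, assuming the plotting data has year and percent columns: merge `poll` with the Shirley rows by year and draw all labels with a single `geom_text()` layer (for example with `vjust = -0.5`), rather than positioning six `annotate()` calls by hand.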

First, a new dataset, "BabyNamesTwo", was created with a new variable, "percent", which converts prop from a decimal proportion to a percentage (prop multiplied by 100). Second, BabyNamesTwo was filtered by name (Shirley) and sex (female) into a new dataset named "shirley".

With percent created, a subset ("shirleypop") was made containing only the years 1930 to 1940, which was then plotted with year on the x-axis and percent on the y-axis.

This analysis examines how the Top Ten Money Making Stars Poll rankings coincided with jumps in the percentage of females named Shirley each year, so the rest of the code adds annotations and labels.

As the data suggest, Shirley Temple’s first top ten ranking, 8th place in the Top Ten Money Making Stars Poll in 1934, coincides with a noticeable jump in the name Shirley’s popularity from 1933 to 1934. The same applies to the even larger jump from 1934 to 1935, the year Shirley Temple ranked 1st in the poll. In subsequent years the name’s popularity declined; the end of Shirley Temple’s box office days, and her becoming a young adult rather than a child star, could have contributed to that decline.

# percent: prop (a proportion between 0 and 1) rescaled to a percentage
BabyNamesTwo <- babynames %>%
  mutate(percent = prop * 100)

# Keep only females named Shirley
shirley <- BabyNamesTwo %>%
  filter(name == "Shirley", sex == "F")

# Restrict to the years 1930 through 1940
shirleypop <- subset(shirley, year %in% 1930:1940)

# Percent by year; each label gives Shirley Temple's poll rank that year
ggplot(shirleypop, aes(x = year, y = percent)) +
  geom_line(color = "blue") +
  geom_point(colour = "red") +
  ylab("\nThe Percentage of Females Named Shirley\n") +
  xlab("\nYear (1930 to 1940)\n") +
  ggtitle("Shirley 1930-1940") +
  annotate("text", x = 1933.3, y = 2.1, label = "8th(1934)") +
  annotate("text", x = 1934.3, y = 3.9, label = "1st(1935)") +
  annotate("text", x = 1936.7, y = 3.3, label = "1st(1936)") +
  annotate("text", x = 1937.7, y = 2.5, label = "1st(1937)") +
  annotate("text", x = 1938.7, y = 2.1, label = "1st(1938)") +
  annotate("text", x = 1939.7, y = 1.85, label = "5th(1939)") +
  annotate("text", x = 1938, y = 4, label = "*The Top Ten Money Making Stars Poll Ranking*")
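The year-to-year jumps described above can be computed directly rather than read off the plot. A minimal sketch, assuming the input holds one name and sex with a percent column and no missing years; `yoy_change()` and the toy rows are hypothetical.

```r
# Hypothetical helper: change in percent from the previous year,
# in percentage points, for one name's rows.
yoy_change <- function(df) {
  df <- df[order(df$year), ]
  # diff() returns one fewer value, so the first year gets NA
  df$change <- c(NA, diff(df$percent))
  df
}

# Made-up rows, deliberately out of order
toy <- data.frame(year = c(1935, 1933, 1934), percent = c(4, 1, 2))
yoy_change(toy)
```

Running `yoy_change(shirley)` and looking at the 1934 and 1935 rows gives the size of the two jumps discussed above in percentage points.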

The next graph shows female baby names from 1934 to 1939 (the years in which Shirley Temple ranked in the Top Ten Money Making Stars Poll). To weed out less popular names, the data are filtered to keep only rows with a prop value above 0.020, i.e. names given to more than 2% of female births in that year.

The graph suggests that Shirley was the only name whose percentage increased by more than one percentage point from one year to the next. Given that the other popular names show a steady incline or decline over the same period, the increase in the percentage of females named Shirley from 1934 to 1935 stands out as unusual, which points to an outside influence on the name’s popularity.

BabyNamesTwo %>%
  filter(prop > .020, sex == "F", year %in% 1934:1939) %>%
  ggplot(aes(year, percent, colour = name)) +
  geom_line() +
  ylab("\nPercentage of Female Births\n") +
  geom_vline(xintercept = 1934, color = "red", linetype = "dotted") +
  geom_vline(xintercept = 1939, color = "red", linetype = "dotted") +
  ggtitle("Female Baby Names from 1934-1939 with a prop above 0.02 (2%)") +
  geom_point(color = "red")
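The comparison of jump sizes across names can also be made numerically. This is a minimal sketch; `largest_jump()` is a hypothetical helper and the toy rows are made-up. One caveat: if a name falls below the prop filter in some year, that row is missing and diff() would compare non-adjacent years, so it is safer to take the list of names from the filter and compute jumps on all of those names' female rows from 1934 to 1939.

```r
# Hypothetical helper: for each name, the largest year-over-year
# increase in percent (percentage points) within the rows supplied.
# Assumes one sex and consecutive years per name.
largest_jump <- function(df) {
  df <- df[order(df$name, df$year), ]
  jumps <- lapply(split(df, df$name), function(g) {
    d <- diff(g$percent)
    data.frame(
      name = as.character(g$name[1]),
      max_jump = if (length(d) > 0) max(d) else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, jumps)
  rownames(out) <- NULL
  # Largest jump first
  out[order(-out$max_jump), ]
}

# Made-up rows for illustration only
toy <- data.frame(
  name    = rep(c("Shirley", "Mary"), each = 3),
  year    = rep(1934:1936, times = 2),
  percent = c(2, 4, 3.5, 3, 3.1, 3.0),
  stringsAsFactors = FALSE
)
largest_jump(toy)
```

Names whose max_jump exceeds 1 are the ones that rose by more than one percentage point in a single year.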

Last but not least, the following graphs show the top ten female baby names for each year from 1934 to 1939, the years in which Shirley Temple was a top box office performer, plus 1940 for comparison. Each graph filters the data by sex (female) and year, groups the rows by name, sums the number of births for each name, and arranges the names by that total. Only the ten most popular names are shown.

From the data, Shirley’s rankings from 1934 to 1939 are as follows: 4th (1934), 2nd (1935), 2nd (1936), 4th (1937), 5th (1938), and 5th (1939).

Shirley continued to decline and, in 1942, lost its place among the ten most popular female names. Again, the data suggest that the name’s rise in the rankings during this period was a result of Shirley Temple’s box office draw and popularity.
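The yearly rankings quoted above can also be computed directly instead of read off the bar charts, which makes it easy to extend the check to 1942. A minimal sketch; `name_rank_by_year()` is a hypothetical helper, ranks are by number of births among names of the same sex and year, ties share the best rank, and the toy rows are made-up.

```r
# Hypothetical helper: rank of one name among names of the same sex
# in each requested year (1 = most births); NA if the name is absent.
name_rank_by_year <- function(df, which_name, years, which_sex = "F") {
  rows <- df[df$sex == which_sex & df$year %in% years, ]
  sapply(years, function(y) {
    yr <- rows[rows$year == y, ]
    # Ties share the best (minimum) rank
    r <- rank(-yr$n, ties.method = "min")
    hit <- r[yr$name == which_name]
    if (length(hit) == 0) NA_integer_ else as.integer(hit[1])
  })
}

# Made-up rows for illustration only
toy <- data.frame(
  year = c(1934, 1934, 1934, 1935, 1935),
  sex  = "F",
  name = c("Mary", "Shirley", "Betty", "Mary", "Shirley"),
  n    = c(100, 50, 70, 80, 90),
  stringsAsFactors = FALSE
)
name_rank_by_year(toy, "Shirley", 1934:1936)
```

On the real data, `name_rank_by_year(babynames, "Shirley", 1934:1942)` would show both the 1934 to 1939 rankings and the year the name left the top ten.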

# Top ten female names for one year, drawn as a horizontal bar chart
plot_top_names <- function(yr) {
  babynames %>%
    filter(sex == "F", year == yr) %>%
    group_by(name) %>%
    summarize(total = sum(n)) %>%
    arrange(desc(total)) %>%
    head(10) %>%
    ggplot(aes(reorder(name, total), total)) +
    geom_col(fill = "orange") +
    xlab(paste0("\nTop Ten Most Popular Female Names from ", yr, "\n")) +
    ylab("\nNumber of Births\n") +
    coord_flip()
}

# One chart per year from 1934 to 1939, plus 1940 for comparison
for (yr in 1934:1940) {
  print(plot_top_names(yr))
}

Conclusion

There is evidence suggesting that the name Shirley’s increase in popularity between 1934 and 1939 was directly associated with Shirley Temple’s years of box office stardom. In the analysis above, the other popular names in that period showed a steady decline or a steady incline, and no other name made a year-over-year leap of more than 1.5 percentage points.