The Name Popularity of Presidential First Children

This report analyzes the name popularity of presidential first children, specifically the popularity of the names in the states where the presidents launched their political careers around the time they were first elected. I used data from the R package "babynames," which contains baby names in the U.S. from 1880-2017, and the "StateNames" database from Kaggle, which contains baby names organized by state.

Process/Methodology

I decided that my analysis would go back to the children of John F. Kennedy, as I wanted to see the impact of not only his election but also his assassination on the popularity of his children's names. However, I'm not going to take any trends in baby names from after his assassination into account in my conclusion, as the main purpose of this analysis is to identify trends in baby names surrounding a president's first election win.

I performed three analyses for the children of each president: 1) their name popularity in the U.S. during the entire time period of the "babynames" database, 2) their name popularity in the state where the president launched his political career during the entire time period of the "StateNames" database, and 3) their name popularity in the state where the president launched his political career from four years before his first election win to four years after (e.g. 2004 to 2012 for Barack Obama).
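The third window can be written down explicitly. The following is a minimal sketch; the helper name `election_window` and its `half_width` argument are hypothetical and do not appear in the code used for this report.

```r
# Hypothetical helper: the years from four years before a first election
# win to four years after it, inclusive.
election_window <- function(election_year, half_width = 4) {
  seq(election_year - half_width, election_year + half_width)
}

# For Barack Obama's 2008 win this returns the years 2004 through 2012.
election_window(2008)
```

Because both endpoints are included, each window covers nine calendar years of data.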

My goal in doing this was to see how an election win impacted the percentage of newborns named after presidential children. I chose to start with Sasha and Malia Obama, as the databases only go up to 2017, so there isn't enough data to analyze the popularity of Donald Trump's children's names. I also decided that I would only analyze the children of presidents who were elected, meaning they and their families gained popularity from running a campaign. As a result, I did not analyze the names of Gerald Ford's children, since he was not elected to the presidency. I did analyze the names of Lyndon B. Johnson's children, though, since he won the 1964 election after ascending to the presidency in 1963 following John F. Kennedy's assassination.

Hypothesis

Before starting my analysis, I hypothesized that the percentage of newborns named after presidential children around the time a president was first elected would increase in the state where he launched his political career.

Results

To begin, I loaded the R collection "tidyverse," the "babynames" package, and the "StateNames" database, using the following code.

library(tidyverse)

library(babynames)

read.csv("StateNames.csv") -> stateNames

Barack Obama's Daughters

First, I used the "babynames" package to show the percentage of newborns in the U.S. who were named Sasha or Malia dating back to 1880. I did this using the following code.

Obamas <- babynames %>% filter(name %in% c("Sasha", "Malia") & sex == "F")

Obamas %>% mutate(percent = prop * 100) -> Obamas

ggplot(Obamas, aes(year, percent, color = name)) + geom_line() + xlab("Year") + ylab("Percent") + ggtitle("Name Popularity of Barack Obama's Daughters in the U.S.") + theme(plot.title = element_text(hjust = 0.5))

The plot shows major spikes in the name Sasha between 1980 and 1990, as well as between 2000 and 2010. I was unable to pinpoint the cause of the spike in the 1980s other than a natural rise in the number of newborns being named Sasha. The plot also shows a major spike in the name Malia between 2000 and 2010. I then used the "StateNames" database and the following code to focus the analysis on their name popularity in Illinois dating back to 1880.

totalIllinois <- stateNames %>% filter(Gender =="F" & State == "IL")

totalIllinois <- totalIllinois %>% mutate(Prop = Count /sum(Count))

totalIllinois$Count / sum(totalIllinois$Count)

totalIllinois %>% mutate(Percent = Prop * 100) -> totalIllinois

sum(totalIllinois$Percent)

ObamasStateTotals <- totalIllinois %>% filter(Name %in% c("Sasha", "Malia"))

ggplot(ObamasStateTotals, aes(Year, Percent, color = Name)) + geom_line() + xlab("Year") + ylab("Percent") + ggtitle("Name Popularity of Barack Obama's Daughters in Illinois") + theme(plot.title = element_text(hjust = 0.5))

This plot follows the patterns of the previous plot, with spikes in the name Sasha seen in Illinois between 1980 and 1990 and 2000 and 2010, and spikes in the name Malia between 2000 and 2010. I then used the "StateNames" database again and the following code to focus the analysis on their name popularity in Illinois surrounding Barack Obama's first election win in 2008.

totalIllinois2 <- stateNames %>% filter(Gender =="F" & State == "IL")

totalIllinois2 <- totalIllinois2 %>% mutate(Prop = Count /sum(Count))

totalIllinois2$Count / sum(totalIllinois2$Count)

totalIllinois2 %>% mutate(Percent = Prop * 100) -> totalIllinois2

sum(totalIllinois2$Percent)

ObamasStateTotals2 <- totalIllinois2 %>% filter(Name %in% c("Sasha", "Malia") & Year > 2003 & Year < 2013)

ggplot(ObamasStateTotals2, aes(Year, Percent, color = Name)) + geom_line() + xlab("Year") + ylab("Percent") + ggtitle("Name Popularity of Barack Obama's Daughters in Illinois from 2004-2012") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(xintercept = 2008, color = "blue", size = 0.5) + annotate("text", x = 2007.8, y = 0.0008, angle = 90, size = 3.5, label = "Barack Obama Elected")

This plot clearly shows a spike in the percentage of newborns with the names Sasha or Malia in Illinois directly after Barack Obama's election win in 2008.
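One point about the Illinois calculations above is worth spelling out: `sum(Count)` is taken over every row, so each `Percent` is a share of all female births recorded for Illinois across all years, not a share of a single year's births. The `prop` column in "babynames" is computed within each year, so a per-year version may be easier to compare with the national plot. The following is a minimal sketch, assuming the column names used above (`Name`, `Year`, `Gender`, `State`, `Count`); the helper name `yearly_share` and the toy rows are hypothetical placeholders, not values from the dataset.

```r
library(dplyr)

# Hypothetical helper: percentage of each name within its own year,
# for one state and one sex.
yearly_share <- function(data, state_code, sex_code) {
  data %>%
    filter(State == state_code, Gender == sex_code) %>%
    group_by(Year) %>%                              # one denominator per year
    mutate(Percent = 100 * Count / sum(Count)) %>%
    ungroup()
}

# Toy rows with placeholder names and made-up counts, for illustration only.
toy <- data.frame(
  Name   = c("NameA", "NameB", "NameA", "NameB"),
  Year   = c(2007, 2007, 2008, 2008),
  Gender = "F",
  State  = "IL",
  Count  = c(10, 30, 20, 20)
)

toy_share <- yearly_share(toy, "IL", "F")
toy_share
```

With the real data, `yearly_share(stateNames, "IL", "F")` could stand in for `totalIllinois`. The percentages would then sum to 100 within each year, and the y position of the plot annotation would need to be adjusted to the new scale.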

George W. Bush's Daughters

The code for the rest of the analysis is very similar to, if not an exact replica of, the code I used for Sasha and Malia Obama, so I'm not going to display it in its entirety for each set of presidential children. The only differences are using "name ==" instead of "name %in% c()" when there's only one child, and using different year ranges and states. Additionally, when a president had both sons and daughters, I separated the sexes into different plots so as to focus only on daughters named after presidential daughters and sons named after presidential sons.
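Since only the names, sex, state, and year range change from one president to the next, the repeated steps could be wrapped in a single function. The following is a minimal sketch under that assumption, not the code used to produce the plots in this report; the helper name `state_name_percent`, its arguments, and the toy rows are all hypothetical. It keeps the same normalization as the Illinois code (each count divided by the total across all years).

```r
library(dplyr)

# Hypothetical helper: percentages for a set of names in one state and sex,
# optionally restricted to a range of years.
state_name_percent <- function(data, child_names, sex_code, state_code,
                               years = NULL) {
  totals <- data %>%
    filter(Gender == sex_code, State == state_code) %>%
    mutate(Percent = 100 * Count / sum(Count))

  # %in% also works when child_names holds a single name.
  selected <- totals %>% filter(Name %in% child_names)

  if (!is.null(years)) {
    selected <- selected %>% filter(Year %in% years)
  }
  selected
}

# Toy rows with placeholder names and made-up counts, for illustration only.
toy <- data.frame(
  Name   = c("NameA", "NameB", "Other", "NameA", "NameB", "Other"),
  Year   = c(1999, 1999, 1999, 2000, 2000, 2000),
  Gender = "F",
  State  = "TX",
  Count  = c(10, 20, 70, 30, 20, 50)
)

toy_result <- state_name_percent(toy, c("NameA", "NameB"), "F", "TX",
                                 years = 2000)
toy_result
```

The returned data frame has the same columns as `ObamasStateTotals`, so it can be passed to the same `ggplot()` call with only the title changed.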

Here are the plots for the name popularity of George W. Bush's daughters, Jenna and Barbara, in the U.S. going back to 1880, Texas going back to 1880, and Texas from 1996 to 2004.

The plots show similar trends in both names in the U.S. and Texas from 1880 to 2017, with the popularity of Barbara peaking between 1940 and 1950 and declining steadily since then, and Jenna not gaining much traction until the last quarter of the 1900s. While Barbara started increasing after George W. Bush was elected in 2000, the name's popularity was rather unstable during the time period displayed on the plot. Additionally, the popularity of the name Jenna was already on the rise in Texas when George W. Bush was elected.

Bill Clinton's Daughter

With Chelsea being Bill Clinton's only child, this would be an instance where I used "name ==" instead of "name %in% c()" in the code. The code for the percentage of newborns named Chelsea in the U.S. from 1880 to 2017 is below.

Clintons <- babynames %>% filter(name == "Chelsea" & sex == "F")

Clintons %>% mutate(percent = prop * 100) -> Clintons

ggplot(Clintons, aes(year, percent, color = name)) + geom_line() + xlab("Year") + ylab("Percent") + ggtitle("Name Popularity of Bill Clinton's Daughter in the U.S.") + theme(plot.title = element_text(hjust = 0.5))

As the plots show, the peak percentage of newborns named Chelsea in the U.S. is around the same time as the peak percentage in Arkansas. The second plot doesn't go back to 1880 because the first daughter to be named Chelsea in Arkansas wasn't born until 1982.

When focusing my analysis on the years surrounding Bill Clinton's first election win, it's clear from the plot above that the percentage of newborns in Arkansas named Chelsea started to drop around the same time he was elected.
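The first year in which a name appears for a state can be looked up directly. The following is a minimal sketch, assuming the column names used earlier (`Name`, `Year`, `Gender`, `State`); the helper name `first_recorded_year` and the toy rows are hypothetical. One caveat, stated as an assumption: if the Kaggle file follows the Social Security Administration's state-level data, names with fewer than five births in a state and year are omitted, so the first recorded year is the first year the name reached that threshold and not necessarily the first birth.

```r
library(dplyr)

# Hypothetical helper: earliest year with a recorded row for a name
# in a given state and sex. Returns NA if the name never appears.
first_recorded_year <- function(data, child_name, sex_code, state_code) {
  matches <- data %>%
    filter(Name == child_name, Gender == sex_code, State == state_code)

  if (nrow(matches) == 0) {
    return(NA)
  }
  min(matches$Year)
}

# Toy rows with a placeholder name and made-up years, for illustration only.
toy <- data.frame(
  Name   = c("NameA", "NameA", "NameA"),
  Year   = c(1985, 1983, 1990),
  Gender = "F",
  State  = "AR"
)

first_recorded_year(toy, "NameA", "F", "AR")
```

With the real data, the call would be `first_recorded_year(stateNames, "Chelsea", "F", "AR")`.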

George H.W. Bush's Children

Analyzing the popularity of George H.W. Bush's children's names was the first instance in which I split the plots into sons and daughters for each stage of the analysis.

The plots above show that the popularity of the names Dorothy and Pauline peaked in the early 1900s in the U.S., and the popularity of the name George has been steadily declining in the U.S. since 1880.

While the popularity of the names Dorothy and Pauline in Texas follows a similar pattern over time to their popularity in the U.S., the popularity of the name George in Texas differs from that in the U.S. There were two major spikes, in the early 1900s and the mid-1900s, amid otherwise persistent instability in the popularity of the name.

When focusing my analysis on the years surrounding George H.W. Bush's election in 1988, the only name that changes from decreasing popularity to increasing popularity among newborns in Texas after his win is Pauline. It isn't, however, a steep climb. The popularity of the name Dorothy in Texas began to decrease, and the popularity of his sons' names either stayed constant or was already on the rise and continued to rise.

Ronald Reagan's Children

The names of Ronald Reagan's children all seem to peak in popularity in the mid-1900s, though the name Ron maintains low popularity dating back to 1880 (most likely because it's usually a nickname for Ronald). The highest percentages in the U.S. are seen in Christine and Michael, with Christine seeing two major spikes in a relatively short period of time.

The popularity of all the names in California follows similar trends to their popularity in the U.S., with the percentages peaking around the same time for the female names. The popularity of Michael has one peak in California that doesn't appear in the overall U.S. data, shortly after 1980.

Focusing on the years surrounding Ronald Reagan's first election win, the only name that changes from decreasing popularity to increasing popularity in California after he was elected is Christine, while the popularity of the other names either remains the same or is already on the rise and continues to increase.

Jimmy Carter's Children

The popularity of the name Amy has one major peak in the U.S. that occurs in the latter half of the 1900s, while the popularity of Jack and James peaks in the early to mid-1900s. Donnel never gains any popularity, which isn't surprising since it's such an uncommon name.

The popularity of the names of Jimmy Carter's children in Georgia follows similar trends to those in the U.S., with the only major difference being that Donnel doesn't show up on the plot at all because of the lack of newborns with that name.

In focusing on the years surrounding Jimmy Carter's election, the popularity of the name Amy drops sharply in Georgia right after 1976, while the popularity of Jack drops ever so slightly, and the popularity of James increases a little bit.

Richard Nixon's Daughters

The plot above shows that the popularity of the names Julie and Tricia peaks between 1960 and 1980 in the U.S., and the popularity of both has been on a relatively consistent decline since then.

The trends in popularity of the names Julie and Tricia in California are similar to those in the U.S., though Julie peaked right before 1960. The declines in their popularity in California aren't as consistent as in the U.S., as there are some years where the popularity increases, but the overall trend is a decrease in popularity for both names after they peak.

After Richard Nixon was first elected in 1968, the popularity of the name Julie briefly increased in California before continuing to fall, while the popularity of the name Tricia was already rising slightly when Nixon was first elected and continued to do so for a year before leveling off.

Lyndon B. Johnson's Daughters

The plot above shows that the popularity of the name Lynda in the U.S. is much higher than the popularity of the name Luci. The popularity of Lynda peaks just before 1950, whereas Luci doesn't appear on the scene until almost 1940 and peaks just after 1960.

The trend in the popularity of Lynda in Texas mirrors that of the U.S., with the name peaking shortly before 1950. Luci doesn't appear on the scene until even later, after 1950, and peaks around 1960.

After Lyndon B. Johnson won the 1964 election (his first election win, since he ascended to the presidency after John F. Kennedy's assassination in 1963), the name Lynda changes from increasing popularity to decreasing popularity in Texas, while the popularity of the name Luci starts to rise ever so slightly and continues to do so for two years before starting to decrease slightly.

John F. Kennedy's Children

The popularity of both Caroline and John in the U.S. starts off at its highest point in 1880. While Caroline gets fairly close to that peak again in 2000, John never gets back to the high popularity it had at the beginning of the time period. Patrick is less popular than John, but the percentage of newborns with the name starts to increase around 1920. All three names have an increase in popularity around 1960.

The popularity of the name Caroline in Massachusetts has a peak in 2000, but unlike the U.S. trend, that's the name's highest peak. John peaks in popularity shortly before 1950 and is relatively inconsistent in popularity before starting to rapidly decline around 1960. Patrick's highest peak in popularity is around 1980.

When John F. Kennedy was elected in 1960, the popularity of the names Caroline and John was already on the rise in Massachusetts, and both continued to rise for about a year after the election. The popularity of Patrick was declining ever so slightly up until 1960, at which point it started to increase just as slightly.

Conclusion

Prior to starting the analysis, I hypothesized that the percentage of newborns named after presidential children around the time a president was first elected would increase in the state where he launched his political career. The results of my analysis, however, disprove this hypothesis, as the data shows there are only certain instances in which the percentage of newborns named after presidential children increased in the given time period and state.

The best example of an increase is shown in the results for Sasha and Malia Obama. The percentage of newborns named Sasha decreased from 2007 to 2008, but then increased from 2008 to 2009 to its highest level in the eight-year span that was analyzed. The percentage of newborns named Malia rose from 2007 to 2008, but then skyrocketed from 2008 to 2009 to its highest level in Illinois history.
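Year-over-year comparisons like these can be read off a table as well as a plot. The following is a minimal sketch, assuming a data frame with the `Name`, `Year`, and `Percent` columns used earlier; the helper name `yearly_change` is hypothetical, and the toy values are placeholders, not figures from the dataset.

```r
library(dplyr)

# Hypothetical helper: change in Percent from the previous year, per name.
yearly_change <- function(data) {
  data %>%
    group_by(Name) %>%
    arrange(Year, .by_group = TRUE) %>%
    mutate(Change = Percent - dplyr::lag(Percent)) %>%  # NA for the first year
    ungroup()
}

# Toy rows with a placeholder name and made-up percentages.
toy <- data.frame(
  Name    = c("NameA", "NameA", "NameA"),
  Year    = c(2007, 2008, 2009),
  Percent = c(0.30, 0.20, 0.50)
)

changes <- yearly_change(toy)
changes
```

With the real data, `yearly_change(ObamasStateTotals2)` would list the change for each name and year, where a negative `Change` marks a decrease from the previous year.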

While Sasha and Malia were not the only presidential children to see percentage increases in the given time period and state, most jumps were not as significant. Some names, such as Chelsea in Arkansas, Dorothy in Texas, and Amy in Georgia, saw rapid decreases after an election win. There were also instances in which the popularity of a name remained relatively constant after an election win. For these reasons, I'm unable to confidently say the results back up my hypothesis, and I must therefore say the results disprove it.