The number of drug overdose deaths in the United States has risen rapidly in recent years. According to the National Institute on Drug Abuse, more than 64,000 drug overdose deaths occurred in 2016. Substance abuse is becoming an epidemic, one that affects many people either directly or indirectly.
Many factors could influence substance abuse in a given area, including population, socio-economic class, and mental health issues. The Rural Health Reform Policy Research Center classifies counties into five urbanization levels. Metropolitan (urban) counties fall into one of three levels: large central (inner city) counties, which contain the principal city of a metropolitan statistical area of 1 million or more residents; large fringe (suburban) counties in metropolitan statistical areas of at least 1 million residents; and small metro counties in metropolitan statistical areas of fewer than 1 million residents. Nonmetropolitan (rural) counties fall into one of two levels: micropolitan (large rural) counties in micropolitan statistical areas with a population of 10,000 to 49,999, and non-core (small rural) counties, the remaining nonmetropolitan counties that are not in a micropolitan statistical area. The Research Center's studies suggest that substance abuse in rural areas is on the rise, despite the common perception that urban areas experience more illicit drug use.
A common stereotype holds that substance abuse mainly affects people in poverty. The Centers for Disease Control and Prevention conducted a survey on drug use and health in which illicit drug use was self-reported by an average of 12.87% of respondents with an annual household income of less than $20,000, 9.33% of respondents with an income of $20,000-$49,999, 7.13% of respondents with an income of $50,000-$74,999, and 6.17% of respondents with an income of $75,000 or above. These results suggest that a lower annual household income may be associated with more substance abuse.
The possible connection between substance abuse and mental health disorders is debated at great length. Research by the Substance Abuse and Mental Health Services Administration (SAMHSA) suggests that the two issues could be linked. According to surveys conducted by SAMHSA, an estimated 43.6 million American adults experience some form of mental illness, and about 20.2 million adults deal with substance abuse problems. According to the same surveys, 7.9 million people have both a mental health disorder and a substance abuse disorder.
I looked at overdose death data from Alamance County, North Carolina, where I attend college, to better understand the epidemic in the place where I currently live. I also looked at factors that may influence substance abuse, using data on adults in North Carolina who experience frequent mental distress. To put the local numbers in perspective, I first looked at overdose deaths across the country, then narrowed the focus to see how Alamance County compares.
In order to get an idea of how big an epidemic substance abuse is in North Carolina, I took a look at where the state ranks in terms of having the most overdose deaths in the United States.
Using the "overdoses" CSV file from Kaggle, I arranged all fifty states in descending order, from the most deaths to the fewest. I first loaded the CSV, then used the order function to sort the states.
library(tidyverse)  # attaches readr (read_csv), dplyr (mutate, arrange, %›%), and ggplot2
overdoses ‹- read_csv("overdoses.csv")
str(overdoses)
overdoses ‹- overdoses[with(overdoses, order(Deaths, decreasing=TRUE)),]
North Carolina ranked ninth among the states in total overdose deaths, with 1,358. For reference, California was first with 4,521 deaths, and North Dakota was fiftieth with 43.
However, since some states have much larger populations than others, I used the mutate function to compute a percentage for each state by dividing the number of deaths by the state's population. I then used the arrange function to sort the states in descending order, from the highest percentage to the lowest. I assigned the result to o2 to distinguish it from the overdoses data frame created above.
o2 ‹- overdoses %›% mutate(Percent = (Deaths / Population) * 100) %›% arrange(desc(Percent))
North Carolina ranked thirtieth, with 0.0138% of its population. For reference, West Virginia was first with 0.0338%, and North Dakota was fiftieth with 0.0059%.
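Percentages this small are hard to compare at a glance. As a minimal sketch, the three figures quoted above can be restated as deaths per 100,000 residents. The vector names are made up for illustration, and the inputs are the rounded percentages from the text, so the results are approximate.

```r
# Percentages quoted in the text: deaths / population * 100
pct <- c("West Virginia" = 0.0338,
         "North Carolina" = 0.0138,
         "North Dakota" = 0.0059)

# Convert percent of population to deaths per 100,000 residents
per_100k <- pct / 100 * 1e5

print(round(per_100k, 1))
```

On this scale West Virginia's rate is roughly two and a half times North Carolina's, which is the same comparison the percentages make.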
Since North Carolina was in the middle, I decided to take a closer look at the geographic location of Alamance County to see how it ranked among other North Carolina counties.
Using the datasheet DRncdeaths1416.xlsx, I loaded the Excel sheet data into R with the read_excel function. I assigned the name overdose.deaths to this value.
install.packages("readxl")
library("readxl")
overdose.deaths ‹- read_excel("DRncdeaths1416.xlsx", col_types = c("text", "date", "text", "text", "text", "text", "text", "text", "text", "text", "text", "text", "text", "text", "text", "numeric", "numeric", "numeric", "text", "text", "text", "numeric", "text", "text", "text", "text", "text", "text", "text","text", "text", "text", "text", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "text", "text", "text", "text", "text", "text", "text", "text", "text", "text", "text"))
To clean up the dataset, I used the mutate function to remove character vectors.
Using the tigris and sf packages, I downloaded the North Carolina county shapefile and converted it to an sf object. Then, I renamed the county name column to match the rest of the data. After that, I counted the number of overdose deaths in each county so that the counties could be arranged in descending order, as I did above for the states.
library(tigris)
library(sf)
nc = counties(37) %›% st_as_sf() %›% rename(rcounty = NAMELSAD)
overdose.deaths %›% group_by(rcounty) %›% count() %›% ungroup() %›% rename(total_deaths = n) -› overdose.deaths5
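The counting step relies on the data having one row per recorded death, so counting rows per county gives the death total. A minimal base R sketch of the same idea follows. The toy data frame and its values are hypothetical, and base R's table() stands in for the dplyr verbs used in the analysis.

```r
# Hypothetical toy data: one row per recorded death (not the real dataset)
toy_deaths <- data.frame(
  rcounty = c("Alamance County", "Wake County", "Alamance County",
              "Graham County", "Wake County", "Wake County"),
  stringsAsFactors = FALSE
)

# table() counts rows per county; as.data.frame() turns the counts into columns
county_counts <- as.data.frame(
  table(rcounty = toy_deaths$rcounty),
  responseName = "total_deaths",
  stringsAsFactors = FALSE
)

# Sort from most deaths to fewest
county_counts <- county_counts[order(county_counts$total_deaths, decreasing = TRUE), ]
print(county_counts)
```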
From there, I merged the overdose.deaths5 dataframe, which lists every North Carolina county with its number of overdose deaths, with the nc dataframe, which holds geographic information about the counties. With both the geography and the death count per county in one dataframe, I created a visualization of overdose deaths across the state.
overdose.deaths5 %›% right_join(nc) -› mergedNCdeaths
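A right join keeps every row of the right-hand table (here, the county shapes), so a county that never appears in the death counts still shows up, with a missing death total. A minimal base R sketch with hypothetical toy inputs is below. merge() with all.y = TRUE plays the role of right_join, and treating a missing count as zero is an assumption that only holds if the death records cover the whole state.

```r
# Hypothetical stand-ins for overdose.deaths5 and nc (toy values only)
deaths <- data.frame(rcounty = c("Alamance County", "Wake County"),
                     total_deaths = c(5, 9))
shapes <- data.frame(rcounty = c("Alamance County", "Wake County", "Hyde County"),
                     shape_id = 1:3)

# all.y = TRUE keeps every row of `shapes`, like a right join
merged <- merge(deaths, shapes, by = "rcounty", all.y = TRUE)

# Counties absent from `deaths` come back as NA; fill with 0 only if
# "no rows" really means "no recorded deaths" (an assumption)
merged$total_deaths_filled <- ifelse(is.na(merged$total_deaths), 0, merged$total_deaths)
print(merged)
```

Leaving the NA in place makes such counties appear grey on a ggplot map, which may or may not be the intended reading.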
mergedNCdeaths %›% ggplot() + geom_sf(aes(fill = total_deaths)) + scale_fill_viridis_c("Total Deaths") -› plot
plot + theme(panel.grid.major = element_line(colour = 'transparent'), axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.y=element_blank(), axis.text.y=element_blank(), axis.ticks.y=element_blank(), panel.background=element_blank(), panel.border=element_blank(), panel.grid.minor=element_blank(), plot.background=element_blank())
plot + theme(plot.title = element_text(size=10, face="bold", margin=margin(10, 10, 10, 10))) + ggtitle("North Carolina Overdose Deaths by County")
Alamance County ranked twenty-ninth out of all the counties in North Carolina, with 34 overdose deaths. For reference, Mecklenburg County ranked first with 228 deaths, and Warren and Washington were tied for the fewest deaths with 1 each. The map above shows the physical size of each county, as well as their locations relative to each other. Since the number of overdose deaths varied so much across the counties in North Carolina, I wondered whether any particular factors could influence substance abuse.
Using data from the United States Census Bureau, I downloaded a csv file that included population and housing occupancy statuses from counties in North Carolina. I merged this dataframe with the mergedNCdeaths dataframe I created earlier. Then, I used the mutate function to create percentages for overdose deaths per population for the counties in North Carolina.
population ‹- read_csv ("NCcountypop.csv")
population %›% group_by(rcounty) -› population.2
population.2 %›% right_join(mergedNCdeaths) -› NCdeathpop
NCdeathpop2 ‹- NCdeathpop %›% mutate(Percent = (total_deaths / total_pop) * 100) %›% arrange(desc(Percent))
After merging the dataframes and adding a new variable, I wanted to create another map similar to the one above to see if county population has any impact on the rate of overdoses throughout the state.
NCdeathpop2 %›% ggplot() + geom_sf(aes(fill = Percent)) + scale_fill_viridis_c("Overdose Percentage") -› plot2
plot2 + theme(panel.grid.major = element_line(colour = 'transparent'), axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.y=element_blank(), axis.text.y=element_blank(), axis.ticks.y=element_blank(), panel.background=element_blank(), panel.border=element_blank(), panel.grid.minor=element_blank(), plot.background=element_blank())
plot2 + theme(plot.title = element_text(size=10, face="bold", margin=margin(10, 10, 25, 10))) + ggtitle("North Carolina Overdose Population Percentages")
Alamance County (population: 151,131) ranked eightieth with a percentage of 0.0225%. For reference, Wilkes County (population: 69,340) ranked first with a percentage of 0.1082%. Hertford County (population: 24,669) had the lowest percentage of overdose deaths per population at 0.0041%. The map above shows a significant shift in colors across the state compared with the map of total deaths. Measured as a percentage of county population, the counties with both a high number of total overdose deaths and a high population are not impacted as heavily by substance abuse as counties with a medium number of total overdose deaths and a medium population.
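As a quick check of the percentage calculation, Alamance County's figure can be reproduced by hand from the death count and population quoted in this write-up. The variable names below are made up for illustration.

```r
alamance_deaths <- 34       # overdose deaths counted for Alamance County
alamance_pop    <- 151131   # population quoted in the text

# Same formula as the mutate step: deaths / population * 100
alamance_pct <- alamance_deaths / alamance_pop * 100

print(round(alamance_pct, 4))
```

Rounded to four decimal places this matches the 0.0225% reported above.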
To find data about the average yearly income across counties in North Carolina, I used the rvest package to read an HTML page that lists the incomes, used the html_table function to create a table out of the information, and then cleaned up the dataframe to analyze any possible trends. After that, I merged the income dataframe with the death population percentage dataframe I created earlier so that I could compare different characteristics of some of the counties.
library("rvest")
wikipedia ‹- read_html("https://en.wikipedia.org/wiki/List_of_North_Carolina_locations_by_per_capita_income")
tbls ‹- html_nodes(wikipedia, "table")
nc_tbls ‹- wikipedia %›% html_nodes("table") %›% .[3] %›% html_table(fill = TRUE)
as.data.frame(nc_tbls) -› incomeByCounty
incomeByCounty2 ‹- incomeByCounty %›% arrange(desc(Median.household.income))
incomeByCounty2 %›% group_by(NAME) -› incomeByCounty3
incomeByCounty3 %›% right_join(NCdeathpop2) -› geoincome
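One cleanup step worth spelling out: scraped tables often return dollar amounts as text such as "$63,770", and text sorts alphabetically rather than numerically. The sketch below assumes the Median.household.income column arrives as character strings in that form, which the write-up does not confirm. The three values are the incomes quoted in this analysis.

```r
# Assumed raw form of the scraped income column (character, not numeric)
income_chr <- c(Wake = "$63,770", Alamance = "$44,167", Graham = "$28,447")

# Strip the dollar sign and thousands separators, then convert to numbers
income_num <- as.numeric(gsub("[$,]", "", income_chr))
names(income_num) <- names(income_chr)

# Numeric values can now be sorted from highest to lowest income
print(sort(income_num, decreasing = TRUE))
```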
Alamance County has the thirty-third highest median income out of all the counties in North Carolina. Since this ranking was not at either extreme, I decided to look at the counties with the highest and the lowest incomes to see the range of possible correlations.
According to Business Insider, the median household income in the United States as of 2016 is $59,039, which is a 3.2% increase from the $57,230 median in 2015. The median household income in Alamance County is $44,167. As stated above, the county's percentage of overdose deaths per population is 0.0225%. Wake County has the highest median household income of all counties in North Carolina at $63,770. Its overdose percentage is 0.0219%, just 0.0006 percentage points below Alamance's; however, Wake County's population is 749,862 higher than Alamance County's. The county with the lowest median household income is Graham County at $28,447. The percentage of overdose deaths in the county is 0.0339%. Graham's percentage is noticeably higher than Alamance's and Wake's, but its total population is only 8,861. Alamance County's median income is lower than the national median, but that does not seem to have much of an impact on substance abuse in the county. For Graham County, an argument could be made that both socio-economic status and a more rural setting are possible factors adding to the growing substance abuse epidemic across the country.
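Because Graham County is so small, it helps to see how many deaths stand behind its percentage. The sketch below back-calculates approximate death counts from the rounded percentages and populations quoted above. These are implied figures for illustration, not counts taken from the dataset.

```r
alamance_pop <- 151131
wake_pop     <- alamance_pop + 749862   # Wake is quoted as 749,862 larger
graham_pop   <- 8861

# Implied deaths = population * percent / 100 (approximate, inputs are rounded)
implied_deaths <- c(Wake   = wake_pop   * 0.0219 / 100,
                    Graham = graham_pop * 0.0339 / 100)
print(round(implied_deaths))

# Effect of a single additional death on Graham County's percentage
graham_plus_one <- (round(implied_deaths[["Graham"]]) + 1) / graham_pop * 100
print(round(graham_plus_one, 4))
```

With only a handful of deaths behind it, Graham's percentage would move by about a third with one more or one fewer death, so comparisons involving very small counties deserve caution.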
To assess mental health issues in Alamance County, as well as other counties in North Carolina, I downloaded a data set from County Health Rankings and Roadmaps that gives the percentage of adults in each North Carolina county who reported frequent mental distress. I joined that data with the data I previously worked on concerning the percentage of drug overdose deaths per county to see if there was any correlation. I chose a scatter plot to see clearly whether there was a relationship between the percentage reporting frequent mental distress and the percentage of drug overdose deaths in the counties of North Carolina.
mentalhealth ‹- read_excel("mentalhealth.xlsx")
mentalhealth %›% group_by(rcounty) -› mentalhealth2
mentalhealth2 %›% right_join(NCdeathpop2) -›co.occurring
library(ggthemes)  # provides theme_calc()
ggplot(co.occurring, aes(x = mental, y = Percent)) + geom_point(shape=4, color = "red", size = 2) + labs(title = "Relationship Between Mental Health Distress and Drug Overdose Rates", x = "% Frequent Mental Distress", y = "% Overdose Death per Population") + theme_calc()
There are a few clear outliers on the scatter plot, which may skew any possible correlation between the two factors. Based on the plot and the data, it is difficult to say whether mental health and substance abuse are connected within Alamance County.
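One way to gauge how much outliers matter is to compare the Pearson correlation, which is sensitive to extreme values, with the rank-based Spearman correlation, which is less so. This comparison was not part of the original analysis. The sketch below uses hypothetical toy values, not the county data, purely to show the effect.

```r
# Hypothetical toy values, NOT the county data from this analysis
mental  <- c(11, 12, 13, 14, 15, 16)                     # % frequent mental distress
percent <- c(0.030, 0.025, 0.020, 0.018, 0.015, 0.110)   # % overdose deaths; last value is an outlier

# Pearson uses the raw values; Spearman uses only their ranks
pearson  <- cor(mental, percent, method = "pearson")
spearman <- cor(mental, percent, method = "spearman")
print(c(pearson = pearson, spearman = spearman))

# Same Pearson calculation with the outlier removed
keep <- percent < 0.1
pearson_trimmed <- cor(mental[keep], percent[keep])
print(pearson_trimmed)
```

In this toy case a single extreme point flips the sign of the Pearson correlation. With real data that contains missing values, cor() also accepts use = "complete.obs" to skip incomplete rows.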
Based on all of the data collected and mapped out using R, there are a few key takeaways about the substance abuse epidemic within the context of Alamance County and North Carolina. Smaller counties tended to have a higher percentage of overdose deaths than the heavily populated ones. Statistics from various datasets also suggested a correlation between average household income and illicit drug use. While Alamance County fell somewhere in the middle compared to the other North Carolina counties, the outliers gave some perspective on possible relationships between real-life influences that may affect people who struggle with substance abuse.
One thing I would do differently in this research is to find data that all comes from the same year. Given the number of datasets I pulled to look at all of the influencing factors, finding information from a single year proved to be a challenge. I also would have taken more time to address potential outliers and to dive deeper into particularities within Alamance County, specifically age, gender, and racial factors. That being said, I would love to expand upon this initial research in the future and look into those possible influences. I would also like to take some time to analyze how these outside factors may affect each other.