The impetus for this case study was an article from the Guardian reporting that US maternal deaths rose in 2020, with Black mothers at far higher risk. This inspired me to do a more detailed examination of the data. Having looked at statistics comparing US and western European populations before, I have noticed that the broad statistics sometimes look different for the US as a whole than when it is divided into sub-populations. For instance, the birth rate among White middle-class Americans more closely tracks their European counterparts than would be expected. Thus, it is possible that recommendations for changes to fix any given problem might be specific to these sub-populations.
Keeping the above in mind, these were some of the starting points that this case study was meant to investigate what demographic changes could be responsible for the increase in maternal mortality in the US, and if any demographic groups are particularly affected by the changes.
The primary source for data regarding maternal mortality is from the National Center for Health Statistic’s Division of Vital Statistics, the public-facing portal for which can be found here. After some initial confusion about the reporting, I determined that this case study would use files found under the heading Mortality Multiple Cause Files. These files are not specific to maternal mortality; they contain all causes of mortality in the US. Each death is assigned an ICD-10 code for a condition that was described as being the underlying cause of death, and up to 20 additional ICD-10 conditions that the patient also presented with.
The task of cleaning and validating the extraneous data fell to a custom Python script that will be linked in the resources of this case study. Broadly, the method of separating instances of maternal mortality was a matter of extracting ICD-10 code numbers A34, O00-O95, and O98-O99 from the underlying causes column and saving them to CSV files for their respective years, per the above CDC page.
Those files are loaded and processed below. Refer to the helper_function.R script linked in the resources section for more specific explanations of how these functions work.
getwd()
## [1] "/home/mireles/horseless/maternal_mortality/mortality"
mort2019 <- read_csv("maternal_mortality_2019.csv")
mort2019 <- preprocess_dataset(mort2019, 2019)
mort2018 <- read_csv("maternal_mortality_2018.csv")
mort2018 <- preprocess_dataset(mort2018, 2018)
mort2017 <- read_csv("maternal_mortality_2017.csv")
mort2017 <- preprocess_dataset(mort2017, 2017)
mort2016 <- read_csv("maternal_mortality_2016.csv")
mort2016 <- preprocess_dataset(mort2016, 2016)
mort2015 <- read_csv("maternal_mortality_2015.csv")
mort2015 <- preprocess_dataset(mort2015, 2015)
mort2014 <- read_csv("maternal_mortality_2014.csv")
mort2014 <- preprocess_dataset(mort2014, 2014)
mort2013 <- read_csv("maternal_mortality_2013.csv")
mort2013 <- preprocess_dataset(mort2013, 2013)
mort2012 <- read_csv("maternal_mortality_2012.csv")
mort2012 <- preprocess_dataset(mort2012, 2012)
mort2011 <- read_csv("maternal_mortality_2011.csv")
mort2011 <- preprocess_dataset(mort2011, 2011)
mort2010 <- read_csv("maternal_mortality_2010.csv")
mort2010 <- preprocess_dataset(mort2010, 2010)
mort2009 <- read_csv("maternal_mortality_2009.csv")
mort2009 <- preprocess_dataset(mort2009, 2009)
mort2008 <- read_csv("maternal_mortality_2008.csv")
mort2008 <- preprocess_dataset(mort2008, 2008)
mort2007 <- read_csv("maternal_mortality_2007.csv")
mort2007 <- preprocess_dataset(mort2007, 2007)
mort2006 <- read_csv("maternal_mortality_2006.csv")
mort2006 <- preprocess_dataset(mort2006, 2006)
mort2005 <- read_csv("maternal_mortality_2005.csv")
mort2005 <- preprocess_dataset(mort2005, 2005)
mort2004 <- read_csv("maternal_mortality_2004.csv")
mort2004 <- preprocess_dataset(mort2004, 2004)
mort2003 <- read_csv("maternal_mortality_2003.csv")
mort2003 <- preprocess_dataset(mort2003, 2003)
mort_combined <- bind_rows(mort2019, mort2018, mort2017, mort2016,
mort2015, mort2014, mort2013, mort2012,
mort2011, mort2010, mort2009, mort2008,
mort2007, mort2006, mort2005, mort2004,
mort2003)
The thorough reviewer will have noticed that the time frame of the data set begins in 2003 despite the Vital Statistics portal supplying data as far back as 1968, and ends in 2019 despite the availability of data from 2020 and 2021. Examining the given formatting protocol provided by the User’s Guide, there were revisions to the coding for race and Hispanic origin in 2003 that would require additional modifications to the Python script to parse. As for ending in 2019, the COVID-19 pandemic represents a potentially substantial confounding in maternal care that is both ongoing and poorly understood. This narrows our range to a period of 17 years, enough to establish appreciable trends and still be relevant to contemporary maternal care.
In addition to collating each year into a single dataframe and ensuring that the columns are in the right format, it will be a benefit to make a few new columns that will make things more legible to the untrained eye. A good place to start is partitioning the underlying conditions into broad categories. Using Pregnancy, childbirth, and the puerperium O00-O9A, we mirror the divisions in our own dataframe:
condition_buckets <- c("O0[0-8]", # pregnancy with abortive outcome
"O09", # Supervision of high-risk pregnancy
"O1[0-6]", # Edema, proteinuria, and hypertensive
"O2[0-9]", # Other maternal disorders predominantly related to pregnancy
"O[3|4][0-8]|O39", # Maternal care related to the fetus and amniotic cavity and possible delivery problems
"O[6][0-9]|O7[0-7]", # Complications of labor and delivery
"O8[0-2]", # Encounter for delivery
"O8[5-9]|O9[0-2]", # Complications predominantly related to the puerperium
"O9[4-9]|O9A" # Other obstetric conditions, not elsewhere classified
)
abortive <- group_codes(mort_combined, condition_buckets[1])
eph <- group_codes(mort_combined, condition_buckets[3])
preg <- group_codes(mort_combined, condition_buckets[4])
fetal <- group_codes(mort_combined, condition_buckets[5])
comp <- group_codes(mort_combined, condition_buckets[6])
puerperium <- group_codes(mort_combined, condition_buckets[8])
other <- group_codes(mort_combined, condition_buckets[9])
Taking a a quick look at the buckets with View(), I noticed that the encounter group encouragingly has 0 values going back to 2015. I also noticed that the high-risk group confusingly also has 0 values, which seems incredibly unlikely. After suspecting a bug in my own regex code, I searched and found a helpful page from Outsource Strategies International explaining that O09 is only intended for use in the prenatal period, and that encounter codes are meant specifically for situations where there were no complications during the labor or delivery episode, not where a mortality event occurred unexpectedly. The two can be excluded.
Next, we combine all of these back into one dataframe, which now includes a brief descriptor for the underlying condition.
mort <- mort_combined[0,]
abortive <- add_bucket(abortive, "Abortive")
eph <- add_bucket(eph, "Edema, proteinuria, and hypertensive")
comp <- add_bucket(comp, "Complications of labor and delivery")
preg <- add_bucket(preg, "Other maternal disorders related to pregnancy")
fetal <- add_bucket(fetal, "Fetal")
puerperium <- add_bucket(puerperium, "Puerperium")
other <- add_bucket(other, "Other obstetric conditions")
mort <- mort %>%
bind_rows(abortive, eph, comp, preg, fetal, puerperium, other) %>%
relocate(bucket, .after="marital_status")
At this point, making a more legibile dataframe no longer needs extra data sorting and processing. It is simply a matter of decoding currently available codes in accordance with the user’s guide. We start with resident status. Note that this is not to do with the patient’s immigration status, but is more a matter of residence in the same state and/or county as treatment.
mort$res_status_decoded <- as.numeric(mort$res_status)
mort <- mort %>%
mutate(res_status_decoded = case_when(res_status == 1 ~ "resident",
res_status == 2 ~ "intrastate nonresident",
res_status == 3 ~ "interstate nonresident",
res_status == 4 ~ "foreign resident")) %>%
relocate(res_status_decoded, .after="res_status")
We do the same for place of death.
mort$death_loc_decoded <- as.numeric(mort$death_loc)
mort <- mort %>%
mutate(death_loc_decoded = case_when(death_loc == 1 ~ "Inpatient - hospital, clinic, or medical center",
death_loc == 2 ~ "Outpatient or admitted to ER - hospital, clinical, or medical center",
death_loc == 3 ~ "Dead on arrival - hospital, clinic, or medical center",
death_loc == 4 ~ "Home",
death_loc == 5 ~ "Hospice facility",
death_loc == 6 ~ "Nursing home/long term care",
death_loc == 7 ~ "Other",
death_loc == 9 ~ "Place of death unknown")) %>%
relocate(death_loc_decoded, .after="death_loc")
Then education.
mort$education_decoded <- as.numeric(mort$education)
mort <- mort %>%
mutate(education_decoded = case_when(education == 1 ~ "8th grade or less",
education == 2 ~ "9th-12th grade, no diploma",
education == 3 ~ "high school graduate or GED completed",
education == 4 ~ "some college credit, but no degree",
education == 5 ~ "associate's degree",
education == 6 ~ "bachelor's degree",
education == 7 ~ "master's degree",
education == 8 ~ "doctorate or professional degree",
education == 9 ~ "unknown")) %>%
relocate(education_decoded, .after="education")
And finally for race and Hispanic origin. That there are three here might be a bit confusing, but please note that the last one decodes a recoding in the original data set that is included here for housekeeping purposes.
mort$race_decoded <- as.character(mort$race)
mort <- mort %>%
mutate(race_decoded = case_when(race == "01" ~ "White",
race == "02" ~ "Black",
race == "03" ~ "American Indian (includes Aleuts and Eskimos",
race == "04" ~ "Chinese",
race == "05" ~ "Japanese",
race == "06" ~ "Hawaiian (includes Part-Hawaiian",
race == "07" ~ "Filipino",
race == "18" ~ "Asian Indian",
race == "28" ~ "Korean",
race == "38" ~ "Samoan",
race == "48" ~ "Vietnamese",
race == "58" ~ "Guamanian",
race == "68" ~ "Other Asian or Pacific Islander",
race == "78" ~ "Combined other Asian or Pacific Islander")) %>%
relocate(race_decoded, .after="race")
mort$hisp_orig_decoded <- mort$hisp_orig
mort <- mort %>%
mutate(hisp_orig_decoded = case_when(hisp_orig >= 100 & hisp_orig <= 199 ~ "Non-Hispanic",
hisp_orig >= 200 & hisp_orig <= 209 ~ "Spaniard",
hisp_orig >= 210 & hisp_orig <= 219 ~ "Mexican",
hisp_orig >= 221 & hisp_orig <= 230 ~ "Central American",
hisp_orig >= 231 & hisp_orig <= 249 ~ "South American",
hisp_orig >= 250 & hisp_orig <= 259 ~ "Latin American",
hisp_orig >= 260 & hisp_orig <= 269 ~ "Puerto Rican",
hisp_orig >= 270 & hisp_orig <= 274 ~ "Cuban",
hisp_orig >= 275 & hisp_orig <= 279 ~ "Dominican",
hisp_orig == 220 ~ "Central or South American",
hisp_orig >= 280 & hisp_orig <= 299 ~ "Other Hispanic",
hisp_orig >= 996 & hisp_orig <= 999 ~ "Unknown")) %>%
relocate(hisp_orig_decoded, .after="hisp_orig")
mort$hisp_orig_race_decoded <- mort$hisp_orig_race
mort <- mort %>%
mutate(hisp_orig_race_decoded = case_when(hisp_orig_race == 1 ~ "Mexican",
hisp_orig_race == 2 ~ "Puerto Rican",
hisp_orig_race == 3 ~ "Cuban",
hisp_orig_race == 4 ~ "Central or South American",
hisp_orig_race == 5 ~ "Other or unknown Hispanic",
hisp_orig_race == 6 ~ "Non-Hispanic White",
hisp_orig_race == 7 ~ "Non-Hispanic Black",
hisp_orig_race == 8 ~ "Non-Hispanic Other Races",
hisp_orig_race == 9 ~ "Hispanic origin unknown")) %>%
relocate(hisp_orig_race_decoded, .after="hisp_orig_race")
What does our dataframe actually look like?
mort
## # A tibble: 12,062 × 40
## res_status res_st…¹ educa…² educa…³ month year age death…⁴ death…⁵ marit…⁶
## <fct> <chr> <fct> <chr> <dbl> <dbl> <dbl> <fct> <chr> <fct>
## 1 1 resident 4 some c… 1 2019 20 1 Inpati… S
## 2 1 resident 2 9th-12… 2 2019 35 1 Inpati… S
## 3 1 resident 3 high s… 4 2019 18 4 Home S
## 4 1 resident 3 high s… 6 2019 24 1 Inpati… M
## 5 1 resident 8 doctor… 7 2019 34 1 Inpati… M
## 6 1 resident 3 high s… 7 2019 34 2 Outpat… S
## 7 2 intrast… 4 some c… 7 2019 34 1 Inpati… S
## 8 1 resident 3 high s… 7 2019 24 4 Home M
## 9 1 resident 5 associ… 11 2019 28 1 Inpati… M
## 10 1 resident 3 high s… 7 2019 20 1 Inpati… S
## # … with 12,052 more rows, 30 more variables: bucket <chr>, underlying <chr>,
## # code1 <chr>, code2 <chr>, code3 <chr>, code4 <chr>, code5 <chr>,
## # code6 <chr>, code7 <chr>, code8 <chr>, code9 <chr>, code10 <chr>,
## # code11 <chr>, code12 <chr>, code13 <chr>, code14 <lgl>, code15 <lgl>,
## # code16 <lgl>, code17 <lgl>, code18 <lgl>, code19 <lgl>, code20 <lgl>,
## # race <fct>, race_decoded <chr>, hisp_orig <dbl>, hisp_orig_decoded <chr>,
## # hisp_orig_race <fct>, hisp_orig_race_decoded <chr>, …
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
First, we can start looking at some summary statistics to get a sense of where to start.
annual_total <- mort %>%
group_by(year) %>%
summarise(annual_total = n())
ggplot(annual_total, aes(year, annual_total)) +
geom_line(group=1) +
geom_point() +
ylim(327,900) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Total number of annual maternal deaths, 2003-2019",
x="Year",
y="Annual total")
This gives us the total number of maternal mortality cases per year, but getting proportions will be much more valuable. To do that, we need to download some additional data. For our purposes, we are interested in the number of maternal mortality cases per 100,000 live births. Fortunately, Wikipedia has well-sourced US demographic data that will help us get to just that figure. Here, we look at some code that will help us politely (i.e. scraping without causing undue load to the server) download those statistics.
# Having run the code on my machine, the file doesn't need to be downloaded or
# processed again. Being polite.
if (file.exists("demographic_table.csv")) {
pop_table <- read_csv("demographic_table.csv")
} else {
# This will run if you are following along at home.
url <- "https://en.wikipedia.org/wiki/Demographics_of_the_United_States"
# Request the file from the URL
url_bow <- polite::bow(url)
url_bow
ind_html <-
polite::scrape(url_bow) %>%
rvest::html_nodes("table.wikitable") %>%
rvest::html_table(fill = TRUE)
# The page has many demographic tables; we just need US Demographic table, 1935-2021
pop_table <- ind_html[[10]]
pop_table <- rename(pop_table, year=names(pop_table)[1])
# We only want the years; removing links that are present next to some of them in the table.
pop_table$year <- str_extract_all(pop_table$year, "\\d{4}", simplify=TRUE)
# We only need the years 2003-2019
pop_table_tmp <- pop_table[69:85,]
# Getting rid of extraneous clutter in the names
pop_table_tmp <- rename(pop_table_tmp, average_population="Average population[67][29][30]")
pop_table_tmp <- rename(pop_table_tmp, live_births="Live births[68]")
# Necessary data conversions to do math
pop_table_tmp$year <- as.numeric(pop_table_tmp$year)
avg_pop <- as.numeric(gsub(",","",pop_table_tmp$average_population))
live_births <- as.numeric(gsub(",","",pop_table_tmp$live_births))
pop_table_tmp$average_population <- avg_pop
pop_table_tmp$live_births <- live_births
# Writes the processed file to disk
write.table(pop_table_tmp, file="demographic_table.csv", sep=",", row.names=FALSE)
pop_table <- pop_table_tmp
}
pop_table %>%
select(year, live_births)
## # A tibble: 17 × 2
## year live_births
## <dbl> <dbl>
## 1 2003 4089950
## 2 2004 4112052
## 3 2005 4138349
## 4 2006 4265555
## 5 2007 4316234
## 6 2008 4247694
## 7 2009 4130665
## 8 2010 3999386
## 9 2011 3953590
## 10 2012 3952841
## 11 2013 3932181
## 12 2014 3988076
## 13 2015 3978497
## 14 2016 3945875
## 15 2017 3855500
## 16 2018 3791712
## 17 2019 3747540
Now we’re getting somewhere.
ratio <- annual_total$annual_total/pop_table$live_births * 100000
annual_total$per_100k <- ratio
ggplot(annual_total, aes(x=year, y=per_100k)) +
geom_line(group=1) +
geom_point() +
ylim(10, 25) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Maternal deaths per 100,000 live births, 2003-2019",
x="Year",
y="Maternal deaths per 100,000 live births") +
theme(plot.title = element_text(hjust = 0.5))
To emphasize the severity of the situation, I took the liberty of finding some unrelated but interesting statistics. The US Department of Labor and MSHA maintain statistics for fatalities relating to coal mines that can be found in the table on this page. Due to the unfortunate formatting of the table, it must be manually entered so we can work with it here.
coal_url = "https://arlweb.msha.gov/stats/centurystats/coalstats.asp"
coal_years = c(2003:2019)
coal_workers = c(104824, 108734, 116436, 122975, 122936, 133828, 134089, 135500, 143437,
137650, 123259, 116010, 102804, 81485, 82843, 82699, 81361)
coal_deaths = c(30, 28, 23, 47, 34, 30, 18, 48, 20, 20, 20, 16, 12, 8, 15, 12, 12)
coal_mort <- data.frame(year=coal_years, num_workers=coal_workers, num_deaths=coal_deaths)
coal_mort <- as_tibble(coal_mort)
coal_mort <- coal_mort %>%
add_column(proportion = coal_mort$num_deaths/coal_mort$num_workers*100000)
Below is a chart comparing the number of maternal mortality cases to the number of fatalities in coal mining per 100,000. Statistically, it has been safer to be a coal miner than the mother of a live infant in the US since 2011.
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
comparison <- data.frame(year=annual_total$year, Mothers=annual_total$per_100k, `Coal Workers`=coal_mort$proportion)
comparison <- melt(comparison, id=c("year"))
ggplot() +
geom_line(data=comparison, aes(x=year, y=value, color=variable)) +
ylim(5, 40) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Deaths per 100,000, 2003-2019",
x="Year",
y="Deaths per 100,000",
colour="annual_total") +
theme(plot.title = element_text(hjust = 0.5))
germany_mmr <- read_csv("germany-maternal-mortality-rate.csv")
canada_mmr <- read_csv("canada-maternal-mortality-rate.csv")
australia_mmr <- read_csv("australia-maternal-mortality-rate.csv")
japan_mmr <- read_csv("japan-maternal-mortality-rate.csv")
poland_mmr <- read_csv("poland-maternal-mortality-rate.csv")
uk_mmr <- read_csv("united-kingdom-maternal-mortality-rate.csv")
france_mmr <- read_csv("france-maternal-mortality-rate.csv")
germany_mmr <- parse_macrotrends(germany_mmr)
canada_mmr <- parse_macrotrends(canada_mmr)
australia_mmr <- parse_macrotrends(australia_mmr)
japan_mmr <- parse_macrotrends(japan_mmr)
poland_mmr <- parse_macrotrends(poland_mmr)
uk_mmr <- parse_macrotrends(uk_mmr)
france_mmr <- parse_macrotrends(france_mmr)
For a more direct comparison, we look at maternal mortality in a variety of wealthy, developed nations in the EU and English-speaking worlds.
nation_comparison <- data.frame(year=annual_total$year, us=annual_total$per_100k,
uk=uk_mmr$per_100k,
france=france_mmr$per_100k,
germany=germany_mmr$per_100k,
australia=australia_mmr$per_100k,
canada=canada_mmr$per_100k)
nation_comparison <- melt(nation_comparison, id=c("year"))
ggplot() +
geom_line(data=nation_comparison, aes(x=year, y=value, color=variable)) +
ylim(0, 25) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Deaths per 100,000, 2003-2019",
x="Year",
y="Deaths per 100,000",
colour="annual_total") +
theme(plot.title = element_text(hjust = 0.5))
## Warning: Removed 10 row(s) containing missing values (geom_path).
A few things are worth noting:
Here we subdivide the US figures to answer the question of how wide racial and ethnic disparities are for maternal mortality. This is a test of the hypothesis that the statistics for middle-class Whites will track their counterparts throughout the developed world. First, we plot percentages of maternal mortality for each race over our timeframe.
race_comparison <- mort %>%
group_by(year) %>%
add_count(name="year_n") %>%
ungroup() %>%
group_by(year, race_decoded) %>%
add_count(name="race_n") %>%
select(year, race_decoded, race_n, year_n) %>%
distinct(.keep_all=TRUE) %>%
mutate(percent = race_n / year_n * 100) %>%
arrange(year)
race_comparison
## # A tibble: 189 × 5
## # Groups: year, race_decoded [189]
## year race_decoded race_n year_n percent
## <dbl> <chr> <int> <int> <dbl>
## 1 2003 Black 183 499 36.7
## 2 2003 American Indian (includes Aleuts and Eskimos 7 499 1.40
## 3 2003 White 282 499 56.5
## 4 2003 Other Asian or Pacific Islander 11 499 2.20
## 5 2003 Asian Indian 6 499 1.20
## 6 2003 Vietnamese 2 499 0.401
## 7 2003 Chinese 2 499 0.401
## 8 2003 Japanese 1 499 0.200
## 9 2003 Filipino 4 499 0.802
## 10 2003 Korean 1 499 0.200
## # … with 179 more rows
## # ℹ Use `print(n = ...)` to see more rows
ggplot() +
geom_line(data=race_comparison, aes(x=year, y=percent, color=race_decoded)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Percentage of Annual Maternal Mortality by Race, 2003-2019",
x="Year",
y="Percent") +
theme(plot.title = element_text(hjust = 0.5))
black_white <- race_comparison %>%
filter(race_decoded %in% c("Black", "White"))
ggplot() +
geom_line(data=black_white, aes(x=year, y=race_n, color=race_decoded)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Percentage of Annual Maternal Mortality by Race, 2003-2019",
x="Year",
y="Percent") +
theme(plot.title = element_text(hjust = 0.5))
black <- black_white %>%
filter(race_decoded == "Black")
white <- black_white %>%
filter(race_decoded == "White")
Here we can see that, though there are pronounced disparities in health outcomes for Black and White women, maternal mortality is changing for both of them in the same way.
black_white_cor <- round(cor(black$race_n, white$race_n),2)
black_white_cor
## [1] 0.93
Blacks in this timeframe represent approximately 13% of the US population, they represent a substantially outsized portion of cases of maternal mortality. That said, it is also worth noting that outcomes among Blacks on this front have been improving slightly, while that for Whites as been worsening slightly. This means that, while there is substantial racial inequality in outcomes overall, maternal mortality in the US is worsening without respect to race.
ggplot() +
geom_line(data=nation_comparison, aes(x=year, y=value, color=variable)) +
ylim(0, 25) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Deaths per 100,000, 2003-2019",
x="Year",
y="Deaths per 100,000",
colour="annual_total") +
theme(plot.title = element_text(hjust = 0.5))
Unfortunately for us, no direct socioeconomic data was included in the original dataset. Occupation data has been included very recently, in years excluded from this analysis. We have no direct access to things like income, but it might be possible to gain useful insights by looking at a proxy in the form of education levels, which mercifully have been included.
#education_comparison <- mort %>%
# mutate_all(funs(str_replace(., "8th grade or less", "Pre-HS diploma")))
edu_with_unknowns <- mort %>%
mutate(education_decoded = str_replace(education_decoded, "8th grade or less", "Pre-HS diploma")) %>%
mutate(education_decoded = str_replace(education_decoded, "9th-12th grade, no diploma", "Pre-HS diploma")) %>%
mutate(education_decoded = str_replace(education_decoded, "associate's degree", "college degree")) %>%
mutate(education_decoded = str_replace(education_decoded, "bachelor's degree", "college degree")) %>%
group_by(year) %>%
add_count(name = "year_n") %>%
ungroup() %>%
group_by(year, education_decoded) %>%
add_count(name = "ed_level_n") %>%
select(year, education_decoded, year_n, ed_level_n) %>%
distinct(.keep_all = TRUE) %>%
mutate(percent = ed_level_n / year_n * 100) %>%
arrange(year)
education_comparison <- mort %>%
mutate(education_decoded = str_replace(education_decoded, "8th grade or less", "Pre-HS diploma")) %>%
mutate(education_decoded = str_replace(education_decoded, "9th-12th grade, no diploma", "Pre-HS diploma")) %>%
mutate(education_decoded = str_replace(education_decoded, "associate's degree", "college degree")) %>%
mutate(education_decoded = str_replace(education_decoded, "bachelor's degree", "college degree")) %>%
filter(!is.na(education_decoded)) %>%
filter(education_decoded != "unknown") %>%
group_by(year) %>%
add_count(name = "year_n") %>%
ungroup() %>%
group_by(year, education_decoded) %>%
add_count(name = "ed_level_n") %>%
select(year, education_decoded, year_n, ed_level_n) %>%
distinct(.keep_all = TRUE) %>%
mutate(percent = ed_level_n / year_n * 100) %>%
arrange(year)
education_comparison
## # A tibble: 101 × 5
## # Groups: year, education_decoded [101]
## year education_decoded year_n ed_level_n percent
## <dbl> <chr> <int> <int> <dbl>
## 1 2003 high school graduate or GED completed 149 36 24.2
## 2 2003 Pre-HS diploma 149 48 32.2
## 3 2003 college degree 149 30 20.1
## 4 2003 doctorate or professional degree 149 1 0.671
## 5 2003 master's degree 149 8 5.37
## 6 2003 some college credit, but no degree 149 26 17.4
## 7 2004 some college credit, but no degree 268 48 17.9
## 8 2004 high school graduate or GED completed 268 98 36.6
## 9 2004 Pre-HS diploma 268 71 26.5
## 10 2004 college degree 268 42 15.7
## # … with 91 more rows
## # ℹ Use `print(n = ...)` to see more rows
ggplot() +
geom_line(data=education_comparison, aes(x=year, y=percent, color=education_decoded)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Percentage of Maternal Mortality by Education Level, 2003-2019",
x="Year",
y="Percent")
ggplot() +
geom_line(data=education_comparison, aes(x=year, y=ed_level_n, color=education_decoded)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Count of Maternal Mortality by Education Level, 2003-2019",
x="Year",
y="Count")
ggplot() +
geom_line(data=edu_with_unknowns, aes(x=year, y=percent, color=education_decoded)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Percentage of Maternal Mortality by Education Level, 2003-2019",
x="Year",
y="Percent")
Here we encounter some ambiguity in the analysis. If we include unknowns in education levels, we see that reporting on education improved substantially over this time period. Removing unknown and unavailable values, we see an echo of our graph comparing Whites and Blacks; a fall in representation among people who have not completed a high school diploma, plateauing of those with only a high school education, and a slight (albeit more jittery) increase among those with a 2- or 4-year degree.
Admittedly, looking in this direction was a bit of an intuitive leap following from the last sections. What does it imply that the only cohort that is trending downward as a percentage is the one without a high school diploma? It might mean that women too young to have completed high school are getting better access to healthcare. However, since maternal mortality is on the rise in the US in particular, it seems more likely that the average maternal age is increasing. Let’s take a look.
mort$age_cohort <- mort$age
mort <- mort %>%
mutate(age_cohort = case_when(age >= 10 & age <= 15 ~ "10-15",
age > 15 & age <= 20 ~ "16-20",
age > 20 & age <= 25 ~ "21-25",
age > 25 & age <= 30 ~ "26-30",
age > 30 & age <= 35 ~ "31-35",
age > 35 & age <= 40 ~ "36-40",
age > 40 & age <= 45 ~ "41-45",
age > 45 & age <= 50 ~ "46-50",
age > 50 & age <= 55 ~ "51-55",
age > 55 ~ "55+")) %>%
relocate(age_cohort, .after="age")
age_comparison <- mort %>%
group_by(year) %>%
add_count(name="year_n") %>%
ungroup() %>%
group_by(year, age_cohort) %>%
add_count(name="cohort_n") %>%
select(year, age_cohort, cohort_n, year_n) %>%
mutate(percent = cohort_n / year_n * 100) %>%
distinct(.keep_all=TRUE) %>%
arrange(year, desc(percent))
age_comparison
## # A tibble: 171 × 5
## # Groups: year, age_cohort [171]
## year age_cohort cohort_n year_n percent
## <dbl> <chr> <int> <int> <dbl>
## 1 2003 31-35 119 499 23.8
## 2 2003 36-40 92 499 18.4
## 3 2003 26-30 92 499 18.4
## 4 2003 21-25 85 499 17.0
## 5 2003 41-45 47 499 9.42
## 6 2003 16-20 35 499 7.01
## 7 2003 46-50 18 499 3.61
## 8 2003 51-55 8 499 1.60
## 9 2003 55+ 2 499 0.401
## 10 2003 10-15 1 499 0.200
## # … with 161 more rows
## # ℹ Use `print(n = ...)` to see more rows
ggplot() +
geom_line(data=age_comparison, aes(x=year, y=percent, color=age_cohort)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Percent of Maternal Mortality by 5-Year Age Cohorts, 2003-2019",
x="Year",
y="Percent") +
theme(plot.title = element_text(hjust=0.5))
ggplot() +
geom_line(data=age_comparison, aes(x=year, y=cohort_n, color=age_cohort)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Counts of Maternal Mortality by 5-Year Age Cohorts, 2003-2019",
x="Year",
y="Count") +
theme(plot.title = element_text(hjust=0.5))
This is a bit of a tangle, indicating that the 5-year delineations are perhaps too narrow. That said, some trends in the age groups are evident in a way that suggests we can cluster them more tightly. The groups 26-30 and 31-35 track each other in an interesting way. 21-25 is on the decline, so it might be better served as its own group. Very interestingly, it looks like 46+ could also be its own category, as these figures look to be even more tightly coupled than the first one mentioned. Let’s get this cleaned up a little.
mort$age_cohort <- mort$age
mort <- mort %>%
mutate(age_cohort = case_when(age >= 10 & age <= 15 ~ "10-15",
age > 15 & age <= 20 ~ "16-20",
age > 20 & age <= 30 ~ "21-30",
age > 30 & age <= 40 ~ "31-40",
age > 40 & age <= 50 ~ "40-50",
age > 50 ~ "50+")) %>%
relocate(age_cohort, .after="age")
age_comparison_compressed <- mort %>%
group_by(year) %>%
add_count(name="year_n") %>%
ungroup() %>%
group_by(year, age_cohort) %>%
add_count(name="cohort_n") %>%
select(year, age_cohort, cohort_n, year_n) %>%
mutate(percent = cohort_n / year_n * 100) %>%
distinct(.keep_all=TRUE) %>%
arrange(year, desc(percent))
age_comparison_compressed
## # A tibble: 105 × 5
## # Groups: year, age_cohort [105]
## year age_cohort cohort_n year_n percent
## <dbl> <chr> <int> <int> <dbl>
## 1 2003 31-40 211 499 42.3
## 2 2003 21-30 177 499 35.5
## 3 2003 40-50 65 499 13.0
## 4 2003 16-20 35 499 7.01
## 5 2003 50+ 10 499 2.00
## 6 2003 10-15 1 499 0.200
## 7 2004 21-30 234 546 42.9
## 8 2004 31-40 193 546 35.3
## 9 2004 40-50 53 546 9.71
## 10 2004 16-20 44 546 8.06
## # … with 95 more rows
## # ℹ Use `print(n = ...)` to see more rows
ggplot() +
geom_line(data=age_comparison_compressed, aes(x=year, y=percent, color=age_cohort)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Percent of Maternal Mortality by 10-Year Age Cohorts, 2003-2019",
x="Year",
y="Percent") +
theme(plot.title = element_text(hjust=0.5))
ggplot() +
geom_line(data=age_comparison_compressed, aes(x=year, y=cohort_n, color=age_cohort)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Counts of Maternal Mortality by 10-Year Age Cohorts, 2003-2019",
x="Year",
y="Count") +
theme(plot.title = element_text(hjust=0.5))
Simply making cohorts about 10 years wide yields a much stronger signal. It is worth noting that broader cohorts of 21-40 and 40+ follow similar patterns. I suspect this reflects similarities in treatment of these kinds of pregnancy. For instance, there may be similar health concerns in women over 40 overall that doctors would be looking out for, or IVF treatments and their attendant care could be the same. This warrants further investigation. On this particular occasion, looking at percentages alone was actually muddying the signal. Here we have been seeing an increase in mortality cases in both 21-30 and 31-40 cohorts, and a decrease in cohorts over 40. Women having children in their 30s is consistently the largest share of these cases and has been since 2014.
To put further emphasis on this point, we can chart summary statistics for maternal ages. Below, we plot median and average ages, which more closely track recent changes in maternal mortality than anything that has been examined so far.
ages <- mort %>%
group_by(year) %>%
filter(!is.na(age)) %>%
summarise(count = n(),
med_age = median(age),
avg_age = mean(age),
sd_age = sd(age))
ages
## # A tibble: 17 × 5
## year count med_age avg_age sd_age
## <dbl> <int> <dbl> <dbl> <dbl>
## 1 2003 499 32 32.1 8.20
## 2 2004 546 30 31.1 8.92
## 3 2005 629 31 31.6 8.58
## 4 2006 575 30 31.1 8.43
## 5 2007 551 31 31.5 8.30
## 6 2008 663 31 31.7 8.55
## 7 2009 685 30 31.8 9.40
## 8 2010 676 31 32.1 8.65
## 9 2011 769 32 33.4 9.36
## 10 2012 790 33 34.1 9.63
## 11 2013 868 34 35.1 10.2
## 12 2014 860 35 35.4 9.72
## 13 2015 834 35 35.8 10.1
## 14 2016 867 35 35.6 9.83
## 15 2017 832 33 34.6 9.31
## 16 2018 660 33 32.5 6.73
## 17 2019 754 32 31.9 6.57
ggplot() +
geom_line(data=ages, aes(x=year, y=med_age)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Median Age of Maternal Mortality, 2003-2019",
x="Year",
y="Median Age") +
theme(plot.title = element_text(hjust=0.5))
ggplot() +
geom_line(data=ages, aes(x=year, y=avg_age)) +
scale_x_continuous(breaks = seq(2003, 2019, len=5)) +
labs(title="Average Age of Maternal Mortality, 2003-2019",
x="Year",
y="Median Age") +
theme(plot.title = element_text(hjust=0.5))
And with this, we can plot the correlation coefficient. This measures the statistical strength of the relationship between two variables.
y <- annual_total$per_100k
x <- ages$avg_age
death_vs_age <- tibble(deaths = x, avg_ages = y)
correlation <- round(cor(x, y), 2)
ggplot(death_vs_age, aes(x=deaths, y=avg_ages)) +
geom_point() +
geom_smooth(method=lm) +
labs(title="Correlation Between Average Maternal Age and Annual Maternal Mortality Rate",
x="Average Maternal Age",
y="Maternal Mortality") +
annotate(geom="text", x=32, y=22,
label=str_interp("Correlation is ${correlation}"))
## `geom_smooth()` using formula 'y ~ x'
The high correlation between advancing maternal age and the mortality rate begs the question, why is maternal age increasing in the first place? My first thought is that anxiety around the economy fails to inspire confidence in parenthood in the first place. Looking at fertility numbers, we notice a sharp decline in fertility in the US after the 2008 housing crisis.
It is worth noting that the fertility rate has never recovered to its pre-2008 level. Keeping in mind that we see a decrease in representation of women who have not attained a high school diploma in cases of maternal mortality and an increase in maternal age, there is every possibility that women tend to wait longer until they have children due to uncertainty around both the economy and their power within it. That is, the 2008 housing crisis broke the general trust and safety people feel around becoming parents, and they were never established again.