Abstract
I use an instrumental-variables identification strategy and historical data from the United States to estimate the long-term economic impact of in utero and postnatal exposure to malaria. My research design matches adults in the 1960 Decennial Census to the malaria death rate in their respective state and year of birth. To address potential omitted-variables bias and measurement-error bias, I use variation in “malaria-ideal” temperatures to instrument for malaria exposure. My estimates indicate that in utero and postnatal exposure to malaria led to considerably lower levels of educational attainment and higher rates of poverty later in life.
I. Introduction
The argument that in utero and postnatal health conditions have a lasting impact on an individual’s economic well-being rests on two premises: First, “early-life” health conditions predetermine one’s health capital later in life; and second, health capital is an important determinant of economic capabilities throughout one’s life.1 The logic of this argument notwithstanding, the extent to which early-life health affects long-term economic well-being is unclear given the paucity of research on the subject. To help clarify the importance of early-life health conditions, this paper estimates the long-term economic impact of in utero and postnatal exposure to malaria. Exposure to malaria during this narrow time frame has an acute impact on health. As such, understanding the consequences of such exposure will offer important insights into the significance of early-life health conditions over the life cycle.
The main challenge with estimating the effects of any early-life health shock (including exposure to malaria) is that omitted factors, such as socioeconomic conditions, are likely to bias observational studies toward overstating the importance of health. A number of recent studies have developed creative approaches to mitigate this potential omitted-variables bias. For example, to test the long-term impact of unfavorable in utero health conditions, Almond (2006) compares long-term outcomes for cohorts born immediately after the 1918 influenza epidemic, who were in utero during it, with the outcomes of cohorts born before it or one year after it.2 Meng and Qian (2006) and Chen and Zhou (2007) test the long-term effects of early-life malnutrition resulting from being born during China’s Great Famine (circa 1959–61).3 Also, Black et al. (2007), Oreopoulos et al. (2006), and Royer (2009) estimate the long-term effects of in utero health conditions using differences between twins in order to control for observable and unobservable family-level factors.4 However, the significance of early-life health conditions remains controversial because existing studies rely on a relatively limited set of identification strategies.
The key contribution of this paper is to develop a novel identification strategy to evaluate the long-term economic consequences of in utero and postnatal exposure to malaria. Specifically, I estimate the long-term effects of malaria using historical data from the United States and an instrumental-variables (IV) identification strategy. My research design matches outcomes of cohorts in the 1960 Census to the malaria death rate in their respective state and year of birth. To allay concerns about omitted-variables bias and measurement-error bias, I use within-state variation in “malaria-ideal” temperatures, or temperatures between 22°C and 28°C, to instrument for malaria exposure at the time of birth. (As I discuss in greater detail in the Background section, these temperatures are favorable to the Anopheles mosquito, which is the vector for the malaria parasite.)
The United States experience with malaria presents an excellent opportunity to study the long-term effects of early-life health conditions for four reasons. First, malaria’s physiological impact is acute during the in utero and postnatal periods relative to exposure at other ages.5 Second, there is a large treatment group from which to estimate the effects of exposure, since malaria was a serious public-health problem in the American South during much of the early 20th century; conversely, cohorts born outside the South can serve as a control group, since they were unlikely ever to have been exposed to malaria. Third, detailed vital statistics, climatic, and census data are publicly available for the United States. Fourth, the scope for selection bias is limited by the fact that malaria’s fatality rate was relatively low in the United States.6
Moreover, my IV research design has several important features that enable me to address concerns that my instrument is correlated with important omitted variables. Using a repeated cross-section of state-year observations allows me to control for time-invariant omitted variables, like geography, by including state fixed effects. I also can control for year fixed effects and state-specific time trends to account for any spurious time-series correlations between climatic changes and changes in birth conditions across states over time.
As tests of my exclusion restriction, I include controls for temperatures that are ideal for agriculture, as well as controls for extreme temperatures that may be harmful to human physiology. I also test whether malaria-ideal temperatures affect outcomes in states where malaria was not a public-health problem. Similarly, using the 1980 Census, I estimate the effects of exposure to malaria-ideal temperatures in states where malaria was once present, for cohorts born after malaria was mostly eradicated.
In addition to contributing to the early-life health literature, the fact that malaria continues to be a serious public-health problem today in developing countries underscores the importance of my research. Every year worldwide approximately 500 million people contract malaria and more than one million people die from the disease (WHO 2009). Also, countries with a high incidence of malaria have significantly less economic growth, all else equal (Gallup and Sachs 2001). My work contributes to the evaluation of malaria-abatement policies, especially since the United States’ malaria experience (circa early 20th century) is comparable to many modern-day malaria-afflicted countries.7 For example, Mississippi had close to 50 malaria deaths per 100,000 inhabitants in 1921 (NCHS 1924) and Kenya had approximately 60 malaria deaths per 100,000 inhabitants in 2002 (WHO 2004).
Furthermore, the development of my identification strategy in itself is a key contribution to the malaria literature. Thus far, this literature has relied on malaria eradication campaigns or cross-sectional identification strategies to estimate the effects of malaria. For example, Bleakley (2003), Lucas (2005), and Bleakley (2006) find that malaria-eradication efforts led to long-term economic gains for children living in areas where malaria was once present.8 Using a cross-section of predicted malaria incidence across the United States in the late 19th century, Hong (2007) demonstrates that there were significant long-term economic consequences for individuals who were born in malaria-afflicted counties.9 My work complements the approaches used in these existing studies in that I rely on a different set of identifying assumptions.
The first-stage results indicate that the fraction of the year with daily mean temperatures between 22°C and 28°C is a strong predictor of malaria death rates within the United States during the early 20th century. In the core specification, the estimated coefficient is statistically significant at the 1 percent level, with an F-statistic close to ten.
The IV estimates provide evidence that malaria exposure results in lower levels of income and higher rates of poverty. For example, the core IV estimate for log income indicates that exposure to an environment with ten additional malaria deaths per 100,000 inhabitants causes a 13 percent reduction in income. Although my income estimates are not statistically significant at conventional levels, I find that malaria exposure has a statistically significant impact on the probability of being below 150 percent of the poverty level.
I also find that in utero and postnatal exposure to malaria significantly reduces educational attainment. For example, exposure to an environment with ten additional malaria deaths per 100,000 reduces years of schooling by approximately 0.4 years, and reduces the probability of attaining at least 12 years of education by 5.1 percentage points (significant at the 5 and 10 percent levels, respectively). Furthermore, the magnitude of my schooling estimates suggests that in utero and postnatal exposure to malaria can account for as much as 25 percent of the difference in long-term educational attainment between cohorts born in malaria-afflicted states and cohorts born in nonafflicted states during the early 20th century.
Finally, there is some evidence that the long-term impacts are more severe for those exposed to malaria at the time of birth relative to those exposed to the disease in their second or third year of childhood. This result suggests that health policies targeting pregnant women and infants may have higher returns than those policies that target broader age groups.
II. Background on the Relationship Between Temperature and Malaria
Temperature affects the malaria life cycle in several ways. Malaria is caused by a parasite that can be transmitted between humans only via female Anopheles mosquitoes. An adult mosquito, which develops from a larva, first acquires the parasite when taking a blood meal from an infected human host. The parasite then develops in the mosquito’s stomach, a process called sporogony, after which the mosquito can infect a new human host when it feeds again. The temperature-malaria relationship exists because larval development, mosquito survival, and sporogony all depend on temperature.
Moreover, the temperature-malaria relationship is complex because larval development, mosquito survival, and sporogony are each nonlinear functions of temperature. For example, the proportion of larvae that reach adulthood (and become potential vectors for the malaria parasite) is greatest when temperatures are between 20°C and 28°C (Bayoh and Lindsay 2003).10 Adult mosquito survival rates are relatively high until the temperature reaches 34°C, after which they decrease at an increasing rate until the probability of survival is nil at 40°C (Craig et al. 1999). Also, the rate at which sporogony is completed increases with temperature between 15°C and 40°C; because sporogony takes longer at cooler temperatures, most mosquitoes will not survive long enough to transmit malaria if temperatures are below 22°C (Craig et al. 1999).11
These numerous nonlinear effects complicate efforts to model the temperature-malaria relationship. To simplify things, previous epidemiological studies have largely ignored the larval development stage of the malaria life cycle and have instead focused on modeling the probability that an adult mosquito survives long enough to transmit malaria at least once.12 Although the simplicity of these models is appealing, they overstate the importance of temperatures above the 28–30°C range because they do not fully account for larval development (Martens et al. 1995). Furthermore, some of their functional-form assumptions are questionable because they are derived from laboratory studies.
This paper takes a different tack from most epidemiological studies when modeling malaria transmission. I model the malaria death rate as a function of the fraction of the year that the daily mean temperature is between 22°C (71.6°F) and 28°C (82.4°F), the temperature “window” in which malaria transmission is thought to be least constrained by temperature. This model is easy to interpret and makes its functional-form assumptions transparent. Beyond being grounded in epidemiology, the fraction of the year with temperatures between 22°C and 28°C is, as I show below, a strong predictor of the malaria death rate in the United States relative to other temperature ranges (see Figure 4).
Furthermore, some precipitation and/or the presence of permanent or semipermanent bodies of water (like lakes, swamps, or streams) are necessary for malaria transmission because mosquito larvae require standing water to survive until adulthood (Craig et al. 1999). This epidemiological facet of the malaria life cycle argues for using a model that interacts temperature with rainfall and/or interacts temperature with some proxy for the presence of standing water (such as the fraction of the state’s population living in a county with a swamp). Although I find some empirical evidence to support the inclusion of such interactions, my results are qualitatively similar without this added complexity.13 As such, I model separable effects for temperature and rainfall for the sake of simplicity.
III. Empirical Methodology
A. Ordinary Least Squares (OLS)
My base OLS specification is of the following form:
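$$Y_{jt} = \beta\,MALARIA_{jt} + \gamma_j + X_{jt}'\theta + \delta_t + \alpha_j \cdot YEAR_t + \varepsilon_{jt} \tag{1}$$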
where Y is some average outcome, such as years of schooling or log of income, for the cohort born in state j and year t; MALARIA is the malaria death rate in state j and year t; γ is a set of state-of-birth fixed effects that account for any time-invariant differences between states (like geography) that also may be correlated with MALARIA and Y; X is a vector of control variables (discussed below); δ is a set of year-of-birth fixed effects, which account for any year-to-year changes in birth conditions that occur for the whole United States and are possibly correlated with MALARIA (for example, business cycles); and α_j · YEAR is a set of state-specific linear time trends that address the concern that the observed (downward) trends in MALARIA within some states are spuriously correlated with a gradual convergence in outcomes across states over time. To account for the possibility that the stochastic error terms (ε) are correlated within states, I cluster the standard errors on the cohort’s state of birth.
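To make the structure of Equation (1) concrete, the following is a minimal sketch on simulated data, assuming Python with NumPy. The helper names (dummies, design_eq1, ols_clustered), the simulated panel, and the CR1-style small-sample adjustment in the state-clustered standard errors are illustrative assumptions; they are not the software or data used in this paper.

```python
import numpy as np


def dummies(codes, drop_first=False):
    """One indicator column per distinct code, optionally dropping the first level."""
    levels = np.unique(codes)
    d = (codes[:, None] == levels[None, :]).astype(float)
    return d[:, 1:] if drop_first else d


def design_eq1(state, year, malaria, controls):
    """Regressor matrix for Equation (1): MALARIA, X, state FE, year FE, state trends."""
    state_fe = dummies(state)                   # gamma_j: one column per state (no intercept)
    year_fe = dummies(year, drop_first=True)    # delta_t: first birth year omitted
    t = (year - year.min()).astype(float)
    # alpha_j * YEAR: the first state's trend is dropped because the common
    # linear trend is already spanned by the year fixed effects.
    trends = state_fe[:, 1:] * t[:, None]
    return np.column_stack([malaria, controls, state_fe, year_fe, trends])


def ols_clustered(y, X, clusters):
    """OLS coefficients with CR1 cluster-robust standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    n, k = X.shape
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        idx = clusters == g
        score = X[idx].T @ resid[idx]           # score summed within one state of birth
        meat += np.outer(score, score)
    g_count = len(groups)
    adj = g_count / (g_count - 1) * (n - 1) / (n - k)
    vcov = adj * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(vcov))


# Hypothetical panel: 15 states of birth x 37 birth years (1900-1936).
rng = np.random.default_rng(0)
n_states, n_years = 15, 37
state = np.repeat(np.arange(n_states), n_years)
year = np.tile(np.arange(1900, 1900 + n_years), n_states)
malaria = rng.gamma(2.0, 4.75, state.size)      # death rates per 100,000, mean 9.5
precip = rng.normal(0.12, 0.02, state.size)     # a single weather control
y = (10.0 - 0.04 * malaria + 0.5 * precip
     + 0.3 * np.sin(state)                      # time-invariant state effects
     + 0.01 * (year - 1900) * (state % 3)       # state-specific linear trends
     + rng.normal(0.0, 0.05, state.size))

X = design_eq1(state, year, malaria, precip)
beta, se = ols_clustered(y, X, state)
print(f"beta_hat = {beta[0]:.4f} (clustered SE {se[0]:.4f}); true value -0.04")
```

One design choice worth noting: because the year fixed effects already span a common linear trend, the sketch omits one state's trend so that the design matrix stays full rank.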
Despite the inclusion of state fixed effects, year fixed effects, and state-specific linear time trends, estimates of β may be biased if MALARIA is: (a) correlated with important unobservable factors, and/or (b) a noisy proxy for the true incidence of malaria.14 The omitted variables are likely to bias estimates of β toward overstating the adverse effects of malaria exposure since the disease is positively associated with poverty. The measurement error may bias the estimator toward zero (in the classical case).15 However, the net effect of these two biases is uncertain, prima facie.
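The two biases can be illustrated with a small simulation, assuming Python with NumPy. All parameter values are hypothetical: an unobserved confounder (labeled poverty) raises malaria and lowers the outcome, the recorded death rate adds classical noise to true malaria incidence, and a temperature-style instrument shifts true malaria while being independent of both the confounder and the noise. Under these assumptions OLS combines an omitted-variables bias that exaggerates the adverse effect with attenuation toward zero, while the simple IV (Wald) ratio recovers the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
beta_true = -0.04

z = rng.uniform(0.0, 0.3, n)                          # instrument, e.g. share of malaria-ideal days
poverty = rng.normal(0.0, 1.0, n)                     # unobserved confounder
true_malaria = 30.0 * z + 1.5 * poverty + rng.normal(0.0, 1.0, n)
measured = true_malaria + rng.normal(0.0, 2.0, n)     # classical measurement error
y = beta_true * true_malaria - 0.05 * poverty + rng.normal(0.0, 1.0, n)

# OLS slope of y on the noisy regressor, and the IV (Wald) ratio using z.
ols = np.cov(measured, y)[0, 1] / np.var(measured, ddof=1)
iv = np.cov(z, y)[0, 1] / np.cov(z, measured)[0, 1]

# Probability limits implied by the design above.
var_true = 30.0**2 * 0.3**2 / 12 + 1.5**2 + 1.0**2   # = 10.0
var_measured = var_true + 2.0**2                      # = 14.0
cov_xy = beta_true * var_true - 0.05 * 1.5            # confounding makes this more negative
ols_plim = cov_xy / var_measured                      # about -0.034: attenuation dominates here

print(f"true {beta_true}, OLS {ols:.4f} (plim {ols_plim:.4f}), IV {iv:.4f}")
```

In this particular design attenuation dominates, so OLS understates the effect; other parameter choices can reverse the sign of the net bias, which is the sense in which it is uncertain prima facie.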
B. Two-Stage Least-Squares (2SLS)
In order to address the potential biases in OLS, I instrument the malaria death rate with the fraction of days with daily mean temperatures between 22°C and 28°C (hereafter “malaria-ideal temperatures”) for the set of states where malaria was a serious problem. Specifically, I allow the effects of malaria-ideal temperatures to vary depending on whether a state has one of the 15 highest malaria death rates in 1920 according to Maxcy (1923).16 These 15 states (hereafter “high malaria states”) all had at least one malaria death per 100,000 inhabitants and accounted for an estimated 95 percent of all malaria deaths that occurred in the United States in 1920.
Although malaria primarily afflicted the 15 high malaria states during the early 20th century, I include the remaining states and the District of Columbia (hereafter “low malaria states”) in my sample in order to more precisely estimate the year-of-birth fixed effects while absorbing relatively less of the identifying variation.17
In sum, my first-stage specification takes the following form:
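$$MALARIA_{jt} = \pi_1\,TEMP2228_{jt} + \pi_2\,(TEMP2228_{jt} \times HIGH_j) + X_{jt}'\phi + \psi_j + \lambda_t + \tau_j \cdot YEAR_t + v_{jt} \tag{2}$$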
where MALARIA is the malaria death rate for the population of state j in year t; TEMP2228 is the fraction of year t that state j had a mean daily temperature between 22°C and 28°C; HIGH is an indicator variable for whether state j was one of the 15 high malaria states; ψ is a set of state fixed effects; λ is a set of unrestricted year fixed effects; τ_j · YEAR is a set of state-specific linear time trends; and v is a stochastic error term. X is a vector of weather-related control variables for state j and year t. Specifically, X consists of the average daily precipitation (PRCP) and the average of the daily precipitation squared (PRCPSQD) in year t and state j, both as main effects and interacted with the high malaria state indicator.18
Finally, I estimate Equation 1 by 2SLS using the predicted values of MALARIA from Equation 2.19
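A minimal sketch of the two stages follows, assuming Python with NumPy and simulated data. The helper names, the simulated panel of 49 units over 1900–1936, and the effect sizes are hypothetical. It mirrors the preferred specification in using TEMP2228 · HIGH as the only excluded instrument, but it omits the precipitation controls and state-specific trends for brevity, and the first-stage F it reports is the conventional non-robust version rather than a clustered statistic.

```python
import numpy as np


def dummies(codes, drop_first=False):
    """One indicator column per distinct code, optionally dropping the first level."""
    levels = np.unique(codes)
    d = (codes[:, None] == levels[None, :]).astype(float)
    return d[:, 1:] if drop_first else d


def ols(y, X):
    """Least-squares coefficients and the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)


def two_sls(y, x, z, W):
    """Just-identified 2SLS with one excluded instrument z and exogenous controls W.

    Returns point estimates only: standard errors from a manually run second
    stage are not valid 2SLS standard errors.
    """
    Z = np.column_stack([z, W])
    pi, rss_u = ols(x, Z)                         # first stage (Equation 2)
    _, rss_r = ols(x, W)                          # first stage without the instrument
    n, k = Z.shape
    f_stat = (rss_r - rss_u) / (rss_u / (n - k))  # non-robust F for one restriction
    x_hat = Z @ pi                                # predicted MALARIA
    b, _ = ols(y, np.column_stack([x_hat, W]))    # second stage (Equation 1)
    return b[0], pi[0], f_stat


# Hypothetical panel: 49 units (48 states + DC) x 37 birth years.
rng = np.random.default_rng(7)
n_states, n_years = 49, 37
state = np.repeat(np.arange(n_states), n_years)
year = np.tile(np.arange(1900, 1900 + n_years), n_states)
high = (state < 15).astype(float)                 # first 15 units play the HIGH states
temp2228 = rng.uniform(0.05, 0.35, state.size)    # share of malaria-ideal days
poverty = rng.normal(0.0, 1.0, state.size)        # unobserved confounder
malaria = high * (30.0 * temp2228 + 2.0 * poverty + rng.normal(0.0, 2.0, state.size))
y = -0.04 * malaria - 0.1 * poverty + rng.normal(0.0, 0.2, state.size)

W = np.column_stack([dummies(state), dummies(year, drop_first=True)])  # psi_j, lambda_t
beta_iv, pi_hat, f_stat = two_sls(y, malaria, temp2228 * high, W)
print(f"first stage {pi_hat:.1f} (F = {f_stat:.0f}); 2SLS {beta_iv:.4f}")
```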
C. Semiparametric Temperature-Malaria Model
I also test the effects of other “less ideal” temperature ranges on the malaria death rate. To accomplish this, I estimate the following equation via OLS for the sample of high malaria states:
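$$MALARIA_{jt} = \sum_{r} \kappa_r\,TEMP^{r}_{jt} + X_{jt}'\phi + d_j + h_j \cdot YEAR_t + u_{jt} \tag{3}$$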
This model shares some similarities with Equation 2 in that it includes a set of state fixed effects (d_j), state-specific time trends (h_j · YEAR), and controls for precipitation and precipitation squared (X). However, I do not include year fixed effects because of a lack of statistical precision.20 Importantly, TEMP^r now denotes the fraction of year t in state j that falls in temperature range (bin) r; specifically, I estimate the effects of temperatures below 0°C, between 0°C and 8°C, between 8°C and 15°C, between 22°C and 28°C, and above 28°C, with temperatures between 15°C and 22°C as the omitted category.21
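The bin variables TEMP^r can be built from daily mean temperatures as in the following sketch (Python with NumPy). The helper name and the treatment of bin boundaries (each bin includes its lower edge and excludes its upper edge, so a day at exactly 28°C falls in the top bin) are assumptions, since the text does not specify how boundary days are assigned.

```python
import numpy as np

# Lower edges (°C) of the upper five bins; values below 0°C fall in the first bin.
BIN_EDGES = [0.0, 8.0, 15.0, 22.0, 28.0]
BIN_LABELS = ["below_0", "0_8", "8_15", "15_22", "22_28", "above_28"]
OMITTED = "15_22"   # reference category in Equation (3)


def temperature_bin_shares(daily_mean_c):
    """Share of days in each temperature bin, excluding the omitted 15-22°C bin.

    Each bin includes its lower edge and excludes its upper edge (an assumption).
    """
    temps = np.asarray(daily_mean_c, dtype=float)
    bin_index = np.digitize(temps, BIN_EDGES)           # 0..5, one index per day
    counts = np.bincount(bin_index, minlength=len(BIN_LABELS))
    shares = {label: float(c) / temps.size for label, c in zip(BIN_LABELS, counts)}
    del shares[OMITTED]
    return shares


shares = temperature_bin_shares([-5.0, 3.0, 10.0, 18.0, 25.0, 30.0, 26.0, 27.0])
print(shares)
```

Dropping the 15°C–22°C share is what makes each remaining coefficient an effect relative to that range: the six shares sum to one, so including all of them alongside state fixed effects would be perfectly collinear.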
D. Conceptual Issues
There are three main conceptual issues and/or limitations with my identification strategy worth noting. First, I cannot discern the specific mechanism through which cohorts are affected by malaria. That is, I may be measuring the indirect effects of being born in a state where, for example, a cohort’s parents (or neighbors) were infected.22 Additionally, there may be peer effects, which would likely reinforce any direct effect of exposure, and/or other general equilibrium effects.23 Given data limitations, I am unable to separate these different mechanisms.24
Second, my core identification strategy does not distinguish between in utero and postnatal exposure because the identifying variation is at the year-of-birth level. To disentangle the separate effects of in utero and postnatal exposure, I have estimated a reduced-form model using quarterly variation in malaria-ideal temperatures along with state by year by quarter of birth cells. The results of this model, which are available upon request, are too imprecise to offer any substantive conclusions.25
Third, there is some concern that my estimates may understate the effects of contracting malaria early in life because physically weaker fetuses and infants may die from the disease, and consequently, may be excluded from my sample. To explore the scope of the selection effect, I estimate the effects of malaria on the log of the cohort size. There is little evidence to support the concern that my estimates are biased by a selection effect.
IV. Data
A. Malaria Mortality Data
The data on malaria deaths were constructed from various volumes of the Mortality Statistics of the United States from the National Center for Health Statistics (NCHS) archives. The NCHS reports malaria deaths at the state-year level from 1900 through 1941. Using historical population estimates from the Bureau of the Census, I construct malaria death rates by dividing the malaria mortality counts by the state-year population (in 100,000s). Although annual malaria data are available through 1941, I restrict my sample to the years 1900 through 1936 because malaria death rates declined significantly starting in 1937 for reasons unrelated to my identification strategy.26 In addition, malaria deaths are not available from the NCHS for all state-years between 1900 and 1936 because states began reporting mortality statistics for the first time at different points over this period. The NCHS sample of malaria death rates totals 1,147 state-year observations.
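The construction of death rates is simple arithmetic: deaths divided by population in units of 100,000. A sketch in Python with hypothetical counts follows; the function name and the dictionary layout are assumptions, and state-years with no reported deaths are simply absent, which produces the unbalanced panel described above.

```python
def malaria_death_rates(deaths, population, first_year=1900, last_year=1936):
    """Malaria deaths per 100,000 inhabitants for each reported (state, year).

    deaths and population are dicts keyed by (state, year). State-years with
    no reported death count are absent, so the result is an unbalanced panel.
    """
    rates = {}
    for (state, year), count in deaths.items():
        if first_year <= year <= last_year and (state, year) in population:
            rates[(state, year)] = count / (population[(state, year)] / 100_000)
    return rates


# Hypothetical counts and populations, for illustration only.
deaths = {("MS", 1921): 900, ("MS", 1937): 100, ("OH", 1921): 10}
population = {("MS", 1921): 1_800_000, ("MS", 1937): 2_000_000, ("OH", 1921): 6_000_000}
rates = malaria_death_rates(deaths, population)
print(rates)
```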
B. Climatic Data
Daily minimum and maximum ground temperature and daily precipitation by weather station were obtained from the National Climatic Data Center (NCDC). Unlike the malaria mortality data, I have weather information for at least one weather station for all state-years from 1900 through 1936.27 Over this period, there are approximately 900 weather stations nationwide on average, and about 20 weather stations per state on average. The full sample of NCDC state-year observations totals 1,813 (49 geographic units, the 48 contiguous states and the District of Columbia, over 37 years).
I construct state-year weather variables from the daily weather-station data by taking an inverse-distance, population-weighted average of the weather stations within 100 miles of a given state. For example, I construct my measure of malaria-ideal temperatures (TEMP2228), the fraction of days within state j and calendar year t that the daily mean temperature is between 22°C and 28°C, as follows:
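$$TEMP2228_{jt} = \frac{\sum_{s=1}^{S} \omega_{s} \left( \frac{1}{D} \sum_{d=1}^{D} DUM2228_{sdt} \right)}{\sum_{s=1}^{S} \omega_{s}} \tag{4}$$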
where DUM2228 is a dummy variable that takes a value of one if the mean temperature is between 22°C and 28°C at weather station s, on day d of year t.28 D is the total number of days (that is, 365 or 366) in year t. S is the number of (reporting) weather stations in or within 100 miles of state j in year t. I weight all weather station observations by ω, an inverse-distance weight of the state-county population within 100 miles of the weather station.29 All weather variables (including precipitation) are aggregated to the state-year level in a similar fashion to TEMP2228.
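Equation (4) can be sketched as follows (Python with NumPy). Several details are assumptions rather than descriptions of the paper's procedure: the daily mean is taken as the average of the daily minimum and maximum, days at exactly 22°C or 28°C count as in the window, the station fraction is taken over the days supplied (the stand-in for D), and the hypothetical helper inverse_distance_weight is one possible reading of ω (county population divided by distance, summed over counties within 100 miles).

```python
import numpy as np


def station_share_2228(tmin_c, tmax_c):
    """(1/D) * sum over days of DUM2228 for one station-year.

    Assumes the daily mean is (min + max) / 2 and that days at exactly 22°C or
    28°C count as malaria-ideal; D is the number of days supplied.
    """
    tmean = (np.asarray(tmin_c, float) + np.asarray(tmax_c, float)) / 2.0
    dum2228 = (tmean >= 22.0) & (tmean <= 28.0)
    return float(dum2228.mean())


def inverse_distance_weight(county_pops, county_miles, max_miles=100.0, min_miles=1.0):
    """Hypothetical omega: population / distance, summed over counties within max_miles."""
    pops = np.asarray(county_pops, float)
    miles = np.asarray(county_miles, float)
    near = miles <= max_miles
    return float(np.sum(pops[near] / np.maximum(miles[near], min_miles)))


def temp2228(station_shares, weights):
    """Weighted average of station shares, as in Equation (4)."""
    s = np.asarray(station_shares, float)
    w = np.asarray(weights, float)
    return float(np.sum(w * s) / np.sum(w))


# Two hypothetical stations observed over four days.
share_a = station_share_2228([18, 20, 15, 25], [28, 30, 21, 33])   # means 23, 25, 18, 29
share_b = station_share_2228([20, 20, 22, 16], [26, 24, 30, 28])   # means 23, 22, 26, 22
w_a = inverse_distance_weight([1000, 500, 800], [10, 50, 150])     # third county is too far
w_b = inverse_distance_weight([300], [10])
state_value = temp2228([share_a, share_b], [w_a, w_b])
print(share_a, share_b, w_a, w_b, state_value)
```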
C. Census Data
Data on adult outcomes were compiled using the 1960 Decennial Census (Ruggles et al. 2004). I create state and year of birth cell averages for those born between 1900 and 1936, or those who were approximately 24 to 60 years of age at the time of the 1960 Census.30 Cohort outcomes include: years of schooling, attained at least 12 years of education, log of total personal income, is below 150 percent of the poverty line, worked 50 or more weeks last year, worked 35 or more hours per week last year, employed at time of census, fraction female, and log of cohort size.31
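The cohort-cell construction amounts to a group-by on state and year of birth. The sketch below uses only the Python standard library; the record layout (birth_state, birth_year, educ_years, female) is hypothetical, the averages are unweighted, and log cohort size is computed from the number of sample records in each cell, all of which are simplifying assumptions.

```python
import math
from collections import defaultdict


def cohort_cells(people, first_year=1900, last_year=1936):
    """Unweighted state-by-birth-year averages of selected adult outcomes."""
    cells = defaultdict(list)
    for p in people:
        if first_year <= p["birth_year"] <= last_year:
            cells[(p["birth_state"], p["birth_year"])].append(p)
    out = {}
    for key, members in cells.items():
        n = len(members)
        out[key] = {
            "years_schooling": sum(m["educ_years"] for m in members) / n,
            "hs_or_more": sum(m["educ_years"] >= 12 for m in members) / n,
            "fraction_female": sum(m["female"] for m in members) / n,
            "log_cohort_size": math.log(n),
        }
    return out


people = [
    {"birth_state": "MS", "birth_year": 1921, "educ_years": 8, "female": 1},
    {"birth_state": "MS", "birth_year": 1921, "educ_years": 12, "female": 0},
    {"birth_state": "OH", "birth_year": 1950, "educ_years": 12, "female": 1},  # outside 1900-1936
]
cells = cohort_cells(people)
print(cells)
```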
The cohort cells are merged with the malaria and weather data by state and year of birth.
D. Data Limitations
It is important to note that the geographic unit of observation is at the state level due to data constraints. Malaria deaths are not available at the county level, nor are county of birth identifiers available in the 1960 Census. As such, it is not possible to use a finer geographic area (such as county).
Furthermore, to make maximum use of available data, both the OLS and IV specifications use an unbalanced panel of state-years for which I have malaria death rates. As a robustness check, I estimate a reduced-form model for the balanced panel of cohorts born between 1900 and 1936.
V. Descriptive Evidence and First Stage Results
As Figure 1 illustrates, malaria was a public health problem in the United States that was particular to the American South during the early 20th century. For example, the states with at least one malaria death per 100,000 inhabitants in 1920 were all located within (or adjacent to) the South. Moreover, Arkansas, Florida, Mississippi, and Louisiana all had between 30 and 40 malaria deaths per 100,000 inhabitants in 1920. Because approximately 200 to 400 infections are associated with each malaria death, Figure 1 suggests that malaria may have infected as many as one in eight people in the most malaria-afflicted states. Both socioeconomic conditions (for example, housing conditions, nutrition, education levels) and climatic differences between the South and the rest of the United States can likely account for the cross-region differences in malaria death rates (Humphreys 2001).
In addition to the significant cross-state (or cross-region) differences in malaria death rates, there is also ample within-state variation in the malaria death rate over time. For example, Figure 2 illustrates the malaria death rate between 1918 and 1941 for the four “most” malaria-afflicted states: Arkansas, Florida, Mississippi, and Louisiana.32 In each of these four states, the malaria death rate nearly doubled over some two- or three-year period between 1918 and 1941. Malaria death rates in these four states also trended downward over this period.33 As a reminder, I restrict my core sample to the years between 1900 and 1936 because malaria death rates began approaching zero starting in 1937.
Figure 2 also highlights the fact that much of the within-state variation in malaria death rates follows a similar pattern across states. For example, Arkansas, Mississippi, and Louisiana experienced similar spikes in malaria death rates in 1933. Both socioeconomic conditions and the weather are potential candidates to explain this within-state variation because they are correlated within geographic regions over time.
Cursory evidence suggests that socioeconomic conditions cannot explain much of the year-to-year variation in the malaria death rate within malaria-afflicted areas of the United States. For example, in Florida and Mississippi malaria deaths increased during a time of relative economic prosperity (1928), declined after the onset of the Great Depression (1930), and surged again in the midst of the Great Depression (1933) (Figure 2).34
Malaria-ideal temperatures, unlike socioeconomic conditions, can explain both the cross-state and the within-state variation in malaria death rates. Table 1 shows that high malaria states had approximately twice as many days per year between 22°C and 28°C as low malaria states on average. Conversely, low malaria states had over four times as many days below 8°C as high malaria states. Also, the difference in precipitation levels between high and low malaria states is less stark. For example, high malaria states had 0.12 inches of precipitation and low states had 0.10 inches of precipitation on average per day between 1900 and 1936.
Moreover, Figure 3 demonstrates that malaria-ideal temperatures are positively correlated with the malaria death rate within the four most malaria-afflicted states. For example, changes in the fraction of the year between 22°C and 28°C mirror changes in the malaria death rate in Mississippi and Florida between 1928 and 1936. Consequently, using a within-state IV identification strategy to estimate the long-term effects of malaria exposure seems promising.
The OLS estimates in Table 2 indicate that the fraction of the year with daily mean temperatures between 22°C and 28°C, in the 15 high malaria states, is a strong predictor of the malaria death rate. For example, the Column 1 specification (without any weather controls) indicates that a 10 percentage point increase in the fraction of the year with malaria-ideal temperatures in high malaria states causes an additional 2.9 malaria deaths per 100,000 inhabitants (that is, (33.3 minus 3.4) times 0.1). The magnitude of this effect is large: an increase of 2.9 malaria deaths per 100,000 inhabitants represents about 30 percent of the average malaria death rate in high malaria states (9.5 malaria deaths per 100,000 inhabitants). Furthermore, the estimated effect for high malaria states is statistically significant at the 1 percent level. As expected, variation in the fraction of the year with malaria-ideal temperatures in low malaria states has no statistically distinguishable effect on the malaria death rate.
My estimates are qualitatively similar when I control for precipitation and/or precipitation squared.35 Although somewhat smaller in magnitude, my estimates are robust to controlling for a temperature-precipitation interaction term or including state-specific quadratic time trends.36 In addition, my results are similar when I restrict my sample to the set of high malaria states (results not reported).
I find that malaria-ideal temperatures have no statistically significant or economically meaningful impact on the malaria death rate in low malaria states. Therefore, to improve the efficiency of my identification strategy, I drop the main effect (TEMP2228) in my preferred specification (Column 6).37 (However, I do include both TEMP2228 and TEMP2228 · HIGH as regressors in several reduced-form models to verify that malaria-ideal temperatures do not affect long-term outcomes in states where malaria was not a serious problem.) The fact that the F-statistic on TEMP2228 · HIGH in my preferred specification is approximately 9.6, close to the conventional rule-of-thumb threshold of 10, mitigates “weak instrument” concerns.
Finally, Figure 4 demonstrates that temperatures between 22°C and 28°C are a good predictor of the malaria death rate relative to several other temperature ranges (such as above 28°C). Figure 4, which presents the parameter estimates of the semiparametric temperature-malaria model (Equation 3), shows that a 10 percentage point increase in the fraction of the year between 22°C and 28°C causes the malaria death rate to increase by approximately 3.3 deaths per 100,000 inhabitants relative to temperatures between 15°C and 22°C. Moreover, the estimated coefficient on temperatures between 22°C and 28°C is statistically different from zero at the 1 percent level, while the estimated coefficients on the other temperature ranges are not significantly different from zero at conventional levels.
VI. Effects on Long-Term Economic Outcomes
A. Main Results
Cohorts born in high malaria states have significantly worse long-term outcomes on average than those born in low malaria states, according to the 1960 Census (Table 3). For example, cohorts born in high malaria states had 9.71 years of schooling while cohorts born in low malaria states had approximately 11.15 years of schooling on average. High school completion rates were around 40 percent and 56 percent for high malaria and low malaria state cohorts, respectively. Income and earnings are approximately 25 percent lower for high malaria state cohorts. Furthermore, high malaria state cohorts also have a much larger black population (that is, 20 percent versus 2 percent). The magnitude of the disparity in outcomes and characteristics illustrates the importance of identifying factors (such as variation in malaria-ideal temperature) that exogenously shift the risk of malaria exposure.
The OLS estimates indicate that there is little or no relationship between educational attainment and the malaria death rate, but there is a statistically significant negative relationship between the malaria death rate and other important indicators of economic well-being (Table 4). For example, ten additional malaria deaths per 100,000, which is just above the average malaria death rate in high malaria states, is associated with approximately 11 percent less income and a 0.8 percentage point increase in the probability of being below 150 percent of the poverty line. However, as mentioned above, the OLS estimates may not provide reliable estimates of the causal impact of malaria exposure on account of omitted variables and measurement error.
My core IV estimates provide extensive evidence that malaria exposure has a significant impact on educational attainment (Table 4). For example, the estimated coefficient on the malaria death rate is negative and statistically significant at conventional levels for years of schooling, the probability of attaining eight years of schooling, and the probability of attaining 12 years of schooling.38 The estimated effects on the educational attainment variables are large in magnitude; for example, exposure to ten additional malaria deaths per 100,000 inhabitants causes a cohort to have approximately 0.4 fewer years of schooling and a 5.1 percentage point lower probability of having at least 12 years of education. Also, I find that malaria exposure has no adverse effect on the probability of attaining at least 16 years of schooling, which is consistent with the fact that malaria, like many diseases, primarily afflicted more economically disadvantaged populations.
Although my point estimates suggest that malaria exposure adversely affects other measures of long-term economic well-being, large standard errors preclude drawing strong conclusions in most cases. For example, exposure to ten additional malaria deaths per 100,000 inhabitants reduces income by approximately 13 percent; however, I cannot rule out a positive effect on income of similar magnitude.39
I do find statistically significant effects on the probability of being below 150 percent of the poverty line and on the probability of having worked 50 or more weeks. For example, exposure to ten additional malaria deaths per 100,000 inhabitants causes a 3.8 percentage point increase in the probability of being below 150 percent of the poverty line (significant at the 10 percent level).40
Importantly, my IV estimates indicate that selection bias is not a significant concern. That is, the estimated impact of the malaria death rate on the log of the cohort size is small, positive, and not statistically significant at conventional levels.41
B. Discussion
In sum, educational attainment is the most evident mechanism through which in utero and postnatal exposure to malaria affects long-term outcomes. My estimates indicate that malaria exposure can account for a sizable share of the difference in schooling between high malaria and low malaria state cohorts. The difference in the malaria death rate between these two groups was approximately 8.9 deaths per 100,000, and the difference in years of schooling was approximately 1.4 years. Using my years-of-schooling point estimate from Table 4 (approximately 0.04 fewer years per additional death per 100,000), I find that exposure to malaria can account for approximately 25 percent of the difference in years of schooling between cohorts born in high and low malaria states (8.9 times 0.04 divided by 1.4).42 Furthermore, assuming approximately 400 malaria cases per death (the upper end of the range reported by Humphreys 2001), I estimate that one additional case of malaria translates into approximately ten fewer years of schooling in aggregate across the affected cohort.43
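Both thought experiments in this paragraph reduce to simple arithmetic. The sketch below reproduces them with the rounded figures quoted in the text rather than the unrounded Table 4 estimates; the function names are illustrative only, and the 400 cases-per-death figure is the upper end of the 200 to 400 range in footnote 6.

```python
# Back-of-the-envelope arithmetic from the Discussion, using the rounded
# figures quoted in the text (not the unrounded Table 4 estimates).
# Function names are hypothetical.

IV_YEARS_PER_DEATH = 0.04  # schooling lost per extra malaria death per 100,000 (0.4 per 10)
DEATH_RATE_GAP = 8.9       # high minus low malaria states, deaths per 100,000
SCHOOLING_GAP = 1.4        # high minus low malaria states, years of schooling


def share_of_gap_explained(coef, death_gap, schooling_gap):
    """Fraction of the schooling gap attributable to the death-rate gap."""
    return coef * death_gap / schooling_gap


def aggregate_years_lost_per_case(coef, cases_per_death, population=100_000):
    """Cohort-wide years of schooling lost per additional malaria case.

    One extra death per `population` lowers schooling by `coef` years for every
    member of that population and corresponds to `cases_per_death` extra cases.
    """
    return coef * population / cases_per_death


share = share_of_gap_explained(IV_YEARS_PER_DEATH, DEATH_RATE_GAP, SCHOOLING_GAP)
per_case_400 = aggregate_years_lost_per_case(IV_YEARS_PER_DEATH, cases_per_death=400)
per_case_200 = aggregate_years_lost_per_case(IV_YEARS_PER_DEATH, cases_per_death=200)

print(f"share of schooling gap explained: {share:.2f}")              # 0.25
print(f"years lost per case (400 cases/death): {per_case_400:.1f}")  # 10.0
print(f"years lost per case (200 cases/death): {per_case_200:.1f}")  # 20.0
```

Because the per-case figure scales inversely with the assumed number of cases per death, the lower end of the 200 to 400 range doubles it to about 20 years.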
Although the estimated impacts on schooling are large, they are generally consistent with previous malaria studies that examine the long-term effects of exposure over the broader time horizon of childhood. For example, Bleakley (2003) finds that one year of exposure during childhood reduced schooling by approximately 0.05 years on average in the United States; my estimate of the effect of ten additional malaria deaths per 100,000 (approximately 0.4 years) is about eight times larger, suggesting that exposure during the in utero and postnatal period may be much more important than exposure during the “average” childhood year. Also, Lucas (2005) finds that a 10 percentage point reduction in malaria during childhood increased years of schooling by about 0.10 years in Paraguay, Sri Lanka, and Trinidad. According to my estimates, a 10 percent reduction in the malaria death rate within high malaria states (that is, one death per 100,000 inhabitants) would lead to approximately 0.04 more years of schooling, or four-tenths of Lucas’ estimate.
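The two literature comparisons rescale the same IV coefficient to different units. A minimal sketch of that rescaling, using the rounded numbers quoted in the paragraph; the constant names are hypothetical, and treating the Lucas-style reduction as one death per 100,000 follows the text's own conversion.

```python
# Rescaling the IV schooling estimate to the units used in the comparisons.
# Constant names are hypothetical; values are the rounded figures in the text.
IV_YEARS_PER_DEATH = 0.04   # years of schooling per extra death per 100,000
BLEAKLEY_PER_YEAR = 0.05    # Bleakley (2003): years lost per year of childhood exposure
LUCAS_PER_REDUCTION = 0.10  # Lucas (2005): years gained per 10-point cut in childhood malaria

# Ten additional deaths per 100,000 is just above the average high malaria state rate.
effect_ten_deaths = 10 * IV_YEARS_PER_DEATH
ratio_to_bleakley = effect_ten_deaths / BLEAKLEY_PER_YEAR

# Conversion from the text: the Lucas-style reduction equals one death per 100,000.
effect_one_death = 1 * IV_YEARS_PER_DEATH
ratio_to_lucas = effect_one_death / LUCAS_PER_REDUCTION

print(f"{effect_ten_deaths:.2f} years, {ratio_to_bleakley:.1f}x Bleakley")  # 0.40 years, 8.0x Bleakley
print(f"{effect_one_death:.2f} years, {ratio_to_lucas:.1f} of Lucas")       # 0.04 years, 0.4 of Lucas
```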
Finally, there is mixed evidence that measurement error may bias my OLS estimates more than omitted variables do. Classical measurement error attenuates OLS estimates toward zero, whereas the omitted socioeconomic factors discussed in the Introduction would likely lead OLS to overstate the harm from malaria; IV estimates that are larger in magnitude than OLS therefore suggest that attenuation bias dominates. For most (but not all) outcomes, my IV estimates are indeed larger in magnitude than my OLS estimates. For example, the IV estimates in Table 4 for years of schooling and the probability of attaining at least 12 years of schooling are much larger (in absolute terms) than my OLS estimates.44 However, the IV estimates are smaller in magnitude than OLS for a few outcomes (such as the probability of being employed). As such, I cannot draw strong conclusions regarding the importance of measurement-error bias relative to omitted-variables bias.
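The tug-of-war between the two biases can be illustrated with a stylized simulation. The sketch below is not the paper's model: every parameter is made up, there is one observation per "cohort," and a single omitted variable `ses` stands in for socioeconomic conditions. It shows that confounding pushes OLS past the true effect, classical noise in the exposure proxy pulls it toward zero, and a valid instrument recovers the true coefficient.

```python
import numpy as np

# Stylized data-generating process; every parameter here is hypothetical.
rng = np.random.default_rng(0)
n = 200_000
beta = -0.4                                 # true effect of exposure on schooling

temp = rng.normal(size=n)                   # instrument: malaria-ideal temperatures
ses = rng.normal(size=n)                    # omitted socioeconomic conditions
malaria = 0.8 * temp - 0.5 * ses + rng.normal(size=n)      # true exposure
observed = malaria + rng.normal(scale=1.5, size=n)         # noisy death-rate proxy
schooling = beta * malaria + 0.5 * ses + rng.normal(size=n)


def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)


def iv_slope(z, x, y):
    """Just-identified IV (Wald) estimate of the effect of x on y using z."""
    zc = z - z.mean()
    return (zc @ (y - y.mean())) / (zc @ (x - x.mean()))


b_ols_true = ols_slope(malaria, schooling)   # confounding only: overstates the harm
b_ols = ols_slope(observed, schooling)       # confounding plus measurement error
b_iv = iv_slope(temp, observed, schooling)   # consistent for beta

print(f"OLS, true exposure:  {b_ols_true:.2f}")
print(f"OLS, noisy proxy:    {b_ols:.2f}")
print(f"IV, noisy proxy:     {b_iv:.2f}")
```

With these made-up parameters the probability limits are roughly -0.53, -0.24, and -0.40, so OLS on the noisy proxy is smaller in magnitude than IV even though confounding alone would make it too large.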
VII. Robustness checks
A. Controlling for Extreme Temperatures and Agriculture-Ideal Temperatures
There are two noteworthy mechanisms through which malaria-ideal temperatures may affect outcomes independently of malaria, either of which would bias my estimate of the causal impact of malaria exposure. First, malaria-ideal temperatures may be correlated with extreme temperatures that are generally harmful to human physiology. For example, epidemiological evidence suggests that both “cold” weather and “hot” weather pose potential dangers to human health.45 To address this concern, some specifications control for the frequency of days with temperatures below 8°C and the frequency of days above 28°C.
Second, malaria-ideal temperatures are likely correlated with crop yields, which may indirectly affect health conditions (for example, via improved nutritional intake or additional farm income). For example, Deschênes and Greenstone (2007a) (hereafter DG) posit that agricultural output increases linearly as temperature rises between 8°C and 32°C. Schlenker and Roberts (2008) (hereafter SR) find that crop yields increase with temperature between 8°C and a crop-specific threshold (for example, 32°C for cotton), above which yields begin to decline. As such, my estimates of the causal impact of malaria may be biased, since temperatures between 22°C and 28°C lie toward the upper end of the range of “agriculture-ideal” temperatures.
To address the concern that malaria-ideal temperatures are correlated with crop yields, I include two sets of controls. First, I control for weather that is ideal for most types of agriculture, as defined by DG. Specifically, I construct this control by summing, over the “growing season,” the number of degrees by which each day’s mean temperature exceeds 8°C, capped at 32°C, so that each day with a mean temperature above 32°C contributes 24 degree days.46 I refer to this control hereafter as “degree days (8–32°C).”
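A minimal sketch of the degree-day control for a single location follows. It assumes the daily mean is the average of the daily maximum and minimum (footnote 28) and the DG growing season of April 1 to September 30 (footnote 46); treating days with a mean below 8°C as contributing zero is an assumption consistent with the usual degree-day definition. The function names are hypothetical, and the paper's aggregation from stations to state-years is not shown.

```python
from datetime import date


def daily_mean(tmax, tmin):
    """Daily mean temperature as the average of the maximum and minimum."""
    return (tmax + tmin) / 2.0


def degree_days_8_32(records):
    """Growing-season degree days between 8 and 32 degrees C (hypothetical helper).

    records: iterable of (date, tmax, tmin) for one location.
    Days outside April 1 - September 30 are skipped; each remaining day
    contributes its mean temperature minus 8, floored at 0 and capped at 24.
    """
    total = 0.0
    for day, tmax, tmin in records:
        if not 4 <= day.month <= 9:   # DG growing season: April through September
            continue
        t = daily_mean(tmax, tmin)
        total += min(max(t - 8.0, 0.0), 24.0)
    return total


records = [
    (date(1925, 3, 15), 20.0, 10.0),  # outside the growing season: ignored
    (date(1925, 4, 1), 12.0, 8.0),    # mean 10 -> 2 degree days
    (date(1925, 7, 1), 30.0, 20.0),   # mean 25 -> 17 degree days
    (date(1925, 8, 1), 40.0, 30.0),   # mean 35 -> capped at 24
    (date(1925, 9, 30), 6.0, 2.0),    # mean 4 -> 0 degree days
]
print(degree_days_8_32(records))  # 43.0
```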
Second, because malaria-afflicted states are also cotton-producing states, I control for cotton-ideal temperatures as described by SR.47 This control is constructed in a similar fashion to the degree days variable above, except I integrate the fraction of the day spent in each 1°C interval between 8°C and 32°C during cotton’s “growing season” (following SR); importantly, I include a separate variable that measures the fraction of the growing season that was spent at or above 32°C.48 These controls are known hereafter as “cotton-ideal temperatures.”
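Footnote 48 describes linearly interpolating the fraction of each day spent in each 1°C bin from the daily minimum and maximum. One way to read that, sketched below as an assumption rather than the paper's code, is that temperature moves linearly from the minimum to the maximum, so the day's time is spread uniformly over that interval. The helper names are hypothetical; time below 8°C is simply not tracked, and the cotton growing season (April 1 to October 31) would determine which days enter `season_fractions`.

```python
def bin_fractions(tmin, tmax, lo=8, hi=32):
    """Fraction of one day spent in each 1-degree bin [k, k+1), k = lo..hi-1,
    and the fraction spent at or above `hi` (hypothetical helper).

    Assumption: temperature moves linearly between tmin and tmax, so the day's
    time is spread uniformly over [tmin, tmax]. Time below `lo` is not tracked.
    """
    span = tmax - tmin
    fractions = {}
    for k in range(lo, hi):
        if span == 0:
            fractions[k] = 1.0 if k <= tmin < k + 1 else 0.0
        else:
            overlap = max(0.0, min(tmax, k + 1) - max(tmin, k))
            fractions[k] = overlap / span
    if span == 0:
        above = 1.0 if tmin >= hi else 0.0
    else:
        above = max(0.0, tmax - max(tmin, hi)) / span
    return fractions, above


def season_fractions(days, lo=8, hi=32):
    """Average the daily fractions over a season of (tmin, tmax) pairs."""
    totals = {k: 0.0 for k in range(lo, hi)}
    above_total = 0.0
    for tmin, tmax in days:
        fractions, above = bin_fractions(tmin, tmax, lo, hi)
        for k, share in fractions.items():
            totals[k] += share
        above_total += above
    n = len(days)
    return {k: v / n for k, v in totals.items()}, above_total / n


fractions, above = bin_fractions(20.0, 34.0)
print(round(fractions[25], 4), round(above, 4))  # 0.0714 0.1429
```

Averaging over days expresses each bin as a fraction of the season; multiplying by the number of season days would give the integrated time in each bin instead.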
Table 5 presents the results of these robustness checks. Column 1 controls for potentially harmful “cold” and/or “hot” temperatures, that is, temperatures below 8°C and temperatures above 28°C. Columns 2 and 3 control for “degree days (8–32°C)” and “cotton-ideal temperatures,” respectively. Column 4 includes all the aforementioned controls. Finally, Column 5 controls for state-specific quadratic time trends. My estimates are generally robust to the inclusion of these controls, although the estimates are imprecise when state-specific quadratic time trends are included.
B. Reduced-Form Estimates, Pre- and Post-Eradication
As an additional test of my exclusion restriction, I use a reduced-form model to examine the effects of malaria-ideal temperatures in low malaria states: if these temperatures affect long-term outcomes only through malaria, they should have little or no effect where malaria was rare. I also estimate the relationship between malaria-ideal temperatures and adult outcomes for cohorts born after malaria’s demise (circa 1937–56) using the 1980 Census.49 However, the urbanization, migration, and introduction of social-insurance and crop-insurance programs that occurred in the 1930s weaken the validity of the latter test.
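The logic of the low-malaria-state test can be sketched with a reduced-form regression containing the TEMP2228·HIGH interaction and the TEMP2228 main effect reported in Table 6. The sketch below runs on simulated state-year data and keeps only state fixed effects; it omits the state-specific trends, precipitation controls, and microdata of the actual specification, and all names and parameters are hypothetical. The coefficient on TEMP2228 is the effect in low malaria states, which the exclusion restriction predicts is close to zero.

```python
import numpy as np

# Simulated state-year panel; all names and parameters are hypothetical.
rng = np.random.default_rng(1)
n_states = 30
years = np.arange(1900, 1937)
state = np.repeat(np.arange(n_states), len(years))
high = (state < 15).astype(float)                       # first 15 states: "high malaria"
temp2228 = rng.uniform(0.2, 0.5, size=state.size)       # share of days at 22-28 C
state_effect = rng.normal(size=n_states)[state]

true_interaction, true_main = -1.0, 0.0   # temperatures matter only via malaria
y = (state_effect
     + true_interaction * temp2228 * high
     + true_main * temp2228
     + rng.normal(scale=0.05, size=state.size))

X = np.column_stack([
    temp2228 * high,          # TEMP2228 x HIGH
    temp2228,                 # TEMP2228: the effect in low malaria states
    np.eye(n_states)[state],  # state fixed effects (these absorb HIGH)
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b_interaction, b_main = coef[0], coef[1]

print(f"TEMP2228 x HIGH: {b_interaction:.2f}")
print(f"TEMP2228:        {b_main:.2f}")
print(f"implied effect in high malaria states: {b_interaction + b_main:.2f}")
```

The implied effect in high malaria states is the sum of the two coefficients, which is why both terms are read together when interpreting the high malaria state results.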
There are three noteworthy results from this reduced-form analysis: First, the estimates shown in Panel A (Table 6) for TEMP2228·HIGH and TEMP2228 suggest that malaria-ideal temperatures adversely affect educational attainment and the probability of being below 150 percent of the poverty line for the cohorts born in high malaria states prior to malaria’s decline. Furthermore, these reduced-form estimates, which use the full panel of state-years, are consistent with my core IV estimates, which rely on an unbalanced panel.
Second, the estimates shown in Panel A for TEMP2228 suggest that malaria-ideal temperatures had little or no effect on most long-term outcomes for cohorts born in low malaria states during the 1900–36 period. However, malaria-ideal temperatures may have led to a small improvement in income for these cohorts. This income result is not a first-order concern: the effect is small and, if anything, it implies that I underestimate the adverse effects of malaria exposure.50
Third, the estimates shown in Panel B for TEMP2228·HIGH and TEMP2228 suggest that malaria-ideal temperatures had little or no effect on most outcomes for cohorts born in high malaria states after malaria’s demise. The exception is a statistically significant and economically meaningful relationship between malaria-ideal temperatures and the probability of attaining at least 12 years of education. However, once I restrict the sample to cohorts born between 1941 and 1956, the estimated effect on the probability of attaining 12 or more years of education is smaller and no longer statistically significant at conventional levels. This result may be explained by the fact that cohorts born between 1936 and 1940 were still exposed to modest levels of malaria (Faust and Hemphill 1948).
C. Different Samples
Table 7 presents the IV estimates separately for white males, white females, black males, and black females. My estimates are qualitatively similar for white males, white females, and black females. Although my point estimates for black males are generally smaller than those for the other three samples, large standard errors preclude drawing strong conclusions about differences in the effects of malaria exposure across these demographic groups.
I also re-estimate the model using only cohorts born in high malaria states. The qualitative conclusions are unchanged.
D. Exposure During First, Second, and Third Years of Life
In Table 8, I estimate a reduced-form model to test the effects of exposure to malaria-ideal temperatures in the first, second, and third years of life.51 Although statistically imprecise, these estimates indicate that exposure during one’s year of birth (t) is more important than exposure during the second (t+1) and third (t+2) years of life.52 This result suggests that health policies targeting pregnant women and infants may have a higher return than policies targeting broader age groups. However, statistical imprecision precludes drawing strong conclusions in this regard.
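Estimating separate effects by year of life requires attaching to each birth cohort the malaria-ideal temperature measure from its birth year and the two following years. A minimal sketch of that data step, assuming the measure is stored in a dictionary keyed by (state, year); the helper name, the "MS" key, and the values are hypothetical. As footnote 51 notes, temperatures in one year also shape transmission in the next, so these three regressors are not cleanly separable.

```python
def exposure_by_year_of_life(temp_by_state_year, offsets=(0, 1, 2)):
    """Map each (state, birth year t) to its malaria-ideal temperature in
    years t, t+1, and t+2 (hypothetical helper).

    Years missing from the data are returned as None, which flags cohorts
    near the end of the sample whose later years are not observed.
    """
    return {
        (state, year): tuple(temp_by_state_year.get((state, year + k)) for k in offsets)
        for (state, year) in temp_by_state_year
    }


temps = {("MS", 1934): 0.30, ("MS", 1935): 0.35, ("MS", 1936): 0.28}
exposure = exposure_by_year_of_life(temps)
print(exposure[("MS", 1934)])  # (0.3, 0.35, 0.28)
print(exposure[("MS", 1935)])  # (0.35, 0.28, None)
```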
E. Other Checks (Not Reported)
I have verified that my results are not sensitive to slight modifications in my identification strategy. Specifically, I have tried redefining “malaria-ideal” temperatures to be between 21°C and 29°C or between 20°C and 30°C. I also have tried restricting my sample to cohorts born between 1900 and 1930, and cohorts born between 1910 and 1936, in two separate checks. My estimates are robust to these modifications.53
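The alternative windows amount to recomputing the temperature share with different bounds. A simplified sketch, assuming the measure is the share of days whose mean temperature falls inside the window (the paper's TEMP2228 is built from an indicator for a daily mean between 22°C and 28°C, per footnote 36); the inclusive bounds, the plain share without moving averages or station weighting, the helper name, and the sample temperatures are all assumptions.

```python
def share_of_days_in_window(daily_means, lo, hi):
    """Fraction of days whose mean temperature lies in [lo, hi] (inclusive).

    Hypothetical helper: ignores the moving averages and station weighting
    used to build the paper's state-year measure.
    """
    days = list(daily_means)
    return sum(lo <= t <= hi for t in days) / len(days)


daily_means = [18.0, 20.5, 21.5, 23.0, 25.0, 27.9, 28.5, 29.5, 31.0]
for lo, hi in [(22, 28), (21, 29), (20, 30)]:
    print((lo, hi), round(share_of_days_in_window(daily_means, lo, hi), 3))
# (22, 28) 0.333
# (21, 29) 0.556
# (20, 30) 0.778
```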
VIII. Conclusion
Using historical data from the United States (circa 1900–36) and an instrumental-variables identification strategy, this paper estimates the long-term effects from exposure to malaria during the crucial in utero and postnatal periods. After instrumenting malaria exposure with malaria-ideal temperatures, I conclude that cohorts exposed to malaria in their year of birth have significantly lower levels of educational attainment as adults. I also find that malaria exposure led to a higher probability of being below 150 percent of the poverty line. Although imprecise, my estimates also indicate that malaria exposure has an adverse impact on income levels.
Moreover, I estimate that in utero and postnatal exposure to malaria can explain approximately 25 percent of the difference in educational attainment between those born in high malaria states and those born in low malaria states during the early 20th century. The magnitude of this effect suggests that early-life exposure to adverse health conditions has significant economic ramifications over the life course.
Today, malaria is a serious public-health problem in many developing countries, most of which are in Africa. Although the disease is closely linked to poverty and poor economic growth, the specific mechanisms through which malaria impacts long-term economic outcomes have not been well established. Identifying these mechanisms is important for developing policies with lasting economic returns. Using the United States’ experience, my results indicate that protecting pregnant women and infants from malaria may be an effective tool for improving economic outcomes in malaria-afflicted countries.
Footnotes
Alan Barreca is an Assistant Professor at the Department of Economics, Tulane University. He is indebted to his dissertation chairs Hilary Hoynes and Douglas Miller for their help with this paper. He thanks Peter Lindert, Marianne Page, Ann Stevens, Elizabeth Cascio, Colin Cameron, Hoyt Bleakley, Alfredo Burlando, Trudy Marquardt, Janet Currie, Jane Loomis, the anonymous referees, as well as the participants at the UC Davis Brownbag series, the participants at the 2006 WEA Conference, and the 2007 SOLE Conference for providing valuable suggestions. This project was greatly aided by financial assistance from the Eugene Cota-Robles Fellowship and the Marjorie and Charles Elliott Fellowship awarded while the author was attending University of California, Davis. The data used in this article can be obtained beginning June 2011 through May 2014 from Alan Barreca; Tulane University; 206 Tilton Hall; New Orleans, LA 70118; abarreca{at}tulane.edu.
1. The first premise originated with David Barker’s “fetal-origins hypothesis” (Almond 2006). See Barker (2001) for a review of the epidemiological evidence in support of this hypothesis.
2. Almond (2006) finds large negative long-term consequences (in the form of lower income, lower educational attainment, and higher disability rates) from exposure to the 1918 influenza epidemic.
3. In particular, Meng and Qian (2006) and Chen and Zhou (2007) find that cohorts exposed to China’s Great Famine around the time of their birth were significantly shorter in adulthood.
4. Results from Black et al. (2007), Oreopoulos et al. (2006), and Royer (2009) suggest that better in utero health conditions positively affect long-term health, educational attainment, and labor-market outcomes.
5. In general, contracting malaria, at any age, is harmful because the parasite destroys red blood cells when it reproduces in a human host’s bloodstream, thereby depriving the body’s tissues of oxygen and nutrients. For those exposed during the in utero and postnatal periods, the consequences of exposure to this disease are more acute because lowered immunity enables the malaria parasite to destroy relatively more red blood cells. An extensive epidemiological literature has documented the relationship between early-life exposure to the falciparum strain of malaria and short-term outcomes. See Holding and Snow (2001) for a review of this literature. Less is understood about the consequences from exposure to the vivax strain (WHO 2007), which was the more prevalent strain in the United States during the early 20th century.
6. Approximately 200 to 400 cases were associated with each death (Humphreys 2001).
7. Countries with more of the vivax strain of malaria today share more similarities with the United States (circa early 20th century).
8. Bleakley (2003) shows that malaria’s decline in the United States led to moderate improvements in both long-term educational attainment and long-term earnings. Lucas (2005) examines the effects of malaria eradication campaigns in Paraguay, Sri Lanka, and Trinidad in the mid-20th century and finds that children born subsequent to malaria’s decline made moderate gains in schooling. Bleakley (2006) illustrates that malaria eradication efforts in the United States, Brazil, Colombia, and Mexico led to long-term increases in income and literacy.
9. Similar to my work, Hong (2007) uses weather to predict the incidence of malaria. However, my work has the added advantage of using high-frequency (daily) temperature data, which allows me to identify the nonlinear temperature-malaria relationship more precisely. Although both Hong (2007) and Burlando (2007) share similarities with my research design, the work here was done in parallel to these other studies and the ideas were developed independently. Recently, Case and Paxson (2009) examined the correlations between malaria death rates and long-term outcomes in the United States.
10. Also, the rate at which mosquito larvae develop into adults is an increasing function of temperature between 16°C and 34°C (Bayoh and Lindsay 2003).
11. For example, sporogony lasts approximately 210 days at 15°C, 14 days at 22°C, and eight days at 30°C for the vivax strain of malaria. Sporogony is possible between 15°C and 40°C for the vivax strain, and between 16°C and 40°C for the falciparum strain (Martens et al. 1995).
12. For example, see Martens et al. (1995) or Craig et al. (1999).
13. The results from these specification checks are available upon request. However, I do include a temperature-precipitation interaction term in one specification below.
14. Troesken (2004) notes that malaria deaths were sometimes misclassified as typhoid fever deaths, and vice versa, because some of the symptoms, like a high fever, are similar.
15. Note that if the fatality rate from contracting malaria varies across states or over time, there may be nonclassical measurement error.
16. These 15 states are, in order from the highest to the 15th highest malaria death rate: (1) Mississippi, (2) Florida, (3) Arkansas, (4) Louisiana, (5) South Carolina, (6) Georgia, (7) Texas, (8) Alabama, (9) North Carolina, (10) Tennessee, (11) Oklahoma, (12) Missouri, (13) Kentucky, (14) Virginia, and (15) Illinois. Also, my estimates are qualitatively similar when I redefine the set of high malaria states to include only the 12 states with the highest malaria death rates.
17. My results are similar when the sample is limited to just the 15 high malaria states.
18. I include the average of squared daily precipitation since increases in precipitation may have diminishing returns at the day level. For example, excessive rainfall may destroy mosquito breeding grounds (Craig et al. 1999). In an alternative specification (not reported) I control for the square of the average daily precipitation; the results are qualitatively similar.
19. My identification strategy implicitly assumes that (within-state) variation in the predicted malaria death rate, which is a function of variation in malaria-ideal temperatures, is a good proxy for the incidence of malaria infections.
20. Although the point estimates are similar when I include year fixed effects in this model, the standard errors are nearly double in size.
21. I also have estimated a model with 5°C temperature bins at round values (for example, between 20°C and 25°C). The results are qualitatively similar. Furthermore, I rely on temperatures between 22°C and 28°C, as opposed to a 1°C or 2°C wider temperature range such as between 20°C and 30°C, because of model fit. That is, models with temperatures between 22°C and 28°C have a higher R-squared than models with a somewhat narrower or a somewhat wider temperature window. (These results are available upon request.)
22. For example, malaria exposure may cause loss of parental income, which could indirectly affect birth conditions.
23. For example, assuming contracting malaria diminishes human capital, the return to skill should increase within those cohorts exposed to malaria. As such, high-skill workers within treated cohorts would see an increase in earning potential, while low-skill workers would see a decrease in earning potential; the net effect is uncertain, prima facie. Furthermore, these general equilibrium effects may also spill over into nontreated cohorts.
24. This interpretation issue is not necessarily unique to my research design. See Almond (2006), for example.
25. To disentangle the separate effects of in utero and postnatal exposure, I also have estimated my preferred specification on each of the four different quarters of birth separately. In short, I find that the first and fourth quarter births are the most affected by malaria exposure; this suggests that both in utero exposure and postnatal exposure have significant long-term effects. However, my quarter-of-birth estimates are generally imprecise and I cannot rule out equal effects of exposure across each of the four quarters of birth in most cases. (These results are available upon request.)
26. According to Humphreys (2001), numerous factors led to the demise of malaria; among other things, she cites urbanization, New Deal policies, the Tennessee Valley Authority and its economic development efforts, and a general reduction in poverty as the major determinants of malaria’s eradication.
27. I use an unbalanced panel of weather stations to make use of all the available data.
28. Mean temperature is determined by taking the average of the maximum and minimum temperature for that day.
29. Inverse-distance weights are constructed following Hanigan et al. (2006) using county-population estimates from the 1920 Census, geographic information from Sechrist (2000), and STATA’s “sphdist” program (Rising 2000).
30. I exclude races other than white or black.
31. In order to include all observations, I first set income to zero if it is reported as negative; I then take the log of income plus one and the log of wages plus one. Following Almond (2006), I rely on the probability of being below 150 percent of the poverty line as a good indication of “poverty” status. Fraction female is an important indicator of in utero health since females are more likely to survive adverse in utero health conditions (Kraemer 2000).
32. The NCHS does not report malaria deaths prior to 1918 for any of these four states.
33. The downward trend in malaria death rates is correlated with general improvements in public health conditions (for example, better understanding about how to prevent malaria transmission) and socioeconomic conditions (for example, urbanization) as well as a concerted effort to eradicate malaria in the South (Humphreys 2001). This trend also underscores the importance of including state-specific time trends in order to control for any spurious time-series correlations.
34. Also, there is no statistically meaningful relationship between within-state changes in per capita income and within-state changes in the malaria death rate between 1929 and 1936. (I use state-year per capita income estimates from the Bureau of Economic Analysis, which does not have per capita income estimates for years prior to 1929.) These results are not reported.
35. The coefficient estimates on precipitation and precipitation squared are not statistically significant at conventional levels, nor are they large in magnitude.
36. My temperature-precipitation variable is constructed by first interacting, at the weather station-day level, a 30-day moving average of an indicator variable that takes a value of one if the daily mean temperature is between 22°C and 28°C with a 30-day moving average of precipitation. Then, I aggregate the temperature-precipitation interactions to the state-year level using the weighting method described in the Data section.
37. The second-stage results are nearly identical when I include the main effect (TEMP2228). Although I include precipitation and precipitation squared in my preferred specification, my second-stage results are qualitatively similar when these variables are excluded. These results are available upon request.
38. Note that the expected effect of malaria exposure on years of schooling is ambiguous prima facie. As noted by an anonymous referee, improved health raises both the returns to schooling (for example, increases in work-life expectancy) and the opportunity cost of schooling (for example, healthier individuals make better laborers).
39. For example, the estimated effect of malaria exposure on income can be bounded between a 40 percent decrease and a 15 percent increase. Bleakley (2003) found that individuals exposed to malaria during childhood had approximately 15 percent less earnings. Given the imprecision of my wage and income estimates, I cannot discern to what extent in utero and postnatal exposure can account for Bleakley’s findings.
40. There is also a statistically significant effect on being below 100 percent of the poverty line.
41. The estimated effect of malaria exposure on cohort size can be bounded between an 8 percent reduction and a 12 percent increase.
42. The fact that I estimate a local-average-treatment effect limits the validity of this thought experiment, however.
43. 0.04 divided by 400 times 100,000 equals 10. Since the magnitude of this effect is larger than a “typical” amount of schooling, this thought experiment suggests that there may be peer effects or general equilibrium effects from being born in a cohort that was exposed to a significant amount of malaria.
44. However, for years-of-schooling and for the probability of attaining at least 12 years of education, the difference is not statistically significant at conventional levels using a Hausman test that adjusts for clustered standard errors.
45. See Deschênes and Moretti (2007), Deschênes and Greenstone (2007b), and Barreca (2008).
46. The “growing season” for most crops, according to DG, is between April 1 and September 30. See DG for a more detailed discussion.
47. According to the 1940 Census of Agriculture (Minnesota Population Center 2004), cotton constituted 25 percent of the total crop value in the South. In other regions of the United States, cotton output was close to nil.
48. I linearly interpolate the fraction of the day spent in each 1°C bin based on the minimum and maximum temperature of the day. The “growing season” for cotton, according to SR, is between April 1 and October 31. See SR for a more detailed discussion.
49. I use the 1980 Census since it is the last census in which an individual’s year of birth can be derived (that is, the quarter-of-birth variable is not reported in the 1990 or the 2000 Censuses).
50. There is evidence to suggest that malaria-ideal temperatures have a small “culling” effect in low malaria states.
51. This thought experiment is complicated by the fact that malaria-ideal temperatures affect the probability of malaria transmission in subsequent years. Malaria-ideal temperatures affect this year’s mosquito population, which will in turn affect next year’s mosquito population and the propensity for malaria to be transmitted. In addition, because the vivax strain can survive in humans over the winter, more people with malaria in year t may increase the number of malaria hosts in year t+1 (Humphreys 2001). I also estimated an analogous IV model using malaria death rates and malaria-ideal temperatures in years t, t+1, and t+2. The results were very imprecisely estimated.
52. This result is also consistent with Meng and Qian (2006), who show that the effects of early-life malnutrition are most pronounced for those who were exposed to China’s Great Famine around the time of birth. Also, the results are qualitatively similar when I include five years of malaria-ideal temperatures as regressors.
53. However, the estimates for the 1900–30 sample are imprecise. I also have examined whether the effects of malaria-ideal temperatures vary over time. Given that malaria incidence was declining during the early 20th century (see Figure 2), the negative correlation between malaria-ideal temperatures and long-term outcomes may be smaller for younger cohorts in the 1960 Census. However, this test does not offer any meaningful insight because the parameters are imprecisely estimated. (These results are available upon request.)
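Footnote 29 describes turning weather-station readings into state-year measures with inverse-distance weights and 1920 county populations. The sketch below shows one plausible version of that two-step weighting; it is an assumption, not a reproduction of Hanigan et al. (2006) or of STATA's "sphdist": it uses the haversine formula for distance, a hypothetical distance exponent `power` (default 1), no distance cutoff, and hypothetical function names.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def county_value(county_latlon, stations, power=1.0):
    """Inverse-distance-weighted average of station readings at a county centroid.

    stations: list of (lat, lon, value). `power` is an assumed distance exponent.
    """
    num = den = 0.0
    for lat, lon, value in stations:
        d = great_circle_km(*county_latlon, lat, lon)
        if d == 0.0:
            return value  # a station at the centroid gets all the weight
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def state_value(counties, stations, power=1.0):
    """Population-weighted average of county values.

    counties: list of (lat, lon, population), e.g. 1920 Census counts.
    """
    total_pop = sum(pop for _, _, pop in counties)
    weighted = sum(pop * county_value((lat, lon), stations, power)
                   for lat, lon, pop in counties)
    return weighted / total_pop


stations = [(0.0, 1.0, 10.0), (0.0, -1.0, 20.0)]
print(round(great_circle_km(0.0, 0.0, 0.0, 1.0), 1))   # 111.2
print(round(county_value((0.0, 0.0), stations), 6))    # 15.0
```

A station that sits exactly on a county centroid receives all of that county's weight, which avoids dividing by a zero distance.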
- Received September 2007.
- Accepted July 2009.