Abstract
This paper tests the hypothesis that education improves health and increases life expectancy. The analysis of smoking histories shows that after 1950, when information about the dangers of tobacco started to diffuse, the prevalence of smoking declined earlier and more dramatically for college graduates. I construct panels based on smoking histories in an attempt to isolate the causal effect of education on smoking from the influence of time-invariant unobservable characteristics. The results suggest that, at least among women, college education has a negative effect on smoking prevalence and that more educated individuals responded faster to the diffusion of information on the dangers of smoking.
I. Introduction
The strong correlation between education and health outcomes, even after controlling for income, has been recognized as a robust empirical observation in the social sciences and economic literature (Cutler and Lleras-Muney 2006; Deaton and Paxson 2003; Fuchs 1982; Grossman 2006; Lleras-Muney 2005). The decision to smoke or not to smoke is a conscious choice that directly affects the health status and ultimately the mortality of individuals. Smoking behaviors therefore provide an interesting opportunity to investigate how education, by influencing behaviors, affects health outcomes.
Smoking habits were not initially perceived as dangerous. The information revealing the health-damaging consequences of cigarette smoking emerged gradually between 1950 and 1970. Variations in smoking prevalence across education groups during this period might inform us about the way individuals reacted to that information and how education helps in accessing and processing this information.
This paper, using smoking behavior as an example, examines whether the effect of education in improving health outcomes can be considered causal. Smoking is the leading cause of premature adult mortality: each year in the United States, tobacco use causes more than 438,000 deaths.1
The issue of causality in the relationship between education and health is important in the health economics literature. Theoretical explanations for this correlation can be classified into three broad categories. One explanation stresses that education is an investment. Education delivers a higher income and a higher level of future consumption, and thereby raises the value of staying alive (Becker 1993; Becker 2007). More educated individuals are healthier because their investment in the future gives them the right incentives to protect their health. Another explanation, based on education entering as a factor in the health production function (Grossman 1972), emphasizes that education improves access to health-related information and the processing of that information to make health-related decisions. A third view (Farrell and Fuchs 1982; Fuchs 1982) claims that the observed correlation between health and education is mainly due to unobservables, such as the discount factor or ability, that cause the same individuals both to study longer and to take greater care of their health. This study will attempt to distinguish the first two explanations from the third one, but will not attempt to separate the first two.
I investigate whether the effect of education in reducing smoking is causal in three steps. I first analyze smoking behaviors across education groups in the United States from 1940 to 2000. Despite the lack of surveys linking education and smoking before 1966, I obtain this information by using retrospective smoking histories constructed from the smoking supplements of National Health Interview Surveys conducted between 1978 and 2000 (U.S. Department of Health and Human Services, 1978–2000). Such a reconstruction had never been done systematically and is a useful contribution to the knowledge of historical patterns in smoking behaviors. The conclusion from the analysis is that smoking prevalence among more educated individuals, college graduates in particular, declined earlier and more dramatically than in any other education category. The decline for college graduates started in 1954, four years after the medical consensus on the health consequences of smoking and ten years before the publication of the first Surgeon General’s Report on this issue. This suggests that they had easier access to the information and/or were better able to process that information.
Farrell and Fuchs (1982), however, challenge this conclusion taken from cross-sectional analysis. They show that the strong negative relation between schooling and smoking observed at age 24 is accounted for by differences in smoking prevalence at age 17, when schooling is still very similar across individuals. Using the smoking histories constructed for this paper, my second step is to reproduce results similar to those of Farrell and Fuchs (1982). From their analysis, it would therefore appear that a causal link between education and smoking cannot be established and that a “third unobservable variable” should be preferred as the hypothesis to explain the correlation between smoking and education.
But, by limiting their analysis to the 17 to 24 age range, Farrell and Fuchs (1982) omit most of people’s smoking histories, and, in particular, they almost entirely ignore smoking cessation.
This paper uses the information recovered from the smoking histories to construct a series of panels and analyze smoking behavior in a way that accounts for time-invariant unobservable characteristics. Given a date of smoking initiation and of smoking cessation, the smoking status of an individual can be reconstructed for each age. Assuming that college graduation takes place between ages 17 and 25 allows one to analyze smoking behavior for the same individual before and after college.
Comparing the results from an individual-level panel data analysis using fixed effects with cross-sectional estimates is instructive, as the panel data analysis makes it possible to remove the influence of unobserved individual time-invariant characteristics like time preference or ability. In the cross-sectional analysis, for males, the negative gradient between education and smoking is present for all birth cohorts (starting with those born between 1910 and 1919) but its magnitude increases with age and with later birth cohorts, while the negative association between schooling and smoking among females begins to appear around the time that evidence of a causal link between smoking and lung cancer became widely known. In the fixed-effects panel regressions, the negative gradient between smoking and college takes longer to appear. For females, the negative association is still roughly coincident with widespread knowledge of the smoking-cancer link. But this is not true for males: it is not until the male birth cohorts of 1950–59 that the negative association appears.
The paper is structured as follows. Section II presents descriptive results from the analysis of the smoking prevalence in the United States from 1940 to 2000. Section III focuses on a multivariate, cross-sectional, analysis of smoking behaviors, including a replication of results similar to Farrell and Fuchs (1982). Section IV presents the results of the panel regressions. Section V concludes.
II. Evidence from Smoking Histories in the United States, 1940–2000
It is well known by social scientists that more educated people are less likely to smoke (Wald 1988). The literature about the negative correlation between education and tobacco use has become a key element in the debate about whether education is causal in improving health.2
Smoking habits prior to the arrival of information about the health dangers of smoking have been less studied because of a lack of data (Ferrence 1989; Gilpin and Pierce 2002; Harris 1983; U.S. Department of Health and Human Services 1980; Meara 2001). Before 1950, even though the medical literature mentioned tobacco smoking as a possible factor in causing mouth and lip cancer, there was no consensus in the medical profession. This consensus was achieved in 1950, with the publication of five studies, four in the United States and one in the United Kingdom, showing the link between smoking and lung cancer (Peto et al. 2000). These medical findings were echoed in the popular press in 1953 in Reader’s Digest and Consumer Reports (U.S. Department of Health and Human Services 1989; Cutler and Kadiyala 2003; Viscusi 1992). In 1964, the publication of the first Surgeon General’s Report (U.S. Department of Health and Human Services 1964) gave the endorsement of the government and disseminated the information to a larger public. The release of this report has been ranked as one of the most important news events of 1964 (U.S. Center for Disease Control and Prevention 2004). The report marks the beginning of awareness campaigns and prevention efforts. In 1966, a first health warning was made mandatory on cigarette packages. The wording was rather timid: “Caution: Cigarette Smoking May be Hazardous to Your Health.” Long debates in Washington (National Conference on Smoking and Health 1970) brought in stronger language in 1970: “Warning: The Surgeon General Has Determined that Cigarette Smoking Is Dangerous to Your Health.” Given the overwhelming majority of literate smokers in 1970, the health-damaging consequences of smoking can be considered common knowledge by then.
The period 1950–70 can thus be interpreted as a period of gradual diffusion of that information and it is reasonable to expect the more educated individuals to have had access to that information earlier in that period. How did this differential access to information impact smoking behaviors? This part of the paper will address this question.
A. Data
One problem in studying smoking behaviors before and after the health-damaging consequences of smoking became widely known to the public is the lack of data linking smoking prevalence and education category before 1966. The first smoking supplement included in the National Health Interview Survey dates from 1965,3 and in 1966 education categories were included (U.S. Department of Health and Human Services 2004).4 Sander (1995a; 1995b) reports smoking behaviors by schooling levels after 1965. Meara (2001) uses the National Health Interview Survey to look at differential trends in smoking behavior of women by education level after 1966 and she uses the answer to the question of whether a woman had ever smoked in 1966 as a proxy measure of pre-1964 smoking behavior.
This paper uses 16 smoking supplements from different years of the National Health Interview Survey between 1978 and 2000 to construct detailed smoking histories going back before 1950. A similar method, based on cohort reconstruction, has been used in the sociological and medical literature with U.S. and Canadian data (Ferrence 1989; Gilpin and Pierce 2002; Harris 1983; Pierce et al. 1989; U.S. Department of Health and Human Services 1980).
Since 1978 the smoking questionnaire in the National Health Interview Survey includes questions about the age at which individuals started smoking cigarettes, if they ever started, and about how long ago they stopped, if they did so. Using the year at which individuals started smoking and the year at which they stopped, I constructed 373,738 smoking histories of adults aged 25 and older at the time of the interview.5
To make sure that the education variable is not time sensitive, and given that most people will have reached their definitive education level by age 25, I have sampled only individuals aged 25 and older.
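The reconstruction described above can be illustrated with a minimal Python sketch. It assumes a single smoking spell per respondent (one start age and at most one quit date) and uses hypothetical field names (`survey_year`, `age_at_survey`, `start_age`, `years_since_quit`) that are not the actual National Health Interview Survey variable names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Respondent:
    """One survey respondent; all field names are hypothetical."""
    survey_year: int
    age_at_survey: int
    start_age: Optional[int]         # None if the respondent never smoked
    years_since_quit: Optional[int]  # None if never smoked or still smoking


def smoked_at_age(r: Respondent, age: int) -> bool:
    """Return True if the respondent is coded as a smoker at the given age."""
    if r.start_age is None or age < r.start_age:
        return False
    if r.years_since_quit is None:
        return True  # started and had not quit by the survey date
    quit_age = r.age_at_survey - r.years_since_quit
    return age < quit_age


def smoked_in_year(r: Respondent, year: int) -> Optional[bool]:
    """Smoking status in a calendar year, or None outside the sampled range."""
    age = r.age_at_survey - (r.survey_year - year)
    if age < 25 or year > r.survey_year:
        return None  # prevalence is computed for ages 25 and older only
    return smoked_at_age(r, age)


# Hypothetical respondent: interviewed in 1978 at age 60,
# started smoking at 18 and quit 10 years before the interview.
example = Respondent(1978, 60, 18, 10)
status_1950 = smoked_in_year(example, 1950)  # aged 32 in 1950
```

Stacking these yearly indicators across respondents gives the retrospective series from which prevalence can be computed for any year.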
B. Smoking Prevalence by Education Category, 1940–2000
Figure 1 shows how the general smoking prevalence evolved in the United States from 1940 to 2000. The years 1950 (consensus in medical journals), 1964 (first Surgeon General’s Report), and 1970 (clear health warning on the packages) are benchmarks in the gradual diffusion of the information about the health consequences of tobacco. Smoking prevalence in the population aged 25 and older was 37.7 percent in 1940 and reached a peak of 45.9 percent in 1957, before the publication of the Surgeon General’s Report. After that, it steadily decreased and reached 24.6 percent in 2000. Figure 2 breaks down the data used in Figure 1 into four educational groups: individuals with less than a high school degree, high school graduates, individuals with some undergraduate education, and finally college graduates together with individuals having studied at the graduate level. Each point on Figure 2 is estimated from at least 1,000 observations. In 1940, individuals with less than a high school degree are the least likely to smoke (35.8 percent). The smoking prevalence for the three other education categories is close to 40 percent (39.4, 40.8, and 40.4 respectively). By 2000 there is a clear negative gradient between educational achievement and smoking prevalence. The prevalence of smoking is 29.6 percent among individuals with less than a high school degree, 28.4 percent among high school graduates, and 25.6 percent among people with some college education, but only 14.2 percent among people with at least a college degree. The most striking feature of Figure 2 is that smoking prevalence among the college graduates declined earlier and more dramatically. This declining trend starts in 1954, ten years before the first Surgeon General’s Report and only one year after the first articles in the general press.
Differences in smoking prevalence among the three lower education categories are less marked, but a careful examination of Figure 2 reveals that the ranking between these three categories has been inverted between 1940 and 2000. Figure 2 clearly supports the idea that more educated people reacted more quickly and more strongly to the information about the health-damaging consequences of smoking. Figures 3 and 4 break the data down further by gender. They reveal similar trends, although among males, college graduates were already less likely to smoke in 1940. The prevalence of smoking declined earlier (late 1950s) for men than for women (mid-1960s), but men started from a much higher level. In the early years, smoking prevalence was indeed much higher among men (above 55 percent, and above 60 percent for the lowest three education categories) than among women (for whom the prevalence never went much above 40 percent).6 That gender gap is much smaller in 2000 and has all but disappeared for the college graduates. The relatively low (around 22 percent) prevalence for women with less than a high school degree is what drives down the general prevalence for that category in 1940.
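As an illustration of how the series plotted in Figures 1 to 4 could be computed from person-year records, the sketch below takes the unweighted share of smokers in each year and education cell. The unweighted mean and the `min_obs` parameter are assumptions made for illustration; the text states only that each point in Figure 2 rests on at least 1,000 observations, and it does not say whether survey weights were applied.

```python
from collections import defaultdict


def prevalence_by_group(records, min_obs=1):
    """Share of smokers per (year, education group) cell.

    records: iterable of (year, education_group, smokes) tuples.
    Cells with fewer than min_obs observations are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [smokers, total]
    for year, group, smokes in records:
        cell = counts[(year, group)]
        cell[0] += int(smokes)
        cell[1] += 1
    return {
        key: smokers / total
        for key, (smokers, total) in counts.items()
        if total >= min_obs
    }


# Hypothetical toy records, not taken from the surveys.
toy = [
    (1940, "college", True),
    (1940, "college", False),
    (1940, "high school", True),
]
shares = prevalence_by_group(toy)
```

Setting `min_obs=1000` would mimic the minimum cell size reported for Figure 2.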
Figure 1. Prevalence of Smoking in the United States, Age 25 and Older, 1940–2000
Note: From smoking histories constructed from the 1978, 1979, 1980, 1983, 1985, 1987, 1988, 1990, 1991, 1992, 1994, 1995, 1997, 1998, 1999, and 2000 National Health Interview Surveys. The information about the dangers of smoking diffused gradually: 1950, consensus in medical journals, 1964, first Surgeon General’s Report, 1970, clear health warning on packages.
Figure 2. Prevalence of Smoking by Education Category in the United States, Age 25 and Older, 1940–2000
Note: From smoking histories constructed from the 1978, 1979, 1980, 1983, 1985, 1987, 1988, 1990, 1991, 1992, 1994, 1995, 1997, 1998, 1999, and 2000 National Health Interview Surveys. The information about the dangers of smoking diffused gradually: 1950, consensus in medical journals, 1964, first Surgeon General’s Report, 1970, clear health warning on packages.
Figure 3. Prevalence of Smoking by Education Category in the United States, Males, Age 25 and Older, 1940–2000
Note: From smoking histories constructed from the 1978, 1979, 1980, 1983, 1985, 1987, 1988, 1990, 1991, 1992, 1994, 1995, 1997, 1998, 1999, and 2000 National Health Interview Surveys. The information about the dangers of smoking diffused gradually: 1950, consensus in medical journals, 1964, first Surgeon General’s Report, 1970, clear health warning on packages.
Figure 4. Prevalence of Smoking by Education Category in the United States, Females, Age 25 and Older, 1940–2000
Note: From smoking histories constructed from the 1978, 1979, 1980, 1983, 1985, 1987, 1988, 1990, 1991, 1992, 1994, 1995, 1997, 1998, 1999, and 2000 National Health Interview Surveys. The information about the dangers of smoking diffused gradually: 1950, consensus in medical journals, 1964, first Surgeon General’s Report, 1970, clear health warning on packages.
It is important to keep in mind that the general level of education in the population increased steadily between 1940 and 2000. Being a college graduate in 1940 was more exceptional than it was in 2000. Figure 5 addresses this issue by plotting the smoking prevalence among individuals above or under the average education level in each year for individuals aged 25 in that particular year.7 Initially, individuals above the average education level were more likely to smoke. The positive difference in prevalence between the two groups gradually decreased, and the situation has now been inverted: individuals under the average schooling level in the population are much more likely to smoke.
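One possible reading of this relative classification is sketched below: for each calendar year, the benchmark is the mean schooling of the people who turn 25 in that year, and every individual is compared with that year's benchmark. Measuring education in years of schooling is an assumption made for illustration (the surveys may record categories instead), and all numbers are hypothetical.

```python
def benchmark_by_year(people):
    """Mean schooling of the individuals who are aged 25 in each year.

    people: iterable of (birth_year, years_of_schooling) tuples.
    """
    totals = {}
    for birth_year, schooling in people:
        year = birth_year + 25  # calendar year in which the person is 25
        running_sum, n = totals.get(year, (0.0, 0))
        totals[year] = (running_sum + schooling, n + 1)
    return {year: s / n for year, (s, n) in totals.items()}


def above_average(schooling, year, benchmark):
    """True if schooling exceeds the benchmark for the given year."""
    return schooling > benchmark[year]


# Hypothetical cohorts: two people born in 1915, two born in 1975.
people = [(1915, 8), (1915, 12), (1975, 12), (1975, 16)]
benchmark = benchmark_by_year(people)
```

In this toy example, 12 years of schooling is above the benchmark in 1940 but below it in 2000, which is the kind of shift in relative position that Figure 5 is meant to absorb.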
Figure 5. Prevalence of Smoking by Relative Educational Level in the United States, Age 25 and Older, 1940–2000
Note: From smoking histories constructed from the 1978, 1979, 1980, 1983, 1985, 1987, 1988, 1990, 1991, 1992, 1994, 1995, 1997, 1998, 1999, and 2000 National Health Interview Surveys. The information about the dangers of smoking diffused gradually: 1950, consensus in medical journals, 1964, first Surgeon General’s Report, 1970, clear health warning on packages. Each individual is classified as above or under the average educational achievement in each year for individuals who were aged 25 in that particular year.
C. Sample Composition Issues
The retrospective smoking histories allow a complete overview of smoking prevalence patterns by education levels from 1940.8 However, this method presents two problems in the composition of the sample. First, since the information about smoking prevalence between 1940 and 1977 is gathered from surveys taken between 1978 and 2000, only individuals who survived up to the year of the survey are interviewed. Since smokers usually die sooner than nonsmokers, this creates a “survivor” bias: nonsmokers are disproportionately represented among persons interviewed at older ages. For example, someone who was 40 years old in 1945 would be 73 in 1978. To make it into the sample, that person would have had to survive through the ages at which smokers are least likely to survive, and is therefore less likely to have been a smoker. This problem can be mitigated by selecting respondents younger than 60 years of age at the time of their interview in the survey. Even though there is already some excess mortality due to smoking before age 60, most of the premature deaths from smoking-related diseases occur between age 60 and 75 (Peto et al. 2000). Figure 6 plots the smoking prevalence from 1945 to 2000 by education categories for respondents aged 60 or less at the time of the interview (for example, the person who is 40 in 1945 and 73 in 1978 would be dropped from the sample). The historical patterns of smoking prevalence by education levels are similar to the ones described previously.9
Figure 6. Prevalence of Smoking by Education Category in the United States, Age 25 to 60 at the Time of the Survey, 1945–2000
Note: From smoking histories constructed from the 1978, 1979, 1980, 1983, 1985, 1987, 1988, 1990, 1991, 1992, 1994, 1995, 1997, 1998, 1999, and 2000 National Health Interview Surveys. Only individuals aged younger than 60 at the time of the interview were selected. The information about the dangers of smoking diffused gradually: 1950, consensus in medical journals, 1964, first Surgeon General’s Report, 1970, clear health warning on packages.
Second, the fact that data for the period 1940–77 are taken from interviews after 1977 implies that the age distribution for the years before 1978 is not complete. The data for the year 1940, for example, are drawn from individuals who were 25 years of age or older in 1940 but young enough to have survived until at least 1978, 38 years later. As one gets closer to 1978, the age distribution in the sample becomes closer to the age distribution in the general population. To address this problem and maintain a constant age distribution, Figure 7 plots the smoking prevalence by education for individuals between the age of 25 and 30 in each year. Allowing for the fact that Figure 7 is somewhat less precisely estimated, given the reduced sample size implied by the selection of the 25–30 age group, Figures 2, 6, and 7 are very similar.10
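The two sample restrictions discussed in this subsection can be expressed as simple filters on the age at interview and on the implied age in each calendar year. The sketch below is illustrative only: the function names are hypothetical, and the inclusive cutoff at age 60 is an assumption, since the text refers both to respondents younger than 60 and to respondents aged 60 or less.

```python
def age_in_year(survey_year: int, age_at_survey: int, year: int) -> int:
    """Age of a respondent in an earlier calendar year."""
    return age_at_survey - (survey_year - year)


def in_survivor_sample(age_at_survey: int, max_age: int = 60) -> bool:
    """Restriction behind Figure 6: keep respondents interviewed by max_age."""
    return age_at_survey <= max_age


def in_age_window(survey_year: int, age_at_survey: int, year: int,
                  low: int = 25, high: int = 30) -> bool:
    """Restriction behind Figure 7: keep person-years with age in [low, high]."""
    return low <= age_in_year(survey_year, age_at_survey, year) <= high


# The example from the text: interviewed in 1978 at age 73, so aged 40 in 1945.
age_1945 = age_in_year(1978, 73, 1945)
kept = in_survivor_sample(73)
```

The first filter drops whole respondents, while the second drops individual person-years, so the two restrictions can be applied separately or together.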
Figure 7. Prevalence of Smoking by Education Category in the United States, Age 25–30, 1945–2000
Note: From smoking histories constructed from the 1978, 1979, 1980, 1983, 1985, 1987, 1988, 1990, 1991, 1992, 1994, 1995, 1997, 1998, 1999, and 2000 National Health Interview Surveys. The information about the dangers of smoking diffused gradually: 1950, consensus in medical journals, 1964, first Surgeon General’s Report, 1970, clear health warning on packages.
Even though the above-mentioned sample composition issues, which are inherent to the retrospective nature of the data, need to be taken into consideration and have some influence on the estimation of the smoking prevalence between 1940 and 1950, they do not affect the main conclusion already drawn from Figure 2: More educated individuals reacted earlier and more dramatically to the arrival of information about the health consequences of smoking. By controlling for the year of birth, the multivariate regressions of the type performed in the following sections also address some of those sample composition issues.
Another problem, however, with the way the data set has been constructed, is that it is only possible to use time-invariant variables like gender, race, and educational achievement (which I assume does not vary after age 25). Variables such as income or the number of cigarettes smoked per day, present in most of the surveys, are measured at the date of the survey but, since they can vary over time, they cannot be included in regressions that rely on retrospective data.11
The price of cigarettes is also an important element of the smoking decision, even if tobacco is an addictive good (Becker, Grossman, and Murphy 1994). The National Health Interview Surveys do not include information about prices but the evolution of the real, after-tax price of a pack of 20 cigarettes from 1955 to 2000 is reconstructed in Figure 8.12 Despite a slight increase in the early 1970s, it is mainly with the tax increases after 1980 and, even more strongly, with the Tobacco Settlement of the late 1990s that the real price to consumers increased substantially. Figure 2 indicates that most of the differential in smoking prevalence by educational level is already present before the early 1980s and has occurred when there was no large variation in the real price of cigarettes.
Figure 8. Real Price of One Pack of 20 Cigarettes, United States, 1955–2000, in 2002 Dollars
Note: The data for the 1955–90 period are taken from the “Tax Burden on Tobacco”, an historical compilation edited by the Tobacco Tax Council (Tobacco Tax Council, 1990). From 1993, the index on “Cigarette and other tobacco products” in the Consumer Price Index database of the Bureau of Labor Statistics is used (U.S. Department of Labor, 2001). The prices for 1991 and 1992 were obtained by interpolating between 1990 and 1993. I have used the inflation calculator of the Bureau of Labor Statistics to convert the prices to 2002 dollars.
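The two adjustments mentioned in this note can be sketched as follows. The note does not say which interpolation method was used, so linear interpolation is an assumption here, and the deflation step simply rescales a nominal price by the ratio of two price-index values. All numbers in the example are hypothetical and are not the actual cigarette prices or index values.

```python
def interpolate_linear(year0, value0, year1, value1, year):
    """Linearly interpolate a value for a year between two known years."""
    share = (year - year0) / (year1 - year0)
    return value0 + share * (value1 - value0)


def to_base_year_dollars(nominal_price, index_in_year, index_in_base_year):
    """Convert a nominal price to base-year dollars using a price index."""
    return nominal_price * index_in_base_year / index_in_year


# Hypothetical nominal prices for 1990 and 1993 (illustration only).
price_1991 = interpolate_linear(1990, 1.0, 1993, 1.6, 1991)
price_1992 = interpolate_linear(1990, 1.0, 1993, 1.6, 1992)

# Hypothetical index values: the general price level doubles by the base year.
real_price = to_base_year_dollars(1.0, 50.0, 100.0)
```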
Finally, it is worth emphasizing that this data set of smoking histories is based on self-reported smoking behaviors. This limitation of the data is difficult to avoid, especially in an historical context. It would potentially bias the results if more educated individuals were more or less likely to report truthfully their smoking practices. I am not aware of evidence that this would be the case.
III. Cross-Sectional Analysis of Smoking Behaviors
Farrell and Fuchs (1982) use the age pattern of smoking initiation to show that “the strong negative relation between schooling and smoking observed at age 24 is accounted for by differences in smoking behavior at age 17, when all subjects were all in the same grade.”13 Using the data set constructed for this paper, I replicated results similar to those obtained by Farrell and Fuchs (1982) for each ten-year birth cohort between 1910 and 1979.
The model is:
$$S_{a,i} = \alpha + \beta C_i + X_i'\delta + \varepsilon_{a,i} \tag{1}$$
Where:
$a$ is age, which ranges from 17 to 60 depending on the regression.
$S_{a,i}$ is a dummy for whether individual $i$ smokes at age $a$.
$C_i$ is a time-invariant dummy for whether or not the individual is a college graduate. At age 17, it represents future college status and, as in Farrell and Fuchs (1982), it takes the value 1 for future college graduates.
$X_i$ are other time-invariant covariates: gender, race, year of birth, and survey year.
$\alpha$, $\beta$, and $\delta$ are the coefficients to be estimated, and $\varepsilon_{a,i}$ is an error term.
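To make the cross-sectional specification concrete, the sketch below fits a linear probability model of the kind Equation (1) is assumed to describe: smoking status at a given age regressed on a constant, the college dummy, and one covariate. The pure-Python least-squares solver, the single covariate `born_late`, and the toy data are all illustrative assumptions; this is not the estimation code used for Tables 1 and 2.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical toy sample: (college, born_late, smokes_at_age_a).
sample = [
    (0, 0, 1), (0, 0, 1), (1, 0, 0), (1, 0, 1),
    (0, 1, 1), (0, 1, 0), (1, 1, 0), (1, 1, 0),
]
X = [[1.0, c, b] for c, b, _ in sample]  # constant, college, covariate
y = [s for _, _, s in sample]
alpha, beta, delta = ols(X, y)
```

In this toy sample the coefficient on the college dummy is -0.5, meaning graduates are 50 percentage points less likely to smoke at that age, holding the covariate fixed. Running the same regression separately for each age and birth cohort would fill one cell of a table laid out like Tables 1 and 2.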
Read horizontally, Tables 1 and 2 follow smoking behavior at a particular age across birth cohorts; read vertically, they follow the same birth cohort across different ages. I focus on college education for two reasons. College graduation is the crucial margin for smoking prevalence, as evidenced in Figure 2. It also will allow me to compare Tables 1 and 2 with Tables 3 and 4 where I will consider a series of panels following smoking behavior before and after college graduation.
Table 1. The effect of college education on smoking behavior at different ages. Males. Cross-sectional linear regressions.
Table 2. The effect of college education on smoking behavior at different ages. Females. Cross-sectional linear regressions.
Table 3. The effect of college education on smoking behavior at different ages. Males. Panel data with fixed effects, linear model.
Table 4. The effect of college education on smoking behavior at different ages. Females. Panel data with fixed effects, linear model.
The results are included in Table 1 for males and Table 2 for females. In the first two rows of Tables 1 and 2,14 the dependent variable is whether the individual was smoking at age 17 and at age 25. At age 17, the individual is assigned, as in Farrell and Fuchs (1982), his or her future college graduation status. For males in Table 1, at age 25, when educational achievements can be considered as definitive, there is a clear negative effect of college graduation on smoking prevalence. This effect increases for younger birth cohorts. But, at age 17, before college enrollment, when males are classified according to their future definitive schooling levels, the strong negative relationship between college and smoking is already present.15 Similarly, for females in Table 2, the negative gradient between education and smoking is also already present at age 17 (except for the cohort born between 1910 and 1919 for which the coefficient is insignificant).16 Farrell and Fuchs (1982) use the existence of a negative gradient between future college graduation and tobacco use at age 17 as grounds to reject causality from schooling to smoking, in favor of a “third variable” hypothesis. However, they do not analyze smoking behavior after age 24. Tables 1 and 2 present, by birth cohort, a cross-sectional analysis of smoking behavior at a large range of ages (17 to 60).
The remainder of the analysis now focuses on smoking behaviors between ages 25 and 60.17 In Table 1 for males, all coefficients on college education are negative and significant. The general trend is for them to increase in magnitude horizontally indicating that the younger the birth cohort, the stronger the negative effect of education on smoking for a certain age. Among cohorts born later in the 20th century who have been more exposed to information, the effect of college education seems to be stronger.
The analysis for females is in Table 2. The magnitude of the coefficients is smaller among females than males. For females born between 1910 and 1919 at ages 25 and 30 and those born between 1920 and 1929 at age 25, the coefficient on college status is positive and significant. Before the information on the dangers of smoking was available, more educated women were more likely to smoke.18 For those two birth cohorts, at older ages, the coefficient on college is not statistically different from zero. From age 55 for women born in 1910–19 and from age 45 for women born in 1920–29, this coefficient becomes negative and significant. Thus, for those two birth cohorts of females, there is a complete reversal of the gradient between education and smoking as individuals age and as the health-related information becomes available. For females born after 1929, all coefficients are negative and significant. For females, looking vertically for the same birth cohort and abstracting from smoking at age 17, the older the individuals, the stronger the negative effect of education on smoking tends to be.
The results of Tables 1 and 2 suggest that comparing smoking behaviors at ages 17 and 25 like Farrell and Fuchs (1982) misses the dynamics of smoking cessation in adult ages and that there are substantial differences in quitting behavior by education level.
In the cross-sectional analysis, the negative association between schooling and smoking among females begins to appear around the time that evidence of a causal link between smoking and lung cancer became widely known. This is consistent with the hypothesis that the more educated process information more effectively to produce better health. This pattern, however, does not hold for males, as college education is negatively associated with smoking behavior for all birth cohorts and at all ages.
IV. A Panel Analysis of Smoking and Education
The evidence presented in the previous sections documents the appearance, after 1950, of a negative gradient between educational achievement and smoking habits. However, since it is based on cross-sectional analyses, this evidence cannot be conclusive about the mechanisms through which education affects health decisions and does not allow distinguishing between three theories explaining the correlation between health and education: (1) education is causal in reducing smoking by entering as a factor in the health production function; (2) education increases the stream of future income and consumption, and ultimately the value of life, so that it provides incentives for individuals to protect this stream by reducing mortality risks; or (3) the correlation between education and health is due to unobservables like the discount factor or ability that causes the same individuals to both study longer and take care of their health.
The hypothesis of a causal link between tobacco use and education has been tested using instrumental variables approaches: Sander (1995a; 1995b) uses family background characteristics, Currie and Moretti (2003) use college openings, and De Walque (2003; 2007b) uses draft-avoiding behavior during the Vietnam War as instruments for college education. Those analyses find that when education is treated as endogenous, there remains a significant effect of schooling in reducing smoking habits. Kenkel (1991) and Meara (2001) use direct measures of health knowledge to separate the effect of information from the effect of unobservables.
This section suggests a novel approach. An individual-level panel data analysis using fixed effects makes it possible to remove the influence of unobserved individual time-invariant characteristics like time preference or ability. It is important, however, to acknowledge the limitations of this fixed effects strategy. The underlying assumption of strict exogeneity is strong as it implies that the error term for each period is independent of the fixed effect and all the regressors in every period. That assumption is unlikely to hold. For example, the discount factor, as suggested by Becker and Mulligan (2007), could evolve over time. Similarly, while one can plausibly argue that innate ability is fixed and time-invariant for an individual over time, the effect of ability on the probability of smoking is probably not invariant over time (and exogenous to schooling): Throughout school and their working lives, individuals learn more about their ability. This updating of information, constraints and opportunities presumably affects choices about smoking.
The National Health Interview Surveys do not constitute a panel, but merely repeated cross sections. However, by reconstructing smoking histories from information about the dates of smoking initiation and smoking cessation,19 it is possible to trace individual smoking behavior back over time, for every age or year. In this section, as in Tables 1 and 2, I start by considering age 17 and then ages 25 to 60, at five-year intervals. Variations in educational achievement can also be followed over time: although the actual date of college graduation is not reported in those surveys, it seems safe to assume that it occurs after age 17 and before age 25. This procedure allows reconstructing retrospectively a series of two-period panels based on smoking histories, with individual variation in smoking status, according to smoking initiation and smoking cessation, and individual variation in education, based on the age of college graduation. The structure of Tables 3 and 4 is similar to that of Tables 1 and 2,20 but the model in each cell is a two-period panel regression with individual fixed effects. The two periods are age 17 on the one hand and one of the ages from 25 to 60 on the other. In each regression there are two observations per individual.
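The reconstruction described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the paper: the function names, the record layout (age at initiation, age at definitive cessation, college degree reported at interview), and the convention that a person still counts as a smoker at the age of quitting are all assumptions made for the example.

```python
from typing import Optional


def smokes_at(age: int, start_age: Optional[int], quit_age: Optional[int]) -> int:
    """Return 1 if the respondent smoked at `age`, else 0.

    start_age is None for never-smokers; quit_age is None for people
    still smoking at interview. Both names are hypothetical.
    """
    if start_age is None or age < start_age:
        return 0  # never smoked, or had not started yet
    if quit_age is not None and age > quit_age:
        return 0  # had definitively stopped before this age
    return 1


def two_period_panel(start_age: Optional[int], quit_age: Optional[int],
                     college_grad: bool, later_age: int) -> list:
    """Build the two observations for one individual: age 17 and a later age."""
    if not 25 <= later_age <= 60:
        raise ValueError("later_age is expected to lie between 25 and 60")
    return [
        # At age 17 nobody is treated as a college graduate.
        {"age": 17, "smokes": smokes_at(17, start_age, quit_age), "college": 0},
        # From age 25 on, a reported degree is treated as already obtained.
        {"age": later_age, "smokes": smokes_at(later_age, start_age, quit_age),
         "college": int(college_grad)},
    ]


# Hypothetical respondent: started at 19, quit at 38, reports a college degree.
panel_30 = two_period_panel(19, 38, True, later_age=30)
panel_40 = two_period_panel(19, 38, True, later_age=40)
```

In this hypothetical case the respondent is a nonsmoker without a degree at age 17, a smoker with a degree at age 30, and a nonsmoker with a degree at age 40, so the same history yields a different two-period panel for each later age.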
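The model is:
$$S_{t,i} = \beta C_{t,i} + \eta_i + \gamma_t + \varepsilon_{t,i} \qquad (2)$$
Where:
$t = 17, a$, with $a$ = age going from 25 to 60 according to the regression.
$S_{t,i}$ is a dummy for whether the individual smokes at age $t$.
$C_{t,i}$ is a dummy for whether the individual is a college graduate at time $t$.
$\eta_i$ is a time-invariant individual-specific fixed effect.
$\gamma_t$ is an age effect.
$\beta$ is the coefficient on the college indicator and $\varepsilon_{t,i}$ is the error term.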
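With only two periods, and with nobody a college graduate at age 17, the fixed effects estimator of the linear model reduces to a comparison of changes in smoking status. Writing β for the coefficient on the college dummy (notation assumed here), differencing the two observations removes the individual effect, and least squares on the differenced data gives the mean change in smoking among graduates minus the mean change among non-graduates. The sketch below illustrates that arithmetic on made-up data; it ignores survey weights and any other adjustment the actual estimates may involve, and the function name is hypothetical.

```python
from statistics import mean


def fe_college_effect(people):
    """Two-period fixed effects estimate of the college coefficient.

    `people` is a list of (smokes_at_17, smokes_at_later_age, college_grad)
    tuples with 0/1 entries. Because the college dummy is 0 for everyone at
    age 17, the within estimator equals the difference between the average
    change in smoking of graduates and that of non-graduates.
    """
    change_grads = [late - early for early, late, grad in people if grad == 1]
    change_others = [late - early for early, late, grad in people if grad == 0]
    if not change_grads or not change_others:
        raise ValueError("need at least one graduate and one non-graduate")
    return mean(change_grads) - mean(change_others)


# Made-up illustration, not data from the surveys.
sample = [
    (0, 1, 0),  # non-graduate who starts smoking after 17
    (1, 1, 0),  # non-graduate who smokes in both periods
    (0, 0, 1),  # graduate who never smokes
    (1, 0, 1),  # graduate who quits
]
estimate = fe_college_effect(sample)
```

Individuals whose smoking status does not change still enter the averages with a change of zero, which is one way the linear specification differs from the conditional logit.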
Tables 3 and 4 use a linear regression model, while Tables A3 and A4 present results from conditional logit specifications (marginal effects shown), which are technically more appropriate with a binary dependent variable like smoking status.21 The individual fixed effect specification, under the strong assumption of strict exogeneity, should purge the estimates of individual-specific time-invariant unobservables, like the discount factor or ability.
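The way the conditional logit removes the individual effect in a two-period panel can be made explicit with a short derivation. This is the standard textbook argument rather than anything specific to this paper; the symbols β (coefficient on the college dummy) and Λ (logistic function) are notation assumed for the sketch, and smoking outcomes in the two periods are taken to be independent conditional on the individual effect.

```latex
% Assumed logit model for smoking at age t:
\Pr(S_{t,i}=1 \mid C_{t,i}, \eta_i) = \Lambda(\beta C_{t,i} + \eta_i + \gamma_t),
\qquad \Lambda(z) = \frac{e^{z}}{1+e^{z}}

% Condition on smoking in exactly one of the two periods (t = 17 and t = a):
\Pr(S_{a,i}=1 \mid S_{17,i}+S_{a,i}=1)
  = \frac{e^{\beta C_{a,i}+\eta_i+\gamma_a}}
         {e^{\beta C_{17,i}+\eta_i+\gamma_{17}} + e^{\beta C_{a,i}+\eta_i+\gamma_a}}
  = \Lambda\bigl(\beta\,(C_{a,i}-C_{17,i}) + \gamma_a - \gamma_{17}\bigr)
```

The individual effect cancels from the ratio. Individuals who smoke in both periods or in neither have a conditional probability of one for their observed sequence, so they carry no information about β; only those who start or quit between the two ages contribute to the estimate.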
Tables 3 and 4 present the reconstructed panel analysis separately for males and females. For males in Table 3, the coefficient on college education is negative and significant only for the cohorts born after 1950. For older birth cohorts, the coefficients are either not significantly different from zero or, among the oldest cohorts, positive. One potential explanation for the positive coefficients is that future college graduates tend to start smoking later than individuals who will not obtain a college degree. This is shown in the last two rows of Table 3, which display the average age at smoking initiation for the two education categories: for each birth cohort, future college graduates start smoking on average at least one year after other individuals and, on average, they start after age 17. If, from age 17, college graduates initiate smoking at a higher rate than individuals with no college education (who are more likely to have already started by age 17), and if this is not compensated by a higher quitting rate for college graduates, this would lead to positive coefficients, as found for individuals born between 1910 and 1929 until age 40 and for the 1930–39 birth cohort until age 35 in Table 3. Another explanation for the positive coefficients might be that, before the information on the dangers of smoking was available, smoking was a normal good.
For females in Table 4,22 the positive coefficients on college education are confined to the 1910–29 birth cohorts when they were young. For females born after 1930, all coefficients are negative and significant, except at ages 25 and 30 for the 1930–39 birth cohort and at age 25 for women born between 1940 and 1949, where the coefficient is not significantly different from zero. The general trend that the coefficient becomes more negative as each birth cohort ages is also present for females. Notice in the last two rows of Table 4 that among females born before 1940, future college graduates started to smoke before women with no college degree. Thus, delayed smoking initiation cannot explain the positive coefficients among females. However, for females, the positive coefficients are limited to the oldest birth cohorts when they were young. Figure 4 suggests that this might be due to women with no high school degree, who had a lower smoking prevalence than other females until 1970. For women born after 1940, the average age of smoking initiation is higher for future college graduates, as for males.
For both males and females in Tables 3 and 4, the general trend is for the coefficient on the college indicator to decrease (going from positive, to not significantly different from zero, to negative and more negative), both horizontally and vertically. Reading the table from left to right, this indicates that, at the same age, the younger the birth cohort, the stronger the negative effect of education on smoking. Going down each column, it suggests that, for the same birth cohort, the older the individuals, the stronger the negative effect of college education on smoking.
It is important to note that the negative association between smoking and college takes longer to appear than in the cross-sectional analysis in Tables 1 and 2. For females, the negative association is roughly coincident with widespread knowledge of the smoking-cancer link. But this is not true for males: it is not until the male birth cohorts of 1950–59 that the negative association appears. This suggests that the decline in smoking among men associated with college education did not begin until the late 1970s, well after the dangers of smoking had become common knowledge.
The specifications in Tables 3 and 4 assume that the unobservable, individual-specific variables that can explain both smoking behaviors and education levels, like the discount factor or ability, are time invariant. The fixed effect regressions would not account for unobservables that vary over time, for example, if the discount factor were endogenous as in Becker and Mulligan (1997). One could argue that the assumption that unobserved variables are time invariant is far-fetched because smoking is an addictive good and state dependence is therefore an important factor. However, Tables 3 and 4 include in the bottom rows the average age of smoking initiation for individuals with or without a college degree. There are differences in age at smoking initiation, but they are not substantial (always less than two years). It follows that when those individuals are older (for example, between ages 30 and 40) and consider quitting tobacco, the length of their smoking addiction does not vary widely.
The results in Tables A3 and A4 under the conditional logit specification confirm the general tendency of the results included in Tables 3 and 4. Among males, for the earlier birth cohorts (1910–39), there is a positive association between education and smoking. This association weakens and loses significance for the more recent birth cohorts (for example, 1940–49). For the cohort born between 1950 and 1959, the association becomes significantly negative. Among females, for the earlier birth cohorts (1910–29), there is generally no significant association between smoking and education, but the negative gradient appears with the 1930–39 birth cohort, much earlier than among males.
Tables 3–4 and Appendix Tables A3–A4 report the results from panels including only two points in time: age 17 and a later age between 25 and 60. On the one hand, these specifications have the advantage that there is no need to assume a specific age for college graduation, information that is not included in the data: At age 17, no individuals are assumed to be college graduates, while at ages 25 and older, all those who report in the survey that they have a college degree are assumed to have already graduated. But on the other hand, those specifications yield very short panels and do not exploit all the information available.
If one is willing to make an assumption about the age of college graduation, it is possible to construct much longer panels. In Table 5, I construct those longer panels, using 21 as the graduation age for individuals who report having a college degree in the survey and using the information on smoking initiation and cessation available in the survey to determine smoking status at every age. Table 5 reports results from those longer panels using both a linear model and the conditional logit model (marginal effects reported). The results are in line with those reported for the shorter, two-period panels.
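The construction of the longer panels can be illustrated as follows. This is a sketch under stated assumptions rather than the procedure actually coded for Table 5: the function name and record layout are hypothetical, the panel is taken to run yearly from age 17 to the age at interview, and a person is counted as a smoker through the age of quitting. Only the graduation age of 21 comes from the text.

```python
from typing import Optional

GRADUATION_AGE = 21  # assumed graduation age for respondents reporting a degree


def long_panel(start_age: Optional[int], quit_age: Optional[int],
               college_grad: bool, interview_age: int, first_age: int = 17) -> list:
    """One row per year of age, from `first_age` up to the age at interview."""
    rows = []
    for age in range(first_age, interview_age + 1):
        started = start_age is not None and age >= start_age
        stopped = quit_age is not None and age > quit_age
        rows.append({
            "age": age,
            "smokes": int(started and not stopped),
            # The degree switches on at the assumed graduation age.
            "college": int(college_grad and age >= GRADUATION_AGE),
        })
    return rows


# Hypothetical respondent interviewed at 40: started at 19, quit at 38, has a degree.
rows = long_panel(19, 38, True, interview_age=40)
by_age = {row["age"]: row for row in rows}
```

Compared with the two-period design, every year of age now contributes an observation, at the cost of imposing a common graduation age on all graduates.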
Table 5. Panel data analysis of smoking behavior by birth cohort: long panels
For males, until the cohort born 1940–49, there is a positive effect of college graduation on smoking (significant in the linear model, but not in the conditional logit model). The positive gradient disappears for individuals born after 1940: the relationship becomes negative under the linear model (significantly for the cohorts born 1940–69), while with the conditional logit model, it is only for the cohort born 1940–49 that a significant negative relationship between education and smoking can be observed. Among females, the coefficient on college education is always negative under both models and that negative relationship is consistently statistically significant for all women born after 1930.
The combined results from Tables 3–4, Appendix Tables A3–A4, and Table 5 yield different conclusions for males and females. The pattern among females is consistent with the idea that more educated individuals responded faster to the diffusion of information on the dangers of smoking. Among males, we observe a complete reversal, from positive to negative, of the education/smoking gradient, but that reversal took longer to materialize and did not coincide perfectly with the gradual diffusion of the information, as the reversal was only fully completed with the cohort born after 1950. The described effects could be considered causal only under the assumption of strict exogeneity, which is unlikely to hold.
V. Conclusions
This paper starts by analyzing smoking histories reconstructed from National Health Interview Surveys in the United States and by retracing the evolution of smoking patterns across education groups. The main conclusion is that, after the gradual arrival of the information about the dangers of tobacco, the smoking prevalence among more educated people declined earlier and more dramatically.
The time pattern of the decline in smoking prevalence for individuals with college degrees suggests that at least some of the differential by education can be ascribed to a causal role of education in giving access to and processing the information. Figure 2 shows that the smoking prevalence among college graduates started to decline earlier than for the three other categories. The inflection point is in 1954, ten years before the first Surgeon General’s Report, four years after the consensus in the medical profession, and one year after the first article in the general press on the dangers of smoking.
Farrell and Fuchs (1982) suggest that, because the negative gradient between college and smoking observed at age 24 is already present at age 17, the association between smoking and schooling is not causal. This paper confirms their results but argues that the complete lifecycle of smoking decisions should be considered, including smoking cessation.
In an attempt to isolate the causal effect of education on smoking from the influence of time-invariant individual unobservable characteristics, I construct a series of panels using the smoking histories of individuals. The pattern of results among females is consistent with the idea that more educated individuals responded faster to the diffusion of information on the dangers of smoking. However, the decline in smoking among men associated with college education did not begin until the late 1970s, well after the dangers of smoking had become common knowledge. The difference in the results for men and women is intriguing. The empirical analysis proposed in this paper does not offer an explanation, but this difference would be interesting to explore in further research. One hypothesis worth testing would be that women are more sensitive to health-related information. In a very different setting, De Walque (2007a) shows that in rural Uganda, a negative gradient between education and HIV prevalence and incidence emerged only among young women.
The analysis proposed in this paper does not address the hypothesis that part of the negative gradient between smoking and education is due to social and peer effects. Although this appears to be a reasonable hypothesis, the evidence in this paper that smoking did not vary substantially by education category before 1950 and that the gradient appeared as the information on the dangers of smoking started to diffuse would suggest that peer and social effects could merely act as multipliers and that the interaction between education and information is more likely to have been the trigger.
Rates of return to education, as traditionally calculated, only account for labor market earnings. If the effect of education on health and longevity is, as in the case of smoking reduction, at least partially causal, it would make sense to attribute additional returns to education. I have proposed (De Walque 2003) a method to estimate these extra returns in the context of the HIV/AIDS epidemic in Africa. Compared to the HIV/AIDS case, the potential addition to the returns to education is significant but modest in the case of smoking, essentially because smoking kills at relatively late ages.
Appendix
Table A1. The effect of college education on smoking behavior at different ages. Males. Marginal effects of cross-sectional logit regressions
Table A2. The effect of college education on smoking behavior at different ages. Females. Marginal effects of cross-sectional logit regressions
Table A3. The effect of college education on smoking behavior at different ages. Panel data with fixed effects, marginal effects of conditional logit. Males
Table A4. The effect of college education on smoking behavior at different ages. Panel data with fixed effects, marginal effects of conditional logit. Females
Figure A1. Comparisons between smoking prevalence obtained using current and retrospective data: NHIS, 1974–99, less than high school and college graduates
Note: For the retrospective data, from smoking histories constructed from the 1978, 1979, 1980, 1983, 1985, 1987, 1988, 1990, 1991, 1992, 1994, 1995, 1997, 1998, 1999 National Health Interview Surveys.
Footnotes
Damien de Walque is a Senior Economist at the World Bank, Development Research Group. He thanks Gary Becker, Pedro Carneiro, Raphael De Coninck, Mark Duggan, Michael Greenstone, Michael Grossman, Ted Joyce, Donald Kenkel, Fabian Lange, Steven Levitt, Ellen Meara, Chris Rohlfs, Tomas Philipson, and participants in the Applications Workshop at the University of Chicago, the NBER Health economics seminar, the 24th Arne Ryde Symposium on the Economics of Substance Use at Lund University (Sweden), the Econometric Society World Congress (University College London) 2005, the AEA Meetings (Boston) 2006, and Lehigh University for helpful comments and discussions. The findings, interpretations, and conclusions expressed in this paper are those of the author and do not necessarily represent the views of the World Bank, its Executive Directors, or the governments they represent. The data used in this article can be obtained beginning February 2011 through January 2014 from Damien de Walque, 1818 H Street, NW, Washington, DC 20433, USA. ddewalque{at}worldbank.org
1. http://www.cdc.gov/tobacco/issue.htm (accessed on February 12, 2009).
2. See Berger and Leigh (1989); Chaloupka and Warner (2000); Currie and Moretti (2003); de Walque (2003; 2007b); Farrell and Fuchs (1982); Fuchs (1982); Grossman (1975); Kenkel (1991); Meara (2001); Sander (1995a,b); Viscusi (1990, 1992).
3. Gallup surveys about tobacco use were available as early as 1949, but do not classify the respondents by education category (Cutler and Kadiyala 2003; Viscusi 1992).
4. To the best of my knowledge, the only statistics taken from current surveys where some information about the relationship between smoking and education levels is available are from the United Kingdom (Wald et al. 1988). In 1948, men from almost all “social classes” were smoking at a very similar and very high level (between 64 and 72 percent). Only the lower “class,” manual workers earning less than £7 per week, were smoking significantly less (45 percent). By 1985, the situation had reversed: the education/smoking gradient was clearly negative (smoking prevalence among skilled and partly skilled manual occupations was 40 percent; 34 percent among skilled manual occupations; 28 percent for clerical and lower professional, and 20 percent for professional clerical occupations). To get information about the level of smoking prevalence in the U.S. population before 1966, most studies rely on sales data to compute the per capita number of cigarettes consumed per year (U.S. Center for Disease Control and Prevention 1994). This approach does not allow disaggregating consumption of cigarettes by education categories.
5. To estimate the smoking prevalence in any year, I created a dummy variable for every individual who was interviewed during or after that year and was aged 25 or older in that year. The dummy takes the value zero if the individual never smoked, started after that year, or stopped before it. It takes the value 1 if the individual smoked during that year. In the 1983, 1985, 1990, 1991, and 1994 surveys, no questions about the age at which individuals started smoking were included. Given that, in the other 11 surveys, 93.3 percent of the respondents who ever started smoking had started by age 25, and since I only sampled individuals aged 25 and older, I treated every person in these five surveys who had ever smoked as having started by age 25. Individuals might stop smoking for a certain period and then resume. A few surveys (1978, 1979, 1980, 1990, 1991, 1992) included questions about temporary smoking cessation, but since in 86.2 percent of the cases the period of temporary smoking cessation was not longer than one year, I have for simplicity ignored it and used only the starting year and the year of definitive cessation.
6. One of the reasons why smoking prevalence among men was historically higher than among women is that smoking was heavily subsidized during military service, as documented in Bedard and Deschenes (2006). In De Walque (2007b), I explicitly include (Vietnam) veteran status in the analysis and I find that veterans are more likely to smoke, but that the introduction of the veteran status variable does not affect OLS estimates of the impact of education on smoking behaviors.
7. In 1940, the average education level was between ten and 11 years of education, in 1954 it went above 11, and in 1976 it went over 12.
8. Kenkel, Lillard, and Mathios (2003, 2004) discuss the use of retrospective smoking data. In Figure A1, I compare current smoking prevalence rates by education from 1974 to 1999 as calculated directly from the yearly National Health Interview Survey with the estimates obtained in this paper using retrospective smoking histories constructed from 16 surveys. The estimates using these two methods are very close.
9. However, a comparison between Figures 2 and 6 confirms the existence of a survivor bias. This is best seen in the 1945–50 period: In Figure 6 the smoking prevalence is higher (above 50 percent) than in Figure 2 and the trend up to the early 1960s is rather flat, whereas Figure 2 suggests an increasing prevalence up to that point. These two comparisons suggest that in Figure 2, between 1940 and 1950, nonsmokers are overrepresented because of the survivor bias. By construction, however, the importance of this survivor bias decreases over time.
10. The two sample composition issues discussed here probably also explain why the less-educated category appeared less likely to smoke in 1940 in Figure 2 but not in Figures 6 and 7: Earlier in the century, before 1940, tobacco might have been too expensive for the less-educated segment of the population, so that they were less likely to start smoking. Older people with low education were not likely to smoke in the 1940s, so this dragged down the smoking prevalence for individuals with no high school degree. When the sample is restricted, as in Figures 6 and 7, to people who were young between 1940 and 1950, the smoking prevalence for the individuals with less than a high school degree between 1940 and 1950 becomes very close to the prevalence of the other education groups.
11. Using the single 1990 National Health Interview Survey, I verified that considering the quantity of tobacco smoked does not change the conclusion. Controlling for income, the higher the level of education among smokers, the smaller the number of cigarettes smoked daily. This result reinforces my previous conclusions: More educated people are less likely to smoke and, if they smoke, they smoke fewer cigarettes per day. Results are available on request from the author.
12. The data for the 1955–90 period are taken from the “Tax Burden on Tobacco,” an historical compilation edited by the Tobacco Tax Council (Tobacco Tax Council 1990). From 1993, I have used the index on “Cigarette and other tobacco products” in the Consumer Price Index database of the Bureau of Labor Statistics (U.S. Department of Labor 2001). The prices for 1991 and 1992 were obtained by interpolation between 1990 and 1993. I have used the inflation calculator of the Bureau of Labor Statistics to convert the prices to 2002 dollars.
13. Notice that the sample used in Farrell and Fuchs (1982) is not nationally representative as it is drawn from four small California cities.
14. Notice that in Tables 1 and 2 (as well as in Tables 3 and 4), I have excluded the data from the 1983, 1985, 1990, 1991, and 1994 National Health Interview Surveys because the age at smoking initiation, a necessary variable for the analysis in those tables, was not included in the questionnaires.
15. Actually, for males, the magnitude of the gradient is larger at age 17 than at age 25 for the birth cohorts born before 1950. This might indicate that individuals who will end up less educated start smoking sooner and that those who will become more educated are more likely to start later, between ages 17 and 25. This is confirmed in the two last rows of Table 3: Among males, future college graduates tend to start smoking later.
16. For females, it is also the case that up to the birth cohorts born after 1960, the coefficient is more negative when the analysis is done at age 17, using future education levels, than at age 25 using achieved education levels. For the birth cohort born between 1920 and 1929, the coefficient even switches sign, from negative to positive, between the analysis at age 17 and at age 25. College-educated women born after 1939 tended to start smoking later than women with no college degree (see bottom of Table 4), so this could explain this phenomenon, as for males. However, among women born before 1940, college graduates started earlier. Therefore, “late” smoking initiation cannot be an explanation for those birth cohorts. Notice that most women born between 1910 and 1940 were 17 and 25 before the information about the dangers of smoking was available.
17. Tables 1 and 2 use a linear regression model, but Tables A1 and A2 confirm the results using a logit model, displaying the marginal effects of the logit coefficients.
18. But as described above, this was not always the case at age 17, when future college graduation status was considered.
19. Only the 1978, 1979, 1980, 1987, 1988, 1992, 1995, 1997, 1998, 1999, and 2000 NHIS contain both the age at smoking initiation and the age at smoking cessation and are therefore used.
20. The first row of Tables 1 and 2, the analysis at age 17, is however not relevant in the panel regressions, since the beginning and ending periods of the panel would otherwise be the same.
21. Hahn’s (2001) results for probit models cannot be used in this case because it would require that all individuals attend college. The unconditional logit model would yield inconsistent estimates and I am therefore using the conditional logit model, which only uses information from observations that change status, that is, people who start smoking or quit smoking.
22. There is no positive coefficient in Table A4 with the conditional logit specification.
- Received November 2007.
- Accepted April 2009.