Abstract
We use administrative data from Texas to estimate how graduating from a state flagship or a community college relative to a nonflagship university affects the distribution of earnings. We control for the selection of students across sectors using a rich set of observable ability and background characteristics and find evidence of substantial heterogeneity in the returns to quality. Returns increase with earnings among UT–Austin graduates but decline among Texas A&M graduates. For community colleges, returns are negative for lower earners but go to zero for higher earners. Our estimates also point to differences in the distribution of returns by race/ethnicity.
I. Introduction
The U.S. higher education system is characterized by a large amount of heterogeneity, which produces significant differences across institutions in per-student resources and peer quality but also in costs (Hoxby 2009). On average, empirical evidence points to a substantial earnings premium associated with college enrollment and completion (Card 1999; Goldin and Katz 2008; Autor, Katz, and Kearney 2008). A growing body of literature, however, explores whether the college earnings premium is larger when students attend more selective, higher-resource schools. A series of papers using “selection-on-observables” techniques shows that students who attend schools with higher observed quality earn significantly more postgraduation (Brewer, Eide, and Ehrenberg 1999; Black and Smith 2004, 2006). Hoekstra (2009) estimates returns of similar magnitude by exploiting an admissions cutoff rule to determine the effect of attending a large flagship state university.1 In contrast, Dale and Krueger (2002, 2014) employ a matching estimator that compares earnings among students who applied to and were admitted by the same set of schools but who attended schools of different quality. While they find no average effect of college quality on earnings, they do show evidence that lower-income and minority students experience positive returns. Taken together, these prior studies indicate that students who attend or graduate from higher-quality schools subsequently earn more on average in the labor market.
A central limitation of the existing research on the returns to college quality is that it focuses solely on estimating mean effects. There are several reasons, though, that examining only mean earnings impacts might miss important aspects of how college quality influences earnings. First, the social welfare implications of any positive mean quality effect differ depending on whether the whole earnings distribution shifts outward or whether the gains are concentrated at the top or the bottom of the distribution. For example, if higher-quality schools only shift out the top of the earnings distribution, then relatively few students would obtain the return to investing in such schools while all students would bear the costs. Although the mean return to college quality would be positive in such a circumstance, the investment would carry substantial risk for students because these returns are experienced by a small number of people. Alternatively, if higher-quality schools shift out the lower part of the earnings distribution, this would be consistent with there being an “insurance value” to attending a more elite school. That is, higher-quality schools would insure students against low earnings outcomes. Since U.S. colleges receive a significant amount of public support from federal and state governments, and that support varies across the quality distribution, understanding how college quality affects the entire distribution of earnings and what types of students experience these returns has implications for the efficient allocation of that support. Critically, prior work that estimates the mean return to college quality has not been able to examine these distributional aspects of college investment decisions.
Second, analyzing the effect of college quality on the earnings distribution yields insight into the efficiency of the college selection process. College choices are made under significant uncertainty with respect to the resources available at each school, the quality of peers, and the academic fit. This uncertainty could cause students to choose colleges that lead to lower returns than if they had selected another college in their feasible choice set.2 Estimating the extent of heterogeneity in the returns to college quality is a first step in determining whether certain types of students are choosing colleges that do not maximize their return to college, as measured by earnings. The possibility that mean effects miss important aspects of how college quality impacts earnings is also consistent with analyses of large government programs, such as welfare reform and job training, which show substantial heterogeneity in program effects on different parts of the earnings distribution (Bitler, Gelbach, and Hoynes 2006; Heckman, Smith, and Clements 1997).
In this paper, we provide the first comprehensive study of the effects of college quality on the distribution of earnings, focusing in particular on public universities in Texas. A key component of our contribution is the employment of extremely detailed administrative data that link high school records with college records and up to three years of earnings data derived from unemployment insurance records. Because these data come from Texas, a populous and diverse state that has a large and heterogeneous public higher education system, we are able to generate precise estimates that are relatively rare in studies that examine distributional effects. The administrative high school data we use include rich information about student precollegiate academic ability, such as standardized test scores, information about students’ academic track, and the high school from which they graduated. Thus, this data set is unique in both the size of the sample and the scope of relevant background characteristics we observe about each individual.
Our methodological approach follows the unconditional quantile treatment effects (QTE) methods outlined in DiNardo, Fortin, and Lemieux (1996) and Firpo (2007).3 Similar to much of the prior literature on the returns to college quality, the main assumption under which our QTE estimates are identified is that our detailed academic and student background data are sufficient to control for the fact that students of higher ability select into and graduate from more elite schools and also have higher earnings potential absent their school choice. The data include a comprehensive set of controls that we argue account for such selection, particularly as it relates to academic preparation for college: The precollegiate academic information we observe about students is more detailed than has been used in previous “selection-on-observables” analyses of the effect of college quality on earnings. We additionally argue that the pattern of results and the robustness of our estimates to alternative controls, samples, and model constructions make it unlikely our results are seriously influenced by unobservable differences across students graduating from colleges of differing quality.
We proxy for college quality by partitioning the Texas higher education system into four groups: University of Texas at Austin (UT-Austin), Texas A&M University at College Station (TAMU), all other four-year public universities, and all public two-year colleges. The first two groups represent the two flagship schools in the state of Texas, and we split them up because UT-Austin is typically higher-ranked and because TAMU is highly focused on engineering and agriculture. Four-year public universities excluding UT-Austin and TAMU constitute our control group. We also examine how community college graduates’ earnings compare to this control group.
While our estimates of the mean impacts of college quality are similar to those from prior studies, the quantile treatment effects show evidence of large amounts of heterogeneity in the returns to college quality that vary across school type. Our findings suggest that the variance in the expected return to college is not a simple function of college quality. For UT-Austin graduates, the earnings premia generally increase across the earnings distribution, from a low of 2.7 percent to a high of 31.7 percent, while for Texas A&M graduates there is much less heterogeneity in the returns. We present evidence that differences in college majors between UT and TAMU graduates are a plausible explanation for the differences in returns experienced by these students, which suggests much of the upward-sloping return profile is related to the majors students select. For community college graduates, the returns are mostly negative and tend to increase with one’s place in the earnings distribution. Notably, for about 15 percent of the distribution, the estimated returns to community college versus nonflagship, four-year graduation are close to zero in magnitude and are not statistically different from zero at the 5 percent level. Given the large cost differences between two-year and four-year schools, these results suggest that community colleges may be optimal for a significant subset of students who are both relatively high potential earners and who are choosing between a less-selective four-year school and a community college.
We also examine the effect of college quality on the distribution of earnings separately by race and ethnicity. This is among the first evidence to demonstrate that the returns to college quality vary by race because most data sets used in previous work lack minority samples of sufficient size to estimate such parameters with any precision. Our main findings show that the top of the earnings distribution of historically underrepresented minorities is most affected by UT-Austin graduation, although African American students also experience large returns at the lowest earnings percentiles. In contrast, at Texas A&M, black and Hispanic students experience a large positive level shift in earnings across the entire distribution. Among historically underrepresented minority groups at community colleges, earnings at the top percentiles shift out substantially, leading these higher earners to earn more than their counterparts who graduated from a nonflagship public university. Thus, nonflagship four-year schools provide higher earnings to minority graduates at the bottom of the distribution relative to minorities who graduate from community colleges, but the opposite is true for top earners.
On the whole, we show evidence of extensive heterogeneity in the returns to graduating from postsecondary institutions of different quality. Our estimates demonstrate that mean impacts poorly characterize the breadth of returns a given student can expect as a function of college quality. These results underscore the need to understand how the variance of expected returns, rather than just the mean, affects student postsecondary attendance decisions across the quality spectrum. That college quality does not impact earnings uniformly has distributional implications that should be taken into account when making higher education funding decisions, as our results suggest funding certain types of schools will have uneven effects on the distribution of earnings that may not be intended by policymakers.
II. Data
The data we use in this study come from three sources: administrative data from the Texas Education Agency (TEA), administrative data from the Texas Higher Education Coordinating Board (THECB), and quarterly earnings data from the Texas Workforce Commission (TWC). The data are housed at the Texas Schools Project, a University of Texas at Dallas Education Research Center (ERC). These data allow one to follow a Texas student from prekindergarten through college and into the workforce, provided individuals remain in Texas.
Because college quality is difficult to measure with a single variable or set of variables (Black and Smith 2006), we follow much of the previous literature in proxying college quality by college sector (Brewer, Eide, and Ehrenberg 1999; Hoekstra 2009; Bound, Lovenheim, and Turner 2010, 2012; Lovenheim and Reynolds 2013). Due to data availability constraints, we focus only on public university graduates and stratify them into four comprehensive and mutually exclusive sectors: UT-Austin, Texas A&M at College Station,4 other four-year public universities (that is, nonflagship public universities), and community colleges. We examine UT-Austin and Texas A&M separately because they are the flagship universities of the State of Texas.
Table 1 shows the observable characteristics of the universities across these sectors. Both UT-Austin and Texas A&M generally have higher resources and quality measures than the other four-year and community college sectors. They both have much higher average SAT scores, and spending per student is twice the amount spent at the nonflagship universities. The two flagship universities also graduate more than twice the proportion of their students that the other four-year colleges do. However, in-state tuition (unadjusted for financial aid) is about $1,000 higher per year at the flagship schools. Community colleges are cheaper to attend than four-year schools as well, but they have far fewer resources than the public four-year sector. Thus, our four sectors have large differences in resources and measurable college quality associated with them, and they also define the relevant college choices for most students in Texas due to the dominance of public universities in that state.5
We focus on male graduates from Texas’ public colleges and universities who graduated from high school during the years 1996–2002. The total sample size includes 94,071 male graduates, with 9,837 graduates from the University of Texas at Austin, 13,436 graduates from Texas A&M University-College Station, 47,935 graduates from Texas’ other four-year public colleges and universities, and 22,863 graduates from Texas’ community colleges.6 We only include males in the analysis because of the concern that many female college graduates are endogenously missing from the sample due to fertility decisions. The sample includes males who meet the following restrictions: (1) the student must graduate before the age of 25,7 (2) the graduate’s earnings for a given year are included only if he worked for four consecutive quarters in the year (or three quarters in 2009), and (3) the student must not be currently enrolled in graduate school when the earnings are measured. These restrictions are meant to isolate the earnings of full-time working males, and they are similar to the sample restrictions imposed by Hoekstra (2009).
We obtained records of each individual’s quarterly earnings from the TWC and examine earnings data for the years 2007–2008 and the first three quarters of 2009. Because these students graduated from high school between 1996 and 2002, they will be between 23 and 31 years of age postgraduation when we observe their earnings. Examining earnings of graduates in their early 20s may be problematic if college quality increases the returns to work experience. In such a circumstance, we will understate the earnings of college graduates from higher-quality schools relative to lower-quality schools. However, in Section IVD we estimate effects for the older sample who graduated from high school during the years 1996–98. The results using this sample are similar to the estimates for the sample as a whole, which is suggestive that the relative inexperience of our sample is not driving our results.8
We observe between three and 12 quarters of earnings for each sample member. In order to generate one earnings estimate for each respondent, we stack log quarterly earnings observations and regress them on year, quarter-of-year, and high school graduating cohort indicators. We use the within-graduate average of the residuals from this regression as the earnings measure in our empirical models. This method isolates the constant component of earnings for each individual over the period for which we observe his earnings and allows us to control for time- and cohort-specific shocks as well as for seasonality.
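As a concrete illustration, this residualization step can be sketched in a few lines of numpy. The data below are simulated and all variable names are ours, not the paper's; the point is only the mechanics of stacking quarterly observations, regressing on year, quarter-of-year, and cohort indicators, and averaging residuals within graduate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stacked panel: one row per person-quarter (names are ours).
n_people, n_quarters = 200, 8
person = np.repeat(np.arange(n_people), n_quarters)
year = rng.integers(2007, 2010, size=person.size)          # calendar year
quarter = np.tile(np.arange(4), person.size // 4)          # quarter of year
cohort = rng.integers(1996, 2003, size=n_people)[person]   # HS graduating cohort
log_earn = rng.normal(10.0, 0.5, size=person.size) + 0.05 * (year - 2007)

def dummies(v):
    """Indicator columns for each level of v, dropping the first level."""
    levels = np.unique(v)
    return (v[:, None] == levels[None, 1:]).astype(float)

# Regress stacked log quarterly earnings on year, quarter-of-year,
# and high school graduating cohort indicators (plus a constant).
X = np.column_stack([np.ones(person.size),
                     dummies(year), dummies(quarter), dummies(cohort)])
beta, *_ = np.linalg.lstsq(X, log_earn, rcond=None)
resid = log_earn - X @ beta

# Within-graduate average residual = the person-level earnings measure.
earn_measure = np.bincount(person, weights=resid) / np.bincount(person)
print(earn_measure.shape)  # one earnings measure per graduate
```

Averaging the residuals within person strips out time- and cohort-specific shocks and seasonality, leaving the persistent component of each graduate's earnings.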
The data from the TEA consist of both individual- and high school-level information. The individual-level data include variables such as race/ethnicity, an indicator for whether the student has a college plan,9 participation in Title I, whether the child receives free or reduced-price meals, and the scores from the 11th grade reading, writing, and mathematics sections of the Texas Assessment of Academic Skills (TAAS). The high school-level data include enrollment, the ethnic composition of the school, and the percentage of the school that participates in talented and gifted programs. We obtain graduation status and timing from the THECB for each student as well.
The odd columns of Table 2 present summary statistics of individual characteristics for our analysis sample, separately by school type. As expected, the UT-Austin and TAMU graduates have higher high school test scores in every subject, and the community college students have the lowest average high school test scores. The flagship university graduates also are more likely to be in the top decile of their school in each of these tests and have fewer black, Hispanic, and economically disadvantaged peers than nonflagship universities and community colleges. Overall, Table 2 demonstrates that students attending these different school types differ on important observable characteristics that are likely to affect earnings. Our empirical strategy described below seeks to eliminate the differences in the earnings distributions across sectors that are due to the differences in these observable characteristics.
A main limitation of the data we use is that individuals only are in our sample if they graduated both from a Texas high school and a public Texas college. They also need to have at least three quarters of complete earnings data in Texas, which could be a limiting factor if students are in graduate school, if they leave the state, or if they do not work. Because both UT-Austin and Texas A&M have more of a national profile than other public universities in Texas, if these graduates are more likely to take a job in another state or if they are more likely to be attending graduate school, then our earnings distributions will be biased. Furthermore, if college quality has an effect on the extensive margin of labor supply, it could create endogenous sample biases in our estimates.
Table 3 shows the characteristics of those included and excluded from our analysis sample among graduates of each school type. Excluded observations are graduates for whom we never observe full-time work. As the table demonstrates, those excluded from the sample are very similar to those who are included. Those who are in the top 10 percent of their high school class in reading and writing are slightly more likely to be excluded, but the difference is only three percentage points and this difference is present in all school sectors. Within school type there are few differences in the observable characteristics of graduates included and excluded from the sample, and comparing the flagship and community college sectors to the nonflagship sector, there are no discernible differential patterns of exclusion. The fact that our sample is balanced with respect to whether earnings are present is summarized with the mean of predicted log earnings in Table 3. The predicted earnings are calculated by estimating a regression of log earnings residuals on all observable characteristics, separately by sector. We then predict log earnings using the resulting coefficients for the samples with and without earnings data in each sector. As the estimates in Table 3 indicate, these means are extremely similar, which suggests that the sample of men for whom we observe earnings is not systematically different from the sample of men for whom we do not observe earnings.
Finally, at the bottom of Table 3, we show the number and proportion of students included in and excluded from the earnings data. Although the proportion included increases across the table (and thus declines with observable college quality), the differences are not large. As the rest of the table shows, these different inclusion rates are uncorrelated with the rich set of observable characteristics in our data. The percentage of students excluded due to graduate school attendance also is very similar for UT-Austin and Texas A&M, and it actually is slightly higher in the nonflagship sector.10 The sum total of the evidence in Table 3 indicates that the sample restrictions we make are unlikely to create systematic biases in our earnings distributions for each school type. Tabulations by race and ethnicity, which are available upon request, show that observable characteristics among included and excluded students by school type do not differ substantially either for any of these groups.
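The predicted-earnings diagnostic used in Table 3 works as follows: fit log earnings on observables within the included sample, then predict earnings for both included and excluded graduates and compare the means. A minimal sketch with simulated data (all names, dimensions, and coefficient values are hypothetical, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical observables and log earnings residuals for one sector.
n_inc, n_exc = 4000, 800
X_inc = rng.normal(0, 1, size=(n_inc, 3))   # e.g., three test scores
X_exc = rng.normal(0, 1, size=(n_exc, 3))   # excluded graduates' observables
y_inc = X_inc @ np.array([0.1, 0.05, 0.02]) + rng.normal(0, 0.4, size=n_inc)

# Fit log earnings on observables among those with earnings data...
A = np.column_stack([np.ones(n_inc), X_inc])
beta, *_ = np.linalg.lstsq(A, y_inc, rcond=None)

# ...then predict for both groups and compare means, as in Table 3.
pred_inc = A @ beta
pred_exc = np.column_stack([np.ones(n_exc), X_exc]) @ beta
print(round(pred_inc.mean(), 3), round(pred_exc.mean(), 3))
```

If the two predicted means are close, the included and excluded samples look alike on the earnings-relevant dimensions of the observables, which is the evidence Table 3 summarizes.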
To provide additional evidence that selective migration is not problematic, we use 2000 U.S. Census data where we can observe earnings for young adults who exit the state. We calculate log earnings among BA recipients who report having lived in Texas five years prior and who would have been of college age at the time (18–21). Panel A of Figure 1 shows the distribution of earnings in 2000 among BA recipients who are 23–26 and who lived in the Austin, Texas Metropolitan Statistical Area (MSA) in 1995. The earnings distributions for those in Texas in 2000 versus those not in Texas in 2000 are virtually identical. There is a slight divergence at the top of the distribution, where earnings are higher among those currently residing in Texas. A similar pattern holds for 23–26-year-old BA recipients who lived in the College Station, Texas MSA in 1995 (Panel B). While out-of-state workers earned more at the bottom 20 percent of the distribution,11 the rest of the earnings distributions are very similar. We show below that the returns for Texas A&M are highest for lower earners. Panel B of Figure 1 suggests we may understate these returns slightly because we are unable to observe earnings from Texas A&M graduates who leave Texas. Finally, in Panel C, the earnings distributions for those who were in other areas of Texas are the same regardless of whether they currently live in Texas.12 Figure 1 therefore presents little evidence of systematic attrition by higher or lower earning students, which suggests our inability to observe earnings for non-Texas-resident workers is not driving our results.
III. Methodology
The goal of this analysis is to estimate unconditional quantile treatment effects of college quality on earnings. This method differs from the conditional quantile treatment effects literature (Koenker and Bassett 1978; Abadie, Angrist, and Imbens 2002; Chernozhukov and Hansen 2005) in the examination of treatment effects for each quantile of the marginal earnings distributions rather than the quantiles conditional on the covariates. The conditional quantiles are more difficult to interpret for policy purposes because they are unobserved, and thus conditional quantile treatment effects cannot be mapped simply into unconditional quantile treatment effects. We estimate the latter since we are interested in understanding how college quality affects the observed distribution of earnings.13
We estimate quantile treatment effects associated with graduating from each college sector relative to the public nonflagship four-year sector. We focus on graduation rather than on attendance because graduation is the outcome most likely to be observed (and rewarded) by the labor market. Furthermore, estimation of the effect of the quality of the college attended on earnings is complicated by the fact that many dropouts do not accumulate many credits, so even at high-quality schools they will be relatively untreated by the university’s quality. Because college quality is associated with higher rates of graduation, even conditional on student background characteristics and preparation for college (Bound, Lovenheim, and Turner 2010; Rouse 1995), college graduates may constitute an endogenous sample. In Section IVD, we show that we obtain similar results when our sample consists of college attendees rather than graduates, particularly among four-year schools. Still, our main analysis focuses on graduates because we believe college graduation to be a more relevant postsecondary outcome for the labor market. In addition, it is unlikely that relative graduation rates are significantly influencing students’ choices about where to attend.
To demonstrate how we estimate the quantile treatment effects of graduating from a particular college sector on earnings, consider a two-sector higher education system. Students choose either UT-Austin (T = 1) or a nonflagship, four-year university (T = 0). As described in Firpo (2007), the quantile treatment effect on the treated for quantile τ can be written:
(1) Δτ = q1,τ|T=1 − q0,τ|T=1,
where qt,τ|T=1 is the τth quantile of the distribution of potential earnings under treatment status t among the treated.
The inference problem faced in this analysis is that the counterfactual quantile for the treated sample, q0,τ|T=1, is unobserved. In order to estimate q0,τ|T=1, we generate counterfactual earnings distributions that show what the earnings distribution would be among the untreated group if the distribution of their observable characteristics were the same as in the treated group, using the propensity score reweighting method proposed by DiNardo, Fortin, and Lemieux (1996). The approach consists of estimating a “selection” equation, which yields the estimated probability of treatment as a function of the observables, and then reweighting the control distribution by the predicted odds ratio of treatment. In order to show how this method can be used to estimate quantile treatment effects, we begin with the original derivation of the reweighting estimator as shown in DiNardo, Fortin, and Lemieux (1996) and then embed the counterfactual earnings distribution in the QTE estimator proposed by Firpo (2007).
Each graduate can be described by earnings, w, a vector of observable characteristics, X, and a treatment status, T. The joint distribution of earnings and observables conditional on treatment status is given by:
(2) f(w, x|T = t), t ∈ {0, 1}.
The density of earnings at UT-Austin can be calculated by integrating the earnings distribution over the distribution of observable characteristics for UT-Austin students:
(3) f(w|X = x1, T = 1) = ∫ f(w|X = x, T = 1) g(x|T = 1) dx,
where x1 is the distribution of observable characteristics among the treated (x1 = g(x|T = 1)). We want to estimate f(w|X = x1, T = 0), which is the counterfactual earnings distribution among those who were not treated that we would expect if their observable characteristics were identical to the observable characteristics of the treated group. DiNardo, Fortin, and Lemieux (1996) show that:
(4) f(w|X = x1, T = 0) = ∫ f(w|X = x, T = 0) ψ(x) g(x|T = 0) dx,
where
(5) ψ(x) = g(x|T = 1) / g(x|T = 0).
Applying Bayes’ rule, Equation 5 can be written:
(6) ψ(x) = [P(T = 1|x) / P(T = 0|x)] · [P(T = 0) / P(T = 1)].
Because P(T = 1|x) = 1 – P(T = 0|x), Equation 6 is the odds ratio of the conditional likelihood of treatment and ψ(x) are weights that we will estimate using our rich set of observable student characteristics. We then apply these weights to Equation 4 to generate a counterfactual distribution of earnings that would have been expected if the observable characteristics of students who graduated from nonflagship public universities in Texas were distributed the same as the observables of UT-Austin graduates. This method is akin to the “aggregate decomposition” described in Firpo, Fortin, and Lemieux (2011).
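Given fitted propensity scores, the weight construction in Equation 6 is mechanical. A minimal numpy sketch, with simulated propensity scores standing in for the fitted values from the paper's logit (the sample sizes and score range are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fitted propensity scores P(T=1|x) for the CONTROL
# (nonflagship) graduates; in the paper these come from the logit in
# Equation 8 -- here we simply simulate them.
n_control, n_treated = 5000, 1000
p_control = rng.uniform(0.05, 0.6, size=n_control)

# Equation 6: psi(x) = [P(T=1|x)/P(T=0|x)] * [P(T=0)/P(T=1)].
p_t1 = n_treated / (n_treated + n_control)   # unconditional P(T=1)
psi = (p_control / (1 - p_control)) * ((1 - p_t1) / p_t1)

# Normalize so the weights sum to one over the control sample; the
# reweighted control distribution then mimics the treated covariates.
psi /= psi.sum()
print(psi.min() > 0, abs(psi.sum() - 1) < 1e-12)
```

Controls whose observables make treatment likely receive large weights, so the reweighted nonflagship sample comes to resemble the treated sample on observables.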
The estimated quantile treatment effect then can be written as:
(7) Δ̂τ = q̂1,τ|T=1 − q̂0,τ|T=1,
where q̂0,τ|T=1 is computed from the reweighted (counterfactual) nonflagship earnings distribution.
Equation 7 is simply the difference between the unconditional quantiles of two marginal distributions: the observed treated distribution and the counterfactual untreated distribution. This difference identifies the quantile treatment effect for quantile τ under the assumption that the observable characteristics in our reweighting function given by Equation 6 are sufficient to control for the fact that the earnings distribution among UT-Austin graduates differs from the distribution among nonflagship graduates because they have different characteristics that are rewarded by the labor market.14
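Operationally, the estimator in Equation 7 differences an ordinary quantile of the treated sample and a weighted quantile of the control sample. A sketch with simulated earnings; the weighted-quantile helper is ours, and the flat weights stand in for the ψ(x) weights purely for illustration:

```python
import numpy as np

def weighted_quantile(values, weights, tau):
    """tau-th quantile of a sample-weighted distribution (our helper)."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    return v[np.searchsorted(cdf, tau)]

rng = np.random.default_rng(2)
treated = rng.normal(10.4, 0.5, size=2000)   # observed treated log earnings
control = rng.normal(10.0, 0.5, size=5000)   # control log earnings
psi = np.ones_like(control)                  # DFL weights (flat here)

# Equation 7: QTE(tau) = treated quantile - counterfactual control quantile.
taus = [0.1, 0.5, 0.9]
qte = [np.quantile(treated, t) - weighted_quantile(control, psi, t)
       for t in taus]
print(np.round(qte, 2))
```

With real ψ(x) weights in place of the flat ones, the second term becomes the quantile of the counterfactual distribution in Equation 4, and the difference traces out the quantile treatment effect profile across τ.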
In order to adjust the nonflagship public university earnings distribution for the fact that students who graduate from these universities differ systematically from UT-Austin graduates in ways that affect future earnings, we leverage the extensive information in our administrative data on student backgrounds. The propensity score model that we use to generate the weights given by Equation 5 is a logistic regression of the probability a student graduates from a school in sector j(j ∈ {UT-Austin, TAMU, Community College}) relative to a nonflagship four-year university:
(8) P(Tj = 1|X, T, E, HS) = Λ(β0 + Xβ1 + Tβ2 + Eβ3 + HSβ4), with Λ(·) the logistic CDF,
where X is a vector of individual background characteristics, T is a vector of high school test score controls, E is a set of high school education variables, and HS contains observed high school characteristics in the year the student graduated. The variables in X are student race/ethnicity (white, black, Asian, Hispanic), Title I status, English proficiency, and free and reduced-price lunch status. We include flexible controls for high school test scores, including quartics of student scores on the Texas state math, reading, and writing standardized exams all students take in high school. Using the high school students attend, we also control for each student’s relative rank within his school on each exam. Because we cannot observe GPA or class rank in our data, the relative rank variables control for the fact that higher-ranked students in each high school, conditional on test scores, are more likely to be admitted to higher-quality schools. The vector E contains information on high school educational programs, such as enrollment in gifted programs, special education, career and technology courses, whether the student had a college plan, and whether he was at risk of dropping out. Finally, we control for high school variables that measure the educational environment from which students came: school ethnic composition, the percentage of students in each economic status group, of gifted students, of students at risk, of Title I eligible students, as well as total school enrollment.
We use the methodology described above for each of the three “treatment” school types: UT-Austin, Texas A&M, and community colleges. For each treatment sector, we estimate a separate version of Equation 8 that includes the same independent variables but that uses separate indicator variables for whether a student graduated from a school in a given sector relative to a nonflagship public school. All versions of Equation 8 are estimated using logit models, and the predicted values from these logit models are used to construct the weights shown in Equation 6.15
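A logit of this form can be fit by standard Newton–Raphson iteration on the log likelihood. The sketch below uses a toy design with two covariates standing in for the paper's full control set; all values are simulated and the coefficient values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy design: constant plus two "test score" covariates (names ours).
n = 4000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-1.0, 0.8, -0.5])
p_true = 1 / (1 + np.exp(-(X @ true_beta)))
T = (rng.uniform(size=n) < p_true).astype(float)  # 1 = treated-sector graduate

# Fit the logit by Newton-Raphson: beta <- beta - H^{-1} * gradient.
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))
    grad = X.T @ (T - p)                               # score
    hess = -(X * (p * (1 - p))[:, None]).T @ X         # Hessian
    beta -= np.linalg.solve(hess, grad)

p_hat = 1 / (1 + np.exp(-(X @ beta)))  # fitted propensity scores
print(np.round(beta, 2))               # should be near true_beta
```

The fitted values p_hat are the propensity scores that feed into the odds-ratio weights of Equation 6, estimated separately for each treatment sector against the nonflagship control group.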
In the even columns of Table 2, we show descriptive statistics for the nonflagship group that are weighted by the relative odds of graduating from each sector in Texas. Because our propensity score model contains a large number of variables and because we assume a logistic functional form, it is not assured that the methodology described above will balance each observable across treatment and control sectors. Comparing the even and odd columns in Table 2, however, shows that our method fully balances the covariates in each sector; in no case is there a large or statistically significant difference between the observed mean and the reweighted control group mean. At the bottom of the table, we calculate predicted log earnings in each sector using the observed graduates from that sector and all observed characteristics. We then predict log earnings for the treatment and control groups using the sector-specific coefficients. One can interpret these predicted log earnings as summary measures of the difference between the characteristics of the treatment and control groups as they relate to earnings. As the table demonstrates, not only are the means identical among the treated and weighted control groups, but the distributions are the same as well. Thus, our propensity score model produces the desired result: There is no predicted difference in the distribution of earnings across treatment and control groups. Any observed difference in the distribution of earnings must thus be due to school quality or to other factors that are uncorrelated with the observables in our model.16
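The logic of this balance check can be sketched with one simulated covariate: if the weights approximate the treated-to-control density ratio, which is what the odds-ratio weights in Equation 6 estimate, then the reweighted control mean should match the treated mean. All distributions below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# One simulated covariate (say, a test score) that differs by sector.
x_treated = rng.normal(55, 10, size=1000)
x_control = rng.normal(50, 10, size=5000)

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Toy weights proportional to the treated/control density ratio.
psi = normal_pdf(x_control, 55, 10) / normal_pdf(x_control, 50, 10)

mean_treated = x_treated.mean()
mean_control_raw = x_control.mean()
mean_control_rw = np.average(x_control, weights=psi)
print(round(mean_treated, 1), round(mean_control_raw, 1),
      round(mean_control_rw, 1))
```

The raw control mean differs from the treated mean, but the reweighted control mean closes the gap; Table 2 reports the analogous comparison covariate by covariate, with weights from the estimated logit rather than a known density ratio.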
Students’ decisions about where to attend college are based on several factors, including student academic ability, “noncognitive” skills, geographic location, parental income, and parental education. Most of these factors also correlate strongly with labor market outcomes, which means our selection-on-observables approach must be able to account for these different student characteristics in order to identify the causal effect of college quality on the distribution of earnings. The variables we include in our model represent powerful controls for the fact that students of higher academic ability are more likely to graduate from a higher-quality school and to earn more. Although “student ability” is very difficult to account for perfectly, our flexible controls for standardized test scores in three different subjects, each student’s place in their high school’s test score distribution for each exam, and a detailed set of information about each student’s high school composition and academic track are much stronger ability controls than have been used previously.17 Thus, our data allow us to account for student ability in a much more detailed manner than has been done in prior selection-on-observables papers.
In our baseline model, we have far fewer controls for the other factors listed above that influence both college quality and earnings. In Section IVD, however, we provide a series of robustness checks using different samples and controls that provide support for our empirical approach. In particular, we estimate models that include high school fixed effects as well as those that control for parental income and education.18 These estimates suggest our baseline controls adequately account for these factors. Unobserved student attributes related to earnings, such as “student motivation” or “noncognitive” skills, are far more difficult to account for in our setting. At a minimum, we can be confident that the earnings differences across the school quality distribution are not due to academic ability, geography, or parental socioeconomic status.
We argue that our estimates are inconsistent with the existence of biases from these unobserved variables for several reasons. First, the distributions of quantile treatment effects take different shapes across the two flagship schools, even though the admissions processes and standards at these schools are very similar. Second, given the likely correlation of parental income and education, as well as one’s high school, with noncognitive skills, our results would not be robust to including such controls if these student unobservables were a primary driver of our estimates. Third, our mean estimates are similar to, if somewhat smaller than, the estimates from previous work using other sources of identification that are less prone to such biases (for example, Hoekstra 2009, Zimmerman 2014). Together, these arguments suggest our control variables are sufficient to account for the selection of students with higher earnings power into high-quality colleges.
While we present extensive evidence that our estimates are not seriously biased by unobserved student attributes, it still is important to consider what drives the residual selection of students into colleges of differing quality in assessing the validity of our results. Currently, there is little understanding of why students select different school types. Although students are sensitive to quality differences among schools (Long 2004, Avery and Hoxby 2004), this is not the only factor that matters. Choices also depend on idiosyncratic preferences (such as whether a relative attended the school, the quality of the campus visit, the quality of the sports teams), the behavior of one’s peers (for example, whether many peers attend this school), and preferences for different courses of study or educational environments. Due to the detailed set of controls for student academic ability and background we include in our model, we argue these three sources of variation are the predominant residual determinants of college sector selection. Crucially, all of these factors are unlikely to be correlated with underlying earnings potential.
In our data, the change in access to flagship universities that accompanied the implementation of the Top Ten Percent Rule in 1998 also is a potential source of variation in college selection.19 Such variation is important to consider because it could have induced large changes in student sorting across the college quality distribution in the state. Unfortunately, our data do not include class rank, so we cannot control for exposure to the Top Ten Percent Rule directly. Post-1998, the data include an indicator of whether students are admitted to a given university under the Top Ten Percent Rule, which on its own is highly predictive of attending UT-Austin and Texas A&M. However, conditional on the relative test score rank controls in our model, this variable loses its predictive power, suggesting that controlling for relative rank on standardized tests is sufficient to account for the Top Ten Percent Rule and for the effects of student relative rank on flagship admission. This result also suggests that our controls are indeed highly correlated with college sector choices of students. Furthermore, if we control for whether one is admitted through the Top Ten Percent Rule as well as interactions of all variables with a post-1998 indicator variable, our estimates are unchanged.20 Thus, there is little evidence in our data that the imposition of the Top Ten Percent Rule biases our estimates by altering the relationship between student ability and college choice in a manner that we cannot measure.
As implied by Equation 6, the estimated propensity scores must be less than one because the weights are not defined for those who are predicted with certainty to attend a given type of college.21 Our propensity score models generate predicted probabilities of less than one for every individual in our sample. Similar to matching estimators, there also must be overlap of the propensity score distributions among the treated and control observations.22 Without overlap of the propensity scores, there will be individuals in the treated group for whom there are no observably equivalent individuals in the control group. Thus, it would not be possible to construct a counterfactual earnings distribution that would occur if the distribution of observables in the control group was the same as in the treated group because there would be parts of the joint distribution of observable characteristics in the treated group that are absent in the control group. The presence of such nonoverlap in observable characteristics will cause a bias in the estimation of the counterfactual earnings distributions.
Figure 2 presents estimated propensity scores from Equation 8, estimated for each school type separately with the nonflagship universities as the control group. Each panel shows the number of individuals in each grouping of estimated propensity scores among treated and control observations. Due to confidentiality concerns, we are unable to present means calculated with fewer than ten individuals, so these propensity score groups are the smallest equal-sized bins we could construct between zero and one. For no school type are there gaps in the estimated propensity scores, and even for those who have very high and low estimated probabilities of attending each school type, there are observations in the same propensity score range in both the treated and control groups. Although our propensity score models are based on a large number of observable characteristics that are designed to control for student selection into different school quality types, we have sufficient overlap of the predicted likelihood of treatment among treated and control groups to estimate Equation 7.23
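The overlap diagnostic in Figure 2 amounts to binning propensity scores and verifying that every occupied bin contains both groups; a small sketch under these assumptions (function names are illustrative):

```python
def overlap_by_bins(pscores, treated, n_bins=10):
    """Tabulate control (index 0) and treated (index 1) counts in
    equal-width propensity score bins, mirroring the histograms in
    an overlap figure."""
    counts = [[0, 0] for _ in range(n_bins)]
    for p, d in zip(pscores, treated):
        b = min(int(p * n_bins), n_bins - 1)
        counts[b][d] += 1
    return counts

def has_overlap(counts):
    """True if every nonempty bin contains both groups; a bin holding
    only treated (or only control) observations signals that part of
    the treated covariate distribution has no counterfactual support."""
    return all(c[0] > 0 and c[1] > 0 for c in counts if sum(c) > 0)
```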
IV. Results
A. Mean Effects
Before estimating the quantile treatment effects, it is instructive to examine mean effects of college quality on earnings as a point of comparison with the rest of the literature. Table 4 shows estimated mean impacts, regressing log earnings residuals on treatment indicators while adding in controls sequentially. We first show estimates from bivariate regressions; the mean effects all are large and statistically significantly different from zero at the 1 percent level. When we add in our test score controls, the estimates become smaller in magnitude. We then add in student demographic and background characteristics and finally include school-cohort average demographic characteristics. These variables further attenuate the estimates. However, conditional on the test score controls, the other variables do not have large impacts on the results. We interpret this as evidence that our test score measures are powerful controls for selection into school types: even though the demographic, school experience, and school composition variables are each independently correlated with college sector and with future earnings, adding them does little to change the estimates.
Focusing on the estimates with a full set of controls, the mean effect for UT-Austin is 11.5 percent. This estimate is somewhat smaller than in previous work that estimates the returns to attending an elite public university.24 However, it is still positive, statistically different from zero at the 1 percent level, and sizable in magnitude. The mean effect for Texas A&M relative to nonflagship public universities is much higher at 21.2 percent. This estimate is similar to the results from the existing literature, most notably those in Hoekstra (2009) that do not rely on selection-on-observables assumptions. Finally, for community colleges we estimate a mean effect of –10.0 percent on earnings, which is somewhat larger in magnitude relative to previous findings of the effects of community college enrollment on earnings.25 Overall, the similarity of our mean estimates to prior work is inconsistent with the existence of large biases due to unobserved student attributes in our results. But we emphasize that this conclusion is only suggestive due to both the sample differences and the differences in counterfactuals that make cross-study comparisons difficult.
B. Quantile Treatment Effect Estimates
The purpose of this analysis is to determine the impact of college quality on the distribution of earnings. As described in Section III, the quantile treatment effect estimator we employ shows the difference between the observed earnings distribution in a given “treated” postsecondary sector relative to the counterfactual distribution in the nonflagship sector if the distribution of observables in this sector were the same as in the “treated” sector. Appendix Figure A126 shows the observed distributions in each sector as well as the counterfactual distributions that have been adjusted for observables. The QTE estimator provides a way to describe these distributional shifts.
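The quantile comparison can be sketched in a few lines, assuming weighted empirical quantiles for the counterfactual (reweighted control) distribution; this is a stylized stand-in for the estimator, not the authors' implementation.

```python
def weighted_quantile(values, weights, q):
    """Empirical q-quantile of a weighted sample: the smallest value at
    which the normalized cumulative weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum / total >= q:
            return v
    return pairs[-1][0]

def qte(treated_earnings, control_earnings, control_weights, percentiles):
    """Observed treated quantile minus the counterfactual (reweighted
    control) quantile at each requested percentile."""
    ones = [1.0] * len(treated_earnings)
    return [weighted_quantile(treated_earnings, ones, q)
            - weighted_quantile(control_earnings, control_weights, q)
            for q in percentiles]
```

Evaluating `qte` at each percentile from 0.01 to 0.99 traces out a distributional shift of the kind plotted in Figure 3.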
The baseline QTE estimates are shown in Figure 3. In each panel, the solid line comprises 99 quantile treatment effects, one for each percentile from the 1st through the 99th, each the difference between the observed UT-Austin / TAMU / community college distribution and the associated counterfactual other four-year earnings distribution.27 The horizontal dotted lines are the 95 percent confidence intervals of the mean estimates from the “All Controls” specification shown in Table 4, and the dashed lines show the bounds of the 95 percent confidence intervals that are estimated by bootstrapping the entire estimation procedure and plotting the 2.5th and 97.5th percentiles of the bootstrapped earnings differences at each percentile. This bootstrapping procedure takes 200 random samples from the data, with replacement, and conducts the entire estimation procedure on each one. In this way, the confidence intervals account for the fact that the weights used to construct the counterfactual earnings distribution are estimated. Quantile treatment effects for each fifth percentile of the distribution together with bootstrapped confidence intervals are shown in Column i of appendix Tables A1–A3 for UT-Austin, Texas A&M, and community colleges, respectively.
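The percentile bootstrap described above follows a standard recipe: resample individuals with replacement, redo the entire estimation (propensity score model, weights, and quantiles) on each replicate, and read off the 2.5th and 97.5th percentiles of the replicate estimates. A generic sketch, assuming `statistic` wraps the full estimation pipeline:

```python
import random

def percentile_bootstrap(data, statistic, n_reps=200, alpha=0.05, seed=0):
    """Resample rows with replacement, recompute the full statistic on
    each replicate, and report the alpha/2 and 1 - alpha/2 percentiles
    of the replicate estimates as the confidence interval. Because the
    whole pipeline is rerun, the interval reflects estimation error in
    the weights as well as in the quantiles."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_reps)
    )
    lo = reps[int((alpha / 2) * n_reps)]
    hi = reps[int((1 - alpha / 2) * n_reps) - 1]
    return lo, hi
```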
As shown in Panel A of Figure 3, the mean estimates do a poor job of characterizing the earnings premia experienced by most UT-Austin graduates: the quantile treatment effects are outside the mean confidence interval for over 70 percent of the distribution. The effect of UT-Austin graduation relative to nonflagship graduation is decreasing in the first decile, from 12.0 percent to 3.4 percent. However, these estimates are not statistically different from zero at the 5 percent level. After the tenth percentile, the premium for UT-Austin graduation increases dramatically across the earnings distribution. From the tenth percentile premium of 3.4 percent, it increases to 6.3 percent at the 25th percentile, to 12.1 percent at the median, to 16.8 percent at the 75th percentile, and to more than 19 percent above the 90th percentile. The effects are largest at the 97th percentile, at 31.6 percent. Thus, UT-Austin graduation increases earnings more for those higher in the earnings distribution, suggesting that this university is particularly lucrative for top earners.
The shape of Texas A&M estimates in Panel B is much different from that of UT-Austin. For Texas A&M, the largest earnings effects are at the bottom of the distribution, with an estimate of 36.4 percent at the first percentile. Although the coefficients at the lowest part of the distribution are not very precisely estimated, the lower bound of the 95 percent confidence interval is 25.4 percent at the first percentile, and the bottom 20 percent of the distribution is larger than the upper confidence interval bound of the mean. The estimated effects decline until the 30th percentile, where they are 20.6 percent. They remain fairly stable between the 28th and 90th percentiles, ranging between 17.6 and 21.8 percent, after which they increase to 22.8 percent at the 99th percentile. The impact of TAMU graduation on the distribution of earnings clearly is more stable across the distribution than for UT-Austin graduation. The standard deviation of the estimates across quantiles in Panel B is 0.037 while in Panel A it is 0.066 (a 78 percent difference).
That the slopes of the earnings premia for Texas A&M and UT-Austin differ in sign is further evidence against biases from omitted student characteristics. For example, it might be the case that those at the top of the earnings distribution are more motivated, attend higher-quality schools, and earn more. While such a story would explain the upward sloping returns at UT-Austin, it cannot explain why the returns largely decline with earnings at Texas A&M, especially because students at the two flagship schools are similar on observables (see Table 2). Similarly, if student motivation is a substitute for college quality, it should produce the pattern of returns for TAMU but not the one observed at UT-Austin.
A question of interest, then, is why the Texas A&M estimates differ from those for UT-Austin. As Tables 1–3 show, the observable characteristics of these two flagship schools and the students who graduate from them are similar. However, students who graduate from these schools major in very different subjects, as shown in Figure 4. For example, over 44 percent of the undergraduate degrees awarded at Texas A&M are in engineering and agriculture whereas only 19 percent of the degrees awarded to undergraduates at UT-Austin are in those subjects.28 In contrast, 62 percent of the degrees at UT-Austin are in liberal arts, social sciences, communication, math / computer science, and science while only 36 percent of the undergraduate degrees awarded by TAMU are in these subjects.
Using the same log earnings residuals used in the quantile treatment effect estimates for all male college graduates in Texas, Table 5 shows that agriculture and engineering majors earn more on average but have a lower variance of earnings than those in the majors favored by UT-Austin graduates. Consequently, when we control for college major dummies in the weighting logits, the differences in the QTE estimates between Texas A&M and UT-Austin largely disappear. These estimates are shown in Figure 5 along with the associated mean confidence interval bands. Although the Texas A&M distribution is very similar to the one shown in Figure 3, the bottom of the UT-Austin distribution has been shifted upward significantly. Although the earnings premia at the top of the UT-Austin distribution still are higher than at Texas A&M, from an accounting standpoint, the majority of the differences across the flagship universities can be explained by differences in college majors: UT-Austin students major in areas that have lower average earnings and higher earnings variance than TAMU students. Of course, major selection is itself endogenous, and we are hesitant to draw too strong a conclusion from Figure 5 because our controls may not be sufficient to account for the selection of students into majors across schools.29 But, these estimates also point to major choice as a key variable in understanding the returns to enrolling in specific colleges that has not been explored adequately in the literature assessing the returns to college quality. That the returns to both flagships are universally positive and are almost universally statistically different from zero at the 5 percent level suggests that all students graduating from either school can expect positive gross earnings returns on their investment. However, the size and variance of the expected premium is a function of one’s major as well.
In Panel C of Figure 3, we show the effect of graduating from a community college relative to graduating from a nonflagship public university on the distribution of earnings. For nearly the entire distribution of earnings, earnings for community college graduates are below those of nonflagship public university graduates. However, the estimates steadily approach zero as we move across the earnings distribution. The estimates are negative and statistically significant below the 84th percentile, ranging from –21 to –1 percent. From the 85th through the 91st percentiles, the returns continue to be negative, but they are small in magnitude and are not statistically distinguishable from zero. The estimates for the 92nd through the 97th percentiles are positive, with the 95th and 96th percentile effects being statistically different from zero at the 5 percent level. Although these positive estimates are relatively small in magnitude, the results in Panel C suggest that the earnings penalty from community college versus nonflagship four-year graduation only applies to the lower part of the earnings distribution.30 Thus, graduating from a community college is particularly deleterious for the bottom of the distribution. These estimates suggest that the previous literature that estimates negative effects of community college attendance or graduation on earnings is driven by the lower part of the earnings distribution (Reynolds 2012, Kane and Rouse 1995): The mean estimate of –10.0 percent is a very imprecise estimate of the expected return for a randomly selected community college graduate.
Given the large price differences between the two-year sector and the four-year sector, which the tuition estimates in Table 1 likely understate due to the higher prevalence of commuter students at community colleges, the results in Panel C of Figure 3 suggest that it may be optimal for a nontrivial proportion of students to attend a community college in Texas rather than a four-year nonflagship university in Texas.31
To better understand who benefits from graduating from a more elite college, it would be useful to determine whether the earnings premium varies across the socioeconomic distribution. Although we do not observe parental income for a proportion of our sample, we observe student race and ethnicity, which is correlated with socioeconomic status. Due to the large disparities in higher education attainment for African American and Hispanic students relative to white and Asian students, the differential returns to college quality faced by these groups are of much interest. Such differences could be driven by mismatch, by differential academic preparation for college, by differences in major selection, and by labor market discrimination. Most previous work has not been able to identify how college quality earnings premia vary across racial and ethnic backgrounds because of data limitations. Despite the fact that the proportion of black and Hispanic students at UT-Austin and Texas A&M is low (see Table 2), we are able to present some of the first evidence of college quality returns for these groups.
C. Quantile Treatment Effects by Race / Ethnicity
Table 4 presents mean effects of college sector on earnings for white, black, Asian, and Hispanic students. At UT-Austin, white and Asian students have the highest mean returns, at 12.9 percent and 13.8 percent, respectively. Black and Hispanic students experience the lowest earnings premia—3.4 percent for black graduates and 3.5 percent for Hispanic graduates. At Texas A&M, however, the mean returns are both higher and more similar across groups, ranging from 17.9 percent to 21.9 percent. The differences between the earnings premia for black and Hispanic students across UT-Austin and TAMU mirror those for the whole sample shown in Section IVA. Among white, black, and Hispanic community college graduates, the returns are between –8.5 and –12.3 percent. However, for Asian students, the average earnings penalty for community college graduation relative to nonflagship public four-year graduation is –27.1 percent.
These mean estimates are suggestive of a large amount of heterogeneity across sector and student groups in the returns to college quality. We now present quantile treatment effects for each sector and student race / ethnic group to examine how well these mean effects describe the earnings distribution effects for these students. In Figures 6–8, we present only the QTE estimates; appendix Figures A2–A4 show results that include confidence intervals and mean effects, and estimates for each decile along with bootstrapped 95 percent confidence intervals are shown in appendix Table A5.32
Figure 6 shows QTE estimates for UT-Austin separately for white, black, Asian, and Hispanic graduates. The distribution of returns for white students is very similar to the overall sample, showing modest effects low in the distribution and large effects at the top of the distribution. For black and Hispanic graduates, however, the effects on the earnings distributions are quite different.33 Aside from a large increase at the lower percentiles for African Americans, the returns for these underrepresented minority groups are small and often are not statistically significant across much of the earnings distribution. Returns rise substantially at the top of the distribution, matching those for whites, but for most of the bottom 70 percent of the distribution the earnings returns to UT-Austin are small. In contrast, the earnings premium among Asian students graduating from UT-Austin is consistently large across the earnings distribution. Thus, for all but Asian students, the mean estimates do a poor job of characterizing how the earnings distribution is affected by college quality. And, for historically underrepresented minorities, while the average returns are rather low, for portions of the earnings distribution the returns are quite large.
Similar to the estimates in Figure 3, the estimates across all four groups exhibit less variability for Texas A&M than for UT-Austin, as shown in Figure 7. Along most of the earnings distribution, the returns for Asian, Hispanic, and black students are nearly identical. The white distribution is very similar to the overall distribution shown in Figure 3, and it is somewhat larger than the returns for the other groups between the 20th and 80th percentiles. As with the UT-Austin estimates, African Americans at the bottom of the earnings distribution experience a large increase in earnings. However, as appendix Table A5 and Figure A3 show, these estimates are quite imprecise.
Figure 8 shows earnings penalties from finishing at a community college rather than finishing at a four-year nonflagship university stratified by race and ethnicity. The general pattern of estimates is similar across all groups, with graduates earning much less than nonflagship four-year graduates at the bottom of the earnings distribution and this earnings penalty shrinking until it either disappears or reverses at the top of the earnings distribution. The slope of the QTEs is steepest among Hispanics and is flattest among African Americans. However, both of these underrepresented minority groups experience positive returns to community college graduation at the top of the earnings distribution. For African Americans in particular, these returns are sizable, reaching around 16 percent. Among Asian students, community college graduation is associated with lower earnings at every point in the earnings distribution relative to nonflagship four-year graduation. Although the confidence intervals are large (see appendix Table A5 and Figure A4), the negative returns are more pronounced in the lower part of the earnings distribution. Unlike for the other ethnic groups, these returns do not go to zero or become positive high up in the earnings distribution, although the earnings penalty does attenuate for high earners.
The finding that mean community college returns are negative but are positive for the upper portion of the distribution of earnings, particularly for black and Hispanic students, highlights the importance of examining the effect of college quality on the entire distribution of earnings rather than just focusing on the mean. In particular, the lower mean effects for black and Hispanic students relative to white students shown in Table 4 mask the larger negative shift in the middle part of the distribution and the bigger positive shift at the top. Considering both the lower time commitment and the lower cost of community colleges relative to the nonflagship four-year public universities in Texas, the net returns to community college graduation are likely positive for much of the upper portion of the earnings distribution for black and Hispanic college graduates.
D. Robustness Checks
As discussed in Section III, the validity of our estimates rests on the ability of the observable characteristics in our model to control for the selection of students with higher underlying earnings power into higher-quality schools. In this section, we assess the robustness of our estimates to several different modeling assumptions. We show these estimates in Figure 9. For parsimony, we have not included confidence intervals in this figure, but they are shown in appendix Tables A1–A3. First, we include high school fixed effects in the propensity score model. Because one’s high school likely is correlated with unobserved ability, motivation, and noncognitive skills, and because one’s geographic location can influence both college choice and earnings, these results lend insight into remaining selection bias in our preferred estimates. The drawback of this model is that we lose many control group observations because there are numerous high schools in Texas that send no students to Texas A&M and / or UT-Austin. For all three school types, the estimates are virtually identical to those without high school fixed effects.
While our data contain a rich set of covariates with which to control for selection, we do not include parental income and parental education in our baseline estimates. This omission stems from the large volume of missing data for these variables due to the fact that only students who apply to elite schools in Texas are required to supply such information. We show estimates for the sample of students who have parental income and education data, first excluding these variables and then including them. We are unable to do this robustness check for community college students because too few students provide parental background information. However, as Figure 9 and appendix Tables A1 and A2 show, the estimates for the sample of students who provide this information are virtually unchanged when parental income and education are included or excluded. Furthermore, despite the endogeneity of the reporting of these variables, the estimates for this sample are very similar to the baseline estimates. Given the strong correlations between family background, student ability, schooling, and earnings, the fact that adding parental education and income does not influence the quantile treatment effect estimates suggests that the set of controls in our baseline model, which we use for the entire sample, does a good job of accounting for the influence of student ability (including noncognitive skills).
The earnings data we use for this analysis come from earnings when graduates in our sample are in their mid-20s and early 30s. However, if college quality affects the returns to experience, examining earnings differences for recent graduates may yield misleading estimates of the effect of college quality on long-run earnings. In order to examine whether our estimates are sensitive to the timing of when earnings are measured, Figure 9 shows quantile treatment effects using the oldest cohorts: those who graduate between 1996 and 1998 and thus who are between 27 and 31 years old in 2007–2009 when earnings are measured. These students are also unaffected by the Texas Top Ten Percent Rule, so these estimates provide a check that this admissions regime change does not seriously bias our estimates. For UT-Austin and Texas A&M, the estimates using the older sample match the estimates from the whole sample very closely above the 40th percentile of the earnings distribution. Below the 40th percentile, the older workers experience slightly lower returns, suggesting that our baseline sample understates the amount of heterogeneity in returns. For UT-Austin, the estimates are negative below the 15th percentile; however, these estimates are statistically insignificant. Aside from the estimates below the 15th percentile for UT-Austin, the quantile treatment effects of UT-Austin and Texas A&M graduation are similar for the 1996–98 sample and for the full sample.
Among community college graduates, the differences are somewhat larger between the two samples, as shown in Panel C of Figure 9. As with the estimates for TAMU and UT-Austin, the estimates for the 1996–98 sample are below those for the full sample at the bottom of the distribution. Between the tenth and 65th percentiles, the estimates are very similar. However, above the 65th percentile, the early cohort estimates approach zero and then become more negative at the very top, while the full sample estimates approach zero higher up in the distribution. Despite these differences, these estimates are qualitatively similar, and the differences in magnitudes of the returns at the top of the earnings distribution are small. Using the early cohort versus the full sample does not alter the main conclusions drawn from the community college results that for the upper portion of the earnings distribution the returns to community college graduation relative to four-year nonflagship graduation are close to zero and likely are positive once one accounts for the cost differences across school types.
Throughout this analysis, we focus on the returns to college quality among college graduates because graduation is a highly salient signal for employers and because examining only enrollment means the intensity of college quality treatment varies significantly across individuals based on how many credits they receive. Furthermore, for community college students, the graduate sample is likely to be much more similar to BA recipients than is the attendee sample, as many community college students who drop out have loose attachments to the postsecondary sector and do not intend to obtain a BA. However, if the likelihood of graduating is affected by the college sector one attends, as suggested by Bound, Lovenheim, and Turner (2010) and Rouse (1995), then examining only graduates could generate misleading conclusions about the return to college quality. We now turn to a series of robustness checks that provide some evidence on how sensitive our results are to conditioning on degree recipients.
First, we control for residualized GPA in the first year of college in order to account for the fact that the lower tail of the academic ability distribution likely is dropping out. We regress GPA in each student’s first year of college on a series of college indicators in order to account for different grading standards across schools. We then take the residual from this regression and include it in our selection logit model. Because we only observe GPA data for 2000–2002, this analysis is restricted to those cohorts. Appendix Table A7 reports the results for this sample with and without residualized GPA. The baseline estimates are very similar to those shown in appendix Tables A1–A3, except for an attenuated estimate at the bottom of the community college distribution and a larger estimate at the very top. Importantly, the estimates are insensitive to controlling for residualized GPA, which is inconsistent with differential graduation rates across school types, at least as predicted by early college performance, biasing our estimates.34
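The residualization step described above can be sketched in two stages: an OLS regression of first-year GPA on college indicators (absorbing school-specific grading standards), followed by carrying the residual into the selection model as an extra covariate. The sketch below uses synthetic data; the variable names and data-generating process are illustrative assumptions, not the authors’ actual code or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: first-year GPA for 300 students at 3 colleges.
n = 300
college = rng.integers(0, 3, size=n)      # college attended (0, 1, 2)
grading_norm = np.array([3.0, 2.6, 3.2])  # college-specific grading standards
ability = rng.normal(0.0, 0.3, size=n)    # student-specific performance
gpa = grading_norm[college] + ability

# Step 1: regress GPA on a full set of college indicators. With only
# dummies on the right-hand side, the fitted values are college means,
# so the residual strips out school-level grading differences.
X = np.eye(3)[college]                    # one-hot college dummies
beta, *_ = np.linalg.lstsq(X, gpa, rcond=None)
resid_gpa = gpa - X @ beta                # residualized GPA

# Step 2 (not shown): resid_gpa would enter the selection logit as an
# additional covariate, proxying for early academic performance net of
# each school's grading norms.
```

By construction, the residual averages to zero within each college, so it ranks students only against their own school’s grading standard.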
Second, we estimate quantile treatment effects of college quality on earnings for the sample of college attendees, assigning school types based on the institution in which each individual earned the most credits.35 As shown in Figure 9 and appendix Tables A1–A3, for UT-Austin and Texas A&M graduates, the results using attendees are similar to those using graduates. The main differences come at the bottom of the distributions, where the returns for attendees are around three to five percentage points higher. But, the main qualitative and quantitative results remain, with the returns for UT-Austin students increasing across the income distribution and the returns for Texas A&M students declining.36
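The quantile treatment effects discussed above amount, in spirit, to comparing weighted quantiles of earnings between the treated sector and an inverse-propensity-reweighted control sector, in the style of Firpo (2007). The following is a minimal sketch of that idea on synthetic data; the data-generating process, propensity scores, and treatment effect size are all illustrative assumptions, not the paper’s estimator or results.

```python
import numpy as np

def weighted_quantile(x, w, q):
    """Quantile of x under weights w via a step-function inverse CDF."""
    order = np.argsort(x)
    x, w = x[order], w[order]
    cdf = np.cumsum(w) / w.sum()
    return x[np.searchsorted(cdf, q)]

rng = np.random.default_rng(1)

# Hypothetical data: log earnings y, treatment d, known propensity p(x).
n = 2000
p = rng.uniform(0.2, 0.8, n)                 # propensity of treatment
d = (rng.uniform(size=n) < p).astype(float)  # treatment indicator
y = rng.normal(10.0, 1.0, n) + 0.1 * d       # illustrative 0.1 effect

# Weights for the quantile treatment effect on the treated: treated units
# get weight 1; controls get p/(1-p) so their covariate distribution
# mimics that of the treated group.
w_control = p / (1.0 - p)

q = 0.5  # median; in the paper this is traced across many percentiles
qte_med = (weighted_quantile(y[d == 1], np.ones(int(d.sum())), q)
           - weighted_quantile(y[d == 0], w_control[d == 0], q))
```

Tracing `q` over a grid of percentiles yields the kind of distributional return profiles plotted in the paper’s figures.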
For community college attendees, there is more of a divergence from the baseline estimates. Below the 40th percentile, the returns for community college attendees lie above the returns for college graduates, while above the 45th percentile the returns for community college attendees are lower than the returns for two-year graduates. However, near the top of the distribution, the returns among attendees approach zero, with returns higher than –5 percent above the 90th percentile of the earnings distribution. The differences in these results most likely can be attributed to the fact that completion rates at community colleges are very low.37 Thus, graduates and noncompleters are likely to differ substantially with respect to future earnings, with the inclusion of community college dropouts shifting the community college earnings distribution downward. The low completion rate at community colleges also makes the selection problem more difficult to solve among the attendee sample, as the community college dropouts are probably less similar to four-year college attendees on observable and unobservable characteristics. Although the top of the attendee distribution approaches zero and the estimates become statistically indistinguishable from zero, the attendee returns approach zero higher in the distribution than in the graduate sample and never become positive. One implication of this finding is that there may be substantial returns to increasing completion rates among community college students. Estimating the distributional returns to increasing community college completion rates deserves more attention in future research.
As a final robustness check, we include in our analysis all earnings for individuals who are not contemporaneously enrolled in graduate school. This robustness check relaxes the constraint that an individual must work for three to four consecutive quarters for their earnings to be included. We do not favor this method of measuring earnings because we want to measure lifetime earnings differences as best we can with our data. Including earnings of those who may work very few hours part time or who may work part time and then leave the state or the labor market may yield a misleading picture of permanent earnings differences by school type. Using all earnings reduces the returns among UT-Austin graduates at the bottom of the distribution. Thus, we are understating the amount of heterogeneity in returns by examining only full-time earnings. The Texas A&M returns are extremely similar to the baseline estimates, and for community colleges the returns are less negative at the bottom of the distribution. However, we still find that the returns become positive at the top of the distribution for community college graduates in this sample. These estimates demonstrate that our main results and conclusions are not being driven by analyzing earnings from full-time workers only.38
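The full-time earnings restriction relaxed above is, in essence, a filter on runs of consecutive quarters with positive earnings. A hypothetical sketch of such a filter follows; the three-quarter threshold and the worker records are illustrative, and the paper’s exact sample-construction rules may differ.

```python
def max_consecutive(worked):
    """Length of the longest run of consecutive quarters with earnings."""
    best = run = 0
    for w in worked:
        run = run + 1 if w else 0
        best = max(best, run)
    return best

# Hypothetical quarterly work indicators for three workers (one year each).
quarters = {
    "A": [1, 1, 1, 1],  # works all four quarters -> included
    "B": [1, 0, 1, 1],  # at most two consecutive quarters -> excluded
    "C": [0, 1, 1, 1],  # three consecutive quarters -> included
}

# Keep workers with at least three consecutive quarters of earnings.
full_time = {k: max_consecutive(v) >= 3 for k, v in quarters.items()}
```

Dropping this filter, as in the robustness check above, corresponds to keeping every worker-quarter with positive earnings regardless of run length.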
V. Conclusion
This paper estimates quantile treatment effects of college quality on earnings using administrative data on schooling and earnings from the state of Texas. We measure quality using the public college sector in Texas, examining the effects of UT-Austin, Texas A&M, and community college graduation on the distribution of earnings relative to earnings for nonflagship public four-year university graduates. Although our mean estimates are consistent with previous work in this area, our quantile estimates demonstrate a large amount of heterogeneity in the earnings premium from college quality. At UT-Austin, the premia are roughly increasing with earnings, while the opposite pattern is exhibited among Texas A&M graduates. We argue that differences in the courses of study across these schools are a potential explanation for this difference, but these results indicate that much work remains in understanding how the characteristics of a particular university map into the distribution of earnings for graduates. At community colleges, we find an overall negative effect on earnings but show that there is significant heterogeneity in the returns and uncover evidence of positive returns at the very top of the distribution.
Our data also allow us to examine returns separately by race and ethnicity, which previous work has not been able to do because of data limitations. For black and Hispanic students, who are historically underrepresented in higher education in general and at high-quality universities in particular, earnings premia are low for UT-Austin graduates except at the very top of the earnings distribution but are consistently high among Texas A&M graduates. The returns to community college graduation are negative on average; however, for black and Hispanic community college graduates, we find large and positive returns relative to nonflagship public graduates of the same race and ethnicity at the top of the earnings distribution.
In drawing attention to the large amount of heterogeneity in college quality earnings premia and the differences in the quantile treatment effects across school types, this paper demonstrates the importance of considering more than the average treatment effect of college quality on earnings. Even if educational choices made by students are based on such averages, our estimates suggest that these averages mask significant uncertainty in the returns for any given student. The main policy implications of this work are twofold. First, policies seeking to induce students to attend four-year universities and more selective colleges should pay attention to the distribution of returns, not simply the average, in order to target students who will benefit most from changing their attendance behavior. Second, the large public subsidies to higher education do not benefit all students equally, even within school quality tiers. Future work that focuses on identifying which students face the largest predicted returns to attending different school types is needed to help guide how to best fund public universities to support human capital development as well as to develop policies that facilitate the efficient sorting of students into different schools.
Footnotes
Rodney J. Andrews is an assistant professor of economics at the University of Texas at Dallas and a faculty research fellow at the National Bureau of Economic Research. Jing Li is a research assistant at the University of Texas at Dallas. Michael F. Lovenheim is an associate professor of policy analysis and management at Cornell University and a faculty research fellow at the National Bureau of Economic Research.
↵1. Zimmerman (2014) finds similarly sized effects using a regression discontinuity design at a lower-quality public university.
↵2. This is commonly referred to as “mismatch.” See Arcidiacono and Lovenheim (2014) and Dillon and Smith (2013) for a discussion of the evidence on mismatch in postsecondary education in the United States.
↵3. Firpo (2007) distinguishes between the “quantile treatment effect,” which is the quantile analog to the average treatment effect, and the “quantile treatment effect on the treated,” which is the quantile analog to the average treatment effect on the treated. We will use the term quantile treatment effect to refer to the quantile treatment effect on the treated, as that is the parameter we are able to identify.
↵4. Hereafter, we will refer to Texas A&M at College Station as “Texas A&M” or “TAMU.” This university is to be distinguished from the other Texas A&M campuses, which are part of the other four-year sector.
↵5. Public postsecondary schools dominate the higher education market in Texas. In the National Educational Longitudinal Study of 1988, only 9.6 percent of Texas high school graduates who went to college attended a private college. Estimates from first-time, first-year students in IPEDS show about 15 percent of Texas students attend a private school. Thus, our focus on public schools is appropriate given the small proportion of students who enroll in private universities in Texas.
↵6. We treat transfer students who graduate from a given school the same as “direct attendees” who only attend that school. See Andrews, Li, and Lovenheim (2014) for detailed information on student transferring patterns and correlations with earnings in Texas over this time period.
↵7. Among the sample who attend college within two years of high school, the vast majority graduate by the age of 25 with only small differences across sectors. For example, among those who attend within two years and obtain a postsecondary degree by the time they are 30, 90 percent graduate by the time they are 25. The proportions are somewhat larger at UT-Austin and Texas A&M, at 95.7 percent and 97.5 percent, respectively. In the nonflagship four-year sector, 87.7 percent who graduate by the age of 30 do so by 25, and 86.5 percent of community college students do so. There is growing enrollment in community colleges among older students, however, which we do not consider in our analysis. See Jacobson, LaLonde, and Sullivan (2005) and Jepsen, Troske, and Coomes (2014) for some recent estimates of average returns to community college training among older students.
↵8. The large increase in unemployment rates during this period in the United States also is a potential concern. Because our data only include full-time workers, the recession may cause us to overstate quality premia to the extent that unemployment increases were inversely proportional to college quality. However, the recession in Texas was relatively mild: The average unemployment rate was 5.4 percent between 2004 and 2006 and was 5.6 percent between 2007 and 2009. In contrast, for the United States as a whole, the unemployment rate increased from 5.1 percent to 6.6 percent across these two periods. Furthermore, we show below that our results and conclusion are robust to including all earnings observations, not just those from full-time workers.
↵9. The “college plan” variable is an indicator that assumes a value of one if a student plans within one year of graduation to enter college in a program that leads to either an Associate’s Degree or a Bachelor’s Degree. This information comes from student reports in the Public Education Information Management System, which is collected by the TEA.
↵10. This result may be due, in part, to the fact that we only observe graduate school attendance if it is within the state of Texas. More graduates from nonflagship universities who attend graduate schools probably do so in-state.
↵11. The difference at the bottom of the distribution could be due to the inclusion of Blinn College students. Blinn College is a two-year school with a very high four-year transfer rate. If those students are lower earners and are more likely to remain in-state, the presence of these students in this sample could cause a divergence in earnings at the bottom of the distribution.
↵12. We also do not find evidence of differences in the likelihood of having any earnings in the Census by school type.
↵13. See Angrist, Chernozhukov, and Fernandez-Val (2006) for conditional quantile treatment effects of education on wages. Carneiro, Hansen, and Heckman (2003) also show substantial heterogeneity and uncertainty in the returns to attending college.
↵14. Much of the QTE literature discusses the role of the rank-permanence assumption as well, which requires that the treatment not change individuals’ relative places in the earnings distribution (Doksum 1974; Lehmann 1974; Firpo 2007; Bitler, Gelbach, and Hoynes 2006). This assumption is necessary in order to interpret QTE estimates as identifying the distribution of treatment effects rather than the effect of the treatment on the distribution of outcomes. That is, our estimates identify the effect of a given school type on median earnings, but in order to interpret them as the median treatment effect, rank permanence is needed. Our analysis focuses on estimating how college quality affects the distribution of earnings, which is what is most relevant for welfare analysis and which does not require rank permanence to hold.
↵15. Because students make one decision over a set of schools, it may be more appropriate to estimate the weighting function using a multinomial logit model of the choice between all college sectors. We have implemented this model for the choice among our three four-year public school sectors and the estimates, which are available upon request, are indistinguishable from those shown below.
↵16. We also have estimated more flexible models that include interactions between each race/ethnicity and demographic characteristic with each of the three exam scores. These estimates, shown in Online Appendix Table A4, are virtually identical to our baseline results that use a more parsimonious model. The robustness of our estimates to using a more flexible model is consistent with the results in Table 2 that show the weights obtained from estimating logistic regressions of Equation 8 balance the distribution of observables across school types as they relate to earnings.
↵17. Black and Smith (2004, 2006) use the 1979 National Longitudinal Survey of Youth (NLSY79) for their analyses. The main ability measure in those data is the Armed Forces Qualifying Test. Brewer, Eide, and Ehrenberg (1999) use the National Longitudinal Survey of the High School Class of 1972 and High School and Beyond in their analysis, which contain exams developed by the National Center for Education Statistics. Although each of these data sets contains strong measures of student academic aptitude at the end of high school, our data contain a larger battery of exams as well as within-school rank on each exam and high school-level demographic information that is not present in these other data sets.
↵18. Parental education and income data come from college application materials, and only the more selective schools asked students for this information. Analyses using these data therefore are restricted to four-year college students.
↵19. The Top Ten Percent Rule—Texas House Bill 588—is an admissions algorithm that grants admission at any public university in Texas to Texas high school graduates who finish in the top decile of their graduating class and apply to college within two years of finishing high school. It also permits each college or university to decide, on an annual basis, whether or not to offer automatic admission to students who finish in the top quartile. Prior to 1998, admissions included affirmative action considerations, which allowed colleges to consider factors such as the applicant’s race, academic performance, class rank, curriculum, and standardized test scores. To our knowledge, there were no strict formulas relating these factors to admission decisions.
↵20. These estimates are available from the authors upon request.
↵21. Note as well that those with zero predicted likelihood of attending each college type will receive a weight of zero. These observations are effectively excluded from the analysis.
↵22. See Smith and Todd (2005) for a detailed discussion of this issue with respect to matching estimators.
↵23. Flores and Mitnik (2013) present a method for generating common support among treatment and control groups when there are multiple treatments. They emphasize that in order to compare estimates for different treatments, the same sample with general common support needs to be used. Their method is to delete observations sequentially that are not in the common support for each treatment, such that the final sample is the sample for which there is support with respect to each treatment individually. Since we do not exclude any observations for any treatment due to the lack of common support among treatment and controls, this sequential support method leaves exactly the same analysis sample as we use in the analysis. Because of this feature of our data, comparisons of the quantile treatment effects across treatment sectors are valid.
↵24. Hoekstra (2009) estimates that the earnings effect of attending an unnamed flagship state university is 24 percent. Brewer, Eide, and Ehrenberg (1999) estimate a mean return of 26 percent. Note that both of these papers focus on attendance, while we examine the effect of graduation. This difference may cause our estimates to be smaller if part of the return to attending a flagship university is increasing the likelihood of finishing. Furthermore, in the Hoekstra (2009) analysis, the counterfactual for those not attending the flagship is a mix of lower-ranked four-year attendance, community college attendance, and noncollege attendance. This counterfactual may cause the estimated earnings returns to be higher than if a counterfactual of nonflagship four-year universities is used. Similarly, in Brewer, Eide, and Ehrenberg (1999), the counterfactual is “bottom public” universities, which are likely to be of lower quality on average than the nonflagship public universities in Texas. This will serve to increase the estimated earnings returns.
↵25. Reynolds (2012) estimates a decline in earnings of about 5 percent due to community college relative to four-year attendance. Kane and Rouse (1995) find that community college and four-year credits are equally valued by the labor market but that community college students earn less than four-year students because they accumulate fewer college credits.
↵26. All appendix tables and figures referenced in this paper can be found online at http://jhr.uwpress.org/.
↵27. We also have split up the “other four-year” schools into two quality tiers, consisting of the schools vying for Tier 1 status in the top tier (Texas Tech University, The University of Houston, The University of North Texas, The University of Texas at Dallas, The University of Texas at Arlington, The University of Texas at San Antonio, and The University of Texas at El Paso) and all other schools in the lower tier. Estimates using the lower tier as a control group are very similar to those presented below, and thus our grouping of all nonflagship schools together is not masking important quality heterogeneity in this sector. These results are available from the authors upon request.
↵28. While there is no agriculture major at UT-Austin, several nonflagship four-year schools in Texas have such a major, most notably the Texas A&M branch campuses.
↵29. Arcidiacono (2004) finds large differences in returns to different majors and argues that much of the ability sorting into majors is due to student preferences rather than perceived monetary returns. However, Arcidiacono, Hotz, and Kang (2012) and Wiswall and Zafar (2011) both find evidence from direct surveys of students about subjective beliefs regarding returns to and preferences for majors that expected returns are a driver of student preferences for majors. Controlling for initial field of interest thus might be more desirable, but unfortunately our data do not contain such information. Thus, although the estimates in Figure 5 may not represent causal estimates due to the endogeneity of college major choices, they still are interesting as descriptive evidence of the importance of college majors for the distribution of earnings differentials across college sectors.
↵30. It is unlikely that the estimates at the top of the distribution for community colleges are due to transferring. Recall that these estimates are for graduates, so any transfers to four-year schools who obtain a BA are counted as four-year graduates in our analysis. If some of the community college graduates transfer to a four-year school but do not obtain a BA, their earnings could increase. But, they still likely would earn less than comparable BA recipients, which is at odds with the equivalent earnings of nonflagship four-year and community college graduates at the top of the earnings distribution shown in Figure 3.
↵31. Note that controlling for majors in the community college estimates is difficult, because many community college degrees are in areas that are not offered in four-year schools. Thus, we do not estimate community college models with major controls as we do for the public flagships.
↵32. We also have generated propensity score distributions akin to Figure 2 separately by race and school type. These distributions show that there is common support between treatment and control across all school types and race and ethnic groups. These graphs are available from the authors upon request.
↵33. Note that the distribution of majors is remarkably similar across race and ethnic groups, such that differences in majors across groups are unlikely to account for the differences in estimated returns.
↵34. The estimates that control for first-year GPA also suggest that academic “mismatch” is not a primary driver of the shape of the return distributions we estimate. There is some evidence of mismatch in undergraduate education, particularly at elite schools (Arcidiacono and Lovenheim 2014). Such mismatch could plausibly be driving both the high returns at the top of the community college distribution and the low returns at the bottom of the UT-Austin distribution. However, the results in Table A7 are inconsistent with such a hypothesis.
↵35. Estimates are very similar when we assign students based on the first college attended after high school.
↵36. Online Appendix Table A6 presents estimates by race and ethnicity for the attendee sample. The same differences apply to these samples: Returns are higher at the lower end of the distribution among attendees at UT-Austin and Texas A&M. The only substantive difference comes from black UT-Austin attendees, who experience returns of over 10 percent below the 30th percentile.
↵37. In the National Educational Longitudinal Study of 1988 (NELS:88), only 20 percent of community college attendees earned an Associate’s Degree among those who attended within two years of high school graduation. In our sample, only 14 percent of attendees who started at a community college finished with any type of degree, be it from a community college or otherwise. The analogous completion rates for UT-Austin and TAMU are 77 and 82 percent, respectively.
↵38. We have generated estimates using all nongraduate school earnings for the attendee sample as well, and those estimates are similarly robust to the use of this alternative earnings measure.
- Received June 2014.
- Accepted October 2014.