## Abstract

We consider the effects of student ability, college quality, and the interaction between the two on academic outcomes and earnings, using data on two cohorts of college enrollees. Student ability and college quality strongly improve degree completion and earnings for all students. We find evidence of meaningful complementarity between student ability and college quality in degree completion at four years and in long-term earnings, but not in degree completion at six years or STEM degree completion. This complementarity implies some trade-off between equity and efficiency for policies that move lower-ability students to higher-quality colleges.

## I. Introduction

How students of varying ability sort into colleges of varying qualities has captured the attention not only of academic researchers studying higher education but also of the policy literature, the popular press, and the blogosphere. Until the last decade, the literature focused almost exclusively on outcomes for relatively low-ability students at high-quality colleges, particularly students admitted under racial and ethnic preference policies at selective colleges. More recently, high-ability students at relatively low-quality colleges have moved into the spotlight via the widely read studies by Bowen, Chingos, and McPherson (2009) and Roderick et al. (2008). Despite this recent focus on student–college sorting, the literature offers few credible estimates of the main object of interest: the interaction effects of student ability and college quality.

We ask whether the effects of attending a higher-quality college vary by student ability. Many studies conclude that increased college resources improve average student outcomes. These main effects of college quality should inform individual students’ preferences over colleges, but cannot speak to the efficiency of various approaches to sorting students into colleges. In the presence of differential effects, resorting students via policy, even when respecting existing capacity constraints, has the potential to produce gains or losses in both efficiency and equity. In contrast, if the effects of college quality do not vary by student ability, then resorting can yield only equity gains or losses. The applied theory literature on college sorting, such as Rothschild and White (1995) and Sallee, Resch, and Courant (2008), posits complementarities between student ability and college quality; casual discussion about “fit” often presumes them. Under this assumption, positive assortative matching maximizes output. We look for evidence of these complementarities or of alternative relationships. Knowledge of the heterogeneous effects (if any) of college quality has clear value to students and parents making decisions about college enrollment, as well as to researchers and policymakers concerned with the design, operation, and effects of state university systems with diversified quality portfolios.

Our analysis contributes to the small but growing literature on the effects of student–college sorting in a number of important ways. First, much of the literature frames the discussion of student–college sorting in terms of match, with relatively low-ability students at relatively high-quality colleges labeled “overmatched” and relatively high-ability students at relatively low-quality colleges labeled “undermatched.” While we retain these practical labels for the two types of deviations from full assortative matching, we clarify the conceptual distinction between the main effects of college quality and student ability and their interaction when interpreting outcomes for these groups of students.^{1}

Second, we present estimates from the 1979 and 1997 cohorts of the National Longitudinal Survey of Youth (hereinafter NLSY-79 and NLSY-97). By considering two different data sets, we can examine the stability of our estimates between cohorts of college students separated by more than two decades, during which post-secondary education in the United States changed in important ways. We code our outcome and conditioning variables and design our analyses in the same way for the two data sets in order to make our analysis across cohorts as compelling as possible. Our analysis also implicitly replicates (in a broad sense) and extends the earlier analyses of the college quality main effect in the NLSY-79 presented in Black, Daniel, and Smith (2005).

Third, we examine a variety of outcome measures. With a couple of important recent exceptions discussed in more detail below, the earlier literature focuses primarily on degree completion. The early finding of Bowen and Bok (1998) that there was no apparent impact on degree completion for overmatched students suggested to us that these students might find other ways to deal with better-prepared colleagues and a high-pressure environment. For example, they might follow the increasingly common path of increased time to degree, as highlighted in Bound, Lovenheim, and Turner (2010). Or they might follow scholarship athletes at some colleges in taking easy courses, as suggested in journalistic exposés, such as Steeg et al. (2008) and Thompson (2008). Or they might transfer to another school. Our examination of transfers, highlighted by Arcidiacono and Lovenheim (2016) as an understudied outcome, as well as of earnings in the years immediately following college enrollment, tells us more about the mechanisms through which college quality and ability affect educational and labor market outcomes. Our analysis of earnings up to 11 years after initial enrollment quantifies the medium-term labor market effects of college quality, ability, and their interaction. For the NLSY-79 cohort, we present estimates for earnings up to 30 years following college start, the first longer-term estimates of college quality effects that vary with student ability.

Fourth, following Black and Smith (2006) and our earlier analysis of the determinants of college quality choice in Dillon and Smith (2017), we use composite indexes as our measures of student ability and college quality. We expect these measures to embody substantially less measurement error than the single measures (for example, the student’s own SAT score and the average SAT score of the entering class) commonly used in the literature and thus to provide more accurate estimates.

Finally, we make an explicit case for our “selection on observed variables” identification strategy, which we view as credible in our context. Relative to the small set of existing studies of college sorting that use this identification strategy, we have a richer and more compelling set of relevant conditioning variables. Moreover, the literature provides strong evidence of the importance of factors likely conditionally unrelated to outcomes in driving college choice; these factors provide exogenous variation in college quality. Unlike earlier papers, we show that our estimates stabilize as we add marginal sets of conditioning variables.

We find substantial deviations from full positive assortative matching in both cohorts, with both high-ability students at low-quality colleges and low-ability students at high-quality colleges. These deviations modestly decline from the NLSY-79 to the NLSY-97. Our examination of the effects of ability, college quality, and their interaction reveals substantively strong and statistically significant main effects of college quality and student ability on degree completion and earnings. These marginal effects of college quality are always positive, even for relatively low-ability students, indicating that resorting policies like affirmative action likely pass a private benefit–cost test, on average, for the affected students. College quality matters relatively more, compared to ability, in the later cohort.

We find evidence of a causal effect of the interaction of quality and ability, but only for certain outcomes. We find clear evidence of complementarity between student ability and college quality in time to degree in both cohorts: abler students benefit relatively more from college quality for this outcome. We also find clear evidence of complementarity for long-run earnings outcomes in the NLSY-79 cohort; this pattern shows up in the point estimates for the later cohort as well. These patterns indicate an equity–efficiency trade-off associated with policies that increase the enrollment of (relatively) less able students at high-quality colleges.

Looking at mechanisms, we find some evidence that transfers tend to reduce gaps between student ability and college quality: undermatched students have a higher conditional probability of transferring up, though overmatched students do not have a similarly higher conditional probability of transferring down. Unlike Arcidiacono, Aucejo, and Hotz (2016), we find no interaction effects related to STEM (science, technology, engineering, and math) degree completion.

## II. Literature

Consideration of how the effects of college quality vary with student ability represents a natural extension of the broader literature on the causal effect of college quality on student outcomes.^{2} If students benefit from greater college resources, then we expect academic undermatch to be mechanically costly for students, even without any role for student–college interactions, because it implies attending a lower-quality college. Overmatch is likewise mechanically beneficial. We review the subset of the literature on college quality that explicitly considers heterogeneous effects of college quality that vary with student ability.^{3}

A few papers devote their full attention to the ways in which students with particular characteristics sort into colleges of different qualities. Our earlier work, Dillon and Smith (2017), uses the same NLSY-97 data we employ here and finds important roles in explaining college quality choices for financial constraints, as well as for the in-state public college options available to the student. We show that more informed students and parents, proxied by variables such as parental education and the fraction of high school peers attending four-year colleges, act as though they believe that the main effect of college quality dominates any negative effects of academic overmatch. Smith, Pender, and Howell (2013) conduct a similar analysis using different data sets and a different definition of match. Reassuringly, they obtain similar findings. Lincove and Cortes (2019) use administrative data from Texas and find, among other interesting patterns, an important role for “social matching,” which they define as attending a college with a high share of students in one’s own racial or ethnic group. Hoxby and Avery (2012) and Hoxby and Turner (2015) emphasize the role of information about college choices in driving college quality decisions for high-achieving students from disadvantaged backgrounds, while Griffith and Rothstein (2009) highlight the role of geographic distance for all students.

The studies most similar to ours examine heterogeneous college quality effects using “selection on observed variables” identification strategies to deal with nonrandom selection of students into colleges of varying qualities.^{4} None of these studies find meaningful interaction effects, though several identify strong main effects of both student ability and college quality.^{5} Mattern, Shaw, and Kobrin (2010) use data from a large number of colleges and a relatively limited set of observed characteristics, with student and college quality both measured using SAT scores and discretized into quartiles. They study how the effect of college quality on first-year college GPA and persistence into the same college in the second year varies with student ability. The analysis in Chingos (2012) resembles our own in imposing capacity constraints but employs different data (the National Educational Longitudinal Study), less refined measures of student ability and college quality, a linear specification in college quality and student ability, and a less compelling set of conditioning variables. Black, Daniel, and Smith (2005) look at the simple interaction of student ability and college quality in the context of a parametric linear model of log wages applied to the NLSY-79 data. We compare our results to theirs in Section VI.G. Finally, Bowen, Chingos, and McPherson (2009) examine heterogeneous effects of college quality on college completion using impressive administrative data from various state university systems. They use relatively less refined selectivity categories for universities, a modest conditioning variable set, and high school GPA and/or ACT/SAT scores as their measure of student ability.^{6}

Light and Strayer (2000) also look at how the effect of college quality varies with student ability using the NLSY-79, but employ an empirical approach that differs substantially from ours. They consider two sequential choices. The first choice, which they model as a multinomial probit, consists of either not attending four-year college (which combines entering the labor force and attending a two-year college) or going to college in one of four ordered quality quartiles. The second choice, which they model as a probit, concerns college completion. To address the potential for nonrandom selection on unobserved variables, they allow correlated errors between the two choices. Identification comes from conditioning on observed variables, from some (arguably implausible) exclusion restrictions, from some restrictions on the coefficients on the interactions of student ability and college quality, and from restrictions on the covariance matrix of the errors in the college choice model. Their estimates reveal substantively important heterogeneous effects that imply worse (average) outcomes for some lower-ability students at higher-quality colleges; we say more about why their qualitative findings differ from ours in Section VI.G.

Another genre of studies focuses primarily on academic overmatch. Bowen and Bok (1998) find strong positive college quality effects on degree completion and earnings among black students attending the selective schools included in the “College and Beyond” data, which likely rule out negative net effects from overmatch within this group. Arcidiacono, Aucejo, and Spenner (2012) study Duke University, where the average African-American student starts out somewhat less prepared academically than other students, presumably due to affirmative action, but somewhat more likely to express a desire to major in the natural sciences, engineering, or economics. The authors find that black students at Duke differentially migrate away from these majors toward majors in the humanities or other social sciences and show that this pattern results almost entirely from their differential preparation. Put into our conceptual framework, their findings suggest that relatively overmatched black students at Duke adapt by changing majors within an institutional context where essentially all students finish their degree. Arcidiacono, Aucejo, and Hotz (2016) continue this line of work by examining students at different University of California (UC) campuses entering school between 1995 and 1997, years prior to the ban on affirmative action in admissions in that state. The authors provide compelling evidence that overmatched minority students at UC schools who intend a STEM major have lower probabilities of completing any degree at a UC school and of graduating in STEM.^{7}

A final group of papers uses “natural” experiments or discontinuities to isolate the experiences of students on the margin of admission to a higher-quality college. Hoekstra (2009) began this strand of the literature by using a discontinuity in student SAT scores (conditional on high school GPA) to examine the effects of admission to and enrollment in a state flagship university. He finds large positive effects of flagship acceptance and attendance on earnings 10–15 years after high school completion for men but surprisingly small effects for women. Kurlaender and Grodsky (2013) exploit an unusual event in 2004 in which the admissions offices of UC schools initially offered their marginal acceptances deferred admission due to budget issues but were later able to offer them regular admission. They find that marginal students accumulate fewer credits compared to similar students at lower-ranked UC schools, but have higher graduation probabilities. Thus, they find evidence of overmatch effects on the intensive course-taking margin, but the college quality effect dominates for degree completion. The estimates in this group of papers correspond to quite narrowly defined populations of marginal students. Moreover, these findings shed only indirect light on match. The nature and extent of overmatch for students at a college quality margin depend on the particular definition of match employed, as well as on the quality of the student’s next best alternative and the homogeneity of student ability within each college. Still, at the very least, the evidence from these papers stands at odds with large, negative consequences of what we might call local overmatch.^{8}

Overall, we view the literature as providing strong evidence of causal effects of college quality and student ability on academic and labor market outcomes. In contrast, most but not all of the literature finds little in the way of interaction effects, other than on intermediate outcomes such as transfer and major choice.

## III. Data

### A. NLSY

We use the NLSY-79 data, which includes Americans who were ages 14–22 on January 1, 1979, and the NLSY-97 data, which includes Americans who were 12–16 years old as of December 31, 1996. In both cohorts, participants were interviewed annually starting in 1979 and 1997, respectively, and continuing through their college years. They have been interviewed biennially since 1994 and 2011, respectively. We include the representative samples from each survey along with the supplemental samples of blacks and Hispanics.^{9} Most respondents in the NLSY-79 sample graduated from high school and made their college choice between 1975 and 1983, while the NLSY-97 sample did the same between 1998 and 2002. We focus on students who enroll in a four-year college by age 21 (39 percent of high school graduates and GED holders in the NLSY-79 sample and 36 percent in the NLSY-97 sample).

One of the strengths of the NLSY data for both cohorts lies in the rich set of individual and family covariates it provides.^{10} Using the restricted access geocode data provides additional information on the identities of colleges attended and allows the use of contextual information based on the respondent’s residential location. The following sections describe our ability and college quality measures, as well as our outcome variables; Appendix Tables A1 and A2, along with Online Appendix Tables A1 and A2, describe the construction of our analysis sample and summarize our conditioning variables.

### B. Ability

We follow Dillon and Smith (2017) in designing our measures of student ability and college quality for the NLSY-97 sample and construct comparable measures for the earlier NLSY-79 cohort. Our measures of student ability draw on the Armed Services Vocational Aptitude Battery (ASVAB). In the 1997 cohort, 86 percent of respondents who started at a four-year college completed the test; the corresponding number for the 1979 cohort is 93 percent. We use the method developed by Altonji, Bharadwaj, and Lange (2012) to construct comparable measures of eight of the ASVAB test components common to the two cohorts, adjusting for the transition between 1979 and 1997 from pen-and-paper to computer adaptive testing and for the varying ages at which the respondents took the test. We do not use the scores on the purely vocational components.

We then construct the first two principal components of these eight section scores. Our primary measure of ability, which we call ASVAB1, equals each respondent’s percentile of the first principal component within the sample distribution of college-bound respondents in their NLSY cohort.^{11} As shown in Online Appendix Table A3, the first principal component explains 68 percent and 66 percent of the total variance in test scores across the eight sections for the 1979 and 1997 cohorts, respectively. In both cohorts, the first component places the highest weight on academic subjects, such as arithmetic reasoning and paragraph comprehension. Not surprisingly, given the loadings, the correlation between ASVAB1 and the respondent’s SAT (or rescaled ACT) score equals 0.79 in NLSY-79 and 0.80 in NLSY-97.

The second component of the ASVAB scores, which we call ASVAB2, explains a further 10–11 percent of the variance. As in Cawley, Heckman, and Vytlacil (2001), a similar analysis using the NLSY-79 data, the second component places the most weight on the two timed sections of the test: numerical operations and coding speed. We include ASVAB2 as an additional control variable in our multivariate analyses. To capture further dimensions of ability, we also include high school GPA and SAT scores, along with multiple proxies for noncognitive or socioemotional skills.^{12}
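A minimal sketch of the principal-components step follows. It assumes the eight section scores have already been adjusted for test format and age (that adjustment is not shown), and it uses a correlation-matrix decomposition and a midpoint percentile convention, both of which are illustrative choices rather than a description of the authors' code. The function name `asvab_components` is hypothetical.

```python
import numpy as np

def asvab_components(scores):
    """Return (PC1 percentile, PC2 score, variance shares of PC1 and PC2).

    scores: (n_respondents, 8) array of adjusted section scores for the
    college-bound respondents of one cohort (hypothetical input layout).
    """
    # Standardize sections so the decomposition uses the correlation matrix.
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(scores, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvector signs are arbitrary; orient PC1 so higher scores rank higher.
    if eigvecs[:, 0].sum() < 0:
        eigvecs[:, 0] = -eigvecs[:, 0]
    pcs = z @ eigvecs[:, :2]
    shares = eigvals[:2] / eigvals.sum()
    # Percentile of PC1 within the sample (midpoint convention, 0-100 scale).
    ranks = pcs[:, 0].argsort().argsort()
    pc1_pct = 100.0 * (ranks + 0.5) / len(ranks)
    return pc1_pct, pcs[:, 1], shares
```

In this sketch, the first return value plays the role of ASVAB1, the second that of ASVAB2, and `shares` corresponds to the kind of variance shares reported in Online Appendix Table A3.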

### C. College Quality

We construct a one-dimensional index of college quality by combining measures related to selectivity and college resources. The available data limit us to using measures of inputs as proxies for quality, but we note that the value-added estimates of Hoxby (2015) correlate with one important component of our index, namely college selectivity. In particular, our index combines the mean SAT or ACT score of entering students, the percent of applicants rejected, the average salary of all faculty engaged in instruction, and the undergraduate faculty-to-student ratio. We combine data from the U.S. Department of Education’s Integrated Post-Secondary Education Data System (IPEDS) and *U.S. News and World Report*, using data from 1992 for the NLSY-79 and data from 2008 for the NLSY-97.^{13}

Following Black and Smith (2004, 2006) and Black, Daniel, and Smith (2005), we estimate the principal components across these four measures of quality. We use the eigenvector of the first principal component (reported in Online Appendix Table A4) to calculate a weighted average of the proxies available for each college, then calculate percentiles of this average across colleges.^{14} We interpret our index as an estimate of latent college quality, which we view as continuous and one-dimensional. Combining multiple proxies into a single index measures latent quality with less error than a single proxy. Our index reveals remarkable stability in college quality between our two cohorts: weighting by full-time undergraduate enrollment, the correlation between a college’s index values in the two cohorts equals 0.86.^{15} Our measure does not capture differences in the quality that different students experience within the same university due to, for example, quality differences across fields of study or participation in honors programs. Our index also speaks only indirectly to absolute differences in college quality. In practice, the four individual quality proxies underlying our index increase modestly but steadily with the index for the bottom 90 percent of four-year colleges but more steeply for the top 10 percent of colleges. Figure 1 documents this pattern for expenditures per student, a measure we do not include in our index but that correlates strongly with its components. This very general scaling issue with latent indexes, emphasized in this literature by Bastedo and Flaster (2014), also applies to the two other most common proxies for college quality in the literature: the mean SAT score of the entering class and the Barron’s selectivity categories.
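The text says only that the first-component weights are applied to "the proxies available for each college." The sketch below assumes that each proxy is standardized across colleges and that the weights are renormalized over whichever proxies a college reports. Both assumptions, and the function name `quality_index`, are ours and illustrative.

```python
import numpy as np

def quality_index(proxies, weights):
    """Combine college quality proxies into a single index value per college.

    proxies: (n_colleges, k) array with NaN where a proxy is unavailable.
    weights: length-k first-principal-component eigenvector (assumed positive).
    """
    proxies = np.asarray(proxies, dtype=float)
    # Standardize each proxy over the colleges that report it.
    z = (proxies - np.nanmean(proxies, axis=0)) / np.nanstd(proxies, axis=0)
    # Zero out the weight on any proxy a college does not report.
    w = np.where(np.isnan(z), 0.0, np.asarray(weights, dtype=float))
    # Weighted average over available proxies (weights renormalized by row).
    return np.nansum(z * w, axis=1) / w.sum(axis=1)
```

A college reporting no proxies at all would get a division by zero here, so such colleges would need to be dropped before calling the function.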

We analyze the quality of the first four-year college a student attends, rather than the quality of the last college attended as in Black, Daniel, and Smith (2005) and some other studies. Our concern with how students initially sort into colleges motivates this choice: treating the quality of the first college as the choice variable allows us to treat subsequent transfer and completion choices, some of which may result from initial college choices, as intermediate outcomes on the way to earnings effects.

### D. Sorting Among Colleges by Student Ability

To assess the degree of sorting across colleges by student ability we consider the joint distributions of the student ability and college quality measures just described. As we weight the quality percentile by student body size, a college in the *n*th quality percentile is the college that a student in the *n*th ability percentile would attend under positive assortative academic matching.^{16} We label substantial deviations from this type of sorting as overmatch and undermatch. One appealing feature of our measures is the possibility of achieving full assortative matching without violating institutional enrollment constraints. The measures employed in other studies in the literature, such as Roderick et al. (2008); Bowen, Chingos, and McPherson (2009); and Smith, Pender, and Howell (2013) lack this feature.^{17}
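Weighting by student body size is what lets a quality percentile line up with an ability percentile. A minimal sketch, assuming each college is assigned the midpoint of its slice of cumulative enrollment (the exact convention and the function name are assumptions):

```python
import numpy as np

def enrollment_weighted_percentile(index, enrollment):
    """Percentile of each college's quality index, weighting colleges by size.

    A college's value p is the share of students (in percent) at
    lower-ranked colleges plus half of its own students' share.
    """
    index = np.asarray(index, dtype=float)
    enrollment = np.asarray(enrollment, dtype=float)
    order = np.argsort(index)                    # lowest quality first
    sizes = enrollment[order]
    cum = np.cumsum(sizes)
    below = cum - sizes                          # students at lower-ranked colleges
    pct_sorted = 100.0 * (below + sizes / 2.0) / cum[-1]
    pct = np.empty_like(pct_sorted)
    pct[order] = pct_sorted                      # back to the input order
    return pct
```

With three colleges enrolling 300, 600, and 100 students in ascending quality order, the percentiles are 15, 60, and 95, so under full assortative matching the bottom 30 percent of students by ability would fill the lowest-quality college.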

Table 1 gives the joint distributions of student ability and college quality for the 1979 and 1997 cohorts, with both variables discretized into quartiles. In both cohorts, students differentially concentrate along the diagonal, indicating positive assortative matching. The four diagonal cells account for 34.3 percent of students in 1979 and 37.2 percent in 1997, rather than the 25 percent implied by random sorting. The three upper right cells, corresponding to low-ability students at high-quality colleges, account for 11.3 percent of students in 1979 and 10.2 percent in 1997, while the three lower left cells, corresponding to high-ability students at low-quality colleges, account for 14.9 percent in 1979 and 13.2 percent in 1997.^{18} Thus, we find substantial departures from full assortative matching in both cohorts. Viewed longitudinally, our data (perhaps surprisingly, given the recent policy focus on match) reveal only a small, though meaningful, increase in the correlation between student ability and college quality.^{19}
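The diagonal and far off-diagonal shares can be computed from student-level quartiles as in the sketch below. It reads "upper right" as the three cells in which the quality quartile exceeds the ability quartile by at least two, and "lower left" symmetrically, which is our interpretation of the table layout. Under random sorting every cell holds 1/16 of students, so the diagonal holds 4/16 = 25 percent, the benchmark cited in the text.

```python
import numpy as np

def match_shares(ability_q, quality_q):
    """Shares of students on the diagonal and in the far off-diagonal cells.

    ability_q, quality_q: integer quartiles 1-4 for each student.
    Rows of the table index ability, columns index quality.
    """
    table = np.zeros((4, 4))
    np.add.at(table, (np.asarray(ability_q) - 1, np.asarray(quality_q) - 1), 1)
    table /= table.sum()
    gap = np.arange(4)[None, :] - np.arange(4)[:, None]   # quality minus ability
    return {
        "diagonal": table[gap == 0].sum(),
        "overmatch": table[gap >= 2].sum(),    # three upper right cells
        "undermatch": table[gap <= -2].sum(),  # three lower left cells
    }
```

Passing each of the 16 quartile combinations once mimics random sorting and returns 0.25 for the diagonal and 0.1875 for each of the two far off-diagonal regions.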

### E. Outcomes

We examine five educational outcomes: graduation within four or six years of starting, obtaining a STEM degree, and transfer to a higher- or lower-quality college. The NLSY-79 survey did not begin asking questions on the specific college attended until even the younger sample respondents were several years into college, making it difficult to follow transfer behavior in the earlier cohort.^{20} We therefore calculate the transfer outcomes only for the NLSY-97 cohort. We define graduation as completing a four-year degree at any college. We define STEM degree completion based on the last reported major(s) prior to graduation and code majors as STEM or non-STEM using the (uncontroversial) system in Arcidiacono, Aucejo, and Hotz (2016).^{21} Some restless students transfer more than once; we code our transfer variable based on the first observed transfer and only count transfers that involve a change of at least five percentiles (up or down) in our college quality index. Transfers from any four-year college to any two-year college always count as a transfer down.
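The transfer coding rule can be sketched as follows. This reads "based on the first observed transfer" literally, so only the first move after the initial four-year college is classified, and a first move of fewer than five percentiles is coded as no transfer. Both readings, the input layout, and the function name are assumptions.

```python
def code_first_transfer(colleges, threshold=5.0):
    """Return 'up', 'down', or 'none' based on the first observed move.

    colleges: chronological list of (quality_percentile, is_four_year)
    tuples, beginning with the first four-year college attended.
    quality_percentile may be None for two-year colleges.
    """
    if len(colleges) < 2:
        return "none"                  # never observed transferring
    (q0, _), (q1, four_year) = colleges[0], colleges[1]
    if not four_year:
        return "down"                  # any four-year to two-year move is down
    change = q1 - q0
    if change >= threshold:
        return "up"
    if change <= -threshold:
        return "down"
    return "none"                      # moves under the threshold do not count
```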

On the labor market side, in the spirit of the program evaluation literature, we examine the level of real (USD 2010) earnings (rather than the log) in all years from the start of college, without conditioning on degree completion.^{22} We look relative to the start of college rather than the end because we want to capture the opportunity cost of college, because college quality may (heterogeneously) affect the probability of working while in school, and because we want to capture effects on time to degree. We average earnings in two-year intervals, using observed earnings for one year when the value for the other equals zero or missing. We omit two-year intervals if the respondent did not report nonzero earnings in either year. This pooling reduces variance at minimal cost to sample size and temporal fineness, as nearly everyone in our sample of four-year college attendees works in almost every year. Comparisons with the information on job spells suggest that a nontrivial fraction of the zeros represent measurement error.
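A sketch of the two-year pooling rule, assuming annual earnings are indexed from the start of college with `None` for missing years, and assuming (our choice, not stated in the text) that a trailing unpaired year forms its own interval:

```python
def two_year_averages(earnings):
    """Pool annual earnings into consecutive two-year intervals.

    Each interval averages its positive observations, so a year that is
    zero or missing is ignored when the other year is positive. Intervals
    with no positive earnings are returned as None (omitted from analysis).
    """
    out = []
    for t in range(0, len(earnings), 2):
        pair = [e for e in earnings[t:t + 2] if e is not None and e > 0]
        out.append(sum(pair) / len(pair) if pair else None)
    return out
```

For example, `two_year_averages([10000, 20000, 0, 30000, None, None, 5000])` returns `[15000.0, 30000.0, None, 5000.0]`.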

Table 2 summarizes these outcomes for our sample. The 1997 cohort has a higher graduation rate, consistent with the pattern documented in Archibald, Feldman, and McHenry (2015) that U.S. graduation rates reached a nadir for students starting college in the mid-1980s and have recovered since then. The probability of graduating with a STEM degree has fallen a bit, from 15 percent to 13 percent. Consistent with the somewhat earlier cohort studied by Goldrick-Rab (2006) and with the Texas cohort in Andrews, Li, and Lovenheim (2014), we find a great deal of transfer behavior among the NLSY-97 students; 27 percent transfer at least once. Earnings increase both over the life cycle within cohorts and between the 1979 and 1997 cohorts.

## IV. Econometric Framework

To determine whether the data provide evidence of important interactions between ability and college quality, we want to look flexibly at the conditional relationship between these two variables and the outcomes of interest. Several econometric frameworks comport with this goal. This section describes two: our preferred estimator based on a flexible polynomial approximation and an alternative estimator that uses indicators for bins of the discretized joint distribution of ability and college quality.

For binary outcomes, we estimate probit models. In our preferred specification, we estimate the conditional probability function as:

$$\Pr(Y_i = 1 \mid A_i, Q_i, X_i) = \Phi\big(\beta_p(A_i, Q_i) + X_i\gamma\big) \tag{1}$$

In Equation 1, $Y$ denotes the binary outcome of interest, $A$ denotes student ability, $Q$ denotes college quality, $\beta_p(A_i, Q_i)$ denotes a flexible polynomial of ability and quality, $X$ denotes a vector of other conditioning variables with coefficients $\gamma$, and $\Phi$ denotes the standard normal cumulative distribution function. For earnings, we estimate a parametric linear regression model using the same specification by ordinary least squares. In both cases, … (2)
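To make the role of the interaction concrete, the sketch below fits the earnings version of the model by OLS on synthetic data. The second-order polynomial in $(A, Q)$ is an illustrative assumption, not necessarily the polynomial the authors selected, and the data-generating values are invented for the demonstration. Under this polynomial, the marginal effect of quality is $\partial E[y]/\partial Q = b_Q + 2 b_{QQ} Q + b_{AQ} A$, so a positive interaction coefficient $b_{AQ}$ means abler students gain more from quality, which is the complementarity discussed in the text.

```python
import numpy as np

def poly_design(A, Q, X):
    """Intercept, second-order polynomial in (A, Q), and linear controls X."""
    return np.column_stack([np.ones_like(A), A, Q, A**2, Q**2, A * Q, X])

def fit_ols(y, D):
    """Least-squares coefficients of y on the columns of D."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef

def quality_effect(coef, A, Q):
    """Marginal effect of Q implied by the polynomial part of the fit."""
    return coef[2] + 2 * coef[4] * Q + coef[5] * A

# Synthetic illustration: ability and quality as fractions of the 0-100 scale.
rng = np.random.default_rng(1)
n = 5000
A, Q = rng.uniform(size=n), rng.uniform(size=n)
X = rng.normal(size=(n, 2))
y = 1.0 + 0.5 * A + 0.8 * Q + 0.3 * A * Q + X @ np.array([0.2, -0.1])
y = y + 0.1 * rng.normal(size=n)
coef = fit_ols(y, poly_design(A, Q, X))
```

For binary outcomes, the same design matrix could be passed to a probit maximum likelihood routine (for example, `Probit` in statsmodels) in place of `fit_ols`.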

We chose this specification after a rigorous round of statistical testing.^{23} The polynomial in ability and quality becomes nonparametric once we promise to include additional higher-order terms as our sample size increases. Equation 1 then becomes a partially linear model in which we nonparametrically estimate the effects of ability and quality while conditioning parametrically on the other variables.

Polynomial approximations sometimes mislead, especially around the edges of the data. As a sensitivity check, we implement a different semiparametric framework that includes indicators for combinations of college quality quartile and student ability quartile. We include indicators for 15 of the 16 possible combinations, with ability and quality both in the lowest quartile serving as the omitted category. This approach avoids the often observed instability of higher-order polynomials away from the center of the data but cannot capture any within-quartile variation. In practice, the two estimators tell the same substantive story; see Online Appendix Table A5 for the estimates from the second approach for a subset of our outcomes.
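The bin-indicator alternative can be sketched as follows, assuming ability and quality are percentiles on a 0-100 scale with quartile cutoffs at 25, 50, and 75 (the function name is hypothetical). Dropping the lowest-ability, lowest-quality cell leaves the 15 indicators described in the text.

```python
import numpy as np

def quartile_bin_dummies(A, Q):
    """15 indicators for ability-by-quality quartile cells.

    A, Q: percentiles on a 0-100 scale. Cell (lowest ability, lowest
    quality) is the omitted category, so its rows are all zeros.
    """
    a = np.minimum(np.asarray(A) // 25, 3).astype(int)   # ability quartile 0-3
    q = np.minimum(np.asarray(Q) // 25, 3).astype(int)   # quality quartile 0-3
    cell = 4 * a + q                                      # cell index 0-15
    dummies = np.zeros((len(cell), 16))
    dummies[np.arange(len(cell)), cell] = 1.0
    return dummies[:, 1:]                                 # drop omitted cell 0
```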

The NLSY surveys include several measures of respondents’ cognitive skills. For ease of interpretation we interact only ASVAB1 with college quality. We therefore want to concentrate the effects of any common component of ability in this variable. To accomplish this, we orthogonalize the SAT score and GPA variables against ASVAB1 prior to including them in the multivariate analyses.^{24}
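Orthogonalizing a variable against ASVAB1 amounts to keeping the residual from a regression of that variable on ASVAB1. A minimal sketch, assuming a linear regression with an intercept (the text does not specify the functional form, and the function name is hypothetical):

```python
import numpy as np

def orthogonalize(v, base):
    """Residual from an OLS regression of v on a constant and base.

    v may be 1-D (one variable) or 2-D (e.g. SAT and GPA as columns).
    """
    D = np.column_stack([np.ones_like(base), base])
    coef, *_ = np.linalg.lstsq(D, v, rcond=None)
    return v - D @ coef
```

Because `np.linalg.lstsq` accepts a matrix right-hand side, passing a two-column array of SAT and GPA residualizes both at once.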

## V. Identification

This section considers the case for interpreting our estimates as causal. We argue that we have a sufficiently rich conditioning set such that the remaining variation in college quality that serves to identify our effects is uncorrelated with the error term in the outcome equation. Our identification strategy captures causal effects if two conditions hold. First, we need the observed covariates included in our model to capture, either directly or as proxies, all the factors that affect both the college quality choice and the outcomes we study. Second, in order to avoid identification via functional form, we need some conditionally exogenous variation in college quality choices. Put differently, we need instrumental variables to exist, even though we do not observe them, as they produce the conditional variation in college quality we implicitly use in our estimation.

We divide our conditioning variables into four sets, each of which proxies for one broad factor affecting educational choices: precollege skill, student demographic and family characteristics, neighborhood characteristics as of the first survey, and other social factors. We list these variables and describe their construction in detail in Appendix Table A2. Our preferred specification includes the first three sets of covariates; the fourth set provides a test, described below, for our identification strategy. We never condition on whether the student remains enrolled in college each year or whether they have completed a degree, which we view as intermediate outcomes.

We make the case that our conditioning set suffices to solve the problem of nonrandom selection into colleges in two ways. First, we can think about whether our conditioning set contains those things (or compelling proxies for those things) that existing theory and empirical evidence deem important. Much recent literature, for example, Heckman and Kautz (2012), emphasizes the importance of noncognitive skills for educational and labor market outcomes. The broader literature, including our own earlier study, illustrates the need to condition on family resources, both intellectual and financial. More money makes many things about college easier, including longer time-to-degree, more frequent visits home, and not having to work during school, and so affects outcomes; it also surely affects the college quality choice. Parental education will correlate with parents' knowledge of the college choice process and of how to succeed at college in both the institutional and academic senses. Parental education also likely correlates with taste for education and otherwise unobserved features of the student's childhood environment that affect both outcomes and college choice. Becker and Lewis (1973) highlight a quality–quantity trade-off for parents, so number of siblings may reflect both resources and preferences. We expect that our county education variable will both help with measurement error in the direct parental resource variables and proxy for primary and secondary school quality, as well as peer pressure and expectations.

The second way to think about our covariate set asks whether the marginal covariates make any difference to the estimates. In Heckman and Navarro’s (2004) framework there exist multiple unobserved factors on which we need to condition. As we increase the number of proxy variables in our conditioning set, the amount of selection bias in our estimates should decrease to zero, so long as we keep adding proxies for all factors. Turning this around, if we observe that the estimates stabilize as we increase the richness of the conditioning set, this suggests we are doing a good job of proxying for the unobserved factors, unless there exists an additional unobserved factor uncorrelated with all of our covariates. Oster (2019) cautions that a finding of coefficient stability means little if the newly added variables do not capture any conditional variation in the dependent variable. We perform such analyses by adding sets of related variables to the conditioning set in an order that reflects our prior about their importance for solving the problem of nonrandom selection into colleges of different qualities.
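The logic of this coefficient-stability check can be illustrated with simulated data. In the sketch below (all variables hypothetical), an outcome depends on a choice variable and two unobserved factors; adding noisy proxies for those factors moves the choice coefficient toward its true value of 1 while the R-squared rises, which is the pattern Oster (2019) asks for before reading stability as reassuring. It is a toy illustration, not the estimation code behind our tables.

```python
import numpy as np

def ols(y, *regressors):
    """OLS with an intercept; returns coefficients and R-squared."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, 1 - resid.var() / y.var()

rng = np.random.default_rng(42)
n = 5000
f1, f2 = rng.normal(size=n), rng.normal(size=n)   # unobserved factors
q = 0.6 * f1 + 0.6 * f2 + rng.normal(size=n)      # "college quality" choice
y = 1.0 * q + f1 + f2 + rng.normal(size=n)        # outcome; true quality effect = 1
p1 = f1 + 0.3 * rng.normal(size=n)                # noisy proxies for the factors
p2 = f2 + 0.3 * rng.normal(size=n)

steps = {
    "quality only": (q,),
    "+ proxy 1": (q, p1),
    "+ proxies 1, 2": (q, p1, p2),
}
results = {}
for label, regs in steps.items():
    beta, r2 = ols(y, *regs)
    results[label] = (beta[1], r2)
    print(f"{label:15s} quality coef = {beta[1]:.3f}  R^2 = {r2:.3f}")
```

If a further set of proxies left both the coefficient and the R-squared unchanged, stability would carry little information, which is Oster's caution.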

The literature suggests that plenty of exogenous variation exists in college quality choices conditional on our observed covariates. First, differences in state college quality mix, admission policies, and pricing strategies provide plausibly exogenous variation in the budget sets facing students and their parents. Niu and Tienda (2010) and Daugherty, Martorell, and McFarlin (2014) estimate large effects of guaranteed admission to in-state public colleges through the Texas Top 10 percent rule on college choices for eligible students. Cohodes and Goodman (2014) estimate similarly large effects of in-state-only scholarships in Massachusetts, lowering the average quality of college attended for eligible students who chose in-state colleges over more selective outside options. Second, distances to colleges of various qualities provide variation in the costs of attendance, as in Card (1995) and Currie and Moretti (2003). Third, what normally represents a sad feature of this literature, namely the consistent finding that many students, parents, and high school guidance counselors have little idea about how to choose a college, provides support for our identification strategy. Hoxby and Avery (2012) and Hoxby and Turner (2013) show the difference a small amount of reliable information can make for many students. Similarly, the literature provides many examples of small behavioral economics tricks having nontrivial effects on college choices. Pallais (2015) finds that you can change college choices by changing the number of colleges to which students can send their ACT scores for free, and Bettinger et al. (2012) find that having H&R Block help with the federal financial aid form can have real effects on college-going. Scott-Clayton (2012) reviews the literature showing that students and parents often know very little about the likely costs and benefits of college. Finally, both the descriptive and ethnographic literature, such as Roderick et al. (2008), and the quantitative literature on sorting, such as Lincove and Cortes (2019), suggest that many students choose among colleges for reasons unrelated to academic quality, such as the football team or the presence of high school friends. While the value students place on these nonacademic traits may well *unconditionally* correlate with outcomes, we expect that variation among students in the nature and extent of the trade-off they face between academic and nonacademic aspects of colleges produces useful, and random, *conditional* variation in college quality choice.

Two main issues motivate concerns about selection bias in estimates of the college quality main effect. First, students, their parents, and college admissions officers may have access to information on student ability that we, the researchers, do not. To the extent that those unobserved factors affect admissions, we would expect an upward bias in the estimated effect of college quality because it proxies in part for higher unobserved student ability or ambition (and we might expect this bias primarily at the upper end of the college quality distribution, where "holistic" rather than rule-based admissions dominate). Second, we might worry about measurement error in college quality, as in Black and Smith (2006). Though our use of a quality index based on multiple proxies partly addresses this issue, some measurement error surely remains, which we expect will push the estimated effect toward zero. Of course, we have no basis for arguing that these two biases cancel out in practice.

Now consider the interaction of college quality and student ability. If we overstate the effect of a high-quality college for all students, then overmatched students will look better than they should relative to other students of the same ability. Similarly, undermatched students will look relatively worse than they should. Thus, upward bias in the estimated effect of college quality should lead us to understate the effects of overmatch and to overstate the effects of undermatch, and so potentially to overstate the degree of complementarity between student ability and college quality. Measurement error in ability and/or in college quality, in contrast, should attenuate our estimates of the effects of both overmatch and undermatch. Indeed, Griliches and Ringstad (1970) highlight the particularly pernicious effects of measurement error in nonlinear contexts, such as interactions.
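A small simulation illustrates why classical measurement error bites harder on interactions. Under the illustrative assumptions below (independent, mean-zero, normally distributed true ability and quality, and classical noise giving each a reliability of 2/3), each main-effect coefficient shrinks by roughly the reliability ratio while the interaction coefficient shrinks by roughly its square. The data-generating process is hypothetical and much simpler than our percentile-based setup.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
a_true = rng.normal(size=n)
q_true = rng.normal(size=n)
y = 1.0 * a_true + 1.0 * q_true + 0.5 * a_true * q_true + rng.normal(size=n)

noise_sd = np.sqrt(0.5)                  # reliability = 1 / (1 + 0.5) = 2/3
a_obs = a_true + noise_sd * rng.normal(size=n)
q_obs = q_true + noise_sd * rng.normal(size=n)

def fit(a, q):
    """OLS of y on a constant, a, q, and a*q."""
    X = np.column_stack([np.ones(n), a, q, a * q])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_true, b_obs = fit(a_true, q_true), fit(a_obs, q_obs)
ratio_main = b_obs[1] / b_true[1]        # roughly 2/3
ratio_inter = b_obs[3] / b_true[3]       # roughly (2/3)**2
print(f"main-effect ratio {ratio_main:.2f}, interaction ratio {ratio_inter:.2f}")
```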

## VI. Effects of College Quality and Ability on College Outcomes and Earnings

### A. Graduation Rates

Table 3 presents our estimates of Equation 1 for degree completion within four and six years for both cohorts. The first three rows of estimates report the mean marginal effect of ability percentile at different points in the college quality distribution, constructed from our estimates of the flexible polynomial of ability and quality percentiles. The second three rows report the mean marginal effect of college quality at different points in the ability distribution. We scale both *A* and *Q* to lie in [0, 1].
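These mean marginal effects average a derivative of the fitted model over the sample while holding one argument fixed. The sketch below shows one way to compute such a quantity from a polynomial in ability and quality percentiles. The cubic basis, the logistic link, the central finite differences, and all coefficient values are assumptions for illustration; this passage does not state the link function or polynomial order used in Equation 1.

```python
import numpy as np

def poly_terms(a, q, degree=3):
    """All monomials a**i * q**j with 1 <= i + j <= degree (a hypothetical basis)."""
    return np.column_stack([a**i * q**j
                            for i in range(degree + 1)
                            for j in range(degree + 1 - i)
                            if i + j >= 1])

def predict(a, q, beta0, beta):
    """Predicted probability under an assumed logistic link."""
    index = beta0 + poly_terms(a, q) @ beta
    return 1.0 / (1.0 + np.exp(-index))

def mean_marginal_effect_of_ability(a, q0, beta0, beta, h=1e-4):
    """Sample average of dP/dA with every student's quality percentile set to q0,
    computed by central finite differences."""
    a = np.asarray(a, dtype=float)
    q = np.full(a.shape, q0, dtype=float)
    up = predict(a + h, q, beta0, beta)
    down = predict(a - h, q, beta0, beta)
    return np.mean((up - down) / (2 * h))

# Hypothetical coefficients, ordered as poly_terms builds them:
# q, q^2, q^3, a, aq, aq^2, a^2, a^2q, a^3
beta = np.array([1.0, 0.0, 0.0, 1.2, 0.5, 0.0, 0.0, 0.0, 0.0])
a_sample = np.random.default_rng(3).uniform(size=1000)
for q0 in (0.25, 0.50, 0.75):
    print(q0, round(float(mean_marginal_effect_of_ability(a_sample, q0, -1.0, beta)), 3))
```

The effects of quality at the 25th, 50th, and 75th ability percentiles follow the same pattern with the roles of the two arguments swapped.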

Our first key finding consists of substantively meaningful and statistically significant main effects of both college quality and student ability on graduation within six years of starting college. For example, for a student in the NLSY-97 cohort attending a college at the 25th percentile of the quality distribution, each 10 percentile point increase in a student's ability increases the (conditional) probability of graduating within six years by 3.10 (0.1 × 100 × 0.310) percentage points. Along the same lines, for a college starter in the NLSY-97 cohort of median ability, increasing the quality of the first college attended by 10 percentile points increases the probability of graduating within six years by 3.60 percentage points. The magnitudes of our estimates fit comfortably within the existing literature. While strong main effects of student ability and college quality emerge for both cohorts, the relative importance of college quality increases noticeably for the later cohort.
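Because *A* and *Q* lie in [0, 1], converting an average derivative into the effect of a 10 percentile point change requires the arithmetic shown in parentheses in the text. A trivial helper (the name is hypothetical) makes the conversion explicit:

```python
def pp_change_per_10_points(avg_derivative):
    """Percentage-point change in a probability implied by a 10 percentile point
    increase in a regressor scaled to [0, 1]."""
    step = 0.1                            # 10 percentile points on the [0, 1] scale
    return step * 100 * avg_derivative    # x 100 converts a probability to percentage points

print(round(pp_change_per_10_points(0.310), 2))  # 3.1
```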

If ability and quality have only independent effects, then we would expect a uniform effect of college quality across students of different ability levels. Alternatively, the effect of quality could vary with student ability. For example, college quality might increase degree completion probabilities more for students lower in the ability distribution. Our second key finding is that the effect of college quality on graduation varies very little with student ability. Likewise, the effect of student ability is quite steady at different points in the college quality distribution. Figure 2 plots the average predicted six-year graduation probability at each percentile of college quality. It shows that at the 25th, 50th and 75th percentiles of the ability distribution the probability of graduating within six years increases almost linearly in college quality percentile for the NLSY-79 cohort. The NLSY-97 cohort shows some evidence of complementarity: the probability of graduating within six years increases more slowly with college quality above the 60th percentile of colleges for students in the 25th percentile of ability.

We can quantify the evidence for heterogeneous effects in our college completion results in two ways. First, because our model nests a model with only main effects of college quality and ability, we can test the restriction that all coefficients on the interactions of ability and college quality jointly equal zero. The *p*-values from these tests appear in the third row in the bottom panel of Table 3. The *p*-values of 0.554 for the NLSY-97 cohort and 0.363 for the NLSY-79 cohort indicate that the restrictions implicit in the main-effects-only model cause little trouble for the data. Alternative statistical tests consider the null of equal average derivatives with respect to student ability and with respect to college quality, at the 25th, 50th and 75th percentiles of each. The *p*-values for these tests appear in the last two rows of Table 3; they comport with our interpretation based on the magnitudes above.
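The joint test of zero interaction coefficients compares a model with only ability and quality polynomial terms to the full model. Since our main estimates come from a nonlinear model, the tests in Table 3 need not take this form; the sketch below swaps in the textbook F test for a linear probability model, with a hypothetical cubic basis and simulated outcomes. It returns the F statistic and its degrees of freedom; a p-value would come from the F distribution (for example, via `scipy.stats.f.sf`).

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def interaction_f_test(y, a, q):
    """F statistic for H0: all ability-quality interaction terms are zero in a
    cubic linear probability model (a hypothetical specification)."""
    main = np.column_stack([np.ones_like(a), a, q, a**2, q**2, a**3, q**3])
    inter = np.column_stack([a * q, a**2 * q, a * q**2])
    full = np.column_stack([main, inter])
    ssr_restricted, ssr_full = ssr(y, main), ssr(y, full)
    df_num = inter.shape[1]
    df_den = len(y) - full.shape[1]
    f_stat = ((ssr_restricted - ssr_full) / df_num) / (ssr_full / df_den)
    return f_stat, df_num, df_den

# Simulated completion outcomes (hypothetical): additive vs. complementary
rng = np.random.default_rng(11)
n = 5000
a, q, u = rng.uniform(size=n), rng.uniform(size=n), rng.uniform(size=n)
y_additive = (u < 0.2 + 0.3 * a + 0.3 * q).astype(float)
y_complement = (u < 0.1 + 0.8 * a * q).astype(float)
f_add, _, _ = interaction_f_test(y_additive, a, q)
f_comp, df_num, df_den = interaction_f_test(y_complement, a, q)
print(f"additive: F = {f_add:.2f}; complementary: F = {f_comp:.2f} on ({df_num}, {df_den}) df")
```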

Second, we can look to Table 4, Panels A and B, which compare the observed completion rate with the completion rate implied by our model in a counterfactual world of full positive assortative matching. We obtain this value by predicting degree completion for every observation with their college quality percentile recoded to equal their ability percentile. On the basis of our model, we find that degree completion rises less than one percentage point, moving from 59.9 percent to 60.4 percent for the younger cohort and from 49.7 percent to 50.3 percent for the older one. The negative effect of moving lower-ability students away from high-quality colleges to their matched quality level almost entirely cancels out the positive effect of moving higher-ability students out of low-quality colleges. Chingos (2012) performs a similar calculation and also finds virtually no effect of resorting students.

The net effect of moving to full assortative matching masks large improvements in outcomes from moving some students to higher-quality colleges. The last columns of Table 4, Panels A and B, present a second counterfactual in which we ignore capacity constraints (and general equilibrium considerations) and assume that all students attend a college in the 90th percentile of college quality. Our model predicts that moving all students to a high-quality college would increase degree attainment within six years by 9.1 percentage points to 58.8 percent for the older cohort and by 10.6 percentage points to 70.5 percent for the younger cohort. This increase might seem smaller than expected, but student characteristics matter as well and differ strongly between students presently at the 90th percentile of college quality and those further down the distribution.
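Both counterfactuals in Table 4 replace each student's quality percentile, either with their own ability percentile or with 0.9, and average the model's predictions. The sketch below takes any fitted prediction function; the two example predictors are hypothetical. It also shows why near-zero gains from resorting point to weak complementarity: when ability and quality percentiles share the same distribution (an assumption of this toy example), an additive model predicts no change in the mean under full assortative matching, while a positive interaction term predicts a gain.

```python
import numpy as np

def counterfactual_means(predict, a, q):
    """Mean prediction under observed sorting, full positive assortative matching
    (quality percentile recoded to equal ability percentile), and everyone at the
    90th percentile of quality. Ignores capacity and general equilibrium effects."""
    return {
        "observed": predict(a, q).mean(),
        "assortative": predict(a, a).mean(),
        "all_at_p90": predict(a, np.full_like(q, 0.9)).mean(),
    }

def additive(a, q):          # hypothetical model without interaction
    return 0.2 + 0.3 * a + 0.3 * q

def complementary(a, q):     # hypothetical model with ability-quality complementarity
    return 0.2 + 0.2 * a + 0.2 * q + 0.2 * a * q

n = 1000
a = (np.arange(n) + 0.5) / n                    # ability percentiles
q = np.random.default_rng(5).permutation(a)     # quality: same marginal, random sorting
for name, model in [("additive", additive), ("complementary", complementary)]:
    print(name, {k: round(float(v), 3) for k, v in counterfactual_means(model, a, q).items()})
```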

Now consider the results for graduation in four years, a standard that remains normative but has become increasingly aspirational for many students, as documented in, for example, Bound, Lovenheim, and Turner (2010). The average derivative estimates for completion in four years resemble those for completion in six years in sign, all positive, but differ in showing a clear and substantively meaningful pattern of complementarity between student ability and college quality, one that gets stronger for the NLSY-97 cohort. We can reject the nulls of equal mean derivatives for the 1997 cohort. In parallel (and substantively similar) results from linear probability models presented in Online Appendix Table A6, we strongly reject the null of zero coefficients on the interaction terms and the nulls of equal average derivatives with respect to ability and college quality for the 1997 cohort (and come closer to rejection of these nulls in the 1979 cohort). Overall, we find serious evidence of complementarities between ability and quality for on-time degree completion. In addition, the difference between the two cohorts in the relative importance of student ability and college quality, with college quality playing a smaller role for the older cohort, remains striking.

### B. Intermediate Educational Outcomes

To shed light on the mechanisms underlying our findings on completion rates, we consider the effects of student ability and college quality on some intermediate college outcomes. Students might react to large gaps between their ability and the quality of their college by changing their major, as in Arcidiacono, Aucejo, and Hotz (2016), who argue that some overmatched students switch from STEM majors to other, less challenging majors. A change of major could delay graduation, thereby lowering four-year, but not six-year, completion rates. Table 5 presents our estimates for STEM degree completion. For both cohorts, we find substantively and statistically significant effects of student ability for all levels of college quality, with only modest differences between the effects at the 25th, 50th, and 75th percentiles. We find substantively small effects, not statistically different from zero, of college quality at all levels of student ability. The *p*-values for the null of zero interaction equal 0.791 and 0.708 for the NLSY-97 and NLSY-79 cohorts. Our estimates predict that full positive assortative matching would change STEM degree completion by less than 0.6 percentage points in either cohort.

Students may react to learning that they have made a poor initial choice by transferring to another school. Again, this midcourse adjustment could delay graduation beyond four years. We find some evidence consistent with ability–quality complementarity when looking at transfer behavior in the NLSY-97 cohort. The third column of estimates in Table 5 corresponds to Equation 1, with transfer up as the dependent variable, while the fourth column of estimates corresponds to transfer down.^{25} Increasing a student's ability percentile by 10 percentile points raises the probability that she will transfer to a higher-quality college by 1.4 percentage points if she starts at a 25th percentile college. In contrast, student ability has virtually no effect on the probability of transferring to a higher-quality college if the student starts at a 75th percentile college. The second three rows show an expected pattern: increasing the quality of the first college a student attends lowers the probability that she will transfer to an even higher-quality college, with a larger effect for students higher in the ability distribution. The pattern of derivatives with respect to ability reflects students preferentially transferring toward assortative matching, while the pattern of derivatives with respect to quality is partly mechanical.

We see the reverse patterns when considering transfers to lower-quality colleges. More able students transfer to a lower-quality college less often, though only modestly and imprecisely so. Increasing the quality of the first college attended raises the probability that students will transfer down; these effects differ statistically from zero but not much by the level of student ability. Taken together, these transfer results provide some support for ability–quality complementarity, along with a strong (and again partly mechanical) main effect of college quality.

We cannot reject the null of only ability and quality main effects on transfer behavior. As shown at the bottom of Table 5, the *p*-values equal 0.384 and 0.328 for transferring to a higher- and lower-quality college, respectively. In Table 4, Panel B, we predict that full positive assortative matching would modestly decrease transfers to a higher-quality college and modestly increase transfers to lower-quality colleges. This counterfactual sorting mechanism substantially decreases the transfer probability for students who strongly undermatch at their first college, but such students constitute only a small fraction of the total. Since transfers often delay graduation, these moves have real costs for students (and often for the taxpayer) in terms of more time in school and less time in the labor force.

### C. Earnings

Table 6 presents our estimates for the effects of ability and college quality on average annual earnings during and after college. In Years 2–3, both college quality and student ability have generally negative effects on average annual earnings in both cohorts. In the NLSY-79, two to three years after starting college, a student at the 50th percentile of ability earns $208 less per year for each 10 percentile point increase in the quality of first college attended. Students at lower-quality colleges are more likely to have left without a degree and begun working full time two to three years after starting college. Higher-quality colleges may also require greater effort to keep up with course work, limiting the time students have to work while still in college. Finally, near the top of the college quality distribution, marginal increases in college quality may give students access to more financial aid and reduce their need to work during college. The negative relationship between ability and earnings likely reflects a similar short-run trade-off between current earnings and investment in skill accumulation, as well as access to more merit-based financial aid.

At 10–11 years after students begin college these patterns have completely reversed: both college quality and student ability strongly raise average annual earnings. For a student of median ability in the NLSY-97 cohort, each 10 percentile point increase in the quality of the first college is associated with an additional $1,480 of annual earnings. In keeping with the completion rate estimates, we find much larger ability effects in the NLSY-79 cohort and smaller college quality effects. For example, at the median of college quality, a 10 percentile point increase in student ability increases earnings at 10–11 years by $915 in the NLSY-79, compared to just $417 for the later cohort. While our average derivative estimates have large standard errors, we find persuasive evidence that college quality increases future earnings throughout the ability distribution.^{26}

As with degree completion in four years, the estimates for earnings 10–11 years after college start suggest a substantively important complementarity between college quality and student ability, particularly for the NLSY-97 cohort. The average derivative of earnings with respect to student ability has a much larger value, around $127 per percentile point for the NLSY-97, for students at the 75th percentile of college quality than for those at the 25th percentile or at the median. Similarly, the average derivative of earnings with respect to college quality increases with student ability, from about $104 per percentile point at the 25th percentile of ability to about $186 per percentile point at the 75th percentile of ability. Still, we cannot reject the nulls of no interaction effects (as well as other nulls involving much larger interaction effects) or of equal derivatives.

Table 4 shows that moving to full positive assortative matching in the NLSY-97 cohort would increase mean earnings in Years 10–11 after beginning college by about $1,328. The corresponding change for the NLSY-79 cohort is $694. We do not emphasize these point estimates given how far this scenario projects outside the data and given the likely importance of equilibrium effects of uncertain direction and magnitude (including the fact that resorting the students would change the quality of all of the colleges as we measure it). The data provide no evidence of any harmful individual effects for low-ability students who attend higher-quality colleges. At the same time, our estimates suggest that policies that place some students with lower ability at top colleges do impose some efficiency costs due to the complementarity between student ability and college quality.

The NLSY-79 cohort, now in their fifties, allows us to examine earnings outcomes for several decades after college start. These results appear in the last columns of Table 6, Panel A and in Figure 3. The data provide large, positive, and generally statistically significant estimates of the main effects of student ability and college quality at all durations from 10–31 years after college start. Even at long durations we can clearly rule out negative average effects of overmatch. We can also rule out standard models in which college quality simply signals student ability, as employer learning would surely have overwhelmed college quality effects over the horizons we consider.^{27}

In general, the average derivatives get larger as the time elapsed from college start increases, sometimes quite substantially so. The average derivative with respect to quality for students at the 75th percentile of ability increases from $724 for an increase in quality of 10 percentile points at 10–11 years to $2,527 at 20–21 years and to $4,048 at 30–31 years. As our data embody only one cohort, we have no way of separating these increases into components due to age and period effects. Additionally, a pattern consistent with complementarity between student ability and college quality, which was fairly weak in the estimates of earnings 10–11 years after college start for the NLSY-79 cohort, appears quite strongly in the longer-term followup estimates. As shown in the bottom row of the table, for earnings 20–21 and 30–31 years out we can reject the null of equal average derivatives with respect to college quality at different levels of student ability. Finally, we remind the reader of our imprecise estimates and the gentle decline in the sample size as individuals gradually attrit from the panel.^{28}

### D. Subgroups

We consider subgroups defined by sex and by parental education, where we partition the latter into "low" and "high" subgroups on the basis of whether at least one parent attended college. We interpret parental education as a proxy for several things, including tastes for college (and college quality) and family resources. In the NLSY-79 cohort nearly one-half of college entrants have parents with no more than a high school education, but by the NLSY-97 cohort only a quarter of college entrants have parents with no college education. We lack the sample size to usefully examine finer categories. Similarly, although such estimates would be of great substantive interest, we lack the sample sizes to present meaningful subgroup estimates for black and Hispanic students.

Tables 7 and 8 report the effects of student ability and college quality on earnings 10–11 years after starting college separately by subgroups.^{29} To limit the demands on our relatively small sample, we estimate these effects by interacting the ability-quality polynomials with subgroup indicators and continuing to estimate pooled coefficients on the other covariates. The main finding from the pooled estimates, positive effects of both student ability and college quality at all levels, generally holds for all subgroups, with more volatility in point estimates and predictably larger standard errors. In both cohorts, the pattern of complementarity between student ability and college quality is most apparent for the children of more educated parents, though the average effect of student ability is larger for the children of less educated parents. We mostly find larger effects of ability for female students than for male students, with a much larger gap in the NLSY-79 cohort. However, the two main differences between the two cohorts hold for both men and women: student ability plays a relatively larger role in determining earnings in the earlier NLSY-79 cohort, and the younger cohort displays more evidence of complementarity between ability and college quality in degree completion (the latter results not shown).
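The subgroup specification can be sketched as a design matrix in which the ability-quality polynomial gets a separate block for each subgroup while the remaining covariates enter once with pooled coefficients. Whether our estimates include subgroup-specific intercepts is an assumption here, as are the function and argument names.

```python
import numpy as np

def subgroup_design(poly, group, covariates):
    """Build [per-group intercept and polynomial blocks | pooled covariates].

    poly:       (n, k) ability-quality polynomial terms
    group:      (n,) subgroup labels, e.g. 0 = low, 1 = high parental education
    covariates: (n, m) other controls whose coefficients are pooled across groups
    """
    poly = np.asarray(poly, dtype=float)
    group = np.asarray(group)
    blocks = []
    for g in np.unique(group):
        d = (group == g).astype(float)[:, None]
        blocks.append(d)           # subgroup-specific intercept (assumed)
        blocks.append(d * poly)    # subgroup-specific polynomial coefficients
    return np.column_stack(blocks + [np.asarray(covariates, dtype=float)])

# Hypothetical example: a two-term polynomial (a, a^2), two groups, one pooled control
P = np.array([[0.1, 0.01], [0.5, 0.25], [0.9, 0.81], [0.3, 0.09], [0.7, 0.49], [0.2, 0.04]])
g = np.array([0, 0, 1, 1, 0, 1])
X = subgroup_design(P, g, np.ones((6, 1)))
print(X.shape)  # (6, 7)
```

Pooling the covariate coefficients spends degrees of freedom only on the ability-quality terms that differ by subgroup, which is the point of the restriction in a small sample.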

### E. Identification

We now consider some evidence regarding our identification strategy. Tables 9 and 10 present estimates based on increasingly rich sets of conditioning variables for our two most important outcomes: degree attainment within six years and earnings in Years 10–11 after starting college. The lower rows of each table indicate the set of included conditioning variables; the categories correspond to those in Appendix Table A2. The estimates in Column 4 of each table correspond to those in Tables 3 and 6.

Overall, the tables reveal a substantial amount of movement in the coefficients when moving from Column 1 to Column 2 by adding additional measures of ability and socio-emotional skills, and when moving from Column 2 to Column 3, which corresponds to adding demographics and family characteristics.^{30} We see somewhat less movement (how much less varies across outcomes and across derivatives) when we add neighborhood characteristics in Column 4. Finally, and in parallel to the similar analysis in the Black, Daniel, and Smith (2005) study of college quality, moving from Column 4 to Column 5 changes the estimates very little. With each transition, including the last, the *r*-squared values meaningfully increase. These findings support a causal interpretation of our estimates or, at least, suggest that any remaining biases would not overturn our qualitative conclusions.

### F. Comparing the NLSY-79 and NLSY-97 Results

Our big-picture stories apply to both cohorts: large amounts of overmatch and undermatch in the unconditional joint distribution of student ability and college quality, substantively and statistically significant positive main effects of student ability and college quality for college completion and earnings in the medium and long terms, some evidence of heterogeneous college quality effects in time to degree, and little or no evidence of heterogeneous effects for six-year degree completion or for STEM degree completion. This stability surprised us somewhat. In this section, we briefly remark on two specific differences in the results: (i) the modest but not trivial shift towards positive assortative matching between cohorts and (ii) the relatively smaller role of college quality in determining outcomes for the NLSY-79.

As we noted in Section III.D, the changing sorting patterns between the cohorts comport with some other evidence in the literature. Following Hoxby (2009), we suspect that they result from ongoing reductions in the cost that students (especially high-ability students) face in obtaining information about admissions criteria, real (as opposed to posted) prices, and optimal strategy. Reductions in transportation and communication costs likely also play a role.

One reason why college quality may have smaller measured effects on outcomes in the NLSY-79 cohort is that colleges varied less in our input-based measures of quality for this earlier cohort. As shown in Figure 1, each percentile increase in our college quality index corresponds to a somewhat smaller change in expenditure per student in 1992 than in 2008. We may also suffer from attenuation bias due to measurement error in matching students to their first college attended. As noted in Section III.C, the NLSY-79 did not ask students the name of the college(s) they attended until 1984, part way through most students’ college enrollment. We use the first reported college, which we suspect functions as an excellent proxy for first college attended, but memory lapses and transfers could yield some mismeasurement.

### G. Comparisons with Other Studies

Two earlier published papers, Light and Strayer (2000) and Black, Daniel, and Smith (2005), estimate college quality effects interacted with ability using the NLSY-79 data. The estimates from the Light and Strayer probit model of degree attainment (they do not look at earnings) that appear in the two right-most columns of their Table 8 correspond most closely to our own.^{31} These estimates assume, implicitly, selection on observed variables; that is, they frame them as a sensitivity analysis in which they shut down their apparatus for dealing with selection on unobserved variables. Unlike us, they find that ability does not always increase degree attainment across all college quality quartiles, nor does college quality monotonically increase degree completion across all ability quartiles.^{32} In the latter case their estimates support the view that overmatch could make some students worse off. Several differences between the Light and Strayer setup and our own strike us as potential candidates to account for the difference in findings: (i) they treat transfers as dropouts; (ii) they restrict some of the interactions between ability quartile and college quality quartile to have zero coefficients on a priori grounds; (iii) they condition on variables that we think plausibly endogenous, namely living at home, receipt of financial aid, and actual tuition paid; and (iv) their remaining covariate set represents (in essence) a modest subset of our own, which raises the possibility of residual selection bias not present in our analysis.^{33}

The analysis in Black, Daniel, and Smith (2005), not surprisingly given the authorial overlap, differs less from our own. Qualitatively, we reach similar conclusions. While Black, Daniel, and Smith (2005) examine interactions only for their log wage outcome, and not for degree completion, they find strong main effects of both student ability and college quality for both degree attainment and log wages. Their Appendix Table 7 presents estimates from a parametric linear model with hourly wages as the dependent variable and a rich covariate set similar to our own (other than in its inclusion of years of schooling), along with main effects in ability and college quality and interactions between college quality and their versions of ASVAB1 and ASVAB2. They offer separate estimates for men and women; for both groups and both interactions they obtain estimated coefficients near zero and far from statistical significance.^{34}

As far as we know, just three other studies consider the persistence of the earnings effects of college quality over a very long interval after college start, but none of these studies considers how the effects vary with student ability. Turner (2002) considers earnings of men in the Panel Study of Income Dynamics (PSID) who completed a BA by 1975, using the average SAT score of entering students as her proxy for college quality. She finds large and growing effects of college quality from 1975 to 1992 and provides suggestive evidence that the increases primarily represent period effects rather than lifecycle effects. She notes that the lack of a compelling proxy for student ability in the PSID hampers a causal interpretation of her estimates, which rely on a “selection on observed variables” identification strategy (see her Note 13). Figure 1 of Black, Daniel, and Smith (2005) shows impacts on log wages in the NLSY-79 for 1987–1998, about 15 years after starting college. Using a selection on observed variables identification strategy (and conditioning on years of schooling), they find persistent, stable, and substantively and statistically meaningful effects of college quality for both men and women.

Dale and Krueger (2014) link social security earnings data to two cohorts of the “College and Beyond” data set that includes students entering a nonrandom sample of relatively high-quality colleges in the fall of 1976 and 1989. They present estimates of log earnings impacts through 2007, or 31 years after college enrollment for the older cohort and 18 years after for the younger one.^{35} Dale and Krueger (2014) present estimates using two identification strategies. The first assumes “selection on observed variables” in the context of a covariate set that includes standardized test scores as well as a rich (but not as rich as ours) array of other relevant variables. Their second identification strategy attempts to deal with any remaining selection on unobserved variables by conditioning on the average SAT score of the schools to which each student applied; they call this their “self-revelation” model.^{36} The first identification strategy yields persistent and sizeable effects of college quality on later earnings for all groups. In marked contrast, the “self-revelation model” estimates reveal such impacts only for black and Hispanic students and those from disadvantaged family backgrounds.

Finally, in a broad sense our results coincide with the descriptive analysis presented in Chetty et al. (2017), who use U.S. income tax data linked across generations and find that, within college quality tiers, average child income varies only very modestly with parental income. This pattern is hard to reconcile with strong negative effects of overmatch, because students with relatively low-income parents within a quality tier will, on average, also have relatively low ability as we define it; strong overmatch effects would therefore show up as markedly lower incomes for these students.

## VII. Summary and Conclusions

This paper examines the effects of college quality and student ability on academic and labor market outcomes for two cohorts of college goers using the NLSY-79 and NLSY-97 data sets. We adopt a “selection on observed variables” identification strategy in both cases and do our best to ensure comparability in coding and conditioning. In both cohorts, we find strong evidence that college quality and student ability increase the probability of degree completion and later earnings. The relative importance of college quality increases in the later cohort; this finding parallels that of Castex and Dechter (2014), who document a similar change in the relative importance of ability and years of schooling as determinants of wages in the two NLSY cohorts.

For college students and their families, our most salient conclusion is that increasing college quality increases graduation rates and earnings at all points in the ability distribution.^{37} At the margin, all students will benefit in expectation from attending higher-quality colleges. In Dillon and Smith (2017) we find that well-informed and well-resourced students seek to attend higher-quality colleges, even if they will be overmatched at these institutions. Our current work validates this unconditional pursuit of college quality. Policies targeted at increasing the representation of certain groups of students at high-quality colleges will (on average) benefit the targeted students, but may do little to improve overall outcomes if the total number of seats at high-quality colleges remains unchanged.

The simple and compelling applied theory models in Rothschild and White (1995) and Sallee, Resch, and Courant (2008) posit complementarities between student ability and college quality, which can justify the observed long-term increase in positive assortative matching described in, for example, Hoxby (2009). We find modest but substantively important support for these theories. We can reject uniform effects of college quality across the ability distribution for some, but not all, of the outcomes we examine. The effects of college quality vary with student ability for graduation within four years (but not within six), particularly for the NLSY-97 cohort, as well as for transfers and for long-term (but not immediately post-college) earnings. The interaction effects we find do not overwhelm the uniformly positive main effects. These results suggest modest efficiency gains from better sorting of the strongest students into the top colleges and some efficiency costs to policies that weaken this sorting.
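Why complementarity favors positive assortative matching can be seen in a hypothetical two-student, two-college example. The sketch below assumes an illustrative outcome function f(a, q) = a + q + c·a·q, where a is student ability, q is college quality, and c is a made-up interaction coefficient; the functional form, the helper names, and all numbers are assumptions for illustration, not our estimates. With c > 0, pairing the stronger student with the better college yields more total output than the reverse pairing; with c = 0 the two allocations tie, so resorting students changes only who benefits, not the total.

```python
def outcome(ability: float, quality: float, c: float) -> float:
    """Hypothetical outcome: additive main effects plus an ability-quality interaction.

    The functional form and parameter values are illustrative assumptions only.
    """
    return ability + quality + c * ability * quality


def assortative_gain(abilities: list[float], qualities: list[float], c: float) -> float:
    """Total output under positive assortative matching minus total output
    under fully reversed matching, holding the set of seats fixed."""
    if len(abilities) != len(qualities):
        raise ValueError("need one seat per student")
    a = sorted(abilities)
    q = sorted(qualities)
    # Assortative: lowest ability with lowest quality, highest with highest.
    assortative = sum(outcome(ai, qi, c) for ai, qi in zip(a, q))
    # Reverse: lowest ability with highest quality, and so on.
    reverse = sum(outcome(ai, qi, c) for ai, qi in zip(a, reversed(q)))
    return assortative - reverse


if __name__ == "__main__":
    abilities = [0.25, 0.75]  # hypothetical ability percentiles rescaled to 0-1
    qualities = [0.30, 0.90]  # hypothetical quality percentiles rescaled to 0-1
    for c in (-0.5, 0.0, 0.5):
        print(c, assortative_gain(abilities, qualities, c))
```

In the two-student case the gain equals c(a₂ − a₁)(q₂ − q₁), so its sign is the sign of the interaction: positive under complementarity, zero with uniform effects, and negative if lower-ability students gain more from quality.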

Less prepared students appear to adjust to the demands of higher-quality colleges by slowing their studies, leading to smaller gains in four-year graduation rates than their higher-ability classmates but similar gains in six-year graduation rates. In contrast to some other papers, we do not find similar evidence of adjustment by shifting out of STEM majors. One interpretation of our findings on earnings is that the networking and recruiting benefits of attending a higher-quality college benefit all students in their first job, but higher-ability students build more successfully on these early gains as they move through their careers, perhaps because of greater skill acquisition in college.

We conclude with five caveats. First, we interpret our estimates in partial rather than general equilibrium terms; as such, they apply primarily to reallocating small numbers of students across colleges. Second, we pay for the plausibility of the conditional independence assumption with modest sample sizes. Particularly for high-variance outcomes such as earnings, some of the patterns we find show up more clearly in the point estimates than in the statistical tests.

Third, measurement error remains a concern in multiple senses. While using multiple proxies for student ability and college quality reduces measurement error, it does not eliminate it. Less trivially, we know that individual students at larger colleges experience very different parts of what their institutions have to offer—for example, faculty research and teaching quality may differ across departments. Thus, even if our quality measure does well at capturing the average quality of a college, it may embody substantial measurement error at the student level at which our analysis operates.

Fourth, we consider only undergraduates at four-year colleges. Our results may not generalize to contexts, such as law schools, that provide students with fewer dimensions on which to respond to an environment that proves too challenging or not challenging enough. In law school, for example, students cannot easily change majors or take fewer courses. For this reason, overmatch might have very different overall effects in these contexts than in ours.

Finally, this paper examines only interactions between one characteristic of students, namely their academic ability, and college quality. As noted in Smith (2008), sorting based on other student and college characteristics represents an important omission from most of the literature. Perhaps the most obvious concerns matching on social class or socioeconomic status, or what an economist might prefer to call (at the cost of losing some nuance in interpretation) family resources. Recent scholarly books such as *Paying for the Party* (Armstrong and Hamilton 2013) and *Top Student, Top School?* (Radford 2013) highlight this dimension of sorting, as does Tom Wolfe (2004) in his novel of college life entitled *I Am Charlotte Simmons*. Because other student characteristics, such as social class, correlate with academic ability, they represent a potentially confounding treatment in our context.

## Footnotes

The authors thank Dan Black, Hoyt Bleakley, John Bound, David Deming, Sue Dynarski, Jose Galdo, Josh Goodman, Audrey Light, Lois Miller, Peter Mueser, Sarah Turner, Ophira Vishkin, and Martin West for helpful comments, along with seminar participants at Aarhus University, the 2014 Bergen-Stavanger Workshop on Labour Markets, Families and Children, the 2013 CESifo education group meetings, Bristol, Chicago (family economics), Cleveland Fed, Colorado, Cornell PAM, Dartmouth, Guelph, HEC, Institute for Fiscal Studies, MDRC, Michigan CIERS, Mannheim, National University of Singapore, Ohio State, the 2017 Ottawa-Carleton Graduate School in Economics Launch Conference, Penn GSE, Seton Hall Conference on College Match, Stanford CEPA, Toronto, UIC, Washington, Washington University in St. Louis, and Wilfrid Laurier, and students in the Winter 2017 versions of Economics 622 and Public Policy 713 at the University of Michigan, particularly Ellen Stuart. Disclosure statement: This research was supported by NSF SES 0915467. The authors use restricted-access versions of the NLSY 1979 and NLSY 1997 data sets that provide information on residential location and colleges attended not available in the public use versions of the data. These data are available from the Bureau of Labor Statistics; see https://www.nlsinfo.org/content/getting-started/accessing-data for further information. The authors are willing to assist (Jeffrey Smith, econjeff{at}ssc.wisc.edu).

Supplementary materials are freely available online at: http://uwpress.wisc.edu/journals/journals/jhr-supplementary.html

↵1. Kurlaender and Grodsky (2013) provide similar clarification in the sociology literature.

↵2. Recent studies that examine the “main effect” of college quality include Black and Smith (2004); Bowen, Chingos, and McPherson (2009); Cohodes and Goodman (2014); Cunha and Miller (2014); Dale and Krueger (2002, 2014); Hoekstra (2009); Hoxby (2015); Long (2008, 2010); and Zimmerman (2014). All agree on a positive causal effect for at least some groups.

↵3. We focus here on papers that address academic match at the undergraduate level and that use U.S. data. See Sander and Taylor (2012) for a survey of the related, tendentious, literature on academic match in law school.

↵4. Alon and Tienda (2005) examine academic match using the High School and Beyond and National Educational Longitudinal Study of 1988 (NELS:88) data sets. Unfortunately, they look only for effects of selectivity (their proxy for college quality) conditional on ability rather than for effects of the interaction of selectivity and ability.

↵5. The exception is Loury and Garman (1995), who find substantively important match effects on degree completion (including negative effects of college quality for black students) and post-college earnings in their study that uses the National Longitudinal Study of the High School Class of 1972. Their earnings estimates condition both on a much less rich set of background variables and on several intermediate outcomes—college GPA, major, years of college—and so correspond to a very different estimand than our own. Their completion estimates do not have the issue of conditioning on intermediate outcomes and so remain a puzzle. A replication in light of the subsequent literature would add value.

↵6. See in particular their Figures 10.5a, 10.5b, 11.1, 11.2 and 11.3. They use the terminology of match somewhat differently than we do. In particular, they sometimes refer to what we call the main effect of quality as a match effect when it applies to overmatched or undermatched students. Cunha (2009) critiques this study.

↵7. The fine Arcidiacono et al. (2013) paper looking at friendship networks in college also sheds some light on potential mechanisms.

↵8. In other recent regression discontinuity papers, Zimmerman (2014) and Goodman, Hurwitz, and Smith (2017) find substantively important effects of college quality (and/or type) on labor market outcomes toward the other end of the college quality spectrum, namely the margin between low-quality four-year colleges and two-year colleges.

↵9. In the NLSY-79 we omit the military and low-income white samples because both were dropped from the survey before most respondents had completed college. We use custom probability of inclusion weights, constructed by the NLSY, to combine the sampling groups in each survey, and to control for differing response rates by age, sex, and race/ethnicity.

↵10. The NLSY data sets also feature impressively high response rates, more than 80 percent of the initial respondents in most survey rounds. See https://www.nlsinfo.org/content/cohorts/nlsy79/intro-to-the-sample/retention-reasons-noninterview and https://www.nlsinfo.org/content/cohorts/nlsy97/intro-to-the-sample/retention-reasons-non-interview (accessed November 4, 2019).

↵11. The ASVAB test is not a straightforward measure of “innate” ability because it includes the influences and training that the student has experienced up to the point she takes the test. See Neal and Johnson (1996) for a more thorough discussion of what the ASVAB test measures. We do not mind if the ASVAB also measures intrinsic motivation, as argued by Segal (2012). More broadly, we use the term “ability” quite agnostically to mean the set of skills, innate or acquired, that students possess around the time of the college choice.

↵12. We follow Aucejo and James (2019) and include an index of petty antisocial behaviors before age 14 and early sexual activity. Following Cadena and Keys (2015), we also include an indicator of whether the NLSY interviewer rated the respondent as somewhat uncooperative in any of the first three rounds of interviews.

↵13. *U.S. News and World Report* and IPEDS collect many of the same statistics and often report identical values. Combining data from the two sources gives us the most complete sample of colleges. We measure college quality somewhat later than the years when each cohort entered college. Many of the component measures first become available in IPEDS in 1992, and 2008 is the earliest year for which we could obtain recent *U.S. News and World Report* data. In both cases, the stability of the underlying proxies over time assuages any concerns about the modest temporal distance.

↵14. We weight the percentiles by full-time-equivalent undergraduates. In each reference year, our sample includes all colleges that identify as offering four-year degrees in IPEDS that year, have at least one first-degree-seeking undergraduate enrolled, and report at least two of our quality proxies. We exclude specialty institutions, such as nursing colleges and theological seminaries (Carnegie codes 51–60).
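The enrollment weighting described in footnote 14 can be sketched as follows. This is a minimal illustration, assuming that a college's percentile on a given quality proxy equals the share of full-time-equivalent (FTE) undergraduates enrolled at colleges whose value is at or below its own; the paper does not spell out its tie-handling or interpolation rule, and the function name and numbers are hypothetical.

```python
def fte_weighted_percentiles(values: list[float], fte: list[float]) -> list[float]:
    """Percentile (0-100) of each college's proxy value in the FTE-weighted distribution.

    Assumed rule (hypothetical): 100 * (FTE at colleges with value <= this one) / total FTE.
    """
    if len(values) != len(fte):
        raise ValueError("values and fte must have the same length")
    total = float(sum(fte))
    if total <= 0:
        raise ValueError("total FTE must be positive")
    # Each college is ranked against the enrollment-weighted distribution,
    # so large colleges shift the percentiles of everyone above them.
    return [
        100.0 * sum(w for v, w in zip(values, fte) if v <= x) / total
        for x in values
    ]


# Hypothetical example: three colleges with a proxy value and FTE enrollment.
print(fte_weighted_percentiles([1200, 1000, 1400], [10000, 30000, 10000]))
```

Weighting by enrollment means the percentile describes where a typical student, rather than a typical institution, sits in the quality distribution: in the example, the middle college lands at the 80th percentile because the large low-value college holds most of the students.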

↵15. Our estimates also exhibit face validity. For example, for the NLSY-97 cohort the University of Michigan gets a 93, Michigan State a 74, Wayne State a 36, and Eastern Michigan a 28.

↵16. Our measure of student body size is full-time equivalent undergraduates.

↵17. Our measure also differs from that in Chetty et al. (2017). Rodriguez (2015) and House (2017) analyze various measures of academic match and find that different measures provide qualitatively different pictures of the nature and extent of deviations from assortative matching.

↵18. Black and Smith (2004, Table 4) and Light and Strayer (2000, Table 3) present alternative estimates of the joint distribution for the NLSY-79 cohort that tell the same basic story. See also Mattern, Shaw, and Kobrin (2010, Table 1).

↵19. Statistically, a chi-squared test rejects the null of a common joint distribution in the two cohorts. Substantively, our results comport with Hoxby (2009, Figure 1) and Herrnstein and Murray (1994, Chapter 1), which show that much, but not all, of the large increase in stratification by ability among colleges had played out by the time the NLSY-79 cohort entered college. Smith, Pender, and Howell (2013) reach a quite different conclusion from ours, arguing that undermatch decreased dramatically between the two cohorts they consider. The different timing of their two cohorts, their (quite) different definition of undermatch, and their inclusion of two-year schools stymie a detailed accounting of the differences between their finding and ours.

↵20. Light and Strayer (2004) document the nature and extent of transfers in the NLSY-79 and consider their relationship to wages. In keeping with the limited nature of the available data, they distinguish between two-year and four-year colleges but not more finely by quality.

↵21. See Table A-1 of their online appendix.

↵22. The NLSY data sets offer two different earnings measures: a Current Population Survey (CPS)-like measure based on a question about total earnings the previous year and a constructed variable that builds on information about wages, hours, and weeks on individual job spells. We use the CPS-like measure for both cohorts.

↵23. We compared many more and less parsimonious specifications, with and without additional covariates. For several outcomes, these tests do not reject the exclusion of *all* ability–quality interaction terms. We include the most parsimonious specification that still allows for nonlinear interaction effects and report tests of the joint significance of these interaction terms in our results. Our thorough search leads us to think that other paths to functional form flexibility, such as substituting the log of earnings for the level or considering the levels of our ability and quality measures rather than their percentiles, would do little to alter our qualitative findings.

↵24. We experimented with two other ways of using the SAT and GPA variables: one set of analyses omitted them, while the other combined them with the ASVAB components to create a broader ability index. Neither strategy affects our qualitative conclusions. We do not use the broader index as our primary ability measure because of the large number of observations with missing information on SAT and/or GPA.

↵25. Replacing the five percentile-point cutoff for a transfer up or down with a zero cutoff or a 10 percentile-point cutoff yields qualitatively similar findings.
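The cutoff rule in footnote 25 can be sketched as a small classifier. We assume that a transfer counts as "up" ("down") when the quality percentile of the new college exceeds (falls short of) that of the first college by more than the cutoff, and as lateral otherwise; the treatment of a change exactly equal to the cutoff, the function name, and the labels are all hypothetical.

```python
def classify_transfer(first_pct: float, later_pct: float, cutoff: float = 5.0) -> str:
    """Classify a transfer by the change in college quality percentile.

    Hypothetical helper: 'up' if the later college's percentile exceeds the first
    college's by more than `cutoff` points, 'down' if it falls short by more than
    `cutoff` points, and 'lateral' otherwise.
    """
    if cutoff < 0:
        raise ValueError("cutoff must be non-negative")
    change = later_pct - first_pct
    if change > cutoff:
        return "up"
    if change < -cutoff:
        return "down"
    return "lateral"


# The same seven-point move under the three cutoffs mentioned in footnote 25.
for cutoff in (0, 5, 10):
    print(cutoff, classify_transfer(40, 47, cutoff))
```

Under a zero cutoff every move with a nonzero change in percentile counts as up or down, while a wider cutoff reclassifies small moves as lateral; the footnote reports that this choice does not change the qualitative findings.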

↵26. Our estimates for 6–7 years after college start (not shown) lie in between those for 2–3 and 10–11 years after college.

↵27. See Hershbein (2013) for subtler signaling theories of college quality.

↵28. All of the qualitative findings in Table 6 related to long-term earnings impacts for the NLSY-79 cohort persist if we restrict ourselves to a balanced panel.

↵29. Completion rate estimates by subgroup tell broadly the same story as the pooled completion estimates, with more volatility in individual point estimates due to smaller sample sizes.

↵30. The estimated effects of ability and college quality on graduation rates in the NLSY-79 cohort display less sensitivity to the conditioning set than the other outcomes we consider. We lack a good explanation for this pattern.

↵31. The corresponding completion probabilities, which we find easier to interpret, appear in their Table 12. Because the model underlying their Table 12 assumes independent errors, the distribution of unobserved variables does not depend on the choice of college and college quality in the first period. Thus, we interpret the three rows for the “overall sample” for each ability quartile as three independent simulations of the same parameter values.

↵32. Though substantively different, our estimates do not quite differ statistically from theirs. For example, in their Table 12 students in AFQT quartile 1, roughly our ASVAB quartile 1, suffer a reduction in college completion probabilities of around 0.07 from moving from their first to second quartiles of college quality. The corresponding estimate in our alternative specification using quartile indicators presented in Online Appendix Table A5 is an increase of 0.025 with a standard error of 0.049. The comparison is complicated by the fact that they do not present standard errors on their predictions and the fact that the two estimates, which rely on the same data, presumably have a nonzero covariance.

↵33. Other differences seem to us a priori less likely to account for a large portion of the difference; for example, (i) Light and Strayer measure college quality differently than we do; (ii) Light and Strayer measure student ability a bit differently than we do, relying on the Armed Forces Qualifying Test (AFQT) score, a weighted average of four ASVAB component scores; and (iii) their sample differs somewhat from ours, as indicated by their sample size of 2,754 compared to ours of 2,441.

↵34. We also used our data to replicate their specification and then marched, one change at a time, from their setup to our setup. When we did what they did, we got estimates that look very much like what they got. Key differences result from using the first college rather than the last college attended, which reduces the estimated effect of college quality somewhat, and from including the county conditioning variables, which also reduce the estimated effect of college quality.

↵35. The earnings measure is the median value of log annual earnings over five-year intervals, excluding individuals with low enough values to suggest only marginal labor market attachment.

↵36. This identification strategy has issues of its own; see Hoxby (2009) for discussion.

↵37. With one possible exception, statistically insignificant and empirically irrelevant: students in the 25th percentile of ability attending colleges in the top 10 percent of the quality distribution.

- Received August 2018.
- Accepted October 2018.

This open access article is distributed under the terms of the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0) and is freely available online at: http://jhr.uwpress.org.