Research Article
Open Access

Do Administrative and Survey Data Tell the Same Impact Story?

Evidence from the Health Profession Opportunity Grants 1.0 Impact Study

Eleanor L. Harvill, Laura R. Peck, and Douglas Walton
Journal of Human Resources, May 2025, 60 (3) 1019-1053; DOI: https://doi.org/10.3368/jhr.0120-10673R2
Eleanor L. Harvill is a senior associate at Abt Associates.
Laura R. Peck is a principal scientist at MEF Associates (corresponding author).
Douglas Walton is an associate at Abt Associates.

Abstract

Job training evaluations face a choice: whether to use survey data, administrative data, or both to estimate impacts. Using data from the Health Profession Opportunity Grants (HPOG 1.0) Impact Study, we investigate whether employment and earnings levels and impacts of gaining access to occupational training differ by source: survey data, National Directory of New Hires data, and state unemployment insurance data. Impacts of HPOG 1.0 on employment do not differ, but earnings impacts differ between the data sources. Administrative data analysis finds positive earnings impacts, whereas survey data analysis detects none. These findings differ from related research, which tends to report that earnings impacts estimated from survey data are larger than those estimated from administrative data.

JEL Classification:
  • I3
  • J3

I. Introduction

When assessing the labor market impact of job training and workforce development interventions, program evaluators face an important design decision: which data sources to use to measure outcomes and estimate impacts. The two most common sources are administrative records from government or administering agencies and data from surveys of individuals. Obtaining administrative records is typically much less expensive than fielding a survey, but surveys allow the researcher to ask precise questions to measure specific outcomes of interest. In this article, we use data from the Health Profession Opportunity Grants (HPOG 1.0) Impact Study to explore how employment and earnings levels and impacts differ across alternative data sources (Abt Associates and Peck 2022).1

From an evaluation perspective, it is important to distinguish between two types of cases: (i) cases in which data source differences affect measured earnings and employment for the treatment and control groups in the same way and (ii) cases in which they affect the two groups differently.

In the first case, if the measured outcome overstates or understates the true outcome by a constant proportion for both groups, then the observed impact and its standard error will be proportional to the true impact. That is, data source differences that affect outcome levels in the same way for treatment and control group members will not lead to different conclusions about whether the program had an impact, because tests of statistical significance are unaffected. That said, even if the statistical significance of the impact is not affected, a change in the magnitude of impacts between data sources may affect the interpretation of findings, potentially leading to different conclusions.

Proportional over- or understatement that affects observed outcome levels, but not impact conclusions, is what we label a difference in levels. In comparison, when the over- or understatement affects the treatment and control groups differently, we refer to these as differences in relative impacts. Differences in impacts affect the conclusions that evaluators, program administrators, and policymakers draw regarding the effectiveness of the intervention.
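To make this distinction concrete, the following is a minimal illustration in our own notation (not taken from the article): suppose the measured outcome in each group equals the true outcome scaled by a common constant c.

```latex
% A common proportional distortion, y^{obs} = c * y^{true} in both groups, rescales
% the impact and its standard error but leaves the t-statistic and the relative
% impact (the impact divided by the control group mean) unchanged.
\begin{align*}
\hat{\Delta}^{\mathrm{obs}}
  &= \bar{y}^{\mathrm{obs}}_{T} - \bar{y}^{\mathrm{obs}}_{C}
   = c\left(\bar{y}^{\mathrm{true}}_{T} - \bar{y}^{\mathrm{true}}_{C}\right)
   = c\,\hat{\Delta}^{\mathrm{true}},
  \qquad
  \mathrm{SE}\big(\hat{\Delta}^{\mathrm{obs}}\big) = c\,\mathrm{SE}\big(\hat{\Delta}^{\mathrm{true}}\big), \\
t^{\mathrm{obs}}
  &= \frac{\hat{\Delta}^{\mathrm{obs}}}{\mathrm{SE}\big(\hat{\Delta}^{\mathrm{obs}}\big)}
   = t^{\mathrm{true}},
  \qquad
  \frac{\hat{\Delta}^{\mathrm{obs}}}{\bar{y}^{\mathrm{obs}}_{C}}
   = \frac{c\,\hat{\Delta}^{\mathrm{true}}}{c\,\bar{y}^{\mathrm{true}}_{C}}
   = \frac{\hat{\Delta}^{\mathrm{true}}}{\bar{y}^{\mathrm{true}}_{C}}.
\end{align*}
```

If instead the scaling differs between the treatment and control groups, both the absolute and relative impacts change, which is the second type of case described above.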

A number of studies have explored differences in levels and impacts between survey-reported and administrative-recorded earnings. Reviews of experimental impact evaluations conclude that the estimated impacts on earnings often differ between data sources, with survey data typically resulting in absolutely higher levels of employment and earnings and, to a lesser extent, relatively larger impacts (Barnow and Greenberg 2015, 2019). This pattern occurs across a variety of interventions. For example, the national evaluation of the Job Corps program found both higher earnings levels and larger earnings impacts in survey data than in data collected from state unemployment insurance (UI) records. Researchers found that the difference was due to a variety of factors, including survey response bias, UI undercoverage (particularly of short-term casual jobs), and survey respondents’ overreporting their hours worked (Schochet, Burghardt, and McConnell 2008). Similarly, an experimental evaluation of individual training accounts found that survey-based earnings levels and impacts were substantially higher than those based on earnings data from UI records. The evaluators attributed this largely to differences in the types of employment each data source covered (Moore, Perez-Johnson, and Santillano 2018). A study of welfare recipients in Wisconsin found that workers with unsteady work history, uncovered jobs, or out-of-state employment tend to have larger differences between survey-based and UI earnings; however, a regression analysis found that these factors can explain only about 10 percent of the difference, suggesting that most of it is due to other factors (Wallace and Haveman 2007).

There are other cases for which survey data lead to higher earnings levels but no detectable differences in relative impacts. Using data from the evaluation of the Job Training Partnership Act (JTPA), Kornfeld and Bloom (1999) found that survey-based average quarterly earnings for adults were about 30 percent higher than earnings reported in UI data. However, the impact estimates from the two sources, when expressed as a share of the control group, are similar, even though the survey impact is larger in dollars (due to the higher earnings level). The authors found a more substantial difference in earnings estimates for youth with a prior arrest, for whom the survey-based impact was much more negative than the impact using UI data.

Although most of the existing research has used state UI data as the source of administrative earnings records, some evaluations have used more comprehensive sources. The most recent experimental evaluation of the Workforce Investment Act found differences between survey-based earnings and earnings based on two different administrative data sources: administrative tax data and the National Directory of New Hires (NDNH), which is a national database of UI earnings records. Consistent with other evaluations, impacts on survey-based outcomes tended to be larger than impacts on outcomes based on administrative data. Notably, impacts were larger using tax data than using NDNH data. The researchers concluded that this is likely because the administrative tax data included earnings from self-employment and independent contract work, whereas NDNH data did not (Mastri, Rotz, and Hanno 2018).

Although it is not possible to know whether the administrative data–based or survey data–based impact estimates are closer to the true impact, the literature points to several explanations for why estimates might differ between data sources. These explanations offer various reasons to prefer one data source over the other.

This article makes several contributions to the literature on the trade-offs between administrative and survey data in estimating program impacts on labor market outcomes of employment and earnings. We propose a conceptual framework for considering the possible explanations for differences in earnings levels and impacts. As we detail in the next section, these explanations include (i) survey nonresponse bias, (ii) various types of concept misalignment, and (iii) measurement error. We apply this conceptual framework to data from the HPOG 1.0 Impact Study, considering how two of these factors—survey nonresponse and concept misalignment—affect outcome levels and impacts.

The analysis of the HPOG 1.0 data makes several further contributions to the literature comparing administrative and survey data sources. Most notably, much of the previous literature has relied on administrative earnings data from individual states. In contrast, we use data from a national database of quarterly earnings (the NDNH), which consists of administrative earnings data from all states. We use a novel approach to estimate the improvement from using NDNH data relative to obtaining data from individual states. To our knowledge, there is little literature comparing NDNH-based earnings with state UI–based or survey-based earnings, because researcher access to the NDNH is relatively new (for example, Mastri, Rotz, and Hanno 2018). We consider both employment and earnings, whereas much of the prior work has focused primarily on earnings alone. We also examine two measures of survey-reported earnings, highlighting how alternative measures of earnings capture distinct constructs, which may or may not be appropriate to align with one another. Methodologically, we suggest an approach to test statistically for differences in outcome levels, absolute impacts, and relative impacts, an approach that, to our knowledge, has not yet been used in this literature.

In the following, we first describe our analytic framework of possible factors that might contribute to differences in levels and impacts between administrative and survey data. A methodology section then details the data sources, measures, and analytic methods we use. We then report findings and conclude by discussing implications for future research.

II. Framework for Analysis: Factors Explaining Differences Between Results Obtained from Different Data Sources

A main goal of an impact evaluation is to accurately measure the outcome of interest—in this instance, labor market outcomes—and to assess the impact of an intervention on that outcome. As shown in previous research, the measurement of these outcomes varies among data sources. Because it is not possible to compare observed measures to the true value of the underlying construct (that is, to objective reality), we use differences between data sources to understand inaccuracies in measures from each source. In this section, we propose a framework for systematically assessing the factors driving these differences among data sources—which we classify as (i) survey nonresponse bias, (ii) concept misalignment, and (iii) measurement error. We introduce each of these with relevant grounding in prior research that offers evidence of each as a factor that explains observed differences both in levels and in impacts.

A. Survey Nonresponse

Survey nonresponse refers to the reality that some portion of a study sample will not respond to a survey, meaning that survey data are available for only a subset of the study sample. There are two kinds of survey nonresponse: unit nonresponse, where a sample member provides no survey data, and item nonresponse, where a sample member responds to the survey but does not provide data for a particular item. Both kinds of survey nonresponse can generate bias. If the level of an outcome is related to the probability of nonresponse, then the outcome level derived from a survey could be biased (Bollinger and Hirsch 2013; Groves 2006). When treatment and control groups respond at different rates, survey nonresponse bias could yield differences in both absolute and relative impacts (Barnow and Greenberg 2015, 2019).

The influence of unit nonresponse can be tested by comparing the level of administrative-based earnings—which are recorded for everyone—for respondents versus nonrespondents. In previous evaluations, survey respondents generally had higher levels of earnings in administrative data than did nonrespondents (for example, Barnow and Greenberg 2015, 2019). Item nonresponse is also a concern. Survey respondents might consider earnings and income to be sensitive topics and choose not to provide answers to these questions (Riphahn and Serfling 2002).

B. Concept Misalignment

Concept misalignment refers to whether measures constructed from various data sources capture the underlying concept of interest. In our scenario, where the concept of interest is total earnings, the concept measured in a survey can be quite different from the concept measured in administrative data.

Concept misalignment is an important consideration for both administrative data–based and survey data–based measures. The structure and wording of a survey question can affect the measurement of the concept of interest. For example, a validation study of respondents to the Panel Study of Income Dynamics (PSID) found that participants asked to report “usual” weekly earnings and work hours overreported their actual average earnings by about 6 percent (Bound et al. 1989). The authors posited that respondents might be interpreting “usual” to mean something other than “average” (for example, the median or mode), which results in an overestimate of the true average. In a study of individuals with low incomes eligible for training under JTPA, Smith (1997) found that overtime earnings are substantially overreported. Respondents who reported any overtime were asked to report the amount of overtime pay earned in an average week and the number of overtime hours worked in a typical week. Because of this wording, respondents likely reported the average overtime pay that they received in weeks in which they worked overtime; this is a different concept than total overtime pay averaged across all weeks employed, including weeks without any overtime hours.

In administrative data, coverage is a key potential type of misalignment. The most commonly used source of administrative data on employment and earnings—state UI systems—excludes out-of-state employment as well as self-employed workers, independent contractors, certain agricultural workers, and those who work for the federal government (Hotz and Scholz 2002; Stevens 2007; Bureau of Labor Statistics [BLS] 2017). Together, these exclusions imply that UI data could be missing 10–15 percent of all jobs in the economy.

Concept misalignment certainly has implications for outcome levels, but it may also matter in estimating impacts. If the intervention being evaluated creates differences in the alignment of outcomes between the treatment and control groups, then some of what is estimated as impacts could reflect that concepts are aligned differently for the two experimental groups. For example, in an intervention that shifts treatment participants out of self-employment and into UI-covered employment, UI-based estimated earnings impacts might be larger than survey-based estimates because UI does not cover self-employment whereas a survey can.

As we detail for HPOG later, we suggest five potential types of concept misalignment, where labor market outcomes constructed from different data sources might capture different concepts of interest: reporting interval, reporting period, gross versus net earnings, classes of covered employment, and handling of multiple jobs. This article provides evidence on three of these from HPOG 1.0 data: (i) reporting interval, (ii) gross versus net earnings, and (iii) classes of covered employment.

C. Measurement Error

Measurement error is a potential concern in both survey and administrative data. In a review of literature on measurement error in surveys, Bound, Brown, and Mathiowetz (2001) explored how both social psychology and survey design can contribute to measurement error. In their review of validation studies that match survey responses to administrative data (either tax data or employer records), they found that the degree of error differs across economic concepts. Reporting of gross annual earnings tends to be reliable with little bias, which they posited was due to reinforcement of one’s annual earnings throughout the year (for example, in the preparation of federal taxes, or application for credit). In contrast, hourly earnings and hours worked are reported with substantially more error. The study of populations with low incomes eligible for JTPA found similar results, with earnings calculated indirectly from hourly wage and weekly hours exceeding a direct measure of annual earnings by as much as 20 percent (Smith 1997).

Recall error is another type of measurement error that arises as the time between the experience in question and the survey increases. Prior research documents that recall error increases with longer follow-up periods (Kornfeld and Bloom 1999; Schochet, McConnell, and Burghardt 2003). Moreover, social desirability concerns might influence respondents to overreport or underreport their earnings, depending on whether they feel pressure to exaggerate or understate them (Hariri and Lassen 2017).

In administrative data, measurement error could arise if firms fail to report their payroll information to state unemployment agencies. However, this is generally not a major factor because coverage rates typically exceed 96 percent (BLS 2018). Errors may also occur in state processing of payroll information or in transferring data from one administrative system to another.

As with the other two factors (survey nonresponse, concept misalignment), measurement error is a concern for measuring both outcome levels and impacts. If measurement errors operate differently in treatment and control groups, then those errors can bias impact estimates. We recognize that measurement error has the potential to influence outcomes and impacts, but we do not have the means to examine how or to what extent measurement error is present in HPOG 1.0 data. For that reason, we include the factor in this framework but present no associated results, returning to discuss measurement error only briefly in concluding.

III. Methodology

This section begins by describing the data sources and some issues pertaining to their coverage and measures. Then it defines the outcomes, along with the analytic methods used in this analysis.

A. Data Sources

1. HPOG 1.0 Impact Study

Data for this analysis come from the HPOG 1.0 Impact Study, an experimental evaluation of a career pathways–based sectoral training for Temporary Assistance for Needy Families (TANF) recipients and other adults with low incomes. The evaluation randomized 13,802 eligible participants at 42 local programs operated by 23 grantees nationwide.2 This article uses a subset of the evaluation data: 10,617 study participants in the 36 HPOG programs where we have consistent survey data on labor market measures of interest.3

The HPOG 1.0 Impact Study is a useful case study for exploring differences between administrative and survey data for several reasons: the sectoral and cross-site nature of the data and the self-employment and federal employment issues that the study raised, all of which arise in the use of survey versus administrative data in evaluation research. In addition, the sample size is larger than in many previous studies in this literature, which provides more statistical power for detecting differences, should they exist.

2. National Directory of New Hires

The NDNH is maintained by the U.S. Department of Health and Human Services (HHS), Administration for Children and Families, Office of Child Support Enforcement (OCSE). Established by the 1996 Personal Responsibility and Work Opportunity Reconciliation Act, the NDNH collects quarterly earnings records from all state UI systems, as well as earnings data from federal agencies that participate in the Unemployment Compensation for Federal Employees system. OCSE may disclose NDNH data to a federal, state, or local agency for research purposes, provided that the research contributes to the purposes of the federal TANF program or the federal–state child support program (Office of Child Support Enforcement [OCSE] 2019).4

The NDNH provides a more comprehensive source for employment and earnings data than do UI data, which until recently have been the main administrative data source for social welfare policy evaluations. An important limitation, however, is that access to the NDNH is available only to government agencies or their contractors.

For this analysis, we employ a novel approach to compare estimates from three different sources of administrative data. We simulate these sources by taking subsets of the NDNH data set. The first subset, single-state UI, is based only on state UI records in the NDNH from the state in which the participant’s HPOG program is located. This mimics obtaining UI data from the state where the program is located. As such, it excludes out-of-state and federal employment. The second subset, national UI, is based on all states’ UI records in the NDNH, excluding federal data. This mimics having UI data from all states but not the federal government. The third subset, full NDNH, consists of all data in the NDNH, including all state UI records and federal records.
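For readers who want to see the mechanics, the sketch below shows one way the subsetting could be implemented; the column names (`is_federal`, `earnings_state`, `program_state`) are hypothetical placeholders rather than the actual NDNH file layout.

```python
import pandas as pd

def build_admin_subsets(ndnh: pd.DataFrame) -> dict:
    """Simulate three administrative data sources from one NDNH extract.

    Assumed (hypothetical) columns:
      is_federal     -- True for federal (UCFE) earnings records
      earnings_state -- state that reported the UI record
      program_state  -- state where the participant's HPOG program is located
    """
    state_ui = ndnh[~ndnh["is_federal"]]
    return {
        # UI records only from the state where the HPOG program operates
        "single_state_ui": state_ui[
            state_ui["earnings_state"] == state_ui["program_state"]
        ],
        # UI records from all states, excluding federal employment
        "national_ui": state_ui,
        # All state UI records plus federal records
        "full_ndnh": ndnh,
    }
```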

3. HPOG Participant Survey

A participant follow-up survey was fielded to HPOG 1.0 Impact Study participants beginning 15 months after random assignment. Average time of completion was 18 months after random assignment. Of the total study sample, 8,091 study participants responded to the survey, a response rate of about 76 percent.5 The survey covered many topics; those of central interest to this analysis relate to labor market outcomes, including employment, hours worked, and wages.6

B. Potential Sources of Differences Between Administrative- and Survey-Based Outcomes in the HPOG 1.0 Impact Study

This article’s framework examines both survey nonresponse and various types of concept misalignment that might influence measurement of outcome levels and estimation of impacts differently for certain administrative versus survey data sources. This section considers survey nonresponse and particular types of concept misalignment—reporting interval, gross versus net earnings, and classes of covered employment—in the specific context of HPOG data.

1. Survey nonresponse

As noted just above, the HPOG 1.0 Impact Study’s short-term follow-up survey had a response rate of 76 percent. Although the differential response rate between the treatment and control groups was small (78 vs. 73 percent), both outcome levels and impacts estimated on the survey subsample have the potential to differ from those estimated on the full sample. We assess potential survey nonresponse bias by estimating impacts from administrative data, separately for the full sample and for the survey respondent sample.

2. Reporting interval

Unemployment insurance earnings data consist of total covered earnings received during a calendar quarter. In comparison, the HPOG survey asked respondents to report their weekly hours worked and an hourly wage, or earnings over some other respondent-chosen interval (for example, per biweekly pay period or by whatever unit the respondent felt most comfortable reporting). This difference could result in misalignment when converting weekly hours and hourly wages to quarterly levels, a process that implicitly assumes constant values for the entire period. Moreover, the number of pay periods contained in a quarter varies, so total earnings will vary quarter-by-quarter even if weekly earnings remain constant.7

3. Gross (pre-tax) earnings versus net (take-home) earnings

Unemployment insurance earnings consist of all earnings subject to reporting requirements and are reported before taxes and deductions. Surveys can specifically ask respondents to report gross (pre-tax) or net (take-home, after-tax) earnings. In the HPOG survey, participants were asked to report earnings two different ways: (i) hourly wage and hours worked per week and (ii) total earnings in the past month. For the former measure, the survey specifically asks participants to report their hourly earnings before taxes; for the latter measure, the survey asks participants “About how much did you receive in job earnings” in the prior month. The word “receive” is ambiguous and could be interpreted as asking for any one of gross, after-tax, or take-home earnings. In this article’s results, we assess how levels and impacts differ between these two earnings measures.8

4. Classes of covered employment

Unemployment insurance coverage is broad, constituting a virtual census of nonfarm employees on private payrolls, as well as nearly all (96 percent) of state and local government workers (BLS 2018).9 However, certain classes of workers are excluded from coverage. Notably, UI data from a single state exclude employment in other states and employees of the federal government. As a result, evaluations that use UI earnings data from a single state (or just a few states) do not capture out-of-state or federal employment, resulting in a downward bias in earnings levels. Moreover, if out-of-state or federal employment is more common among those in the treatment group than in the control group (or vice versa), then impact estimates based on single-state UI earnings could be biased. By including earnings for both out-of-state and federal employment, the NDNH provides greater coverage than data from single-state UI.10

In comparison to the NDNH and UI administrative data, survey data have no such exclusions, and they may include under-the-table or gig economy employment that administrative data would miss.11 As such, survey-based employment and earnings levels are likely to be greater than administrative data–based employment and earnings levels because of the survey data’s more comprehensive coverage.

In summary, we focus in this article on differences in levels and impacts among data sources. However, it is possible that some sources of error do not systematically over- or understate the true value, but simply add noise to measured earnings. This type of error would not bias estimates but could increase the residual variance of earnings and decrease the precision. Among the sources of error described above, reporting interval and inconsistent interpretation of data as pre- or post-tax seem to have the most potential to add noise to measured earnings. Moreover, to the extent that survey outcomes have larger variance than administrative outcomes, the additional noise will be reflected in larger standard errors describing the precision of estimates.

C. Outcome Measures

In this section, we define a set of administrative data–based outcomes (one each for employment and earnings) and survey-based outcomes (one for employment and two for earnings). We estimate impacts on these outcomes to explore issues of survey nonresponse and concept misalignment.

1. NDNH-based outcome measures

For each of the three subsets of data (single-state UI, national UI, full NDNH), we construct outcome measures for employment and earnings. As in the HPOG 1.0 Short-Term Impacts Report (Peck et al. 2018), the outcomes are based on the fifth quarter after random assignment, which is roughly consistent with the timing of the evaluation’s short-term follow-up survey. We constructed two measures from each of the three configurations of NDNH data: (i) a binary indicator of whether the participant had any earnings in the fifth quarter after random assignment (Employment in Q5) and (ii) total earnings from all employers in the fifth quarter after random assignment (Earnings in Q5).
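A rough sketch of how the two Q5 measures could be constructed from person-by-employer-by-quarter earnings records follows; as before, the column names are hypothetical rather than the study's actual variable names.

```python
import pandas as pd

def q5_outcomes(records: pd.DataFrame, sample: pd.DataFrame) -> pd.DataFrame:
    """Collapse person-by-employer-by-quarter earnings records to Q5 outcomes.

    Assumed (hypothetical) columns: `records` has `person_id`, `quarter_since_ra`
    (quarters since random assignment), and `earnings`; `sample` has one row per
    study participant with `person_id`.
    """
    q5 = records[records["quarter_since_ra"] == 5]
    earnings_q5 = (
        q5.groupby("person_id", as_index=False)["earnings"]
          .sum()
          .rename(columns={"earnings": "earnings_q5"})
    )

    out = sample.merge(earnings_q5, on="person_id", how="left")
    out["earnings_q5"] = out["earnings_q5"].fillna(0.0)   # no covered record => zero earnings
    out["employed_q5"] = (out["earnings_q5"] > 0).astype(int)
    return out
```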

2. Survey-based outcome measures

From the survey data, we construct a single measure of employment as a binary indicator of whether the participant was working at the time of the survey (survey employment).

Regarding earnings, the HPOG follow-up survey asked several questions. One question asked participants to report their total job earnings from the previous month; a separate set of questions asked participants to report their hourly wage and hours worked per week. This leads us to define two alternative “quarterly equivalent” measures of survey-based earnings. To make the measures roughly equivalent in reporting period (the quarter), we convert monthly and weekly survey-reported earnings into quarterly estimates. Then we construct quarterly earnings (survey monthly earnings × 3), defined as total reported job earnings in the month prior to the survey, multiplied by three.12 We also construct quarterly earnings (survey weekly earnings × 13), defined as the hourly wage, multiplied by the number of hours worked per week, multiplied by 13.13 The conversion required some assumptions, namely that earnings, hours, and wages reported at the time of the survey were stable across the entire quarter.14 As noted above, this may be an area where we have concept misalignment; however, given the lack of detailed work history in the survey, we believe this is the most reasonable assumption for this exercise.
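The two conversions amount to simple scalings, sketched below with hypothetical survey variable names.

```python
import pandas as pd

def survey_quarterly_equivalents(survey: pd.DataFrame) -> pd.DataFrame:
    """Convert survey-reported earnings to rough quarterly equivalents.

    Assumed (hypothetical) columns: `monthly_earnings` (total job earnings in the
    month before the survey), `hourly_wage`, and `hours_per_week`. Both conversions
    assume the reported amounts are stable over the full quarter, as noted in the text.
    """
    out = survey.copy()
    out["qtr_earnings_from_monthly"] = out["monthly_earnings"] * 3
    out["qtr_earnings_from_weekly"] = out["hourly_wage"] * out["hours_per_week"] * 13
    return out
```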

Table 1 summarizes the details of our outcome measures, including the label we use for each, definition (including survey question text for survey outcomes), and data source.

Table 1

Outcome Measures

D. Analytic Methods

To estimate HPOG’s impact, we estimate the following parametric linear model by ordinary least squares:

$$y_{i} = \alpha + \beta_{T} T_{i} + \sum_{k} \beta_{k} X_{ik} + \sum_{k} \gamma_{k} d_{ik} + \varepsilon_{i}$$

where $y_i$ is the outcome for individual $i$, $T_i$ is a binary indicator for whether the participant was randomly assigned to the treatment group, the coefficient $\beta_T$ captures the impact of being in the HPOG treatment group, the $X_{ik}$ are baseline characteristics with coefficients $\beta_k$,15 and $\varepsilon_i$ is an error term. For participants with missing data on a baseline characteristic $X_{ik}$, we impute the mean value and include an indicator identifying the measure as missing ($d_{ik}$), in line with common practice (Puma et al. 2009). The coefficient on each missing-data indicator is $\gamma_k$.16 Although we do not include indicator variables for the 36 HPOG programs in the regression, we account for clustering within programs when calculating standard errors.17,18
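A minimal sketch of this estimating equation, using statsmodels and illustrative variable names, is below. The cluster-robust standard errors shown here are a convenient stand-in; the article's reported standard errors come from the cluster bootstrap described next.

```python
import pandas as pd
import statsmodels.formula.api as smf

def estimate_impact(df: pd.DataFrame, outcome: str, covariates: list):
    """Estimate y_i = a + b_T*T_i + sum_k b_k*X_ik + sum_k g_k*d_ik + e_i by OLS.

    Assumed (hypothetical) columns: `treatment` (0/1 assignment indicator),
    `program_id` (one of the 36 HPOG programs), the baseline covariates, and
    the outcome itself.
    """
    df = df.dropna(subset=[outcome]).copy()

    terms = []
    for x in covariates:
        miss = f"{x}_missing"
        df[miss] = df[x].isna().astype(int)   # d_ik: indicator for missing baseline data
        df[x] = df[x].fillna(df[x].mean())    # mean-impute the missing X_ik values
        terms += [x, miss]

    formula = f"{outcome} ~ treatment + " + " + ".join(terms)
    # Cluster-robust standard errors by program, as a stand-in for the bootstrap.
    return smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["program_id"]}
    )
```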

For survey outcomes, we use weights to address nonresponse. We use the baseline covariates listed above to model nonresponse separately for the treatment group and the control group.19 This approach is valid as long as the probability of response does not depend on unobservable characteristics that are also related to the outcome. Because the evaluation randomized with a ratio of two treatment group members for each control group member, the sample includes 7,116 treatment group members and 3,501 control group members. Because the random assignment probability was the same in all sites, we do not adjust for this in the model. The sample of survey respondents includes 5,566 treatment group members and 2,525 control group members.
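One common way to implement such weights is an inverse-probability-of-response model fit separately by treatment arm, sketched below with hypothetical variable names; the study's exact specification may differ.

```python
import pandas as pd
import statsmodels.formula.api as smf

def nonresponse_weights(df: pd.DataFrame, covariates: list) -> pd.Series:
    """Inverse-probability-of-response weights, modeled separately by treatment arm.

    Assumed (hypothetical) columns: `responded` (0/1 survey response indicator),
    `treatment` (0/1), and the baseline covariates. This is one standard way to
    implement the weighting described in the text, not the study's exact procedure.
    """
    formula = "responded ~ " + " + ".join(covariates)
    weights = pd.Series(index=df.index, dtype=float)

    for _, arm in df.groupby("treatment"):
        p_response = smf.logit(formula, data=arm).fit(disp=0).predict(arm)
        weights.loc[arm.index] = 1.0 / p_response

    # Weights are applied only to survey respondents in the survey-based analyses.
    return weights.where(df["responded"] == 1)
```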

To construct standard errors for impact estimates, we use a cluster-bootstrap with clusters defined by the 36 HPOG programs. We constructed 2,000 bootstrap samples by resampling program-level clusters with replacement. Each bootstrap iteration repeated the construction of survey weights; estimation of the treatment group mean, control group mean, impact, and relative impact for each outcome; and calculation of the difference in means and impacts across outcomes. The variance across bootstrap samples was used to construct standard errors.20,21

This approach to constructing standard errors also allows us to test for differences in outcome levels and impacts across employment outcomes by data source, a test that other analyses of administrative versus survey data have not explicitly done. The HPOG data use agreement prevents us from combining individual-level, survey-based measures of earnings with the NDNH data. Bootstrapping at the cluster level allows testing empirically for differences in both levels and impacts between survey and administrative measures of earnings without combining individual-level data sets. To do this, we select the same sample of programs for each bootstrap iteration in the NDNH and survey data, estimate the impacts for each iteration separately for each data source, and then merge the bootstrap estimates. This produces the joint variance of impacts on survey and NDNH measures of earnings along with a test of differences in those impacts.
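The sketch below illustrates the program-level cluster bootstrap and the cross-source difference test. The helper `impact_fn` stands in for whatever estimation routine is applied to a single data source (for survey outcomes it would also rebuild the nonresponse weights), and all names are illustrative.

```python
import numpy as np
import pandas as pd

def cluster_bootstrap_difference(ndnh_df, survey_df, impact_fn, n_boot=2000, seed=0):
    """Program-level cluster bootstrap of the difference between two impact estimates.

    `impact_fn(df)` returns a scalar impact estimate for one data source. The same
    draw of programs is applied to both data sets in each iteration, so the bootstrap
    captures the joint variability of the two estimates without linking individual
    records across sources.
    """
    rng = np.random.default_rng(seed)
    programs = ndnh_df["program_id"].unique()

    diffs = []
    for _ in range(n_boot):
        draw = rng.choice(programs, size=len(programs), replace=True)
        boot_ndnh = pd.concat([ndnh_df[ndnh_df["program_id"] == p] for p in draw])
        boot_survey = pd.concat([survey_df[survey_df["program_id"] == p] for p in draw])
        diffs.append(impact_fn(boot_ndnh) - impact_fn(boot_survey))

    diffs = np.asarray(diffs)
    point = impact_fn(ndnh_df) - impact_fn(survey_df)
    se = diffs.std(ddof=1)
    return {"difference": point, "bootstrap_se": se, "z": point / se}
```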

IV. Results

The framework we proposed above hypothesized three reasons why the average levels of and impacts on employment and earnings might differ between administrative and survey data sources: (i) survey nonresponse bias, (ii) concept misalignment, and (iii) measurement error. This section considers survey nonresponse bias and concept misalignment in turn, assessing the evidence that each factor yields differences in levels of and impacts on earnings outcomes in the HPOG 1.0 Impact Study. As noted, we recognize that measurement error has the potential to influence outcomes and impacts, but we do not have the means to examine how or to what extent measurement error is present in HPOG 1.0 data.

A. Exploring Reasons for Differences Between Data Sources: Survey Nonresponse

If survey respondents differ systematically from nonrespondents, then survey-based measures could be biased in both levels and impacts. To test for evidence of bias, we compare outcome levels and impacts for the full sample to those for the survey respondents, using the same NDNH outcome measures. The NDNH-based earnings and employment outcomes are available for nearly the entire study sample, providing a suitable source of data for assessing nonresponse bias.22

The first two panels of Table 2 show the differences in the levels and impacts for earnings between survey respondents and the full sample. To assess nonresponse bias, we first compare employment and earnings levels and impacts in the full sample to the unweighted survey sample; we then compare the full sample to the survey sample after adjusting for differences in baseline characteristics with nonresponse weights.

Table 2

Does Survey Nonresponse Affect Impact Estimates?

The average treatment group member in the full sample earns $3,878 in the fifth quarter after random assignment, and the average treatment group member in the unweighted sample of survey respondents earns $3,856. The difference of $22 is not statistically significant and is substantively small.23 Similarly, the $41 difference in average earnings between the control group full sample and the control group survey respondents ($3,648 and $3,606, respectively) is not statistically significantly different from zero. These differences in levels are small and do not meaningfully change our understanding of the level of earnings.

Given the lack of meaningful differences in levels, we would not expect to see large differences in the impacts between survey respondents and the full sample. Analyses of both the full sample and the unweighted sample of survey respondents reveal small but statistically significant, positive impacts on earnings (Panel 1). Based on full-sample estimates, HPOG increases earnings by $231 in the fifth quarter after random assignment, a 6 percent relative increase; in the unweighted survey sample, the impact is $250. The $19 difference in absolute impacts and 0.6 percentage point difference in relative impacts between analyses are not statistically significantly different from zero, nor do they meaningfully alter the magnitude of the effect of the program. However, given the standard error estimated for the difference, the magnitude of this difference could plausibly be more than $100 per quarter.

Next, we compare earnings and employment in the full sample to those in the weighted survey sample (Panel 2). This comparison explores whether the application of nonresponse weights may improve the alignment of survey earnings and employment with those measured in the administrative data. Results for the weighted survey sample are quite similar to results for the unweighted survey sample. The differences in the levels and impacts on earnings and employment between the full sample and the weighted survey sample are on the order of only a few dollars and only a few tenths of a percentage point, respectively. The relative impact is the same regardless of whether nonresponse weights are used.

The third and fourth panels of Table 2 show the results for employment. Rounding to the nearest percentage point, we find that about 76 percent of the treatment group and 74 percent of the control group are employed, irrespective of whether we use the full sample, the weighted subsample of survey respondents, or the unweighted sample of survey respondents. In the third panel, the differences in levels of employment between the full sample and the unweighted sample of survey respondents are less than half of a percentage point and are not statistically significantly different from zero. The impact on the full sample is 1.3 percentage points, and the impact on the unweighted sample of survey respondents is 1.6 percentage points. The 0.3 percentage point difference in impacts is not statistically significantly different from zero. Comparing the full NDNH sample to the weighted sample of survey respondents (Panel 4) yields similar conclusions.

In summary, for both the unweighted and weighted estimates, the small differences in levels of earnings and employment do not lead to detectable differences in the impacts of the HPOG Program, leading us to conclude that survey nonresponse does not lead to biased estimates.24 Because there is no indication that one approach is preferable for impact estimation, our preferred specification includes nonresponse weights because this is the standard approach in the impact evaluation literature.

B. Exploring Reasons for Differences Between Data Sources: Concept Misalignment

We next turn to concept misalignment, which focuses on whether outcomes constructed from alternative data sources measure the same underlying concept. As described above, we assess three different potential types of concept misalignment in the HPOG 1.0 data: (i) reporting interval, (ii) gross versus net earnings, and (iii) classes of covered employment. We compare outcome levels and impacts across the administrative data-based and survey-based measures defined above. Differences in the levels of these outcomes would suggest that they are measuring different underlying concepts; differences in impacts would be evidence that the misalignment is related to having access to HPOG.

1. Alternative survey measures of earnings

First, because respondents reported both their total job earnings from the previous month and their typical hourly wage and hours worked per week, we can compare earnings at the individual level. Table 3 reports the distribution of survey-reported weekly and monthly earnings for control group participants across three bins: $0, $1–$3,999, and $4,000 or more. A majority of respondents were in the same earnings bin for both measures. However, there are some notable differences. On average, earnings calculated from the weekly report were higher than those based on the monthly report. For example, nearly half of participants (1,101 out of 2,320, or 47 percent) reported that they earned $4,000 or more per quarter based on weekly earnings. Just 28 percent (639 out of 2,320) reported that they earned $4,000 or more per quarter based on monthly earnings.

Table 3

Cross-Tabulation of Survey-Reported Weekly Earnings and Monthly Earnings (Converted to Quarterly, for Comparability)

To explore whether the difference in earnings may be driven by reported hours, we calculate average quarterly earnings for different levels of reported hours. As shown in Table 4, the ratio of earnings from the weekly report to the monthly report increases with the number of hours worked. This pattern is quite similar to the one reported by Smith (1997), who also found that the ratio of indirect earnings (that is, weekly report) to direct earnings (that is, monthly report) increases with the number of hours worked.

Table 4

Average Survey-Reported Weekly Earnings and Monthly Earnings (Converted to Quarterly, for Comparability), by Weekly Hours Worked

We now assess how earnings levels and estimated impacts differ between the two survey measures. As shown in Table 5, the levels differ substantially. When calculated from weekly earnings, the control group’s quarterly earnings are estimated to be $4,053; when calculated from monthly earnings, the control group’s quarterly earnings are estimated to be $2,601. The difference of $1,452 in the levels is both statistically significant and substantively quite large. That said, the impacts based on each of these measures are not statistically different from each other, nor are the impacts themselves statistically different from zero—$142 for earnings calculated from weekly earnings (relative impact of 3.5 percent) and $99 for earnings calculated from monthly earnings (relative impact of 3.8 percent).

Table 5

Do Earnings Levels and Impacts Differ Between Two Survey Measures?

The differences in levels are proportionately similar for the treatment and control groups. For both the treatment and control groups, quarterly earnings constructed from self-reported monthly earnings are about 64 percent of the quarterly earnings levels constructed from self-reported weekly hours and hourly wage. We conclude that the two different survey questions produce different earnings levels, but do not find a difference in impacts. A precisely estimated zero would indicate that the difference in wording affects the treatment and control groups in a similar manner. However, because the standard error on the difference in impacts is large, we cannot rule out the possibility of a meaningful difference in impacts.

These results appear to be driven by two important differences between the survey measures. The first difference is the reporting interval: one measure is based on a survey question that directly asked respondents to report their total earnings over a month, whereas the other constructs earnings indirectly from reported hours worked and hourly wage. Prior literature has found that earnings constructed indirectly from weekly hours worked and wages tend to exceed earnings reported directly over a longer interval, such as monthly or annually (Bound, Brown, and Mathiowetz 2001; Bound et al. 1994; Smith 1997). Constructing quarterly earnings from either weekly or monthly earnings requires the assumption that the week or month represents the average earnings over the quarter, including periods of reduced employment. If, when asked how many hours per week on average they are currently working, respondents give the number of hours they work in a week with no unpaid time off, then that will tend to overstate the number of hours for which they are actually paid.25 A month-long reference period might better capture the reality of fluctuations in hours worked over the course of a month. In addition, salaried workers might report that they work more than 40 hours per week, but report their hourly wage as if they were working a standard 40-hour week.

The second difference between the two measures relates to whether they are gross or net, pre-tax or post-tax. In the survey, participants were asked to report their hourly earnings before taxes, but the monthly earnings question asked participants to report “About how much did you receive in job earnings” in the prior month, which is ambiguous and could be interpreted as asking for earnings after taxes or asking for take-home pay instead of gross earnings. The control group mean earnings calculated from monthly income is 34 percent lower than the mean earnings calculated from hourly wage and hours worked. This is larger than the 20 percent difference in earnings levels in the JTPA data documented by Smith (1997), in which the survey explicitly asked for pre-tax data.

2. Alternative administrative data sources

We next consider whether earnings and employment levels and impacts differ among the three variants of administrative data that we examine. We compare results across the single-state UI, national UI, and full NDNH administrative data sources to assess how the outcomes might differ by whether out-of-state or federal employment is included in each source.

As expected, including additional sources of wage data increases the level of earnings. The first two panels of Table 6 show that average earnings for the control group are $3,283 according to UI data from the states where the HPOG program was located (single-state UI), $3,594 according to data from all state UI systems (national UI), and $3,648 according to all state UI plus federal data (full NDNH). For these three measures, the treatment group mean is $3,561 (single-state UI), $3,831 (national UI), and $3,878 (full NDNH), respectively. Although differences in absolute impacts are not statistically significant, differences in relative impacts are: data from a single state suggest that HPOG increased earnings 8.5 percent, whereas data from all states lead to the conclusion that HPOG increased earnings 6.6 percent. This result suggests that control group members were more likely to be employed out of state than were treatment group members.26 This finding highlights the importance of considering the control group experience when comparing data sources.

Table 6

Do Earnings and Employment Levels and Impacts Differ Among Alternative Administrative Data Sources?

Similarly, each additional data source increases the level of employment in the sample, as shown in the third and fourth panels of Table 6. According to single-state UI, 70.5 percent of treatment group members and 69.1 percent of control group members were employed. Employment rates are higher in the national UI data, rising to 75.0 percent for treatment group members and 73.7 percent for the control group. In the full NDNH, employment rises to 75.5 percent for the treatment group and 74.2 percent for the control group. Across all three data sources, treatment–control differences are quite similar, ranging between 1.3 and 1.4 percentage points, and are not statistically significant.

Next, we examine whether differences in outcome levels and impacts between administrative data sources are larger for states with high levels of out-of-state employment. For each state where an HPOG program is located, Table 7 reports the proportion of treatment group members with in-state employment, out-of-state employment, and federal employment. The rate of out-of-state employment ranged considerably across program locations, with Missouri having the highest rate of out-of-state employment (18.8 percent) and Texas having the lowest rate (1.7 percent). The rate of out-of-state employment is driven largely by the location of the program within the state; for example, the Missouri HPOG program is located in Kansas City, so we would expect a sizeable number of participants to be employed in the neighboring state of Kansas.

Table 7

In-State, Out-of-State, and Federal Employment Among HPOG Treatment Group Participant States, Ordered by Percentage of Out-of-State Employment

Three states have rates of out-of-state employment that are statistically significantly greater than the overall rate of out-of-state employment: Missouri, Kentucky, and Kansas. We focus on these states to test whether moving from single-state UI to national UI affects impacts.

Table 8 presents earnings and employment impacts for programs in the three states with the highest levels of out-of-state employment. As expected, both earnings and employment levels are substantially higher when measured with national UI data than with single-state UI data. However, for both earnings and employment, there is no difference in either the absolute or relative impacts between data sources. Programs in these states represent just a fraction of the HPOG sample (1,627 out of 10,370 sample members with valid earnings data, or about 16 percent of the sample) and are not necessarily representative of all HPOG programs. However, the results for programs in these states reinforce the general conclusion that although there are statistically significant differences in earnings and employment levels between administrative data sources, there are no detectable differences in impacts.

Table 8

Do Earnings and Employment Levels and Impacts Differ Among Alternative Administrative Data Sources, for States with High Out-of-State Employment?

3. Covered classes of employment

Another type of potential concept misalignment is the coverage of self-employment, which is included in the survey-based measures but not in the NDNH measures. Below, we present data from the Current Population Survey Annual Social and Economic Supplement on the rate of self-employment in various occupations among adults in households with low incomes (Table 9). For each major occupation group, other than the armed forces, we give the proportion of survey respondents who were self-employed, working in the private sector for a wage or salary, working in the government sector, or working as an unpaid family worker. Healthcare occupations are included in two major occupation groups: The professional and related occupations include healthcare practitioner and technical occupations, such as doctor, nurse, paramedic, and licensed vocational nurse. The service occupations include healthcare support occupations, such as medical assistant and home health aide.27

Table 9

Distribution of Self-Employment, Across Major Occupation Groups, for Nonmilitary Workers Whose Incomes Are <250 Percent Poverty Level

Among healthcare workers in these two occupation groups, approximately 4 percent are self-employed. Self-employment is higher in occupation groups outside the healthcare sector, particularly management, business, and financial occupations and construction and extraction occupations. If the HPOG control group is more likely than the treatment group to work in occupations outside the healthcare sector (where the rate of self-employment is higher), NDNH-based measures might underestimate the control group’s employment and earnings, resulting in overstated impact estimates according to NDNH data.28

In addition to self-employed persons, certain types of government workers are excluded from some administrative data sets. As discussed above, the NDNH includes data on earnings for federal government employees, which, according to Table 9, accounts for about 3 percent of employment in the healthcare practitioners and technical occupations and about 1 percent of employment in healthcare support occupations. This stands in comparison to 2 percent across all occupations, implying little difference. State UI systems, however, do not include data on federal workers. Still, researchers often rely on more-limited sources of administrative data on earnings and employment, such as UI data from a single state or multiple states.

C. Overall Differences in Findings from Administrative and Survey Data Sources

The analyses above considered the influence of HPOG participant follow-up survey nonresponse and three types of concept misalignment on outcome levels and impacts. Any of those differences has implications for the overall assessment of program impacts. For that assessment, we compare the full NDNH-based outcomes and impacts to those derived from the HPOG participant follow-up survey.

The first two panels of Table 10 contain results for earnings. As noted previously, the average earnings for the control group differ substantially between the two survey measures—$4,195 for the measure based on weekly earnings and $2,699 for the measure based on monthly earnings. The NDNH measure falls between these two extremes at $3,878 per quarter.

Table 10

Do Earnings and Employment Levels and Impacts Differ Between Administrative and Survey Data Sources?

The two data sources yield different conclusions about HPOG’s earnings impact. The NDNH data lead us to conclude that HPOG had an impact on Q5 earnings ($231, p < 0.05), whereas both survey outcomes lead to the conclusion that HPOG did not produce a detectable increase in average earnings (a nonsignificant $142 for earnings based on the weekly report and a nonsignificant $99 for earnings based on the monthly report). Although the difference in impacts is not statistically significant, the administrative data lead to the conclusion that HPOG modestly increases earnings, whereas the survey data lead to the conclusion that HPOG does not.

This finding differs from the previous literature, which generally finds survey-based impacts on earnings to be at least as large as those based on administrative data (for example, Barnow and Greenberg 2015, 2019; Kornfeld and Bloom 1999; Mastri, Rotz, and Hanno 2018; Moore, Perez-Johnson, and Santillano 2018; Schochet, Burghardt, and McConnell 2008).

The impact on the survey measure derived from weekly earnings has a much larger standard error than either of the other measures. To the extent that the NDNH finding can be attributed to a less noisy outcome, the NDNH impact estimate might be the preferred measure of total earnings. However, it is also possible that the survey-based impact estimate might better measure the concept of earnings, perhaps in part because it captures self-employment and informal employment. As we saw in Table 9, self-employment in the healthcare sector is lower than in other occupations. Because the HPOG Program increased employment in the healthcare sector (Peck et al. 2018), this could imply that the control group might be more likely to be self-employed. In this instance, the control group’s NDNH-based earnings are relatively lower than their survey-based earnings, thereby implying that the NDNH impact might be expected to be slightly larger than the survey-based impact (where control group earnings would be higher and, in this instance, closer to the earnings of the treatment group). Self-employment offers one possible explanation for the findings that HPOG has an earnings impact according to NDNH data but not according to the survey data.

The third panel of Table 10 reports the results for employment. A distinctive contribution of the employment analysis is that it permits us to statistically test whether the administrative-based and survey-based impacts differ from each other. We can do this because we can link the employment data (but not the earnings data) from the two sources.

The results do not find evidence of a difference in impact between NDNH-reported and survey-reported employment, although there are statistically significant differences in levels: according to the survey, the employment rate is 4.4 percentage points lower in the control group and 3.3 percentage points lower in the treatment group than observed in NDNH data. These differences in levels result in a one percentage point difference in the impacts reported from the two sources, which is not statistically significant. This result is consistent with the JTPA findings reported by Kornfeld and Bloom (1999) that the impact on survey-reported employment was slightly larger but substantively the same as the impact on UI-measured employment.

The survey measure of employment indicates whether respondents were working when they answered the survey, whereas the NDNH measure of employment indicates whether respondents received any wages in the fifth quarter after random assignment. Workers who received wages at some point during that quarter but were not working when they responded to the survey would be employed according to the NDNH measure and not employed according to the survey measure. Because these two outcomes measure different concepts (point-in-time employment versus any employment during the quarter), it is not surprising that we observe different levels on these outcomes.

V. Discussion and Conclusion

This article provides a case study that examines differences in employment and earnings outcomes and impacts between administrative and survey data. Unlike previous research, this analysis considers three kinds of administrative data (single-state UI, national UI, and full NDNH) and two measures of survey earnings (one calculated from weekly hours and wage and a second from earnings received in the previous month). We develop a conceptual framework for our case study and analysis, separately considering three possible explanations for differences between data sources: (i) survey nonresponse bias, (ii) concept misalignment, and (iii) measurement error.

Focusing on these potential sources of differences, we find that survey nonresponse does not affect levels or impacts on outcomes derived from responses to the HPOG participant follow-up survey.

Concept misalignment is a potential explanation for why earnings levels and impact estimates might differ between administrative and survey data sources. In the survey, earnings levels differ substantially based on how respondents are asked to report their earnings—either weekly or monthly, before-tax or after-tax. In the administrative data, both employment and earnings levels differ based on the inclusion or exclusion of out-of-state or federal earnings. Despite these differences in levels, we find little evidence that concept misalignment matters for impacts. One exception is that the relative impact on quarterly earnings did differ between single-state UI and national UI (see Table 6), suggesting that excluding out-of-state employment could bias the relative impact.

For the earnings outcomes, we find statistically significant positive impacts in administrative data and no detectable impacts in survey data. Although the difference between the survey-based and administrative impact estimates is not statistically significant, the observed pattern runs counter to previous research, which generally finds larger earnings impacts from survey data than from administrative data (for example, Barnow and Greenberg 2015, 2019). There are also differences in earnings levels between the two survey outcomes, likely due to measurement and construct differences; for example, the monthly income question did not specify whether pre- or post-tax income should be reported. The NDNH, by contrast, includes only UI-covered earnings (rather than all earnings), captures pre-tax (rather than post-tax) earnings, and measures quarterly (rather than weekly or monthly) earnings.

We find that employment impacts do not differ statistically significantly between administrative and survey data sources. However, this may simply reflect the small magnitudes of the impacts themselves: because no data source yields a detectable employment impact, we cannot rule out differences in impacts that are proportional to the differences in levels but too small to detect. Another possibility is that employment, as a binary measure, has fewer potential sources of variability and is therefore less likely to differ between administrative and survey data sources. Although most related scholarship does not analyze employment, our results are consistent with the JTPA findings reported by Kornfeld and Bloom (1999).

Although we did not formally assess the incidence of measurement error, it is worth considering the nature of measurement error in each data source, even though the extent of error is unknowable. Measurement error in survey measures may arise from recall problems, social desirability bias, or misinterpretation of the wording of survey items. Because the person making the error is the same person whose outcomes we are seeking to measure, it is plausible that there is a systematic relationship between measurement errors and outcomes. Even if there is no systematic relationship between measurement error and treatment group, the magnitude of the error will affect the residual variance, standard errors, and significance tests. In contrast, measurement error in administrative data is more likely due to data entry or coding errors, which are most likely unrelated to outcomes or treatment status. This suggests that measurement error in surveys may result in differences in impacts, which affect the conclusions drawn from the evaluation, whereas measurement error in administrative data is most likely to result in differences in levels, which do not affect tests of statistical significance.
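As a stylized illustration (notation introduced here for exposition, not the study’s estimation model), suppose the recorded outcome equals the true outcome plus a reporting error that is independent of both the true outcome and treatment status. Then the simple treatment-minus-control difference remains unbiased for the impact, but the error inflates its sampling variance:

$$\tilde{Y}_i = Y_i + e_i, \qquad e_i \perp (Y_i, T_i), \qquad \operatorname{Var}(e_i) = \sigma_e^2,$$

$$\mathbb{E}[\hat{\tau}] = \tau, \qquad \operatorname{Var}(\hat{\tau}) = \left(\sigma_Y^2 + \sigma_e^2\right)\left(\frac{1}{n_T} + \frac{1}{n_C}\right),$$

where $\hat{\tau}$ is the difference in mean recorded outcomes between the treatment group (size $n_T$) and the control group (size $n_C$), and $\sigma_Y^2$ is the within-group variance of the true outcome. Errors of this classical form widen standard errors without biasing the point estimate; errors correlated with treatment status or with outcomes would not be so benign.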

Our findings suggest that administrative data on earnings and employment offer valid estimates of both levels and impacts at a much lower cost than survey data. However, survey data offer insights that are not available in administrative data. The key advantage of survey data collection is that the evaluator can ask the specific questions they want answered. Although evaluation partners can sometimes provide administrative data on outcomes beyond earnings and employment, many measures of interest are not available from administrative sources. Survey data are often the best way to understand what services control group members accessed outside the program, and the comparison of services received by treatment and control group members is important for interpreting impact findings. Other outcomes of interest, such as measures of psychological well-being, are not typically available from administrative sources.

Cost is the major drawback of survey data; collecting them well is expensive. The HPOG participant follow-up survey included both telephone and in-person follow-up and attempted to contact more than 10,000 study participants across 19 states. Fielding a survey of this magnitude costs several hundred dollars per completed survey, with costs varying by the complexity and length of the survey, the accuracy of contact information, and the response rate goal. The difficulty of obtaining a high response rate increases over time, although participant engagement techniques, such as quarterly mailings or birthday cards requesting updated contact information, can help.

In contrast, the cost of administrative data from either state unemployment insurance systems or the NDNH is low, consisting primarily of the research team’s labor. For this evaluation, the direct costs of obtaining NDNH data were covered by the federal agency sponsoring the evaluation. As an illustrative example of obtaining UI data from a single state, UI data from the New York State Department of Labor cost $1,000 for the initial setup plus an additional $80–$100 per hour required to fulfill the request. The low cost, the ease of longer-term follow-up, and the availability of data on the full sample make NDNH data or state UI data a very attractive option.
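As a purely illustrative back-of-the-envelope comparison (not actual study costs), the sketch below combines the figures quoted above with assumed values for the cost per completed survey and for the staff hours needed to fulfill a state UI data request; both assumed values are marked in the comments.

```python
# Illustrative cost comparison only. The sample size, response rate, New York
# setup fee, and hourly rate come from the article; the cost per completed
# survey and the hours to fulfill the UI request are assumptions.

surveys_attempted = 10_000    # "more than 10,000 study participants" (rounded)
response_rate = 0.762         # overall survey response rate (footnote 5)
cost_per_complete = 400       # assumed; the article says "several hundred dollars"

survey_cost = surveys_attempted * response_rate * cost_per_complete

setup_fee = 1_000             # New York State DOL initial setup
hourly_rate = 90              # midpoint of the quoted $80-$100 per hour
hours_to_fulfill = 40         # assumed staff hours to fulfill the request

single_state_ui_cost = setup_fee + hourly_rate * hours_to_fulfill

print(f"Illustrative survey cost:          ${survey_cost:,.0f}")
print(f"Illustrative single-state UI cost: ${single_state_ui_cost:,.0f}")
```

Under these assumptions the survey costs roughly three orders of magnitude more than a single-state UI data request, which is the trade-off the text describes.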

If possible given the evaluation budget, obtaining both survey and administrative data for a short-term follow-up is best practice. This approach allows researchers the opportunity to use survey data to explore the limitations of administrative data by asking questions about self-employment and to use administrative data to explore the limitations of survey data by comparing outcomes for survey respondents and nonrespondents. These comparisons can inform decisions regarding whether administrative data are sufficient for longer-term follow-up.

If the evaluation budget precludes conducting a survey, then administrative data on earnings and employment would have to suffice. In that circumstance, our results show that NDNH data more fully capture employment and earnings than national UI data, which in turn more fully capture employment and earnings than single-state UI data. Our sensitivity analyses found larger differences between single-state UI and national UI data when we restricted attention to states with high levels of out-of-state employment. Evaluations that cannot obtain NDNH data might consider obtaining UI data from multiple states, particularly if the program is located in an area where employment across state lines is common or where mobility is high. If an evaluation can obtain only UI data from a single state, we recommend providing contextual information about levels of cross-state employment and mobility and explicitly interpreting the outcomes as employment or earnings in that state.

Acknowledgments

The authors are equal contributors listed in alphabetical order. The authors are grateful for input from Larry Buron, Alan Werner, and Jacob Klerman (Abt Associates), and Nicole Constance (U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation). They also gratefully acknowledge editorial assistance from Suzanne Erfurth and Bry Pollack (Abt Associates). This article was funded by the U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation, Contract HHSP23320095624WC, Task Order HHSP23337019T. The views expressed in this article do not necessarily reflect the views or policies of the funder. This article uses a combination of data collected by the HPOG 1.0 Impact Study and confidential data from the National Directory of New Hires maintained by the U.S. Department of Health and Human Services Office of Child Support Enforcement. Survey data from the HPOG 1.0 Impact Study are available online: https://doi.org/10.3886/ICPSR37290. National Directory of New Hires data can be requested from the Office of Child Support Enforcement (https://www.acf.hhs.gov/css/resource/a-guide-to-the-national-directory-of-new-hires). The authors are willing to assist (contact Eleanor Harvill, eleanor_harvill{at}abtassoc.com).

Footnotes

  • JEL Codes: I Health, Education, and Welfare; I3 Welfare, Well-Being, and Poverty; J Labor and Demographic Economics; J3 Wages, Compensation, and Labor Cost

  • ↵1. The Health Profession Opportunity Grants Program was authorized by Congress in 2010 to support education and training in the healthcare field for adults with low incomes. In 2010, ACF awarded the first round of HPOG grants to 32 grantees. A second round of funding was awarded to 32 organizations in 2015. As a result, the first round of funding became known as HPOG 1.0 and the second round known as HPOG 2.0. This article focuses only on HPOG 1.0.

  • ↵2. The study’s Short-Term Impacts Report (Peck et al. 2018) found that HPOG increased occupational training and receipt of academic support, career support, and other services. The treatment group experienced more favorable outcomes than the control group in terms of educational progress, employment in the healthcare sector, and earnings. In a supplemental analysis to the main report, the study also examined the extent to which the outcome levels and associated impacts differ for employment and earnings from administrative versus survey data sources (Harvill et al. 2018). This article extends that analysis.

  • ↵3. The full HPOG Impact Study sample includes some programs that were separately evaluated as part of the Pathways for Advancing Careers and Education project, which fielded a slightly different follow-up survey.

  • ↵4. The HPOG 1.0 Impact Study was funded by the Office of Planning, Research, and Evaluation (OPRE), within the Administration for Children and Families of the U.S. Department of Health and Human Services. OPRE facilitated access to the NDNH data through an application process and data use agreement with OCSE (for application instructions, see OCSE 2019).

  • ↵5. The overall response rate was 76.2 percent, and it was 72.3 percent for the control group and 78.4 percent for the treatment group.

  • ↵6. Data from the 15-month survey are available as restricted-use data files from the Inter-university Consortium for Political and Social Research (ICPSR) Child and Family Data Archive: https://doi.org/10.3886/ICPSR37290.v5. The data use agreement between OPRE and OCSE did not permit NDNH data to be included in the restricted-use data file.

  • ↵7. Our conceptual framework also includes “reporting period” as a potential source of concept misalignment. In an evaluation where sample intake spans some period of time, outcomes based on UI earnings are typically defined relative to the point of random assignment (for example, earnings in the fifth post-randomization quarter). However, survey earnings are measured at the time of survey follow-up, which does not necessarily align with the UI earnings quarter. For example, in the HPOG 1.0 Impact Study, administrative data-based earnings were pegged to the fifth quarter after random assignment, whereas the typical survey response was 18 months after random assignment, and ranged from 15 to 27 months after. We mention this here but do not explicitly analyze the data to ascertain whether reporting period influences outcome levels or impact results.

  • ↵8. Differences between pre- and post-tax earnings become more important if any sample members are active duty military personnel and therefore not subject to federal income tax (Martorell, Klerman, and Loughran 2008).

  • ↵9. Although BLS (2018) describes a different data source (the Quarterly Census of Employment and Wages), the NDNH and the QCEW draw from the same original data sources and have the same sample coverage and exclusions.

  • ↵10. The authors of this report had access to de-identified NDNH data under contract with the U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation.

  • ↵11. In addition to reporting interval, reporting period, and gross versus net earnings, administrative and survey data treat having multiple jobs differently. UI data include earnings from all covered jobs held during a quarter, and earnings outcomes typically aggregate across all of these jobs. However, surveys sometimes ask respondents to report earnings from their main job, and exclude earnings from secondary jobs. In the HPOG survey, participants were asked to report hours worked across all jobs, and hourly earnings from their main job. This article does not explicitly examine whether the treatment of multiple jobs is a type of misalignment that might influence the program’s estimated impacts from administrative versus survey sources.

  • ↵12. Outcome is based on response to the question “About how much did you receive in job earnings in [month prior to interview]?”

  • ↵13. Outcome is based on response to the questions “How many hours per week on average are you currently working?” and “About how much do you typically earn per hour before taxes in your current job? Answer for your main job if more than one.” Respondents could also report earnings for some other interval (per day, week, biweekly) rather than per hour, which were converted into hourly earnings.

  • ↵14. These assumptions need not yield accurate individual-level earnings data. However, we can obtain an unbiased estimate of the mean so long as the observed mean is representative of the mean in the unobserved weeks/months of the quarter.

  • ↵15. Baseline covariates include number of dependent children, race/ethnicity, educational attainment, receipt of WIC and/or SNAP, born outside the United States, and earnings in the year prior to random assignment. Because of data protection issues, we cannot combine survey measures of earnings with individual-level NDNH data; consequently, we do not include NDNH measures of pre-intervention earnings as covariates in the analyses of survey-based earnings.

  • ↵16. This model differs from the most common specification of earnings equations, the Mincer equation, which specifies the log of earnings as a function of years of schooling (s) and experience (x) as $\log[Y(s, x)] = \alpha + \rho_s s + \beta_0 x + \beta_1 x^2 + \varepsilon$, where $E(\varepsilon \mid x, s) = 0$ (see Heckman, Lochner, and Todd 2006). This article, however, is not focused on estimating the returns to an additional year of schooling or to experience. Rather, we are interested in the extent to which the HPOG Program increases expected earnings of those randomly offered access. This regression model is therefore designed to capture the mean difference between the groups.

  • ↵17. The analysis in Peck et al. (2018) uses a three-level model to estimate impacts that generalize to a larger super-population of HPOG programs, given that not all HPOG programs participated in the Impact Study. In comparison, this article estimates the impacts at the individual level rather than averaging up to the program level for two reasons: (i) it removes the assumed randomness in the program’s impact and can be estimated with weaker assumptions, and (ii) this article considers the trade-offs between data sources and does not intend to generalize to HPOG programs. We cluster the standard errors at the program level because this allows us to test explicitly for differences across survey and NDNH measures of earnings. Results from analyses with individual-level standard errors are summarized in the notes of the tables presenting findings, and full results of these sensitivity analyses are available upon request.

  • ↵18. Our agreement with the HPOG programs prevents us from summarizing findings at the program level. However, in another article, we relate impact variation across HPOG programs to variation in program characteristics (Walton, Harvill, and Peck 2019).

  • ↵19. To construct nonresponse weights, we defined an indicator for survey response equal to one for individuals who answered the question on current employment and zero for individuals who either did not respond to the survey at all or did not indicate whether they were currently employed. (This survey question determined whether the individual was asked the set of questions of interest to this analysis.) We ran separate logistic regressions for the treatment and control groups, with the response indicator as the dependent variable and our baseline covariates as the independent variables, and used the estimates to predict each individual’s probability of response. Because weighting directly by inverse predicted probabilities can give some individuals very large weights, we follow common practice and stratify the sample into bins based on the estimated probability of response. The weight is then the inverse of the observed survey response rate within each stratum. Stratification and creation of survey response weights were conducted separately for the treatment and control groups. (A minimal code sketch of this procedure appears after these footnotes.)

  • ↵20. We used the Stata command bsample to draw each bootstrap sample at the program level, the command simulate to repeatedly draw bootstrap samples, and the command bstat to report bootstrap results. To select the number of iterations, we compared clustered standard errors produced by ordinary least squares to our calculated standard errors. (A general sketch of program-level resampling appears after these footnotes.)

  • ↵21. We present regression-adjusted treatment group means, constructed from the observed control group mean and the estimated impact. Relative impacts express the size of the impact relative to the control group mean and are calculated as the estimated impact divided by the control group mean.

  • ↵22. We observe NDNH outcomes for 98 percent of the individuals randomly assigned (10,370 out of 10,617). NDNH data are missing when the process used by NDNH to validate the name and Social Security number produces an error, most likely because of data entry errors at study intake or name changes.

  • ↵23. The unweighted analysis of survey respondents yields a treatment group mean that is not statistically significantly different from the one from the analysis of the full sample. The magnitude of the difference, $58, is substantively small.

  • ↵24. The study’s Three-Year Follow-Up Analysis Plan conducted sensitivity analyses of alternative weighting models. It found that including or excluding pre-enrollment earnings and employment measures in weight construction had a negligible effect on estimated impacts (Litwok et al. 2018). We conclude that nonresponse is not strongly associated with observable baseline characteristics. Moreover, given that the earnings and employment levels for the unweighted survey sample are quite similar to those for the full sample, it appears that nonresponse is not very selective for earnings and employment outcomes.

  • ↵25. The wording of the question is, “How many hours per week on average are you currently working? Include all jobs if you have more than one job.”

  • ↵26. The $41 difference between impacts for single-state UI and national UI is not statistically significant (p-value = 0.105).

  • ↵27. Because we have restricted the sample to include only respondents living in households earning less than 250 percent of the federal poverty line, lower-paying occupations are likely to be more heavily represented here than they are in the broader population.

  • ↵28. The distribution of employment from the Current Population Survey is based on the respondents’ primary occupation. If respondents work as independent contractors in addition to their primary employment (for example, driving for Uber on the weekends), then those earnings will be missing from NDNH wage data.

  • Received January 2020.
  • Accepted September 2022.
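
The following is a minimal Python sketch of the nonresponse-weighting procedure described in footnote 19. The data frame, column names (respond, treatment, x1–x3), and the choice of five strata are hypothetical illustrations; the study’s actual implementation may differ in detail.

```python
# A minimal sketch of the nonresponse-weighting steps described in footnote 19.
# Column names (respond, treatment, x1-x3) and the five-strata choice are
# hypothetical; the study's own implementation may differ.
import pandas as pd
from sklearn.linear_model import LogisticRegression


def response_weights(group: pd.DataFrame, covariates: list, n_strata: int = 5) -> pd.DataFrame:
    """Predict response propensity, bin it into strata, and weight each sample
    member by the inverse of the observed response rate in their stratum."""
    group = group.copy()
    model = LogisticRegression(max_iter=1000)
    model.fit(group[covariates], group["respond"])
    group["p_respond"] = model.predict_proba(group[covariates])[:, 1]
    # Stratify by predicted response propensity to avoid extreme weights.
    group["stratum"] = pd.qcut(group["p_respond"], q=n_strata,
                               labels=False, duplicates="drop")
    # Weight = inverse of the realized response rate within the stratum.
    stratum_rate = group.groupby("stratum")["respond"].transform("mean")
    group["weight"] = 1.0 / stratum_rate
    return group


# Weights are built separately for the treatment and control groups, e.g.:
# weighted = pd.concat(response_weights(g, ["x1", "x2", "x3"])
#                      for _, g in sample.groupby("treatment"))
```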
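
Footnote 20 describes a program-level bootstrap implemented in Stata. The sketch below illustrates the same resampling idea in Python, with hypothetical column names (program_id, treatment, outcome); it reports a bootstrap standard error for a simple mean-difference impact rather than the study’s regression-adjusted estimate.

```python
# A minimal sketch of program-level (cluster) bootstrap resampling, in the spirit
# of footnote 20; the study itself used Stata's bsample, simulate, and bstat and
# a regression-adjusted impact. Column names are hypothetical.
import numpy as np
import pandas as pd


def mean_difference(df: pd.DataFrame) -> float:
    """Unadjusted treatment-minus-control difference in mean outcomes."""
    return (df.loc[df["treatment"] == 1, "outcome"].mean()
            - df.loc[df["treatment"] == 0, "outcome"].mean())


def cluster_bootstrap_se(df: pd.DataFrame, reps: int = 1000, seed: int = 0) -> float:
    """Resample whole programs with replacement and return the bootstrap
    standard error of the mean-difference impact estimate."""
    rng = np.random.default_rng(seed)
    programs = df["program_id"].unique()
    estimates = []
    for _ in range(reps):
        # Draw programs with replacement, keeping every member of each drawn program.
        drawn = rng.choice(programs, size=len(programs), replace=True)
        resample = pd.concat([df[df["program_id"] == p] for p in drawn])
        estimates.append(mean_difference(resample))
    return float(np.std(estimates, ddof=1))
```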

This open access article is distributed under the terms of the CC-BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0) and is freely available online at: https://jhr.uwpress.org.

References

  1. Abt Associates and Laura Peck. 2022. “Evaluation of the First Round of Health Profession Opportunity Grants (HPOG 1.0), United States, 2010–2020.” ICPSR. https://doi.org/10.3886/ICPSR37290.v6
  2. Barnow, Burt S., and David Greenberg. 2015. “Do Estimated Impacts on Earnings Depend on the Source of the Data Used to Measure Them? Evidence from Previous Social Experiments.” Evaluation Review 39(2):179–228.
  3. Barnow, Burt S., and David Greenberg. 2019. “Special Issue: Survey Data Versus Administrative Data for Estimating the Effects of Social Programs Editors’ Essay.” Evaluation Review 42(5–6):231–65.
  4. Bollinger, Christopher R., and Barry T. Hirsch. 2013. “Is Earnings Nonresponse Ignorable?” Review of Economics and Statistics 95(2):407–16.
  5. Bound, John, Charles Brown, Greg Duncan, and Willard Rodgers. 1989. “Measurement Error in Cross-sectional and Longitudinal Labor Market Data: Results from Two Validation Studies.” NBER Working Paper 2884. Cambridge, MA: NBER.
  6. Bound, John, Charles Brown, Greg Duncan, and Willard Rodgers. 1994. “Evidence on the Validity of Cross-Sectional and Longitudinal Labor Market Data.” Journal of Labor Economics 12(3):345–68.
  7. Bound, John, Charles Brown, and Nancy Mathiowetz. 2001. “Measurement Error in Survey Data.” In Handbook of Econometrics, Volume 5, ed. James J. Heckman and Edward Leamer, 3705–843. New York: Elsevier.
  8. Bureau of Labor Statistics (BLS). 2018. “Employment and Wages Online Annual Averages 2017.” https://www.bls.gov/cew/publications/employment-and-wages-annual-averages/2017/home.htm (accessed November 4, 2024).
  9. Flood, Sarah, Miriam King, Renae Rodgers, Steven Ruggles, and J. Robert Warren. 2018. Integrated Public Use Microdata Series, Current Population Survey: Version 6.0 [data set]. Minneapolis, MN: IPUMS. https://dx.doi.org/10.18128/D030.V6.0
  10. Groves, Robert M. 2006. “Nonresponse Rates and Nonresponse Bias in Household Surveys.” Public Opinion Quarterly 70(5):646–75.
  11. Hariri, Jacob Gerner, and David Dreyer Lassen. 2017. “Income and Outcomes: Social Desirability Bias Distorts Measurements of the Relationship Between Income and Political Behavior.” Public Opinion Quarterly 81:564–76.
  12. Harvill, Eleanor, Daniel Litwok, Shawn Moulton, Alyssa Rulf Fountain, and Laura R. Peck. 2018. “Technical Supplement to the Health Profession Opportunity Grants (HPOG) Impact Study Interim Report: Report Appendices.” OPRE Report 2018-16b. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services.
  13. Heckman, J.J., L.J. Lochner, and P.E. Todd. 2006. “Earnings Equations, Rates of Return and Treatment Effects: The Mincer Equation and Beyond.” In Handbook of the Economics of Education, ed. Eric A. Hanushek and Finis Welch, 307–458. Amsterdam: Elsevier.
  14. Hotz, V. Joseph, and John Karl Scholz. 2002. “Measuring Employment Income for Low-Income Populations with Administrative and Survey Data.” In Studies of Welfare Populations: Data Collection and Research Issues, ed. Maureen Ver Ploeg, Robert A. Moffit, and Constance F. Citro, 275–315. Washington, DC: National Academy Press.
  15. Internal Revenue Service (IRS). 2014a. “Form W-4.” Washington, DC: Internal Revenue Service. https://www.irs.gov/pub/irs-prior/fw4–2015.pdf (accessed November 4, 2024).
  16. Internal Revenue Service (IRS). 2014b. “Employer’s Supplemental Tax Guide (Supplement to Publication 15 (Circular E), Employer’s Tax Guide).” Publication 15-A, Cat. No. 21453T. Washington, DC: Internal Revenue Service. https://www.irs.gov/pub/irs-prior/p15a--2015.pdf (accessed November 4, 2024).
  17. Kornfeld, Robert, and Howard S. Bloom. 1999. “Measuring Program Impacts on Earnings and Employment: Do Unemployment Insurance Wage Reports from Employers Agree with Surveys of Individuals?” Journal of Labor Economics 17(1):168–97.
  18. Litwok, Daniel, Douglas Walton, Laura R. Peck, and Eleanor Harvill. 2018. “Health Profession Opportunity Grants (HPOG) Impact Study’s Three-Year Follow-Up Analysis Plan.” OPRE Report 2018-124. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. https://www.acf.hhs.gov/opre/report/health-profession-opportunity-grants-hpog-impact-studys-three-year-follow-analysis-plan (accessed November 4, 2024).
  19. Martorell, Francisco, Jacob A. Klerman, and David S. Loughran. 2008. “How Do Earnings Change When Reservists Are Activated? A Reconciliation of Estimates Derived from Survey and Administrative Data.” Santa Monica, CA: RAND Corporation. https://www.rand.org/pubs/technical_reports/TR565.html (accessed November 4, 2024).
  20. Mastri, Annalisa, Dana Rotz, and Elias S. Hanno. 2018. “Comparing Job Training Impact Estimates Using Survey and Administrative Data.” Washington, DC: Mathematica Policy Research. https://www.dol.gov/sites/dolgov/files/OASP/legacy/files/WIA-comparing-impacts.pdf (accessed November 4, 2024).
  21. Moore, Quinn, Irma Perez-Johnson, and Robert Santillano. 2018. “Decomposing Differences in Impacts on Survey- and Administrative-Measured Earnings from a Job Training Voucher Experiment.” Evaluation Review 42(5–6):515–49.
  22. Office of Child Support Enforcement (OCSE). 2019. “NDNH Guide for Data Submission.” Washington, DC: Office of Child Support Enforcement, Administration for Children and Families, U.S. Department of Health and Human Services. https://www.acf.hhs.gov/sites/default/files/documents/ocse/ndnh_guide_for_data_submission.pdf (accessed November 4, 2024).
  23. Peck, Laura R., Alan Werner, Eleanor Harvill, Daniel Litwok, Shawn Moulton, Alyssa Rulf Fountain, and Gretchen Locke. 2018. “Health Profession Opportunity Grants (HPOG 1.0) Impact Study Interim Report: Program Implementation and Short-Term Impacts.” OPRE Report 2018-16a. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. https://www.acf.hhs.gov/opre/resource/health-profession-opportunity-grants-hpog-10-impact-study-interim-report-implementation-short-term-impacts (accessed November 4, 2024).
  24. Puma, Michael J., Robert B. Olsen, Stephen H. Bell, and Cristofer Price. 2009. “What to Do When Data Are Missing in Group Randomized Controlled Trials.” NCEE 2009-0049. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. https://files.eric.ed.gov/fulltext/ED511781.pdf (accessed November 4, 2024).
  25. Riphahn, Regina T., and Oliver Serfling. 2002. “Item Non-Response on Income and Wealth Questions.” IZA Discussion Paper 573. Bonn, Germany: IZA. https://ideas.repec.org/p/iza/izadps/dp573.html (accessed November 4, 2024).
  26. Schochet, Peter Z., John Burghardt, and Sheena McConnell. 2008. “Does Job Corps Work? Impact Findings from the National Job Corps Study.” American Economic Review 98(5):1864–86.
  27. Schochet, Peter Z., Sheena McConnell, and John Burghardt. 2003. National Job Corps Study: Findings Using Administrative Earnings Records Data. Princeton, NJ: Mathematica Policy Research, Inc.
  28. Smith, Jeffrey. 1997. “Measuring Earnings Levels Among the Poor: Evidence from Two Samples of JTPA Eligibles.” Unpublished. London, ON, Canada: Department of Economics, University of Western Ontario.
  29. Stevens, David W. 2007. “Longitudinal Employer-Household Dynamics: Employment That Is Not Covered By State Unemployment Insurance Law.” Technical Paper TP-2007-04. Suitland, MD: U.S. Census Bureau. https://www2.census.gov/ces/tp/tp-2007-04.pdf (accessed November 4, 2024).
  30. Wallace, Geoffrey L., and Robert Haveman. 2007. “The Implications of Differences Between Employer and Worker Employment/Earnings Reports for Policy Evaluation.” Journal of Policy Analysis and Management 26(4):737–53.
  31. Walton, Douglas, Eleanor L. Harvill, and Laura R. Peck. 2019. “Which Program Characteristics Are Linked to Program Impacts? Lessons from the HPOG 1.0 Evaluation.” OPRE Report 2019-51. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. https://www.acf.hhs.gov/opre/resource/which-program-characteristics-are-linked-to-program-impacts-lessons-from-the-hpog-10-evaluation (accessed November 4, 2024).