ABSTRACT
Accurately measuring government benefit receipt in household surveys is necessary when studying disadvantaged populations and welfare programs. The Food Stamp Program is especially important given its size and recent growth. To validate survey reports, we link administrative data on participation in two states to three key household surveys. We find that between 23 and 50 percent of true food stamp recipient households do not report receipt. A substantial number of true nonrecipients are also recorded as recipients. We examine reasons for these errors and find that imputation is an important source of error. Error rates vary with household characteristics, implying complicated biases in multivariate analyses, such as regressions. We directly examine biases in common survey-based estimates of program receipt by comparing them to estimates from our linked data. We find that the survey estimates understate participation among single parents, nonwhites, and low-income households and also lead to errors in multiple program receipt and time and age patterns of receipt.
I. Introduction
Accurately measuring government benefit receipt in household surveys is important to assess the economic circumstances of disadvantaged populations, program take-up, the distributional effects of government programs, and other program effects. The Food Stamp Program (or FSP, now the Supplemental Nutrition Assistance Program, or SNAP) is especially important given its large and growing size and findings of its effects on health, labor supply, food security, consumption, and other outcomes.1 Recognizing that surveys may have errors, this study examines the misreporting of Food Stamp Program receipt using a new linkage of administrative microdata from two states to three major survey data sets. We argue that our administrative measure of program receipt is sufficiently accurate to study error at the household level, allowing us to examine rates of misreporting and imputation error. We study how survey errors vary with household characteristics to assess their likely determinants and consequences. Program participation is often used as either a dependent or independent variable in multivariate models, such as regression analyses.2 We examine how survey error affects such estimates of program receipt, which are commonly obtained from the error-prone survey data we evaluate.
There is growing evidence that program receipt is badly reported in household surveys. The most extensive and frequently cited evidence compares weighted totals of dollars or recipients in household surveys to analogous figures provided by government agencies. The most comprehensive research of this form in terms of programs and surveys covered is Meyer, Mok, and Sullivan (2015a,b), which references many earlier studies.3 It finds net underreporting of program receipt that is substantial, widespread, and steadily growing over time. A common criticism of these aggregate studies is that they identify net underreporting only, so they may understate errors by missing false negative reports (failures to report true receipt) that are cancelled out by false positive reports (incorrect reports of receipt). The results are also potentially biased by frame, nonresponse, and weighting errors. Furthermore, such aggregate studies have a limited ability to examine how survey error varies with interview and respondent characteristics. This limitation hinders their ability to study the determinants of errors, their consequences for substantive studies, and potential corrections.
Linking surveys to administrative microdata provides a potential solution to these limitations. By comparing survey values to true values, linked validation data can allow us to uncover the extent of error, its causes, and consequences. Unfortunately, surveys of the literature have noted that there exist very few “complete record check” studies that use validation data for the entire population. Such studies are needed to assess false positive reports and thus net reporting of receipt (Bound, Brown, and Mathiowetz 2001). The few studies that do complete record checks tend to suffer from small sample sizes and often rely on a single state or survey. In addition, they are rarely able to analyze or correct possible biases that might result from linkage problems.
In this study, we link administrative data on food stamp receipt from two states to three of the most important economic surveys: the Current Population Survey (CPS), the American Community Survey (ACS), and the Survey of Income and Program Participation (SIPP). The CPS is the most used labor economics survey and the source of official income and poverty statistics. The ACS replaced the Census long-form data and is the largest general household survey, allowing fine geographic analyses. The SIPP is the most detailed survey of program receipt and commonly thought to have the highest-quality data. The Social Security Numbers on the food stamp records that we use have been verified (compared to SSA records) as a necessary condition for receipt of benefits, so the accuracy of the linkage is very high. We discuss likely remaining biases due to linkage error.
Our recent related papers use linked data to analyze how survey error biases specific estimates, such as the estimated poverty rate (Meyer and Mittag 2019a) and mean dollar transfers (Meyer and Mittag 2021a), as well as how linked data can be used to improve the accuracy of these estimates in the presence of survey error (Mittag 2019; Davern, Meyer, and Mittag 2019). The exceptional data accuracy and sample size in this study allow us to go beyond aggregate statistics and study survey error at the household level. In contrast to prior papers, this accuracy allows us to analyze errors in whether a household reports program receipt (both over- and underreporting) and what predicts these errors. We then return to aggregate survey estimates by examining how errors at the household level skew estimates of the determinants of program receipt. Some recent and ongoing work already builds on our findings in this study. Meyer and Mittag (2017) derives theoretical results to better understand the patterns of bias in multivariate models we find here and evaluates bias corrections. In Meyer and Mittag (2019b), we directly extend this study by examining two aspects of survey errors—how they vary across interviewers and geography—that we cannot examine with the data we use here. In Celhay, Meyer, and Mittag (2021), we use a larger linked sample from New York to extend the analyses of misreporting in this paper to imputation error and the reporting of multiple programs. Celhay, Meyer, and Mittag (2022) uses the same data to analyze potential causes of the reporting errors we document here. See Meyer and Mittag (2021b) for an overview that puts these studies in the context of the broader literature.
We find substantial underreporting of food stamp receipt, with a quarter to half of true recipient households not recorded as such, depending on the survey. A much smaller share of nonrecipients are recorded as receiving food stamps. Since most households are nonrecipients, these false positives can nonetheless have a substantial effect on net reporting. A large share of these false positives, though not a majority, are imputed observations. The large differences across the three surveys in false negative and false positive rates suggest that survey design plays an important role in survey accuracy.
We show that both false negative and false positive reports are associated with a variety of household and interview characteristics, including income and race. From a methodological perspective, this shows that survey errors are not random. Thus, the errors also lead to complicated biases in multivariate analyses that are difficult to correct. In addition, instrumental variable methods will be inconsistent.4 Our evidence on the determinants of errors also sheds light on theories of misreporting. We briefly examine the role of comprehension, salience, recall, and stigma, which the literature has suggested as causes of misreporting. Since there are few situations where we have independent and accurate measures to evaluate survey quality, this evidence on program receipt should also aid the improvement of household surveys.
Finally, we examine how survey error affects studies of the use of government programs, a large literature that often relies on error-ridden self-reports of program receipt. For example, the surveys we examine were used in several recent studies of Food Stamp Program participation (Blank and Ruggles 1996; Gundersen and Oliveira 2001; Haider, Jacknowitz, and Schoeni 2003; Wu 2010; Ganong and Liebman 2018). Similar binary choice models with reported program participation as the dependent variable are also frequently estimated for other programs; see Bitler, Currie, and Scholz (2003) for an example and Currie (2006) for an overview. These estimates likely suffer from bias due to underreporting as well. Models of program receipt are also used to increase take-up and better target programs to the most needy, an issue that has long concerned policymakers (see U.S. General Accounting Office 2004 for efforts to raise food stamp participation). However, as Bound, Brown, and Mathiowetz (2001) note, little work has examined the consequences of program receipt errors for substantive analyses.5 Despite the large literature on the distributional consequences of welfare and social insurance programs, few studies attempt to correct for misreporting.6
Using our data with both error-ridden and true measures of program receipt, we analyze the consequences of this nonclassical measurement error for a prototypical application: binary choice models with program receipt as the dependent variable. Such models are often used to study program take-up, typically showing that participation rates among eligibles are well below one.7 Given the extent of underreporting, a major part of what appears to be nonparticipation may actually be recipients whose receipt is not recorded in the survey. Our linked data indicate that the survey data understate participation by single parents, nonwhites, and the elderly, as well as the extent to which participation declines as incomes rise. Perhaps surprisingly, we also find that the sign of the association of most variables, such as age, education, and family type, is correct.
In the next section, we review the literature on misreporting of government transfers. Section III describes our data sources and linkage. Section IV provides our main evidence on the extent of survey error and discusses likely biases from linkage errors. Section V analyzes how survey error varies with household characteristics. Section VI examines the bias from survey error and how it affects our understanding of program receipt. Section VII offers conclusions.
II. Misreporting in Survey Data
Several studies document significant misreporting of transfer program income in survey data. Bound, Brown, and Mathiowetz (2001) and Moore, Stinson, and Welniak (2000) provide reviews of the literature, so we focus our summary on their main conclusions and newer studies. We examine reporting of whether a program was received rather than the amount received. The evidence on reporting of amounts is scant, but there is some evidence that the main determinant of underreported dollars is whether receipt is reported at all (Moore, Marquis, and Bogen 1996; Moore, Stinson, and Welniak 2000; Meyer, Mok, and Sullivan 2015a).
Three main approaches are used to assess the validity of survey reports: comparisons of survey aggregates to administrative totals, partial validation studies, and full validation studies. Comparisons of estimated totals from survey reports to administrative totals show that the survey reports generally fall substantially short of actual program spending. See Meyer, Mok, and Sullivan (2015b) and the many earlier studies that they cite. The rate of net underreporting differs sharply across programs and surveys and has tended to rise over time. Comparing survey totals to official statistics points to severe data quality issues, but this approach leaves many important questions open. If weighting does not correct for undercoverage or nonresponse, the difference between survey and administrative totals estimates the combined bias from misreporting and other sources of survey error. Aggregate comparisons also cannot provide information on the extent to which false negatives are counterbalanced by false positive reports. In addition, they can only provide very limited information about the factors that are associated with survey error. Finally, aggregate data cannot be used to assess bias in applications using multivariate data or to devise and evaluate corrections for the bias in such analyses. Consequently, aggregate studies provide an important indicator of survey problems, but an accurate measure of receipt at the individual or household level is needed to determine the causes and consequences of survey error.
Linking survey and administrative data can provide this accurate measure to validate survey responses.8 Most early linkage studies are partial record check studies that only examine the survey response of known program recipients. Past food stamp validation studies have found substantial rates of false negative reports that differ considerably across studies. For example, 20 percent of true recipients are not recorded as such in the 1984 SIPP (Marquis and Moore 1990), while 40 percent are not in the Maryland sample of the 2001 predecessor to the American Community Survey (Taeuber et al. 2004). While these studies can provide evidence on false negative rates and characteristics associated with failure to report receipt, they cannot examine false positive reporting. Consequently, they only allow inference about net reporting rates under the assumption that the effect of false positives is negligible. Marquis et al. (1981) and Moore, Stinson, and Welniak (2000) review the findings of this literature. Both reviews document substantial false negative rates for many transfer programs but also argue that the literature overemphasized underreporting because that is what the existing partial record check studies are able to capture.
This line of argument leads both Moore, Stinson, and Welniak (2000) and Bound, Brown, and Mathiowetz (2001) to call for more complete record check studies that validate the reports of both recipients and nonrecipients to examine both types of error. Advancing research in this way requires linking the survey to the universe of program recipients, so that not being included in the administrative data confirms not receiving the program. Unfortunately, such linked data are rarely available. If they are, they typically only cover a short time period and a small subset of the survey respondents, such as those from a single state. Yet complete record check studies provide important additional insights about survey error.
The few existing complete validation studies agree on the finding that false positive rates are much lower than false negative rates. However, there is some variation in the rates of false positives across studies. For the Food Stamp Program, false positive rates range from 0.3 percent in Bollinger and David (1997) to 2–3 percent in Moore, Marquis, and Bogen (1996). As there are far more nonrecipients than recipients, even such low rates of false positives lead to high error counts. Early complete record check studies point towards substantial counts of errors in both directions, leading to slight net overreporting. While this result challenges the notion that the net effect of misreporting is to understate total program receipt (Marquis et al. 1981), more recent validation studies tend to find net underreporting of food stamp receipt (Marquis and Moore 1990; Moore, Marquis, and Bogen 1996; Taeuber et al. 2004; Nicholas and Wiseman 2009, 2010; Kirlin and Wiseman 2014; Meyer and Mittag 2019a,b). Given that most studies focus on a single survey or state, it is unclear whether the differences between studies are due to state, survey, or other study-specific factors. Consequently, important questions remain on the sign of the net bias and the extent to which it depends on the type of survey.
Even in the most favorable case of small or no net survey error, the substantial error rates these studies find at the household level are likely to bias analyses of subpopulations and multivariate models, especially if errors are correlated with individual and household characteristics (see, for example, Meyer and Mittag 2019a; Nguimkeu, Denteh, and Tchernis 2019). It is common to assume that errors are independent of other variables in order to provide a simple summary measure of the degree of survey error (for example, Moore, Stinson, and Welniak 2000) or to correct the bias due to survey error (for example, Hausman, Abrevaya, and Scott-Morton 1998). In light of the importance of this assumption, it is surprising that few studies examine whether misreporting is indeed unrelated to other variables. Notable exceptions are Bollinger and David (1997, 2001, 2005), who reject this assumption by showing that reporting of food stamp receipt is related to income, gender, education, and household structure, as well as later survey attrition.
As Bound, Brown, and Mathiowetz (2001) point out in their review, there are only a few analytic results on the consequences of such nonclassical measurement error, with the biases often being intractable and model-specific. The literature on identification and bias of treatment effects in the presence of misclassification of the treatment variable provides ample examples of the complex but often severe consequences of nonclassical measurement error. Kreider (2010) provides an extreme example of the consequences in a specific case. Other recent papers on the estimation of treatment effects with misclassification examine the consequences of nonclassical measurement errors (for example, Millimet 2011; Almada, McCarthy, and Tchernis 2016), as well as partial identification (for example, Gundersen and Kreider 2009; Gundersen, Kreider, and Pepper 2012; Kreider et al. 2012; Jensen, Kreider, and Zhylyevskyy 2019) and point identification (Nguimkeu, Denteh, and Tchernis 2019).
In the absence of formal results, complete record check studies offer a unique opportunity to analyze the biases in specific cases by comparing models relying on a validated variable to those using a survey variable. For example, Bollinger and David (1997, 2001) and Meyer and Mittag (2017) use validation data to analyze the impact of survey error on multivariate models with a misclassified dependent variable. A few recent and ongoing studies use a similar approach to examine program receipt rates (Cerf Harris 2014; Meyer and Mittag 2019b), poverty statistics (Meyer and Mittag 2019a; Nicholas and Wiseman 2009, 2010), and program effects (Kang and Moffitt 2019). This validation data approach makes it possible to examine directly whether survey error can explain surprising empirical findings, such as the low take-up of government programs among the elderly (Haider, Jacknowitz, and Schoeni 2003) and among households in extreme poverty (Tiehen, Jolliffe, and Gundersen 2012).
High error rates also raise the question of why people misreport in surveys. In their review of the literature, Sudman and Bradburn (1974) point out the lack of a general theory of reasons for survey errors. Along the same lines, Bound, Brown, and Mathiowetz (2001) note that few fundamental principles have been established in the literature. They divide reasons for misreporting into three areas: cognitive processes, social desirability, and essential survey conditions or survey design.
The cognitive process of answering a question involves comprehension of the question, recalling information from memory, and communicating the result. Cognitive factors may lead to misreporting because of, among other things, difficulties understanding questions, difficulties recalling information, and information that is not salient (for a review, see Sudman, Bradburn, and Schwarz 1996). Much of the empirical literature focuses on recall and retrieval problems. The research provides some evidence that a longer recall period leads to more errors, but the evidence is mixed and far from conclusive. For example, Meyer and Mittag (2019b) provide evidence of recall effects for SNAP, while Marquis and Moore (1990) find no effect of recall when analyzing the receipt of food stamps and other programs. Bound, Brown, and Mathiowetz (2001) suggest that rather than the mere passage of time, the complexity of the experience over time is related to misreporting. Thus, households with irregular or infrequent receipt should be more likely to fail to report. Complex patterns of receipt may also lead respondents to confuse government programs and fail to report a program they receive, while reporting receipt of another program. Recall periods could affect survey accuracy in more complex ways because respondents often misstate the timing of events. They tend to report events that occurred before or after the reference period as having happened in the reference period. Such “telescoping” of events can lead to both false negative and false positive errors. Another precept in the literature on how cognitive processes affect survey errors is that more salient events are more easily remembered. High salience has sometimes been found to lead to overreporting.
Another important reason for misreporting is social desirability, which refers to a tendency of respondents to report socially desirable answers whether or not they are true. See Bound, Brown, and Mathiowetz (2001) for a comprehensive discussion. The economic literature focuses on the social stigma associated with dependence on government programs as a reason not to report receipt. This idea suggests underreporting among those with higher income and education for whom welfare receipt seems more out of place. We would also expect underreporting due to stigma to be more prevalent among those who may seem less needy, such as the elderly, two-parent families, and the childless. More generally, social desirability may affect respondent cooperativeness, which Bollinger and David (2001) emphasize as a determinant of accurate reporting.
Finally, features of the survey design, such as the survey mode and method, also affect the accuracy of survey data (see Groves 1989 for a review). Survey design may also affect the accuracy of the data in mechanical ways through the coding and editing process. Given the high rates of item nonresponse in some household surveys, the imputation methods employed by the survey can be another important source of error.
III. Data and Linkage
We examine three large and frequently used household surveys: the 2001 ACS,9 the 2002–2005 CPS, and data from January 2001 to April 2005 from the 2001 and 2004 panels of the SIPP. Our administrative records provide information on food stamp receipt for all recipients in Illinois and Maryland. The monthly records report program receipt, amounts (for some years), and Social Security Numbers (SSN). The source of the Illinois data is the Illinois Department of Human Services (IDHS) client database. From this database, Chapin Hall created the Illinois Longitudinal Public Assistance Research Database (ILPARD), a longitudinal database of public assistance cases. The ILPARD is updated monthly with new cases from the IDHS system and records that IDHS has changed in the past month. The Food Stamp Program records for Illinois contain monthly information on program utilization of all members of the household. The data supplied to the Census Bureau cover calendar years 1998–2004. The source of the Maryland data is the Client Automated Resource and Eligibility System (CARES) of the Maryland Department of Human Resources. The data provided to the Census Bureau cover the period 1998–2003 and include monthly information on all Maryland residents receiving food stamps during that period.
Analyzing two states raises the question of whether the results generalize to the population overall. Table 1 compares our sample to the entire United States. Many demographic and economic characteristics are similar, but a larger share of our sample is Black and a smaller share of Hispanic origin. Our sample is more educated and less poor, so reported program receipt is lower than in the entire population. These differences are mainly driven by the Maryland sample. The net dollar reporting rate (total survey dollars divided by total dollars paid according to the Bureau of Economic Analysis) for Illinois and Maryland combined is 52 percent for 2001–2004, which is lower than in the entire United States (59 percent). Meyer and Mittag (2019b) use a different survey to analyze geographic variation in misreporting. They document some variation between states, but of a magnitude that would be unlikely to overturn our qualitative conclusions.
Table 1. Reported Food Stamp Receipt and Demographics, 2002–2005 CPS
We link the survey and administrative data using the Protected Identification Key (PIK), which is an anonymized version of the SSN. PIKs are assigned to both the administrative data and the survey data separately. The administrative records contain SSN because an individual must have a validated SSN in order to receive food stamps (their name, gender, and date of birth must match SSA records). The food stamp data are subject to regular audits by the USDA. The validated SSN in the administrative data is converted to a PIK by the Census Bureau. A PIK is obtained for 96 percent of the Illinois food stamp records over the entire period and 98 percent of the Maryland records. To obtain PIKs for the survey data, the Census Bureau uses name, address, and date of birth from the sampling frame and survey records to match each individual in the survey to a PIK/SSN in a reference file that contains all transactions recorded against a SSN. See NORC (2011) and Wagner and Layne (2014) for further discussion. PIKs are assigned to individuals, allowing us to link the administrative records to the surveys at the individual level. We then aggregate the linked data to the household level.
The administrative records contain every individual on a program case, so we can link most households in which at least one member is assigned a PIK (see Section IV.C for further details). A PIK is successfully obtained for at least one member of 93 percent of ACS households in Illinois and 95 percent of ACS households in Maryland. The rates are considerably lower for CPS households. Prior to 2005, respondents were asked to supply their SSN in the CPS to allow linking, and a PIK was not determined for those who did not supply an SSN, reducing the share of households that can be linked. We have a PIK for at least one member of 68 percent of Illinois CPS households and 81 percent of Maryland CPS households. The PIK rate is similar in the SIPP, in which 71 percent of all households have a PIK. The rates are slightly lower for those who are likely food stamp recipients in all three surveys. For example, in the ACS the rates are 89 percent in Illinois and 92 percent in Maryland for households with income below twice the federal poverty line.
The main sample for our analyses consists of households with at least one household member who has been assigned a PIK. We examine which household characteristics are associated with a household not having a PIK. The results of probit models for whether a household has a PIK are reported in Online Appendix Table 1. In each survey, several observable characteristics predict whether a household has a PIK, so we can reject that PIKs are missing at random. Yet there are few variables that systematically predict having a PIK in all surveys. We multiply survey weights by the inverse of the predicted probability of a household having a PIK (Wooldridge 2007). The covariates used in that prediction can be seen in Online Appendix Table 1. We discuss further how the linkage process can affect estimated error rates at the end of Section IV.
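As an illustration of this reweighting step, the following sketch (not the authors' code; the file and variable names are hypothetical) fits a probit for linkage and forms adjusted weights in the spirit of Wooldridge (2007):

```python
# Illustrative sketch: inverse probability weighting for missing PIKs.
# File and column names are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("survey_households.csv")   # one row per survey household

# 1. Probit for whether at least one household member has a PIK.
X = sm.add_constant(df[["income_ratio", "age_head", "white", "num_children"]])
linkage_model = sm.Probit(df["has_pik"], X).fit(disp=0)

# 2. Keep linked households and scale their survey weight by the inverse
#    of the predicted linkage probability (Wooldridge 2007).
df["p_link"] = linkage_model.predict(X)
linked = df[df["has_pik"] == 1].copy()
linked["adj_weight"] = linked["weight"] / linked["p_link"]
```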
In all three surveys, the sample for our analyses is households with a householder at least 16 years of age. The food stamp assistance unit is notoriously difficult to capture in survey data, but this complication does not impinge on our analyses. We simply examine whether a household in the ACS, CPS, or SIPP that reports (or does not report) receipt of food stamps has any member that is a recipient in the administrative data. This reliance on the survey household definition greatly simplifies the analysis. Note that a survey household may contain more than one FSP assistance unit or only part of a unit.10 For the analyses of the extent of errors and hence data accuracy in the next section, we use the entire (PIKed) population of households. To examine the accuracy of estimates typically obtained from the data, rather than its overall accuracy, we focus on a sample that would typically be used for such analyses (households below twice the poverty line).
The administrative data record food stamp receipt on a monthly basis, which allows us to match the reference periods of the survey questions. The ACS asks about receipt in the past 12 months. To match this definition, we create a binary variable using the administrative data that indicates whether food stamps were received in the survey month or the previous 12 months by anyone in the household.11 Food stamp receipt in the CPS refers to receipt in the previous calendar year, which we mimic in the administrative data. Seam bias is known to be an issue in the SIPP (Moore 2008), so we combine the four monthly reports of food stamp receipt from each interview to create an indicator for receipt during the four-month period, which we also do in the administrative data.
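As a sketch of how the administrative indicator can be constructed (hypothetical file and column names, and assuming the PIK link has already attached the survey household identifier to each administrative person-month record), the ACS 12-month window could be coded as follows; the CPS calendar-year and SIPP four-month windows are analogous:

```python
# Illustrative sketch: collapse monthly administrative receipt records into a
# household-level indicator matching the ACS reference period (interview month
# plus the previous 12 months). Names are hypothetical placeholders.
import pandas as pd

admin = pd.read_csv("fsp_person_months.csv", parse_dates=["month"])
survey = pd.read_csv("acs_households.csv", parse_dates=["interview_month"])

merged = survey[["household_id", "interview_month"]].merge(
    admin, on="household_id", how="left"
)
in_window = (
    (merged["month"] <= merged["interview_month"])
    & (merged["month"] >= merged["interview_month"] - pd.DateOffset(months=12))
)
recipients = merged.loc[in_window, "household_id"].unique()
survey["admin_receipt_12mo"] = survey["household_id"].isin(recipients).astype(int)
```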
IV. Agreement between Survey and Administrative Reports
We first use our linked data to examine the differences in food stamp receipt according to the linked administrative variable and the survey reports. We take the administrative receipt measure to be accurate.12 To obtain more precise estimates, we pool our two states in all surveys. We find substantial underreporting by true recipients and low rates, but sizable numbers, of false positives in all surveys. The rates differ considerably between the three surveys, which leads the ACS and CPS to understate net food stamp receipt and the SIPP to overstate it slightly. We show that imputations are an important source of survey error, particularly of false positives. Finally, we examine to what extent the data linkage process is likely to affect our results.
A. Misclassification of Food Stamp Receipt
Table 2 presents sample sizes and statistics comparing food stamp receipt according to the administrative records and survey reports of receipt by the same household for the three surveys we examine. The first four columns contain unweighted observation counts. The population estimates and percentages in the remaining columns are weighted by household weights adjusted for missing PIKs. The first row of Table 2 contains the ACS results for Illinois and Maryland for the 2000–2001 period to which the survey refers. According to the linked administrative variable, 7.49 percent of households in Illinois and Maryland receive food stamps in a year. However, reporting errors are common; the false negative rate is 33 percent. Thus, one-third of households that receive food stamps are not recorded as recipients in the survey. The share of true nonrecipients who report receipt is 0.73 percent. Note that some recipients fail to report receipt, while some nonrecipients overreport. These two errors bias the reported receipt rate in different directions, leading to a net understatement lower than the false negative rate.13 Overall, the high rate of false negatives leads the survey report of food stamp receipt to be 5.69 percent for a net understatement of receipt of 24 percent in the ACS survey data.
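These figures fit together through a simple accounting identity (the notation is ours): writing $p$ for the true receipt rate, $\mathrm{FN}$ for the false negative rate, and $\mathrm{FP}$ for the false positive rate, the reported receipt rate is

$$r = p\,(1-\mathrm{FN}) + (1-p)\,\mathrm{FP} = 0.0749 \times 0.67 + 0.9251 \times 0.0073 \approx 0.0569,$$

which reproduces the 5.69 percent survey rate, and the net understatement is $1 - r/p = 1 - 0.0569/0.0749 \approx 0.24$.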
Table 2. Reported and Administrative Food Stamp Receipt
The second line of Table 2 reports the same statistics for the CPS data; 8.69 percent of the households in the CPS receive food stamps in a calendar year according to the linked administrative variable. The share of food stamp recipient households that do not report receipt in the CPS is even higher than in the ACS; 49 percent of recipients do not report receipt. This share of false negatives has increased over the three (MD) or four (IL) years for which the administrative data are available. The increase is pronounced in Maryland, where by 2004 more than 60 percent of recipient households are not recorded as recipients. As in the ACS, the share of nonrecipients that report receipt is low, 0.84 percent. The net effect of false positives and false negatives is a substantial 40 percent understatement of the share of households receiving food stamps. This result accords quite closely with the net understatement by 39 percent for the Illinois time period and 38 percent for the Maryland time period that Meyer, Mok, and Sullivan (2015a) find based on national aggregate data for months of participation.
The third line of Table 2 presents the same statistics for the 2001–2005 SIPP data; 5.95 percent of households in the SIPP receive food stamps according to the administrative data, and 23 percent of them fail to report receipt. Thus, the false negative rate in the SIPP is lower than in the ACS and substantially lower than in the CPS. On the other hand, the false positive rate is roughly twice as high as in the ACS and CPS; 1.64 percent of nonrecipient households report food stamp receipt. At least part of these differences is likely due to the fact that we consider a household to report food stamp receipt if any household member reported receipt in any of the four reference months in the SIPP. This choice could drive down the rate of false negatives and increase the rate of false positives because anyone mistakenly reporting receipt in any of the four months results in a false positive.14 The combination of the lower false negative and the higher false positive rate results in slight overreporting (by 3 percent) of food stamp receipt in the SIPP. Our findings support the view that the SIPP is the most accurate of the three data sets in measuring program receipt. It has the lowest false negative rate and the most accurate net reporting rate. Slight overreporting may well be preferable to the substantial underreporting in the ACS and CPS, particularly if one is mainly concerned with receipt rates. However, roughly half of this improvement stems from the higher false positive rate, that is, from introducing additional error, which may well aggravate the consequences of survey error in multivariate analyses, such as the models we analyze in Section VI.
In summary, we find low rates of false positives in two of the three surveys but substantial rates of false negatives in all three. These false negative rates are higher than those found in previous studies, often substantially so. The false negative rates exceed 50 percent in some cases, so analyses of government programs and the recipient population are likely to be severely biased in many situations. The low false positive rates in the ACS and the CPS imply that the aggregate underreporting rate (one minus the reporting rate) is a good approximation to the rate of false negative reports in those surveys, but not in the SIPP. This result is useful since aggregate rates are available for most years and for the entire United States, while our matched results are geographically and temporally limited. The large differences between false positive and false negative rates in all three surveys show that misclassification is not completely random; that is, it depends on the true value. In the next section, we examine whether or not it is random conditional on truth, which is assumed in corrections such as in Hausman, Abrevaya, and Scott-Morton (1998).
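The reason the approximation works can be seen by rearranging the identity above into a decomposition of the net underreporting rate (our notation):

$$1 - \frac{r}{p} = \mathrm{FN} - \frac{1-p}{p}\,\mathrm{FP}.$$

With the ACS figures this is $0.33 - 12.4 \times 0.0073 \approx 0.24$, close to the false negative rate; with the SIPP figures it is $0.23 - 15.8 \times 0.0164 \approx -0.03$, the 3 percent net overreporting noted above, which is far from the SIPP false negative rate.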
We also find large differences in the error rates and hence net reporting across the three surveys. This degree of variation is in line with the wide range of misreporting rates found in previous studies. Unlike these studies, we were able to link the same administrative data to three surveys using the same matching procedure. Hence, the differences we find between surveys can only be due to survey-specific characteristics, such as survey design, the focus of the survey, or its target population. For example, one factor that could contribute to the lower false negative rate in the SIPP is the shorter reference period, which should mitigate recall error.15 More generally, the differences between the surveys provide further justification for the skepticism of both Bound, Brown, and Mathiowetz (2001) and Moore, Stinson, and Welniak (2000) that a general theory of misreporting can be developed. They also emphasize that survey error heavily depends on the implementation of a survey. This observation is borne out by two surveys as similar as the ACS and CPS yielding substantially different error rates for a relatively straightforward question. Consequently, conclusions regarding important issues, such as net reporting rates or whether and how the errors are related to observable characteristics, may have survey- and program-specific answers. This finding underlines the importance of further research into the determinants of survey errors, but also makes it unlikely that they can be explained by a general theory.
B. Accuracy of Imputed Observations
An important source of error in the overall data is item nonresponse. Our linked data provide the true recipiency status of nonrespondents, giving us a unique opportunity to examine the accuracy of the imputed values the surveys include to address the problem of item nonresponse and whether imputation improves the quality of the data.16 The bottom panel of Table 2 reports the same statistics as the top panel, but now only for item nonrespondents. Several patterns are evident from these estimates.
First, item nonresponse is an important issue for analyses of transfer programs. Even though overall imputation rates are low at 1.9, 3.6, and 7.3 percent of the population in the ACS, CPS, and SIPP,17 respectively, a large share of recipient households are imputed: 13.6 percent of those receiving food stamps in the ACS, 9.3 percent in the CPS, and 13.6 percent in the SIPP. These statistics imply that item nonresponse predicts true receipt. The share of true food stamp recipients is higher among those who are imputed than among respondents, so excluding nonrespondents biases estimated receipt rates downwards. This potential bias is particularly pronounced in the ACS, where 53.4 percent of the imputed households are actual recipients, compared to 6.6 percent among nonimputed observations. The shares of true recipients among imputed and nonimputed observations also differ substantively in the CPS (22.5 percent compared to 8.2 percent) and the SIPP (11 percent compared to 5.6 percent). Thus, item nonresponse is not (unconditionally) random, because the probability of obtaining a response is lower among true recipients in all three surveys. Most imputation methods yield consistent estimates under the weaker assumption that reporting status is independent of the true value conditional on covariates. That the likelihood of item nonresponse depends so strongly on the true value casts doubt on this key assumption. The result also underlines that the nature of item nonresponse is survey-specific because the households that choose not to respond differ in their probability of receiving food stamps across the three surveys. This result suggests that item nonresponse is significantly influenced by survey design.
Second, the imputations also fail to capture the marginal distribution of food stamp receipt: 22.5 percent of nonrespondents in the CPS are true food stamp recipients, but the CPS imputations only assign receipt to 12 percent of nonrespondents, thereby understating the rate of receipt by 46 percent. On the other hand, the imputations overstate true food stamp receipt among nonrespondents in the ACS by 21.6 percent and in the SIPP by 29.1 percent. Another criterion to evaluate imputations is whether they make the distribution in the entire sample align better with the true distribution. The overimputation in the ACS reduces its net underreporting. However, this “improvement” comes from introducing additional error, which may have negative effects on the joint distribution of food stamp receipt and other variables in the survey. The imputations in the other two surveys make net survey error worse, by leading to more overreporting in the SIPP and adding to the underreporting in the CPS.
Third, comparing imputed receipt to administrative receipt reveals that imputations induce substantial error at the household level. False negative rates among imputations are much lower than the overall rate in the ACS (3 percent), much higher in the CPS (80 percent), and slightly higher in the SIPP (29 percent). The low rate of false negatives in the ACS comes at the expense of a staggering false positive rate of 28 percent. False positive rates are substantially higher in the two other surveys as well, at 10 percent in the CPS and 7 percent in the SIPP. Consequently, a substantial share of false positives is due to imputation. Imputed observations account for 38 percent of false positives in the ACS, despite being no more than 1.6 percent of the total sample. Similarly, 3.6 percent of the sample are imputed, but account for 37 percent of the false positives in the CPS. Despite the much higher imputation share in the SIPP (7.7 percent), the imputed observations account for a lower, but still substantial, 27 percent of the false positives. When excluding imputed observations, the slight overreporting in the SIPP changes to slight underreporting. Because of these imputed false positives, the overall false positive rate in all of the surveys is much higher than the rate of overreporting by respondents. Thus, it is not a good indicator of households’ tendency to report receipt when they are not recipients.
Taken together, our findings suggest that neither including nor excluding imputed observations is likely to solve the problem of item nonresponse. Receipt rates differ between respondents and nonrespondents, so excluding them will cause sample selection bias. However, including the imputed observations leads to bias from the substantial error rates we document. Therefore, data users are faced with the dilemma that both including and excluding imputed observations causes bias, and which strategy yields less bias is application-specific and unknown.
C. Potential Biases due to the Linkage Process
The data linkage process may lead to errors in the linked data for reasons such as missing or mismatched PIKs and households moving into one of the two states during the reference period. In this section, we discuss the extent of these problems and the likely biases they may cause in our estimated error rates.
First, some individuals may receive food stamps, but have no PIK in the survey data. We include households in our samples if anyone in the household has a PIK. So as long as at least one true recipient in the household has a PIK, we are able to classify it correctly as a recipient household. However, if none of the true recipient household members has a PIK (but another member does), we would falsely classify the household as a nonrecipient household. This misclassification would understate true food stamp receipt. Affected households are true recipients, so we might reasonably assume that they have reporting rates higher than nonrecipients. It may also be reasonable to assume that their reporting rates are lower than those of the average recipient households, most of which have only recipient members. Then, as shown in the Online Appendix, the false positive rate is biased upward, and the false negative rate is biased downward. About 14 percent of ACS households with at least one PIK have members without a PIK, while 24 percent of CPS households in Illinois (15 percent in Maryland) have this situation. Thus, this bias could be substantial.
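The direction of these biases can be sketched briefly in our own notation (the formal argument is in the Online Appendix). Let $s_R$, $s_M$, and $s_N$ denote the reporting rates of correctly classified recipient households, of recipient households whose recipient members all lack a PIK, and of true nonrecipients; let $\delta$ be the share of true recipients misclassified in this way, and let $p$ be the true recipient share. Under the assumption $s_N < s_M < s_R$ discussed above,

$$\widehat{\mathrm{FN}} = 1 - s_R < 1 - \big[(1-\delta)\,s_R + \delta\,s_M\big] = \mathrm{FN}, \qquad \widehat{\mathrm{FP}} = \frac{(1-p)\,s_N + p\,\delta\,s_M}{(1-p) + p\,\delta} > s_N = \mathrm{FP},$$

so the measured false negative rate understates the true one and the measured false positive rate overstates it.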
Second, a small fraction of the administrative records do not have a PIK. As in the previous case, this type of error will lead some true recipient households to not appear as recipients according to our administrative measure. If such households have reporting rates higher than true nonrecipients, but lower than other true recipients, the false positive rate would be overstated, and the false negative rate understated. The first condition seems likely given that these households are true recipients, while the second inequality is less clear. The share of administrative data without PIKs is very small, however.
Third, a PIK may be incorrectly assigned to a survey individual. If the household of this individual is a true recipient household, then the situation is analogous to the second case above and likely to increase false positives. The situation is slightly better because there is still a small chance that the erroneously assigned PIK belongs to another recipient, so that the household is still correctly classified. However, if the household is a true nonrecipient household, false negatives may be overstated if the incorrectly assigned PIK is from a true recipient. This situation should be uncommon. Most households do not receive food stamps, so the incorrectly assigned PIK is more likely to belong to a true nonrecipient household, which would lead us to classify the household correctly. Thus, such spurious false negatives require the joint occurrence of two low-probability events: an incorrectly assigned PIK and administrative food stamp receipt for that PIK.
Finally, a household that moved into the current state during the reference period of the survey may have received food stamps in their previous state, but not in their current state of residence. The administrative data from their current state of residence would not report that receipt. Thus, mobility across state lines will lead to an understatement of true food stamp receipt. As above, it seems reasonable to assume that these households have higher reporting rates than nonrecipients because they are true recipients. However, they are not currently receiving the program, so it also seems likely that they report at a lower rate than the average household. Under these assumptions, the false positive rate will be biased upward and the false negative rate biased downward (see the Online Appendix for a proof). Since only about 2 percent of individuals move across state lines in a year, the likely bias is small.
Overall, three of the four sources of error likely lead the administrative variable in the linked data to understate true receipt rates, implying that the linked data understate the false negative rate and overstate the false positive rate. The third case is hard to evaluate since the frequency of incorrectly assigned PIKs is not known, but the bias seems likely to be small. In consequence, linkage error likely results in an understatement of the true receipt rate and thereby an overstatement of the true reporting rate. In the presence of net underreporting, this overstatement means that the linked data make the survey look more accurate in terms of the net reporting rate of the number of recipients. In terms of error rates, linkage errors likely make the data understate false negatives and overstate false positives.
V. What Affects the Agreement between the Survey Reports and the Administrative Records?
We next examine how misreporting of food stamp receipt differs across households. The previous section shows that error rates differ by true receipt status, so we analyze how errors vary with household characteristics conditional on true receipt. If misreporting does not depend on household characteristics conditional on true receipt, then it is fairly straightforward to analyze the bias it causes and correct estimates of take-up and the distributional effects of programs. Examples of such corrections can be found in Hausman, Abrevaya, and Scott-Morton (1998) and under the assumption of no false positives in Meyer, Mok, and Sullivan (2015a) and Meyer (2010). However, if misreporting is correlated with household characteristics, such corrections do not work well, and the biases are difficult to assess (Meyer and Mittag 2017; Mittag 2019; Nguimkeu, Denteh, and Tchernis 2019). Nonetheless, in such cases, we can use models of survey error, such as the ones we estimate in this section, to adjust statistical analyses (Bollinger and David 1997). We first examine the determinants of false negatives and then examine the determinants of false positives. We examine households with income less than twice the poverty line to focus on a group for whom food stamp receipt is especially relevant.18 Due to the smaller SIPP sample, we continue to pool the data from Illinois and Maryland. Online Appendix Table 2 provides summary statistics for these samples.
Table 3 reports probit estimates for the determinants of false negative reporting in the ACS, the CPS, and the SIPP. Here the subsample consists of those who, according to the administrative data, are recipients of food stamps. We report average marginal effects on the probability of being a false negative reporter rather than coefficients to aid the interpretation of the magnitudes. The explanatory variables differ slightly due to availability in the three surveys, but all models include family type, number of adults and children, number of members who had a PIK, age categories, gender, education, ethnicity and employment status of the householder, whether the household is in a rural area, income relative to the household poverty line, reported receipt of other programs, receipt of TANF, and length of food stamp receipt from administrative data, as well as whether food stamp receipt was imputed. In the ACS and SIPP, we also examine whether the householder is disabled or a U.S. citizen and the role of language. In the CPS and the SIPP, we control for the time period. In the SIPP, we also include time in months since last food stamp receipt, a dummy if the household is in Maryland, and variables that are related to the quality of the interview.
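A minimal sketch of this estimation step (hypothetical variable names, survey weights omitted for brevity, and only a few of the covariates listed above) is:

```python
# Illustrative sketch: probit for false negative reporting among true
# recipients, reported as average marginal effects as in Table 3.
# Column names are hypothetical; PIK-adjusted weights are omitted here.
import pandas as pd
import statsmodels.api as sm

linked = pd.read_csv("linked_households.csv")

# Sample: administrative recipients with income below twice the poverty line.
rec = linked[(linked["admin_receipt"] == 1) & (linked["income_ratio"] < 2)]

y = 1 - rec["survey_receipt"]        # 1 = false negative (recipient not reporting)
X = sm.add_constant(rec[["single_parent", "age_head_50plus", "white",
                         "months_of_receipt", "receipt_imputed"]])
fit = sm.Probit(y, X).fit(disp=0)
print(fit.get_margeff(at="overall").summary())   # average marginal effects
```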
Table 3. Determinants of False Negatives, Probit Average Derivatives, Households with Income Less Than Twice the Poverty Line
Despite fairly small samples, there are many statistically significant determinants of false negative reporting. In all surveys, we easily reject the hypothesis that errors are unrelated to household characteristics. Consequently, misreporting is not conditionally random, because reporting rates vary with household characteristics even among true recipients. This finding violates the assumption of most corrections for misreporting and implies that (linear and nonlinear) instrumental variable methods are unlikely to give consistent estimates. Nguimkeu, Denteh, and Tchernis (2019) show that instrumental variable estimates of treatment effects can be severely biased. It also implies that the bias caused by survey error depends on the covariates that predict reporting errors. Intuitively, misreporting will cause larger downward biases for survey estimates of the receipt rates of subpopulations that are less likely to report true receipt. Attempting to address the errors by scaling up receipt rates by the net underreporting rate leads to overestimation for good reporters and underestimation for groups that report poorly. Consequently, understanding which variables predict survey error is important to assess what kind of analyses are likely to be biased and to examine the likely bias in practice. The predictors of errors are also informative about some of the theories of misreporting discussed above.
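A stylized example with hypothetical numbers illustrates the point: suppose two groups each have a true receipt rate of 10 percent, but one reports 90 percent of true receipt and the other only 60 percent, so the pooled reporting rate is 75 percent. Scaling each group's reported rate up by 1/0.75 yields 9/0.75 = 12 percent for the first group and 6/0.75 = 8 percent for the second, overstating receipt for the good reporters and understating it for the poor reporters.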
Even though the marginal effects of many of the variables are imprecisely estimated, some common themes emerge. Households with a householder 50 years or older are more likely to be false negatives (by 9–15 percentage points), except in the Maryland CPS sample, where the effect is large and negative. Several papers (discussed below) argue that the elderly are less likely to report program receipt for reasons such as stigma. Except for the Maryland CPS sample, our results support this hypothesis. As a consequence, part of the decline in estimated participation rates with age comes from decreasing reporting rates rather than decreasing rates of program receipt. The fact that higher income increases the likelihood that a recipient will not report receipt is also consistent with stigma being among the causes of misreporting. We also find that households where a language other than English is spoken (ACS) or where the householder speaks poor or no English (SIPP) are much more likely to fail to report food stamp receipt. This result can be taken as evidence that comprehension of the question is among the causes of misreporting. However, we also find that non-U.S. citizens are surprisingly less likely to fail to report, and the difference is significant in the SIPP and the ACS Illinois sample.
In terms of other demographic characteristics, households with a white householder are less likely to fail to report in all samples. The difference is sizeable (5–11 percentage points) and significant in the ACS and the SIPP. For the remaining demographic variables, our results are mixed or inconclusive. The marginal effects for households in rural areas are negative in four out of five models; rural households are less likely to fail to report in the ACS and CPS. The difference in the probability of reporting is large (ten percentage points) and significant in the ACS, but small and insignificant in the CPS and SIPP. Misreporting seems to be related to the gender of the householder, but the signs of the marginal effects differ across surveys. There is some evidence that households with a more educated householder are more likely to underreport, but the estimates are imprecise. Similarly, the marginal effect of being a single parent household with children is negative in four out of five models, but only significant in the SIPP. The effect of the number of adults is positive in four out of five cases, but always imprecisely estimated.
Quite uniformly, true recipients who report receipt of other programs (public assistance, housing assistance) are more likely to report food stamp receipt. The difference is large—for example, in the ACS, food stamp recipient households reporting public assistance receipt are nearly 20 percentage points less likely to fail to report food stamp receipt. We also find sizable effects of the duration of receipt in the reference period. An additional month of food stamp receipt is estimated to decrease the probability of failing to report by two to five percentage points. This result agrees with the idea that regularity of receipt is important and is also consistent with recall error being one of the reasons for false negatives. The SIPP provides further evidence of recall error, where we show that the number of months since last food stamp receipt in the reference period increases the probability of false negatives, by four percentage points per month. As the earlier analysis of imputed observations in Section IV.B presages, an imputation indicator is significant in all samples. We find little evidence of any effect of other variables related to the quality of the data, the interview, and the matching process.
In addition to our analysis of underreporting, we also examine the frequency of reporting receipt by those who are truly nonrecipients in Table 4. The sample for this false positive analysis, those who are truly nonrecipients, is much larger than that used for the false negative analysis. However, the false positive rate is so low that the number of false positives is much smaller than the number of false negatives. We can still easily reject the hypothesis that overreporting is unrelated to household characteristics, which confirms that survey error is not random, even after conditioning on truth. Given the small number of “ones” in this probit analysis, there are fewer significant determinants of reporting in these equations. Households with a householder 50 or older are less likely to misreport if they do not receive food stamps. The effect is negative throughout and significant for Illinois in the ACS and CPS. The fact that both recipient and nonrecipient elderly households are less likely to report receipt may indicate that stigma plays a larger role for the elderly. Similarly, income relative to the poverty line decreases the probability of false positives. The effect is significant except for Maryland in the ACS. This result may be additional evidence of stigma, but could also be explained by the fact that these households are less likely to receive food stamps and thus are less likely to make mistakes about their recipiency status. Households with a disabled householder are more likely to overreport. The number of household members under 18 matters, but the effect goes in different directions across the surveys. It is significant and positive in the ACS (IL only) and the SIPP, but negative for Maryland in the CPS. There is some evidence that true nonrecipient households with a white householder report more accurately. Reporting receipt of other programs, particularly a report of public assistance receipt, increases the probability of a false positive. This finding supports the hypothesis that misreporting is partly due to respondents confusing government programs (Nicholas and Wiseman 2009). The marginal effects of the imputation indicators confirm the finding of the last section that many false positives are due to imputation and that imputations are worse than reports by true nonrecipients in all surveys.
Table 4. Determinants of False Positives, Probit Average Derivatives, Households with Income Less than Twice the Poverty Line
In conclusion, we show that both false positives and false negatives are systematically related to household characteristics in all surveys. Nonetheless, we find few consistent patterns. This finding may be due to the small sample sizes or because misreporting is mainly survey-specific. The variables that consistently predict survey error support common explanations for misreporting, such as comprehension, salience, recall, confusion of government programs, and stigma. Even though specific effects are imprecisely estimated, they are jointly significant, so that we reject the hypothesis that errors are random conditional on truth in all surveys. It is unclear how this systematic survey error affects estimates, such as estimates of program effects or of the binary choice models commonly used to examine program take-up. Coefficient estimates, such as the ones produced here, could be used to correct such models as in Bollinger and David (2001) or Meyer and Mittag (2017). These estimates also enter the formulas for the bias in estimated program effects in Nguimkeu, Denteh, and Tchernis (2019).
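To make concrete how error-rate estimates like these could feed into such a correction, the sketch below implements a misclassification-adjusted probit likelihood in the spirit of those corrections, treating the false positive and false negative rates as known constants. This simplifies the conditional corrections discussed in the cited papers; the class, error rates, and variable names are illustrative assumptions rather than the authors' code.

```python
# Sketch of a probit that adjusts for misclassification of the dependent
# variable, assuming known, constant error rates (a simplification of the
# conditional corrections in the cited papers).
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm
from statsmodels.base.model import GenericLikelihoodModel

class MisclassifiedProbit(GenericLikelihoodModel):
    """Probit for a misreported binary outcome.

    a0 = P(report = 1 | truly 0), the false positive rate
    a1 = P(report = 0 | truly 1), the false negative rate
    so that P(report = 1 | x) = a0 + (1 - a0 - a1) * Phi(x'beta).
    """

    def __init__(self, endog, exog, a0, a1, **kwds):
        super().__init__(endog, exog, **kwds)
        self.a0, self.a1 = a0, a1

    def loglikeobs(self, params):
        p = self.a0 + (1.0 - self.a0 - self.a1) * norm.cdf(self.exog @ params)
        p = np.clip(p, 1e-10, 1.0 - 1e-10)
        return self.endog * np.log(p) + (1.0 - self.endog) * np.log(1.0 - p)

# Hypothetical usage (y = reported receipt, X = covariates incl. a constant),
# with error rates taken from estimates such as the ones reported here:
# naive = sm.Probit(y, X).fit(disp=0)              # ignores misreporting
# corrected = MisclassifiedProbit(y, X, a0=0.01, a1=0.35).fit(
#     start_params=naive.params, disp=0)
# print(corrected.summary())
```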
VI. The Effect of Survey Error on Estimates of Program Receipt
The previous sections show substantial misreporting of food stamp receipt and that it is systematically related to household characteristics. It is well known that such nonclassical measurement error causes bias, but little is known about the direction and magnitude of the bias in general. We use our measure of truth in the linked administrative data to analyze an important case, binary choice models of program receipt. Such models are often used to analyze program targeting (see Currie 2006 or Haider, Jacknowitz, and Schoeni 2003).19 Meyer and Mittag (2017) derive the bias for the probit models with reported receipt as the dependent variable that such analyses usually employ. Their results imply a tendency of marginal effects to retain the correct sign. However, this prediction can be overturned, for example, when the covariates strongly predict survey errors. Thus, it is important to assess whether such results provide a useful characterization of the consequences of survey error in practice.
Having true food stamp receipt matched to survey data gives us the opportunity to estimate this bias directly and examine whether the use of administrative data provides a different understanding of the determinants of food stamp receipt than the survey data alone. We first estimate the determinants of receipt using only survey data. We then reestimate the determinants of receipt using the survey covariates, but with the administrative measure of receipt as the dependent variable. We then compare the two equations for the determinants of food stamp use. Throughout this section, we report average marginal effects20 and restrict our sample to households with income below twice the poverty line to have a sample for which food stamp receipt is a likely possibility.
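In code, this comparison amounts to estimating the same probit twice and contrasting the average marginal effects. The sketch below, again on synthetic data with hypothetical variable names, illustrates the mechanics; the χ2-tests of equality reported in the paper additionally account for the dependence between the two sets of estimates, which this simple side-by-side display omits.

```python
# Estimate the same probit with the survey report and with the administrative
# measure of receipt as the dependent variable, then compare average marginal
# effects (synthetic data; hypothetical variable names).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000
df = pd.DataFrame({
    "single_parent": rng.integers(0, 2, n),
    "nonwhite": rng.integers(0, 2, n),
    "income_to_poverty": rng.uniform(0, 2, n),
})
latent = (-0.5 + 0.8 * df["single_parent"] + 0.5 * df["nonwhite"]
          - 0.6 * df["income_to_poverty"])
df["admin_receipt"] = rng.binomial(1, 1 / (1 + np.exp(-latent)))
# Synthetic misreporting: true recipients fail to report 35 percent of the time.
df["reported_receipt"] = df["admin_receipt"] * rng.binomial(1, 0.65, n)

covariates = "single_parent + nonwhite + income_to_poverty"
fits = {
    dep: smf.probit(f"{dep} ~ {covariates}", data=df).fit(disp=0)
    for dep in ["reported_receipt", "admin_receipt"]
}

# Average marginal effects side by side: survey report vs. administrative truth.
ame = pd.DataFrame({
    dep: fit.get_margeff(at="overall").summary_frame()["dy/dx"]
    for dep, fit in fits.items()
})
print(ame)
```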
The determinants of food stamp receipt in the ACS are in the first four columns of Table 5. The results using only survey data are in Column 1 for Illinois and in Column 3 for Maryland. The survey estimates suggest that, controlling for household income, a single parent household is about ten percentage points more likely to be a recipient than a married couple household in both states. Those 50 or older are much less likely to be reported participants than those ages 40–49 in Illinois, while in Maryland the effect is only evident for those 60 or older. The differences in receipt for these older groups are large (ten percentage points in Illinois and nine percentage points in Maryland) compared to those 40–49. The marginal effects of education and income have the expected signs, with high school dropouts six percentage points more likely to report participation in Illinois and seven percentage points more likely in Maryland than those with some college. Income is a strong predictor of reported food stamp receipt. In Illinois, households with income equal to half the poverty line are seven percentage points more likely to report food stamp receipt than households with income 1.5 times the poverty line. In Maryland, the difference is ten percentage points. The survey estimates also suggest that households with a nonemployed or disabled householder are much more likely to receive food stamps. In Illinois, nonwhites are more likely to report participation, while there is little difference by race in Maryland. According to survey reports, those reporting housing assistance receipt are more than 1.5 times as likely to be recipients as an average household. Those reporting public assistance receipt are more than twice as likely to be recipients.
Table 5. Determinants of Reported and Administrative Food Stamp Receipt, Probit Marginal Effects, Linked Households with Income Less than Twice the Poverty Line
Replacing the mismeasured ACS survey receipt variable with the administrative measure of receipt paints a different picture of the determinants of food stamp participation. Columns 2 and 4 of Table 5 repeat the analysis, substituting the administrative dependent variable for the poorly reported survey measure of receipt. Superscript letters in Columns 2 and 4 indicate the level of significance from tests of equality of the marginal effects based on the survey data alone and those based on the combined survey and administrative data. The joint χ2-tests in the last row clearly reject that the combined data yield the same estimated marginal effects as the ACS survey data alone for both states. Single households, both with and without children, are much more likely to be recipient households in the combined data. In Illinois, the difference is four to five percentage points, while in Maryland, it is six to nine percentage points, and most differences are at least marginally statistically significant. The average marginal effects for race also differ significantly: in each state, the administrative specifications indicate that participation among nonwhites is four percentage points higher than the survey-only specifications suggest. Most marginal effects for reported receipt of public assistance or housing benefits are significantly different. In Illinois, the marginal effect of age, particularly for age 50–59, is quite different in the combined data, and the difference is statistically significant. The association with speaking English only is also significantly different. For Maryland, the association with income is quite different in the combined data, indicating a substantially quicker decline in participation with income. Overall, 16 marginal effects differ significantly, but only four out of 46 marginal effects change their direction due to survey errors. This pattern is in line with the theoretical results from Meyer and Mittag (2017).
We report the determinants of food stamp participation using the CPS data in Columns 5–8 of Table 5. Columns 5 and 7 of this table provide the average marginal effects for the models that use only survey data. As in the ACS survey data results, all else equal, single parent households are more likely to be recipients, though the relationship is not significant in Maryland. Households with many children are more likely to report food stamp receipt, and this difference is significant in both states. Households with householders 70 years or older are less likely to receive food stamps, while those with very low income, a nonemployed householder, or reported receipt of either public assistance or housing benefits are significantly more likely to receive food stamps in both states according to the CPS reports. In Illinois, those without a high school degree are more likely, and those with a college degree less likely, to receive food stamps than those with some college. The survey data alone do not suggest that food stamp receipt has been rising with time in either of the states.
When we substitute the administrative measure of receipt for the poorly reported survey measure in Columns 6 and 8 of Table 5, the determinants of reporting change in important ways. As the χ2-test p-values at the bottom of Columns 6 and 8 indicate, in both states we reject that the marginal effects are jointly the same using the administrative and the survey dependent variable. Eleven out of 42 marginal effects in the CPS change their sign due to survey error. On one hand, this pattern is surprising given the prediction that sign changes require strong conditions. On the other hand, it is not surprising that we find sign changes to be most frequent in the survey with the highest misclassification rate. Substantively, the difference in participation between single parent families and those with married parents changes from five percentage points to 13 in Illinois and from one percentage point to eight in Maryland with the administrative data measure. In Illinois, the change is statistically significant, while it is not in Maryland. In Maryland, there is some evidence of an increased marginal effect of the number of children in a household. Food stamp participation is also much higher among nonwhites and drops off more quickly with income than in the survey data alone in Illinois. Contrary to the survey data, which showed no time trend, the combined data provide evidence of increasing receipt in both states, which is relevant to recent research by Mulligan (2012) and Ganong and Liebman (2018).
The results using the SIPP survey reports are in Column 9 of Table 5 and are similar to the other two surveys. Single parents are again more likely to receive food stamps, all else being the same. In the SIPP this also applies to single individuals. As in the other two surveys, income relative to the poverty line has a negative impact on reported program take-up. Households in rural areas and with a householder reporting a disability or poor English skills are more likely to receive benefits according to SIPP reports. Households with a nonwhite householder are more likely to be (reported) participants. There is a strong positive association between reporting food stamps and receipt of other programs (housing assistance and TANF). Reported participation seems to decline with age, but the evidence is weak. Unlike in the CPS, the survey reports show a time trend, but it is flat until 2003 and then increases sharply.
Column 10 of Table 5 reports the SIPP results that use the administrative dependent variable. The joint test rejects that the results from the two dependent variables are the same. The SIPP estimates align well with the prediction from Meyer and Mittag (2017), as only six out of 27 marginal effects change sign. Several marginal effects are significantly different. The number of adults has a pronounced negative effect in the administrative data. As in the other two surveys, the effects of race and income are more pronounced when using administrative food stamp receipt, while the association with reporting other programs is weaker. The marginal effects of two age categories (30–39 and 50–59) change significantly. While the survey data suggest that participation declines over the life-cycle, the relation is U-shaped in the administrative data, increasing sharply after age 50, though the marginal effects are imprecisely estimated. Despite being as likely to receive food stamps as households in Illinois, households in Maryland are almost five percentage points less likely to report receiving them. The time trend is clearly different when using the administrative dependent variable. Growth in program participation is more rapid in the first half of the time period and slower in the second half using the accurate data.
In summary, survey error clearly changes what we learn about program receipt. One of the key differences between the combined administrative and survey data and the survey data alone is in participation by age. Haider, Jacknowitz, and Schoeni (2003) and Wu (2010) emphasize lower food stamp take-up by older households in survey data. Gundersen and Ziliak (2008) find a more complicated pattern by age. In some cases, the differences in misreporting by age we document in Section V make the combined data show much less of a difference between the aged and the nonaged, thus explaining a significant part of the puzzle in past work. We see this pattern in our largest sample, that for Illinois using ACS data, although it is not evident in the CPS data. Another noteworthy difference is the impact of income relative to the poverty line. Food stamp receipt declines more rapidly with income in the administrative data, so analyses using survey data only are likely to understate the distributional consequences of the Food Stamp Program. Finally, survey error has a pronounced impact on the time trend in food stamp receipt. In the CPS, the survey reports conceal the time trend, while in the SIPP they suggest a flat profile followed by a steep increase instead of a more steady increase. The time pattern of receipt has been a key issue in recent work on food stamps, such as Mulligan (2012) and Ganong and Liebman (2018).
However, while the survey data alone would lead one to make incorrect inferences in some cases, the overall picture obtained from the survey data is fairly accurate in qualitative terms. Most of the significant marginal effects remain significant, and changes in the sign of marginal effects are rare when one goes from the survey data alone to the combined data. Overall, only 21 out of 115 marginal effects change sign. This pattern holds even in the CPS, where half of true food stamp recipients fail to report. That few marginal effects change sign conforms to the theoretical predictions in Meyer and Mittag (2017), suggesting that their asymptotic bias results can be useful for assessing bias in practice. If the tendency of misclassification not to affect the sign of estimates in such models holds more generally, we may still be able to draw important qualitative conclusions from contaminated survey data. Future research should explore the generality of this result.
VII. Conclusions
Benefit receipt in major household surveys is often misreported, hindering our understanding of government programs and the economic circumstances of disadvantaged populations. We use administrative data on Food Stamp Program participation from Illinois and Maryland matched to ACS, CPS, and SIPP household survey data to examine the extent and consequences of such survey errors. We show that more than 30 percent of true recipient households do not report receipt in the ACS, approximately 50 percent do not report receipt in the CPS, and 23 percent do not report in the SIPP. False positive rates are much lower, at less than 1 percent in the ACS and CPS and 1.6 percent in the SIPP. Imputation matters for analysis of program receipt because item nonresponse is frequent among recipients. Receipt rates differ between respondents and the overall population, so only using respondents results in biased population estimates. Imputed observations introduce substantial error, with a large share of false positives being due to imputation in all three surveys. Imputations do not correctly reproduce the probabilities of receipt among item nonrespondents. We discuss the potential bias from linkage errors on these error rates, finding that such errors likely lead to an understatement of false negative errors and an overstatement of false positive errors.
Misreporting, both false negatives and false positives, varies with household characteristics such as income, race, and age. Because these errors are related to frequently used covariates, they will lead to biases that are difficult to assess and to correct: this dependence renders most standard corrections for misreporting invalid and makes it difficult to distinguish the effect of such characteristics on reporting from their effect on true receipt. See Mittag (2019) and Davern, Meyer, and Mittag (2019) for discussion and for correction methods based on estimates such as the ones provided here. The characteristics that predict misreporting suggest that comprehension, salience, recall, stigma, and complex patterns of program receipt are among the determinants of survey errors, as theories of misreporting predict. However, with our small sample, it is difficult to provide definitive tests of theories of misreporting.
Finally, we examine bias in the determinants of program receipt using our combined administrative and survey data, which pair the accurate participation measure from the administrative data with household explanatory characteristics from the survey that are missing in the administrative data. Our food stamp participation results differ from conventional estimates using only survey data in several important ways. Participation is higher among single parents and nonwhites and declines more quickly with income than the survey data alone suggest. Participation by age and the patterns of multiple program participation are also different using the administrative variable. The results indicate that underreporting is part of the explanation for the low receipt rate among the elderly. Lastly, using only the survey data, one would miss much of the rise in food stamp participation. It is also possible to think of the glass as half full, rather than half empty. It is striking that the signs of most determinants of food stamp receipt in the survey data alone match those in the combined administrative and survey data, even in the CPS, where half of true food stamp recipients are not recorded as recipients. Further evidence on this pattern might clarify the conditions under which this finding holds more generally.
Our results also suggest biases in other studies where program receipt is used as an explanatory variable in a regression. We show that the errors of measurement are correlated with the true values as well as with a range of explanatory variables. This nonclassical form of the errors means that the bias will usually take a complicated form. Substantively, erroneous program receipt will affect studies of who receives benefits and why, as well as studies of program effects on labor supply, health, consumption, and other outcomes. Studies that examine the extent to which food stamps increase the resources of poor families will tend to understate their impact. A better understanding of underreporting and how it may bias program receipt estimates is important for both policymakers and researchers. Accurate estimates of program receipt are needed to know who benefits from programs, why some choose not to participate in certain programs, and how individual characteristics affect participation. Since we find that survey error leads to biased estimates of the determinants of program receipt, policies based on survey data alone may be misguided.
Footnotes
The authors thank David Johnson, John Kirlin, Gayatri Koolwal, Alan Krueger, Cathleen Li, Wallace Mok, Daniel Schroeder, James Spletzer, Jane Stavely, Shelly Ver Ploeg, Derek Wu, two anonymous referees, and audiences at the American Economic Association Meetings, American Statistical Association Meetings, Baylor University, European Congress of Methodology, ITSEW, Harvard University, USDA, Yale University, and ZEW for beneficial comments. The authors are grateful for the assistance of many Census Bureau employees, including David Johnson, Amy O’Hara, Lynn Riggs, and Frank Limehouse. Lucy Bilaver, Kerry Franzetta, and Janna Johnson provided excellent research assistance. This research was supported by the Economic Research Service of the USDA, the Russell Sage Foundation, Alfred P. Sloan Foundation, Charles Koch Foundation, the Menard Family, the Czech Science Foundation (through grant no. 16-07603Y), and the Czech Academy of Sciences (through institutional support RVO 67985998). Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the USDA or the U.S. Census Bureau. The data analysis was conducted at the Chicago RDC and was screened to avoid revealing confidential data. Due to confidentiality restrictions, the linked data used in this paper can only be accessed at secure facilities of the U.S. Census Bureau by researchers with Special Sworn Status. The authors cannot make the data or intermediate results publicly available, but researchers can request access to the data by writing a proposal to the U.S. Census Bureau. Further information is available at https://www.census.gov/about/adrm/linkage/guidance.html and from the authors.
Supplementary materials are freely available online at: http://uwpress.wisc.edu/journals/journals/jhr-supplementary.html
1. See Hoynes and Schanzenbach (2009); Almond, Hoynes, and Schanzenbach (2011); and Schmidt, Shore-Sheppard, and Watson (2016), for example.
2. Examples where program participation is the main dependent variable include Blank and Ruggles (1996); Haider, Jacknowitz, and Schoeni (2003); Figlio, Gundersen, and Ziliak (2000); Currie and Grogger (2001); and Ziliak, Gundersen, and Figlio (2003), while cases where it is an explanatory variable include Schmidt, Shore-Sheppard, and Watson (2016); Gundersen and Ziliak (2003); and Blundell and Pistaferri (2003).
3. Also see Coder and Scoon-Rogers (1996), Roemer (2000), Wheaton (2007), and Rothbaum (2015).
4. Nguimkeu, Denteh, and Tchernis (2019) provide a formal treatment showing that the bias with instrumental variable methods can be severe.
5. Notable exceptions include Bollinger and David (1997, 2001), Pierret (2001), and Gundersen and Kreider (2008); see Section II for further discussion and references.
6. See, for example, Wheaton (2007); Scholz, Moffitt, and Cowan (2009); and Meyer (2010) for exceptions.
7. For excellent reviews of research on take-up of food stamps and other programs, see Remler and Glied (2003) and Currie (2006).
8. We do not mean to argue that administrative or linked data are more accurate in general. Obtaining an accurate measure via data linkage requires high-quality administrative records and linkages, as discussed in Meyer and Mittag (2021b). Administrative data can contain a substantial amount of error; see Niehaus and Sukhtankar (2013) for an extreme case. See Courtemanche, Denteh, and Tchernis (2019) and Meyer and Mittag (2019b) for a discussion of a specific linked data source. We discuss likely inaccuracies and their consequences in Section IV.C. If the linked data contain substantial error, other methods are required (for example, Abowd and Stinson 2013; Kapteyn and Ypma 2007; Oberski et al. 2017; and Meijer, Rohwedder, and Wansbeek 2012).
9. Strictly speaking, we used the 2001 Supplementary Survey or SS01, which is a predecessor of the ACS.
10. To be clear, we are able to determine accurately what share of true recipient survey households report receipt, but we cannot determine what share of true recipient assistance units report receipt.
11. It is not entirely clear whether the reference period should include the month of the survey or not. We include it throughout, so that we define receipt based on a 13-month period. Error rates are only negligibly different when defining administrative receipt based on the 12 months preceding the current month.
12. As discussed above, we consider it accurate in the sense that the potential sources of error we discuss likely have at most a negligible impact on our estimates.
13. The difference is larger than the low false positive rate seems to suggest due to the much larger pool of nonrecipients.
14. Contrary to the case of false negatives, pooling four months could also reduce the false positive rate if false positives mainly stem from reporting receipt in the wrong months, but we consider this unlikely.
15. A shorter reference period likely provides less or less-relevant information, so the potential error reduction would come at a cost.
16. Food stamp receipt in the ACS, CPS, and SIPP is imputed using hot deck methods. In the ACS, households (not in group quarters) are classified into cells defined by full interactions of family type, presence of children, poverty status, and the race of the reference person in each state. The data go through what is called a “geosort” before the imputation process. The most recent nonmissing response from a given cell at the smallest level of geography available is substituted for a missing response. In the CPS hot deck, households are classified into a much larger number of cells, but at the national level. The cells are defined by full interactions of number of people in the household (six categories), household income (nine categories), household type (three categories), age of the householder (two categories), and receipt of public assistance (two categories), for a total of 648 cells. Finally, the SIPP also imputes at the national level and only uses donors from the current wave. It applies a geosort to the data, but with much less geographic detail than the ACS. Food stamp receipt is then imputed within cells formed by age (six categories), race (two categories), sex (two categories), marital status (four categories), number of children (three categories), and work experience (three categories), for a total of 864 cells.
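As a rough illustration of the sequential, cell-based logic described in this footnote, and not the Census Bureau's actual procedure, the imputation can be sketched as follows in Python; the column names, cells, and sort order are invented.

```python
# Rough sketch of a sequential hot deck: sort records (a crude "geosort"),
# then fill each missing response with the most recent nonmissing response
# from the same imputation cell. Column names are invented.
import numpy as np
import pandas as pd

def sequential_hot_deck(df, value_col, cell_cols, sort_cols):
    """Fill missing values with the most recent nonmissing response from the
    same imputation cell, after sorting the records."""
    out = df.sort_values(sort_cols).copy()
    # Records that precede any donor in their cell remain missing in this sketch.
    out[value_col] = out.groupby(cell_cols)[value_col].ffill()
    return out

# Tiny made-up example with simplified cells (family type, poverty status).
households = pd.DataFrame({
    "tract": [1, 1, 2, 2, 3],
    "family_type": ["single", "single", "single", "married", "married"],
    "poverty_status": [1, 1, 1, 0, 0],
    "foodstamp_receipt": [1.0, np.nan, np.nan, 0.0, np.nan],
})
print(sequential_hot_deck(
    households,
    value_col="foodstamp_receipt",
    cell_cols=["family_type", "poverty_status"],
    sort_cols=["tract"],
))
```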
17. Note that in the SIPP we consider an observation to be imputed if any of the four reports was imputed.
18. We focus on this sample to make the results informative about the bias in the estimated models of receipt in the next section. Qualitative conclusions on the determinants of errors remain unchanged when analyzing the entire sample.
19. A related strand of literature examines estimates of program effects in the presence of misclassification of program receipt; see, for example, Gundersen and Kreider (2009); Kreider (2010); Millimet (2011); Gundersen, Kreider, and Pepper (2012); Kreider et al. (2012); Almada, McCarthy, and Tchernis (2016); Jensen, Kreider, and Zhylyevskyy (2019); and Nguimkeu, Denteh, and Tchernis (2019).
20. The overall results are very similar for the coefficients, though the differences are smaller in some cases, but not uniformly so.
- Received August 2018.
- Accepted May 2020.
This open access article is distributed under the terms of the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0) and is freely available online at: http://jhr.uwpress.org.