Abstract
Biases from truncation caused by coresidency restriction have been a challenge for research on intergenerational mobility. Estimates of intergenerational schooling persistence from two data sets show that the intergenerational regression coefficient, the most widely used measure, is severely biased downward in coresident samples. But the bias in intergenerational correlation is much smaller and is less sensitive to the coresidency rate. The paper provides explanations for these results. Comparison of intergenerational mobility based on the intergenerational regression coefficient across countries, gender, and over time can be misleading. Much progress on intergenerational mobility in developing countries can be made with the available data by focusing on intergenerational correlation.
I. Introduction
There has been a renewed interest in intergenerational economic mobility over the last few decades, with heightened concerns about widening inequality despite significant growth and poverty reduction in many developed and developing countries (Atinc et al. 2005; Minton Beddoes 2012). Notwithstanding the recent interest, intergenerational economic persistence in developing countries remains an underresearched area, primarily due to data limitations.1
A major issue that has stunted progress in this research agenda is that the standard household surveys suffer from truncation because coresidency is used as a criterion to define household membership (Bardhan 2014; Behrman 1999; Deaton 1997).2 A standard household survey such as the Living Standards Measurement Survey (LSMS) done by the World Bank or the Household Income and Expenditure Survey (HIES) done by national statistical agencies usually includes only the coresident parents and children.3 While the Demographic and Health Surveys (DHSs) have information on how many children were nonresident at the time of the survey, they do not collect data on their characteristics, including schooling. Since the pattern of coresidence is not random, most of the studies suffer from potentially serious sample truncation bias when estimating intergenerational persistence in economic status. This has discouraged research on intergenerational economic mobility in developing countries.4
Although potential biases from the coresidency restriction have been a major stumbling block, to the best of our knowledge, there is no evidence on the direction and magnitude of the coresidency bias in the standard measures of intergenerational persistence. Are the estimates from the coresident sample biased to such an extent that they are of little use in understanding intergenerational economic mobility?5 Are the different measures of intergenerational persistence affected by coresidency bias to the same degree, or are some measures more robust than others with relatively small bias? This paper makes progress on these questions, providing evidence, analysis, and guidance for working with coresident samples that have wide ranging implications for research on intergenerational economic mobility.
We use a simple model of truncation where children leave the parental household after marriage to analyze the bias in the two most widely used measures of intergenerational persistence in the literature: intergenerational regression coefficient (henceforth IGRC) and intergenerational correlation (henceforth IGC). While both IGRC and IGC measure linear persistence across generations, they are conceptually different. IGRC shows the effect of one additional year of parental schooling on the schooling attainment of children, while IGC (squared) represents the proportion of the variance in children’s schooling that can be attributed to the variance in parents’ schooling.6 The estimated IGRC and IGC differ from each other, except for the case when the variance in schooling does not change across generations. Since the variance in children’s schooling is usually higher than that in parents’ schooling, the IGRC estimates are in general larger in magnitude. We show that IGC and IGRC are affected differently by truncation bias; the bias in IGC is likely to be significantly smaller than that in IGRC, and the IGC estimates are much more robust in the sense that they are less sensitive to differences in coresidency rates.
To estimate the biases due to coresidency restriction empirically, the challenge is to find surveys that (i) include all of the children and the parents irrespective of their residency status and (ii) identify the subset of individuals coresident in a household at the time of the survey. We take advantage of two high quality household surveys in villages of India and Bangladesh and provide a series of estimates of IGC and IGRC.
The empirical evidence on intergenerational schooling persistence in India and Bangladesh presented below shows that IGRC, the most widely used measure of intergenerational persistence, suffers from large downward bias because of truncation due to coresidency. In contrast, the downward bias in the estimated IGC in coresident samples is much smaller; in many cases, it is less than one-third of the bias in the corresponding IGRC estimate. In the sample of 13–60 years age range, the average bias in IGRC estimates is 29.7 percent in the case of Bangladesh, while the corresponding bias in IGC estimates is only 8.7 percent. The extent of truncation bias in India is smaller because of higher coresidency rates observed in the data. However, the IGRC estimates in India are also substantially biased downward; the average bias is 17.6 percent. Again, the corresponding average bias in the IGC estimates is much smaller at 10.4 percent.7 Considering estimates across different age ranges and gender, the average biases in IGRC and IGC are 24.4 percent and 6.5 percent, respectively, in Bangladesh and 14.12 percent (IGRC) and 7.6 percent (IGC) in India. Moreover, we provide suggestive evidence that the bias in the IGC estimate is less sensitive to variations in the coresidency rate.
The analysis and findings presented below have important and wide ranging implications for research on intergenerational economic mobility. First, our analysis implies that much progress in understanding intergenerational mobility can be made with the household surveys available in developing countries by focusing on IGC as the measure of mobility. These data sets are currently shunned by most researchers because of the worry that the estimates from the coresident sample suffer from bias of unknown direction, and possibly of very high magnitude.8 Second, the results in this paper can be helpful in sorting out often conflicting evidence on intergenerational mobility from coresident samples. For example, in India, according to the IGRC estimates, educational mobility has improved substantially after the 1991 reform, but remained largely stagnant according to the IGC estimates. Our analysis suggests that the conclusions based on the IGC estimates in such instances of conflict are more credible. Third, the evidence that the estimates are biased downward can be helpful in understanding changes in intergenerational mobility over time. If the estimates from coresident samples show no change or an increase in persistence over time, we can be confident that mobility has declined, as the coresidency rate in a country usually declines over time, making the downward bias in the estimates for the younger generation larger in magnitude.9 Fourth, our results have important implications for cross-country comparisons of economic mobility. Most of the available data sets suffer from coresidency restrictions, and the extent of truncation is likely to vary across countries significantly. 
Since the bias in IGRC is larger, and it responds more to changes in the coresidency rate, a ranking according to IGRC is more likely to be incorrect compared to a ranking based on IGC.10 Fifth, the evidence indicates that the IGRC estimates in coresident samples are likely to underestimate the gender gap in intergenerational economic mobility, because coresidency rates for girls are much lower in many developing countries, especially where girls leave the natal family after marriage.
The rest of the paper is organized as follows. Section II provides a brief discussion on the related literature focusing on developing countries and puts the contribution of this paper in perspective. Section III develops a simple model of sample truncation and provides an analysis of the bias in IGRC and IGC estimates. Section IV discusses the data sources and variables used in the analysis. Section V reports the estimates of IGRC and IGC in educational attainment for Bangladesh and India data, both for the full and the coresident samples. Section VI presents additional evidence and a discussion on the sensitivity of IGRC and IGC with respect to coresidency rate in a survey. Section VII discusses the implications of the results for sorting out the conflicting evidence from the existing studies on intergenerational mobility in developing countries. The paper concludes with a summary of the results and their implications for the emerging literature on intergenerational mobility in developing countries.
II. Related Literature
The literature on intergenerational economic mobility in developed countries is vast, but the corresponding literature on developing countries is limited at best. The economics literature on intergenerational mobility in developed countries has focused on intergenerational income correlations, with an emphasis on the link between fathers and sons (see, for example, Solon 1992, 1999; Mazumder 2005; Corak and Heisz 1999; Bowles et al. 2005; Bowles 1972).
Research on intergenerational economic persistence in developing countries has been constrained primarily by two types of data limitations. First, the income data on parents and children are not available for more than a few years to allow reliable estimation of permanent income across generations. As shown by a substantial body of literature on developed countries, it is necessary to have good quality income data over a period of more than a decade to address the attenuation bias in the estimate of income persistence (Solon 1992; Mazumder 2005). The household surveys available in developing countries usually provide income information only for a single year, and estimating individual income may be a daunting task in rural areas where self-employment, work sharing, and informal activities predominate (Deaton 1997). The second challenge, which constitutes the focus of this paper, comes from the coresidency restriction; most of the surveys suffer from truncation due to coresidency used to define household membership.
The recent economics research on intergenerational mobility in developing countries includes Behrman et al. (2001), Hertz et al. (2007), Binder and Woodruff (2002), Thomas (1996), Lillard and Willis (1995), Lam and Schoeni (1993), Emran and Shilpi (2011, 2015), Bossuroy and Cogneau (2013), Maitra and Sharma (2010), Azam and Bhatt (2015), and Bhalotra and Rawlings (2011, 2013). Most of the studies on economic mobility in developing countries rely on education and occupation as markers of economic status because reliable data on income for long enough time periods to calculate permanent income are not available.11 Most of them also use data selected nonrandomly due to the residency requirement for household membership. Table 1 provides a summary of the data and measures used in 13 studies on intergenerational educational mobility in developing countries.
There is, however, no uniformity in the definitions of “household” across different surveys, although all are concerned with “living together,” “eating together,” and sometimes with “pooling of funds” (Deaton 1997). Examples of household surveys that usually have coresidency as a defining criterion include the Household Income and Expenditure Survey (HIES), Demographic and Health Survey (DHS), and Living Standard Measurement Survey (LSMS). There are some household surveys that include limited information on the parents of household head and spouse, but do not include the nonresident children of the household head. Hertz et al. (2007) use household surveys from 21 developing countries (ten Asian, four African, and seven Latin American) and eight formerly Communist countries where household surveys provide information on household head’s parents, but do not include the nonresident children.12 The Indonesian Family Life Survey (IFLS) collects information on up to two randomly selected nonresident children. The IFLS thus has full information for only those households that do not have more than two nonresident children.13 Some of the ongoing panel data surveys track the household members carefully over the years, thus taking care of attrition. But most of the panel surveys we are aware of still rely on some form of coresidency restrictions to define the baseline household membership and thus suffer from truncation bias. For example, the India Human Development Survey does not collect information on married daughters who left the natal family or the sons who started new households after marriage. It is possible that we are not aware of some ongoing panel surveys that include all of the children of household head at the baseline and thus would add substantial value in understanding intergenerational mobility.
Table 1. Intergenerational Educational Mobility in Developing Countries: Data and Measures
When nonresident children are excluded from the survey, the sample is truncated: information on both the dependent and explanatory variables for them is missing from the data set. This also implies that, in most of the cases, it is not possible to estimate a sample selection equation to correct for the biases because it is not possible to identify if a household is missing children from the survey. The maximum likelihood approach developed by Bloom and Killingsworth (1985) can be applied in this case if multivariate normality is a reasonable assumption.
Although nonrandom sample selection due to coresidency has been a major methodological issue in the research on intergenerational mobility, evidence on the magnitude of coresidency bias has been scarce, with the exception of the analysis of occupational mobility in the UK by Nicoletti and Francesconi (2006). In an interesting paper, they use the British Household Panel Survey to estimate the extent of coresidency bias in the estimates of intergenerational persistence in occupational prestige between father and son(s). They use the occupational prestige index of Goldthorpe and Hope (1974) and estimate intergenerational elasticity as a measure of persistence. The evidence reported in their paper shows that the coresidency bias is substantial, ranging between 20 and 40 percent.14 They, however, do not address the question of whether intergenerational correlation (IGC) and intergenerational regression coefficient (IGRC) are affected differently by the truncation due to coresidency, which is the focus of our analysis.
We are not aware of any analysis of coresidency bias in the context of educational mobility, either in developed or developing countries. Our analysis can also claim broader applicability because we use data from two developing countries with substantial differences in the coresidency rates and provide evidence on both father–son and mother–daughter links in educational persistence.
III. Coresidency Restriction and Truncation Bias in a Simple Model
Coresident sample bias common in the household surveys in developing countries is best modeled as a truncation, not censoring. The most common problem in the context of household surveys in developing countries is that there is no information (on both dependent and independent variables) for the nonresident children, resulting in truncation of the sample.
A. Bias in the IGRC Estimate
Consider the standard model of sample truncation widely discussed in the econometrics and statistics literature, adapted to our application (for the econometric literature, see Heckman 1976; Greene 2012; for a statistical treatment, see Cohen 1991). The truncation is from below and based on a level of schooling of the children T > 0; for example, a girl i with schooling level $S_i^c \le T$ leaves the household for marriage and thus is not included in the survey.15
A simple model of the marriage decision is as follows (assuming parents decide marriage for girls):

where vi
is payoff (indirect utility) from marrying off child
is the labor market earnings forgone as a girl leaves the natal family after marriage, and
is the schooling level of girl i. The marriage decision $M_i$ is a binary indicator that takes on the value of 1 when a girl is married and lives in a separate household.
Denote the set of individuals included in the survey by D. So child i is unmarried and thus coresident with the parents and is included in the survey; that is, $i \in D$, if the following holds:

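So we have the following model of the population relation and data generation:

$$S_i^c = \beta_0 + \beta S_i^p + \varepsilon_i, \qquad \text{observed only if } i \in D,$$

where $S_i^c$ is the schooling of child i and $S_i^p$ denotes years of schooling of parents. We assume that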
.
For simplicity, we ignore other control variables X, such as age of parents and child. A standard result in the literature is that OLS regression in the coresident sample suffers from omitted variables bias because the conditional expectation function is not linear (Heckman 1976):

$$E\left(S_i^c \mid S_i^p,\; i \in D\right) = \beta_0 + \beta S_i^p + \frac{\sigma_{v\varepsilon}}{\sigma_v}\,\lambda_i,$$

where σvε is the covariance between vi and εi, and σv is the standard deviation of vi. The error term in the OLS regression is therefore not εi alone but a composite that also contains the omitted term $(\sigma_{v\varepsilon}/\sigma_v)\lambda_i$, which is correlated with parental schooling $S_i^p$, causing omitted variables bias. The omitted variable $\lambda_i$ is called the inverse Mills ratio and is given as follows:

As discussed by Greene (2012), although the bias depends on the correlations in the data, a robust empirical regularity widely observed in the literature is that the OLS estimate is biased downward, toward zero (see also Hausman and Wise 1977; Cohen 1991). Hausman and Wise (1977) discuss a rationale for the downward bias by showing that the OLS estimate is necessarily smaller than the maximum likelihood estimate (see the appendix to Hausman and Wise 1977).
Denoting the OLS estimate in the coresident sample by $\hat{\beta}$, the attenuation bias due to truncation in the OLS estimate can be approximated by the following relationship:16

$$\hat{\beta} \approx \delta\,\beta < \beta, \tag{4}$$

where $0 < \delta < 1$ is an attenuation factor that depends on α, and α is the mean of αi. Our estimates of IGRC ($\hat{\beta}$) for Bangladesh and India presented below in Section V show that the bias implied by Inequality 4 above can be serious.
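As a point of reference, the textbook moments of a normal variable truncated from below (see Greene 2012) can be computed directly. The sketch below is illustrative only: it assumes a standard normal latent variable with standardized truncation point `alpha`, and the function names are ours. Reading the retained variance share as the attenuation factor δ used in this section is an assumption, not a result taken from the paper.

```python
import math

def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(alpha):
    """Inverse Mills ratio for truncation from below at standardized point alpha."""
    return normal_pdf(alpha) / (1.0 - normal_cdf(alpha))

def retained_variance_share(alpha):
    """Var[y | y > a] / Var[y] = 1 - lambda(alpha) * (lambda(alpha) - alpha)."""
    lam = inverse_mills(alpha)
    return 1.0 - lam * (lam - alpha)

for alpha in (-2.0, -1.0, 0.0):
    kept = 1.0 - normal_cdf(alpha)  # share of observations above the cutoff
    print(f"alpha={alpha:+.1f}  kept={kept:.3f}  "
          f"mills={inverse_mills(alpha):.3f}  "
          f"variance share={retained_variance_share(alpha):.3f}")
```

The loop shows that as the cutoff moves up and fewer observations are retained, the inverse Mills ratio rises and the share of variance that survives truncation falls, which is the mechanism behind the attenuation discussed above.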
B. Bias in the IGC Estimate
The IGC can be estimated from a regression where the variables are normalized so that their mean is zero and variance is one. Denote the IGC (correlation coefficient) between father’s schooling and children’s schooling by ρ. Then we have the following regression model for estimation of IGC:

$$\tilde{S}_i^c = \rho\,\tilde{S}_i^p + \eta_i, \tag{5}$$

where

$$\tilde{S}_i^c = \frac{S_i^c - \bar{S}^c}{\sigma_c}, \qquad \tilde{S}_i^p = \frac{S_i^p - \bar{S}^p}{\sigma_p}.$$

As noted earlier, a bar on a variable denotes the sample mean, and σc and σp are the standard deviations of children’s and parental schooling, respectively, and σηυ is the covariance between the error terms in the children’s schooling and marriage selection equation with the schooling variables standardized. The truncation point in the standardized model is $(T - \bar{S}^c)/\sigma_c$.
To see that the truncation bias is lower in OLS estimate of Equation 5, note that similar to Equation 4 above, we have the following approximate relation for Model 5:

where

It is easy to check that
, if
. By using the relation that
, we can rewrite
as follows:

Now
follows from the observation that
in a truncated sample because
.
C. Intuition and Discussion
The preceding section provides a theoretical basis for understanding possible differential effects of truncation on the two widely used measures of intergenerational mobility: IGRC and IGC. Here we discuss an alternative way to think about the coresidency bias in the IGC and IGRC estimates, which may provide additional intuition.
We focus on the following relationship between IGRC and IGC widely known in the literature (see, for example, Solon 1999):

$$\rho = \beta\,\frac{\sigma_p}{\sigma_c} \tag{8}$$
A simple way to understand the evidence presented later in this paper is that truncation biases the estimate of β downward, but it also results in upward bias in the estimate of the ratio of standard deviations in schooling, σp/σc. As a result, the net bias in the IGC (ρ) estimate is smaller than the bias in the IGRC (β) estimate. Estimates of the ratio of the standard deviations in our data sets confirm that the magnitude is larger in the truncated samples (see Appendix Table A1).
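The offsetting movement in the slope and in the ratio of standard deviations can be illustrated with a small simulation. The sketch below is stylized and is not based on the survey data: it assumes normally distributed schooling, a hypothetical true slope of 0.6, and truncation from below at an arbitrary threshold on children's schooling; all names and parameter values are ours. Unlike the surveys discussed in this paper, parents here are observed only through their retained children, so the parental standard deviation also changes in the truncated sample.

```python
import math
import random

def moments(x, y):
    """Return (sd_x, sd_y, cov_xy) using population (divide-by-n) formulas."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return math.sqrt(var_x), math.sqrt(var_y), cov

def igrc_igc(parent, child):
    """Return (IGRC, IGC, sd_parent / sd_child) for paired schooling lists."""
    sd_p, sd_c, cov = moments(parent, child)
    beta = cov / sd_p ** 2          # OLS slope of child on parent schooling
    rho = beta * sd_p / sd_c        # correlation via rho = beta * sd_p / sd_c
    return beta, rho, sd_p / sd_c

def simulate(n=20000, beta=0.6, threshold=3.0, seed=0):
    """Simulate a full sample and a sample truncated from below on child schooling."""
    rng = random.Random(seed)
    parent = [rng.gauss(5.0, 3.0) for _ in range(n)]
    child = [2.0 + beta * p + rng.gauss(0.0, 3.0) for p in parent]
    kept = [(p, c) for p, c in zip(parent, child) if c > threshold]
    full = igrc_igc(parent, child)
    truncated = igrc_igc([p for p, _ in kept], [c for _, c in kept])
    return full, truncated

full, truncated = simulate()
bias_igrc = 1.0 - truncated[0] / full[0]   # relative downward bias in the slope
bias_igc = 1.0 - truncated[1] / full[1]    # relative downward bias in the correlation
print(f"IGRC full={full[0]:.3f} truncated={truncated[0]:.3f} bias={bias_igrc:.1%}")
print(f"IGC  full={full[1]:.3f} truncated={truncated[1]:.3f} bias={bias_igc:.1%}")
print(f"sd ratio full={full[2]:.3f} truncated={truncated[2]:.3f}")
```

In this stylized setting both estimates fall in the truncated sample, but the ratio of standard deviations rises, so the proportional bias in the correlation is smaller than the proportional bias in the slope.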
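A standard result from the literature is that truncation reduces the variance of a variable. Since truncation is based on children’s schooling, it affects the variance of children’s schooling directly:

$$\hat{\sigma}_c^2 \approx \delta\,\sigma_c^2 < \sigma_c^2, \tag{9}$$

where a hat denotes the estimate from the coresident (truncated) sample.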
Note that the commonly available household surveys in developing countries include a random sample of parents (household head and spouse), and thus the estimate of the standard deviation of parental schooling is likely to be unbiased. We can put together the relations in Inequality 4 and Equations 8 and 9 to derive the following approximate relation:

$$\hat{\rho} \approx \sqrt{\delta}\,\rho \tag{10}$$
Now observe that $\sqrt{\delta} > \delta$ because $0 < \delta < 1$, and as a result, the bias represented by the right hand side of Approximation 10 is much smaller than the bias in Approximation 4. To give a sense of the magnitudes, δ = 0.9 implies a value of $\sqrt{\delta} \approx 0.95$, and δ = 0.8 implies $\sqrt{\delta} \approx 0.89$. Thus, the IGC estimates from the coresident sample suffer from much less bias when compared to the most widely used measure of intergenerational persistence: IGRC. If the bias in IGRC is 10 percent, the corresponding bias in IGC is one-half of that (5 percent), and when the IGRC estimate is biased downward by 20 percent, the corresponding bias in IGC is about 10 percent. An important implication of the above results is that, at high levels of coresidency rates (δ closer to 1), the difference between the biases in IGRC and IGC will be smaller. The actual biases estimated in the data will, however, also reflect sampling variability. From Relations 4 and 10 above, we get the following approximate results on the slope of the bias in IGRC and IGC estimate with respect to the coresidency rate:


The results in Equations 11 and 12 above are important for comparisons of intergenerational mobility across countries and over time (for a given country) because they suggest that the bias in IGC estimates responds less to variations in the coresidency rate than the bias in IGRC estimates, assuming that coresidency rates are not too low (δ > 0.25).17 Thus, when coresidency rates vary across countries or over time, the IGC estimates are likely to provide us with a more accurate ranking of countries and evolution of economic persistence over time.
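The magnitudes quoted in this subsection can be checked with a few lines of arithmetic. The sketch below assumes, consistent with the numerical examples in the text, that the coresident estimates satisfy approximately IGRC ≈ δβ and IGC ≈ √δ·ρ, so that the proportional biases are 1 − δ and 1 − √δ. The function names are ours, and expressing the slopes in proportional terms is an assumption made for illustration.

```python
import math

def relative_bias_igrc(delta):
    """Proportional downward bias in IGRC when the estimate is about delta * beta."""
    return 1.0 - delta

def relative_bias_igc(delta):
    """Proportional downward bias in IGC when the estimate is about sqrt(delta) * rho."""
    return 1.0 - math.sqrt(delta)

def slope_igrc(delta):
    """Derivative of the proportional IGRC bias with respect to delta."""
    return -1.0

def slope_igc(delta):
    """Derivative of the proportional IGC bias with respect to delta."""
    return -1.0 / (2.0 * math.sqrt(delta))

for delta in (0.9, 0.8, 0.5, 0.25):
    print(f"delta={delta:.2f}  "
          f"bias IGRC={relative_bias_igrc(delta):.3f}  "
          f"bias IGC={relative_bias_igc(delta):.3f}  "
          f"|slope IGRC|={abs(slope_igrc(delta)):.3f}  "
          f"|slope IGC|={abs(slope_igc(delta)):.3f}")
```

Under these assumptions the IGC bias is roughly half the IGRC bias for δ near 1, and the IGC bias is less sensitive to δ than the IGRC bias whenever δ exceeds 0.25, the point at which the two slopes are equal in absolute value.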
IV. Data and Variables
We use two rich data sets particularly suited for the analysis of the extent of coresident sample bias. The source of data on India is the 1999 Rural Economic and Demographic Survey done by the National Council for Applied Economic Research, and the data on Bangladesh come from the 1996 Matlab Health and Socioeconomic Survey (MHSS). The Bangladesh survey collected information on all children of the household head and spouse (including those from past marriages), irrespective of their residency status, from 4,538 households in Matlab thana of Chandpur district.18 The India survey also collected information on all of household head’s children from the current marriage but not noncoresident mothers of children from earlier marriage(s). We utilize this information to create data sets containing education and other personal characteristics of parents and children. Both of these surveys focus on rural areas in their respective countries. An advantage of rural samples is that the bias from censoring due to possible noncompletion of schooling by younger children may not be as important because only a few go on to have more than middle school (or high school) education. The children who go for more than high school education (10 years of schooling in Bangladesh and India) are also the children who leave the village household because the “colleges” (for Grades 11 and 12) and universities (for three to four years of undergraduate and graduate study) are not located in villages.19
Figure 1. Child’s Education and His/Her Probability of Nonresidency in Bangladesh and India
Our estimation sample consists of household head and spouse and their children, including those from other marriages in the case of Bangladesh. For the empirical analysis, we use alternative samples defined by different age ranges for the children. Our main results are based on a sample of children aged 13–60 years. To test the sensitivity of our conclusions with respect to the specific age cutoffs, we estimate the IGRC and IGC for a number of alternative age ranges: 16–60, 20–69, and 13–50 years.
Appendix Table A1 reports the summary statistics of the relevant variables for both the Bangladesh and India data sets for our main estimation sample (children in the age range 13–60 years). Several interesting observations and patterns are noticeable in our data sets. The average schooling attainment remains low in rural areas of both Bangladesh and India at the time of the survey years. The mean and median years of schooling are 4.97 and 5.00, respectively, for Bangladesh, and 6.23 and 7 for India. The relatively lower educational attainments in Bangladesh compared with India were present during the parents’ generation as well: median years of father’s education was two years in Bangladesh compared with 2.50 years in India. The average number of children per household in Bangladesh is about 5.74 compared with 3.53 in India. This difference probably reflects the fact that Bangladesh data include information on children from other marriages while India data do not. There are some differences in the age distribution of children also: median age for Bangladesh data is 30 years compared with 33 years for India. The gender gap in education between boys and girls is about one year in Bangladesh in contrast with two years in India.
Appendix Table A1 also reports the ratio of standard deviation of parent’s education to that of children’s education for both all and coresident children in Columns 3 and 7. The ratio is unambiguously smaller in the full sample (including both coresident and nonresident children) compared with that in the coresident sample. This is consistent with the observation noted earlier in the introduction that a higher estimate of this ratio in a coresident sample is likely to partially offset the biases in IGC estimates.
Figure 1 plots the probability of nonresidency at the time of the survey against the schooling of children. The graphs in both Bangladesh (Panel A) and India (Panel B) show that probability of nonresidence is higher in the tails. Also, the probability of nonresidence is higher for girls at any given level of schooling, although the gender gap closes substantially at the right tail in the case of Bangladesh.
V. Empirical Results
We begin the discussion with a graphical presentation of the data, following the classic analysis of truncation in Hausman and Wise (1977). Figures 2 and 3 report the bivariate linear plots of children’s schooling against parents’ schooling for both the full and the coresident samples for Bangladesh and India, respectively.20 The coresidency rate is much higher in the India data than in the Bangladesh data; thus, the resulting truncation bias is likely to be relatively lower in India. For example, in the father–son sample the coresidency rate is 79 percent in India, while the corresponding rate is only 52 percent in Bangladesh. In the mother–daughter samples, the coresidency rates are lower: 39 percent in India and 26 percent in Bangladesh, reflecting that women leave the natal family following marriage in both countries.
Figure 2. Fitted Lines between Parent’s and Children’s Education in Bangladesh (Coresident and Full Samples)
For each country we present three graphs: (i) son–father, (ii) daughter–mother, and (iii) all children–father. The figures show that the slope of the fitted line is smaller in the coresident sample, which is consistent with Hausman and Wise (1977). The widely held belief that the coresidency bias in the estimates of IGRC is substantial is thus clearly visible in the graphs.
In the graphs for the “all children” sample (both sons and daughters), the coresident line intersects the full sample line from above (see Figure 2, Panel A for Bangladesh and Figure 3, Panel A for India). This implies that the surveys miss less educated children from households with low parental education, but miss better educated children from households with high parental education. We thus have both truncation from above and from below.
Figure 3. Fitted Lines between Parent’s and Children’s Education in India (Coresident and Full Samples)
A closer look at the other graphs reveals some interesting differences across gender and countries. In Bangladesh, the fitted lines in the father–son sample (see Figure 2B) intersect each other at a very low level of father’s education, implying that most of the coresident line lies below the full sample line. This implies that, for most of the distribution, the better educated sons leave the parental household. For the mother–daughter sample in Bangladesh (Figure 2C) the pattern of truncation is different; the line for the coresident sample intersects the full sample line from above at about five years of mother’s schooling, which is very high given that the average education for mothers is only 1.47 years. This implies that the coresident line lies above the full sample line for most of the cases; the girls with relatively lower education leave the parental household (presumably following marriage, they relocate to husband’s house). Also, the gap between the coresident and full sample lines becomes smaller as the parental education increases, which suggests that the probability of a less educated girl leaving her parental household becomes smaller when parent’s education is higher. This can be interpreted as suggestive evidence that better educated parents are less likely to marry off their daughters without completing high school (10th grade in both Bangladesh and India).
The figures for India (Figures 3A, 3B, and 3C) are broadly similar, although the effect of truncation on the slope is smaller compared to the case of Bangladesh, especially in the father–son sample, which reflects that the coresidency rate is very high for sons in India. However, the graphs again tell a consistent story—in all three groups, the coresident fitted line has lower slope than that in the fitted line in the full sample. The intersection points of the coresident and full sample lines are, however, more to the right, implying that for the lower educated parents, it is the lower educated children that leave the household, and for the higher educated, it is the opposite.
While the graphical exploration provides suggestive evidence, to measure the extent of bias in IGRC and IGC, we now turn to the estimates for both Bangladesh and India. We first discuss the results for the “all children” sample (that is, including both sons and daughters). These provide average estimates across gender and are useful as summary measures. We then provide estimates for the father–son and mother–daughter intergenerational persistence, which have been the focus of most of the economics literature.
The regression specification used for estimating the IGRC and IGC is motivated by Solon (1992) and includes age and age squared of both the child and the father.21 As robustness checks, we also estimate a number of alternative specifications, starting with a simple bivariate model where no controls are used. In addition to the quadratic age formulation standard in the literature, we use a completely flexible specification of the effects of age by including dummies for different years of age. The estimates are very robust; the numerical magnitudes of IGRC and IGC estimates vary little, if at all, across different specifications. The estimates from these alternative specifications are not reported for the sake of brevity.
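Following the literature, we estimate the following regressions by OLS (denote IGRC by β and IGC by ρ):

$$S_i^c = \alpha + \beta S_i^p + X_i'\gamma + \varepsilon_i$$

$$\tilde{S}_i^c = \delta + \rho \tilde{S}_i^p + X_i'\pi + \upsilon_i$$

In the IGRC regression, $S_i^c$ and $S_i^p$ are years of schooling of children and parents, respectively, and $X$ is a set of control variables. The schooling variables in the IGC regression ($\tilde{S}_i^c$ and $\tilde{S}_i^p$) are normalized to have zero mean and unit variance.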
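As an illustration of how the two measures relate, the following minimal sketch estimates β and ρ in the simple bivariate case (no controls) and checks the identity IGRC = IGC(σc/σp) noted in footnote 25. The simulated data, parameter values, and helper names (`ols_slope`, `standardize`) are assumptions made for this example only; they are not the paper's data or code, and schooling is treated as a continuous stand-in variable.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardize(values):
    """Rescale to zero mean and unit variance; also return the std. deviation."""
    m = mean(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values], sd


# Hypothetical data (assumption): continuous stand-ins for years of schooling.
random.seed(0)
parent = [random.gauss(3.0, 2.0) for _ in range(5000)]
child = [2.0 + 0.5 * p + random.gauss(0.0, 3.0) for p in parent]

igrc = ols_slope(parent, child)          # beta: slope in years of schooling
parent_z, sd_p = standardize(parent)
child_z, sd_c = standardize(child)
igc = ols_slope(parent_z, child_z)       # rho: slope on standardized schooling

print(f"IGRC = {igrc:.3f}, IGC = {igc:.3f}")
print(f"IGC * (sd_c / sd_p) = {igc * sd_c / sd_p:.3f}")
```

In the bivariate case the last line reproduces the IGRC exactly, which is why the two measures can diverge only through the ratio of the standard deviations of schooling across generations.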
To help keep track of the discussion across different samples, we note here again the terminology used. The “all children” sample includes both sons and daughters. A “full sample” includes both coresident and nonresident members, and a “coresident sample” includes only the members who are coresident in the household at the time of the survey. The standard errors reported in this paper are heteroskedasticity robust and clustered at the household level.
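For readers unfamiliar with clustering at the household level, the sketch below computes a cluster-robust standard error for the slope of a bivariate regression. It assumes the basic sandwich form without a finite-sample correction (often labeled CR0); the text does not state which correction is used, and the function name `clustered_slope_se` and the example data are hypothetical.

```python
import math
from collections import defaultdict


def clustered_slope_se(x, y, cluster_ids):
    """Slope and cluster-robust (CR0, assumed) SE for a bivariate OLS of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (b - my) for d, b in zip(dx, y)) / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]

    # Sum the score contributions within each cluster before squaring,
    # so that errors may be correlated among children of one household.
    scores = defaultdict(float)
    for d, e, g in zip(dx, resid, cluster_ids):
        scores[g] += d * e
    meat = sum(s * s for s in scores.values())
    return slope, math.sqrt(meat) / sxx


# Hypothetical example: six children in three households.
parent_school = [0.0, 0.0, 5.0, 5.0, 10.0, 10.0]
child_school = [2.0, 4.0, 6.0, 5.0, 9.0, 12.0]
household = ["h1", "h1", "h2", "h2", "h3", "h3"]
slope, se = clustered_slope_se(parent_school, child_school, household)
print(f"slope = {slope:.3f}, clustered SE = {se:.3f}")
```

When every observation is its own cluster, the same formula reduces to the usual heteroskedasticity-robust standard error.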
A. Estimates for All Children (Sons and Daughters)
1. Evidence from Bangladesh
Table 2 reports the estimates of IGRC and IGC for all children in the Bangladesh data, that is, sons and daughters combined. The first two columns in Table 2 report the estimates of IGRC for the full and coresident samples (top panel) and the implied bias (bottom panel). We use three different measures of parental education: father’s schooling, mother’s schooling, and the average of father’s and mother’s schooling. Note that some researchers also use maximum schooling (of mothers and fathers) as a measure of parental education. In our data sets, the father has higher schooling in most of the cases, and the correlation between the maximum parental schooling and father’s schooling is high enough to yield virtually identical estimates of IGRC and IGC. In addition to quadratic age controls, we also include a dummy for gender of the child in the regression specification for all children.22 This implies that any common factors (such as cultural norms) that might affect the average schooling attainment of girls irrespective of parental socioeconomic status are absorbed as a shift in the intercept.
The estimates in the top panel of Table 2 provide strong evidence that truncation bias in the IGRC estimates is substantial for all three definitions of parental education. Consistent with the expectation based on the graphs discussed above, the IGRC estimate in the coresident sample is significantly biased downward. The null hypothesis that the estimate from the coresident sample is equal to the estimate from the full sample is rejected unambiguously with P-values equal to 0.00 in all of the different cases.23 The pattern is remarkably consistent and justifies the widespread opinion that there are good reasons to expect the IGRC estimates to be biased downward due to nonrandom truncation because of the coresidency requirement used in the household surveys.
To get a better sense of the implied magnitudes, we report bias defined as follows (using IGRC as an example),

$$\text{Bias} = \frac{IGRC_F - IGRC_{CR}}{IGRC_{CR}} \times 100$$

where $IGRC_{CR}$ denotes the estimate from a coresident sample, while $IGRC_F$ is the estimate from the corresponding full sample including nonresident household members.24 An important advantage of the above measure is that it is free of units of measurement. Since IGC and IGRC are measured in different units, the absolute bias ($IGRC_F - IGRC_{CR}$) as a measure may be misleading.25
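The arithmetic of this bias measure, with the coresident estimate as the base (see footnote 24), can be sketched as follows. The function name `relative_bias` and the numerical inputs are hypothetical and serve only to illustrate the calculation.

```python
def relative_bias(full, coresident):
    """Percentage bias with the coresident estimate as the base."""
    return 100.0 * (full - coresident) / coresident


# Hypothetical estimates, chosen only to illustrate the arithmetic.
igrc_full, igrc_cr = 0.60, 0.50
bias = relative_bias(igrc_full, igrc_cr)
print(f"relative bias = {bias:.1f} percent")

# Rescaling both estimates by the same factor leaves the measure unchanged,
# which is the sense in which it is free of units of measurement.
rescaled = relative_bias(10 * igrc_full, 10 * igrc_cr)
print(f"after rescaling = {rescaled:.1f} percent")
```

Because the measure is a ratio, it can be compared directly across IGRC (in years of schooling) and IGC (in standard deviation units).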
Table 2. Truncation Bias in Measures of Intergenerational Persistence Due to Coresidency: Bangladesh (All Children Sample: Household Head’s Sons and Daughters)
The first column in the bottom panel of Table 2 reports the bias in the IGRC estimates from the coresident sample. The evidence is clear: the estimate from the coresident sample is biased downward, and the magnitude of bias is substantial across all three indicators of parental education. The bias is the highest when mother’s schooling is the indicator of parental education (34 percent), and the lowest in the case of average parental schooling (24 percent), with an average bias of 29.7 percent.26 A 30 percent bias on average vindicates the unease among scholars that the available household surveys in developing countries may not be particularly helpful in understanding the magnitude of intergenerational persistence in economic status.
We now turn to the IGC estimates for Bangladesh reported in Columns 4 (full sample) and 5 (coresident sample) of Table 2. The estimated IGCs for three different indicators of parental education are reported in the top panel, and the implied biases are reported at the bottom. The evidence is strikingly different; the estimate of IGC from the coresident sample is much closer to that from the full sample, and this is true for all three different indicators of parental education (top panel). The average bias in the IGC estimates is 8.7 percent, which is less than one-third of the average bias in the IGRC estimates (29.7 percent). The highest magnitude of bias is 11 percent in the case of IGC, which is less than half of the lowest bias found in the IGRC estimates (24 percent).
2. Evidence from India
Table 3 reports estimates of IGRC and IGC from the India data for three different indicators of parental education (father’s schooling, mother’s schooling, and average schooling of mother and father). The differences between the IGRC estimates from the coresident and full samples in the case of India are smaller in magnitude compared to the estimates from Bangladesh (compare top panel of Table 2 to that of Table 3). The average bias is about 17.6 percent. While the extent of bias is not as dramatic as in the Bangladesh data, the evidence still indicates that truncation due to coresidency causes substantial downward bias in the IGRC estimates. The relatively lower truncation bias in the India estimates reflects the fact that the proportion of coresident children is higher in India compared to Bangladesh (61 percent in India and about 40 percent in Bangladesh).
The IGC estimates in Columns 4 and 5 in Table 3 show that the truncation bias in the IGC estimates is considerably smaller than that in the IGRC estimates. The average bias in IGC for India is 10.4 percent, which is much smaller than the 17.6 percent average bias found in the IGRC estimates.
The evidence in Tables 2 and 3 thus suggests that (i) truncation due to the coresidency restriction in a survey causes large downward bias in the estimates of IGRC, and (ii) the corresponding bias in the IGC estimates is substantially lower. The widespread caution about coresidency bias seems right on target for IGRC estimates, while the IGC estimates from coresident samples are much closer to the estimates from the full samples.
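A stylized simulation can help convey why the slope is hit harder than the correlation when children leave the household selectively. The sketch below is only an illustration under strong assumptions: schooling is drawn from a bivariate normal distribution, and the probability that a child leaves rises with the child's own schooling through a logistic rule. The data-generating process, the selection rule, and all parameter values are hypothetical and are not taken from the paper's model or data.

```python
import math
import random


def slope_and_corr(x, y):
    """Return the OLS slope of y on x and the correlation of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def relative_bias(full, coresident):
    """Percentage bias with the coresident estimate as the base."""
    return 100.0 * (full - coresident) / coresident


random.seed(1)
n = 20000
# Assumed data-generating process (illustrative values only).
parent = [random.gauss(0.0, 1.0) for _ in range(n)]
child = [0.5 * p + random.gauss(0.0, math.sqrt(0.75)) for p in parent]

# Assumed selection rule: better educated children are more likely to leave.
stays = [random.random() > 1.0 / (1.0 + math.exp(-2.0 * c)) for c in child]
parent_cr = [p for p, s in zip(parent, stays) if s]
child_cr = [c for c, s in zip(child, stays) if s]

igrc_f, igc_f = slope_and_corr(parent, child)
igrc_cr, igc_cr = slope_and_corr(parent_cr, child_cr)
coresidency_rate = len(child_cr) / n

print(f"coresidency rate: {coresidency_rate:.2f}")
print(f"IGRC bias: {relative_bias(igrc_f, igrc_cr):.1f} percent")
print(f"IGC bias:  {relative_bias(igc_f, igc_cr):.1f} percent")
```

In this setup selection compresses the variance of children's schooling in the coresident sample, which lowers the slope directly, while the correlation is partly protected because it rescales by the (also compressed) standard deviations. The simulated magnitudes depend entirely on the assumed parameters and should not be read as estimates for either country.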
3. India–Bangladesh Comparison
If a researcher relies on IGRC estimates from coresident samples to understand differences between Bangladesh and India in intergenerational persistence in schooling, she is likely to reach an incorrect conclusion. For example, with father’s schooling as the measure of parental education, the IGRC estimates for India and Bangladesh are very close to each other (0.42 in Bangladesh and 0.43 in India, implying a 2 percent higher estimate in India), which suggests that educational mobility is similar in the two neighboring countries. However, the results from the full sample show a different picture: a 12 percent higher estimate of IGRC in Bangladesh. In contrast, the IGC estimates from coresident samples show a 12 percent larger estimate for Bangladesh, much closer to the corresponding estimate from the full sample: a 16 percent larger estimate for Bangladesh. The IGC estimates from coresident samples thus lead to the correct ranking that educational mobility is lower in Bangladesh and also provide a reliable measure of the gap between the two countries. The other estimates in Tables 2 and 3 also show that the IGRC estimates from coresident samples underestimate the gap between Bangladesh and India, while the IGC estimates yield a much more consistent and reliable picture. A broader implication of the above examples is that cross-country comparisons of intergenerational mobility based on IGRC, by far the most commonly used measure, are likely to yield incorrect conclusions, while IGC-based comparisons seem reliable.
Table 3. Truncation Bias in Measures of Intergenerational Persistence Due to Coresidency: India (All Children Sample: Household Head’s Sons and Daughters)
B. Estimates of Father–Son and Mother–Daughter Schooling Persistence
In this subsection, we discuss the biases in the IGRC and IGC estimates for the intergenerational link between the father and sons and the mother and daughters. While father–son intergenerational persistence in economic status has been the most widely researched topic both in developed and developing countries, it is probably equally (if not more) important from a policy perspective to understand the barriers faced by girls in education. The results on father–son linkage are reported in the upper panel of Table 4, and the bottom panel contains the corresponding estimates for mother–daughter persistence in schooling. We report the estimates of bias and test the null hypothesis of zero bias (that is, that the estimates from the coresident and the full samples are equal). For the sake of brevity, we omit the underlying estimates of IGRC and IGC. The estimates for Bangladesh are in the first two columns, and the last two columns refer to the corresponding results for India.
1. Bangladesh
The estimates of the father–son intergenerational link in schooling for Bangladesh show that the IGRC estimate in the coresident sample suffers from strong downward bias; the bias is 29.5 percent (Row 1, Column 1 in the top panel of Table 4). The bias in the father–son IGRC estimate is thus similar to the average bias for the all children sample discussed above: 29.7 percent. The corresponding bias in the estimated IGC is much smaller: only 8.9 percent (Row 2, Column 1).
The results for mother–daughter in Bangladesh are reported in Columns 1 and 2 of the lower panel of Table 4. The bias in the IGRC estimate from the coresident sample is much stronger at 45.6 percent, a very high magnitude indeed. This illustrates starkly that relying on the coresident sample can lead to a grossly misleading picture of intergenerational persistence between mother and daughter(s). This high bias reflects the fact that the degree of truncation is very high in the daughters’ case; only 26 percent of the full sample satisfies the coresidency restriction in the Bangladesh data (for sons it is 52 percent of the full sample). The bias in the IGC estimate from the coresident sample is again much smaller in magnitude: 10.6 percent.
2. India
The estimates of father–son schooling persistence for India are reported in Columns 3 and 4 of the top panel of Table 4. The IGRC estimate for India shows that the downward bias due to coresidency is substantial; the estimate from the full sample is 29.5 percent higher than the estimate in the coresident sample. The bias in the father–son sample in India is thus considerably larger than the average bias we found earlier for the sample of all children across different measures of parental education (17.6 percent). In sharp contrast, the IGC estimate suffers from very little coresidency bias: 2.4 percent only. The estimated bias in the IGC estimate for father–son in India is thus negligible, while the IGRC estimate suffers from strong downward bias from coresident sample selection. The bias estimates for mother–daughter schooling persistence in India are reported in Columns 3 and 4 of the lower panel of Table 4. The bias in the IGRC estimate for mother–daughter is smaller for India when compared to Bangladesh, but the magnitude of bias is still substantial: 21.8 percent. The corresponding bias in the IGC estimate is 9.7 percent, which is less than half of the bias in the IGRC estimate.
Table 4. Truncation Bias in Intergenerational Persistence between Father–Sons and Mother–Daughters (Bangladesh and India)
C. Gender Differences in Intergenerational Schooling Persistence
An important policy issue in many developing countries is whether the girls face especially strong barriers to educational mobility. If we rely on the IGRC estimates from coresident samples, the gender gap may seem smaller than it really is because the truncation bias is usually stronger for the estimates for girls, as coresidency rates are lower (this is true in both Bangladesh and India data). But the IGC estimates from coresident samples provide a reliable measure of the gender gap. For example, consider the estimates of father–son and mother–daughter persistence in Bangladesh in Table 4 and the corresponding estimates for different age ranges presented in Online Appendix Table T.2.27
Averaging over estimates for four age ranges in Table 4 and Online Appendix Table T.2, the IGRC estimates from coresident samples suggest that persistence is about 33 percent higher for daughters, while the correct estimate from full samples is 45 percent. In contrast, the gender gap estimates based on IGC are similar across full (4.5 percent higher for daughters) and coresident (4.8 percent higher for daughters) samples. Thus, consistent with the cross-country comparisons discussed above, when working with coresident samples, it is preferable to use IGC as a measure of intergenerational persistence to understand the gender gap in educational mobility.
D. Robustness Checks: Evidence from Alternative Age Ranges
Our main results in Tables 2–4 use samples where the age range is 13–60 years. This is motivated by the fact that average schooling attainment in rural Bangladesh and India remains low in the survey years, so that a lower threshold of 13 years may not be binding for most of the rural children. In the Bangladesh data, average schooling is only 4.43 years; for sons it is 5.5 years and for daughters 3.4 years. The average schooling in India is five years; for sons it is seven years, and for daughters it is 3.7 years. To explore the sensitivity of the conclusions with respect to the age range of children, we estimate the IGRC and IGC across a number of different age ranges. For the sake of brevity, we report estimates from the following age ranges: (i) 13–50 years, (ii) 16–60, and (iii) 20–69 years. The evidence from these different age ranges is consistent with the estimates from our main sample of 13–60 years. Considering estimates across different age ranges and gender, the average biases in IGRC and IGC are 24.4 percent and 6.5 percent, respectively, in Bangladesh, and the corresponding estimates in India are 14.12 percent (IGRC) and 7.6 percent (IGC). Please see the related discussion and Tables T.1 and T.2 in the Online Appendix for details.
E. Coresidency Rates and the Extent of Bias
An interesting aspect of the results presented above is that there is significant variation in the coresidency rates across the Bangladesh and India data, and the bias estimates reflect the differences in the severity of truncation. Since we estimated the biases in IGRC and IGC for a number of different samples, one might wonder how the magnitude of the bias relates to the coresidency rate across different samples. Figure 4 shows the relation between the coresidency rate and the estimated bias for both IGRC and IGC estimates. There is a clear negative relation between the coresidency rate and the magnitude of bias in the case of IGRC, implying that comparing IGRC estimates from different data sets may not be appropriate. In contrast, there is no discernible relation between the bias in IGC estimates and the coresidency rate. An OLS regression of the bias in IGRC estimates on coresidency rates yields a coefficient of -0.22, which is significant at the 1 percent level (t statistic equals -2.50).28 The coefficient on a regression of the bias in IGC estimates on the coresidency rates is, in contrast, numerically very small (0.008) and statistically insignificant (t = 0.30 and P-value = 0.77). The evidence is consistent with the theoretical results discussed earlier in Section III.
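The mechanics of such a regression can be sketched with the handful of coresidency rates and bias estimates quoted in the text above. This is a minimal illustration only: it uses four pairs rather than the full set of samples underlying Figure 4, so it is not expected to reproduce the reported coefficients, and the helper `ols_slope_t` is a hypothetical name that computes a conventional (not clustered) t-statistic.

```python
import math


def ols_slope_t(x, y):
    """Slope and conventional t-statistic from a bivariate OLS of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, slope / se


# Pairs quoted in the text (all in percent): Bangladesh mother-daughter,
# Bangladesh all children, Bangladesh father-son, India all children.
# The "all children" biases are averages over three parental-education
# measures, and the 40 percent rate is stated only approximately.
coresidency = [26.0, 40.0, 52.0, 61.0]
bias_igrc = [45.6, 29.7, 29.5, 17.6]
bias_igc = [10.6, 8.7, 8.9, 10.4]

slope_igrc, t_igrc = ols_slope_t(coresidency, bias_igrc)
slope_igc, t_igc = ols_slope_t(coresidency, bias_igc)
print(f"IGRC bias on coresidency rate: slope = {slope_igrc:.2f}, t = {t_igrc:.2f}")
print(f"IGC bias on coresidency rate:  slope = {slope_igc:.2f}, t = {t_igc:.2f}")
```

With only four observations the t-statistics carry little information; the point of the sketch is the calculation itself, which a researcher could repeat with whatever set of bias estimates and coresidency rates is available.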
The coresidency rate is likely to vary substantially across countries, depending on a variety of economic and cultural factors, such as labor market opportunities for children, costs of housing, and the availability of public welfare schemes for aging poor parents, among other things. An immediate and important implication of this observation is that one should be cautious about using IGRC estimates for cross-country comparisons of economic mobility; the focus instead should be on the estimates of IGC.
Figure 4. Coresidency Rate and Biases in Estimates of IGRC and IGC
VI. Conclusions
We take advantage of two rich data sets from Bangladesh and India to explore the direction and magnitude of the truncation bias due to coresidency in the two most widely used measures of intergenerational persistence: intergenerational regression coefficient (IGRC) and intergenerational correlation (IGC). The evidence reported in this paper shows that the worry about coresidency bias is well justified when the focus is on estimating IGRC, by far the most popular measure among economists.29 The IGRC estimates, in general, suffer from substantial downward bias in coresident samples, vindicating the skepticism among researchers and journal editors about the usefulness of data with coresidency restriction. The bias in IGC estimates is, however, much smaller in magnitude—less than one-third of that in the IGRC estimates on average. We discuss theoretical explanations and intuitions behind the empirical results. The biases in both IGC and IGRC converge to zero as coresidency rate converges to 100 percent, but, even with high coresidency rates, IGC estimates are preferable.
Robust evidence on the direction of bias, that is, that the estimates of intergenerational persistence are highly likely to be biased downward in coresident samples, can be useful in understanding changes in economic mobility over time, given that coresidency rates are usually lower in the younger generations. This adds a caveat to the optimistic picture of intergenerational educational mobility in some countries, such as India, based on declining IGRC estimates from coresident samples over time.
Our analysis also suggests that the IGC estimates are much less sensitive to the variation in coresidency rates compared to the IGRC estimates. Since coresidency rates can vary substantially across countries, over time, and across genders, the IGC estimates are likely to be more reliable for understanding the pattern and evolution of intergenerational mobility. The evidence shows that the IGRC estimates from coresident samples lead to the incorrect conclusion that intergenerational schooling persistence is virtually the same in India and Bangladesh. In contrast, the IGC estimates from coresident samples yield the correct conclusion that persistence is higher in Bangladesh, and also provide a reliable estimate of the gap between the two countries. The evidence from both Bangladesh and India shows that coresidency rates are lower for girls, and thus the persistence estimates for girls suffer from stronger downward bias, which may generate a false impression of a smaller gender gap.
The evidence and analysis in this paper thus provide a strong rationale for focusing on IGC as a measure of intergenerational mobility in the context of developing countries. Perhaps the most important implication of our analysis is that a large number of good quality household surveys in developing countries that use coresidency to define household membership (for example, LSMS and HIES) are not worthless in analyzing the strength, pattern, and evolution of intergenerational economic persistence. Much progress could be made with the imperfect data if researchers move away from the current emphasis on IGRC and use IGC as the appropriate measure instead.
Summary Statistics
Acknowledgments
The authors thank Matthew Lindquist, Hector Moreno, Tom Hertz, Claudia Berg, seminar participants at NEUDC 2015 at Brown University, and three anonymous referees for helpful comments on earlier versions, as well as Gabriela Aparicio for help with data at the early stage of this project. An earlier version of the paper was circulated under the title “When Measure Matters: Coresident Sample Selection Bias in Estimating Intergenerational Mobility in Developing Countries”; this version supersedes the earlier version. The data used in this article can be obtained beginning January 2019 through January 2022 from Forhad Shilpi (fshilpi{at}worldbank.org).
Footnotes
1. The literature on intergenerational mobility in developed countries is rich with a distinguished pedigree. For excellent surveys of the literature, see Solon (1999), Black and Devereux (2011), Björklund and Salvanes (2011), and Corak (2013). A partial list of the contributions includes Bowles (1972), Becker and Tomes (1979), Atkinson et al. (1983), Solon (1992), Mulligan (1997), Arrow et al. (2000), Black et al. (2005), Björklund et al. (2006), Chetty et al. (2014), Lefgren et al. (2012), Black et al. (2015), and Becker et al. (2015).
2. Coresidency restriction results in a truncated sample, as the surveys do not gather any information on the family members who do not satisfy the coresidency criteria.
3. Some of the children of the household head may not be part of the household at the time of the survey for a variety of reasons, such as higher education, job, marriage, and household partition.
4. This is true even for a country such as India where there is a long tradition of high quality household survey data collection. Bardhan (2005) identifies intergenerational mobility as one of the underresearched areas of economic research in India.
5. The prevailing view, in fact, holds that the estimates from coresident samples are not useful, and it is partly driven by the fact that the researchers do not know the direction and magnitude of the bias.
6. In general, it is not meaningful to rank one or the other as the “better” measure. For discussions on IGC and IGRC, see Solon (1999), Hertz et al. (2007).
7. As we discuss later, the difference in biases between IGRC and IGC becomes smaller as the coresidency rate increases.
8. We focus on IGRC and IGC, as they are two of the most widely used measures of mobility, but our results suggest that when working with coresident samples, one should avoid measures that do not normalize for changing variances across generations.
9. See, for example, Emran and Sun (2015) on China and Emran and Shilpi (2015) on India.
10. The evidence below shows that, based on IGRC estimates from coresident samples, one would conclude, incorrectly, that intergenerational educational persistence is similar in India and Bangladesh, when the IGRC estimates from full samples show that persistence is substantially higher in Bangladesh. In contrast, the IGC estimates from coresident samples provide both a correct ranking and a reliable estimate of the gap between the countries.
11. The evidence on intergenerational health transmission is more limited, both in developed and developing countries. Bhalotra and Rawlings (2011, 2013) provide cross-country analysis using Demographic and Health Survey data for 38 countries.
12. Hertz et al. (2007) are careful about sample truncation bias, and they do not focus on the household head’s children as has been the case in many recent studies that rely on data without nonresident children. To the best of our knowledge, the only survey in the Hertz et al. list of countries that covers all of the nonresident children is that for Bangladesh.
13. The Mexican Family Life Survey (MxFLS) excludes the children who did not live in the household for a year or more.
14. They provide an extensive analysis of alternative econometric approaches for selection correction. Their findings indicate that the inverse probability weighted estimator is the most reliable to tackle coresidency bias among a number of approaches including Heckman selection correction. The selection correction necessarily requires that information on which households are missing members is available in the data set, which unfortunately is not the case in most of the household surveys in developing countries.
15. The specific model of truncation due to marriage developed below applies mainly to nonresident daughters in the context of India and Bangladesh. The nonresidency of sons is due more to migration for jobs or higher education, and truncation in that case occurs in the right tail of the education distribution. It is easy to check that the conclusions regarding the bias in IGRC and IGC derived below carry over to a model where truncation is from above.
16. See Greene (2012) for a more complete discussion on this.
17. The lowest coresidency rate across different samples in our analysis is 26 percent, for the mother–daughter sample in Bangladesh.
18. The MHSS 1996 is a collaborative effort of RAND, the Harvard School of Public Health, the University of Pennsylvania, the University of Colorado at Boulder, Brown University, Mitra and Associates, and the International Centre for Diarrhoeal Disease Research, Bangladesh (ICDDR,B).
19. While these data sets are exceptionally rich in information on nonresident children of household head, and thus are perfectly suited for the task at hand, they have their own limitations. The Bangladesh data come from only one district and thus are unlikely to be representative of the country as a whole. The India data are more representative in terms of geographic coverage as the survey includes 17 states, but the random sample of households originally chosen for the panel survey may no longer be representative of a broad cross-section of rural households in India.
20. In the Online Appendix (http://jhr.uwpress.org/), we present nonparametric graphs, which show a very slight concavity in the relationship. Please see Figure F2. We, however, focus on the linear fitted lines following Hausman and Wise (1977) because both IGC and IGRC are estimated from linear regression.
21. Mother’s age is missing for a significant proportion of children.
22. The estimates and the conclusions do not depend on the inclusion of the gender dummy.
23. We, however, note here that the formal test of equality of estimates may not be very useful in our context. Even with a very small numerical difference between the estimates from the full and coresident samples, one can reject the null hypothesis of equality simply because the standard errors are extremely small (see, for example, the IGC estimates). So the focus should be on the magnitude of the bias, not the statistical test of equality of estimates.
24. We use the estimate from the coresident sample as the base for defining the bias, because the unbiased estimate is not known to a researcher with a coresident sample, but she can relate to the bias defined above.
25. It may also be useful to convert the bias in IGC in units of years of schooling. We can rearrange the relation between IGC and IGRC noted in Section III.C for this purpose: IGRC = IGC(σc/σp). To understand the implications of the bias in the IGC estimates from a coresident sample in units of school years, one needs to use an unbiased estimate of (σc/σp) from the full sample.
26. It is the simple average of the three bias estimates in the bottom panel.
27. We focus on father–son and mother–daughter links as the persistence runs along gender lines with cross effects much smaller.
28. This can be useful for a researcher with a coresident sample to get a back-of-the-envelope estimate of the magnitude of the bias when she has information about the coresidency rates from alternative sources.
29. The same negative conclusion is likely to hold for other related measures of mobility where the focus is on the slope parameter of a regression without normalization to take into account changes in variances.
- Received February 2016.
- Accepted January 2017.