Journal of Human Resources
Research Article
Open Access

Efficient Targeting in Childhood Interventions

Alexander Paul, Dorthe Bleses, and Michael Rosholm
Journal of Human Resources, January 2026, 61 (1) 160-184; DOI: https://doi.org/10.3368/jhr.0320-10756R4
Alexander Paul
Alexander Paul is a manager at E.CA Economics, Berlin, Germany, and an associate of the TrygFonden’s Centre for Child Research, Denmark.
Dorthe Bleses
Dorthe Bleses is a professor at the School of Communication and Culture, Aarhus University, and TrygFonden’s Centre for Child Research, Aarhus, Denmark.
Michael Rosholm
Michael Rosholm is a professor of economics at the Department of Economics and Business Economics, Aarhus University, and an affiliate of TrygFonden’s Centre for Child Research, Aarhus, Denmark, and IZA, Bonn, Germany (corresponding author).
Correspondence: rom@econ.au.dk

Abstract

Using Danish administrative data, we address the issue of efficient targeting in childhood interventions. We define children to be in need of an intervention if they experience one or more socially undesirable outcomes in adulthood. Because interventions are very effective early in life, we then test if and to what extent indicators available at birth can predict these outcomes. We find fair to good levels of prediction accuracy for many outcomes, driven by a parsimonious set of predictors. We show that optimal weights for the construction of risk scores deviate from the weights typically used in targeted interventions.

JEL Classification:
  • I18
  • I28
  • I38

I. Introduction

Many childhood interventions, such as the famous Perry Preschool Project (Weikart 1967), target a selected subset of children rather than applying the intervention universally to the whole cohort. One rationale behind such targeting is the budget constraint. Even more important is that only certain children promise a sufficiently large return to justify the cost of the intervention. Children who already enjoy a beneficial environment in the absence of the intervention would benefit little or might even be harmed.1 Although the theoretical motivation for targeting interventions is evident, an important question in practice is which children should be targeted. Scholars have thus called for the development of “measures of risky family environments…that facilitate efficient targeting” (Heckman 2008, p. 314).

In this article, we approach the problem of efficient targeting from a long-term perspective. We define children as disadvantaged (in need of an intervention) if they are at risk of experiencing undesirable outcomes in their adult life, such as criminal behavior or high healthcare use. We exploit rich register data from Denmark to predict disadvantage, using only data available at the child’s birth. This constraint is motivated by recent studies in human development showing that childhood intervention programs are highly effective if administered very early in life (Heckman 2006; Cunha, Heckman, and Schennach 2010; Allen 2011). According to this literature, and with perfect information, children should be targeted immediately after birth, as in the prominent Carolina Abecedarian Project (Ramey et al. 1974). Children at high risk for adverse outcomes are likely to benefit greatly from early intervention.2

Some prominent childhood interventions target children by means of a risk score. Only children that score sufficiently high are eligible for participation. A regular ingredient in defining the risk score is a measure of the family’s socioeconomic status (SES), such as household income or parental education.3 When combining all the indicators into a one-dimensional score, interventions tend to place a larger weight on family SES, but they generally do not motivate the exact weight given to each component. As an example, the Carolina Abecedarian Project (Ramey et al. 1974, p. 65) constructed a “high risk index” that increased by one unit for each year missing from 12 years of parental schooling (separately for both the father and mother). It also increased by four units if family income fell short of 5,000 dollars and by one additional unit for each further 1,000-dollar reduction. Ramey et al. (1974) note that the weights they assigned to each component were based on a “best guess” (p. 10–11).

The unclear motivation behind the chosen weights suggests that risk scores might be substantially improved by answering the following basic questions. First, what relative weight should each indicator optimally receive? Does parental schooling matter more than income or vice versa? Second, do paternal and maternal characteristics matter in the same way, for example, with respect to years of schooling? Third, is the relationship of the outcomes with the predictors nonlinear, and are there interactions among predictors? For example, is each additional year of parental schooling equally important?

This study takes an econometric approach to address the problem of optimally selecting and weighting early indicators of disadvantage. We start by performing standard logit regressions to predict long-term outcomes using predictors measured at birth. Logit regression has the advantage that it allows for easy computation and interpretation of risk scores. However, it is not necessarily the best at prediction, as interactions among predictors and functional forms must be explicitly specified, and there are no safeguards against overfitting. We therefore also apply more sophisticated machine learning techniques that are known for good predictive power, in part because they implicitly allow for interaction terms among predictors and flexible functional form and prevent overfitting. We include indicators of family SES (income, education, and occupation) and several other parental variables, such as hours of work, health status, and criminal activity as predictors. Some of these variables potentially correlate with quality time investments, which play an important role in human capital formation (for example, Del Boca, Flinn, and Wiswall 2014). At the child level, we include sex, birth order, county of residence at birth, and region of origin. We examine whether and which of the child and parental variables can predict adverse outcomes in adulthood and derive optimal weights for the formation of composite risk scores.

The outcomes we consider are meant to capture the economic cost of different social dimensions, ranging from education and labor market outcomes to health and crime. The cost associated with these outcomes is not spread evenly across all members of society but can vary substantially. Indeed, it has been shown that a relatively small fraction of the population accounts for a sizable share of the total economic burden (Caspi et al. 2017; Richmond-Rakerd et al. 2020). We rank children by the outcome-specific cost they generate in adulthood and define the top 20 percent of the distribution as “at risk” of the particular outcome. In the case of social benefits, for instance, the top 20 percent recipients in our sample account for 76 percent of total benefit receipt. We aim to predict which children are at risk and can potentially be targeted by an intervention. We also predict which children risk experiencing combinations of these outcomes.

First, we find that predictions using register data available at birth are possible and often yield fair to good prediction accuracy. Predictions are most accurate for educational attainment, criminal behavior, placement in foster care, and combinations of these outcomes, but are less accurate for health-related outcomes.

Second, we find that logit regression performs well. Predictions generated by other machine learning methods are generally neither statistically nor economically significantly different from logit regression. The reason for this failure to outperform logit could be data limitations. Even though our data set includes the full Danish birth cohorts and many predictors, it is possible that even richer data would allow more sophisticated models to play to their strengths.

Third, we find that updating the predictors with data from a few years after birth improves predictions very little. It seems as if further improvements are only possible with indicators of the child’s behavior and skills, which are much more costly to obtain than the variables generally available from register data.

Fourth, we find that indicators of parental SES are highly predictive. A parsimonious set of indicators consisting of sex, parental education, and income yields predictions that are almost as accurate as those obtained using the full set of predictors. Knowledge of a child’s sex and a few variables related to socioeconomic background may therefore be sufficient for effectively targeting children in childhood interventions. Our study thus provides support for the practice of including measures of parental SES as a key ingredient in the construction of risk scores.

Finally, we derive optimal weights for the formation of risk scores and find that they deviate in important ways from the weights typically used in childhood interventions. In the discussion, we also point out that, due to treatment effect heterogeneity, risk scores are best thought of as a tool that assists human experts in their decision on final treatment assignment, and we discuss the practical relevance of the risk scores, including their sensitivity to policy interventions.

A study closely related to ours is Caspi et al. (2017). They find that a small set of predictors consisting of SES, a maltreatment indicator, IQ, and self-control could accurately predict adverse outcomes for 1,037 New Zealanders at age 38. Predicting whether children experience combinations of multiple adverse outcomes works particularly well. They use predictors that are recorded up until age 11, which is too late for effective early interventions. In addition, obtaining measures of IQ or self-control for the whole population would be relatively costly. In contrast, we use only indicators that are inexpensive to measure, available from the Danish registers, and available at birth. Chittleborough et al. (2016) also use only predictors from around birth. However, they study outcomes at age five (before schooling starts), thus missing substantial information on social burden that only a long-term perspective can offer. To predict high school dropout in Denmark, Şara et al. (2015) employ Danish register data but use predictors available after high school entry.

This work also relates to other strands of the literature. First, we use machine learning techniques to predict which children are most in need of help. A growing number of studies address similar “prediction policy problems” (Kleinberg et al. 2015) in various contexts, for example, regional allocation of refugees (Bansak et al. 2018), shootings among at-risk youth (Chandler, Levitt, and List 2011), food-safety inspections (Glaeser et al. 2016), hip and knee replacements (Kleinberg et al. 2015), and judicial bail-or-release decisions (Kleinberg et al. 2018a). Second, our study loosely relates to the theoretical literature on optimal treatment assignment (Bhattacharya and Dupas 2012; Kitagawa and Tetenov 2018; Manski 2004). This literature typically uses experiments or observational studies to estimate covariate-specific heterogeneous treatment effects. We do not observe treatment effects associated with a particular intervention. Instead, we suggest that treatment should be assigned by human agents to children who are at risk of an adverse outcome and who thus have the potential to benefit from an appropriately designed intervention.

Finally, our paper adds to the discussion of targeted versus universal programs. Targeted programs are a response to limited resources, which is particularly important in the context of the Scandinavian welfare state that continually struggles with the Baumol cost disease. Productivity growth is lower in the public sector than in the private sector (in part due to its larger share of labor in production), while wages in the two sectors increase at the same rate (due to, for example, institutional arrangements and unions’ bargaining power); the welfare state will therefore eventually be faced with the problem that the level of services offered cannot be sustained indefinitely. Targeting may provide a (temporary) solution to this problem. Moreover, targeting may avoid potentially negative effects on subgroups of the population (for example, Havnes and Mogstad 2015; Cornelissen et al. 2018). At the same time, targeted programs might lead to stigmatization and are less effective when disadvantaged children are hard to identify.

In the following, Section II presents some organizational and theoretical considerations that motivate our analysis. Section III deals with the practical aspects of prediction, including the data, estimation, and the specific prediction methods. Section IV reports the results. Section V discusses our findings and concludes.

II. Organizational and Theoretical Considerations

The organizational setup we have in mind is one where econometrically estimated weights are used to produce a risk assessment, identifying children who are at risk of experiencing adverse outcomes later in life. This risk assessment can subsequently be used by human experts to determine which at-risk children and families should be offered assistance. Determining the appropriate type of assistance comes in a subsequent step and is at present exclusively in the hands of human experts. Such a setup is similar to practices in the criminal justice system, where justices combine data-driven risk assessments with their own expertise to make the final decision on pre-trial release, sentencing, or parole (see Berk 2019; Ludwig and Mullainathan 2021, for overviews).4

In Online Appendix Section A, we develop a simple model that demonstrates how prediction can help the policymaker to assign treatment in a welfare-improving way. We summarize the key aspects of the model here. We assume that treatment has a positive and homogeneous effect on at-risk children and does not harm children that are falsely identified as at risk. The issue of homogeneous treatment effects is further discussed in Section V. Given a certain fraction of the cohort to be targeted by the intervention, the policymaker should maximize the number of correctly identified at-risk children out of all at-risk children (the true positive rate, TPR), as previously shown by Sansone (2019) using a similar model. This result is intuitive. As we assume that the intervention does no harm, we need not be concerned about false positives. Next, the policymaker chooses what fraction of the cohort should receive the intervention. If prediction is of value, the marginal TPR, and thus also the marginal expected benefit from the intervention, will decline as more children are targeted. Improving upon random and uninformed treatment assignment, prediction enables the policymaker to optimally administer the intervention to a selected fraction of the cohort for which the marginal expected benefit exceeds the marginal cost.

III. Data and Methods

A. Data

1. Sample

Our sample consists of the full Danish birth cohorts from the years 1985, 1986, and 1987. This choice is motivated by the fact that most of the Danish register data is available from 1980 onwards, so a certain period after this year is required to construct powerful predictors, such as parental crime or parental hospital admissions. At the same time, children should not be born too recently, so we can also observe their relevant adult outcomes, such as educational attainment or disposable income. Because we use parental predictors, we also impose the condition that both parents are known to us and have lived in Denmark continuously since 1980. The final sample contains 149,755 children.

2. Outcomes

The age at which we measure the children’s outcomes ranges from 28 to 33 years, depending on the availability of the most recent data. We focus on eight outcomes that capture the economic cost of different societal dimensions, ranging from education and labor market outcomes to health and crime. Reflecting the principle that a small fraction of the population accounts for a disproportionate share of the total economic burden associated with a certain outcome (Caspi et al. 2017; Richmond-Rakerd et al. 2020), we order children by the outcome-specific cost they generate in adulthood and define the top 20 percent of the distribution as “at risk” of this outcome. While other values are also possible, we choose 20 percent because it was previously used by Caspi et al. (2017), who motivated this choice with the Pareto principle, named for economist Vilfredo Pareto, according to which 80 percent of consequences arise from 20 percent of causes (Bunkley 2008). If we instead used continuous measures as outcomes, our prediction methods would focus more on distinguishing between children within the low-burden group and thus become less accurate at identifying the high-burden children of interest in this study.

Table 1 provides an overview of the eight outcomes considered in this study (see also Online Appendix Table G.1 for data sources and detailed descriptions). For five of the outcomes, those having the outcome account for 100 percent of the total burden to society. For example, children ever being criminally charged account for all of the societal cost associated with criminal charges, whereas children never charged do not cause any cost. For the three other outcomes, those having the outcome only account for part of the total burden to society, but their share is disproportionate. Specifically, Table 1 shows that the top 20 percent social benefit recipients in our sample account for 76 percent of total benefit receipt. Similarly, the top 20 percent patients with the most hospital admissions account for 55 percent of all admissions. For income, which is a benefit rather than a burden to society, the pattern is reversed: the bottom 20 percent of the income distribution receive a disproportionately small share, equal to 9 percent of total income.5
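The at-risk cutoff and burden-share calculations described above are straightforward to reproduce. The following sketch uses a simulated right-skewed cost variable; the lognormal shape, sample size, and all resulting numbers are illustrative assumptions, not the Danish register data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical right-skewed cost outcome (e.g., months of benefit receipt).
# The distribution is an illustrative assumption, not the paper's data.
cost = rng.lognormal(mean=0.0, sigma=1.5, size=100_000)

# Rank children by the cost they generate and flag the top 20 percent as "at risk".
cutoff = np.quantile(cost, 0.80)
at_risk = cost >= cutoff

# Share of the total burden accounted for by the top 20 percent.
burden_share = cost[at_risk].sum() / cost.sum()
print(f"Top 20 percent account for {burden_share:.0%} of the total burden")
```

With a sufficiently skewed cost distribution, the top 20 percent account for well over half of the total burden, mirroring the 76 percent figure reported for social benefit receipt.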

Table 1

Overview of Outcomes

Table 2 shows that the various outcomes are not independently distributed across children but instead highly correlated. A consequence is that a disproportionately large share of the cohort will have combinations of several outcomes. By the same token, disproportionately many children will be entirely free of any adverse outcome. This is illustrated in Online Appendix Figure F.1, which juxtaposes the actual distribution of the number of negative outcomes with a simulated one based on the assumption that outcomes were uncorrelated.

Table 2

Correlations

Online Appendix Figure F.2 reveals that children with combinations of outcomes not only account for a disproportionate share of the population but also for a disproportionate share of the total economic burden. In fact, this share becomes increasingly disproportionate as the number of adverse outcomes rises. Therefore, in addition to predicting each outcome individually, we also aim to predict if children have several adverse outcomes. Specifically, we search for a predictive algorithm that can distinguish children with three or more outcomes versus those with two or fewer (and similarly for four or more and five or more outcomes). We also predict having three or more outcomes as opposed to zero outcomes. This exercise implicitly assumes that we know a priori that a child will end up with either zero or three or more outcomes.6

Finally, in addition to counting the number of adverse outcomes, we also construct a social burden indicator through confirmatory factor analysis, assuming that a single factor underlies the seven individual outcomes. See Online Appendix Table G.2 for additional information.

3. Predictors

The predictors included at the level of the child are sex, county of residence at birth, region of origin, birth month, and birth order. Predictors at the level of the parents are recorded separately for the mother and the father. They include income, wealth, educational attainment, occupation, working hours, age, marital status, hospitalizations, placements, and criminal charges. All continuous predictors are turned into discrete categorical variables. Missing values are generally assigned to a separate category. We include a dummy for each category (except for the logit model, where we omit the baseline category) in the prediction analysis, resulting in a total of 184 dummy variables (159 in the logit model). See Online Appendix Tables G.3 and G.4 for more details and summary statistics. All of the child and parental predictors are well known to be associated with outcomes in adulthood, as discussed in Section B of the Online Appendix.
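The discretization step can be illustrated as follows; the variable name and category labels are hypothetical, and the paper's actual coding scheme is documented in Online Appendix Tables G.3 and G.4:

```python
import pandas as pd

# Hypothetical parental-education predictor containing a missing value.
edu = pd.Series(["basic", "vocational", None, "tertiary", "basic"],
                name="mother_edu")

# Assign missing values to their own category, then expand into dummies.
# (For the logit model, one baseline column would be dropped.)
dummies = pd.get_dummies(edu.fillna("missing"), prefix="mother_edu")
print(dummies.columns.tolist())
```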

In the main analysis, all predictors are measured at or before birth. In additional analyses, we extend the time frame to include the first few years after birth. This allows us to update and strengthen the parental predictors with more recent data and to include the child’s hospitalizations as an additional predictor.

B. Methods

1. True positive rate (TPR) and expected benefit

A full-fledged benefit–cost analysis as sketched in Section II and described in detail in Online Appendix Section A would require information that depends on the context and is often not available, such as the monetary cost of the outcome and the impact of the intervention. For this reason, we do not attempt to find the optimal targeted fraction of the cohort that should receive the intervention. Instead, we examine whether the marginal expected benefit curve declines with the fraction treated. To do so, we examine the slope of the marginal TPR curve, which in turn determines the slope of the marginal expected benefit curve. If the marginal TPR curve—and thus the marginal expected benefit curve—is sloping downward, this would be evidence that data available at birth can be gainfully utilized to improve upon random and uninformed treatment assignment and thus increase social welfare.

To determine the shape of the marginal TPR curve, we proceed as follows. First, for a given targeted fraction, we maximize the TPR. The optimal mechanism assigns the intervention to those children with the highest probability of being at risk. We estimate this probability using machine learning techniques and classify as at risk the children with the highest estimated probability of experiencing the outcome, up to the chosen targeted fraction. We then compute the resulting TPR. Second, after performing this step for a grid of 20 targeted fractions (5 percent, 10 percent, …, 100 percent), we study how the marginal TPR changes with the targeted fraction and, in particular, whether it slopes downward.
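The procedure can be sketched in a few lines. The risk score and binary outcome below are simulated stand-ins for the estimated probabilities and register outcomes; the functional form linking the two is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Simulated stand-ins: a predicted risk score and a binary at-risk outcome
# whose probability rises with the score (roughly 20 percent are at risk).
score = rng.normal(size=n)
y = rng.random(n) < 1 / (1 + np.exp(-(score - 1.4)))

order = np.argsort(-score)              # target highest-risk children first
fractions = np.linspace(0.05, 1.0, 20)  # grid of 20 targeted fractions

# TPR at each targeted fraction: share of all at-risk children reached.
tpr = np.array([y[order[: int(f * n)]].sum() / y.sum() for f in fractions])
marginal_tpr = np.diff(tpr, prepend=0.0)

# An informative score yields a downward-sloping marginal TPR curve: early
# increments exceed the constant 0.05 obtained under uninformed assignment.
print(marginal_tpr[0], marginal_tpr[-1])
```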

We also report the so-called area under the curve (AUC) as a summary measure of predictive accuracy across all possible values of targeted fractions. The AUC ranges from 0.5 to 1. In practice, AUC values are often interpreted as follows (Caspi et al. 2017): worthless (0.5–0.6), poor (0.6–0.7), fair (0.7–0.8), good (0.8–0.9), excellent (0.9–1.0). Online Appendix C.1 provides details on the derivation of the AUC in our setting.
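The AUC has a convenient rank interpretation: it equals the probability that a randomly drawn at-risk child receives a higher risk score than a randomly drawn child not at risk, with ties counting one half. A minimal sketch of this pairwise definition (suitable only for small samples, as it compares every positive–negative pair):

```python
import numpy as np

def auc(y, score):
    """Probability that a random positive outranks a random negative
    (ties count one half)."""
    pos, neg = score[y == 1], score[y == 0]
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties

y = np.array([0, 0, 1, 1])
print(auc(y, np.array([0.1, 0.2, 0.8, 0.9])))  # perfect separation: 1.0
print(auc(y, np.array([0.5, 0.5, 0.5, 0.5])))  # uninformative score: 0.5
```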

2. Estimation and prediction

Predicting the realization of binary outcome variables, as in this study, is known as a classification problem. We begin by estimating standard logit regressions. An advantage of logit regression is that predictor variables are assumed to combine linearly to form the risk score for each child. In combination with the exclusive use of dummy variables, the risk score thus becomes easy to construct and to interpret. A disadvantage of logit regression is that interactions among predictors must be explicitly specified. Another disadvantage is its implicit risk of overfitting and poor out-of-sample performance.

To address the issue of overfitting, we split the data set into an 80 percent training data set, on which we perform estimations, and a 20 percent test data set, on which we evaluate the model fit in terms of TPR and AUC. In an attempt to improve the predicted probabilities from the logit, we additionally employ three modern machine learning methods: (i) logit LASSO, (ii) random forest, and (iii) gradient boosting. See Online Appendix Section D.1 for details on these methods. The models have in common that they allow for different levels of model complexity that are governed by a vector of tuning parameters. A more complex model reduces the bias in representing the relationship between outcome and predictors, but comes at the risk of overfitting. Both overfitting and bias worsen out-of-sample model performance.7

To find the optimal level of complexity, we tune parameters via eightfold cross-validation on the training data (following the recommendations of Mullainathan and Spiess 2017). That is, we split the training data set into eight equally sized folds and set one of the folds aside. We then estimate the model for specified values of the tuning parameters on the remaining seven folds and evaluate the fit on the selected fold. After repeating this step for each of the other seven folds, we compute the average fit across all folds. The optimal parameter specification is the one that yields the highest average fit in the cross-validation procedure. We use this specification to reestimate the model on the whole training data and evaluate its fit on the test data. Online Appendix Section D.2 reports the optimal tuning parameters.
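The split-and-tune workflow can be sketched with scikit-learn. The simulated dummy predictors, the choice of a LASSO logit, and the penalty grid below are illustrative assumptions, not the paper's specification:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(2)

# Hypothetical stand-in for the register data: binary dummy predictors and a
# binary at-risk outcome driven by the first three dummies.
n = 4_000
X = rng.integers(0, 2, size=(n, 12)).astype(float)
y = (rng.random(n) < 1 / (1 + np.exp(-(X[:, :3].sum(axis=1) - 2.0)))).astype(int)

# 80 percent training / 20 percent test split, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Eightfold cross-validation on the training data to tune the L1 penalty of a
# LASSO logit, with AUC as the fit criterion.
grid = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=8,
    scoring="roc_auc",
)
grid.fit(X_tr, y_tr)

# The best specification is automatically refit on the whole training data;
# the model fit is then evaluated on the held-out test data.
test_auc = grid.score(X_te, y_te)
print(grid.best_params_, round(test_auc, 2))
```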

IV. Results

A. Individual Outcomes

Based on predicted probabilities from logit regressions, Panel A of Figure 1 shows the marginal increase in the TPR for the education outcome as we increase the targeted fraction of the cohort to be treated. Similar graphs for all outcomes considered are shown in Online Appendix Figure F.3.8

Figure 1

Prediction of Educational Attainment Outcome

Notes: In Panel A, the solid line is based on predictions from logit regressions and shows incremental changes in the true positive rate as the targeted fraction of the cohort to be treated is increased in steps of 5 percent. The dashed line shows corresponding constant increase under uninformed treatment. The gray area represents 95 percent bootstrap confidence intervals based on 2,000 bootstrap samples from the test data, keeping the prediction function fixed. Panel B shows the receiver operating curve (ROC) and area under the curve (AUC) for predictions from logit regression. The brackets contain the 95 percent confidence interval based on 2,000 bootstrap samples from the test data, keeping the prediction function fixed.

The graph provides clear evidence that informed treatment assignment based on predictions can substantially improve upon uninformed treatment assignment. For example, targeting 5 percent of the cohort will reach more than 15 percent of all children at risk of having only compulsory education compared to 5 percent under random, uninformed assignment. Thus, when the problem is to target a given fraction of the cohort, for example, due to budget constraints, informed treatment assignment can undoubtedly increase welfare.

In Panel B of Figure 1, we show the receiver operating curve (ROC) plot, along with the corresponding AUC value, to assess the overall accuracy of the education prediction. The ROC plots for all outcomes are available in Online Appendix Figure F.4. Prediction works well for education and also for criminal charges and foster care placement, with AUC values between 0.76 and 0.81. Health outcomes and income are less predictable by early-life indicators. These findings are consistent with earlier evidence that the child’s educational outcomes depend strongly on parental education, while intergenerational mobility is larger for income and health.9 As Landersø and Heckman (2017) pointed out, the Danish welfare state is characterized by large income redistribution through taxes and transfers, in addition to wage compression, leading to higher intergenerational income mobility than educational mobility. Similarly, universal access to tax-financed medical care might explain why family background matters less for health outcomes.10

B. Combinations of Outcomes

Section III.A.2 demonstrated that outcomes are highly correlated within children. A small group of children accounts for a disproportionately large share of the total social burden. Predicting who these children are and targeting them for appropriately designed interventions promises large returns—Caspi et al. (2017), using predictors throughout childhood, showed that prediction works even better for these “high-cost” children.

In Panel A of Figure 2, we predict whether children have three or more outcomes versus two or fewer (and similarly for four or more and five or more outcomes). The AUC values are high (0.75–0.81), suggesting that targeting high-cost children is indeed easier, even when using only predictors measured at birth. In the right panel, we repeat the analysis for whether children have three or more outcomes versus zero outcomes (and similarly for four or more and five or more outcomes). The AUC values are now even higher (0.80–0.87), implying that there is substantial variation between low-cost and high-cost children already at birth. Our estimates are similar to those of Caspi et al. (2017). Unfortunately, however, we do not know a priori which children will end up with either zero or three or more outcomes (rather than one or two), so this latter exercise is of limited practical relevance.

Figure 2

Prediction—Combinations of Outcomes

Notes: Receiver operating characteristic (ROC) curve and area under the curve (AUC) for predictions from logit regression.

C. Comparing Methods

Online Appendix Table G.7 compares AUC estimates from different methods and outcomes. For each method and outcome, we present two estimates of the AUC, as explained in Online Appendix Section C.1. The main result is that the directly optimized AUC, which we generally report in this study, yields results similar to the indirectly fine-tuned AUC (see Online Appendix Section C.2 for a more detailed discussion).

Online Appendix Table G.7 also allows us to compare logit regression with machine learning techniques. The main result is that these more advanced machine learning techniques do not improve predictions in our case.

The quantitatively similar performance of the logit regression and the machine learning techniques may come as a surprise. After all, random forest and gradient boosting are tree-based approaches that fundamentally differ from regression-based approaches such as logit and LASSO logit (see Online Appendix Section D.1). Moreover, random forest and gradient boosting offer enhanced flexibility with respect to interactions and functional form, which gives reason to expect a better predictive fit.

We take several steps to make sure the similar performance of the various methods is not caused by inadequate calibration. First, we compare the out-of-sample fit on the test data, which we have reported so far, with the in-sample fit on the training data. As we perform eightfold cross-validation on the training sample, we compute the in-sample fit as the average across eight different fits, each belonging to the fold that is left out of the calibration and used for validation. Out-of-sample and in-sample fit are very similar, alleviating concerns about overfitting (Online Appendix Table G.8). Second, we check that the optimal tuning parameters of the machine learning methods are not at the edges of the tested parameter range, which would point to suboptimal calibration. In Online Appendix Section D.2, we report optimized tuning parameters for all combinations of outcomes and metrics, corresponding to more than 200 predictions. Most optimal parameters are indeed in the interior of the allowed range. Some are not, which is unsurprising, however, in light of the large number of predictions made. Improper parameter tuning is hence unlikely to drive our findings.
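The fold structure just described can be sketched as follows; the fold count follows the text, while the sample size and indices are illustrative:

```python
# Minimal sketch of the eightfold cross-validation layout described above:
# the training sample is split into eight folds, and each fold is left out
# once for validation while the model is calibrated on the remaining seven.
# The in-sample fit is then the average of the eight validation-fold fits.
# Fold count follows the text; the data here are illustrative.
import random

def kfold_indices(n, k=8, seed=0):
    """Yield (train_idx, validation_idx) index pairs for k-fold CV."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for j in range(k):
        train = [i for f in folds if f is not folds[j] for i in f]
        yield train, folds[j]

# Every observation lands in exactly one validation fold.
splits = list(kfold_indices(n=24))
assert sorted(i for _, val in splits for i in val) == list(range(24))
print(len(splits))  # 8
```

Because every observation is used for validation exactly once, the averaged validation-fold fit gives an honest estimate of predictive accuracy without touching the held-out test data.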

To conclude, we believe that data limitations are the most plausible reason why machine learning methods do not outperform logit in the present case. Machine learning excels at prediction in big data, and our data are limited in two respects. First, the number of observations is dictated by the size of the Danish birth cohorts. Although a sample size of 149,755 is quite large, one cannot rule out that the machine learning algorithms would outperform the logit regressions if applied to much larger data sets with millions of children. Second, the number of predictors (184 dummy variables, see Section III.A.3) is also limited. This is in part on purpose, as we aim to include predictors available at or before birth and attainable at low cost. Moreover, as shown in Online Appendix Section B, all predictors have been previously shown to correlate with the outcomes, and thus also with each other. Consequently, the algorithms have little chance to detect subtle patterns in the data, as there are no new variables to explore. See McKenzie and Sansone (2019) for a previous paper in which machine learning methods are not superior, likely because of data limitations.

D. Post-Birth Predictors

A trade-off exists between the effectiveness of the intervention, which some studies have shown to decline with age (Heckman 2006; Rosholm et al. 2021), and the accuracy of predictive targeting, which likely improves with more information collected later in life. The question is whether we can improve predictions by extending the time frame to a few years after birth. Adding years after birth allows us to update parental predictors and to include the child’s hospitalizations as an additional predictor. We look at one, three, and five years after birth.

Figure 3 illustrates how the AUC changes as we extend the time window. As expected, the AUC increases with more recent predictors. The increase is most pronounced for foster care placement, which is also the outcome that occurs earliest in life and is probably more sensitive to changes in early-life family environment. Overall, however, the marginal improvements in predictive accuracy are rather modest. It seems as if the role that family background plays in shaping long-term outcomes is largely determined by factors set in place at birth. Targeting children already at birth appears to come at little cost of predictive accuracy relative to the large potential benefits of early versus late interventions. We discuss this finding further in Section V.

Figure 3

Prediction with Post-Birth Predictors

Notes: Improvement in AUC when updating predictors one, three, and five years after birth and adding the child’s hospitalizations as another predictor. All results based on logit regressions.

E. Parsimonious Model

Our predictions are based on a rich set of variables. If a parsimonious model with a small number of predictors can generate predictions close to the full model, then the necessary data collection would be less costly, and computing risk scores would be simpler and more transparent. We therefore examine whether all predictors contribute equally to the quality of the predictions or whether only a few predictors drive our results.

As shown in detail in Online Appendix Section E, any two of the three SES predictors—parental income, education, and occupation (plus the child’s sex)—generate around 90 percent of the full-model AUC. The best combination is that of income and education, suggesting that these two variables complement each other most, while occupation captures aspects of both and thus contributes the least independent variation. Adding occupation to both income and education yields little additional improvement.11

The finding that a parsimonious model with the predictors sex, parental income, and parental education performs almost as well as the full model suggests that collecting data on these few variables is sufficient for efficient targeting in practice. It also bolsters the way that targeting has been operationalized in many prominent childhood interventions that used indicators of parental SES to define disadvantage. A separate but equally important question is how much weight to attach to the different values that an indicator can take on. We address this question next.

F. Optimal Weights

Efficient targeting based on a one-dimensional risk score requires optimally weighting the selected predictors that contribute to the risk score. The Perry Preschool Project derived a “cultural deprivation rating” in which higher values indicated better outcomes (Weikart 1967, p. 3–4). It had three components. Paternal occupation entered the rating with one point if the father engaged in unskilled work and four points if he engaged in skilled work. Each of the parents’ average years of schooling entered with another point. Finally, “density in the home,” measured as the number of rooms divided by the number of people, entered after multiplying by one-half to give it “a one-half weight.” Before aggregating to the final rating, each component was additionally divided by its standard deviation to equate different distributions. Only children with a final rating below 11 were considered further for the experiment.

The Carolina Abecedarian Project (Ramey et al. 1974, p. 65) constructed a “high risk index.” The index increased by one point for each year missing from 12 years of schooling, for both the mother and father. Family income greater than 5,000 dollars left the risk score unchanged, while income less than 5,000 dollars increased it by four points, and by another point for each additional 1,000-dollar step downwards. Additional points were assigned for, among others, the absence of the father and low parental IQ scores. Only children with an index value of 11 or higher participated in the experiment. Ramey et al. (1974) noted that these “[w]eights were assigned to the various factors based upon our ‘best guess’ of their relative importance.” (p. 10–11).
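As an illustration, the schooling and income components of the high risk index just described can be sketched as follows. The exact bracket boundaries for the income steps are our reading of the description, and the index’s further components (for example, absence of the father and parental IQ) are omitted:

```python
# Stylized sketch of the schooling and income components of the Carolina
# Abecedarian "high risk index" as described above. The actual index had
# additional components (e.g., absence of the father, parental IQ), which
# are omitted here; the income bracket boundaries are our reading of the
# description and are illustrative only.

def risk_index(mother_schooling, father_schooling, family_income):
    points = 0
    # One point per year missing from 12 years of schooling, per parent.
    points += max(0, 12 - mother_schooling)
    points += max(0, 12 - father_schooling)
    # Income below 5,000 dollars adds four points, plus one more point for
    # each additional 1,000-dollar step downwards.
    if family_income < 5000:
        points += 4 + (5000 - 1 - family_income) // 1000
    return points

# A family with 10/9 years of schooling and 3,500 dollars of income:
print(risk_index(10, 9, 3500))  # 2 + 3 + (4 + 1) = 10, below the cutoff of 11
```

The sketch makes the ad hoc nature of such weights concrete: the relative importance of a missing year of schooling versus a 1,000-dollar income step is fixed by construction, not estimated from outcomes.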

In interventions such as the Perry Preschool and the Carolina Abecedarian Projects, the assigned weights often lack a clear motivation. Thus, at least three questions arise. First, what relative weight should each indicator optimally receive? Second, should indicators such as education enter differently for fathers and for mothers? Third, should indicators such as years of schooling enter linearly? We do not have access to all the indicators used in these two or other studies. Instead, we will use a parsimonious set of predictors consisting of sex, income, and education, which we showed to predict quite well (Section IV.E). Income and education are key ingredients in defining early disadvantage in many targeted interventions. Answering the above questions for these variables is thus highly policy-relevant.

We can address the problem of optimal weighting with the estimated coefficients from the logit regressions. Because our predictors are discretized, and a dummy is included for each discrete value, the coefficient directly gives the weight associated with that value of the predictor, relative to the baseline value. For easier interpretation, we rescale all weights such that the lowest possible weight is zero, and the highest possible weight is 100. Note that risk score values are not interpretable as percentiles; their distribution depends on the distribution of predictor values in the population.
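Under our reading of this procedure, the rescaling can be sketched as follows; the predictor names and coefficient values below are made up for illustration and are not the paper’s estimates:

```python
# Minimal sketch of turning logit dummy coefficients into 0-100 risk weights,
# under our reading of the text: each discretized predictor contributes the
# coefficient of its dummy (the baseline category contributes zero), and all
# weights are rescaled linearly so the lowest possible total score is 0 and
# the highest is 100. Coefficients below are illustrative only.

def rescale_weights(coef_by_predictor):
    """coef_by_predictor: {predictor: {category: logit coefficient}},
    with the baseline category of each predictor at 0.0."""
    # Shift each predictor so its smallest weight is zero ...
    shifted = {
        p: {c: b - min(cats.values()) for c, b in cats.items()}
        for p, cats in coef_by_predictor.items()
    }
    # ... then scale so the maximum attainable total score is 100.
    top = sum(max(cats.values()) for cats in shifted.values())
    return {p: {c: 100 * w / top for c, w in cats.items()}
            for p, cats in shifted.items()}

coefs = {  # illustrative coefficients, baseline category = 0.0
    "sex": {"female": 0.0, "male": 0.3},
    "father_edu": {"masters": 0.0, "compulsory": 1.2},
}
weights = rescale_weights(coefs)
score = weights["sex"]["female"] + weights["father_edu"]["compulsory"]
print(round(score))  # 80
```

A child’s risk score is then simply the sum of the weights attached to her characteristics, which is what makes the resulting scores transparent and easy to compute in practice.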

Figure 4 illustrates the computation of the weights for the educational attainment outcome. Other outcomes and the exact values of the weights are shown in Online Appendix Figure F.6 and Table G.9, respectively. The baseline child, relative to which the weights are defined and marked by the leftmost circles with a gray edge in each column, is characterized as follows: female, master’s degree/PhD (both mother and father), and tenth income decile (both mother and father). We see that the baseline child has a score of 0 (= 0 + 0 + 0 + 0 + 0). If the child’s father had only compulsory schooling instead of a master’s degree/PhD, her score would increase to 28 (= 0 + 28 + 0 + 0 + 0). Even though the coefficients are estimated with some uncertainty, as indicated by the 95 percent confidence bands, it is reassuring to see that, in general, income and education show the expected positively monotone relationship with the outcome.

Figure 4

Optimal Risk Scores for Educational Attainment Outcome

Notes: This figure illustrates how to construct optimal risk scores for the outcome “education (compulsory schooling only).” The baseline child is indicated by the leftmost circles with a gray edge and has the following characteristics: female, master’s degree/PhD (mother/father), and tenth income decile (mother/father). The score of the baseline child is the sum of the points in the leftmost circles. For children with other characteristics, the score is obtained by adding the points indicated to the right of the baseline child. The characteristics are given in the following order: master’s degree/PhD, bachelor/vocational bachelor, short cycle higher education, high school, vocational education and training, compulsory schooling (mother/father), and tenth decile, ninth decile, …, first decile (mother/father). All figures are based on coefficients from logit regressions that have been rescaled such that the minimum possible score is zero and the maximum possible score is 100. Vertical bars represent 95 percent confidence bands based on robust standard errors.

What lessons can we draw about optimal weights in general? First, males tend to have a higher risk of adverse outcomes, except for health-related outcomes. Income and education seem in general to contribute equally to the risk scores, when measuring contribution as the difference between the lowest and the highest score within a predictor. That said, education contributes more to the risk score than income when predicting education, and vice versa.

Second, maternal and paternal education affect the risk scores in a similar manner for most outcomes. An interesting observation with respect to parental income is that maternal income plays a small role as long as the mother is in the upper 80 percent of the distribution, whereas being in the bottom 20 percent causes the risk score to spike. The relationship is generally more linear for paternal income.

Third, values at the bottom of the education and income distribution in particular substantially raise the risk score. For income, these are the bottom 30 percent of fathers and the bottom 20 percent of mothers. For education, this corresponds to the parent having only compulsory schooling or a vocational education. This finding implies that risk scores in which income or years of schooling enter linearly, as in the Perry Preschool Project and the Carolina Abecedarian Project, assign too little weight to children with parents in the bottom of the distribution. Of course, this might be driven by our specific choice of outcomes that also focus on the bottom of the respective distribution. Predictions of, for instance, average hospitalizations rather than top 20 percent hospitalizations might depend much more linearly on parental SES. However, as we argued above, it is precisely those children in the top 20 percent of the distribution that cause a disproportionate burden to the welfare state. Targeting these children effectively seems appropriate from an economic point of view.12

V. Discussion and Conclusion

We use prediction methods to show that efficient targeting in childhood interventions is possible even if only variables available at birth are at the decision-maker’s disposal. This applies to interventions addressing a wide range of long-term outcomes, ranging from labor market outcomes to health and crime. We also find that predictions do not improve much by adding post-birth indicators. We demonstrate that a parsimonious set of variables consisting of sex, parental education, and parental income predicts almost as well as the full set of predictors. Finally, we provide econometrically derived optimal weights for the formation of risk scores that differ from the weights typically used in the literature.

In our simple motivational model, where treatment effects were assumed to be homogeneous among at-risk children, risk scores could be directly used for assigning children to interventions—children with higher risk scores should be selected for the intervention. In reality, however, treatment effects are likely heterogeneous and not necessarily correlated with risk (Athey 2017; Kleinberg et al. 2015; Ascarza 2018). This has implications for treatment assignment, as targeting high-risk children yields little value if they are not actually the ones benefiting from an intervention. For example, patients with a family history of severe genetically determined chronic disease might not respond to any type of treatment.

Given the reality of heterogeneous treatment effects, data-driven risk scores should not be used alone to assign treatment. In our view, risk scores better serve as a tool that aids human experts in choosing which children should be targeted for some intervention. Given a pool of children identified as at risk by means of algorithms, experts can employ their expertise to select the children deemed most suited for intervention. With several interventions available, experts can assign each child the most appropriate type of treatment. Of course, for some types of risks, for example, crime and drug abuse, where social interactions may be important determinants, intervention may more fruitfully be targeted to all children within, say, residential areas rather than to specific children.

The idea of combining algorithms with human judgment is used in criminal justice, for example, where risk scores inform justices, who then make the final decisions on bail, sentencing, or parole (Stevenson and Doleac 2024; Ludwig and Mullainathan 2021; Berk 2019). See also Gennatas et al. (2020) for an interesting discussion of how to integrate human expert assessments with machine learning algorithms.

Risk scores may also be useful in randomized controlled trials (RCTs) that test the effectiveness of a novel intervention using treatment and control groups. Administering an RCT to a selected group of children identified as at risk, rather than to the full cohort, saves resources while targeting those children who presumably benefit the most from the intervention. This was the motivation behind constructing the risk scores used in the Perry Preschool Project and the Carolina Abecedarian Project.

Our finding that predictions do not improve much by adding post-birth indicators is contingent on using only predictors typically available in registers. Post-birth measures of child development or parenting style are likely to have predictive power as well. For example, Dale et al. (2023) augment Danish register data with measures of the child’s expressive vocabulary and parental investment (daily singing/reading) at age 16–30 months and show that adding these variables improves predictions of school grades at age 15. However, these variables are typically not available in register data and are costly to obtain at large scale, which poses a problem if the goal is to identify at-risk children in the larger population. Though beyond the scope of this paper, predictions from a richer small sample could be informative about those predictors in the register data that are particularly valuable or suggest a different weighting over them.13

So far, we have been deliberately unspecific about the nature of the intervention to which our risk scores can be applied. This is because an intervention can take many forms depending on the context. At the broadest level, the intervention could consist of the provision of free or subsidized public childcare to parents. A related type of intervention is a center-based program, such as the Perry Preschool Project or the Carolina Abecedarian Project, that goes beyond basic childcare by closely involving parents or offering healthcare. Finally, in a narrower sense, interventions can also be thought of as specific programs aimed at improving the learning environment within the daycare center (for example, Bleses et al. 2018a,b) or at home (for example, Andersen and Nielsen 2016; Hjort, Sølvsten, and Wüst 2017). See Rosholm et al. (2021) for an overview of recent RCTs conducted in the context of the Danish welfare state, covering various interventions throughout childhood.

The setting of our study is the generous Danish welfare state, which raises the question of whether our findings generalize to countries with a less generous welfare state, such as the US. In particular, the Danish welfare state features a high overall spending level and universal access to childcare and healthcare, which could affect the predictive patterns uncovered in this study. Heckman and Landersø shed light on this question in a series of recent papers (Landersø and Heckman 2017; Heckman and Landersø 2022). They show that family background, measured as parental education, is approximately as good a predictor of various child outcomes in Denmark as in the US. This applies to outcomes as diverse as those in our study: birth weight, admission to a neonatal care unit, sociability, language test scores, criminal convictions, educational attainment, labor market status, and mortality. The authors conclude that “despite generous social policies, family influence on many child outcomes in Denmark is comparable to that in the US. Common forces are at work in both countries that are not easily mitigated by welfare state policies” (Heckman and Landersø 2022).

A notable exception to the similarity between Denmark and the US is income, which is more loosely related to parental characteristics in Denmark than in the US (Landersø and Heckman 2017).14 Heckman and Landersø attribute this to ex post redistribution through the Danish tax and transfer system, not ex ante equalization of individual skill acquisition. For income, the role of parental background will likely be more pronounced in countries with a less generous welfare state than Denmark. Overall, however, the research suggests that the predictive patterns uncovered in this study carry over to other countries, in particular the US.

The question remains, of course—should risk scores be applied in practice? There are a number of caveats. First, the Lucas critique obviously applies to our setting, as the risk scores computed here are not policy-invariant. If, in reaction to a risk score, a predictor is successfully targeted by policy, then its predictive power will become attenuated; in the extreme, even though relevant, it might no longer show up as predictive at all. Similarly, the risk scores derived in this study could be affected by historical policies. For example, our finding that birth order has little predictive power could be due to interventions effectively targeting children based on this variable. Ignoring birth order in defining risk would then be a grave mistake. The sensitivity of machine learning algorithms to policy interventions that change the stability of the relation between predictors and outcomes has also been discussed in Athey (2017) and Hofman et al. (2021).

Second, as a related point, the predictors used in this study were recorded more than 30 years ago. To the extent that the relationship between at-birth predictors and long-term outcomes is different today, the risk scores we computed might no longer be optimal for cohorts born today. The justification for our approach is that long-term outcomes must be observed to be able to construct meaningful risk scores. The alternative to anchoring risk scores in long-term outcomes is to assign ad hoc weights, which most likely results in less efficient targeting. When training the model on one cohort and evaluating it on another cohort, we find no evidence for an unstable relation between predictors and outcomes in the short run (Online Appendix Figure F.8).

Third, parents may try to manipulate the risk score in order to receive treatment for their child. In our setting, the potential for manipulation is small because data were directly taken from the official registers. Survey-based data are more likely to contain false records. Besides misreporting, parents could, of course, also directly lower their income to become eligible.

Finally, targeting raises ethical issues. Dare (2013) discusses several of these issues within the context of a predictive risk model for child maltreatment. The computation and publication of risk scores might be viewed as reducing the child to a number. Moreover, there is a large difference between a calculated risk score and a realized disadvantage. Intervening on the basis of a perceived risk raises questions concerning, for example, the rights of parents to raise their children as they see fit. Furthermore, if assignment of children identified as “high risk” to an intervention is considered a stigma, then the positive effects of treatment may be counteracted by the negative effects of the stigma. Resistance to using data-based algorithms in social settings might also come from fears that algorithms are biased and tend to perpetuate existing inequalities (O’Neil 2016), even if the algorithm is actually designed to achieve the opposite. These concerns are very real and should be addressed. In the context of child maltreatment, Dare (2013) concludes that the potential gains outweigh any ethical reservations, but in our context, it is perhaps not so clear-cut. See Kleinberg et al. (2018b) and Holm (2019) for other discussions of the ethical aspects of predictive risk models. Discussing these issues more deeply is beyond the scope of this paper, which aims to demonstrate the extent to which targeting is practically feasible. We thus hope our work will serve as a basis for further discussion, both inside and outside of academia.

Acknowledgments

The authors thank seminar participants at Aarhus University and three anonymous referees for useful comments and suggestions. This paper uses confidential administrative data from Denmark. The data can be obtained by filing a request directly to Statistics Denmark’s Division of Research Services (www.dst.dk/en/TilSalg/Forskningsservice). The authors are willing to assist in the process.

Footnotes

  • ↵1. Kline and Walters (2016) show that children participating in Head Start do not benefit from the program if they would have otherwise attended another preschool, but they benefit significantly if the alternative would have been being cared for at home. As another example, Cornelissen et al. (2018) find that children with immigrant ancestry benefit most from attending childcare because their alternative care arrangements are of relatively low quality. Similarly, Havnes and Mogstad (2015) study the effect of childcare attendance on earnings in adulthood and show that children from low-income families gain substantially, whereas children from upper-class families suffer a loss in earnings.

  • ↵2. Heckman (2012) states: “The highest rate of return in early childhood development comes from investing as early as possible, from birth through age five, in disadvantaged families.”

  • ↵3. The Perry Preschool Project targeted African American children with low IQ from families that performed poorly on a cultural deprivation scale based on paternal occupation, parental education, and density in the home (persons per room) (Weikart 1967). The Carolina Abecedarian Project constructed a high risk index based on parental education and income and ten additional minor indicators (Ramey et al. 1974). The Early Training Project considered housing, parental education, and parental occupation (Klaus and Gray 1968). Eligibility for Head Start is mainly determined by parental income (US Department of Health & Human Services 2021).

  • ↵4. See also Gennatas et al. (2020) for an interesting discussion of how to integrate human expert assessments with machine learning algorithms.

  • ↵5. A particularly relevant study in this context is by Richmond-Rakerd et al. (2020), who also use Danish administrative data. They show that adult health, crime, and social welfare are unequally distributed across children and correlated within children. Our definition of outcomes varies slightly from theirs: we consider the number of hospitalizations rather than the number of hospital days and criminal charges rather than criminal convictions. Richmond-Rakerd et al. (2020) do not attempt to predict outcomes with information available at birth.

  • ↵6. We include these results for the purpose of comparison with Caspi et al. (2017).

  • ↵7. See also Varian (2014) for an introduction to machine learning from an economist’s perspective.

  • ↵8. The coefficient estimates of the corresponding logit regressions can be found in Online Appendix Tables G.5 and G.6.

  • ↵9. Hertz et al. (2008) estimate that the intergenerational correlation for education in Denmark is 0.30. Andrade and Thomsen (2018) find values in the range between 0.35 and 0.39. In contrast, the intergenerational correlation for income is typically much smaller. For gross income including public transfers, Landersø and Heckman (2017) estimate it to be 0.21, while Andersen (2021), studying total income before deductions and taxes, finds values of 0.05–0.06 (maternal income) and 0.13–0.21 (paternal income). Our measure of disposable family income reflects the progressivity of the Danish tax system, so these estimates are probably too large. Intergenerational correlations for health tend to be even smaller, at least with respect to fathers. Andersen (2021) finds values of 0.11–0.12 (paternal health) and 0.13–0.14 (maternal health).

  • ↵10. Andersen (2021) finds rank–rank slopes for intergenerational health outcomes in Denmark that are only half the size of those found by Halliday, Mazumder, and Wong (2018) for the US, a country with a considerable fraction of uninsured people.

  • ↵11. In certain contexts, particularly in the US, only sex and income might be available, but not education. Targeting only on sex and income yields a substantially lower AUC than when either occupation or education are also included. In this case, however, extending the time frame leads to larger improvements in performance for several outcomes, suggesting that education, which is highly correlated with permanent income, can partially be substituted for by post-birth income (see Online Appendix Figure F.5 in analogy to Figure 3). However, the gap to the full model is far from fully closed.

  • ↵12. In tree-based approaches, such as random forest and gradient boosting, computing risk scores is not as straightforward as in logit. One way to quantify the importance of predictors is to measure the so-called gain, that is, the contribution of each predictor to the model based on the total gain of this predictor’s splits. However, note that these methods’ built-in randomness with respect to the predictors and observations used in each step of the algorithm tends to attribute importance somewhat evenly across predictors. That said, Online Appendix Figure F.7 confirms that SES-related predictors such as income and education are most important in our gradient boosting model. Online Appendix Table G.10 lists the top five most important individual predictors in gradient boosting; it shows that having a parent—particularly a mother—at the bottom of the education distribution (compulsory education only) or income distribution (first decile) has the most important contribution to the prediction in the gradient boosting algorithm, similar to the risk scores based on the logit regression.

  • 13. See Hellerstein and Imbens (1999) for an example of combining large and small data sets.

  • 14. Consistent with this result, we also find relatively low predictive accuracy for the income outcome in this study.

  • Received March 2020.
  • Accepted May 2023.

This open access article is distributed under the terms of the CC-BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0) and is freely available online at: https://jhr.uwpress.org.

References

  1. Allen, Graham. 2011. “Early Intervention: The Next Steps.” Technical report.
  2. Andersen, Carsten. 2021. “Intergenerational Health Mobility: Evidence from Danish Registers.” Health Economics 30(12):3186–202.
  3. Andersen, Simon Calmar, and Helena Skyt Nielsen. 2016. “Reading Intervention with a Growth Mindset Approach Improves Children’s Skills.” Proceedings of the National Academy of Sciences 113(43):12111–13.
  4. Andrade, Stefan, and Jens-Peter Thomsen. 2018. “Intergenerational Educational Mobility in Denmark and the United States.” Sociological Science 5:93–113.
  5. Ascarza, Eva. 2018. “Retention Futility: Targeting High-Risk Customers Might Be Ineffective.” Journal of Marketing Research 55(1):80–98.
  6. Athey, Susan. 2017. “Beyond Prediction: Using Big Data for Policy Problems.” Science 355(6324):483–85.
  7. Bansak, Kirk, Jeremy Ferwerda, Jens Hainmueller, Andrea Dillon, Dominik Hangartner, Duncan Lawrence, and Jeremy Weinstein. 2018. “Improving Refugee Integration Through Data-Driven Algorithmic Assignment.” Science 359(6373):325–29.
  8. Berk, Richard A. 2019. Machine Learning Risk Assessments in Criminal Justice Settings. New York: Springer International Publishing.
  9. Bhattacharya, Debopam, and Pascaline Dupas. 2012. “Inferring Welfare Maximizing Treatment Assignment Under Budget Constraints.” Journal of Econometrics 167(1):168–96.
  10. Bleses, Dorthe, Anders Højen, Philip S. Dale, Laura M. Justice, Line Dybdal, Shayne Piasta, Justin Markussen-Brown, Laila Kjærbæk, and E.F. Haghish. 2018a. “Effective Language and Literacy Instruction: Evaluating the Importance of Scripting and Group Size Components.” Early Childhood Research Quarterly 42:256–69.
  11. Bleses, Dorthe, Anders Højen, Laura M. Justice, Philip S. Dale, Line Dybdal, Shayne B. Piasta, Justin Markussen-Brown, Marit Clausen, and E.F. Haghish. 2018b. “The Effectiveness of a Large-Scale Language and Preliteracy Intervention: The SPELL Randomized Controlled Trial in Denmark.” Child Development 89(4):e342–e363.
  12. Bunkley, Nick. 2008. “Joseph Juran, 103, Pioneer in Quality Control, Dies.” New York Times, March 3.
  13. Caspi, Avshalom, Renate M. Houts, Daniel W. Belsky, Honalee Harrington, Sean Hogan, Sandhya Ramrakha, Richie Poulton, and Terrie E. Moffitt. 2017. “Childhood Forecasting of a Small Segment of the Population with Large Economic Burden.” Nature Human Behaviour 1(1):1–10.
  14. Chandler, Dana, Steven D. Levitt, and John A. List. 2011. “Predicting and Preventing Shootings among At-Risk Youth.” American Economic Review 101(3):288–92.
  15. Chittleborough, Catherine R., Amelia K. Searle, Lisa G. Smithers, Sally Brinkman, and John W. Lynch. 2016. “How Well Can Poor Child Development Be Predicted from Early Life Characteristics?” Early Childhood Research Quarterly 35:19–30.
  16. Cornelissen, Thomas, Christian Dustmann, Anna Raute, and Uta Schönberg. 2018. “Who Benefits from Universal Child Care? Estimating Marginal Returns to Early Child Care Attendance.” Journal of Political Economy 126(6):2356–409.
  17. Cunha, Flavio, James J. Heckman, and Susanne M. Schennach. 2010. “Estimating the Technology of Cognitive and Noncognitive Skill Formation.” Econometrica 78(3):883–931.
  18. Dale, Philip S., Alexander Paul, Michael Rosholm, and Dorthe Bleses. 2023. “Prediction from Early Childhood Vocabulary to Academic Achievement at the End of Compulsory Schooling in Denmark.” International Journal of Behavioral Development 47(2):123–34.
  19. Dare, Tim. 2013. “Predictive Risk Modelling and Child Maltreatment: An Ethical Review.” Wellington, New Zealand: Ministry of Social Development.
  20. Del Boca, Daniela, Christopher Flinn, and Matthew Wiswall. 2014. “Household Choices and Child Development.” Review of Economic Studies 81(1):137–85.
  21. Gennatas, Efstathios D., Jerome H. Friedman, Lyle H. Ungar, Romain Pirracchio, Eric Eaton, Lara G. Reichmann, Yannet Interian, José Marcio Luna, Charles B. Simone II, Andrew Auerbach, Elier Delgado, Mark J. van der Laan, Timothy D. Solberg, and Gilmer Valdes. 2020. “Expert-Augmented Machine Learning.” Proceedings of the National Academy of Sciences 117(9):4571–77.
  22. Glaeser, Edward L., Andrew Hillis, Scott Duke Kominers, and Michael Luca. 2016. “Crowd-Sourcing City Government: Using Tournaments to Improve Inspection Accuracy.” American Economic Review 106(5):114–18.
  23. Halliday, Timothy, Bhashkar Mazumder, and Ashley Wong. 2018. “Intergenerational Health Mobility in the US.” IZA Discussion Paper 11304. Bonn, Germany: IZA.
  24. Havnes, Tarjei, and Magne Mogstad. 2015. “Is Universal Child Care Leveling the Playing Field?” Journal of Public Economics 127:100–114.
  25. Heckman, James J. 2006. “Skill Formation and the Economics of Investing in Disadvantaged Children.” Science 312(5782):1900–1902.
  26. Heckman, James J. 2008. “Schools, Skills, and Synapses.” Economic Inquiry 46(3):289–324.
  27. Heckman, James J. 2012. “Invest in Early Childhood Development: Reduce Deficits, Strengthen the Economy.” The Heckman Equation. https://heckmanequation.org/resource/invest-in-early-childhood-development-reduce-deficits-strengthen-the-economy/ (accessed July 23, 2025).
  28. Heckman, James, and Rasmus Landersø. 2022. “Lessons for Americans from Denmark About Inequality and Social Mobility.” European Association of Labour Economists, World Conference EALE/SOLE/AASLE, Berlin, Germany, June 25–27, 2020. Labour Economics 77:101999.
  29. Hellerstein, Judith K., and Guido W. Imbens. 1999. “Imposing Moment Restrictions from Auxiliary Data by Weighting.” Review of Economics and Statistics 81(1):1–14.
  30. Hertz, Tom, Tamara Jayasundera, Patrizio Piraino, Sibel Selcuk, Nicole Smith, and Alina Verashchagina. 2008. “The Inheritance of Educational Inequality: International Comparisons and Fifty-Year Trends.” B.E. Journal of Economic Analysis & Policy 7(2):10.
  31. Hjort, Jonas, Mikkel Sølvsten, and Miriam Wüst. 2017. “Universal Investment in Infants and Long-Run Health: Evidence from Denmark’s 1937 Home Visiting Program.” American Economic Journal: Applied Economics 9(4):78–104.
  32. Hofman, Jake M., Duncan J. Watts, Susan Athey, Filiz Garip, Thomas L. Griffiths, Jon Kleinberg, Helen Margetts, Sendhil Mullainathan, Matthew J. Salganik, Simine Vazire, Alessandro Vespignani, and Tal Yarkoni. 2021. “Integrating Explanation and Prediction in Computational Social Science.” Nature 595:181–88.
  33. Holm, Elizabeth A. 2019. “In Defense of the Black Box.” Science 364(6435):26–27.
  34. Kitagawa, Toru, and Aleksey Tetenov. 2018. “Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice.” Econometrica 86(2):591–616.
  35. Klaus, Rupert A., and Susan W. Gray. 1968. “The Early Training Project for Disadvantaged Children: A Report After Five Years.” Monographs of the Society for Research in Child Development 33(4):iii–66.
  36. Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. 2018a. “Human Decisions and Machine Predictions.” Quarterly Journal of Economics 133(1):237–93.
  37. Kleinberg, Jon, Jens Ludwig, Sendhil Mullainathan, and Ziad Obermeyer. 2015. “Prediction Policy Problems.” American Economic Review 105(5):491–95.
  38. Kleinberg, Jon, Jens Ludwig, Sendhil Mullainathan, and Ashesh Rambachan. 2018b. “Algorithmic Fairness.” AEA Papers and Proceedings 108:22–27.
  39. Kline, Patrick, and Christopher R. Walters. 2016. “Evaluating Public Programs with Close Substitutes: The Case of Head Start.” Quarterly Journal of Economics 131(4):1795–848.
  40. Landersø, Rasmus, and James J. Heckman. 2017. “The Scandinavian Fantasy: The Sources of Intergenerational Mobility in Denmark and the US.” Scandinavian Journal of Economics 119(1):178–230.
  41. Ludwig, Jens, and Sendhil Mullainathan. 2021. “Fragile Algorithms and Fallible Decision-Makers: Lessons from the Justice System.” Journal of Economic Perspectives 35(4):71–96.
  42. Manski, Charles F. 2004. “Statistical Treatment Rules for Heterogeneous Populations.” Econometrica 72(4):1221–46.
  43. McKenzie, David, and Dario Sansone. 2019. “Predicting Entrepreneurial Success Is Hard: Evidence from a Business Plan Competition in Nigeria.” Journal of Development Economics 141:102369.
  44. Mullainathan, Sendhil, and Jann Spiess. 2017. “Machine Learning: An Applied Econometric Approach.” Journal of Economic Perspectives 31(2):87–106.
  45. O’Neil, Cathy. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York: Crown.
  46. Ramey, Craig T., et al. 1974. “The Carolina Abecedarian Project: A Longitudinal and Multidisciplinary Approach to the Prevention of Developmental Retardation.” ERIC Report ED 030 490. Chapel Hill, NC: North Carolina University, Frank Porter Graham Center.
  47. Richmond-Rakerd, Leah S., Stephanie D’Souza, Signe Hald Andersen, Sean Hogan, Renate M. Houts, Richie Poulton, Sandhya Ramrakha, Avshalom Caspi, Barry J. Milne, and Terrie E. Moffitt. 2020. “Clustering of Health, Crime and Social-Welfare Inequality in 4 Million Citizens from Two Nations.” Nature Human Behaviour 4(3):255–64.
  48. Rosholm, Michael, Alexander Paul, Dorthe Bleses, Anders Højen, Philip S. Dale, Peter Jensen, Laura M. Justice, Michael Svarer, and Simon Calmar Andersen. 2021. “Are Impacts of Early Interventions in the Scandinavian Welfare State Consistent with a Heckman Curve? A Meta-Analysis.” Journal of Economic Surveys 35(1):106–40.
  49. Sansone, Dario. 2019. “Beyond Early Warning Indicators: High School Dropout and Machine Learning.” Oxford Bulletin of Economics and Statistics 81(2):456–85.
  50. Şara, Nicolae-Bogdan, Rasmus Halland, Christian Igel, and Stephen Alstrup. 2015. “High-School Dropout Prediction Using Machine Learning: A Danish Large-Scale Study.” In ESANN 2015: 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ed. Michel Verleysen, 319–24.
  51. Stevenson, Megan T., and Jennifer L. Doleac. 2024. “Algorithmic Risk Assessment in the Hands of Humans.” American Economic Journal: Economic Policy 16(4):382–414.
  52. US Department of Health & Human Services. 2021. “Poverty Guidelines and Determining Eligibility for Participation in Head Start Programs.” https://eclkc.ohs.acf.hhs.gov/eligibility-ersea/article/poverty-guidelines-determining-eligibility-participation-head-start (accessed July 23, 2025).
  53. Varian, Hal R. 2014. “Big Data: New Tricks for Econometrics.” Journal of Economic Perspectives 28(2):3–28.
  54. Weikart, David P. 1967. “Preliminary Results from a Longitudinal Study of Disadvantaged Preschool Children.” ERIC Report ED 030 490. Presented at the 1967 convention of the Council for Exceptional Children, St. Louis, MO.
Keywords

  • I18
  • I28
  • I38
© 2026 Board of Regents of the University of Wisconsin System