Abstract
We evaluate a primary school scholarship program in Cambodia with two different targeting mechanisms, one based on poverty level and the other on baseline test scores (“merit”). Both approaches increased enrollment and attendance. Only the merit-based targeting induced positive effects on test scores. This asymmetry is unlikely to have been driven by differences in recipients’ characteristics. We marshal evidence suggesting that the framing of the scholarships might have led to different impacts. In order to balance equity and efficiency, a two-step targeting approach might be preferable: first, identify low-income individuals, and then, among them, target based on merit.
I. Introduction
Studies of the impacts of programs that incentivize schooling through cash transfers have typically found that direct and indirect costs are important determinants of school participation and that programs reducing such costs are quite effective in inducing higher enrollment and attendance rates. However, the programs do not show consistent positive impacts on learning outcomes.
One hypothesis that emerges from this literature (particularly the literature from developing countries) is that monetary incentives may increase the participation of low-achieving students in school and that school systems may be ill prepared to teach them well (Behrman, Parker, and Todd 2011; Filmer and Schady 2009, 2014). This gives rise to a potential equity-efficiency tradeoff: targeting high-achieving students through an incentive based on academic performance such as a merit scholarship may yield higher learning outcomes. But if academic performance is correlated with economic circumstance, then those outcomes would come at the cost of reaching the poor. Poverty-targeted incentives, on the other hand, do reach the poor and do induce greater schooling—but if there is little learning to show for it, one might question the usefulness of that approach. The potential tradeoff between equity and efficiency has been at the center of discussions in many contexts: discussions on college scholarships in the United States (Orfield 2002), social programs in developing countries (Coady, Grosh, and Hoddinott 2003), and more generally on poverty reduction strategies in the presence of tight budget constraints (Bardhan 1996).
This paper addresses the equity-efficiency tradeoff directly in the context of Cambodia, a low-income country, by evaluating the impacts of a government scholarship program that started in 2008.1
The main objective of the government with this program was to increase school enrollment and progression in Grades 4–6. The program included two approaches to targeting that were run in parallel: Schools in one group offered scholarships based on the economic status of the student’s household (which we refer to as poverty-based targeting), and schools in another group offered scholarships based on the student’s performance on a baseline test (which we refer to as merit-based targeting). Other than the mechanism for selecting the scholarship recipients, the conditions of the program under both targeting mechanisms were identical.
We find that both targeting approaches increase enrollment and attendance indicators. For example, the probability of reaching Grade 6 increased by 19 percentage points for poverty scholarship recipients and by 14 percentage points for merit scholarship recipients (over a counterfactual rate of 60 percent). However, only the merit-based scholarship produces a positive impact on achievement as measured by test scores: Recipients’ performance on a math test was 0.17 standard deviations higher and on the Digit Span test was 0.15 standard deviations higher, with both of these estimates statistically significantly different from zero. On the other hand, estimates of these impacts among poverty scholarship recipients were close to zero (−0.05 and −0.06 standard deviations respectively for the math and Digit Span tests) and insignificantly different from zero.
This asymmetry in results does not appear to be driven by the characteristics of the students themselves. High baseline achievers who received the poverty-based scholarship performed no better in followup tests than the corresponding control group. On the other hand, similarly poor individuals who received a merit-based scholarship did perform better on the followup test (as did nonpoor merit-based scholarship recipients). In summary, “identical” children who received differently labeled scholarships experienced different impacts on learning outcomes. We provide suggestive evidence that merit-based scholarship recipients and their families exerted additional effort as a result of the scholarship, as measured by homework and by expenditures on education, whereas poverty-based scholarship recipients did not.
Our findings suggest that the impact of the program was dependent on the way that the intervention was framed. We document pathways that are consistent with motivation being an important determinant of the program’s success in incentivizing students, and that the framing of the scholarship itself affected motivation. In order to maximize both equity and efficiency objectives, these results suggest that demand-side programs should ensure that student (and family) motivation is enhanced through the program rather than potentially undermined. An intervention that targets students with an incentive that recognizes their high academic potential, while ensuring that the poorest students are among that set, would be one potential way to achieve this.
We cannot investigate all the pathways for how framing might matter—for example, the data do not allow us to assess the extent to which teachers might respond differently to being presented with a poverty-based scholarship recipient as opposed to a merit-based scholarship recipient. In addition, as with other evaluations of this type, our results should be understood as conditional on the environment in which this intervention is taking place. Based on this intervention, we do not know whether wholesale simultaneous improvement in the quality of schooling provided would have resulted in better learning outcomes among recipients of the poverty-based scholarships. Further research to understand how additional channels might mediate impacts, or what alternative approaches might work for improving learning outcomes among the poor, would be important to provide further guidance to policy. But our results suggest that even a modest change in how conditional cash transfers are implemented—that is, altering their framing so that they motivate greater effort—can positively affect impacts.
This article is organized as follows. After motivating the analysis and discussing the pertinent literature in Section II, we describe the setting, the program, and the evaluation design in Section III. In Section IV, we present the empirical strategy, the data, and the validation of the identification strategy. The main results appear in Section V. In Section VI, we explore pathways for the impacts and the results on the equity-efficiency tradeoff. A conclusion follows, which reviews the implications of the findings for designing incentive programs and points to potential future research.
II. Motivation: The Efficiency of Incentive Programs
Monetary incentive programs (scholarships included) are thought to induce greater schooling for three main reasons. First, the direct and indirect costs of attending school, along with the lack of financing to cover those costs, may deter families from making optimal decisions on education; reducing those costs may induce greater investment in education. Second, students and families may discount future returns to education very heavily and, as a result, they may not invest the optimal effort in education. Monetary incentives will increase the short-run benefits of such investments. Third, families and students may not have complete information on the returns to education. Monetary incentives may serve as a means of signaling that education is important.
Scholarship programs, such as the one evaluated here, can be thought of as a particular type of Conditional Cash Transfer (CCT) program—one that has an individual rather than a family as the recipient, and one that focuses on a single sector (education) rather than being conditional on actions related to two sectors (health and education).2 Extensively studied and increasingly popular in much of the developing world, CCT programs have become the largest form of social assistance provided in several countries (see the reviews in Fiszbein and Schady 2009, Independent Evaluation Group 2011, Baird et al. 2013, Saavedra and Garcia 2013). Much of the rigorous evidence on the impact of CCTs on student enrollments has come from middle-income countries, mainly in Latin America, where baseline enrollment rates are high and impacts relatively small (in part because there is little room for an increase). In contrast, this paper presents evidence from a scholarship program in Cambodia where enrollment rates are low. It adds to the much smaller evidence base on CCTs from low-income countries, such as Pakistan (Chaudhury and Parajuli 2008) and Malawi (Baird, McIntosh, and Özler 2011), as well as evidence from other programs in Cambodia (Filmer and Schady 2008, 2011, 2014).
CCT programs designed to increase enrollment and attendance have indeed raised these measures of school participation, but in some settings they have resulted in negative or insignificant changes in learning outcomes (Behrman, Parker, and Todd 2005; Behrman, Sengupta, and Todd 2000; Filmer and Schady 2014; Krishnaratne, White, and Carpenter 2013; McEwan 2015; Murnane and Ganimian 2014), whereas in others they have had positive results (Baird, McIntosh, and Özler 2011; Barham et al. 2013). Similarly, the effects of cash incentives linked to changes in achievement itself have yielded mixed results. These programs aim to encourage the performance of students who are already in the educational system. Two recent evaluations in the United States showed mixed (Fryer 2011) or no impact (Bettinger 2012) on standardized tests; an evaluation of a program in Israel showed positive impacts on exit exams, takeup rates, and college matriculations (Angrist and Lavy 2009). Finally, a mixed approach using merit-based scholarships in Kenya yielded positive effects on test scores in one of two districts (Kremer, Miguel, and Thornton 2009).3 The question of how to effectively turn incentives for schooling into incentives for learning remains open.
In Cambodia, two evaluations of the impact of scholarships for lower secondary school have shown substantial increases in school enrollment and attendance as a direct consequence of the programs (Filmer and Schady 2008, 2011, 2014). Recipients were 20–30 percentage points more likely to be enrolled and attending school as a result of the scholarships. These studies also show that scholarships targeted to lower secondary school students led to more family expenditure on education and less work for pay among recipients. Impacts on learning outcomes were limited. The authors argue that the limited impacts point to potential issues in the quality of education and the match between students’ skill levels and instruction (particularly among students induced to stay in school as a result of the scholarship).
Three main explanations are typically put forward for mixed findings on test scores. First, the positive results may reflect the capacity of monetary incentives to act as an extrinsic motivator (for the student and family), especially among individuals from low-income families. Such short-term motivation may be important when information on the returns to education is imperfect and the discount rate is very high. Bettinger (2012) argues that these considerations may be highly relevant for primary school students in the United States. Second, scholarship programs can induce a negative impact by reducing intrinsic motivation. A person’s motivation to perform well can decline if she comes to view performing well as an obligatory activity for achieving a certain goal, in this case a scholarship (for example, see Lepper, Greene, and Nisbett 1973; Deci, Koestner, and Ryan 2001). Finally, these programs may have no impact on academic achievement if students are unable to respond to the incentives. Students may not know how to convert the incentive into actions that influence achievement (Fryer 2011). A related issue is that the incentive may have strong complementarities with other inputs that are out of a student’s control and that the scholarship program does not alter, such as the quality or appropriateness of the teaching (Fryer 2011): A student or her family may have a substantial amount of control over whether she attends school but far less control over the factors that enable that schooling to be converted into learning.
We explore an additional explanation that is consistent with the contrasting impacts of the poverty-based and merit-based targeted scholarships we study here. If the targeting approach itself changes the frame within which various actors are responding to the incentive, and that frame matters for impact, then the impact of the program on test scores will depend on the actual targeting mechanism. This line of reasoning is aligned with a (large) psychology literature on “stereotype threat” (for a general theoretical framework, see Schmader, Johns, and Forbes 2008). When individuals are faced with a label that carries a social stigma, the labeling interferes with intellectual performance. The mechanisms by which labeling can affect performance are diverse: anxiety, stereotype activation, self-doubt, working memory, and arousal (Schmader, Johns, and Forbes 2008). For example, African Americans responded to stereotype threats by lowering performance on tests (Steele and Aronson 1995). Labeling associated with caste in India affected the performance of lower-caste individuals when the caste was publicly announced (Hoff and Pandey 2006). Aronson et al. (1999) show that this effect can be present even in the absence of a social stigma attached to the label: white males underperformed on a math test when primed with the stereotype that Asian students excel at math. But there is a broader set of ways that labeling can matter: It sends signals and messages to the recipients and others, and different signals and messages may induce different behaviors. After describing the program, the evaluation, and the results, we return to this issue in the interpretation of our findings.
III. Program and Evaluation Design
Cambodia has a recent record of using demand-side incentives to raise school enrollment and attendance rates. Some of these programs operate at the primary school level—such as school feeding programs or small-scale programs that offer incentives for children to attend school—but most are targeted at lower secondary school (Filmer and Schady 2008, 2014). The programs do not simply waive school fees; instead, the families of children selected for a “scholarship” receive a small transfer conditional on school enrollment, regular attendance, and satisfactory grades.
One important finding from the evaluations of earlier incentive programs is that, relative to the population as a whole, the targeting only mildly favored the poor. Filmer and Schady (2009) show that one program, despite reaching the poorest children who applied for scholarships, did not reach the poorest of the poor, who had already dropped out of school before Grade 6 when they would have applied for secondary school scholarships. Figure 1 shows the proportion of youth ages 15–19 nationally who completed each grade at the time the program we study in this paper was launched. The figure shows that children from the poorest quintiles are the least likely to reach sixth grade. This finding suggests that it is hard for a program that targets children at the end of Grade 6 to be strongly pro-poor and that a program targeting poor students earlier in the schooling cycle is needed when the goal is to reach the poorest of the poor.
A. The Primary School Scholarship Pilot Program
Based in part on these findings and the desire to assess the viability, effectiveness, and optimal design of such a program, the Government of Cambodia implemented a new pilot scholarship program in 2008. The program’s stated goal was to offset the direct and opportunity costs of schooling and increase educational attainments as a result.4 An implicit goal was also to improve learning outcomes through the additional education. This paper reports the results of the impact evaluation of that pilot program.
The basic design of the primary school scholarship pilot was to select participating schools; randomly assign them to offer scholarships based on poverty or merit; and then, within each school, identify candidates for scholarships based on transparent, clearly articulated criteria. Once selected, scholarship recipients were required to stay enrolled, attend school regularly, and maintain passing grades to keep the scholarship until they graduated from primary school.5 In Cambodia, primary school consists of Grades 1 through 6, and the program targeted students entering the upper-primary level (Grades 4, 5, and 6). The scholarship was equivalent to US$20 per student annually.6 Through a followup survey, we determined that the mean household expenditure per capita per year in our sample was US$610;7 as such, the scholarship represents 3.3 percent of the yearly per capita expenditure (the average household size in our sample is 6.99). The scholarships were intended to be disbursed in two tranches of US$10—the first in the beginning of the year and the second in the middle of the school year. In the first year of the program, however, scholarships were distributed in one lump sum due to delays in implementation.8
The pilot program was implemented in three provinces—Mondulkiri, Ratanakiri, and Preah Vihear—where average dropout rates in the upper-primary grades were highest, according to an analysis of Cambodia’s Education Management Information System (EMIS). To narrow the geographic scope of the program, only seven districts in Ratanakiri (those with the highest dropout rates) were selected for participation, out of a total of nine districts. In the other two provinces, all districts were included. Within these selected districts, all primary schools that offered classes through Grade 6 participated in the program.
Of the 207 program schools, 103 were randomly assigned to join the program in Phase 1 (the program’s first year, 2008/09) and the other 104 in Phase 2 (the second year, 2009/10). In each phase, schools were randomly assigned to either poverty-based or merit-based scholarship targeting. In total, the program offered scholarships to 5,162 students from a pool of 12,066 individuals in the 207 program schools, although our analysis focuses only on the subset offered scholarships in the first year of implementation.
Schools using poverty-based targeting selected students based on a poverty index. All students in Grade 4 filled out a simple form with questions regarding their household and family socioeconomic characteristics.9 These forms were scored according to a strict formula based on weights derived from an analysis of household survey data to derive a poverty index for each student.10 The poverty index ranged from 0 (richest household) to 292 (poorest household). The application forms were scored centrally by a firm contracted specifically for this purpose, thereby reducing the probability of any manipulation of program eligibility. Within each school, the applicants with the highest scores (that is, those with the highest level of poverty) were offered a scholarship.
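The centralized scoring just described can be sketched as follows. The actual form items and weights were derived from an analysis of household survey data and are not reproduced in the paper, so every field name and weight below is invented purely for illustration:

```python
# Hypothetical sketch of the centralized poverty-index scoring.
# All field names and point values are invented; only the 0-292 index
# range and the "poorest applicants first" selection rule come from the text.

WEIGHTS = {
    "no_electricity": 40,
    "thatch_roof": 35,
    "dirt_floor": 30,
    "no_toilet": 45,
    "children_under_14": 12,  # points per child (illustrative)
}

MAX_INDEX = 292  # reported range: 0 (richest household) to 292 (poorest)

def poverty_index(form):
    """Score one application form into a 0-292 poverty index (higher = poorer)."""
    score = 0
    for item, points in WEIGHTS.items():
        if item == "children_under_14":
            score += points * form.get(item, 0)
        elif form.get(item, False):
            score += points
    return min(score, MAX_INDEX)

def select_recipients(forms, n_slots):
    """Within a school, offer scholarships to the n_slots poorest applicants."""
    ranked = sorted(forms, key=poverty_index, reverse=True)
    return ranked[:n_slots]
```

Because scoring is a deterministic function of the form and is applied centrally, neither schools nor families can manipulate eligibility after the forms are submitted, which is the design feature the paragraph above emphasizes.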
Schools using merit-based targeting selected scholarship recipients based on scores on a test administered at baseline. The test included questions in math and Khmer (the national language); the maximum possible score was 25 points, one point per correct answer. The test was adapted from the Grade 3 National Learning Assessment.11 All students in Grade 4 took the test, and within each school the applicants with the highest test scores were offered scholarships. Again, the tests were scored centrally to minimize the risk of program manipulation.
Our analysis exploits the fact that all students in the targeted grade filled out both the “poverty form” as well as the math and Khmer test—regardless of which type of targeting their school would ultimately use to select scholarship recipients. In addition, we use the fact that Grade 4 students in Phase 2 schools (who would be ineligible for scholarships) also filled out the forms and took the tests.
School head teachers, as well as other school-level stakeholders, were aware (after the initial forms were completed and before the scholarship recipients were announced) of which type of scholarship was being allocated in a school. School-level spot checks suggest a high degree of understanding of the nature of the targeting, likely due to the extensive “socialization” of the program that took place.
The number of students within each school who would receive a scholarship was fixed exogenously and set to half the number of registered students in the year prior to the program (as determined by an analysis of EMIS data).12
B. Evaluation Design and Data
Application forms, including baseline tests for the sample that we study in this analysis, were filled out in December 2008/January 2009. Recipient lists were circulated to schools in May 2009, which was prior to the distribution of scholarships that took place at the beginning of July 2009—that is, the end of the 2008/2009 school year.
The identification of impacts relies on the fact that, among the cohort of students studied, fourth graders in Phase 2 schools were not eligible for scholarships in the 2008/09 school year or later. Among these students, we identify the valid counterfactual group—a group that differs, on average, from the treatment group only in that it did not receive scholarships. Because students in the control schools were never exposed to the program (even after the subsequent cohorts in Phase 2 schools became eligible for scholarships), the two groups of students can be tracked over time and their enrollment, attendance, and other outcomes compared.13 Recipients received scholarship disbursements (conditional on remaining in school, attending regularly, and maintaining passing grades) during the 2008/09, 2009/10, and 2010/11 school years.
We use three main data sources to evaluate program impacts. First, we use the full set of data collected for program eligibility: data on baseline household characteristics (which were used to construct the poverty index) as well as mathematics and Khmer language test scores for all students in the 207 schools. Second, we use the official list of students who were offered a scholarship.14 Third, we use endline data that were collected specifically for this evaluation. These data are derived from a survey administered to a random subsample of students from the 207 program schools at the end of the 2010/11 school year, three years after the program began implementation. The students who were in Grade 4 at the start of the program and stayed in school were finishing (or had just finished) Grade 6 at the time of the survey. The survey was administered at home (not in schools) to the child who applied for the scholarship, and it included a household module administered to the child’s mother, father, or other caregiver.
Of the 5,902 students in Grade 4 at baseline (Table 1), we provided the survey firm with a list of 2,952 randomly selected students15 for followup interviews, as well as a list of randomly generated potential “replacement” students to be interviewed if an originally selected student could not be found. The firm located 2,274 of the original students (678 attritors), for an overall Grade 4 attrition rate of 23 percent, and interviewed 174 replacement respondents.
The endline survey includes measures of school attainment, of the “intensity” of school participation (based on questions related to time spent in school), as well as two measures of achievement and cognitive development based on a mathematics and a “Digit Span” test. The items on the mathematics test were drawn from a variety of sources, including the baseline mathematics test, questions from the Grade 6 National Assessment, as well as publicly released items from the Trends in International Mathematics and Science Study (TIMSS) Grade 4 Assessment. A pretest ensured that only items with adequate properties were retained for the final test. The final test measured both knowledge and the capacity to use this knowledge for problem-solving.
Performance on the math test is a measure of the most immediate academic impact of the intervention. The program is expected to have an impact on endline test scores among students selected through each targeting mechanism for two main reasons. First, the program incentivized more enrollment and school attendance; consequently, students are more exposed to school, and through that additional schooling they potentially increase their skills and knowledge. Second, because the program requires that all treated students—those with merit and those with poverty scholarships—maintain passing grades, the program is expected to incentivize students to study more, which in turn should affect their ability to solve mathematical problems.
In a Digit Span test, a series of numbers is read to a respondent, who is then asked to repeat the numbers to the enumerator. The series increase in length from two digits up to nine digits. Respondents are also asked to repeat the numbers back in reverse order. The test is often included in batteries of psychometric tests, has been used in previous analyses of development programs (for example, see Kazianga, de Walque, and Alderman 2012), and is typically interpreted as a measure of short-term memory and working memory capacity (Schmader, Johns, and Forbes 2008). Nevertheless, recent articles suggest that higher academic achievement can also affect cognitive ability (Cascio and Lewis 2006; Hanushek and Woessmann 2008; Carlsson, Dahl, and Rooth 2012). We interpret impacts on both of these tests as measures of academic achievement. Both tests exhibit a high degree of internal consistency: The Cronbach alpha statistic is 0.71 for the math test and 0.65 for the Digit Span test.16 For the multiple-choice math questions, there is also a positive correlation between a correct answer to each item and the overall test score and a negative correlation between each incorrect answer and the overall test score.17
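The Cronbach alpha statistic reported above is computed as k/(k−1) times one minus the ratio of summed item variances to the variance of the total score, where k is the number of items. A minimal sketch (the simulated responses below are illustrative, not the actual test data):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) score matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustration on simulated binary item responses driven by a common
# "ability" factor (invented data):
rng = np.random.default_rng(0)
ability = rng.normal(size=500)
responses = (ability[:, None] + rng.normal(size=(500, 10)) > 0).astype(float)
alpha = cronbach_alpha(responses)  # sizable alpha, since items share a factor
```

Higher inter-item correlation pushes alpha toward 1; uncorrelated items push it toward 0, which is why values of 0.71 and 0.65 indicate reasonable internal consistency for short field tests.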
Two particular characteristics of the data are noteworthy. First, we use followup information collected from home visits rather than at the school. This strategy allows us to avoid the problems that arise when followup information is collected at school, where students who do not attend would not be present. Second, we collect followup data three years after the students first receive the scholarship, which allows us to capture longer-run effects than most evaluations of the impact of school-related interventions capture.
Our analytical sample consists of Grade 4 students selected for followup who were offered a poverty- or merit-based scholarship or would have been offered one had they attended a Phase 1 school (Table 1). In total, 570 and 525 treated students in the merit and poverty groups respectively were randomly selected for followup. The firm found 440 students in the merit group and 436 students in the poverty group. The firm received a list of 579 merit control students and 531 poverty control students to follow; it found, respectively, 421 and 378 students (merit attrition rate 25 percent, poverty attrition rate 23 percent). Finally, the household survey collected information from 26 and 18 individuals in replacement households for the merit and poverty treatment groups respectively, and 53 in each of the control groups. All in all, the sample we analyze includes 940 students in the merit experiment and 885 students in the poverty experiment (treatment and control combined in each case). Analytical issues related to attrition, and adjustments for it, are discussed below.
IV. Empirical Strategy and Study Sample
A. Empirical Strategy
We estimate a reduced-form model relating the program to enrollment and attendance outcomes, test scores, and other outcome/process indicators.
We estimate separate and pooled models. For the poverty-based treatment, the separate estimation is based on the equation

(1)  Y_{i,t1} = β0 + β1·T_i^P + X_{i,t0}′δ + η_{i,t1},

where Y_{i,t1} denotes the value of the outcome variable (for example, school attainment) for individual i at followup (t1); X_{i,t0} is a vector of control variables measured at baseline (for example, household characteristics); and η_{i,t1} captures unobserved student characteristics and idiosyncratic shocks. T_i^P is equal to one if the student was offered a poverty-based scholarship, and zero for students in the control schools. An analogous model, with treatment indicator T_i^M, is estimated for the merit-based scholarships. In each case, the sample consists only of (a) recipients in treatment schools and (b) individuals who would have been recipients of scholarships had they attended a treatment school of that type:

(2)  {i : Index_i ≥ cutoff_s},

where Index refers to the baseline poverty index in the case of the poverty-based scholarship or the baseline test score in the case of the merit-based scholarship, and cutoff_s is the school-specific eligibility threshold. The pooled model combines the two samples (hereafter “pooled regression”) and includes both treatment dummy variables:

(3)  Y_{i,t1} = β0 + β1·T_i^P + β2·T_i^M + X_{i,t0}′δ + η_{i,t1}.
The control variables included in the estimation consist of an indicator for gender; the number of children 14 and under in the household; indicators for whether the household owns a motorcycle, a car/truck, an ox/buffalo, a pig, an ox or buffalo cart; indicators for whether the house has a hard roof, a hard wall, a hard floor, an automatic toilet, a pit toilet, electricity, piped water; and the poverty index and test scores.18 Note that since we are relying on randomized assignment for identification, the inclusion of control variables should not affect the estimate of impacts. We include them because they may potentially increase the precision of the estimates.19
Given that the treatment variable identifies, at baseline, all individuals who were offered the scholarship, the estimator is an intent-to-treat (ITT) estimator.20 Errors are clustered at the school level, and each model includes district-level fixed effects.
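Because treatment is assigned at the school level, standard errors must account for within-school correlation. A minimal sketch of the ITT regression with a plain (CR0) cluster sandwich estimator follows; the data are simulated for illustration only, and district fixed effects would enter as additional dummy columns in X:

```python
import numpy as np

def ols_cluster(y, X, clusters):
    """OLS point estimates with CR0 cluster-robust standard errors.

    X: (n x p) design matrix including an intercept column.
    clusters: length-n array of cluster identifiers (here, school IDs).
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        Xg, ug = X[clusters == g], resid[clusters == g]
        score = Xg.T @ ug          # cluster-level score vector
        meat += np.outer(score, score)
    se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se

# Simulated example: treatment assigned at the school level, as in the program.
rng = np.random.default_rng(42)
n_schools, pupils = 100, 20
school = np.repeat(np.arange(n_schools), pupils)
treat = (school < n_schools // 2).astype(float)       # half the schools treated
school_shock = rng.normal(0, 0.3, n_schools)[school]  # within-school correlation
y = 0.15 * treat + school_shock + rng.normal(size=school.size)  # true ITT = 0.15

X = np.column_stack([np.ones_like(y), treat])
beta, se = ols_cluster(y, X, school)   # beta[1] recovers the ITT effect
```

Summing residual scores within each school before forming the "meat" of the sandwich is what allows arbitrary error correlation among pupils of the same school, which is the motivation for clustering at the level of random assignment.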
Before turning to the results, we present validity checks to rule out effects driven by attrition and imbalance on observable characteristics across treatment and control groups.
B. Attrition
Recall that the endline analysis is based on a household survey administered to a random subsample of scholarship applicants. Table A1 presents, for the analytical sample, differences in baseline characteristics between nonattritors and attritors and between nonattritors and students coming from replacement households. As mentioned above, overall attrition is 23 percent. Four of the 16 variables differ statistically between attritors and nonattritors. There are no discernible differences between nonattritors and replacement students or between attritors and replacement students.
More important, however, is to establish that the differences in characteristics between attritors and nonattritors are not systematically different across treatment groups. Columns 1 and 2 of Table 2 report the means of household baseline characteristics for attrited students in the control and treatment groups respectively, while Columns 3 and 4 report the analogous means for the nonattrited sample (standard deviations in parentheses). Column 5 reports the estimate (standard errors in parentheses) and statistical significance of the “double-difference.” This is derived from a regression of each baseline characteristic on a dummy variable for attrition, a dummy variable for treatment, and the interaction of the two. A statistically significant estimate would suggest that the characteristics of attritors differ by treatment group.
The attrition rates for the two types of scholarships are very similar to the overall rate (23 percent poverty scholarship, 25 percent merit scholarship). The table shows balance between attritors and nonattritors across treatment groups. Out of the 32 double-difference estimates, only two are statistically significantly different from zero: the probability of being a girl (for poverty) and presence of hard floor in the dwelling (for merit). These findings are consistent with pure chance, and we therefore do not think that attrition is a likely driver of our results. In addition, we test whether the double-differences are jointly equal to zero (based on estimating a Seemingly Unrelated Regression model). For both treatments, we fail to reject the null hypothesis that the coefficients jointly equal zero (the p-values of the test are 0.57 and 0.32 for the poverty and merit treatments, respectively). While these results suggest that selective attrition is not a problem, we nevertheless check that our estimates are robust to how we handle attrition and discuss those findings in the results section.
C. Baseline Balance and Characterization of the Study Sample
This section presents the general characteristics of the study sample and describes the validation of the experimental design. Table 3 presents average characteristics for students in the control and treatment groups by type of scholarship. It includes only treated students in treatment schools—“treatment,” Columns 2 and 5—and untreated students in control schools who would have been eligible for treatment—“control,” Columns 1 and 4—based on their poverty index score or baseline test score, had they attended a treatment school. Columns 3 and 6 report the estimated difference (with standard errors in parentheses) between treatment and control applicants by fitting a regression model of each baseline characteristic against the treatment dummy variable (robust standard errors are clustered at the school level). We also include predicted values for the baseline test score and poverty index, based on an OLS regression of each variable against the other covariates, as characteristics of interest in the table.21
Overall, characteristics differ very little between treatment and control groups. Only a few coefficients in Columns 3 and 6 are statistically significant: Of the 32 differences reported, five are statistically significant. The table also reports a joint test for equality of means. We fail to reject the joint hypothesis of equality of means between treatment and control groups at conventional levels of significance (p-value of 0.29 for the poverty-based scholarship and 0.40 for the merit-based scholarship). The results from these tests confirm the validity of the random assignment.22,23
An important feature of the data is that the distributions of the poverty index and the test score are statistically indistinguishable between the treatment and control groups. Figures 2 and 3 present the density of the poverty index and of the test score at baseline for treatment and control schools. There is a clear overlap throughout the whole distribution. A Kolmogorov-Smirnov test fails to reject equality of the distributions in either case.
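As an illustration of this check (simulated data, with hypothetical magnitudes for the poverty index), a two-sample Kolmogorov-Smirnov test can be run with scipy:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)
# Under valid randomization, treatment and control draws come from the same
# baseline distribution, so the KS test should not reject equality
control = rng.normal(240, 25, 800)      # hypothetical poverty-index values
treatment = rng.normal(240, 25, 800)
stat, pval = ks_2samp(control, treatment)
```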
Table 3 also indicates that the applicants who are offered scholarships on the basis of merit tend to own more assets, have a lower poverty index score, and performed better on the baseline test than students offered a poverty-based scholarship. For instance, the poverty index, which ranges from 0 (wealthiest family in the sample) to 292 (poorest family in the sample), has a mean of 245.1 for (control) students targeted based on poverty and 218.2 for (control) students targeted based on merit. Likewise, the baseline test (ranging from 0 to 25) has a mean of 19.77 for (control) merit-based students and 17.74 for (control) poverty-based students. We return to this issue below when we discuss the tradeoffs between the two targeting approaches. The main finding from Table 3, however, is that the samples are balanced at baseline, allowing us to identify causal effects.
V. Results
A. Impacts on Enrollment and Attendance
The intervention directly incentivizes higher enrollment and attendance rates: to continue receiving scholarships, students must stay enrolled, attend school regularly, and maintain passing grades until they graduate from primary school (sixth grade). We focus on three enrollment and attendance proxies: the probability of reaching sixth grade, the highest grade completed, and the hours of school attended in the past seven days.
Table 4 reports the program’s impact on these enrollment and attendance proxies. Column 1 presents the estimates for the poverty-based intervention (this is coefficient β1 in Equation 1) using only the poverty sample, and Column 2 presents the analogous estimates for the merit-based intervention using only the merit sample (separate regressions). Column 3 presents the results from the pooled regression and the tests for different impacts across scholarship types (βPov and βMerit in Equation 2). All regressions control for the baseline characteristics presented in Table 3—including baseline test score and poverty index—and district fixed effects; robust standard errors, clustered at the school level, are reported.
The mean of the control group for each outcome variable is a reference point for assessing the magnitude of the impacts (these means are presented in the table): 59.2 percent of the control poverty students reported reaching at least sixth grade, and the average grade completion for that group was 5.37 grades. The equivalent numbers for the merit control group were 61.8 percent and 5.44. The third outcome variable is based on the number of hours students had attended school during the past seven days, conditional on being enrolled. On average, students reported having attended school for about 8.77 hours (the poverty-targeted control group) and 9.22 hours (the merit-targeted control group) in the past week.24
Given that the separate regressions (Columns 1 and 2) and the pooled regression (Column 3) show very similar results, we will discuss only the estimates from the pooled sample. Impacts on all enrollment and attendance variables are significant and positive in the poverty-targeted treatment. Poverty-based scholarship recipients are 19.1 percentage points more likely to reach Grade 6 than control students, and they complete 0.34 more grades. The estimates for hours in school have large standard errors. The point estimate suggests that poverty-based scholarship recipients spend 2.76 more hours in school than the control group, but this estimate is not statistically significant.25 Impacts are similar for the merit-based intervention: Students in the merit-targeted treatment group are 13.7 percentage points more likely to reach Grade 6 than control students, they complete 0.225 more grades, and they attend 0.74 additional hours of school (with this last coefficient not statistically significant). These impacts are comparable to those found in the context of the Lower Secondary School scholarship program, where enrollment increased on the order of 20–25 percentage points (Filmer and Schady 2014). They are larger than the majority of impacts documented elsewhere in the world (Fiszbein and Schady 2009), and they are also large when assessed against the small size of the transfer (US$20 per year). The relatively large impacts are likely related to the very low counterfactual levels, suggesting substantial “room to grow.” In Mexico, for example, counterfactual enrollments at the primary school level for the Oportunidades intervention were above 90 percent (Schultz 2004).
Based on the estimates from the pooled sample, we cannot reject the null hypothesis that the impacts are equal between poverty and merit scholarships (p-values of 0.18 for probability of reaching Grade 6, 0.22 for highest grade completed, and 0.27 for hours in school, respectively). In sum, there is evidence that the program increased school participation regardless of whether the scholarship was awarded based on merit or poverty, and the sizes of the impacts were similar across the two targeting schemes.
We carry out two types of robustness checks to assess the extent to which our results are sensitive to the approach taken to handling attrition. First, we estimate the model excluding the “replacement” students and use only those on the original sampling list. The results (reported in the first row of Table A2) are very similar to our original estimates. This is not surprising: The number of replacement students is low and the characteristics at baseline were very similar to nonreplacement students (Table A1).
Second, we implement Lee bounds (following Lee 2009), starting with the sample that excludes replacement students (results are reported in the second and third rows of Table A2).26 The bounds are quite tight for the probability of reaching Grade 6 and for the highest grade completed: For both scholarship types, both lower and upper bounds are positive and statistically significant. The estimates for hours in school are noisier and it is only for the upper bound estimates that there is a statistically significant impact (for both scholarship types). Lower bounds are negative, so a zero impact lies within the estimated bounds for this outcome.
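A stripped-down sketch of the Lee (2009) trimming procedure (our construction; retention rates and effect size are illustrative assumptions) is shown below. Because attrition in this simulation is random, the bounds should bracket the true effect of 0.3:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
treat = rng.integers(0, 2, n)
y = 0.3 * treat + rng.normal(0, 1, n)   # latent outcome; true effect is 0.3
# Differential attrition: hypothetical retention of 85% (treated) vs 70% (control)
retained = rng.random(n) < np.where(treat == 1, 0.85, 0.70)

yt = np.sort(y[(treat == 1) & retained])
yc = y[(treat == 0) & retained]
ret_t = retained[treat == 1].mean()
ret_c = retained[treat == 0].mean()

# Trim the excess retained share of the treated arm from the top of the
# outcome distribution (lower bound) and from the bottom (upper bound)
q = (ret_t - ret_c) / ret_t
k = int(np.floor(q * len(yt)))
lee_lower = yt[: len(yt) - k].mean() - yc.mean()
lee_upper = yt[k:].mean() - yc.mean()
```

The width of the bounds grows with the gap in retention rates, which is why outcomes with larger differential attrition produce less informative bounds.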
B. Impacts on Test Scores
Table 5 presents impacts of the interventions on the mathematics and Digit Span tests. The table has the same structure as Table 4: It presents impact estimates from models after controlling for baseline characteristics, in the first two columns by running the model separately for each treatment, and in the third column by pooling the data. All achievement measures are standardized such that the control schools have a mean of zero and a standard deviation of one (when averaged across all students, not just those who serve as the counterfactual for the treatment group). Impacts can therefore be interpreted as changes in standard deviations of the achievement measure.
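The standardization step is straightforward; a minimal sketch (simulated scores, hypothetical variable names) in Python:

```python
import numpy as np

rng = np.random.default_rng(4)
raw = rng.normal(15, 4, 500)           # simulated raw endline scores, all students
in_control = rng.random(500) < 0.5     # indicator for control-school students

# Standardize against the control-school distribution so that estimated
# impacts read as changes in control-group standard deviations
mu = raw[in_control].mean()
sd = raw[in_control].std(ddof=1)
z = (raw - mu) / sd
```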
In contrast to the results on enrollment and attendance, the impacts on test scores differ between recipients of poverty-based and merit-based scholarships. Poverty-based scholarships had no impact on test scores, whereas merit-based scholarships had positive impacts. The impacts of the poverty-based scholarships on the mathematics and Digit Span test scores are estimated to be very small (−0.045 and −0.056 standard deviations, respectively, in the pooled regression), and they are not statistically significantly different from zero. On the other hand, the estimates of the merit-based scholarships are 0.172 standard deviations for the mathematics test and 0.148 standard deviations for the Digit Span test. We reject the null hypothesis of equality across targeting approaches with 95 percent confidence (p-values of 0.03 for math and 0.01 for Digit Span). We note that these test score impacts are similar in magnitude to those reported for the ex post merit-based scholarship program evaluated in Busia, Kenya (Kremer, Miguel, and Thornton 2009).27
We carry out the same robustness checks as for the school participation indicators. Impacts estimated on the sample that excludes replacement households are very similar: For the poverty treatment, impacts are small, negative, and not statistically significant; for the merit treatment, impacts are positive and significant (effect sizes are 0.187 standard deviations for math and 0.151 standard deviations for Digit Span). Lee bounds are less precise in this case but point to a qualitatively similar finding. For the poverty scholarship, the lower bound for both tests is negative and the upper bound positive (and only significant for the Digit Span test), which means that the bounds include zero. For the merit scholarship, the lower bounds for both tests are positive (although not statistically significant) and the upper bounds are positive (both significant), which means that the bounds do not include zero. Despite the fact that the bounding exercise is less tight for test scores, and that the lower-bound estimates for the impact of the merit scholarship are not statistically significant, we see these results as consistent with our main finding. That is, whereas both poverty- and merit-based scholarships incentivized students to acquire more schooling, only students who received merit-based scholarships showed any gains in academic achievement.
As discussed, the different results for recipients of poverty-based and merit-based scholarships could potentially derive from the different skills and endowments of individuals in these groups at baseline. We can test this hypothesis directly because some recipients of poverty-based scholarships performed well enough on the baseline test to qualify for merit-based scholarships had they been in a merit-based scholarship school. These high performers in the poverty-based treatment are “identical” to the high performers in the merit-based treatment in every respect but one: Their scholarships were labeled “poverty” scholarships rather than “merit” scholarships. Absent any labeling effects, one would expect to find the same impacts on learning outcomes among these high achievers as among recipients of merit-based scholarships who are similarly poor. This is not what we find.
To estimate these heterogeneous effects, we proceed as follows. For each poverty school, we identify students who were below the school's median baseline test score (hereafter “low test”); similarly, for each merit school, we identify students below the school's median poverty index (hereafter “low poverty”). We then estimate a pooled specification regressing each outcome variable on: a dummy variable for the poverty treatment, a dummy variable for the merit treatment, a dummy variable for “low test,” a dummy variable for “low poverty,” and two interaction terms, poverty treatment × “low test” and merit treatment × “low poverty.” The coefficient on the poverty treatment dummy is therefore the impact for recipients with above-median baseline test scores; the coefficient on the merit treatment dummy is the impact for recipients with above-median poverty. If these coefficients are not equal, then merit scholarships for poor students have a different impact than poverty scholarships for high-scoring students; that is, the differently labeled scholarships have different impacts on “identical” students. In all specifications, we include baseline controls to account for any imbalances in these subpopulations.28
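The specification just described can be sketched as follows (simulated data; for simplicity the below-median indicators are drawn independently, and the simulated outcome embeds a merit effect of 0.17 that does not vary with poverty, mimicking the paper's test-score pattern):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 8000
arm = rng.integers(0, 2, n)             # 0 = poverty-sample school, 1 = merit-sample
offered = rng.integers(0, 2, n)
pov_treat = ((arm == 0) & (offered == 1)).astype(int)
merit_treat = ((arm == 1) & (offered == 1)).astype(int)
low_test = rng.integers(0, 2, n)        # below-median baseline test score
low_pov = rng.integers(0, 2, n)         # below-median poverty index (less poor)

# Outcome in which only the merit label matters, equally for poor and less poor
y = 0.17 * merit_treat + rng.normal(0, 1, n)
df = pd.DataFrame(dict(y=y, pov_treat=pov_treat, merit_treat=merit_treat,
                       low_test=low_test, low_pov=low_pov))

m = smf.ols("y ~ pov_treat + merit_treat + low_test + low_pov"
            " + pov_treat:low_test + merit_treat:low_pov", data=df).fit()
# m.params["pov_treat"]: impact for above-median test scorers (poverty schools)
# m.params["merit_treat"]: impact for poorer recipients (merit schools)
```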
The results from estimating this model are reported in Table 6. As is evident, we lose some power from trying to identify heterogeneous impacts among small subgroups of the population (and we therefore have correspondingly larger standard errors).
For the two main measures of schooling attainment—the probability of reaching Grade 6 and the highest grade completed—the impacts of both treatments are positive and significant. The interaction effects are not statistically different from zero, suggesting limited heterogeneity in the impacts of both types of scholarships on these outcomes. Most important, we fail to reject that poverty scholarships for high scorers have the same impact as merit scholarships among the poorest recipients (p-values for these tests are 0.22 and 0.60 for reaching Grade 6 and for highest grade completed, respectively). The results for school hours are qualitatively consistent but lack statistical power.
The results are different for the measures of academic achievement. Here, we find that the impacts of the poverty scholarships are not statistically significant (with no significant difference between high or low test scoring recipients). On the other hand, the merit-based scholarships do have a statistically significant impact (again, with no significant differential impact among poorer or less poor recipients). For both measures of achievement, we statistically reject that the impact of the poverty scholarships on high test scoring recipients is equal to that of merit scholarships for high-poverty recipients (p-values of 0.07 and 0.01 for math and for Digit Span, respectively).29
In summary, we see these estimates as pointing to the notion that the labeling or framing of the scholarships shaped their impact. Absent such shaping, one would expect high-testing recipients who receive a poverty scholarship to have the same outcome as merit scholarship recipients who are poor. But poverty-based scholarships do not appear to have induced better test results among recipients who also had high test scores at baseline, whereas the merit-based scholarships did induce better test scores among all individuals, poor or not poor at baseline.30
C. Regression Discontinuity Design Results
Because the scholarship offers are made according to strict criteria, applicants “just above” and “just below” the cutoff for eligibility could be compared using a regression discontinuity design (RDD) approach to evaluate program impact. The implementation of the selection rules was carried out in a way closely consistent with the intent: A comparison of the official list of scholarship recipients promulgated by the Ministry of Education with the list generated by the firm hired to carry out the scoring and ranking yields no discrepancies. Spot checks at a number of schools yielded no cases of manipulation of the selection process.
Table 7 presents results from implementing the RDD strategy.31 It is worth noting that the RDD procedure yields a local estimate, whereas the results presented so far in the paper are average treatment effects. Moreover, in order to gain power, we augment the RDD analysis sample with data from the second cohort (when there was no longer a “pure” control group). Both considerations suggest caution when comparing the RCT and RDD results. We nevertheless believe that the RDD estimates provide a useful additional check on the robustness of our findings.
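As a sketch of the RDD logic (our construction, not the authors' implementation), a local linear regression with separate slopes on each side of the eligibility cutoff can be run on simulated data; the bandwidth, noise level, and effect size below are illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 3000
score = rng.uniform(-1, 1, n)           # running variable, centered at the cutoff
offer = (score >= 0).astype(int)        # offer follows the eligibility rule exactly
y = 0.5 * offer + 0.8 * score + rng.normal(0, 0.5, n)   # true jump of 0.5 at the cutoff

h = 0.5                                  # hypothetical bandwidth
df = pd.DataFrame(dict(y=y, score=score, offer=offer))
win = df[df["score"].abs() < h]
# Local linear regression with separate slopes on each side of the cutoff;
# the coefficient on the offer dummy estimates the discontinuity
m = smf.ols("y ~ offer + score + offer:score", data=win).fit()
rdd_effect = m.params["offer"]
```

In practice one would also check stability of `rdd_effect` across bandwidths and higher-order control functions, as the paper reports doing.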
The RDD results are very similar to the RCT results. The point estimate for the (local) effect of the merit treatment on highest grade completed is 0.51 (and statistically significant); the analogous estimate for the poverty treatment is 0.20 (not statistically significant). For test scores, the effects of treatment on mathematics are 0.04 (not significant) and 0.34 (statistically significant) for the poverty and merit treatments, respectively. These results are robust to different bandwidths and control functions.32 They corroborate the asymmetry found in the RCT estimation: positive effects on enrollment for both treatments, but positive effects on achievement only for the merit treatment.
VI. Pathways and Program Efficiency
A. Pathways
Both types of scholarships studied here plausibly provide incentives for students or their families to increase effort. We use two measures, based on the data available from household interviews, to assess the program’s impact on the hours that students study outside of school and the share of family expenditures going to education-related items such as textbooks.
As a reference point, students in both the poverty and merit control groups spent around 3.4 hours per week doing school tasks outside of school, and total household expenditures on education averaged approximately US$21. Merit-based scholarship recipients spent more time doing academic work outside of school compared to the control group (an increment of 0.64 hours); the impact for poverty-based scholarship recipients is smaller (0.48 hours) and statistically insignificant (Table 8). However, we fail to reject the null hypothesis of equality of effects (p-value 0.65). Similarly, spending on education differs by type of scholarship received. Households with a merit-based scholarship recipient spent US$4.54 more on education than control students’ households, whereas households that received a poverty-based scholarship did not discernibly increase expenditures on education (point estimate of –1.53, not statistically significant). In this case, we reject the null hypothesis of equal effects (p-value of 0.06).
To illustrate the magnitude of the association between the measures of effort and achievement outcomes, we estimated a simple model that regresses each test score outcome against each effort variable (hours of study outside school and household expenditure on education). The coefficient estimate of hours of study outside school on math results is 0.043 (with a t-statistic of 3.34) and on Digit Span is 0.028 (t-statistic of 3.35). The coefficient estimate of household expenditure on math is 0.003 (t-statistic of 3.72) and on the Digit Span test is 0.002 (t-statistic of 2.23).33 In short, there is a statistically significant association between the outcomes and the measures of effort.34
In sum, it appears that students work harder and families spend more on education when a merit-based scholarship is awarded. No such impacts are seen among recipients of poverty-based scholarships.
We are unable to investigate a different student- and family-level pathway that may have led to differential impacts, namely different beliefs on the part of recipients and their families in the two study arms. As mentioned above, other programs have shown that the promise of a cash transfer conditional on high endline performance on a test has led to better test scores (Kremer, Miguel, and Thornton 2009). It is possible that recipients of the merit scholarships in Cambodia believed that high performance was required to retain the scholarship. Although this is a possibility, we do not have any empirical or anecdotal evidence to suggest that recipients believed this. Moreover, program socialization was the same in poverty- and merit-based schools, and program rules included the exact same conditionalities. Nevertheless, if such a process were indeed playing out, we do not see this as wholly inconsistent with our interpretation of the results. If labeling a scholarship as based on merit induced the mistaken belief that it was tied to future performance, and that mistaken belief led to better learning outcomes (perhaps through the effort pathways discussed above), then we would see this as supportive of our overall findings. This is, of course, different from explanations linked to concepts such as stereotype threat, but it would be consistent with different behavioral responses to differently labeled transfers.
It is important to recognize that the scholarship program could influence the behavior not just of students and their families but also of other actors such as teachers. For instance, altruistic teachers may provide more attention to students with poverty scholarships to help them retain their scholarships. It is also possible that teachers exert more effort with scholarship students if parents pressure them to do so. Banerjee and Duflo (2006) discuss changes in teacher motivation arising from greater accountability to families in the context of a scholarship program. Last, teachers might put more effort into students who are labeled high performers (so-called “Pygmalion effects”; see Jacobson and Rosenthal 1968). As such, school actors (such as teachers) may also differentially change their behavior depending on how the scholarships are labeled.35,36 Our data do not allow an analysis of these supply-side effects. We note, however, that for these supply-side effects to drive our results they would need to play out differently for recipients of poverty-targeted as opposed to merit-targeted scholarships. So, again, while the pathway may be different from the “effort” effect discussed above, it nevertheless reflects a differential behavioral response to the labeling of the scholarships.37
B. Program Equity and Efficiency
Given the two targeting approaches, the Cambodia pilot scholarship program is well suited to shed light on the potential tradeoff between efficiency (defined as achieving more per dollar transferred) and equity (defined as reaching the poorest members of the population).38 Analyses of the socioeconomic profile of program applicants and recipients under the two targeting schemes, and comparisons to the national distribution of socioeconomic characteristics, suggest that both targeting approaches are heavily skewed to the poor. The first panel in Figure 4 shows that 50 percent of those who applied to the program are in the poorest nationally benchmarked wealth quintile (70 percent in the poorest two quintiles); fewer than 3 percent of applicants were from the richest quintile. The fact that the program was targeted to poor areas and to schools serving high concentrations of poor children in Cambodia accounts for this pattern. Unsurprisingly, targeting scholarships to the poorest students in each school yields a more pro-poor distribution of benefits than targeting by merit. In the poverty-targeted schools, 63 percent of applicants were from the poorest quintile of the population (85 percent were in the poorest two quintiles). Merit-based targeting is less pro-poor but is still able to reach the poorest groups in the population. In the merit-targeting schools, 54 percent of applicants were from the poorest quintile of the population (and 76 percent from the poorest two quintiles).
The finding that merit-based targeting did not result in an overall regressive scheme is a reassuring result. Part of this may be due to the fact that there is an overall weak correlation between baseline test scores and baseline poverty among the targeted students. We estimate that, of the merit-based scholarship recipients, 56 percent would have received a scholarship had the school been a poverty-based one; of the poverty-based scholarship recipients, 52 percent would have received a scholarship had the school been a merit-based one.39
It is possible that the relatively narrow geographic targeting of the program led to this result. Indeed, the locations for this intervention are remote and poor and there is relatively limited heterogeneity in student poverty. This raises the question of whether our results are generalizable. In other settings—for example, where there is greater heterogeneity in student poverty levels—the tradeoff between the alternative targeting strategies might be starker. Nevertheless, to the extent that the result is somewhat generalizable, the findings suggest that there may be ways to enhance both equity and efficiency in this type of program. Indeed, if poverty-based scholarships trigger lower motivation and effort (or other behaviors that negatively affect learning outcomes) than a merit-based scholarship, not because of differences in the underlying skills of the population but because of framing, this will affect how the equity-efficiency tradeoff is made. If the framing of merit-based scholarship triggers high motivation and self-esteem (or other behaviors), then this targeting mechanism can induce efficiency gains.
VII. Conclusions
The fact that some students were able to take better academic advantage of additional school exposure than others highlights an issue rarely addressed in previous evaluations of CCT programs. Recent evidence on monetary incentives for schooling shows that students are able to change their behavior on the margins that are under their control—for example, enrollment and attendance. These positive effects do not necessarily translate into test score gains, however. For example, despite the fact that Mexico’s Oportunidades program—a rigorously evaluated CCT program—induced students to enroll and attend additional years of school, the program did not induce better test scores. A recent set of papers has argued that education systems in developing countries are typically tailored toward better-off and better-skilled students. Specifically, Glewwe, Kremer, and Moulin (2009) show that only the strongest students at baseline were able to take advantage of textbooks that were provided to schools in Kenya. Furthermore, Duflo, Dupas, and Kremer (2011) show that teachers who were assigned to students at the bottom of the achievement distribution were less likely to teach.
The most important finding of our study is the asymmetry of response to the two targeting mechanisms. Both poverty-based and merit-based targeting schemes induced higher enrollment and attendance, yet only the merit-based mechanism induced positive effects on test scores, and the merit-based students (and their households) exhibited a higher level of effort in education. These results are not driven by differences in baseline skills across recipients of the two types of scholarships. Poorer students who are academically prepared did not gain significantly in skills and knowledge (measured by test scores) from the additional schooling they received as a result of the simple poverty-targeted incentive. Clearly more work is needed—in Cambodia and elsewhere—to establish the best approach to ensure that additional schooling translates into learning. Remedial lessons for students in the early grades, or increasing school readiness among poorer students—for example, through early child development programs—might be approaches to try. Indeed, data from Cambodia suggest that children suffer from substantial delays in cognitive development, which hamper their school readiness (Naudeau et al. 2011).
The experience evaluated in this paper suggests another approach based on the finding that incentivizing school attendance in a way that recognizes academic potential can pay off in measurable learning outcomes. By changing the framing of the program, and the associated labeling of the recipient, the merit-based scholarships distributed by the program led to better learning outcomes. We are able to explore two pathways for how this might have worked. We find that merit-scholarship recipients and their families appear to have been motivated to exert greater effort in terms of hours spent studying and in terms of education expenditures. Poverty-based recipients did not appear to have been motivated in a similar way.
We recognize that our results are subject to a number of caveats. An ideal “labeling” experiment would have been to randomly assign scholarships and randomly assign labels within schools. This was not our approach. We interpret the fact that impacts on learning are different for recipients with “identical” characteristics but for whom the labeling of the scholarship differs as evidence of “labeling” effects. This is perhaps a broad understanding of the term; it encompasses behaviors of a number of agents. We document two specific behaviors of two such agents—students and their families. But there are others that we cannot document—such as beliefs about scholarship retention or teacher behaviors. Understanding what is driving the different responses, and assessing the response of teachers to these different targeting approaches, would be useful for future research to shed light on.
Our findings nevertheless suggest that it should be possible to frame demand-side incentive programs in a way that maximizes their impact on learning outcomes. Such an approach could be undertaken in a way that does not necessarily imply a stark tradeoff between equity and efficiency. Indeed, the results show that among poorer children who received merit-based scholarships, the impacts on school participation as well as test scores were large. Scaling up an approach that targets students with an incentive that recognizes their high academic potential—while ensuring that the poorest students are among that set—is likely to maximize both the equity and efficiency objectives of the program.
Footnotes
Felipe Barrera-Osorio is an assistant professor of education and economics at the Harvard Graduate School of Education.
Deon Filmer is a lead economist at The World Bank. The authors thank Luis Benveniste, Norbert Schady, Beng Simeth, Tsuyoshi Fukao, and the members of the Primary School Scholarship Team of the Royal Government of Cambodia’s Ministry of Education for valuable input and assistance in carrying out this work. Adela Soliz provided able research assistance. The paper also benefitted from comments by David Deming, Kevin Lang, Leigh Linden, Muna Meky, Richard Murnane, Halsey Rogers, Shwetlena Sabarwal, and Katja Vinha, as well as from comments by seminar participants at University of Texas at Austin, Harvard Graduate School of Education, and Boston University. The authors are responsible for any errors. This work benefited from funding from The World Bank as well as from the EPDF Trust Fund (TF095245). Barrera-Osorio also received a Faculty Grant from the Harvard Graduate School of Education. The findings, interpretations, and conclusions expressed in this paper are those of the authors and do not necessarily represent the views of The World Bank, its Executive Directors, or the governments they represent. The data used in this article can be obtained beginning July 2016 through June 2019 from Felipe Barrera-Osorio, Gutman 456, 13 Appian Way, Cambridge MA 02138, felipe_barrera-osorio{at}gse.harvard.edu.
↵1. This study narrowly defines efficiency as the effects of a program on educational outcomes per dollar cost of the program. We do not investigate the effects on efficiency of raising the money for the program. The program transfers cash to poor households, which is potentially welfare enhancing in itself, but we do not address this aspect of the program directly. Another issue that is not addressed is whether either of the interventions influences other outcomes such as health.
↵2. Conditional Cash Transfer (CCT) programs typically transfer cash to families that comply with a set of conditions, such as the enrollment and regular attendance of children in school, regular prenatal visits by pregnant women, and regular health checkups for young children.
↵3. Besides giving scholarships concurrent with studies, some programs “promise” students a scholarship if they perform well on a test administered in the future. In theory, the promise of a reward elicits increased effort from students whose abilities place them within reach of a scholarship. Kremer, Miguel, and Thornton (2009) is an evaluation of one such program.
↵4. Primary schools are officially nonfee based. Opportunity costs include various forms of child labor that are relatively prevalent in the study area—although typically labor is combined with schooling at the primary school ages.
↵5. These requirements are moderately enforced. School officials follow up with students who are absent for many days, and those who return to school remain eligible for the scholarship. Students absent for too many days are classified as having dropped out and are no longer eligible for the scholarship.
↵6. The previous lower-secondary scholarships were in the amounts of US$45 and US$60 per year (Filmer and Schady 2011).
↵7. GNI per capita in Cambodia was approximately US$700 for 2008 (World Development Indicators).
↵8. Scholarship distributions for the cohort of recipients analyzed here (Phase 1) took place in July 2009 (US$20), November 2009 (US$10), April 2010 (US$10), November 2010 (US$10), and April 2011 (US$10).
↵9. The first 14 entries of Table 2 report the full set of variables included in the calculation of the score.
↵10. The weights were determined by estimating a model predicting the probability that a student would drop out of school during Grades 4–6, since addressing this dropout was the stated goal of the program. Strictly speaking, the score should be referred to as a “dropout-risk score.” However, the risk is essentially a set of characteristics that capture the socioeconomic status of a household, weighted to capture those elements that predict dropout best. For convenience and ease of exposition, the score is referred to in this paper, as well as in program documents, as a “poverty” score. It might seem heroic to rely on the responses of fourth graders to a socioeconomic questionnaire to rank households. We believe these are reliable because (1) the items asked about are simple and easily understood by the young respondents, and (2) the poverty score is negatively associated with endline (log) household consumption expenditures for the control group (rho =−0.13, p < 0.01).
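The construction described in this footnote can be summarized compactly. In the notation below (which is ours, for illustration only), the weights come from a model predicting dropout during Grades 4–6, and the score is the resulting weighted sum of the 14 baseline characteristics:

```latex
% x_{ij}: the j-th baseline characteristic of student i (first 14
% entries of Table 2); F: the link function of the dropout model.
\Pr(\text{dropout}_i = 1 \mid x_i) = F\Big(\textstyle\sum_j w_j x_{ij}\Big),
\qquad
\text{score}_i = \sum_j \hat{w}_j \, x_{ij}
```

A higher score thus indicates a higher predicted dropout risk, which the paper interprets as a proxy for poverty.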
↵11. The National Assessment was implemented in a sample of schools nationwide in Grade 3 during the 2005/06 school year (Royal Government of Cambodia 2006).
↵12. The number of scholarships is not equal to half the number of applicants for three reasons. First, the rule was to allocate scholarships to all applicants who had scores higher than or tied with the cutoff score. Second, there were Grade 4, Phase 2 schools that did not implement the scholarship in the period of the first phase (control schools). Third, there is year-to-year variation in the number of enrolled students. Covering half the students in a class/school is relatively high compared to other programs. This might affect the external validity of our findings if they are to inform programs that cover a much smaller share of students.
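The first reason (offers to all applicants scoring higher than or tied with the cutoff) can be sketched as follows. This is a minimal illustration with hypothetical data and function names, not the program's actual allocation code:

```python
def allocate_scholarships(scores, n_target):
    """Offer scholarships to every applicant whose score is at least
    the score of the n_target-th ranked applicant. Ties at the cutoff
    are all included, so the number of offers can exceed n_target."""
    ranked = sorted(scores, reverse=True)
    cutoff = ranked[n_target - 1]          # score of the marginal recipient
    return [s for s in scores if s >= cutoff]

# Hypothetical scores for ten applicants; target is half the class.
scores = [91, 85, 85, 80, 78, 78, 78, 60, 55, 40]
offers = allocate_scholarships(scores, n_target=5)
print(len(offers))  # 7: ties at the cutoff score (78) push offers past half
```

With ties at the cutoff, the recipient count mechanically exceeds half the applicant pool, which is one reason the number of scholarships differs from half the number of applicants.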
↵13. Because the scholarship offers are made according to a strict cutoff criterion, it is possible to implement a regression discontinuity design (RDD) approach to evaluate program impact. Below, we present the main result from this strategy.
↵14. We use the official government declaration (“Prakas”) of recipients.
↵15. Includes all Grade 4 students (Phase 1 and Phase 2, treated and nontreated students).
↵16. In Classical Test Theory, this statistic is typically used as a measure of the reliability of a test and captures the covariance among the various items of a test as a share of the sum of the variances and covariances of the items. We interpret it here mostly as a measure of the extent to which the various items on the tests capture a single underlying construct (such as achievement). Values above 0.6 are typically considered “acceptable” and above 0.7 are typically considered “good.”
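The statistic can be computed directly from the item-response matrix. A minimal sketch, using the standard Classical Test Theory formula with made-up data (not the study's code or data):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_students, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of the total),
    i.e., the share of total-score variance attributable to covariance
    among the items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 0/1 responses of five students to four test items.
responses = [[1, 1, 1, 0],
             [1, 1, 0, 0],
             [0, 0, 0, 0],
             [1, 1, 1, 1],
             [0, 1, 0, 0]]
print(round(cronbach_alpha(responses), 2))  # 0.8
```

When items covary strongly (that is, when they appear to measure a single underlying construct), alpha approaches one; uncorrelated items drive it toward zero.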
↵17. There is one item for which this is not true. Our results are robust to excluding this item from the analysis. This “item-total” correlation is used in Classical Test Theory to ensure that each item is measuring the same construct as the other items included. Details on these correlations are available from the authors on request.
↵18. We use dummy indicator variables to impute values for a very small number of missing observations in the control variables. Given the random nature of the program, the imputation makes no substantive difference to the estimates, although standard errors are slightly smaller because the sample size is maximized.
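This standard adjustment can be illustrated as follows (a sketch with hypothetical variable names, not the study's code): each missing value is replaced by a constant, and a companion indicator flags the imputed observations so the regression can absorb any systematic difference:

```python
import numpy as np

def impute_with_indicator(x, fill=0.0):
    """Replace missing values with `fill` and return a dummy that
    marks which observations were imputed."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x).astype(int)
    filled = np.where(np.isnan(x), fill, x)
    return filled, missing

# Hypothetical control variable with two missing observations.
age = [10.0, np.nan, 11.0, 9.0, np.nan]
age_filled, age_missing = impute_with_indicator(age)
print(age_filled.tolist())   # [10.0, 0.0, 11.0, 9.0, 0.0]
print(age_missing.tolist())  # [0, 1, 0, 0, 1]
```

Both `age_filled` and `age_missing` would then enter the regression as controls, so no observations are dropped for missing covariates.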
↵19. Additionally, we ran different specifications of the pooled model: first, a model that included as a covariate a dummy indicating joint controls for the poverty and merit treatments; second, a fully saturated model, without a constant term, that includes both treatment variables, a dummy variable equal to one only for merit controls, a dummy variable equal to one only for poverty controls, and a dummy variable equal to one for the joint controls. Results, available from the authors on request, are very similar to those presented here.
↵20. Our analysis is based on an ITT approach starting from the official list of students which maps one-to-one to a list we generate independently from the baseline data. Note that official recording forms for scholarship receipt were generated centrally and prepopulated with student names (and space for a thumb print on collection). If a student had dropped out and could not collect the scholarship, the funds could not be reassigned to another student but would be returned to a central fund for use in a subsequent distribution round. This limits the potential for reallocation of scholarships at the school level. We treat as recipients all those who were initially offered a scholarship whether or not they kept the scholarship for all three years.
↵21. As a further check, we used controls’ information at baseline to create predicted values of each of the main outcomes of interest for the whole sample (treated and control). Over these predicted values, we ran tests of differences across treatment and control groups. We fail to reject the hypothesis of equality for all the variables. Results are available from the authors on request.
↵22. Given that the randomization was performed at the school level, we also analyzed the balance of baseline characteristics at this level between treatment and control (for example, all students in treated schools in comparison to all students in control schools—not just those who did, or would have, received a scholarship). The results of this exercise (not presented here) are very similar to the ones reported in Table 2.
↵23. To the extent that differences in these variables are cause for concern, this is mitigated by their inclusion as control variables in the regressions.
↵24. In cases where the school was closed because classes had ended for the year, we assigned a value of 0 because we did not know exactly when the school year had ended for each school. Given the randomization, we do not expect that school closures were systematically different in any of the groups studied.
↵25. Because this indicator measures a different margin than the others, we should perhaps not be surprised that the impacts could be slightly different.
↵26. The approach consists of trimming the data in the treatment or control group (whichever has the least amount of attrition). “Excess observations” are trimmed either from the top or the bottom of the distribution of outcomes (resulting in “upper” and “lower” bounds) to achieve equivalent attrition in both treatment and control groups.
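The trimming procedure can be sketched as follows. This is a stylized illustration with hypothetical data and function names (the paper's actual bounds come from the full regression framework); it assumes the treatment arm is the less-attrited group and bounds the treated-minus-control difference in means:

```python
import numpy as np

def lee_bounds(y_less_attrited, y_more_attrited):
    """Trim the group with less attrition so both groups retain the
    same number of observed outcomes, then bound the difference in
    means: trimming the top of the less-attrited group's outcome
    distribution yields one bound, trimming the bottom the other."""
    y_a = np.sort(np.asarray(y_less_attrited, dtype=float))
    y_b = np.asarray(y_more_attrited, dtype=float)
    n_trim = len(y_a) - len(y_b)                 # "excess observations"
    lower = y_a[:len(y_b)].mean() - y_b.mean()   # drop the top n_trim
    upper = y_a[n_trim:].mean() - y_b.mean()     # drop the bottom n_trim
    return lower, upper

# Hypothetical test scores: treatment retains six students, control four.
treated = [50, 55, 60, 65, 70, 75]
control = [52, 58, 61, 64]
lo, hi = lee_bounds(treated, control)
print(lo, hi)  # -1.25 8.75
```

The true effect under the procedure's assumptions lies between the two bounds; the tighter the attrition gap, the tighter the bounds.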
↵27. Note that it is possible that school resources are more “diluted” in Phase 1 schools than in Phase 2 schools at endline (because there are two recipient cohorts in the former and only one in the latter). The bias introduced, if this is the case, would be toward finding no impact on test scores (if more resources improve learning outcomes). Moreover, we would not expect such a bias to differ for poverty-based versus merit-based scholarship recipients.
↵28. We tested for differences at baseline between low- and high-test/poverty index for poverty and merit treatments. For the majority of cases, we failed to reject the null hypothesis of equality. Results are available from the authors on request.
↵29. As in the earlier robustness checks, we reestimate the model excluding the replacement households and find the same pattern: We cannot reject equality of impacts for the school participation effects, while we do for the math and Digit Span tests. Results are available upon request.
↵30. As suggested by an anonymous referee, we also fit models for heterogeneous effects by gender without finding any significant effect. Results are available on request.
↵31. Barrera-Osorio, Filmer, and McIntyre (2014) presents the main tests that validate the use of the RDD estimation.
↵32. The results in the table are from a parametric estimation with a bandwidth of ±15 points, controlling for a linear forcing variable and a linear interaction on both sides of the discontinuity. For brevity, we present the results for highest grade completed and mathematics. Barrera-Osorio et al. (2014) presents results for alternative specifications.
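The specification described amounts to a local linear regression within the bandwidth; in our notation (for illustration only, with exact variable definitions in Barrera-Osorio, Filmer, and McIntyre 2014):

```latex
% S_i: forcing variable; c: cutoff; T_i = 1\{S_i \ge c\}: offer
% indicator. Sample restricted to |S_i - c| \le 15.
y_i = \alpha + \beta T_i + \gamma (S_i - c) + \delta\, T_i (S_i - c) + \varepsilon_i
```

Here $\beta$ is the estimated impact at the cutoff, and the interaction term $\delta$ allows the slope of the forcing variable to differ on the two sides of the discontinuity.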
↵33. We estimated heterogeneity effects for these mechanisms (analogous to Table 6), but due to low power the estimates are highly imprecise.
↵34. We nevertheless acknowledge that these measures of effort may not be sufficient in magnitude to explain the test score differences. We think of these as proxies for a broader set of differences in effort that we do not directly observe.
↵35. There is little scope for explicit “tracking” of students across classes in these schools. At baseline, 90 percent of schools had only one Grade 4, Grade 5, and Grade 6 classroom, suggesting limited scope for reallocating students across classrooms.
↵36. The current consensus of the literature on the Pygmalion effect is that it exists but is small. Teachers also tend to correct information as the academic year progresses (Jussim and Harber 2005).
↵37. An additional pathway could be differential classroom dynamics/peer effects induced by the alternative targeting schemes. If merit scholarships draw in higher-achieving students whereas poverty scholarships draw in low-achieving students, then the resulting classroom composition might be different in a way that affects learning through a channel that is not labeling (for example, teachers responding differently to a different mix of students). While this is a possibility, and we cannot rule it out conclusively, we do not think that it is at play here. First, we find no learning impacts on nonrecipient peers, suggesting that any effect is confined to recipients, which seems inconsistent with the notion that classroom dynamics as a whole are affected. Second, the magnitude of our estimates suggests that scholarships result in an additional two to three students per classroom (out of ten to twelve), which we think is likely insufficient to substantively alter classroom dynamics.
↵38. In a separate exercise, we investigated a different dimension of equity, namely the impact of the program on nonrecipients within the same school. We find that the poverty-based intervention was associated with higher enrollments among nonrecipient peers but find no other substantive or significant impacts. The results are available from the authors on request.
↵39. One might be concerned that the lack of a strong association between the poverty score and test score performance could be due to poor measurement of either or both of these measures. We do not believe this to be the case. The “poverty score” discriminates well between households that own assets that are associated with higher consumption levels and those that don’t; it correlates well with other aspects of monetary welfare—including consumption per capita at endline in the control group (the correlation coefficient is −0.13 with log per capita expenditures, which is statistically significant with p < 0.01); and higher poverty predicts lower grade attainment among the control group. The baseline test score has a Cronbach alpha of 0.85 (which is considered high); there is a positive correlation between the correct answer to each item on the test and the overall test score, and a negative correlation between each incorrect answer and the overall test score; and it correlates with other measures of performance—including math and Digit Span test scores at endline in the control group (the correlation coefficients are 0.12 and 0.11 with math and Digit Span respectively, both of which are statistically significant with p < 0.01).
- Received January 2014.
- Accepted January 2015.