Food for Thought? Experimental Evidence on the Learning Impacts of a Large-Scale School Feeding Program

There is limited experimental evidence on the effects of large-scale, government-led interventions on human capital in resource-constrained settings. We report results from a randomized trial of the government of Ghana's school feeding program. After two years, the program led to moderate average increases in math and literacy standardized scores among pupils in treatment communities and to larger achievement gains for girls and disadvantaged children and regions. Improvements in child schooling, cognition, and nutrition constituted suggestive impact mechanisms, especially for educationally disadvantaged groups. The program combined equitable human capital accumulation with social protection, contributing to the "learning for all" sustainable development agenda.


I. Introduction
Average learning levels for primary school pupils in low- and middle-income countries (LMICs) are dismal: for instance, only 40 percent of students in Sub-Saharan Africa (SSA) master basic literacy and numeracy at the end of primary school (World Bank 2018). Further, large disparities in achievements are present, with children from lower socioeconomic status or rural households, and sometimes girls, lagging behind the average pupil. This "learning crisis" occurred despite unprecedented expansion in primary school access and completion; in SSA, for example, 78 percent of children at primary school age were enrolled in 2014, up from 58 percent in 1999 (World Bank 2017). Consistent with the principle of "quality education for all" underscored by the Sustainable Development Goal 4, raising average learning achievements in an equitable way is a pressing global educational objective.
Currently, there is very limited rigorous evidence focusing on the effectiveness and distributional impacts of large-scale, government-led interventions on human capital, especially in SSA (Snilstveit et al. 2017). One such intervention is school feeding, which ranks among the world's most common forms of social protection (Alderman, Gentilini, and Yemtsov 2018). Every day, about 368 million children receive some form of school feeding globally, for an estimated investment of $70 billion a year (WFP 2013). In SSA, since the early 2000s, many governments have invested in school feeding as a multisectoral strategy involving education, health, and agriculture, with funding mostly stemming from Ministries of Education (Alderman and Bundy 2012; Drake et al. 2017). At an average cost of US$54 and US$82 per child per year in low- and middle-income countries, respectively, and often with limited poverty targeting, the share of the educational budgets devoted to school feeding is often considerable (Gelli and Daryanani 2013).
This work experimentally addresses whether large-scale, government-led school feeding programs can contribute to equitable learning goals in resource-constrained settings. While school feeding has a robust track record in increasing school participation (Kristjansson et al. 2015; Drake et al. 2017), experimental evidence on its effectiveness on learning is more limited and provides mixed results. Specifically, while some studies find positive effects, but often only for some specific subgroups, [...] underinvestigated topic within the literature on education interventions in LMICs (Evans and Yuan 2018; Bashir et al. 2018; World Bank 2018). For school feeding, Online Appendix Table A.1 demonstrates that there is a lack of systematic investigation of heterogeneity across gender and socioeconomic status. Finally, very few studies have assessed potential channels for impact.
We tackle these questions by evaluating the average and distributional effects of the Ghana School Feeding Programme (GSFP) on child learning. The GSFP currently provides a free, hot-cooked daily meal to more than two million pupils in government primary schools across all districts in the country.1 In collaboration with the government, we conducted an RCT designed around the retargeting and scale-up of the GSFP to the most food-insecure districts in all regions of Ghana. While the overall trial was aimed at assessing program impacts on education, nutrition, and agriculture2 (see Gelli et al. 2016), here we report on treatment effects on pre-specified educational outcomes, including child math and literacy scores, and heterogeneity in treatment effects on per-protocol population subgroups. Further, we offer some supportive evidence around possible mechanisms of program impact, including changes in schooling, cognition, and nutritional status.
Ghana's learning challenges are similar to the ones currently faced by many other LMICs. First, while the government's efforts to raise schooling in the 2000s resulted in primary enrollment rates that are among the highest in SSA, average learning levels remain disappointingly low: a 2018 study highlighted that more than 80 percent and 70 percent of Grade 2 and Grade 4 students, respectively, could not read a single word or perform a two-digit subtraction (World Bank 2018). Second, vast inequalities in learning exist by gender, poverty, and place of residence (World Bank 2018). Further, Ghana is highly varied in terms of agroecology, ethnicity, socioeconomics, and political and administrative capacities. Uncovering program impacts in the face of this diversity and potential regional variation in implementation and monitoring is of interest for policymakers operating in settings characterized by high administrative and socioeconomic heterogeneity.
Following the methodology outlined in our protocol (Gelli et al. 2016), we document the following intent-to-treat (ITT) findings. After almost two academic years of implementation, exposure to school feeding led to average increases in math, literacy, and a composite score of learning by about 0.15 standard deviations (SD, hereafter). While effect sizes are comparable to estimates from a meta-analysis of smaller-scale trials of school feeding in LMICs (Snilstveit et al. 2017), we note that these moderate improvements started from a low base. Turning to impact heterogeneity, we find that the program especially benefitted educationally disadvantaged groups. Girls' math, literacy, and learning composite scores increased by more than 0.2 SD in school feeding communities compared to controls. Treatment effects among children living in the northern regions, the country's most disadvantaged areas, and for children from households below the poverty line at baseline ranged between 0.25 SD and 0.3 SD across all scores. These findings are likely to correspond to lower bounds of potential effects, as program take-up was imperfect and implementation challenges were present. The latter mostly related to delays in financial disbursements to the caterers that are in charge of procuring food, cooking, and serving the meals. The school feeding intervention also led to increases in grade attainment for the average child, while it promoted enrollment among children from the poorest households and regions. In line with the results on learning, cognitive scores of attention span and short-term memory also improved moderately for the average pupil, while they increased more markedly for educationally vulnerable groups. Nutritional outcomes also improved for girls and the poorest children in treatment communities.
1. The government approved an expansion of the program to more than three million children by July 2016, but data on actual coverage are not currently available (http://mogcsp.gov.gh/ghana-school-feeding-programme-gsfp/, accessed May 31, 2022). Although the program covers all districts of Ghana from the academic year 2016-2017, it does not cover all schools. Plans to expand in-school meals for all public schools in Ghana are ongoing.
2. The results of the analyses on child anthropometrics and community agriculture are part of separate analyses, as per our protocol.
Aurino, Gelli, Adamba, and Alderman 77. Downloaded by guest on March 16, 2024. Copyright 2020.
To the best of our knowledge, this is the first large-scale RCT from either a high-income or LMIC setting that investigates the effects of a nationally mandated, government-led program on educational attainments. Thus, we contribute to the experimental literature on school meals by showing the social protection and human capital accumulation of a large-scale program implemented over a relatively extended period. As discussed, the issue of scale is critical because treatment effects tend to decrease with the size of the implementing organization (Muralidharan and Niehaus 2017; Vivalt 2020). Regularity and quality in the provision of meals are key for effectiveness, as children and parents may respond to irregular or lower-quality service in multiple ways (for example, going home for lunch and not returning to school afterwards, changing school, or not attending at all).
Given the low-skill level of our study setting, this study adds especially to the LMIC-focused literature. In particular, our work complements the study by Chakraborty and Jayaraman (2019) by providing evidence of the learning effects of government-led school feeding in LMICs. By exploiting staggered program implementation, Chakraborty and Jayaraman's study identified moderate and positive average effects of the Indian "midday meals" scheme on math and reading. No heterogeneity by gender or household assets was detected. Chakraborty and Jayaraman assess the program at full scale and exploit up to a five-year program exposure, which provides one of the most robust results on learning existing for LMICs. We add to this important contribution not only by employing a cleaner identification strategy, but also by providing evidence from a government program run in SSA and by analyzing potential schooling mechanisms. In fact, even in contexts such as Ghana, where primary school enrollment is compulsory and high, and infrastructure already exists to accommodate all children, there are still concerns around potential deterioration of educational quality, with negative effects on test scores due to system overload and compositional changes, especially at lower primary grades. Overcrowded classrooms and peer effects have previously confounded conclusions on the impacts of school feeding on learning in other settings where baseline enrollment was low3 (Ahmed and Arends-Kuenning 2006). However, based on our results, we deduce that the introduction of school feeding has not impaired average scores. Further, our findings suggest that in contexts characterized by wide educational inequalities such as Ghana, school feeding programs can contribute to "leveling the playing field" by raising learning outcomes especially among children at the margin (Jukes, Drake, and Bundy 2008).
More broadly, we add to the literature on social protection and human capital in LMICs. In this context, existing evidence from large-scale programs that have human capital objectives (for example, conditional cash transfers such as PROGRESA) has overwhelmingly focused on schooling rather than on learning. As summarized by Sarah Baird and coauthors, "Unlike enrollment and attendance, the effectiveness of cash transfer programmes on improving test scores is small at best" (Baird et al. 2014, p. 29). We contribute to this body of work by highlighting the importance of social protection for equitable human capital outcomes.
The next section presents the background and the study design. Section III illustrates the data and identification strategy. Sections IV and V present the ITT estimates and potential mechanisms for impact, respectively. Section VI presents some robustness checks, while Section VII concludes, including a concise discussion of costs.

II. Background and Study Design
A. Educational Setting and the GSFP
Despite the rapid economic growth of Ghana in the past decades, food insecurity and poverty are widespread, particularly in rural areas. During the 2000s, the country prioritized school participation through various initiatives, including the GSFP. For example, it made basic education compulsory for children 5-15 years old. These efforts resulted in a substantial expansion of basic education, with primary enrollment increasing from 61 percent in 1999 to 87 percent in 2016 (World Bank 2017). Despite these impressive achievements, an estimated 300,000-800,000 children are still out of primary school, mostly from households below the poverty line and from the country's northern regions (UNDP Ghana 2015). Moreover, Ghana's success in expanding schooling has not been matched by corresponding improvements in learning, which remain overwhelmingly low compared to international standards (Ministry of Education/RTI International 2014). Wide inequalities in achievements exist by gender, poverty, and place of residence (northern vs. southern regions) (World Bank 2018).
The government of Ghana initiated the GSFP in 2005 with a four-year program budget of more than US$200 million (GSFP 2006). Funding for the program is now integrated into the government annual budget. GSFP coordination and implementation are undertaken by a National Secretariat, with program oversight provided by the Ministry of Gender, Children and Social Protection. The program is decentralized; private caterers are awarded contracts by the GSFP to procure, prepare, and serve food to pupils in the targeted schools. Cash transfers (and, recently, electronic payments) are made from the District Assemblies to caterers based on 54 Ghana pesewas per child per day (roughly US$0.33) every two weeks. Each caterer is responsible for procuring food from the market on a competitive basis, preparing school meals, and distributing food to pupils. Supervision at the school level is undertaken by the School Implementing Committees. Delayed reimbursements to caterers are common, with delays as long as half a year or even a whole year (SEND-Ghana 2013). Delayed payments to caterers often result in caterers reducing the quantity or quality of food provided or adjusting the school feeding menus, thus likely influencing program quality and, potentially, effectiveness.

B. Evaluation Design
The trial was designed around the scale-up of the GSFP based on a retargeting exercise conducted in 2012. The government's decision to retarget the GSFP followed a report that highlighted that the program overwhelmingly benefited nonpoor households, with only 21 percent of benefits accruing to poor families (Wodon 2012). Schools and households in school catchment areas (which we call "communities" hereafter) were randomly assigned to two treatment arms: an intervention group, where the GSFP was implemented, and control, where the intervention was postponed until the study completion. The selection of study areas followed a two-step approach. First, 58 priority districts (out of the country's 170 at the time of this exercise) were identified for the scale-up of GSFP. Chosen districts had the highest shares of national poverty and food insecurity based on poverty and food insecurity rankings (see Gelli et al. 2016 for details). Second, due to the relatively small number of clusters, a restricted randomization procedure was used.4 This method was employed to ensure that schools were comparable based on school- and village-level data from the Education Management Information system annual school census data from 2011-2012 (for details, see Gelli et al. 2016). The randomization procedure arbitrarily selected two schools in each district and randomly assigned them to the treatment and control groups. The procedure was repeated 2,000 times, and the research team then selected the permutation with the combination of treatment and control groups that minimized the R-squared of a regression of the selection status on school- and village-level covariates. Following Hayes and Moulton (2017), the variables in the restricted randomization were selected on availability and potential influence on the main study outcomes. These included school enrollment, gender ratio, classroom numbers and infrastructure conditions, accessibility, and NGO support. This step utilized a list of schools not currently covered by the program provided by the GSFP secretariat. We note that the schools were selected from separate communities and that the distance between communities is geographically wide enough to minimize cross-community school enrollment, as any two villages are at least six kilometers apart.5
4. This is different from pairwise matching, whereby clusters are paired based on background characteristics, before randomly assigning one cluster within each pair to treatment assignment. This approach has the advantage of enabling balance on more variables than with the stratification randomization method and of providing balance in means on continuous variables (Bruhn and McKenzie 2009).
5. The design also included an agricultural substudy within the intervention group to test whether stimulating the procurement of school food from district-based farmers for half of the GSFP schools would stimulate district-level agricultural outcomes. Treatment assignment to this second level of randomization was achieved through a restricted randomization procedure that was similar to the one used to assign the intervention to a random subset of schools. This procedure was developed to allocate the school feeding arm into two subgroups (GSFP and Home-grown school feeding, HGSF), based on variables that characterized the agricultural environment at the district level, including agroecological zone, maize productivity, and employment. The basic idea of the multilevel design of the trial was to compare child-level outcomes (for example, education and health) between children belonging to school feeding and control communities, and the agriculture impacts of the HGSF pilot relative to the regular GSFP at the district level. Thus, all the analysis we undertake in this paper pools GSFP and HGSF in a single school feeding arm. Also, we note that the type of program to which a child was assigned (for example, standard GSFP vs. HGSF pilot) was not predictive of uptake (Online Appendix 3), and that in Online Appendix 10 we show that treatment effects do not differ by these subgroups. Both checks allay potential concerns that implementation variation between schools in districts that were randomly assigned to different food procurement schemes could have affected the educational outcomes.
Using a household census at baseline, approximately 25 households with children in the 5-15 target age group were then randomly selected for interview from each community receiving the intervention and 20 households in the communities of the 58 control schools. For further details on the sampling procedures, see Gelli et al. (2016).
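The restricted randomization described in this section can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure actually run for the trial: the function names (`r_squared`, `restricted_randomization`), the use of a linear probability regression with an intercept, and the synthetic inputs are all hypothetical choices made here for exposition.

```python
import numpy as np

def r_squared(y, X):
    """R-squared of an OLS regression of y on X (an intercept is added here)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss

def restricted_randomization(district, covariates, n_draws=2000, seed=0):
    """Draw candidate allocations and keep the one with the lowest R-squared.

    district   : (n_schools,) district label of each candidate school
    covariates : (n_schools, k) school- and village-level covariates
    Returns (best_r2, school_index, treatment) for the selected draw.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_draws):
        chosen, treat = [], []
        for d in np.unique(district):
            pool = np.flatnonzero(district == d)
            # Two schools per district, in random order:
            # the first goes to treatment, the second to control.
            pair = rng.choice(pool, size=2, replace=False)
            chosen.extend(pair)
            treat.extend([1.0, 0.0])
        chosen, treat = np.array(chosen), np.array(treat)
        r2 = r_squared(treat, covariates[chosen])
        if best is None or r2 < best[0]:
            best = (r2, chosen, treat)
    return best
```

With 58 districts this yields 116 schools, 58 per arm. A lower R-squared means the covariates explain less of the assignment, which is the sense in which the selected draw is the best balanced of the candidates.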

III. Data and Identification Strategy
A. Timeline and Sample
A baseline survey was undertaken in 116 communities between June and September 2013. Due to an error in the lists received by the GSFP, 25 schools in the study population, including approximately 18 percent of children in the target age group (5-15 years), had already been receiving school meals at baseline and were removed from the study population. We excluded these schools (13 controls, 12 treatment) from the follow-up. Analysis of child and household characteristics shows that the excluded communities were more likely to be rural and located in the north of Ghana, and households to be slightly worse off in terms of some sociodemographic characteristics. Children from excluded communities had lower learning achievements, although none of these differences was large (Online Appendix 1). Two additional communities from the same district in Northern Ghana were excluded from the endline survey, due to logistical problems related to local insecurity.
Implementation in most treatment communities began in the academic year 2014-2015, due to bureaucratic delays (see Section III.C). The follow-up survey was conducted in February-March 2016. Given that the academic year in Ghana usually runs from August to May, the program was evaluated after roughly two academic years of implementation.
Both rounds of surveys included detailed modules on household demographics, farm and other assets, expenditures, farming and other economic activities, child anthropometry, and child self-reported6 education indicators for all target-age children in the household, including enrollment, attendance and grade attainment, and educational achievement tests. Of the 4,269 target-age children sampled in 2013, 836 were in the last year of primary school or had already completed primary school. As such, they were not eligible to receive the intervention when implementation began and were therefore excluded from the sample. After three years, we successfully reinterviewed 92 percent of target-age children eligible to receive school feeding, leading to a longitudinal sample of 3,170 children. Data on schools and caterers were also collected (Aurino 2020).
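As a quick consistency check, the sample figures reported in this subsection can be reconciled arithmetically. The first two lines use only numbers stated in the text; the community count in the last line is an inference made here, assuming the two insecurity-related exclusions come on top of the 25 schools removed at baseline.

```latex
\begin{align*}
\text{eligible children at baseline} &= 4{,}269 - 836 = 3{,}433 \\
\text{share reinterviewed} &= 3{,}170 / 3{,}433 \approx 0.923 \quad (\text{about 92 percent}) \\
\text{communities at endline} &= 116 - 25 - 2 = 89
\end{align*}
```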

B. Balance of Baseline Covariates and Attrition at Endline
Table 1 presents descriptive statistics of characteristics of the baseline sample by treatment arm. The average child was about 8.5 years old, with children from the school feeding arm on average a month older than the control. Almost all children were enrolled in school at baseline, and a tenth of them attended private schools. The average child had completed less than two years of schooling, and about 11 percent had repeated a grade. Along with the descriptive statistics, we present balance tests to assess whether the randomization was successful in achieving balance of baseline covariates. The only difference between the two groups that was statistically significant at the 10 percent level was age of household heads, who were about a year and a half older in the school feeding arm than in control communities. These findings, together with the relatively small size of the differences, suggest that the randomization was successful in achieving balance.
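A balance test of this kind, an OLS regression of each baseline covariate on the treatment dummy with standard errors clustered at the community level, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `balance_test` is hypothetical, and the finite-sample correction applied to the cluster-robust variance is an assumption, since the text does not say which correction (if any) was used.

```python
import numpy as np

def balance_test(y, treat, cluster):
    """OLS of a baseline covariate on the treatment dummy, with
    standard errors clustered by community. Returns (difference, se)."""
    X = np.column_stack([np.ones(len(y)), treat])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich "meat": sum over clusters of the outer product of cluster scores.
    meat = np.zeros((2, 2))
    groups = np.unique(cluster)
    for g in groups:
        m = cluster == g
        score = X[m].T @ resid[m]
        meat += np.outer(score, score)
    n, k, G = len(y), X.shape[1], len(groups)
    adj = G / (G - 1) * (n - 1) / (n - k)  # finite-sample correction (an assumption)
    V = adj * (XtX_inv @ meat @ XtX_inv)
    return beta[1], np.sqrt(V[1, 1])
```

The returned coefficient equals the raw difference in means between the two arms; clustering only changes the standard error, which is what matters when treatment is assigned at the community level.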
Table 2 presents analysis of attrition at the child level. We do not observe any imbalance in the probability of remaining in the longitudinal sample based on school feeding offer.7 Column 2 presents analysis of whether children with higher baseline test scores were more likely to be resurveyed, which did not appear to be the case. Column 3 investigates whether treatment was associated with some child characteristics in predicting likelihood of remaining in the sample. We did so by interacting treatment assignment with the background characteristics we use for heterogeneity analysis. This time, we find a joint significance of all regressors at the 5 percent level. Also, the interaction between treatment and children from poor households was moderately significant, as children from poor households in treatment areas were slightly more likely to be re-interviewed at endline (93 percent of baseline children were followed up in treatment communities and 91 percent in control areas, for a total of 22 additional children lost in control areas compared to treatment). Also, boys and children from northern regions were slightly more likely to be re-interviewed. To evaluate further the possible effects of potential attrition bias on the validity of the impact estimates, the table in Online Appendix 2 presents the balance of baseline and endline characteristics across treatment groups for the full longitudinal sample, as well as for the longitudinal sample stratified by gender, household poverty, and northern region. Across a wide range of baseline child and household backgrounds, there were no differences between school feeding and control arms in key characteristics at both baseline and endline for the longitudinal sample. The only exception is age in months for children from poor households, whereby children in school feeding areas from poor households were older at both baseline and endline than children in control areas. We address this issue by employing age-standardized test scores, as highlighted in Section III.D. Thus, even if there was some concern of differential attrition by treatment in the case of children from the poorest households, balance was generally maintained, particularly in light of the relatively low levels of attrition overall, which lessens concerns of a change in the sampling frame by treatment assignment due to attrition.
Notes (Table 1): *p < 0.1. N = 3,433. This table presents descriptive statistics for the full baseline sample of eligible children at baseline, stratified by assignment to treatment. The sample refers to all children aged 5-15 interviewed at baseline, prior to attrition. Mean and standard deviation in parentheses. The school feeding-control difference column reports the school feeding coefficient of a basic OLS regression with each covariate as an outcome and standard errors clustered at the community level. For each variable, the estimated school feeding coefficient provides the difference between the school feeding and control groups and its standard errors.
7. This result did not change when we split treatment into GSFP and HGSF pilots (results available upon request).

C. Program Uptake and Implementation
Sixty-one percent of eligible children at baseline in treatment areas reported receiving school meals in the previous week at endline, which we refer to as overall uptake rate. The uptake rate was 83 percent for those in public primary education, indicating that most children who were still in basic government education (where the program is served) did in fact receive school meals. On the other hand, fewer than 2 percent of children in control areas received school feeding at endline, ruling out the possibility of significant crossover, which would have hampered the experimental design. We also checked whether the introduction of the program led to children in treatment communities switching from private to public schools to receive the program, but we did not find evidence of this (results available upon request).
As the indicator of program uptake was self-reported by the child (or the caregiver in the case of young children), we cross-checked it with mean uptake at the community level to assess whether responses from children living in the same communities were consistent. For 80 percent of the communities, mean uptake was more than 70 percent (with half of them having an average uptake exceeding 90 percent) (results available upon request). Only four communities, all located in the south of Ghana, had average uptake below a quarter of all eligible children, which may be a sign of poor implementation.
Eighty percent of children who reported receiving school feeding in the treatment arm at endline ate the GSFP meal at school during all days in the previous week, suggesting a fairly regular service provision. Twenty-three percent of children in the treatment group reported they were more likely to eat less food at home on days they eat at school, indicating some substitution between meals.8 However, only 4 percent reported bringing their food from the school meal to share at home. Online Appendix 3 presents correlates of child endline program uptake (independent of primary enrollment status) among children in treatment communities. Children aged 5-11 years at baseline were two times more likely to receive school feeding compared to adolescents (12-15 years at baseline), consistent with expectations of older children having progressed to secondary school or being out of school. There was no gender variation in the odds of uptake, while household poverty at baseline and northern regions were predictive of about two times higher chances of reporting school meals receipt. Baseline math and literacy scores were associated with lower odds of school feeding. This finding may be due to faster progression to secondary school for pupils who had higher achievements at baseline.
We do not have access to administrative data on program implementation, but we use data from schools and caterers to investigate variation in implementation in our sample. School data show that for some schools, the program started as originally planned in the first semester of 2013, but for the majority of schools (n = 30), the program started in the early months of 2014. Only one school started in February 2015. There was no indication of discontinuation of the program, but only 37 percent of schools reported having a copy of the district GSFP menu, potentially signaling varying adherence to the nutritional guidelines set by the GSFP secretariat (results available upon request). No regional differences were evident. Nearly 85 percent of caterers indicated that payments were often insufficient to cover operational costs, which led them to resort to credit to avoid changing the content and size of meals (83 percent), cutting portion sizes (9 percent), or adopting a mix of other strategies to reduce costs (for example, reduce personnel). Further, focus groups with children, caregivers, teachers, and caterers did not highlight particular irregularities in service provision, which may have been assured thanks to caterers adopting the strategies mentioned above to face the delays in disbursements (Fernandes et al. 2017).
To understand further whether the financial challenges incurred by caterers resulted in poor-quality meals, we analyzed data from their weekly meal logs, which provided the ingredients used for the meal served during the survey day and the following day. The most frequent meal served was a combination of a starchy food (for example, rice, yam, gari, etc.) with some type of legumes (46 percent of meals), followed by a stew or a soup combining starchy foods and animal-source proteins, mostly dry fish, chicken, or meat (37 percent of meals), and a starch with vegetables, mostly okra or tomato (9 percent of meals). All these meals are consistent with the GSFP menu. In one school in Brong Ahafo the caterer reported serving no meal on both days, while in only three separate instances in schools in the northern regions the caterer reported having served only a starchy food, but only for one of the two meals surveyed. Figure 1 presents meal content between northern and southern regions, which is relevant to the heterogeneity analysis, highlighting modest variation in implementation across these areas. Although we do not have data on quantities served per child, these descriptive findings suggest that at least the implementation guidelines regarding food diversity seemed to have been followed in most cases.
Finally, we note that the structure of the school day does not change between intervention and control schools. Without the GSFP, students should either bring food from home or buy from nearby vendors. Focus groups highlighted that without the program, however, students often go home to have lunch and may not return to school afterwards, missing out on instructional time (Fernandes et al. 2017). A similar pattern was observed in Uganda, whereby control students had much lower afternoon shift attendance than children in the school feeding arm (Alderman, Gilligan, and Lehrer 2012). Consistent with qualitative reports, analysis of time-use data from our endline survey showed that pupils in intervention schools spent additional time in school compared to control peers at endline, with larger effects for girls and children from the poorest households (30 and 50 additional minutes per day, respectively) (results available upon request).

D. Measures of Child Learning
Given the wide age range included in the target sample, learning assessments evaluated a basic set of skills in literacy and math. Each section of the test began with basic domain-specific questions that progressively increased in difficulty in order to cover different ability levels. The math assessment included questions on recognition of single- or double-digit numbers, arithmetic, fractions, and basic problems (for example, how many hours in 120 minutes), while the literacy test assessed letter recognition, reading short words and sentences, and three final questions on completing a sentence with the correct item among four possible choices. The same 15-item math and literacy tests were administered in both rounds. Tests were administered at home to ensure that even children out of school were tested, enhancing internal validity. Parents or schools did not know the contents of the tests, nor the specific date and timing of testing.
Test scores were standardized by child age in months for each survey round, with the control group having mean zero and standard deviation one, in order to deal with the wide age groups assessed as part of the evaluation. In line with the literature (for example, Banerjee et al. 2007), this was achieved first by removing interviewer effects from the raw scores through ordinary least squares (OLS) regression on interviewer dummies.9 Age-conditional means and standard deviations were then estimated nonparametrically from the residuals of these regressions. We also generated a composite indicator of learning to address potential issues related to multiple testing, which should enhance statistical power to detect effects that go in the same direction (Kling, Liebman, and Katz 2007). We computed this index as the average of the normalized test scores and then standardized it again to the control group within each round.10 In this way, we can interpret estimated ITT effects as the effect size relative to the control group (Banerjee et al. 2015).
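The standardization and composite index described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions: it rescales scores to the control group within a single age band and survey round, and it omits both the interviewer-effect regression and the nonparametric age conditioning used in the paper. The function names and the six data points are hypothetical.

```python
from statistics import mean, pstdev

def standardize_to_control(scores, is_control):
    """Rescale scores so that the control group has mean 0 and SD 1."""
    control = [s for s, c in zip(scores, is_control) if c]
    mu, sd = mean(control), pstdev(control)
    return [(s - mu) / sd for s in scores]

def composite_index(math_z, literacy_z, is_control):
    """Average the two standardized scores, then re-standardize to control."""
    avg = [(m + l) / 2 for m, l in zip(math_z, literacy_z)]
    return standardize_to_control(avg, is_control)

# Hypothetical raw scores (out of 15) for six children in one age band and round.
math_raw = [2, 4, 3, 5, 1, 6]
literacy_raw = [1, 3, 2, 6, 2, 4]
is_control = [True, True, True, False, False, False]

math_z = standardize_to_control(math_raw, is_control)
literacy_z = standardize_to_control(literacy_raw, is_control)
learning_index = composite_index(math_z, literacy_z, is_control)
```

Because every score is expressed in control-group standard deviations, a difference in means between arms on `learning_index` reads directly as an effect size.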
Table 3 presents descriptive statistics of raw and age-standardized test scores in the two learning domains by intervention arm for the longitudinal sample. Children in the school feeding group had higher scores in both rounds, with the difference from control being more pronounced at endline. However, none of the differences prior to the beginning of the intervention appeared to be statistically distinguishable from zero.11 The analysis of the raw scores highlights the low achievement levels in each outcome and survey round: at baseline, on average, children were not able to respond to two out of 15 questions in the math and literacy tests. This proportion increased slightly three years later, but raw endline scores were still very low, with the average pupil only being able to respond correctly to about four out of 15 questions for math and literacy, which confirms Ghana's learning challenges. Consistent with these average low achievements, there were no ceiling effects by age at endline due to the test design. For instance, children between five and ten years responded correctly to three questions for both math and literacy, while children aged 11-15 years were able to answer five questions correctly on average. The analysis of age-standardized test scores at endline highlights the progress of children in the school feeding arm across all competencies.
Figure 2 presents the nonparametric distributions of raw (Panel A) and age-standardized (Panel B) scores in math and literacy by treatment arm at both rounds. Floor effects were present, particularly in the baseline data, highlighting that the tests were challenging, especially for the younger children. A basic reading assessment in Ghana reported similar floor effects, whereby 42 percent and 20 percent of Grade 3 and Grade 6 students, respectively, did not respond correctly to any of the test's six questions (Balwanz and Darvas 2013). Moreover, there was an improvement in mean achievement in both competences between baseline and endline, although scores were widely dispersed across the sample. This may reflect alleviation of the floor effects by the endline, but also widening of educational inequalities in the transition from primary to higher levels of education, by which time the most vulnerable children tend to enter the labor market, while the others progress to secondary school (De Groot et al. 2015). The figure also shows that the distribution of age-standardized achievements of the school feeding group appeared to be above the control at endline across the mid- to upper end of the distribution of math and literacy.
Online Appendix 5 presents raw scores by child gender, household poverty, and residence (south vs. north Ghana). At both rounds, there were no large and significant differences between girls and boys, while gaps between nonpoor and poor children were evident. The greatest disparities in baseline raw achievements, however, were based on place of residence, underscoring important geographic inequalities in educational quality between north and south Ghana. Children from the southern regions had, on average, responded to about one more question than their northern peers across both competences. This gap was substantially reduced or closed at endline.
9. Controlling for interviewer dummies is a common practice in similar standardizations. It also helped tackle potential language effects, as unfortunately we do not have information on the specific language of test administration. The interviewer spoke the same language as the child.
10. Although children were given assessments in all tests, discrepancies in sample sizes across raw and standardized scores reflect inability to convert raw scores into standardized scores (for example, lack of child age in months). A similar issue is highlighted in Graff Zivin, Hsiang, and Neidell (2018). This could be a potential concern if the missing scores correlate with treatment assignment. Regressions of treatment on score availability rule out this hypothesis, as the coefficients are zero and not statistically significant across all outcomes (results available upon request).
11. A similar picture emerged from the analysis of baseline differences in raw scores for the baseline sample prior to attrition presented in Online Appendix 5. This provides further reassurance against potential biases in treatment effects of school feeding on child learning stemming from nonrandom attrition.
Figure 3 presents empirical distributions of age-standardized test scores by gender (Panel A), poverty (Panel B), and place of residence (Panel C). At baseline, the distributions of achievements tended to overlap between the treatment and control groups, highlighting balance of outcomes by those factors prior to the start of the program. At endline, the nonparametric distributions for the school feeding group often tended to shift toward the right, particularly across the mid- to upper ends of the distribution, indicating larger gains in learning and cognition for children receiving school feeding, as compared to those in the control group.
Autocorrelations of test scores between baseline and follow-up were low (math: r = 0.23; literacy: r = 0.31, all significant at p < 0.01).12 This finding may be partially explained by some degree of measurement error and partly by the three-year lag between the assessments. We checked whether low autocorrelation among test scores in different waves is common in longitudinal data with a different data set (the Young Lives study from Ethiopia, India, Peru, and Vietnam). Autocorrelation in vocabulary scores between five and eight years in this sample was also low and roughly comparable to the one related to our literacy scores (r = 0.38, p < 0.01).

E. Identification
We assessed program impact through an ITT approach by comparing test scores between eligible children who were in communities randomly assigned to school feeding and the control. The ITT parameter represents the average effect of offering school feeding to children who were eligible for the program at baseline in treatment communities, regardless of whether they actually had school lunches at endline.
In the analysis plan we outlined two potential strategies to estimate the ITT parameters, depending on outcomes of interest: analysis of covariance (ANCOVA) and difference-in-differences (DiD). The former improves statistical power by conditioning the endline outcome on the assignment to treatment and the baseline value of the outcome. Following McKenzie (2012) and Frison and Pocock (1992), this is our preferred estimator due to its greater efficiency (defined as retaining unbiasedness with lower variance) in estimating average treatment effects with experimental data compared to a DiD or a post-estimator approach. Gains in efficiency are more marked when outcomes have low autocorrelation, as in our case. In econometric terms, we estimate Equation 1:
$$ y_{it,j} = \beta_0 + \beta_1 SF_{it,j} + \beta_2 y_{i(t-1),j} + \gamma_r + \varepsilon_{it,j} \tag{1} $$
where $y_{it,j}$ and $y_{i(t-1),j}$ represent, respectively, the endline and baseline test scores (when available)13 for child $i$ residing in community $j$, $SF_{it,j}$ is a dichotomous variable for a child residing in a community randomly assigned to school feeding and thus uncorrelated with $y_{i(t-1),j}$, and $\gamma_r$ is a vector of region dummies to capture region-specific unobservable characteristics or potential regional variation in quality of implementation. Standard errors were clustered at the community level, which is the unit of randomization for school feeding. $\beta_1$, the coefficient related to school feeding, provides the estimate of the treatment effects. Although we analyze treatment effects on pre-specified outcomes, and we estimate treatment effects on a composite index of learning, we further address multiple hypothesis testing by adjusting p-values through the Romano-Wolf (R-W) step-down method (Romano and Wolf 2005, 2016). These are estimated by running 2,000 iterations and clustering by community.
12. McKenzie (2012), for instance, posits that low autocorrelation ranges between r = 0.2-0.4.
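To make the ANCOVA specification concrete, the following is a minimal sketch of the point estimate only: an OLS regression of the endline score on a treatment dummy and the baseline score, solved through the normal equations. It is not the authors' code. Region dummies, community-clustered standard errors, and the Romano-Wolf adjustment are omitted, and the function names and noise-free data (built with an assumed effect of 0.15 SD) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ancova_itt(endline, baseline, treated):
    """OLS of endline on [1, treated, baseline]; returns (b0, b1, b2)."""
    rows = [[1.0, float(t), float(y0)] for t, y0 in zip(treated, baseline)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, endline)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical noise-free data generated with a treatment effect of 0.15 SD.
baseline = [-1.0, 0.5, 0.2, -0.3, 1.1, -0.8]
treated = [1, 0, 1, 0, 1, 0]
endline = [0.1 + 0.15 * t + 0.3 * y0 for t, y0 in zip(treated, baseline)]

b0, b1, b2 = ancova_itt(endline, baseline, treated)
print(round(b1, 3))  # coefficient on the treatment dummy (the ITT estimate)
```

In applied work a regression library with cluster-robust covariance would normally replace the hand-rolled solver; the sketch only shows which coefficient carries the treatment effect.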

IV. Impact of School Feeding on Learning
Table 4, Panel A presents ITT estimates for the full sample employing ANCOVA. The randomized offer of school feeding led to moderately significant increases across all test scores (of about 0.15 SD), after adjusting for multiple hypothesis testing. We then investigate heterogeneity in program effects. Table 4, Panels B, C, and D report treatment effects in models that stratify by child gender, household poverty, and geographical regions, respectively, so that we can evaluate total program effects for policy-relevant subpopulations.14 School feeding led to sizeable and statistically significant learning gains across all competencies for girls, children from households below the poverty line, and those living in northern Ghana. In the case of girls, math and literacy scores increased by 0.24 SD (R-W p < 0.01) and 0.2 SD (R-W p < 0.05), respectively, while the composite index rose by 0.27 SD (R-W p < 0.01). By contrast, the program had a much smaller and not significant effect for boys. For children from households below the poverty line at baseline (Panel C), gains in math and in the composite scores amounted to 0.3 SD (R-W p < 0.01), while the increases in literacy amounted to 0.23 SD (R-W p < 0.05). Similarly, children from the northern regions had increases in math and literacy amounting to a quarter of a standard deviation each (R-W p < 0.1). As for boys, gains among children from better-off households or regions were smaller and never statistically significant. For completeness, we also present DiD estimates for the main treatment effects in Table 5. While the treatment effects arising from both ANCOVA and DiD are in most cases similar, as anticipated, the former estimator proved more efficient than DiD.
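The efficiency comparison between ANCOVA and DiD can be illustrated with the textbook variance formulas discussed by Frison and Pocock (1992) and McKenzie (2012). Under the simplifying assumptions of one baseline and one follow-up round, equal variances across rounds, and a large sample, the ANCOVA variance is proportional to 1 - rho^2 and the DiD variance to 2(1 - rho), where rho is the autocorrelation of the outcome. The sketch below plugs in the autocorrelations reported for this sample; the resulting ratios are illustrative and are not estimates reported by the authors.

```python
def variance_relative_to_did(rho):
    """ANCOVA variance divided by DiD variance for autocorrelation rho."""
    ancova = 1.0 - rho ** 2   # proportional to Var(ANCOVA)
    did = 2.0 * (1.0 - rho)   # proportional to Var(DiD)
    return ancova / did       # simplifies to (1 + rho) / 2

for label, rho in [("math", 0.23), ("literacy", 0.31)]:
    print(label, round(variance_relative_to_did(rho), 3))
```

The ratio falls toward one half as rho approaches zero, which is why the gain from ANCOVA is largest when autocorrelation is low.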
In addition, we investigated variation in treatment effects by age in Online Appendix 6. The latter shows that the effect of school feeding was mostly similar between children of different age groups at baseline, with the exception of math. However, in the younger cohort (children who were aged 6-11 years at baseline), effects were more precisely estimated, probably due to larger sample sizes. Also, although it was not part of the analysis plan, we assessed heterogeneity by intensity of exposure to the program based on child's age and grade at baseline. Specifically, children who were either younger than five years or enrolled in Grade 5 at baseline were considered as being exposed to only one year of the program, in contrast to the remaining children, whom we consider as having had two years of program exposure. Across all competencies, the interaction between treatment and a dummy measuring two-year exposure was positive but never significant, perhaps due to the limited size of the one-year exposure group (Online Appendix 7). Therefore, while this is suggestive of increasing returns to program exposure, as in Chakraborty and Jayaraman (2019), our data cannot fully assess this hypothesis.
14. We opted for this approach, rather than one in which we would interact school feeding with the policy group of interest, for two reasons. First, we wanted to estimate the total effect of the policy on each subgroup; second, stratification has the advantage that the separate regressions allow all parameters to vary by subgroup. Nonetheless, we tested the differential effect of the intervention between each of the comparison groups in a pooled regression model with interactions, and the Romano-Wolf adjusted p-values are around 0.1 in the case of gender and household poverty (results available upon request).
Notes to Table 4: R-W p-values were adjusted for multiple testing using the Romano-Wolf (2005, 2016) step-down method with 2,000 iterations and standard errors clustered at community level. The table presents intent-to-treat effects on each outcome estimated through ANCOVA for the full sample and stratified by child gender, household poverty, and place of residence. Models were estimated through OLS. For each outcome, the model controls for its baseline value, a dichotomous variable related to the randomized assignment to school feeding, and region dummies. Math and literacy scores are age-standardized. The composite index of learning was computed as the average of the math and literacy scores and then standardized to the control group within each round. Household poverty is a dichotomous indicator having the value of one if the household had baseline per capita consumption levels falling below the national consumption poverty line in 2013. Northern regions include Upper West, Upper East, and Northern region. Southern regions include Western, Central, Greater Accra, Volta, Eastern, Ashanti, and Brong Ahafo.
Notes to Table 5: R-W p-values were adjusted for multiple testing using the Romano-Wolf (2005, 2016) step-down method with 2,000 iterations and standard errors clustered at community level. The table presents intent-to-treat effects on each outcome estimated through difference-in-differences for the full sample and stratified by child gender, household poverty, and place of residence. Models were estimated through OLS. Models include a dichotomous variable for treatment assignment and a dummy for endline survey; the treatment effect relates to the interaction between these two variables. Models also include region dummies. Math and literacy scores are age-standardized. Composite indices were computed as averages of the standardized scores and then standardized to the control group within each round. Household poverty is a dichotomous indicator having the value of one if the household had baseline per capita consumption levels falling below the national consumption poverty line in 2013. Northern regions include Upper West, Upper East, and Northern region. Southern regions include Western, Central, Greater Accra, Volta, Eastern, Ashanti, and Brong Ahafo.
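The Romano-Wolf step-down adjustment mentioned in the table notes can be sketched as follows. This is a simplified illustration of the general step-down logic, not the authors' implementation: it assumes that the resampled null t-statistics have already been produced (in the paper, through 2,000 iterations with resampling by community), and the function name and the tiny input arrays are hypothetical.

```python
def romano_wolf_pvalues(t_obs, t_boot):
    """Step-down adjusted p-values.

    t_obs:  observed t-statistics, one per hypothesis.
    t_boot: resampled null t-statistics; t_boot[b][s] is draw b, hypothesis s.
    """
    n_draws = len(t_boot)
    # Test hypotheses from the largest observed |t| to the smallest.
    order = sorted(range(len(t_obs)), key=lambda s: -abs(t_obs[s]))
    adjusted = [0.0] * len(t_obs)
    running_max = 0.0
    for step, s in enumerate(order):
        remaining = order[step:]  # hypotheses not yet tested
        exceed = sum(
            1 for draw in t_boot
            if max(abs(draw[r]) for r in remaining) >= abs(t_obs[s])
        )
        p = (exceed + 1) / (n_draws + 1)
        running_max = max(running_max, p)  # keep adjusted p-values monotone
        adjusted[s] = running_max
    return adjusted

# Hypothetical inputs: two outcomes and four resampling draws.
t_obs = [3.0, 1.0]
t_boot = [[0.5, 0.2], [2.0, 0.1], [0.3, 1.5], [0.1, 0.4]]
adjusted = romano_wolf_pvalues(t_obs, t_boot)
```

Comparing each statistic with the maximum over the hypotheses still under test is what lets the procedure account for dependence between outcomes such as math, literacy, and the composite index.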

V. Mechanisms
While the RCT was designed to investigate educational outcomes in terms of learning, we offer a supportive exploration of possible mechanisms. Improved schooling, nutritional status, and cognitive capacities constitute potential channels through which school feeding can affect learning. First, school meals may promote enrollment, attendance, and grade attainment by subsidizing educational costs through the provision of a free meal conditional on attendance. Second, by addressing hunger and micronutrient deficiencies, school feeding can positively affect children's learning via reduced morbidity-related absenteeism, better nutritional status, and increased cognitive skills in the classroom, including increased attention and memory (Kristjansson et al. 2015; Afridi, Barooah, and Somanathan 2019). Further, it may be plausible that teachers can be more motivated by interacting with more attentive and responsive pupils (Afridi, Barooah, and Somanathan 2013; Glewwe and Kremer 2006). The potential health impacts of school feeding may be offset by substitution between meals, or changes in the intrahousehold distribution of food, as this could be diverted away from the child receiving the free meal, though evidence of this effect is mixed (Jacoby 2002; Ahmed 2004; Chakraborty and Jayaraman 2019; Kazianga, de Walque, and Alderman 2014). Also, high heterogeneity in the health pathway may be present, with effects most likely concentrated among malnourished children (Krämer, Kumar, and Vollmer 2018; Powell et al. 1998).
In the remainder of this section, we investigate the role of these potential pathways for impact. To estimate treatment effects, we use our preferred ANCOVA estimator, which controls for the baseline values of the outcome variables (Equation 1). We note, however, that results are broadly unchanged when we use DiD (results available upon request). Online Appendix 8 presents descriptive statistics.

A. Changes in Schooling
Table 6 presents ITT estimates of school feeding on the following indicators: school enrollment in any educational level, school attendance (conditional on enrollment) as measured by the number of days the child attended school out of a five-day week, and current grade attended by the child. All of these variables were measured in the household survey with questions directed to the child or their caregiver (for young children) in both survey rounds. These outcomes are included in the study protocol as key schooling outcomes potentially affected by the intervention. Panel A reports ANCOVA estimates of school feeding for the full sample, while Panels B, C, and D report ITT effects by gender, household poverty, and geographical areas, respectively. Increases in school enrollment emerge as an important plausible channel for impact, but only for children from the poorest households and geographical areas. This finding is expected in contexts such as Ghana, where basic enrollment rates are already high, and only the poorest children are excluded from basic education. Treatment effects for attendance and grade attainment were positive across all groups, but only significant for grade attainment of boys and nonpoor children.
Notes to Table 6: R-W p-values were adjusted for multiple testing using the Romano-Wolf (2005, 2016) step-down method with 2,000 iterations and standard errors clustered at community level. The table presents intent-to-treat effects on each outcome for the full sample and stratified by child gender, household poverty, and place of residence. Models were estimated through OLS. For each outcome, the model controls for its baseline value, a dichotomous variable related to the randomized assignment to school feeding, and region dummies. Enrollment is a dichotomous variable indicating whether the child is enrolled in any level of education; attendance is an indicator counting the number of days the child attended school in the past school week. The indicator ranges from zero to five days. Current grade provides the educational grade (in years) the child is currently enrolled in. Household poverty is a dichotomous indicator having the value of one if the household had baseline per capita consumption levels falling below the national consumption poverty line in 2013. Northern regions include Upper West, Upper East, and Northern region. Southern regions include Western, Central, Greater Accra, Volta, Eastern, Ashanti, and Brong Ahafo.

B. Changes in Cognition
Table 7 presents treatment effects on two indicators of child cognitive development that are listed in the protocol as potentially affected by the intervention: the standardized progressive matrices (SPM) and the digit span tests. These indicators represent two distinct cognitive dimensions. While the SPM test is an adaptation of the commonly used Raven's progressive matrices test and measures nonverbal fluid intelligence and problem-solving ability, the digit span test assesses working memory and executive function. For each question of the SPM test, the child was given a set of images and was asked to choose the image that would complete the picture. For the digit span test, the child was presented sequences of numbers of increasing lengths, and was asked to recall the sequences as prompted (forwards) and reversing the number order (backwards). The same 12-item tests were administered across both rounds. As for learning, we generated a composite measure of cognitive development.
School feeding had a positive effect on cognitive skills of the average child, with an increase of 0.12 SD in both the digit span (R-W p < 0.1) and SPM (R-W p < 0.05) scores and an increase of 0.14 SD in the composite score (R-W p < 0.05). Also, consistent with the results on learning, school feeding especially improved the cognitive development of disadvantaged learner groups. Specifically, the offer of school feeding led to an increase of 0.19 SD, 0.27 SD, and 0.25 SD in the digit span scores of treatment girls (R-W p < 0.05), children from poor households (R-W p < 0.01), and northern Ghana (R-W p < 0.01), respectively, as compared to peers in control groups. School feeding also led to increases in the SPM score of more than 0.2 SD among children from the most disadvantaged households (R-W p < 0.01) and regions (R-W p < 0.05). The improvement in the composite cognitive score following the offer of school feeding amounted to 0.18 SD for girls (R-W p < 0.05) and slightly less than 0.3 SD for children from poor households and northern Ghana (R-W p < 0.01). There were also improvements in cognitive development among boys in the treatment arm; specifically, their SPM score improved by about 0.15 SD (R-W p < 0.05) and their composite index by 0.12 SD (R-W p < 0.1).
Notes to Table 7: R-W p-values were adjusted for multiple testing using the Romano-Wolf (2005, 2016) step-down method with 2,000 iterations and standard errors clustered at community level. The table presents intent-to-treat effects on each outcome for the full sample and stratified by child gender, household poverty, and place of residence. Models were estimated through OLS. For each outcome, the model controls for its baseline value, a dichotomous variable related to the randomized assignment to school feeding, and region dummies. Household poverty is a dichotomous indicator having the value of one if the household had baseline per capita consumption levels falling below the national consumption poverty line in 2013. Northern regions include Upper West, Upper East, and Northern region. Southern regions include Western, Central, Greater Accra, Volta, Eastern, Ashanti, and Brong Ahafo.

C. Changes in Nutritional Status
As per our protocol, a separate paper reports impact results on nutritional status (Gelli et al. 2019). However, given the potential relevance of this channel for learning, we report core results on nutrition. The school feeding program had no effect on the height-for-age z-scores (HAZ), a marker of chronic nutritional status, and on BMI-for-age z-scores (BAZ), an indicator of concurrent nutritional status, for the whole sample. However, the program had significant effects on HAZ of girls (effect size: 0.12 SD, p < 0.05) and for young children in households living below the poverty line (effect size: 0.22 SD, p < 0.05). School meals also did not have an effect on the nutritional status of the aggregate school-age population in the northern regions, but the intervention increased HAZ by 0.20 SD in girls living in this area (p < 0.01).

D. Reduced Hunger in the Classroom
We investigate whether our results might have been driven by the fact that children may perform better in learning assessments after having eaten breakfast or the school lunch (Figlio and Winicki 2005). Although we did not record the time when the tests were undertaken, we check if there are differences in whether children from the treatment and [...]
[...] from increased human capital (including general equilibrium effects) are not yet fully known. For instance, Bütikofer, Mølland, and Salvanes (2018) estimated that access to a school breakfast program run in the 1930s in Norway had positive long-term and intergenerational effects on education and earnings.15 While we leave these important issues for future research, back-of-the-envelope calculations based on the government of Ghana's transfer to caterers and an average of 200 school days per year suggest that the program cost about US$66 per child per year in the 2015-2016 school year. While this is a very rough estimation, as it does not include full implementation costs (for example, other costs at the school level that are not included in the government budget for school feeding), this figure falls within the range of the average cost per child of school meals in LMICs reported in Gelli and Daryanani (2013). Taking inflation into account,16 the GSFP thus compares well with other programs in LMICs in terms of costs. Also, Gelli and Daryanani's estimations of program costs were based on WFP operating costs. As the WFP is the largest school feeding implementer in the world and operates through a centralized model that allows economies of scale, its cost estimates likely provide a lower bound for government programs. This is especially relevant for countries seeking food procurement within national boundaries using "home-grown" approaches, such as Ghana, in order to stimulate internal agricultural production and rural poverty reduction, at the potential cost of raising programmatic budgets through the purchase of locally grown crops (if these are more costly).
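As a rough check on the cost figure quoted above, the annual cost and the assumed number of school days imply a per-day cost per child. This is simple arithmetic on the two numbers in the text, not a figure reported by the authors, and it inherits the same caveat that full implementation costs are excluded:

$$
\frac{\text{US\$}66\ \text{per child per year}}{200\ \text{school days per year}} = \text{US\$}0.33\ \text{per child per school day}
$$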
Overall, our findings highlight the role of government-led, large-scale school feeding programs as a social protection tool with positive and equitable impacts on human capital, particularly for marginalized groups of learners. Program impacts are especially remarkable when contextualized to the normal implementation challenges related to large-scale programs run in LMICs. These challenges add to the generalizability of our findings to "real-world" interventions, which may face more financial, implementation, and monitoring constraints than small-scale trials.
Increasing average learning levels by narrowing the gaps in the distribution of achievements is critical for sustainable economic and social development. Therefore, school feeding programs remain important educational and social protection tools for attaining the 2030 "learning for all" agenda.

Figure 1
Meal Content by Northern and Southern Regions
Notes: This figure presents the proportion of specific meal types served on the day of the interview and on the following day by region. Data on the meals served were taken from the caterer's weekly meal logs. Northern regions: Northern, Upper East, and Upper West. Southern regions: Western, Central, Greater Accra, Volta, Eastern, Ashanti, and Brong Ahafo.

Figure 3
Empirical Endline Distributions of Age-Standardized Math (Left) and Literacy (Right) Test Scores, by Treatment Arm and Child Gender, Poverty, and Geographical Region
Notes: These figures present endline age-standardized scores by treatment and gender (Panel A), household poverty status at baseline (Panel B), and geographical region (Panel C). Nonparametric distributions were calculated through weighted local polynomial regressions using an Epanechnikov kernel. Household poverty is a dichotomous indicator having the value of one if the household had baseline per capita consumption levels falling below the national consumption poverty line in 2013. Northern regions include Upper West, Upper East, and Northern region. Southern regions include Western, Central, Greater Accra, Volta, Eastern, Ashanti, and Brong Ahafo.


The Journal of Human Resources. Copyright 2020.

Table 4
Treatment Effects Estimated through ANCOVA: Full Sample, and Heterogeneity by Child Gender, Household Poverty, and Geographical Areas
Notes: Confidence intervals clustered at community level in square brackets.

Table 1
Descriptive Statistics and Balance of Covariates at Baseline, Full Baseline Sample

Table 2
Baseline Correlates of Children Remaining in the Longitudinal Sample
Notes: This table presents the probability of remaining in the longitudinal sample estimated through linear probability models, with standard errors clustered at the community level. N = 3,433 children of target age prior to attrition. Lower sample sizes reflect covariates that are missing or not applicable. Column 1 shows probabilities of the child being followed up by treatment assignment; Column 2 presents odds ratios by baseline learning and cognition, while Column 3 interacts randomized assignment with key variables by which heterogeneity analysis was conducted throughout the paper. Household poverty is a dichotomous indicator having the value of one if the household had baseline per capita consumption levels falling below the national consumption poverty line in 2013. Northern regions include Upper West, Upper East, and Northern region. Southern regions include Western, Central, Greater Accra, Volta, Eastern, Ashanti, and Brong Ahafo.

Table 3
Descriptive Statistics and Test of Balance of Raw and Age-Standardized Test Scores, by Survey Round and Treatment Arm, Longitudinal Sample
Notes: The school feeding-control difference column reports the school feeding coefficient of a basic OLS regression of each outcome on the school feeding arm, controlling for child age in months. Standard errors are clustered at the community level. Lower sample sizes in the cognitive scores (as compared to the full longitudinal sample) reflect missing values in those scores.

Table 5
Treatment Effects Estimated through Difference-in-Differences: Full Sample and Heterogeneity by Child Gender, Household Poverty, and Geographical Areas

Table 6
Treatment Effects of School Feeding on Schooling Estimated through ANCOVA, Full Sample, and Heterogeneity by Child Gender, Household Poverty, and Geographical Areas

Table 7
Treatment Effects of School Feeding on Child Cognitive Scores, Full Sample, and Heterogeneity by Child Gender, Household Poverty, and Geographical Areas