Abstract
This study provides causal evidence on the impact of life skills programming on the mental health of adolescent girls aged 10–19 in three distinct low- and middle-income countries: Tanzania, Bangladesh, and Ethiopia. Life skills interventions significantly improved a component of mental health in all three contexts, with reductions in depression in Tanzania and improvements in socio-emotional development in Bangladesh and Ethiopia. However, findings suggest substantial heterogeneity in impact. Programs that target both adolescent boys and girls appear more effective than those that target girls alone, and existing supportive environments are a necessary condition for programs to improve mental health.
I. Introduction
Approximately one in seven adolescents experiences a mental illness each year (WHO 2021), with many more suffering from adverse mental health. Poor mental health for young people is itself a concern (Patel 2013), but it also affects long-term human capital accumulation and can result in lower adult earnings and worse employment outcomes (Currie et al. 2010; Currie and Stabile 2006; Hale, Bevilacqua, and Viner 2015; Richter et al. 2021; Smith and Smith 2010). In low- and middle-income countries (LMICs), where 90 percent of adolescents live, conditions of adversity and lack of access to mental health services exacerbate both rates of poor mental health and consequences across the life course (Patel 2018; Reiss 2013; Zhou et al. 2020).
Importantly, mental health in young people goes beyond a diagnosable mental illness to a more comprehensive, multidimensional measure (Azzopardi et al. 2023). The World Health Organization (WHO) defines mental health as, “A state of mental well-being that enables people to cope with the stresses of life, to realize their abilities, to learn well and work well, and to contribute to their communities. Mental health is an integral component of health and well-being and is more than the absence of mental disorders” (WHO 2022, p. 8). Mental health thus includes not only mental illness but also constructs related to resilience, self-control, coping, life-purpose/goals, connectedness, self-efficacy, confidence, and grit (Manwell et al. 2015; Orth, Moosajee, and Van Wyk 2022; Pollard and Lee 2003; Renwick et al. 2022). These broader factors are strongly associated with mental illness among adolescents (Musumari et al. 2018; Tait, French, and Hulse 2003) and also influence human capital and labor outcomes during adolescence and adulthood (Cobb-Clark 2015; Heckman, Stixrud, and Urzua 2006; Krishnan and Krutikova 2012). This broader definition is particularly important for adolescents in LMICs, given limited resources to diagnose mental illness in these contexts and supply-side constraints on care (WHO 2022; Gureje and Oladeji 2022). Thus, interventions that target adolescent well-being more broadly may be a critical tool in tackling adverse mental health in LMICs.
Drivers of mental health among young people are also gendered, with adolescent girls having worse mental health than adolescent boys in many contexts (Campbell, Bann, and Patalay 2021). For young women, particularly those in LMICs, adverse mental health outcomes are often linked to increasingly restrictive gender norms (Baird et al. 2019; Kapungu and Petroni 2017; Koenig et al. 2021). This suggests that interventions that tackle restrictive gender norms may have the potential to improve mental health.
This work provides evidence on the short-term causal impacts of life skills programming (described in detail in Section III.C) on the mental health of adolescent girls aged 10–19 in three diverse LMIC settings (Tanzania, Bangladesh, and Ethiopia), utilizing randomized controlled trials (RCTs). Life skills programs provide adolescents, typically girls, with safe spaces and education on practical topics, such as navigating relationships with boyfriends and parents, coping with unequal gender norms, reproductive and menstrual health, and building confidence. They also often include direct links to employment and skill building for older participants. There is heterogeneity in how programs bundle various features, but all have the ultimate objective of empowering girls to live better lives. Online Appendix Table 1 provides an overview of the recent impact evaluation literature on life skills programming for adolescent girls in LMICs.¹
We employ a conceptual framework (Figure 1) to understand how life skills programming could impact the mental health of adolescent girls and to help synthesize findings across our varied interventions and contexts. We further explore heterogeneity in treatment effects both traditionally along dimensions identified in the literature (household wealth, education, community gender norms, risk exposure, and parental investment) and using machine-learning techniques to identify treatment heterogeneity across a rich set of observable adolescent and household characteristics.
In all three countries, we find that at least one variant of life skills interventions significantly improves a dimension of mental health, with reductions in depression in Tanzania and improvements in socio-emotional development in Bangladesh and Ethiopia. There is also considerable evidence of important heterogeneity by baseline community, household, and adolescent characteristics. Both traditional heterogeneity analysis using interactions and machine-learning techniques highlight the important role of baseline wealth, which suggests that benefits accrue to the relatively better off amongst a lower income population. The machine-learning heterogeneity analysis further suggests that benefits accrue to those who have broader underlying support systems (for example, enrolled in school, parents in the household, etc.).
This work makes three important contributions to the literature. First, we provide causal evidence on the impact of life skills programs on the mental health of adolescent girls using longitudinal data and interventions from three LMICs. While there is some evidence on the efficacy of mental illness–specific interventions for adolescents (Baird et al. 2020a; Layne et al. 2008; Miller-Graff and Cummings 2022), there is limited evidence on the impact of broader adolescent programming (Barry et al. 2013).² Given the popularity of life skills programs in LMICs (Haberland, McCarthy, and Brady 2018) and the critical importance of mental health during adolescence, our findings narrow a key evidence gap. Second, by evaluating RCTs with similar goals across three countries and settings, these findings address external validity concerns often directed at small-scale single-context RCTs. Lastly, we investigate heterogeneity of the treatment effects, which allows us to unpack which programs might be more effective and for whom. This heterogeneity analysis provides insight into how and where to scale up similar programs in the future, as well as where additional research is needed.
II. Conceptual Framework
We organize our analysis using the conceptual framework presented in Figure 1, which summarizes how life skills programs could improve adolescent female mental health.³ Life skills programs can improve adolescent mental health by focusing on adolescent empowerment (for example, Leventhal et al. 2022), providing supportive environments and social networks through club membership and involving boys in the community (for example, Boxho et al. 2023), supporting parents (for example, Falb et al. 2016; Miller-Graff and Cummings 2022), promoting social norm change (for example, Andrew et al. 2022), and providing financial support (for example, Boxho et al. 2023). Our conceptual framework outlines these program features as potential change pathways toward improved adolescent female mental health. We also account for household-, community-, and country-level contexts that shape adolescent girls’ experiences.
Each of the identified change pathways has been linked to adolescent female mental health in the literature. Empowering girls has been found to be a key component for recovery from mental distress (Grealish et al. 2017; Swift and Levin 1987). Empowerment can be broadly defined as both a psychological sense of personal control and an ability to affect change in one’s own life and community, the latter of which relies on the malleability of external structures (Swift and Levin 1987). This suggests scope to target broader social and environmental factors in conjunction with girl-focused programming, pointing to the other change pathways.
Peer harassment (for example, sexual harassment and bullying) (Beattie et al. 2019) is negatively associated with the mental health of adolescent girls, suggesting the importance of engaging with boys in programming. Weak parental support (for example, death, absenteeism) is associated with worse mental health outcomes among adolescents, and recent systematic reviews in both high-income and LMIC contexts have highlighted the importance of parental involvement in depression interventions for adolescents (Dardas, van de Water, and Simmons 2018; Singla et al. 2020), motivating the role of supporting parents in adolescent mental health programming.
At the community level, restrictive gender norms are directly associated with worse mental health outcomes for girls (Baird et al. 2019). In addition, such norms affect mental health indirectly through effects on school dropout and early marriage (Beattie et al. 2019; Guglielmi, Mitu, and Seager 2021). Thus, programs promoting community social norm change alongside adolescent-focused programming may be more efficacious at improving female adolescent mental health.
Finally, there is a strong link between poverty and psychological distress (for example, Bøe et al. 2012, 2018), aligning with recent literature on the scope of cash transfers to improve mental health (Baird, de Hoop, and Özler 2013; Zimmerman et al. 2021), motivating the role of financial support.
Our conceptual framework also hypothesizes sources of potential heterogeneity in program effectiveness, based on a rapid review of the recent impact evaluation literature on adolescent life skills programming in LMICs. We summarize this literature in Online Appendix Table 1. We compiled studies through a scoping search of several databases for academic publications and working papers from the year 2010 through mid-2023, as well as relevant gray literature (technical reports of impact evaluations).⁴ Based on this literature review, we explore possible heterogeneity in program impacts along the dimensions of wealth, education, community gender norms, risk exposure, and parental investment.
III. Study Setting, Sample, Design, and Outcomes
A. Study Settings
This study provides evidence from adolescent girls living in three LMICs—two in Africa (Tanzania and Ethiopia) and one in South Asia (Bangladesh). Online Appendix Table 2 provides a summary of selected country characteristics.
Ethiopia is the poorest of the three countries we study. More than one-quarter of Ethiopians—and more than two-thirds of the rural population—live in extreme poverty, in comparison to 45 percent in Tanzania and only 13.5 percent in Bangladesh. All three countries have high youth dependency ratios, ranging from 40 percent in Bangladesh to nearly 83 percent in Tanzania (United Nations, Department of Economic and Social Affairs, Population Division 2022), highlighting the importance of studying interventions that aim at improving outcomes for young people.
Despite high rates of secondary school enrollment (72 percent), the median age of marriage for women aged 25–29 in Bangladesh is the lowest of the three countries at 16.7 years, and rates of adolescent childbearing are highest, with one-third of women giving birth to their first child by age 18 (UNESCO Institute for Statistics 2020; Demographic and Health Survey 2022). In Ethiopia and Tanzania, the median ages of marriage for women aged 25–29 are 18.1 and 19.6, respectively, and a quarter of women in this age group give birth to their first child by age 18 (Demographic and Health Survey 2022).
B. Study Samples
Our sample includes panel data on more than 4,500 adolescent girls aged 10–19 in Tanzania, Bangladesh, and Ethiopia. Table 1 provides a summary of the sample in each setting.
In Tanzania, our sample includes 1,449 adolescent girls aged 10–19 (at 2016 baseline) living across 79 communities in the Dodoma and Iringa regions of Tanzania. These 79 communities were selected due to their inclusion in an evaluation of BRAC’s adolescent female clubs, Empowerment and Livelihood for Adolescents (ELA), via a cluster-randomized controlled trial (cRCT) in 2009–2011 (Buehren et al. 2017) and where no additional community-level programming was introduced during the subsequent cRCT in 2016–2018 from which this study draws (see Shah et al. 2023 for more details). In communities with ELA clubs (32 communities), all members were selected for the survey. In communities without ELA clubs (47 communities), adolescent girls were sampled based on a household census and similarity to ELA members using propensity score matching based on age, wealth, school enrollment, marital status, and ever-pregnant status.
In Bangladesh, our sample includes 1,200 adolescent girls aged 10–14⁵ (at 2020 baseline data collection) attending Grades 7 and 8 across 100 government and semiprivate schools in Chittagong and Sylhet divisions. These divisions were selected by the Government of Bangladesh in partnership with the World Bank for this study because they are relatively disadvantaged with regard to adolescent health and education outcomes. Data collection covers schools in nine districts of Chittagong (Bandarban, Brahamanbaria, Chandpur, Chittagong, Comilla, Cox’s Bazar, Lakshmipur, Noakhali, and Rangamati) and three districts of Sylhet (Habiganj, Maulvibazar, and Sunamganj). Adolescents were randomly sampled from school registration lists to be representative of their gender within their grade and school.
In Ethiopia, our sample includes nearly 1,900 adolescent girls aged 10–12 (at 2017–2018 baseline data collection) living in 126 communities (Enumeration Areas) in two poor, rural zones of central and eastern Ethiopia (South Gondar, Amhara and East Hararghe, Oromia). Ten districts within these two zones were purposely selected for study inclusion (five in each zone), with preference for areas characterized by high levels of food insecurity and high rates of child marriage, while additionally keeping in mind security concerns and programming capacity of the intervention implementation partners (Pathfinder and CARE Ethiopia). Adolescents were randomly sampled from full community lists developed through census-style enumeration by the research team.
Table 2 provides more detail on the study sample within each country, summarizing both adolescent (Panel A) and household (Panel B) characteristics at baseline. Ethiopia focuses on very young adolescents (mean baseline age of 10.9), Bangladesh focuses on adolescents attending Grades 7 and 8 (mean age of 12.7), and Tanzania targets adolescents across a broader age range of 10–22 (mean age of 14.7 in our sample, where we restrict to ages 19 and younger for this analysis).
Age may be a key factor in driving the success of programs—both due to developmental appropriateness of the intervention and because adolescents may be more susceptible to impacts at certain ages. That said, age is not a perfect marker of development, as factors such as puberty happen over a wide age range and may in fact be more important than biological age (Viner, Allen, and Patton 2017). Understanding the role of age in driving program success among adolescents is part of a much broader research agenda (GAGE Consortium 2019). While this analysis is not well suited to unpack age alongside the many other factors that vary across studies, we control for age in all analyses, include age in our machine-learning heterogeneity analysis, and reflect on the role of age briefly in our discussion.
Our sample is broadly reflective of the national averages shown in Online Appendix Table 2. Most of the sample was enrolled in school (including all girls in Bangladesh, where school enrollment was a determinant of study inclusion), but aspirations for a degree beyond secondary school differ enormously between Ethiopia and the two other samples (58 percent of girls report such aspirations in Ethiopia, compared to 81 percent in Tanzania and 85 percent in Bangladesh). For each country, we summarize an index of adolescent perceptions of norms related to gender roles, which includes the roles of men and women in household decision-making and the role of women in caring for their families and homes (see Online Appendix Table 3 for more details on this index). Higher values of the index (on a scale of 0–2) indicate more traditional gender norms. Girls in Bangladesh perceive gender role norms as the least traditional (with a mean of 0.890), while girls in Ethiopian study sites perceive gender role norms as the most traditional (mean 1.645).
Panel B of Table 2 provides household-level characteristics for sample households in each country. Sample households in Tanzania are somewhat smaller than the corresponding national average (Online Appendix Table 2), while households in Bangladesh and Ethiopia are larger than the respective national averages (at 5.8 and 6.4 household members). Household head school attainment reflects national-level differences, with education levels in Tanzania (8.7 years) and Bangladesh (7.4 years) much higher than in Ethiopia (2.4 years). An indicator of improved flooring further suggests that households in Tanzania and Bangladesh are more advantaged relative to those in Ethiopia. Finally, an indicator for female caregiver participation in paid work indicates the highest rates of mothers working outside of the household in Tanzania (58 percent) relative to Bangladesh (15 percent) and Ethiopia (9 percent).
C. Interventions and Study Design
We evaluate two variations of a life skills program using RCTs in each country. The programs vary in the breadth of their curricula across the change pathways outlined in the conceptual framework (Figure 1). Table 1, Panels B and C provide a summary of the interventions and study designs in each setting. We provide more detail here for each country.
1. Tanzania
In Tanzania, we evaluate a goal-setting activity (Goal Setting) that focuses on the first change pathway: empowering girls. Goal setting has been found to be helpful when addressing behavioral or emotional difficulties and has been a valuable tool when used in combination with cognitive-behavioral therapy (Beck 1976; Lochman et al. 2011; Gaudiano 2008). For the activity, facilitators engaged with adolescent girls one-on-one. They asked girls if they were willing to set a goal to remain healthy and stay STI/HIV free for the next year. If they agreed, they went through the process of setting specific, measurable, achievable, relevant, and timely (SMART) goals, as developed by Doran (1981). Girls then identified and committed to up to three specific strategies to achieve the prespecified goal. This initial goal setting activity was implemented in August 2017 and lasted approximately 90 minutes. Facilitators followed up with females four months later in December 2017 to remind them of their goals and discuss successes and challenges in implementing their strategies. This check-in lasted approximately 60 minutes. Follow-up endline data collection took place in mid 2018.
In a subset of communities (32 communities), girls were also members of preexisting ELA clubs. ELA is an education-based intervention designed to empower adolescent females by providing a safe social space, life-skills training, and support in adolescent development. Participation in ELA is voluntary, but members are expected to attend five afternoons per week (for three hours each afternoon). ELA club leaders engage parents in sensitization meetings to orient them to the club purpose and activities. Thus, ELA has components that address supporting parents and promoting community norm change.
The RCT is an individually randomized controlled trial, with 25 percent of girls randomly selected to be invited to participate in Goal Setting, stratified by community (369 treated, 1,080 control). These girls were selected across both ELA (32 communities) and control (47 communities). In the present analysis, we evaluate the Goal Setting intervention within communities where no other programming is offered and within communities where participants were also exposed to ELA. Therefore, we do not evaluate ELA directly, but use the context of the preexisting ELA clubs to explore differences in the treatment effects of Goal Setting when it is layered on top of ELA in contrast to when it is delivered as an independent activity. We hypothesize that Goal Setting will be more effective among ELA participants, who have been exposed to a broader set of life skills and live in communities that have been more broadly affected by the ELA activities. This analysis is on a subset of a broader RCT that includes two additional treatment arms providing sexual and reproductive health programming (see Shah et al. 2023). We exclude these arms from the current study.
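The individual-level assignment described above (a fixed share of girls invited within each community) can be illustrated with a minimal sketch. This is not the study's randomization code: the function name `assign_within_strata`, the seed, the rounding rule, and the toy roster are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def assign_within_strata(units, share=0.25, seed=0):
    """Randomly invite `share` of the units inside every stratum.

    units: list of (unit_id, stratum) pairs (hypothetical roster format).
    Returns a dict mapping unit_id -> 1 (invited) or 0 (not invited).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_stratum = defaultdict(list)
    for unit_id, stratum in units:
        by_stratum[stratum].append(unit_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])  # stable order before shuffling
        rng.shuffle(ids)
        n_treated = round(share * len(ids))  # assumed rounding rule
        for position, unit_id in enumerate(ids):
            assignment[unit_id] = 1 if position < n_treated else 0
    return assignment

# Hypothetical roster: 3 communities with 20 girls each.
roster = [(f"girl_{c}_{i}", f"community_{c}") for c in range(3) for i in range(20)]
assignment = assign_within_strata(roster)
```

Drawing the invitation separately inside each community keeps the treated share the same in every community, which is what stratification is meant to guarantee.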
2. Bangladesh
In Bangladesh, we evaluate two interventions, delivered virtually during COVID-19-related school closures across 100 schools, that span three of the five change pathways: empowering girls, engaging with boys, and promoting community social norm change. The two interventions are (i) a gender-neutral growth mindset program around malleable intelligence (Growth Mindset) and (ii) a gender sensitization program that focuses on shifting gender norms around girls’ education, as well as sensitizing adolescents to the issues of child marriage and child labor (Girl Rising). The Growth Mindset programming was offered both alone and in combination with the Girl Rising programming. Girl Rising was layered on top of Growth Mindset in a random subset of schools in order to address the structural constraints upheld by gender norms, so that adolescent girls would be better supported in operationalizing the tenets of the Growth Mindset curriculum.
The concept of growth mindset is rooted in Carol Dweck’s research on malleable intelligence. When children (and adults) have a growth mindset, they believe that their ability depends on effort and are not constrained by fixed intelligence—the belief that talents are innate (Dweck 2006). These interventions encourage adolescents to spend time working to improve their skills and academic performance, essentially setting goals for future improvement. Growth mindset interventions can result in higher motivation, effort, and ultimately increased educational attainment (Paunesku et al. 2015; Yeager and Dweck 2012; Alan, Boneva, and Ertac 2019). In our case, the intervention entailed a facilitated group call around an essay on malleable intelligence, followed by the adolescent writing a letter to a friend explaining the concept and responding to a series of messages with growth mindset content via SMS.
Girl Rising uses storytelling to create awareness of gender-based discrimination, change dominant gendered perceptions, promote gender-equitable attitudes and norms, and provide tools to participants to translate changes in attitudes and greater aspirations into behavior change (Vyas et al. 2020). Adolescents are provided a physical storybook with six stories of six girls from six countries, accompanied by six companion videos, shedding light on a variety of gender-based issues, such as early marriage. Adolescents discuss the stories each week in group calls and, after reading all stories, write and share their own stories.
In Bangladesh the RCT is a multi-arm cluster (school-level) randomized controlled trial. Of the 100 schools included in this sample, 32 schools were randomly assigned to the Growth Mindset arm, 34 schools were randomly assigned to the Growth Mindset + Girl Rising arm, and 34 schools were randomly assigned to serve as the control, stratified by school type (government and semi-private) and rural/urban status. Growth Mindset programming was implemented over the course of eight weeks from April to June 2021. The Girl Rising program was implemented over the course of eight weeks from September to November 2021. Girls assigned to receive both Growth Mindset and Girl Rising programming participated in both interventions in the same groups. Follow-up data were collected from March–April 2022. In the current analysis we compare each treatment arm to the pure control group and to each other. For more details on the Bangladesh sample, data collection, and RCT, see Seager et al. (2022).
3. Ethiopia
In Ethiopia, we evaluate the most expansive programming across the countries we study—programming that touches on all five identified change pathways (Figure 1)—across 126 rural communities. Act With Her (AWH) Ethiopia pairs regular curriculum-based group meetings for young female and male adolescents and their caregivers with community-level adolescent-centric systems strengthening activities, offering a more holistic approach to improving adolescent transitions to adulthood.
In the Act With Her program sites, participating adolescent girls meet weekly over a period of ten months in life-skills training group sessions led by a female mentor. Several sessions are devoted to aspirations and goal setting, and discussion of norms and attitudes related to gender is interwoven across a range of topics, including interpersonal conflict and negotiation, violence and safety, physical health and nutrition, sexual and reproductive health, and personal finance. Participating adolescent boys of the same age take part in their own curriculum-based sessions approximately twice a month over the same period, with a male mentor covering a subset of similar topics. Four sessions bring the adolescent boys’ and adolescent girls’ groups together for joint learning and discussion. Over the same ten-month period, caregivers of participating adolescents are invited to six facilitated group sessions of their own, to orient them to the topics covered in the adolescent group meetings and to help them to create a supportive environment for their adolescents. In a random half of these Act With Her communities, key community decision-makers and stakeholders are regularly brought together for structured meetings to discuss social norms⁶ and adolescent-centric services in order to initiate changes over time. Groups meet monthly to discuss harmful socio-cultural norms relevant to their local community and to devise an action plan as to how they can be tackled.
An additional set of communities in the Ethiopia study, which also received all of the above interventions, were randomly allocated to receive in-kind transfers to participating adolescent girls (Act With Her + Transfers) as well. Girls were allowed to choose from three equal-valued (115 USD) packages: school supplies, hygiene supplies, and a combination package. The majority of girls chose the school supplies option, which included a backpack, writing utensils, eraser, ruler, exercise books, solar lantern, compass, and English and math reference books.
The Ethiopia RCT is a multi-arm cluster (community-level) randomized controlled trial. Communities were first stratified by district (woreda) and marginalization status (lack of programming, isolated from key services and road/transport infrastructure), and then randomly assigned to Act With Her (58 communities), Act With Her + Transfers (29 communities), or control (39 communities). The program was launched in early 2019, and a follow-up survey was conducted in late 2019 and early 2020, when participants were on average 13 years old. In the present analysis, we compare the two treatment arms to a pure control group. For more details on the Ethiopia sample, data collection, and RCT, see Baird et al. (2020b).
D. Outcome Measures
Our analysis focuses on three dimensions of adolescent mental health that are captured in the WHO definition of mental health—depression, socio-emotional development, and locus of control. We also construct an overall index as a summary measure or a “sufficient statistic” for mental health. Table 3 provides further details and references on measurement by country.
For depression, we focus on two submeasures. First, we utilize a continuous measure of depression from the Patient Health Questionnaire (PHQ) across all samples. The research in Tanzania utilized the PHQ-2 scale (Kroenke, Spitzer, and Williams 2003), while the PHQ-9 scale (Kroenke, Spitzer, and Williams 2001) was administered in both Bangladesh and Ethiopia. These measures have been validated for adolescent samples in LMICs (see, for example, Anum, Adjorlolo, and Kugbey 2019; Mazzuca et al. 2019). The second submeasure of depression is an indicator for moderate or severe depression, based on the continuous PHQ measure.⁷ As the summary statistics in Table 3 show, the measured level of moderate depression is 13.5 percent in the Tanzania sample, but is less than 2 percent in the other two countries, rates consistent with global averages (WHO 2021). These lower rates are partially due to the younger age profile of our sample in Bangladesh and Ethiopia, but also likely reflect the challenges of measuring depression among adolescents, particularly in LMICs (Carvajal et al. 2022). Our broader measure of mental health is motivated by the fact that diagnosable depression is only one piece of good mental health (Renwick et al. 2022).
Our second measure of mental health is socio-emotional development. The measurement of socio-emotional development varies across countries, but, in each country, the measure captures ability to overcome adversity or difficulty while navigating life. In Tanzania, we use an indicator for whether the adolescent is confident that they can complete a task set before them. In Bangladesh, we focus on a measure of grit that is generated from a seven-item scale following Alan, Boneva, and Ertac (2019), with scores ranging from 0 to 4, where higher scores indicate more grit. In Ethiopia, we use the Child and Youth Resilience Measure-12 (CYRM-12), a measure of youth resilience (Liebenberg, Ungar, and LeBlanc 2013), with scores ranging from 12 to 36, where higher scores indicate more resilience.
Our third measure of mental health is locus of control. For the Tanzania sample, we use the Pearlin Mastery Scale, which is a seven-item scale measuring the extent to which the respondent feels in control of the events that influence her life (Pearlin and Schooler 1978). The scale ranges from 12 to 40, with higher values indicating higher locus of control. For both Bangladesh and Ethiopia, we use a measure adapted from the World Values Survey (Haerpfer et al. 2020), which asks the respondent to rate how much control she feels she has over her own life on a scale from 1 to 10, with 10 being complete control. Summary statistics in Table 3 suggest that adolescents in all three samples feel a moderate to high level of control over their lives.
For regression analysis, we standardize each measure, except the indicator for moderate or severe depression, around each country’s sample control group mean for consistency of interpretation across countries. In addition, we generate an overall mental health index from the three standardized measures by orienting them all in the same (positive) direction, taking the unweighted mean across the measures, and standardizing around the control group in each country (Kling, Liebman, and Katz 2007). This provides an overall summary statistic across these three diverse constructs of mental health.
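The index construction described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it assumes standardization uses the control group's sample standard deviation, and the function names and toy values are hypothetical.

```python
from statistics import mean, stdev

def standardize(values, is_control):
    """Center and scale all values by the control group's mean and sample SD."""
    control = [v for v, c in zip(values, is_control) if c]
    mu, sd = mean(control), stdev(control)
    return [(v - mu) / sd for v in values]

def mental_health_index(depression, socio_emotional, locus, is_control):
    """Unweighted mean of the three z-scores, re-standardized on controls."""
    # Flip depression so that higher always means better mental health.
    z_dep = [-z for z in standardize(depression, is_control)]
    z_se = standardize(socio_emotional, is_control)
    z_loc = standardize(locus, is_control)
    raw = [mean(triple) for triple in zip(z_dep, z_se, z_loc)]
    return standardize(raw, is_control)

# Toy data (made up): first three respondents are controls.
depression = [2, 4, 6, 1, 3, 5]
socio_emotional = [3.0, 2.0, 1.0, 3.5, 2.5, 2.0]
locus = [7, 5, 4, 8, 6, 6]
is_control = [True, True, True, False, False, False]

index = mental_health_index(depression, socio_emotional, locus, is_control)
```

By construction the index has mean zero and standard deviation one in the control group, so a treatment coefficient on it reads directly in control-group standard deviations.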
IV. Empirical Framework
We estimate intention-to-treat (ITT) regressions, implementing basic reduced-form linear regressions as follows:
(1) $Y_{1iv} = \alpha + T'\beta + X'_{0iv}\gamma + \phi_s + \varepsilon_{iv}$
where $Y_{1iv}$ is the outcome of interest for adolescent i in village (cluster) v measured at follow-up (t = 1), $\beta$ is the vector of ITT effects, and $\varepsilon_{iv}$ is the error term. $T$ is a vector of treatment indicators equal to one if the individual/cluster is assigned to a treatment (and zero otherwise). In Bangladesh and Ethiopia, $T$ includes an indicator for each of the two treatment arms in the respective country: Growth Mindset and Growth Mindset + Girl Rising (for Bangladesh) and Act With Her and Act With Her + Transfers (for Ethiopia). In Tanzania, $T$ includes an indicator for being invited to Goal Setting, an indicator for being in a community with ELA, and the interaction between the two. Each country has a basic set of baseline characteristics in $X'_{0iv}$ to adjust for country-specific sampling design, including adolescent age, with $\phi_s$ denoting randomization strata fixed effects (as defined in Section II.C). Standard errors are clustered at the level of treatment (village in Tanzania, school in Bangladesh, and kebele in Ethiopia) to account for the design effect. We employ sampling weights as appropriate in Bangladesh and Ethiopia.
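To make the role of clustering concrete, the sketch below works through a deliberately stripped-down version of the ITT regression: one treatment dummy, no covariates, no strata fixed effects, and no sampling weights. In that special case the OLS coefficient equals the treated-minus-control difference in means, and the cluster-robust variance has a closed form. The function name and data are hypothetical, and the variance is the plain sandwich estimator without the small-sample corrections that statistical packages usually apply.

```python
from collections import defaultdict

def itt_cluster_robust(y, t, cluster):
    """OLS of y on a constant and one treatment dummy, with a cluster-robust SE.

    Simplified sketch: no covariates, fixed effects, weights, or
    small-sample correction (an assumption, not the paper's specification).
    """
    n = len(y)
    n1 = sum(t)
    n0 = n - n1
    mean1 = sum(yi for yi, ti in zip(y, t) if ti == 1) / n1
    mean0 = sum(yi for yi, ti in zip(y, t) if ti == 0) / n0
    beta = mean1 - mean0  # ITT estimate

    # Residuals from the fitted two-group model.
    resid = [yi - (mean1 if ti == 1 else mean0) for yi, ti in zip(y, t)]

    # Cluster-level score sums for the regressors (1, t).
    s_const = defaultdict(float)
    s_treat = defaultdict(float)
    for u, ti, g in zip(resid, t, cluster):
        s_const[g] += u
        s_treat[g] += ti * u

    # "Meat" of the sandwich: sum over clusters of outer products of the scores.
    m00 = sum(s_const[g] ** 2 for g in s_const)
    m01 = sum(s_const[g] * s_treat[g] for g in s_const)
    m11 = sum(s_treat[g] ** 2 for g in s_const)

    # Row of (X'X)^{-1} that corresponds to the treatment coefficient.
    a0 = -1.0 / n0
    a1 = n / (n1 * n0)
    variance = a0 * a0 * m00 + 2 * a0 * a1 * m01 + a1 * a1 * m11
    return beta, variance ** 0.5

# Hypothetical data: four villages with two girls each, two villages treated.
y = [1.0, 3.0, 2.0, 4.0, 0.0, 2.0, 1.0, 1.0]
t = [1, 1, 1, 1, 0, 0, 0, 0]
village = ["a", "a", "b", "b", "c", "c", "d", "d"]
beta_hat, se_hat = itt_cluster_robust(y, t, village)
```

Summing residuals within each village before squaring is what lets correlated outcomes inside a village widen the standard error relative to treating every girl as independent.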
While the analysis here is motivated by broader preregistered protocols for each of the three country studies, the subset of outcomes and comparisons investigated does not map perfectly to these protocols.8 Instead, we utilize the conceptual framework (Figure 1) to bring together these three distinct studies and construct outcome measures of mental health that allow for the greatest comparability of measures across studies. To further mitigate concerns of “p-hacking,” we implement two additional robustness checks. First, we calculate q-values for effect estimates that are adjusted for the false discovery rate (FDR), as described in Anderson (2008), to account for multiple hypothesis testing. We utilize Anderson’s Stata code to create FDR-adjusted q-values separately by country and intervention across the four main outcomes. Second, we reestimate all regressions using a series of different covariates to ensure that selection of covariates does not influence the findings, essentially validating the randomization process (approach discussed in detail in Section V.B). Details on the construction of these control variables and how they map to the contextual levels in the conceptual framework can be found in Online Appendix Table 3.
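For intuition on what an FDR-adjusted q-value is, the sketch below implements the simple Benjamini-Hochberg step-up adjustment. This is a swapped-in, simpler technique: Anderson (2008) describes sharpened two-stage q-values, which are generally somewhat less conservative, so the numbers here would not reproduce the paper's. The p-values are hypothetical.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg q-values: the smallest FDR level at which each test is rejected."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical p-values for four outcomes of one intervention in one country.
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = bh_qvalues(p_values)
```

Grouping the adjustment by country and intervention, as the text describes, would amount to calling the function once per group of four outcomes.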
We build on our estimates of average treatment effects to explore broader distributional and heterogeneous effects. This analysis is again motivated by our conceptual framework (Figure 1) and rapid scoping review (Online Appendix Table 1) and serves to shed additional light on for whom and where programs might be more effective, allowing for further triangulation of findings across interventions. We approach this in three ways. First, we conduct Kolmogorov–Smirnov (K–S) tests to assess differences in the distributions of outcomes between treatment arms, moving beyond simple differences in means. Second, we explore heterogeneity by including interactions of the treatment indicators in Equation 1 with five key baseline measures identified in the recent impact evaluation literature (see Online Appendix Table 1): wealth, education, community gender norms, risk exposure, and parental investment (discussed in more detail in Section V.C). Finally, we complement this more traditional heterogeneity analysis with machine-learning techniques and utilize the causal forest algorithm developed by Athey and Wager (2019). This method allows flexible, high-dimensional combinations of covariates to identify who gains from the program in a way that researcher-determined interaction effects would typically miss (Davis and Heller 2020). We use the generalized random forest algorithm by Tibshirani et al. (2023) to estimate conditional average treatment effects (CATEs) and descriptively summarize the characteristics of individuals who benefit most from the life skills programs based on high and low CATE groups. We provide more detail on the methodology in Section V.C.
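As an illustration of the first step, a two-sample K–S test compares the empirical distribution functions of two groups and takes their largest vertical gap. The sketch below is a minimal version under stated assumptions: it uses the textbook asymptotic p-value, ignores clustering and sampling weights, and is only approximate with small samples or heavily tied scores. How the authors computed their K–S p-values is not specified here, and the data are hypothetical.

```python
import math
from bisect import bisect_right

def ks_two_sample(a, b):
    """Return (D, p): the two-sample K-S statistic and its asymptotic p-value."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # Evaluate both empirical CDFs at every observed value and keep the largest gap.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    # Asymptotic Kolmogorov distribution for the scaled statistic.
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The survival function is indistinguishable from 1 in this range.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical outcome scores for a treatment arm and the control group.
treated_scores = [4, 5, 6]
control_scores = [1, 2, 3]
d_stat, p_value = ks_two_sample(treated_scores, control_scores)
same_d, same_p = ks_two_sample(control_scores, control_scores)
```

Because the statistic responds to any gap between the two distribution functions, it can detect shifts concentrated in one tail even when the means are similar.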
We note that the evaluations of these interventions have been rigorously designed using detailed power calculations, with sample sizes selected based on standard minimum detectable effect sizes (see Shah and Seager 2021; Baird et al. 2020b; Seager et al. 2022). We provide evidence on baseline balance for each country’s randomization in Online Appendix Tables 4–6 and on attrition in each country’s sample in Online Appendix Tables 7–9 (attrition is discussed in Section V.B).
In terms of baseline balance, we see no statistically significant differences for any treatment–country comparison on baseline measures of the outcome of interest (see Online Appendix Tables 4–6). For Tanzania, across 17 comparisons, we find one that is statistically significant at 95 percent confidence, with the joint test not statistically significant. For Bangladesh, across 48 comparisons we find five that are statistically significant at 90 percent confidence, well within the bounds of what would be expected with statistical chance. The joint F-tests on orthogonality across all baseline measures for Growth Mindset versus control and Growth Mindset + Girl Rising versus control are statistically significant (p = 0.000 and p = 0.066), so we further check our results through a series of structured covariate controls in the main estimation as robustness (see below). For Ethiopia, we find six of 48 comparisons statistically significant at 90 percent confidence, with all joint F-tests insignificant.
V. Results
A. Main Results
Table 4 and Figure 2 present impact estimates on the four distinct measures of mental health and the overall index of mental health using Equation 1. Table 4, Panel A provides results for Tanzania, Panel B for Bangladesh, and Panel C for Ethiopia. Results presented in Table 4 include basic controls (that is, adolescent age, geographic, and randomization strata fixed effects).9 Standard errors are presented in parentheses, and p-values from K–S tests of equality of distributions between the treatment and control group are presented in square brackets below the standard errors. Figure 2 provides a graphical representation of the average effects, again with three panels, one per country.10
Panel A of Table 4 and Figure 2 summarize the results from the evaluation of the Goal Setting intervention in Tanzania with and without the presence of ELA. We find no evidence that goal setting alone improves mental health, and it may, in fact, worsen it. In communities without ELA clubs, girls who were invited to Goal Setting are 8.6 percentage points more likely to exhibit symptoms of depression (p = 0.011, Column 2). We also observe a sizeable positive coefficient for the standardized PHQ-2 depression score (0.15 standard deviations), though the coefficient is not statistically significant at conventional levels (p = 0.144, Column 1). However, girls who were invited to Goal Setting in ELA club communities appear to be better off—they are 11 percentage points less likely to exhibit symptoms of moderate/severe depression compared to girls who were invited to Goal Setting in non-ELA communities (p = 0.012, Column 2). It appears that ELA provides a base level of life skills training that might be necessary to make Goal Setting successful in terms of improving mental health.
The mental health index estimates corroborate this with a negative and large coefficient on Goal Setting (−0.165, p = 0.114) and a positive and large coefficient on Goal Setting in ELA communities (0.199, p = 0.128). Figure 3 (Panel A) suggests that this difference in means is driven by distributional differences at the bottom of the distribution.
In Bangladesh, we find no evidence of impacts on depression for either Growth Mindset or Growth Mindset + Girl Rising (Panel B of Table 4 and Figure 2). This is perhaps not surprising given the very low reported rates of moderate/severe depression in this sample (1.9 percent, Table 3). In terms of socio-emotional development and locus of control, however, we find large positive effects from Growth Mindset. In particular, girls assigned to the Growth Mindset arm exhibit 0.317 standard deviations higher socio-emotional development (p = 0.005, Column 3) and 0.175 standard deviations higher locus of control (p = 0.116, Column 4) than girls in the control group. There is also suggestive evidence that differences go beyond mean differences to broader distributional differences, particularly with socio-emotional development (p = 0.128, Column 3). Online Appendix Figure 1 further illustrates this point, showing a clear positive distributional shift for Growth Mindset compared to the control group.
While the coefficients for Growth Mindset + Girl Rising also go in the expected direction, they are small and not statistically significant at conventional levels. For socio-emotional development, the impact of the Growth Mindset + Girl Rising arm is noticeably lower than the Growth Mindset arm (0.140 vs. 0.317), with a p-value on their equality of 0.126 (Column 3). The difference between Growth Mindset and Growth Mindset + Girl Rising is the inclusion of gender-transformative messaging, delivered through stories of girls around the world overcoming obstacles to attain education. Qualitative evidence suggests that many of the girls identified with the stories of other girls’ hardships, which may have highlighted external barriers to achieving goals and muted the salience of the growth mindset messaging. The Girl Rising stories discuss external factors that can prevent girls from completing their education and the role of social support in overcoming these challenges, potentially contradicting the Growth Mindset messaging that promotes internal locus of control and power to reshape outcomes. These factors may have contributed to the small and statistically insignificant impacts of Growth Mindset + Girl Rising. This highlights the need to consider how messaging across programs may complement or undermine each other prior to bundling.
The overall mental health index findings (Table 4, Panel B, Column 5 and Figure 2, Panel B) in Bangladesh summarize these conclusions, with large and significant positive impacts in the Growth Mindset arm (0.286, p = 0.024) and null effects in the Growth Mindset + Girl Rising arm (0.090, p = 0.342). Figure 3, Panel B also highlights that this shift occurs across the distribution, showing a clear distributional positive shift in the mental health index for girls assigned to the Growth Mindset arm compared to girls assigned either to the control or to Growth Mindset + Girl Rising.
Panel C of Table 4 and Figure 2 present results from Ethiopia, where the most expansive programming was evaluated. We find evidence consistent with Act With Her improving mental health for adolescents across outcomes, with the strongest impacts on socio-emotional development. Assignment to Act With Her improved resilience by 0.158 standard deviations (p = 0.020, Column 3). Similar to Bangladesh, there is also strong evidence of a broader distributional shift beyond the difference in means (p < 0.001, Column 3). Online Appendix Figure 1 shows positive shifts in resilience at the upper end of the distribution. There is also suggestive evidence that Act With Her reduces depressive symptoms, lowering the standardized PHQ-9 index by 0.113 standard deviations (p = 0.115, Column 1), and improves locus of control by 0.104 standard deviations (p = 0.155, Column 4). Again, we see no impact on moderate/severe depression, but as in Bangladesh, baseline levels were low at 1 percent.
As in Bangladesh, we find that the expanded version of the program in Ethiopia, which offered asset transfers, mutes overall impacts. While coefficients are still in the desired direction, they are no longer statistically significant at conventional levels. However, there is evidence of a distributional shift for socio-emotional development between Act With Her + Transfers and control (p = 0.041, Column 3) and evidence that the distributions for the two intervention arms differ (p = 0.086, Column 3; see also Online Appendix Figure 2).
One hypothesis for the cause of this muted and uneven effect is that asset transfers were only offered to adolescent girls, not adolescent boys. Qualitative data suggest there was backlash from boys because of this asymmetry that negatively influenced impacts for a subset of girls. Some of the boys reported feeling angry that girls received things like solar lamps while they did not. This result points to the importance of thinking carefully about whether programs should focus specifically on one gender, with potential unexpected adverse effects.
The impacts on the standardized mental health index once again provide a useful summary statistic for mental health, with large, positive impacts for Act With Her (0.172, p = 0.017) and muted effects for Act With Her + Transfers (0.063, p = 0.406), further exemplified by the clear shift in the distribution of mental health, particularly at the upper end of the distribution (Figure 3, Panel C).
In summary, these findings point to the potential of life skills programming to improve mental health of adolescent girls. Across 30 comparisons in Table 4, 23 go in the direction of improved mental health, something unlikely to happen by chance. Of those that go in the undesired direction, two are precisely measured null effects (Act With Her + Transfers), and the remaining five are all for Goal Setting in the absence of ELA—the one intervention that appears to have had an adverse effect on mental health (see Table 4).
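The claim that 23 of 30 estimates pointing the same way is unlikely to arise by chance can be checked with a back-of-the-envelope sign test. The sketch below assumes, purely for illustration, that the 30 comparisons are independent and that each sign is a fair coin flip under the null. Neither assumption holds exactly here, since outcomes within a country are correlated and arms share a control group, so the number is a rough guide only.

```python
from math import comb

def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))

# Probability of at least 23 of 30 estimates sharing the favorable sign by chance.
p_sign = binomial_upper_tail(23, 30)
print(round(p_sign, 4))  # prints 0.0026
```

Under these simplifying assumptions, such a lopsided split would occur by chance roughly 3 times in 1,000.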
The impacts on individual measures further highlight the importance of program design and context in driving impact. In all three countries, two variations of programming were evaluated, and, in each country, there is evidence that one version of the program improved mental health, while there are null effects (and possibly negative impacts) from the other. Impacts in Tanzania highlight the importance of understanding the underlying systems and skills that are needed to support potential program curricula. Reduced treatment effects in Bangladesh when layering gender sensitization programming suggest programs addressing gendered social norms may make salient the limitations they impose. Girl Rising seemed to remind girls of all the gender norms in their communities, which make their lives more difficult. Muted programming impacts from an asset transfer in Ethiopia remind us of the potential perverse effects of offering programming to particular groups, but not to their peers. We further unpack heterogeneity in Section V.C and return to the role of program design in the final section.
B. Robustness
1. Attrition
Overall, there is little evidence that attrition is driving our findings. In Tanzania there is no evidence of differential attrition, either overall or by baseline covariates or outcome measures (Online Appendix Table 7). For Bangladesh we are more likely to survey (by approximately 3.6 percentage points) those in the Growth Mindset + Girl Rising group compared to Growth Mindset (p = 0.031), with some evidence of differential attrition by baseline covariates for Growth Mindset versus control (p = 0.043, Online Appendix Table 8). In Bangladesh, follow-up survey rates are high at 92.6 percent, so these small differences in follow-up rates are likely inconsequential. For Ethiopia, there is no overall differential attrition, but some evidence of differential attrition between treatment and the control communities based on baseline covariates (p = 0.067 for Act With Her interactions, and p = 0.084 for Act With Her + Transfers interactions; Online Appendix Table 9). Given that follow-up survey rates are slightly lower in Ethiopia (86.1 percent), this further motivates our robustness check to ensure that choice of covariates is not impacting outcomes.
2. Control sets
While the randomized design should limit the effect of covariate choice on the estimated treatment impacts, we subject our main results to a battery of different covariate adjustment strategies to address any potential concerns about baseline imbalance, differential attrition, and the fact that our specific analysis in this paper was not prespecified. We take a structured approach starting with Equation 1, then add sets of adolescent and household covariates sequentially, guided by our conceptual framework (Figure 1). Online Appendix Table 3 describes these covariates in detail. We estimate our empirical models using traditional methods of controlling for covariates and double-lasso regression for covariate selection (Urminsky, Hansen, and Chernozhukov 2016). Online Appendix Tables 10–12 present the results from this estimation.
Column 1 of Online Appendix Tables 10–12 replicates the results presented in the main text, using a sparse control set for each country, based on the randomization design in that location. Column 2 then adds a set of household- and adolescent-level controls that are common across countries, what we call the “consistent” control set, as we can measure them across all three countries. Column 3 implements the double-lasso regression using the previously described consistent controls, and Column 4 uses the same method but adds an additional set of country-specific household and adolescent controls, which we call the “expanded controls.” Finally, Column 5 implements the same specification as Column 4 but with the addition of the baseline outcome where available. The estimation results are consistent across specifications, supporting that randomization was implemented successfully and that any small baseline imbalances are likely inconsequential.
3. Multiple hypothesis testing
Finally, Online Appendix Table 16 provides the original p-values alongside the FDR-adjusted q-values to assess whether statistically significant findings are likely due to multiple hypothesis testing, as opposed to real impacts. Reassuringly, across all comparisons, results are qualitatively the same once the FDR adjustments are made. Ultimately, this battery of robustness checks further confirms our core findings.
C. Heterogeneity
While our overall findings suggest that life skills programs have the potential to improve mental health for adolescent girls in LMICs, they also highlight important variation in impacts across countries, interventions, and the pre-intervention distribution of the outcome of interest. It is possible, and in fact likely, that there is further heterogeneity in treatment effects based on sample baseline characteristics. To try to further unpack these findings and assess for whom the interventions are working—in the hope of providing some generalizable insights for future programming—we conduct within-country heterogeneity analysis. We tackle this both through traditional interactions and through machine-learning techniques.
Heterogeneity Interactions
The conceptual framework in Figure 1 outlines five key dimensions along which we test for heterogeneity, which we derived from a rapid review of the impact evaluation literature on adolescent life skills programming (Online Appendix Table 1). We identified 34 randomized evaluations of adolescent life skills programming in LMICs between 2010 and mid-2023. Fifteen of these studies conducted heterogeneity analysis (summarized in Online Appendix Table 1). While our samples do not allow us to study heterogeneity across dimensions such as ethnicity, age, marital status, and rural–urban residential location, we do provide analysis along the following dimensions: (i) household wealth, which captures baseline financial support; (ii) access to education, which captures aspects of both parental support and community norms; (iii) community gender norms, which captures how supportive environments in the community might be toward young women; (iv) baseline depressive symptoms, which captures an element of adolescent risk exposure; and (v) an indicator for whether the adolescent’s mother lives in the household, which captures parental support.
Table 5 displays results from regressions that interact treatment indicators and baseline measures of each aforementioned dimension: (i) an indicator for above-median assets in the country sample,11 (ii) an indicator for enrolled in school in Tanzania and Ethiopia and for aspired to university degree in Bangladesh,12 (iii) an indicator for community gender norms greater than the median in the country sample,13 (iv) an indicator for PHQ score greater than median value within the country sample, and (v) an indicator of mother living in the household at baseline. Given this analysis is exploratory, we focus this discussion on the overall index of mental health but include results for heterogeneity for each of the other four mental health outcomes in Online Appendix Tables 17–19.
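The logic of these interaction terms can be seen in a stripped-down case. With one treatment dummy, one above-median dummy, and their product as the only regressors, the OLS interaction coefficient equals the difference between the treatment effect in the above-median group and the treatment effect in the at-or-below-median group. The sketch below computes it from cell means; it omits the covariates, strata fixed effects, weights, and clustered inference used in the paper, and all names and numbers are hypothetical.

```python
from statistics import mean, median

def above_median(values):
    """Indicator equal to 1 when a value is strictly above the sample median."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]

def interaction_effect(y, treat, high):
    """Return (effect in the low group, extra effect in the high group)."""
    def cell(t, h):
        return mean(yi for yi, ti, hi in zip(y, treat, high) if ti == t and hi == h)
    effect_high = cell(1, 1) - cell(0, 1)
    effect_low = cell(1, 0) - cell(0, 0)
    # The second element matches the coefficient on treat x high in a saturated OLS model.
    return effect_low, effect_high - effect_low

# Hypothetical data: baseline asset index, treatment assignment, follow-up index.
assets = [2, 8, 3, 9, 1, 7, 4, 6]
treat = [1, 1, 1, 1, 0, 0, 0, 0]
y = [0.1, 0.6, 0.3, 0.8, 0.0, 0.2, 0.2, 0.2]
high = above_median(assets)
base_effect, extra_effect = interaction_effect(y, treat, high)
```

In this toy example the program raises the index by 0.1 for girls at or below the asset median and by a further 0.4 for girls above it, which is the kind of contrast reported in Column 1 of Table 5.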
Focusing first on wealth in Column 1, Table 5, we find strong evidence in Bangladesh and Ethiopia that positive impacts on mental health are significantly larger for girls in relatively wealthier households for the programs that had statistically significant overall effects (Growth Mindset in Bangladesh and Act With Her in Ethiopia). We find no evidence of heterogeneity for the two interventions that had overall null effects (Growth Mindset + Girl Rising and Act With Her + Transfers). In Bangladesh, Growth Mindset has a 0.456 standard deviation (p = 0.008) larger effect for adolescents from households with above-median assets. Similarly, in Ethiopia, Act With Her had a 0.342 standard deviation (p < 0.001) larger effect in wealthier households. In Tanzania, though not statistically significant at conventional levels, there is also evidence that girls in households with above-median assets in the Goal Setting only communities fare better than those in households with at- or below-median assets (0.168, p = 0.242), corroborating the hypothesis that a base level of resources is necessary for Goal Setting programming to be effective. In ELA communities, where all girls are exposed to supportive programming, the coefficient on the interaction is negative but with a larger p-value (p = 0.558), suggesting that variation in wealth is less important when supportive programming is present. This finding is similar in spirit to findings from the evaluation of graduation programs on financial inclusion where benefits accrue to top quantiles given necessary base levels of access to savings/credit to benefit from the program (Banerjee et al. 2015).
Moving to Column 2 in Table 5, there does not appear to be statistically significant heterogeneity according to school enrollment status, though interactions are positive and sizeable for Tanzania (Goal Setting in ELA communities) and Bangladesh (both treatment arms). This is perhaps not surprising since enrollment rates are high across settings. In Bangladesh, 81 percent aspired to university education. The Ethiopia and Tanzania samples have similarly high rates of school enrollment, at 84 percent and 72 percent, respectively. Thus, the estimation of these interaction terms relies on a relatively small comparison group of girls in each treatment arm who were not enrolled (or, in Bangladesh, did not aspire to a university degree).
Columns 3–5 show that the heterogeneity findings on the overall mental health index are not statistically significant at conventional levels for community gender norms, baseline mental health, or presence of mother in the household. That said, they do suggest some interesting dynamics worth further exploration. We highlight a few suggestive findings here.
Looking first at Tanzania, findings suggest that Goal Setting was more effective in communities with restrictive gender norms. Online Appendix Table 17, Panel B, shows that in ELA communities, the Goal Setting intervention is associated with larger decreases in moderate/severe depression in communities with more restrictive gender norms (−0.186, p = 0.034). There is also evidence that Goal Setting had more positive effects on mental health for those with higher baseline adverse mental health; adolescents with worse baseline mental health experienced greater improvements in locus of control (0.292, p = 0.089; Online Appendix Table 17, Panel D).
Turning to Bangladesh, findings suggest that impacts of Growth Mindset are muted in contexts with more conservative gender norms, while the opposite is true for Growth Mindset + Girl Rising, suggesting that the Girl Rising curriculum may be important in contexts where gender norms are particularly restrictive. We see a similar pattern in Bangladesh with regard to baseline mental health, with Growth Mindset once again less effective for adolescents with higher baseline adverse mental health, and the opposite for Growth Mindset + Girl Rising. This is further supported by Online Appendix Table 18, which shows a significant negative interaction of Growth Mindset with above-median gender norms (−0.450, p = 0.027) and a positive and significant interaction of Growth Mindset + Girl Rising with above-median PHQ score at baseline (0.415, p < 0.001). This points to the potential importance of more comprehensive programming for those who are more at risk.
In Ethiopia, both programs appear to be most effective in contexts with restrictive gender norms, a result that likely reflects the importance of the gender-transformative curriculum. This is further supported in Online Appendix Table 19, which shows that the interaction of Act With Her + Transfers with more restrictive gender norms is negative and statistically significant for PHQ scores, indicating the program is more effective at reducing depression in contexts with more restrictive gender norms. On the other hand, both variations of Act With Her were less effective for those with worse baseline mental health, pointing to the importance of a more direct focus on mental health in the curricula. Finally, across all contexts, having a mother in the household generally appears to bolster program effectiveness, illustrating that broader support systems may be important for successful programming.
Overall, our heterogeneity findings using traditional interaction methods provide some evidence of important heterogeneity, particularly in terms of household wealth. We summarize the findings as follows: (i) programs appear to be more effective when some other type of support system exists (for example, wealth, education, mother in the household) and (ii) programs that focus specifically on a certain constraint (for example, gender norms) do work better for subgroups that face this constraint (for example, Act With Her is more effective in communities with more restrictive gender norms). However, this appears to come at a cost, as these targeted components are also counterproductive for those who do not face the constraint. This points to the importance of both identifying the constraints that are preventing adolescent girls from achieving optimal mental health and then exploring targeting of the program. We expand on this further in our discussion.
Conditional Average Treatment Effects
Next, we estimate conditional average treatment effects (CATEs) to shed further light on which adolescent girls are benefiting from these programs. Specifically, we estimate CATEs nonparametrically using the causal forest function developed by Tibshirani et al. (2023). This technique allows us to be largely “agnostic” about the set of possible dimensions for heterogeneous treatment effects and uses a machine-learning approach to assess differential effects by population subgroups. It has become increasingly common in empirical economics (for example, see Athey and Wager 2019; Davis and Heller 2020; Abaluck et al. 2020) to avoid ad hoc heterogeneity analysis. See the Appendix at the end of this paper for more description on the estimation process. Based on the estimated CATEs, we then categorize adolescents into high and low average treatment effect (ATE) groups by creating below- and above-median CATE subgroups and estimate the ATE for each group. The final step in the process is to calculate and compare the mean values of our covariates of interest across the two CATE subgroups. This approach enables us to present descriptively the characteristics of adolescents who are more likely to benefit from treatment and those who are least likely to benefit.
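The last two steps of this procedure, splitting the sample at the median CATE and comparing covariate means across the two groups, are simple to sketch. The example below takes the estimated CATEs as given: it does not fit a causal forest or estimate group-level ATEs, and the CATE values, covariate names, and numbers are all hypothetical. For outcomes where a more negative effect is the beneficial one, such as depression, the CATEs would need to be sign-flipped first.

```python
from statistics import mean, median

def split_by_median_cate(cates):
    """True for adolescents whose estimated CATE is at or above the sample median."""
    cut = median(cates)
    return [c >= cut for c in cates]

def compare_covariates(cates, covariates):
    """Mean of each covariate in the high-CATE and low-CATE groups."""
    high = split_by_median_cate(cates)
    summary = {}
    for name, values in covariates.items():
        high_mean = mean(v for v, h in zip(values, high) if h)
        low_mean = mean(v for v, h in zip(values, high) if not h)
        summary[name] = (high_mean, low_mean)
    return summary

# Hypothetical estimated CATEs and baseline covariates for six adolescents.
cates = [0.30, -0.10, 0.25, 0.05, -0.20, 0.15]
covariates = {
    "asset_index": [6, 4, 7, 5, 3, 5],
    "enrolled_in_school": [1, 0, 1, 1, 0, 1],
}
profile = compare_covariates(cates, covariates)
```

Each entry of `profile` is a pair of means (high-CATE group, low-CATE group). The comparison is descriptive: a gap between the two means shows which characteristics are more common among those with larger estimated effects, not that the characteristic causes the larger effect.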
For power reasons, we apply the causal forest analysis to data comprising adolescent girls pooled across all three countries by assessing heterogeneity for each of our five main outcomes mapped to a multidimensional vector space of common covariates observed across all countries (see Online Appendix Table 1 for full list of covariates).14 We present results of the primary heterogeneity analysis via augmented inverse propensity weighting (AIPW) scores based on random forests fitted to pooled data across study countries for all combinations of treatment arm comparisons.15 Table 6 shows whether certain adolescents benefited significantly more than others from the life skills programming, that is, whether there is any evidence of significant heterogeneity. If there is a significant difference in the effect between those who benefited the most (above- or equal-to-median CATE) and those who benefited the least (below-median CATE), we can then further unpack the characteristics of adolescent girls that are more likely to benefit.
Table 6 provides estimates of the CATEs across all outcomes and treatment arms. Across the 30 estimates, 50 percent have the opposite sign on the effect size between the above- and below-median CATE, and seven out of 30 show statistically significant differences between the two groups. For example, returning to our overall findings presented in Table 4, in Bangladesh there was no overall impact of Growth Mindset on the standardized PHQ depression score (−0.018, p = 0.844), but when we look at Table 6, we see that this masks considerable heterogeneity. For those with above-median CATEs (that is, more negative impacts), we see a −0.166 (p < 0.10) standardized effect size, while for those with below-median CATEs (that is, fewer negative impacts), we observe a positive coefficient (0.129, p = 0.196). The difference between these two is 0.295 and significant at the 95 percent level. Similarly, in Tanzania, we see no overall impact of Goal Setting + ELA on locus of control (0.096, p = 0.488), but we see substantial differences between those with above-median CATEs (0.106, p = 0.418) and those with below-median CATEs (−0.261, p < 0.05). Once again, the difference between these two is large (0.367 standard deviations) and significant at the 95 percent level. This finding suggests important underlying heterogeneity that drives some adolescent girls to benefit, while others are not impacted or, in some cases, made worse off.
In Table 7, we further explore differences in the characteristics of adolescent girls and their households, comparing those who benefited more (above- or equal-to-median CATE) to those who benefited less (below-median CATE). Examining a wider set of characteristics than our interaction-based heterogeneity analysis allows us to further unpack the types of adolescent girls likely to benefit from programming and thereby inform future program design. We focus this analysis on the seven comparisons where there is a statistically significant difference between the two CATE groups in Table 6, as this is where there are potentially important differential treatment effects to unpack further. We focus on covariates that map to the conceptual framework from Figure 1.
Table 7 paints a complex picture of the underlying heterogeneity. We highlight a few findings worthy of further exploration in future studies. First, similar to our traditional heterogeneity findings in Table 5, benefits appear to accrue to the wealthier households. For example, in Tanzania (Goal Setting in ELA communities) and Bangladesh (Growth Mindset + Girl Rising), we find that above-median CATEs on locus of control are concentrated among the wealthiest. In Tanzania, the asset index was on average 5.56 (compared to 4.58) for adolescent girls who experienced above-median CATEs, and in Bangladesh, it was 6.82 (compared to 4.07). Again building on our traditional heterogeneity findings, we see that effects in Tanzania are also concentrated among those enrolled in school: 88.6 percent of adolescents who experienced above-median CATEs were in school versus 56.1 percent of adolescents who had below-median CATEs. In Bangladesh, where all girls were enrolled in school at baseline, we similarly see that adolescent girls who had aspirations to earn a bachelor’s degree are more likely to have improved locus of control as a result of Growth Mindset + Girl Rising; in the above-median CATE group 90 percent aspire versus 74 percent in the below-median CATE group.
We are also able to explore age in this analysis. In Tanzania, where we have the widest age range, we see larger treatment effects of Goal Setting in ELA communities on locus of control for younger adolescents (mean age of 14.0 in the above-median CATE group vs. 15.4 in the below-median CATE group). This finding points to the potentially important role of intervening early and supports a current push in the adolescent research field to build the evidence base on interventions for very young adolescents (Blum et al. 2014).
More generally, Table 7 suggests that benefits accrue to those that have broader underlying support systems, a finding that is also supported in our traditional heterogeneity analysis (Table 5). For example, in Tanzania, Goal Setting in ELA communities is more effective at improving locus of control for adolescents who are enrolled in school, have a father in the household, are wealthier, and have higher education aspirations at baseline. In Bangladesh, Growth Mindset + Girl Rising has larger effects on locus of control once again for wealthier households, those where parental education is higher, and those where adolescents had higher education aspirations. Finally, in Ethiopia, we observe larger reductions in depression for adolescents in Act With Her where the father and mother are in the household and the adolescent is enrolled in school and aspires to more education.
Taken together, the machine-learning findings largely align with the findings from the traditional heterogeneity analysis and, in this way, serve as a useful robustness check. They do, however, add some additional valuable nuance. In particular, these findings highlight that existing support goes beyond household wealth and mother in the household to household head’s education and presence of a father in the household. In a sense, this helps create a broader understanding of what a supportive environment consists of, providing valuable insight for future programming.
Ultimately, this exploratory heterogeneity analysis underscores that programming is most effective when household and community contexts are conducive to adolescents exercising agency and when adolescents have some preexisting underlying capabilities. Alongside girl-focused adolescent programming, it is likely necessary to incorporate intervention components specifically targeting supportive household and community environments to compensate when they are absent (for example, transformative community gender norm programming that is included with Act With Her and ELA with Goal Setting).
VI. Discussion and Conclusion
Our findings provide novel evidence that life skills programs in LMICs can promote the mental health of adolescent girls. In Tanzania, Bangladesh, and Ethiopia, we see positive movement on at least one dimension of mental health. The size of these effects is on par with effects of life skills programming on other outcomes of interest for adolescent girls.16
While the life skills interventions were successful at promoting mental health across all contexts, program design mattered. Effects appear larger in contexts where programs included adolescent boys alongside adolescent girls (Bangladesh and Ethiopia). This aligns with findings from Evans and Yuan (2022) and Shah et al. (2023), who show that programming that targets both genders is more effective at improving outcomes for females. This point is further supported by evidence from Ethiopia, where providing asset transfers to adolescent girls but not adolescent boys muted effects, indicating that favoring one sex over another could have detrimental impacts for all. Adding further nuance to the importance of program design, in Bangladesh, adding gender-focused programming (Girl Rising) to the gender-neutral intervention (Growth Mindset) muted the large and positive impacts of Growth Mindset. Qualitative evidence suggests that Girl Rising served as a reminder of the challenges that adolescent girls need to overcome to achieve their goals. The potential negative impact of making gender norms salient aligns with some of the suggestive evidence from heterogeneity analysis that effect sizes are smaller in communities with more gendered norms.
The existence of basic support structures seems like a necessary condition for program effectiveness. In Tanzania, the introduction of Goal Setting was harmful in the absence of supportive environments (ELA). Without the life skills provided by ELA, goal setting likely appeared unrealistic, thus creating undue stress for recipients. Both traditional interaction methods and machine-learning techniques highlight the importance of the existence of household wealth (measured by assets) in promoting better outcomes. While it is important to keep in mind that these measures are all relative—many of the better off households are still poor by standard definitions—this finding suggests that broader systems to promote adolescent well-being are necessary at the household and community level for life skills to bring about change, and the absence of them may in fact turn impacts negative. This is also why interventions that tackle constraints at multiple levels are likely to be particularly important when contexts do not have the broader systems in place. At the same time, our heterogeneity analyses point to the importance of tailoring programming to the specific constraints that adolescents are facing to maximize program benefit.
Finally, our results suggest that life skills interventions targeted at younger adolescents may be more efficacious at improving mental health more broadly. In Bangladesh and Ethiopia, where interventions focused on younger adolescents (mostly 10–14 years old in Bangladesh and 10–12 years old in Ethiopia), one intervention in each setting improved socio-emotional development, and there were suggestive improvements in locus of control. In addition, in our exploratory heterogeneity analysis, there is evidence of improvement in locus of control in Tanzania among younger adolescents. These results point to the value of intervening with adolescents early. At the same time, Tanzania, where 50 percent of the sample is older than 15 years, is the only country where there was evidence of reductions in depression, as measured by the PHQ, suggesting that impacts on depression manifest at older ages during adolescence, when prevalence is higher (WHO 2021). This aligns with previous research finding that, for depressive disorders, 3.1 percent of diagnosed cases have their onset by age 14 and 13.2 percent by age 18 (Solmi et al. 2022), and it points to the importance of continued intervention over the course of adolescence to address changing needs.
Our findings, drawn from diverse populations exposed to six interventions across three countries, begin to unpack the conditions under which life skills interventions are likely to promote mental health for adolescent girls. As life skills programs remain a feature of adolescent programming globally, future research is needed. Ultimately, life skills programming is unlikely to be the golden ticket to improved adolescent mental health without improving gender norms and the education, health, and financial systems that underpin the broader society where LMIC adolescents reside.
Acknowledgments
Sponsored by the NOMIS Foundation and the Center for Health and Wellbeing at Princeton University.
Manisha Shah, Sarah Baird, Jennifer Seager, Benjamin Avuwadah, and Joan Hamory are joint first authors. The order in which these authors’ names appear has been randomized. The authors thank participants at the Causes and Consequences of Child Mental Health conference hosted by the Center for Health and Wellbeing at Princeton University and the Ottawa Applied Microeconomics Lab for their valuable comments and feedback; in particular, they thank Corrine Low for her insightful discussion. They thank David McKenzie, Berk Özler, and Luca Parisotto for useful conversations. Tania Ismail expertly assisted in the creation of the conceptual framework figure, and Saini Das assisted with the literature review. The authors gratefully acknowledge funding for the three underlying impact evaluations from The Hewlett Foundation, the Africa and South Asia Gender Innovation Labs at the World Bank, the Global Financing Facility at the World Bank, and UK Aid through the Gender and Adolescence: Global Evidence program. This research would not have been possible without the dedicated in-country partners, BRAC Maendeleo in Tanzania, Innovations for Poverty Action and Room to Read in Bangladesh, and Laterite and the Ethiopian Development Research Institute in Ethiopia, as well as implementation partners in Ethiopia, Pathfinder International and CARE. Most importantly, the authors are thankful to the young people and their families, who generously shared their time and experiences for the purpose of this research. The replication files for this analysis are posted at Jennifer Seager’s personal website (https://sites.google.com/view/jennifer-seager) under the publications tab, and at https://drive.google.com/file/d/1waZh5PYjCfbdmaUf1LekUzPOlKJEcoH_/view. If you have any questions, please email Jennifer Seager at jseager@gwu.edu.
Appendix: Heterogeneous Treatment Effects Estimation via Causal Forest
In this section, we present the step-by-step process we use to estimate heterogeneous treatment effects (HTEs). We use the grf package developed by Tibshirani et al. (2023) to estimate all our causal forest models and implement these stages in R. In general terms, this package combines random forests, a popular machine-learning (ML) algorithm, with causal inference techniques. In our case, we use this to explore and understand how the mental health effects of life skills programming vary across observed characteristics of adolescents. The estimation algorithm is able to handle complex relationships across our various covariates without imposing any strict assumptions about our data generating process. The algorithm performs best in large samples; in small samples, reduced power to detect HTEs and the risk of overfitting can result in unstable HTE estimates. We circumvent some of these challenges by pooling our data across our study settings and conducting standardized checks to validate the robustness of our estimates. Our estimation process comprises three stages: training, evaluation, and exploring heterogeneity. We discuss details of these three stages below.
Training stage. The first step in building our causal forest model to assess HTEs is the training stage. In this stage, the causal forest algorithm develops several decision trees, each trained on a random sample of observations and all covariates across study countries, to predict the treatment effect for each adolescent girl. We utilize a doubly robust (DR) estimator in the training stage to ensure that we achieve the best model performance. The DR estimator first predicts both treatment propensities and outcome values for each observation. We then combine the predicted treatment propensities and outcome values as inputs for our main causal forest model to minimize bias in the estimated conditional average treatment effects (CATEs). The DR approach uses the two models to construct a consistent estimate for the CATEs even if one of the two models is misspecified.
Other inputs for the trained model include the treatment and outcome values, our set of covariates, and other arguments like “tune.parameters” that ensure optimal model performance.17 For instance, when we specify “tune.parameters = TRUE,” the model takes into account the total number of observations in each tree node when splitting trees. This ensures that all characteristics of observations in each split node are considered in the classification process for estimating the treatment effect for each observation. We then use the causal forest model to estimate CATEs for each adolescent observation. We repeat the estimation for various treatment and outcome combinations in the pooled data. In total, we estimated 30 different CATE models that comprised a blend of treatment indicators for the different arms of setting-specific treatment for each outcome of interest.
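The doubly robust idea described in the training stage can be illustrated with a minimal sketch. The code below computes the textbook augmented inverse probability weighting (AIPW) score for each observation from a predicted propensity and predicted outcomes under treatment and control. It is written in Python with the standard library only, purely for illustration; it is not the grf implementation used in the paper, and the function name, argument names, and toy data are all hypothetical.

```python
from statistics import mean


def aipw_scores(y, w, mu0, mu1, e):
    """Textbook AIPW (doubly robust) score for each observation.

    y   : observed outcomes
    w   : treatment indicators (0 or 1)
    mu0 : predicted outcomes under control
    mu1 : predicted outcomes under treatment
    e   : predicted treatment propensities, strictly between 0 and 1
    """
    scores = []
    for yi, wi, m0, m1, ei in zip(y, w, mu0, mu1, e):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity must lie strictly between 0 and 1")
        # Outcome-model part: predicted treated minus predicted control outcome.
        outcome_part = m1 - m0
        # Correction part: inverse-propensity-weighted residuals.
        correction = wi * (yi - m1) / ei - (1 - wi) * (yi - m0) / (1 - ei)
        scores.append(outcome_part + correction)
    return scores


# Hypothetical toy data (not from the study).
y_toy = [1.0, 3.0, 2.0, 4.0]
w_toy = [0, 1, 0, 1]
e_toy = [0.5, 0.5, 0.5, 0.5]

# Case 1: the outcome model is exactly right.
scores_good = aipw_scores(y_toy, w_toy, [1.0, 1.0, 2.0, 2.0], [3.0, 3.0, 4.0, 4.0], e_toy)
ate_good = mean(scores_good)

# Case 2: the outcome model is badly wrong (all zeros) but the propensity is right.
scores_bad_outcome = aipw_scores(y_toy, w_toy, [0.0] * 4, [0.0] * 4, e_toy)
ate_bad_outcome = mean(scores_bad_outcome)
```

In this toy example both cases average to the same treatment effect, which is the sense in which the score tolerates misspecification of one of the two models. Real applications would estimate the two nuisance models out of sample and account for sampling weights and clustering, which this sketch ignores.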
Evaluation. After estimating our CATEs in the training phase, we evaluate the model’s performance regarding potential overfitting issues. This is to assess whether our models have captured the true underlying patterns of heterogeneity in our data set and to ensure that CATEs are not biased due to noise or random fluctuations of signal in the data set. Following Chernozhukov and Semenova (2018), we use the best linear predictor (BLP) regression to evaluate the performance of CATE predictions on the held-out data. The BLP estimation validates the performance of our trained causal forest model on held-out trials (that is, predicting the values of the outcome for subpopulations not used to train the models). Overall, the BLP analysis showed that our models performed well and were well calibrated. Thus, we did not have to do additional hyperparameter tuning. The mean prediction coefficients (which show the presence of heterogeneity) from the BLP analysis range from 0.83 to 2.53. The closer a coefficient is to one, the stronger the presence of heterogeneous effects in the data. We test the statistical significance of the models with BLP mean prediction coefficients between 0.95 and 1.2 to assess the presence of heterogeneity.
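As a rough illustration of a BLP-style calibration check, the sketch below regresses the outcome residual on two constructed terms: the average predicted effect times the treatment residual, and the deviation of each predicted CATE from that average times the treatment residual. This is the commonly described form of the calibration regression (the grf package exposes a similar check through its test_calibration function); it is not necessarily the exact specification used in the paper. The Python function, variable names, and toy data are hypothetical, and the sketch omits standard errors.

```python
def blp_calibration(y, w, y_hat, w_hat, tau_hat):
    """OLS (no intercept) of (y - y_hat) on two regressors:
        x1 = tau_bar * (w - w_hat)
        x2 = (tau_hat - tau_bar) * (w - w_hat)
    Returns the two coefficients (mean_coef, diff_coef).
    """
    n = len(y)
    tau_bar = sum(tau_hat) / n
    resid_y = [yi - yh for yi, yh in zip(y, y_hat)]
    resid_w = [wi - wh for wi, wh in zip(w, w_hat)]
    x1 = [tau_bar * rw for rw in resid_w]
    x2 = [(t - tau_bar) * rw for t, rw in zip(tau_hat, resid_w)]

    # Solve the 2x2 normal equations in closed form.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * r for a, r in zip(x1, resid_y))
    s2y = sum(b * r for b, r in zip(x2, resid_y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("regressors are collinear; coefficients not identified")
    mean_coef = (s22 * s1y - s12 * s2y) / det
    diff_coef = (s11 * s2y - s12 * s1y) / det
    return mean_coef, diff_coef


# Hypothetical noise-free toy data in which the predicted CATEs are exactly right.
w_toy = [0, 1, 0, 1]
w_hat_toy = [0.5, 0.5, 0.5, 0.5]
tau_toy = [1.0, 1.0, 3.0, 3.0]
y_hat_toy = [1.0, 1.0, 2.0, 2.0]
y_toy = [yh + (wi - wh) * t
         for yh, wi, wh, t in zip(y_hat_toy, w_toy, w_hat_toy, tau_toy)]

mean_coef, diff_coef = blp_calibration(y_toy, w_toy, y_hat_toy, w_hat_toy, tau_toy)
```

Because the toy outcomes are built directly from the predicted effects, both coefficients equal one here. With real data the coefficients would be estimated with noise, and inference would require standard errors that this sketch does not compute.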
Exploring heterogeneity. After estimating the CATEs and evaluating the model performance, the next step is to assess whether there is variation in the adolescent-level treatment effects across the different treatments and outcomes in each study country. To do this, we first disaggregate the pooled data to the country level. We then group adolescent girls in each country according to whether their CATE estimates for each outcome are above (or equal to) or below the median treatment effect (that is, we group girls into high or low treatment effects). We then estimate the average treatment effects in these two subgroups across each outcome and country-specific treatment separately using the augmented inverse probability weights (AIPWs) predicted from the propensity model during the training stage. We then make inferences on the evidence of heterogeneity by testing whether the identified treatment effects in the high and low CATE subgroups are statistically different from each other (that is, we test the null of no heterogeneity in treatment effects). This procedure is heuristic, as the “high” and “low” subgroups only provide qualitative insights about the strength of heterogeneity in treatment effects. Overall, we find that only seven out of the 30 models we estimated showed strong evidence of heterogeneous effects. We then select these models for the final stages of the analysis. In the final step, we determine the important dimensions of heterogeneity in our treatment effects by calculating the sample means of the covariates used to train the model for the CATE subgroups and testing whether the sample means are statistically different when comparing adolescents with above-median CATE to those below. Through this, we are able to identify qualitatively the characteristics of young girls/women who benefit most (and least) from the life skills programs.
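The median-split comparison described above can be sketched as follows. The code splits observations by whether their estimated CATE is at or above the median, averages a per-observation doubly robust score within each subgroup, and forms a simple z-statistic for the difference. This is a simplified Python illustration under stated assumptions: the two subgroup estimates are treated as independent, sampling weights and clustering are ignored, and all function names and toy numbers are hypothetical rather than taken from the paper.

```python
from math import sqrt
from statistics import mean, median, stdev


def split_by_median(cate):
    """Return index lists for the high (>= median) and low (< median) CATE groups."""
    cutoff = median(cate)
    high = [i for i, c in enumerate(cate) if c >= cutoff]
    low = [i for i, c in enumerate(cate) if c < cutoff]
    return high, low


def subgroup_ate(scores, idx):
    """Average the doubly robust scores in a subgroup; return (estimate, standard error)."""
    vals = [scores[i] for i in idx]
    return mean(vals), stdev(vals) / sqrt(len(vals))


def heterogeneity_test(scores, cate):
    """Difference in subgroup ATEs (high minus low), its standard error, and a z-statistic.

    Assumes the two subgroup estimates are independent (a simplification).
    """
    high, low = split_by_median(cate)
    ate_hi, se_hi = subgroup_ate(scores, high)
    ate_lo, se_lo = subgroup_ate(scores, low)
    diff = ate_hi - ate_lo
    se_diff = sqrt(se_hi ** 2 + se_lo ** 2)
    return diff, se_diff, diff / se_diff


# Hypothetical toy inputs: estimated CATEs and doubly robust scores for six girls.
cate_toy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
scores_toy = [0.0, 0.2, 0.1, 0.5, 0.7, 0.6]

high_idx, low_idx = split_by_median(cate_toy)
diff, se_diff, z_stat = heterogeneity_test(scores_toy, cate_toy)
```

A large z-statistic in absolute value would be read as evidence against the null of no heterogeneity between the two groups. The same split could then be reused to compare covariate means across the high and low groups, as in the final step of the procedure.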
Footnotes
↵1. These programs are offered both in the community, providing safe spaces for both in- and out-of-school adolescents (Bandiera et al. 2019, 2020a, 2020b; Buehren et al. 2017; Buchman et al. 2022), as well as in school settings, incorporating programming into the curricula (Dhar, Jain, and Jayachandran 2022; Ashraf et al. 2020; Edmonds, Feigenberg, and Leight 2023). While findings from life skills programs are mixed (Buehren et al. 2017), they have been shown to increase engagement in income generating activities (Bandiera et al. 2019, 2020a; Buchman et al. 2022), decrease unintended teen pregnancy and early entry into marriage or cohabitation (Bandiera et al. 2019, 2020a), protect adolescents from the worst effects of negative systemic shocks (Bandiera et al. 2020b), reduce school dropout and improve educational outcomes (Edmonds, Feigenberg, and Leight 2023; Buchman et al. 2022; Ashraf et al. 2020), reduce experience of intimate partner violence (Shah et al. 2023), improve gender attitudes (Dhar, Jain, and Jayachandran 2022; Edmonds, Feigenberg, and Leight 2023), and increase agency (Edmonds, Feigenberg, and Leight 2023).
↵2. There is a limited evidence base on the impact of cash transfers on mental health (Baird, de Hoop, and Özler 2013), as well as for specific vulnerable populations such as adolescents affected by HIV/AIDS (Ssewamala, Han, and Neilands 2009) and children (Fernald, Gertler, and Neufeld 2009).
↵3. This conceptual framework builds on that of the Gender and Adolescence: Global Evidence (GAGE) research program (GAGE Consortium 2019), which uses a capabilities approach as pioneered by Sen (1984, 2004) and further developed by Nussbaum (2011) and Kabeer (2003) to capture the importance of gender dynamics. The capabilities approach recognizes that the ability of individuals to achieve overall well-being, including mental health, is a matter of what they are able to do and who they are able to be, given the contexts within which they live.
↵4. We started with a review of studies included in Temin and Heck (2020). We included all of these studies and also used this review to generate search terms for academic databases. Our database search included ScienceDirect, PubMed, Google Scholar, World Bank, and the National Bureau of Economic Research, and we sought English language articles including the words “adolescents” or “girl groups” and “life skills training.” We eliminated any studies that were not conducted in an LMIC or were not randomized. We additionally included research known to the authors related to group therapy and/or growth mindset and reviewed papers included in Singla et al. (2020).
↵5. Seven percent of the adolescents were aged 15–18.
↵6. Work on catalyzing shifts in social norms is primarily focused on applying CARE’s well-known Social Analysis and Action (SAA) approach to gender and social transformation, which seeks to enable communities to identify for themselves the linkages between social factors and well-being, and then determine what actions will help improve them (Mekuria, Sprinkel, and Cowan 2018).
↵7. PHQ-9 scores of 10 or higher are classified as “moderate or severe depression” (Kroenke, Spitzer, and Williams 2001). PHQ-2 scores of 3 or higher are classified as likely “major depression” (Kroenke, Spitzer, and Williams 2003).
↵8. For Tanzania, the data used for this paper are a subsample of a broader registered RCT (Shah and Seager 2021). While the specific outcomes used in this paper are not prespecified, they fall within “other health behaviors” and “behavioral economic parameters,” which were prespecified. The Bangladesh sample is a subset of a broader registered RCT (Seager et al. 2022). The measure of socio-emotional development, grit, is a component of a prespecified primary outcome measuring socio-emotional skills. The Ethiopia sample studied in this paper is also a subset of a broader registered RCT (Baird et al. 2020b). Both the depression and resilience scores were prespecified as primary outcomes for the Act With Her impact evaluation, and the depression indicator was prespecified as a secondary outcome of the study. Only the locus of control measure was not prespecified.
↵9. We provide results using more extensive control sets in Online Appendix Tables 10–12, and the results are qualitatively the same.
↵10. Online Appendix Tables 13–15 provide additional detail on program impacts item by item for those interested in impacts on specific questions underlying each measure. We choose to focus the main results on the validated scales for the discussion in the paper.
↵11. We construct an asset index using principal components analysis following the methods of Filmer and Pritchett (2001). A household is defined as having above-median assets according to their asset score.
↵12. In Bangladesh, school enrollment was the basis for study enrollment. Thus, we instead use aspirations to earn a university degree in place of school enrollment.
↵13. To generate a measure of community gender norms, we calculate the average of the index of adolescent gendered attitudes related to gender roles (defined in Online Appendix Table 3) at the community level (village in Tanzania, school in Bangladesh, kebele in Ethiopia). We use sample weights in Bangladesh and Ethiopia. We then generate an indicator that the community-level gendered attitudes are above the median within each country’s study sample, meaning norms around gender are more conservative.
↵14. Although the causal forest algorithm can handle complex relationships across our various covariates, it performs best in large sample cases. We pool our data to overcome some of the challenges of power to detect HTEs and unstable HTE estimates due to overfitting when sample sizes are relatively small.
↵15. The results we present use the robust augmented inverse probability weights (AIPW) approach to estimate CATEs and ATEs within CATE subgroups. The AIPW approach leverages information about adolescents’ characteristics to predict the likelihood of receiving treatment. It then adjusts the data to give more weight to adolescents who received the treatments but were less likely to receive them and less weight to those who did not receive the treatments but were more likely to receive them (Athey and Wager 2019).
↵16. For example, Dhar, Jain, and Jayachandran (2022) find that gender-focused life skills programming has effect sizes on gender norms of 0.18 standard deviations. Bandiera et al. (2020a) find ELA in Uganda has effect sizes of 0.12 standard deviations for gender empowerment and 0.370 standard deviations for HIV knowledge.
↵17. When “tune.parameters” is turned on, our models identify the ideal values for parameters such as num.trees, min.node.size, and splitrule for which our models perform best.
- Received December 2022.
- Accepted August 2023.
This open access article is distributed under the terms of the CC-BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0) and is freely available online at: https://jhr.uwpress.org.