## Abstract

We study gender gaps in learning and the effectiveness of female teachers in reducing them using a large, representative, annual panel data set from the Indian state of Andhra Pradesh. We find a small but significant negative trend in girls’ test scores in both math and language. Using five years of panel data, we find that teachers are more effective at teaching students of their own gender. Female teachers are more effective at teaching girls than male teachers but no worse at teaching boys. Thus, hiring female teachers on the current margin may reduce gender gaps in test scores without hurting boys.

## I. Introduction

Reducing gender gaps in education attainment has been an important priority for international education policy and is explicitly listed as one of the United Nations Millennium Development Goals (MDGs). This commitment has been reflected in the policies of many developing countries and substantial progress has been made in the past decade in reducing gender barriers in primary school enrollment. One key policy that is credited with increasing girls’ education is the increased recruitment of female teachers (UNESCO 2012, Herz and Sperling 2004). UNICEF has documented the practice in a variety of countries, including Bangladesh, India, Liberia, Nepal, and Yemen, and the United Nations’ Task Force for achieving the MDGs has advocated hiring more female teachers as an effective policy mechanism for reaching the goal of universal primary education of girls (UNDG 2010, Rehman 2008, Slavin 2006).

Although the idea that hiring more female teachers can bridge gender gaps is widely prevalent among policymakers, there is very little empirical evidence from testing this hypothesis in developing countries. In this paper, we study the causal impact of having a female teacher on the learning gains of female students using one of the richest data sets on primary education in a developing country. The data set features annual longitudinal data on student learning measured through independent assessments conducted over five years across a representative sample of 500 rural schools and over 90,000 students in the Indian state of Andhra Pradesh (AP). The data also include detailed information on teacher characteristics and on their assignments to specific classrooms in each year.

The combination of panel data and variation in the gender of teachers and students allows us to estimate the causal impact of matching teacher and student gender in a value-added framework. Identification concerns are addressed by showing that our estimates of gender matching do not change under an increasingly restrictive set of specifications including school, school grade, teacher, and student gender by grade fixed effects. We also show that there is no correlation between the probability of being assigned a female teacher and either the fraction of female students in the class or the mean test scores at the start of the year. Further, our estimation sample is restricted to schools that have only one section per grade, which precludes the possibility that students may be tracked across sections and that female teachers may be assigned to different sections based on unobservables.

We report five main findings in this paper. First, we find a small but significant negative trend in girls’ test scores in both math (0.02σ/year) and language (0.01σ/year) as they advance through the five grades of primary school.^{1} Girls have significantly higher test scores in language and equal test scores in math relative to boys at the end of first grade but score almost on par with boys in language and significantly worse in math by the end of Grade 5. These results are consistent with evidence of gender gaps in test scores (particularly in math) documented in both high- and low-income countries (Fryer and Levitt 2010, Bharadwaj et al. 2012) and suggest that the growing gender gaps documented at later ages in both these papers probably reflect a cumulative effect of a trend that starts as early as primary school.

Second, using five years of panel data and school grade and student gender by grade fixed effects, we find that teachers are 0.034σ/year more effective in teaching students of their own gender *relative* to teachers of the opposite gender. In other words, female teachers are 0.034σ/year more effective at reducing the gender gap in achievement than male teachers. Because female teachers differ from male teachers on several characteristics that may be correlated with teacher quality, we test the robustness of the “gender-match” result by including interactions between student gender and each of the teacher characteristics on which female and male teachers differ and find that our estimates are essentially unchanged.

The result above is a “difference-in-difference” estimate that compares the relative advantage of female teachers in teaching girls rather than boys with the relative disadvantage of male teachers in teaching girls rather than boys. However, the overall effectiveness of a teacher is also determined by his or her effectiveness at teaching students of the opposite gender. Our third result speaks to this issue, and we find that female teachers in our setting are more effective overall than male teachers. We find that girls who have a female teacher in a given year have 0.036σ higher annual test score gains than if they had a male teacher. However, boys perform similarly regardless of the gender of their teacher. Thus, girls are likely to benefit from a policy of hiring more female teachers and overall educational performance is likely to increase due to the lack of any offsetting effect on boys.

Fourth, we study the impacts of a teacher-student gender match on student attendance and find no evidence that teachers are more effective at raising the attendance for students of the same gender. This suggests that the likely mechanism for the “matching” effect on test scores is not on the extensive margin of increased student-teacher contact time but rather on the intensive margin of more effective classroom interactions.

Finally, we document that female teachers are more likely to teach in earlier grades. Combined with the results above, we estimate that around 10–20 percent of the trend of increasing gender gaps in test scores over time can be attributed to the reduction in the probability of girls being taught by female teachers as they advance to higher grades. Because teachers in higher grades are more likely to be male across several countries (UNESCO 2010), our results suggest that one possible channel for growing gender gaps in achievement (especially in math) could be the reduced likelihood of having female teachers in higher grades.

Our results suggest that the causal impact of having a female teacher (relative to a male teacher) on the annual learning gains of girls is positive in this setting, with no adverse impact on the learning gains of boys. Because controlling for observable teacher characteristics such as education, training, experience, salary, contractual status, union membership, place of origin, and absence rates (and their interaction with student gender) does not change these results, it *must* be the case that there are other unobservable differences across teacher gender that are driving our results. If we were able to identify an observable teacher characteristic that made the “female teacher” effect insignificant, it would be possible to target hiring on that characteristic in lieu of gender. However, since we are not able to identify such an observable mechanism for our estimated effect, a useful way of interpreting our results is that teacher gender may be a summary statistic for unobserved teacher characteristics (such as empathy, classroom management skills, or role model effects) that are not used in hiring decisions under the status quo.

Although there have been several studies on the impact of shared gender between teachers and students on learning outcomes in high-income country contexts, there is surprisingly little well-identified evidence on this question in developing countries. In the United States and United Kingdom, studies have shown improved test scores, teacher perception, student performance, and engagement of girls when taught by a female teacher in schools, with magnitudes of test score impacts similar to those found in our paper (Dee 2007; Dee 2005; Nixon and Robinson 1999; Ehrenberg, Goldhaber, and Brewer 1995; Ouazad and Page 2012). However, other studies conducted in both the United States and in European countries have failed to find such an effect (Holmlund and Sund 2008; Carrington, Tymms, and Merrell 2008; Lahelma 2000; Winters et al. 2013; Marsh, Martin, and Cheng 2008; Driessen 2007; Neugebauer, Helbig, and Landmann 2011), and some research even suggests that female teachers may adversely affect girls’ performance in areas where girls face larger stereotypes (Antecol, Eren, and Ozbeklik 2015). In higher education institutions in the United States, female professors have been found to have small effects on female students’ course selection, achievement, and major choice (Bettinger and Long 2005; Carrell, Page, and West 2010; Hoffmann and Oreopoulos 2009).^{2}

The question of the role of female teachers in reducing gender gaps is much more salient in low-income country contexts where gender gaps in school enrollment and attainment are much larger (OECD 2010; Hausmann, Tyson, and Zahidi 2012; Muralidharan and Prakash 2013; Bharadwaj et al. 2012) and where increased recruitment of female teachers has been actively advocated (UNDG 2010). The only related paper in a developing country setting is Rawal and Kingdon (2010), who use test score data on second and fourth grade students in the Indian states of Bihar and Uttar Pradesh and find a positive impact on educational achievement for girls taught by female teachers but find no similar effect for boys. Because the findings from the United States and Europe may not be transferable to developing countries (given the larger prevalence of gender stereotypes and gender gaps in these settings), our estimates fill an important gap by providing some of the first estimates of the impact of teacher-student gender matching in a developing country.

In addition to providing well-identified estimates of the impact of matching teacher and student gender on learning outcomes in a developing country where the literature is very sparse, our data set allows us to make advances relative to both the developed and developing country literatures on this subject. First, while several existing papers in this literature (especially those looking at college-level outcomes) use grades or test scores assigned by the students’ own teachers, the test scores used in this paper are based on independent assessments and grading. This allows us to be confident that the effects we measure reflect genuine impacts on learning by eliminating the concern that the measured effects of gender matching may reflect more generous grading by teachers towards students who share their own gender.

Second, and more important, the majority of papers in the global literature on this question (including Dee 2007 and Rawal and Kingdon 2010) use student fixed effects and variation in the gender of teachers across different subjects to identify the impact of the gender match on learning, but they are based on comparing *levels* of test scores as opposed to *value-added*. Thus, it can be difficult to interpret the magnitudes of the estimated effects without knowing the gender composition of the teachers in that subject in previous grades.^{3} Our use of five years of annual panel data on test scores allows us to estimate the impact of a gender match on the *value-added in the year that the match occurred*, which has a much clearer interpretation relative to the standard in the literature.

Finally, we observe students at a younger and more formative age than most of the literature, when the role of sharing gender with teachers may be especially important. This is also the age that is most relevant to policy for reducing education gender gaps in developing countries because the majority of students do not complete more than eight years of school education. Our estimates, based on a large data set that is representative of the rural public school system in a state with over 80 million people, are also likely to have more external validity across other developing countries than existing work.

## II. Context and Data Set

India has the largest primary schooling system in the world, catering to over 200 million children. As in other developing countries, education policy in India has placed a priority on reducing gender disparities in education, and both the Five Year Plans and Sarva Shiksha Abhiyan (SSA), the flagship national program for universal primary education, have called for an increase in recruiting female teachers as a policy for increasing girls’ education. SSA requires that 50 percent of new teachers recruited be women, and the 11th Five Year Plan suggested that it be increased to 75 percent (Government of India 2008). These calls for increased numbers of female teachers reflect a belief that through such mechanisms as role model effects, increased safety, reduced prejudices, and greater identification and empathy, female teachers are arguably more effective in increasing girls’ achievement in primary school relative to their male counterparts (Ehrenberg, Goldhaber, and Brewer 1995; Stacki 2002; Dee 2005).

This paper uses data from the Indian state of Andhra Pradesh (AP), which is the fifth most populous state in India with a population of over 80 million (70 percent rural).^{4} The data were collected as part of the Andhra Pradesh Randomized Evaluation Studies (AP RESt), a series of experimental studies designed to evaluate the impact of various input and incentive-based interventions on improving education outcomes in AP.^{5} The project collected detailed panel data over five years (covering the school years 2005–2006 to 2009–2010) on students, teachers, and households in a representative sample of 500 government-run primary schools (Grades 1–5) across five districts in AP. The data set includes annual student learning outcomes as measured by independently conducted and graded tests in language (Telugu) and math (conducted initially at the start of the 2005–2006 school year as a baseline and subsequently at the end of each school year), basic data on student and teacher demographics, and household socioeconomic data for a subset of households. The assessments were created (based on the pedagogical objectives of the curriculum), administered, and externally graded by an independent agency, ensuring that the tests are valid measures of learning and that the scores are not biased by teacher subjectivity. The test scores are normalized within each year-grade-subject combination and all analysis is conducted in terms of normalized test scores, with magnitudes being reported in standard deviations.
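The normalization step described above can be sketched as follows. This is only an illustration under stated assumptions: the record keys (`year`, `grade`, `subject`, `raw_score`) are hypothetical, and the use of the population standard deviation is our choice for the sketch, since the text does not say which variant was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_scores(records):
    """Return records with z-scores computed within each (year, grade, subject) cell.

    `records` is a list of dicts with hypothetical keys
    'year', 'grade', 'subject', and 'raw_score'.
    """
    cells = defaultdict(list)
    for r in records:
        cells[(r["year"], r["grade"], r["subject"])].append(r["raw_score"])

    # Mean and (population) standard deviation of raw scores in each cell.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in cells.items()}

    normalized = []
    for r in records:
        mu, sigma = stats[(r["year"], r["grade"], r["subject"])]
        # Guard against a degenerate cell in which every score is identical.
        z = (r["raw_score"] - mu) / sigma if sigma > 0 else 0.0
        normalized.append({**r, "z_score": z})
    return normalized


# Toy data: two cells (Grade 3 math and Grade 4 math in the same year).
sample = [
    {"year": 1, "grade": 3, "subject": "math", "raw_score": 10.0},
    {"year": 1, "grade": 3, "subject": "math", "raw_score": 20.0},
    {"year": 1, "grade": 3, "subject": "math", "raw_score": 30.0},
    {"year": 1, "grade": 4, "subject": "math", "raw_score": 55.0},
    {"year": 1, "grade": 4, "subject": "math", "raw_score": 65.0},
]
result = normalize_scores(sample)
```

Because each cell is centered and scaled separately, a coefficient of 0.02 in later regressions reads as 0.02 standard deviations of the score distribution for that year, grade, and subject.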

The online appendix^{6} provides further details on the data set, including sample size by cohort (Table A1), testing for changes in characteristics of incoming cohorts by gender over time (Table A2), and differential attrition by gender (Table A3, Panel A). There is some differential attrition in the sample over time by gender (where attrition is defined as the fraction of students who had taken a test at the end of year *n* – 1 but did not take a test at the end of year *n*), with male students more likely to attrite (around 3 percent each year). However, this differential attrition is not a first-order concern for estimating the impact of a gender-match on test score gains because we see that there is no differential attrition by student gender as a function of gender-match during the school year (Table A3, Panel B).^{7}
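The attrition definition used here can be made concrete with a short sketch. The function name and the representation of test takers as collections of student identifiers are assumptions made for illustration only.

```python
def attrition_rate(tested_prev, tested_curr):
    """Fraction of students tested at the end of year n-1
    who did not take a test at the end of year n."""
    prev = set(tested_prev)
    if not prev:
        raise ValueError("no students were tested in year n-1")
    return len(prev - set(tested_curr)) / len(prev)


# Toy example: students 3 and 4 were tested in year n-1 but not in year n.
# Student 5 is new in year n and does not enter the calculation.
rate = attrition_rate([1, 2, 3, 4], [1, 2, 5])
```

Differential attrition by gender would then be the difference between this rate computed separately for boys and for girls.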

Table 1, Panel A presents descriptive statistics on students who have at least one recorded test score and data on gender in the data set.^{8} Girls comprise 51 percent of the public school students in our sample. This does not imply that more girls are going to school than boys, as it is likely that more boys are attending private schools (Pratham 2012). However, it does illustrate that, on average, girls are well represented in public primary schools. The girls in the sample come from modestly better-off socioeconomic backgrounds than the boys and have parents who are slightly more educated and affluent. These differences probably reflect two dimensions of selection into the sample: better-off households are more likely to send girls to school and better-off households are more likely to send boys to private schools. However, the magnitudes of these differences are quite small (often less than two percentage points), and the statistical significance reflects the very large sample size. Because the household surveys were completed for only 70 percent of the sample of students for which we have test score data, our main specifications do not include household controls.^{9}

Table 1, Panel B presents summary statistics for the teachers in our analysis. Female teachers comprise 46 percent of the total teacher body but are less experienced, less likely to have completed high school or a master’s degree, and less likely to hold a head teacher position. Not surprisingly, their mean salaries are also lower. They also comprise a much greater share of the contract teacher workforce than that of regular civil service teachers. Because teacher characteristics vary systematically by gender, we will report our key results on the impact of matching teacher and student gender both with and without controls for these additional teacher characteristics. We will also examine the extent to which our main results on the effects of a teacher-student “gender match” on learning outcomes can be explained by these observable differences in teacher characteristics by gender by including interactions of student gender with each of the teacher characteristics that are different across male and female teachers.

Table 2, Panel A presents summary statistics on gender differences in test scores by grade. We see that girls score as well as boys in math and score 0.05σ *higher* on language in Grade 1. However, there is a steady decline in girls’ test scores in both math and language as they advance to higher grades, and by the last two years of primary school (Grades 4 and 5) we see that girls’ initial advantage in language scores has declined and they do significantly worse than boys in math (by around 0.1σ). Table 2, Panel B quantifies the annual decline in girls’ relative scores by including an interaction term between student gender and grade in a standard value-added specification. We find evidence of a growing education gender gap among test takers in public primary schools, with a mean decline of 0.02σ/year in math scores and 0.01σ/year in language scores for girls relative to boys. The results are also robust to including school fixed effects suggesting that these differential trends are present both across and within schools.

One caveat to the interpretation of the above numbers is that they are based on a representative sample of test-taking students in public schools. Relative to the gender gap in the universe of primary age school children, our estimates may be biased downward if higher-scoring boys are differentially more likely to leave public school to attend private schools. Conversely, they may be biased upward if lower-scoring boys are more likely to be absent on the day of testing. We see some evidence of the second concern because girls who are absent from the test have slightly higher previous test scores compared to boys absent from the test (Table A3, Panel B).^{10}

We address this concern by reweighting the estimates in Table 2, Panel B to account for the differential probability of attrition by gender at each value of lagged test scores (we assign each observation a weight that is equal to the inverse of its probability of remaining in the sample). This reweighting is analogous to simulating the missing scores of attritors using their lagged test scores, which is the procedure followed by Bharadwaj, Loken, and Neilson (2013). We present these results in Table 2, Panel C and see that the results from Table 2, Panel B are unchanged, suggesting that any bias from differential attrition by gender and lagged test scores is second order (which is not surprising given the very small magnitude of test score differences among attritors by gender seen in Table A3, Panel B). Of course, even the reweighting only provides us with estimates of gender gaps for the population of students who enter public primary schools (and cannot account for the population of students in private schools), and thus our estimates should be interpreted as relevant for the population of students in public primary schools.^{11}
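A minimal sketch of inverse-probability-of-retention weighting is given below. It is not the procedure used for Table 2, Panel C: the retention probability is approximated here by the empirical retention rate within cells defined by gender and a discretized lagged score, and the bin width, function name, and record keys are all assumptions made for illustration.

```python
from collections import defaultdict


def retention_weights(students, bin_width=0.5):
    """Inverse-probability-of-retention weights by gender and lagged-score bin.

    `students` is a list of dicts with hypothetical keys:
    'girl' (bool), 'lagged_score' (float), 'retained' (bool).
    Returns one weight per retained student, in input order.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_total, n_retained]

    def cell(s):
        # Discretize lagged scores; the bin width is an illustrative choice.
        return (s["girl"], int(s["lagged_score"] // bin_width))

    for s in students:
        counts[cell(s)][0] += 1
        counts[cell(s)][1] += int(s["retained"])

    weights = []
    for s in students:
        if s["retained"]:
            n_total, n_retained = counts[cell(s)]
            weights.append(n_total / n_retained)  # 1 / P(retained | cell)
    return weights


# Toy data: three of four girls and one of two boys remain in the sample.
students = [
    {"girl": True, "lagged_score": 0.1, "retained": True},
    {"girl": True, "lagged_score": 0.2, "retained": True},
    {"girl": True, "lagged_score": 0.3, "retained": True},
    {"girl": True, "lagged_score": 0.4, "retained": False},
    {"girl": False, "lagged_score": 0.1, "retained": True},
    {"girl": False, "lagged_score": 0.2, "retained": False},
]
weights = retention_weights(students)
```

In the toy data each retained boy stands in for two boys and each retained girl for four-thirds of a girl, so the weighted sample recovers the original gender composition within the score bin.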

## III. Estimation and Identification

Our main estimating equation takes the form:

$$
E_{itjk} = \beta_0 + \gamma E_{i,t-1} + \beta_1 \left(F_{itjk} \times g_i\right) + \beta_2\, g_i + \beta_3\, F_{itjk} + \mu_{itjk} \tag{1}
$$

where *E*_{itjk} are student educational outcomes (test scores and attendance) for student *i*, in year *t*, grade *j*, and school *k*, respectively. *E*_{i,t–1} is the student’s lagged test score, *F*_{itjk} is an indicator for whether the student’s current teacher is female, *g*_{i} is an indicator for whether the student is a girl, *F*_{itjk} × *g*_{i} is an indicator for whether a girl student shares her teacher’s gender in the current year, and μ_{itjk} is a stochastic error term. The inclusion of the lagged test score on the right-hand side of Equation 1 allows us to estimate the impact of contemporaneous inputs in a standard value-added framework. Because all test scores are normalized by grade and subject, the estimated coefficients can be directly interpreted as the correlation between the covariate and annual gains in normalized test scores.^{12} When studying attendance, we do not include the lagged attendance of the previous year. We later augment Equation 1 with **T**_{itjk}, a vector of additional teacher characteristics, to estimate the robustness of our effects to holding other teacher characteristics constant.

The above estimating equation allows us to calculate the marginal impact of changing each component of the feasible student-teacher gender combinations relative to boys taught by male teachers (the omitted category).

The first coefficient of interest in this paper is β_{1}, which indicates the extent to which teachers are relatively more effective at teaching students of their own gender compared to teachers of the opposite gender. Since the indicator variable is based on the interaction of teacher and student gender, the coefficient is a “difference-in-difference” estimate of the impact of female teachers when teaching girls rather than boys *relative* to their male counterparts teaching girls rather than boys. The coefficient on the interaction term therefore reflects the sum of the relative advantage of female teachers when teaching girls (rather than boys) and the relative disadvantage of male teachers when teaching girls (rather than boys); specifically, β_{1} = (*female teachers teaching girls* – *female teachers teaching boys*) – (*male teachers teaching girls* – *male teachers teaching boys*).

A more intuitive interpretation is to note that β_{1} represents the relative effectiveness of female teachers (compared to male teachers) in reducing the test score gap between girls and boys. By construction, this is symmetric and equivalent to the relative effectiveness of male teachers teaching boys compared to girls *relative* to female teachers teaching boys compared to girls. It is important to highlight that a positive β_{1} does not necessarily imply that both boys and girls have better outcomes when sharing their teacher’s gender. For example, a positive β_{1} could coexist with a situation where all students are better off with female (or male) teachers because the general effectiveness of female (or male) teachers is considerably higher (even for students of the opposite gender).

β_{2} is the difference in test score gains of girls taught by male teachers relative to boys taught by male teachers; specifically, β_{2} = (*male teachers teaching girls* – *male teachers teaching boys*). β_{3} is the difference in test score gains of boys taught by female teachers relative to when taught by male teachers; specifically, β_{3} = (*female teachers teaching boys* – *male teachers teaching boys*). Thus, β_{3} estimates the extent to which boys perform differently when they are taught by a female teacher relative to a male teacher.

Starting with the omitted category (male teachers teaching boys), adding combinations of β_{1}, β_{2}, and β_{3} allows us to measure other marginal effects of interest. Analogous to β_{3} for boys, testing whether β_{1} + β_{3} > 0 provides a formal test of whether girls gain by being paired with female teachers relative to male teachers. The derivation is below:

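$$
\begin{aligned}
&(\text{female teachers teaching girls} - \text{male teachers teaching girls}) \\
&\quad = \big[(\text{female teachers teaching girls} - \text{female teachers teaching boys}) - (\text{male teachers teaching girls} - \text{male teachers teaching boys})\big] \\
&\qquad + (\text{female teachers teaching boys} - \text{male teachers teaching boys}) \\
&\quad = \beta_1 + \beta_3
\end{aligned} \tag{2}
$$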

As highlighted earlier, it is possible that female teachers are relatively more effective at teaching girls than boys compared to male teachers (a positive β_{1}), but that female teachers are overall less effective (a negative β_{3}), resulting in girls being better off with male teachers despite the loss in gains from not sharing their teacher’s gender (β_{1} + β_{3} < 0).
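The case described above can be checked numerically. The sketch below assumes the linear structure stated in the text (β_{1} on the female-teacher-by-girl interaction, β_{2} on the girl indicator, β_{3} on the female-teacher indicator) and expresses each pairing relative to the omitted category. The function names and the coefficient values are hypothetical and are not estimates from this paper.

```python
def cell_effects(beta1, beta2, beta3):
    """Value-added of each teacher-student pairing relative to the omitted
    category (boys taught by male teachers)."""
    return {
        ("male_teacher", "boy"): 0.0,
        ("male_teacher", "girl"): beta2,
        ("female_teacher", "boy"): beta3,
        ("female_teacher", "girl"): beta1 + beta2 + beta3,
    }


def gender_match_did(cells):
    """Difference-in-difference: (girl - boy gap under female teachers)
    minus (girl - boy gap under male teachers). Recovers beta1."""
    female_gap = cells[("female_teacher", "girl")] - cells[("female_teacher", "boy")]
    male_gap = cells[("male_teacher", "girl")] - cells[("male_teacher", "boy")]
    return female_gap - male_gap


def girls_gain_from_female_teacher(cells):
    """Girls' gain from a female rather than a male teacher. Equals beta1 + beta3."""
    return cells[("female_teacher", "girl")] - cells[("male_teacher", "girl")]


# Hypothetical values: positive gender-match effect, but female teachers
# are less effective overall (beta3 < 0 and larger in magnitude than beta1).
cells = cell_effects(beta1=0.05, beta2=-0.02, beta3=-0.08)
did = gender_match_did(cells)
girls_gain = girls_gain_from_female_teacher(cells)
```

With these illustrative values the difference-in-difference is positive while the girls' gain from a female teacher is negative, which is why β_{1} alone cannot settle whether girls benefit from female teachers.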

Additionally, if we value both boys’ and girls’ educational achievement equally, then we would be interested in knowing whether the positive gain for girls taught by female teachers outweighs any adverse effects from boys being taught by female teachers (*potential gain to girls* + *potential loss to boys*). The formal test for this is *λ*_{g}β_{1} + β_{3} > 0, where *λ*_{g} is the proportion of girls in schools. The derivation is below:

$$
\lambda_g\,(\beta_1 + \beta_3) + (1 - \lambda_g)\,\beta_3 = \lambda_g\,\beta_1 + \beta_3 \tag{3}
$$

Thus, if the effect of female teachers on boys is negative, but their effect on girls is positive, we would find that β_{3} < 0 and β_{1} + β_{3} > 0. The test outlined in Equation 3 can also be interpreted as the overall effectiveness of female teachers relative to male teachers. Intuitively, the impact of replacing a male teacher in a classroom with a female teacher is equal to the sum of the impact of the female teacher on all students (β_{3}), and the additional gains to female students from matching with a female teacher (β_{1}), weighted by the fraction of female students in the classroom (*λ _{g}*).
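The weighting logic can be illustrated with a small calculation. The function name and all numeric values below are hypothetical, chosen only to show a case in which boys lose slightly from a female teacher but the classroom as a whole gains.

```python
def net_effect_of_female_teacher(beta1, beta3, share_girls):
    """Average change in annual value-added when a male teacher is replaced
    by a female teacher, weighting girls and boys by their classroom shares."""
    gain_girls = beta1 + beta3  # girls: general effect plus gender-match effect
    gain_boys = beta3           # boys: general effect only
    return share_girls * gain_girls + (1 - share_girls) * gain_boys


# Hypothetical values: boys lose 0.01, girls gain 0.03, classroom is half girls.
net = net_effect_of_female_teacher(beta1=0.04, beta3=-0.01, share_girls=0.5)

# The same quantity written as lambda_g * beta1 + beta3.
net_closed_form = 0.5 * 0.04 + (-0.01)
```

The two expressions agree, and the net effect stays positive here only because the girls' share times β_{1} exceeds the loss to all students from β_{3}.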

### A. Threats to Identification

The main identification challenge in interpreting these coefficients causally is that teachers are not randomly assigned to schools, and it is possible that schools with more female teachers have greater female enrollment and are in areas that value education more and thus have steeper learning trajectories. Thus, it is possible that girls would perform well in these schools regardless of their teacher’s gender. In such a case, the estimate of β_{1} could be confounded by omitted variables correlated with both the probability of having a female teacher and steeper learning trajectories for girls. We address this concern by augmenting Equation 1 with school fixed effects and thereby estimating the impact of a gender match on value-added *relative* to the schools’ average effectiveness at improving value-added.

A further concern could be that teachers are not assigned randomly to grades within schools, and a similar omitted variable concern would apply if female teachers are differentially assigned to grades within schools in which students have higher learning trajectories and there is higher enrollment of girls. To address this concern, we include school-grade fixed effects, which control for the average performance in a given *grade* in the school (instead of the overall performance of the school).
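One way to see what a single set of group fixed effects does is the “within” transformation: subtracting each group's mean from its members, so that only deviations from the school-grade average remain. The sketch below illustrates this for school-grade groups only. The function name and data are hypothetical, and it does not reproduce the full specification, which combines more than one set of fixed effects and cannot be handled by a single pass of demeaning.

```python
from collections import defaultdict


def demean_within_groups(values, groups):
    """Subtract each group's mean from its members (the 'within' transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


# Toy data: annual test score gains for four students in two school-grade cells.
gains = [0.30, 0.10, 0.50, 0.70]
school_grade = [("A", 3), ("A", 3), ("B", 3), ("B", 3)]
demeaned = demean_within_groups(gains, school_grade)
```

After demeaning, the higher average gains in school B no longer contribute to the comparison, so any remaining variation comes from differences among students within the same school and grade.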

It could also be the case that female teachers are generally assigned to grades where girls have steeper learning trajectories relative to boys. To account for such differential trajectories of learning in different grades *by student gender*, we also include student gender by grade fixed effects to estimate the parameters of interest by comparing test score gains relative to girls’ and boys’ *average learning trajectories in each grade*. Our preferred specification therefore includes both school-grade fixed effects and student gender by grade fixed effects to address these concerns.^{13}

A final concern is that if grades in a school have multiple sections, then the assignment of teachers to sections within grades could be based on omitted variables such as a greater probability of assigning female teachers to sections with girls who have a steeper learning trajectory. However, this is not an important factor in our setting because schools typically have fewer teachers than grades and the typical teaching arrangement is one of multigrade teaching (where the same teacher simultaneously teaches multiple grades). As a result, there are only a few cases where there are multiple sections per grade with different teachers assigned to different sections. We drop all such cases (6 percent of observations) where there are multiple teachers per grade. Since students are taught math and language by the same teacher in a given year for all our observations, we are unable to use student fixed effects to identify the impact of a “gender match” using variation in the gender of teachers across different subjects.^{14} Note that our identification strategy does not require teacher gender to switch in a given school grade over time, nor does it require teacher gender to switch within a cohort over time (across different grades).^{15} Rather, the inclusion of school grade and student gender by grade fixed effects implies that the identifying variation is coming from the differential effectiveness of teachers (by gender) at teaching girls versus boys relative to (a) the mean value-added experienced by students in that school and grade over the five years of data, and (b) the mean value-added for girls relative to boys in that grade across all schools in the sample.

### B. Testing the Identifying Assumptions

Table 3 shows the correlation between various classroom characteristics and the probability of the classroom having a female teacher. We see that there is no significant correlation between having a female teacher and the fraction of girls in the classroom or with the average test scores of incoming cohorts for either gender. However, female teachers *are* more likely to be assigned to younger grades. This is why our preferred specifications include school-grade fixed effects. Upon the inclusion of school-grade fixed effects, it continues to be the case that there is no significant correlation between having a female teacher in the class and either the fraction of female students or the test scores of the incoming cohort (Columns 5 and 6).

However, we see in Table 4 that girls do have a slightly more concave learning trajectory than boys. We estimate a standard value-added model that controls for lagged test scores (as in Equation 1) but allows for an interaction between student gender and grade and find that female students have lower value-added in higher grades. Because female teachers are more likely to be assigned to lower grades, the inclusion of school-grade fixed effects alone (the average test score gain in a grade within a school over the five years across both student genders) does not address the possible spurious correlation from female teachers being more likely to be assigned to grades where female students fall behind boys at a lower rate. Therefore, the inclusion of grade fixed effects by student gender in our main specifications is necessary to control for average value-added test scores in each grade *by student gender*. Thus, the parameters of interest in Equation 1 are identified relative to the average learning trajectory for girls in the same grade (student gender by grade fixed effects) and relative to the average learning trajectory in the same school for that grade (school-grade fixed effects).

## IV. Results

### A. Test Score Impacts of Matching Teacher and Student Gender

The main results of the paper (Equation 1) are presented in Table 5, which pools student test scores across subjects (results separated by subject are in Table 7). The columns show increasingly restrictive identification assumptions with school fixed effects (Column 2), school-grade fixed effects (Column 3), and both of these with student gender by grade fixed effects (Columns 4 and 5). Column 6 expands the preferred specification in Column 5 to include teacher covariates to test the extent to which *average* female teacher effects can be explained by other observable characteristics that can be used in teacher selection. Thus, the estimates in Column 5 are relevant to the policy question: “What will happen if we replace a male teacher with a female teacher whose characteristics are the same as those of the average female teacher?” On the other hand, the estimates in Column 6 answer the question: “What will happen if we just switch a teacher’s gender from male to female holding other commonly observed teacher recruitment characteristics constant?”

Of course, switching the gender in this latter case does still include all unobservable characteristics correlated with being a male versus female teacher. Though we cannot separately identify which of these unobservable characteristics are driving differences in average teacher effectiveness, the differences between Columns 5 and 6 help to clarify whether the female teacher effect in Column 5 can be explained by observable differences across male and female teachers, or whether it represents unobservable characteristics of teacher effectiveness that current recruitment practices do not use. While our main results are remarkably stable and robust under the various specifications, our discussion below will use the estimates in Columns 5 and 6, unless mentioned otherwise.^{16}
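To make the interpretation of the coefficients concrete, the following minimal sketch maps four cell means onto β_{1}, β_{2}, and β_{3}. It assumes Equation 1 has a saturated form in the student-female and teacher-female indicators, and it ignores lagged scores, fixed effects, and teacher covariates. The cell means and the function name `decompose` are hypothetical. The levels (and hence β_{2}) are invented, and only the differences are chosen to echo the rounded magnitudes reported in the results that follow.

```python
# Hypothetical mean annual gains (in σ) by (student gender, teacher gender).
# The levels are invented for illustration and are not taken from Table 5.
mean_gain = {
    ("girl", "female"): 0.536,
    ("girl", "male"): 0.500,
    ("boy", "female"): 0.522,
    ("boy", "male"): 0.520,
}

def decompose(gain):
    """Map four cell means to the coefficients of an assumed saturated model:
    gain = b0 + b1*(girl x female teacher) + b2*girl + b3*female teacher."""
    # Female-teacher effect on boys.
    b3 = gain[("boy", "female")] - gain[("boy", "male")]
    # Girl-boy difference when both are taught by male teachers.
    b2 = gain[("girl", "male")] - gain[("boy", "male")]
    # Gain to girls from switching to a female teacher, equal to b1 + b3.
    girls_gain = gain[("girl", "female")] - gain[("girl", "male")]
    # Difference-in-differences: the gender-match (interaction) effect.
    b1 = girls_gain - b3
    return {"beta1": b1, "beta2": b2, "beta3": b3, "girls_gain": girls_gain}

coef = decompose(mean_gain)
for name, value in coef.items():
    print(f"{name}: {value:+.3f}")
```

Under these assumptions, β_{1} is a difference-in-differences across the four cells, which is why a positive β_{1} alone does not reveal whether girls gain or boys lose.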

Averaged across subjects, we see that teachers are 0.034σ/year more effective in teaching students of their own gender than students of the opposite gender, relative to teachers of the other gender. In other words, female teachers are 0.034σ/year more effective in reducing the gender gap between girls and boys relative to male teachers. We find no negative effect on boys from being taught by female teachers relative to male teachers (β_{3} is close to zero). We estimate that girls gain an extra 0.036σ/year when taught by female teachers instead of male teachers (β_{1} + β_{3}) and that there would be no loss to the boys in the classroom. However, the net increase in annual test score gains from replacing a male teacher with a female one (*λ*_{g} × β_{1} + β_{3}), which we estimate as 0.019σ/year, is not significant. Thus, while replacing a male teacher with a female one on the margin is likely to benefit girls with no cost to boys, the magnitude of the positive effect on girls is small enough that the overall gain in test scores is not significant.^{17}
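The net effect can be checked with a few lines of arithmetic. Girls gain β_{1} + β_{3} and boys gain β_{3}, so a classroom with a share λ_{g} of girls gains λ_{g}(β_{1} + β_{3}) + (1 − λ_{g})β_{3} = λ_{g}β_{1} + β_{3} on average. The sketch below uses only the rounded figures quoted in the text and in footnote 17. The value of β_{3} is implied by those rounded figures rather than reported directly, and the function name `net_effect` is hypothetical.

```python
def net_effect(share_girls, beta1, beta3):
    """Average classroom gain from replacing a male teacher with a female one:
    share_girls * (beta1 + beta3) + (1 - share_girls) * beta3."""
    return share_girls * beta1 + beta3

beta1 = 0.034               # gender-match effect, σ/year (rounded, from the text)
girls_gain = 0.036          # beta1 + beta3, σ/year (rounded, from the text)
beta3 = girls_gain - beta1  # implied by the rounded figures, not reported directly
share_girls = 0.51          # fraction of girls in the sample (footnote 17)

net = net_effect(share_girls, beta1, beta3)
print(f"implied beta3: {beta3:.3f} σ/year")
print(f"net effect:    {net:.3f} σ/year")
```

With these inputs the result rounds to the 0.019σ/year stated above. Roughly half of an already small effect on girls remains, which is why the net gain is not statistically distinguishable from zero.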

Because female teachers systematically differ from their male counterparts in commonly observed characteristics used in teacher selection (Table 1, Panel B), we examine the extent to which the shared gender effects estimated in Equation 1 can be attributed to female students being differentially affected by characteristics that are more commonly found in female teachers. Table 6 shows a series of regressions where we follow the specification in Equation 1 but include teacher characteristics and the interaction of this characteristic with student gender. These include teacher demographic characteristics that may be correlated with teaching effectiveness (such as education, training, contractual status, seniority, and salary), as well as teaching conditions (multigrade teaching) and measures of teacher effort (absence). Doing so allows us to examine the extent to which we can attribute the mechanism for the positive β_{1} found in Table 5 to observable teacher characteristics (that differ across male and female teachers) differentially affecting female students.

Panel A of Table 6 reports the key results with only the specified teacher characteristic and does not control for other teacher characteristics, while Panel B includes all the other teacher characteristics as controls. The estimates of β_{1} are remarkably robust to including the student interactions with teacher characteristics that vary by teacher gender.^{18} In all cases, the estimate of the gain to a female student from switching to a female teacher (β_{1} + β_{3}) is positive and significant (ranging from 0.027 to 0.04 σ/year), and we continue to find no negative effects on boys (we never reject β_{3} = 0). When we include all teacher characteristics *and each of their interactions with student gender* (Panel B, Column 11), our estimate of β_{1} falls slightly but is not statistically different from previous estimates and continues to be statistically significant. This suggests that the effects we find for female teachers closing the gender gap, though small, cannot be readily explained by other characteristics that could be used in teacher selection.

Thus, while we are unable to identify the specific mechanism behind the positive effect of female teachers on girls’ learning gains, our results suggest that other characteristics used in teacher recruitment (and that systematically differ between male and female teachers) are not able to account for this effect. So if the goal of a policymaker is to reduce gender gaps in learning outcomes, our results suggest that teacher gender, even after holding observable characteristics constant, may be a useful summary statistic for other unobserved factors not used for recruiting teachers (such as empathy and role model effects) that contribute to better learning for girls.

### B. Results by Subject

Table 7 breaks down the results by subject (Panels A and B) and also conducts formal tests of equality across subjects for the key parameters of interest (Panel C). Overall, the patterns observed in Table 5 appear consistently in both subjects. There is a positive gender-match effect in both math and language (β_{1} > 0), and the difference between the two subjects is not significant. Similarly, boys do no worse with female teachers, and we cannot reject β_{3} = 0 for either subject.

### C. Robustness to an Alternative Specification

An alternative approach to our preferred specification is to identify the shared gender effect relative to each *teacher’s* mean performance, which can be done by augmenting Equation 1 to include teacher fixed effects. This specification can be considered as an extension of Table 5, Column 6, in that it controls for *all* time-invariant teacher characteristics, including those that we cannot measure and include as controls (as we do in Table 5, Column 6). We see in Table 8 that the estimates of β_{1} from this specification are unchanged relative to those in Tables 5–7.

The unchanged estimates of β_{1} when we include teacher fixed effects provide further confirmation of the stability of our core result regarding the significant role of female teachers in bridging test score gender gaps in primary school. This specification confirms that estimates of β_{1} are not driven by unobserved characteristics correlated with female teachers that increase *overall* student achievement. However, we cannot estimate β_{3} in this specification, which is essential to be able to estimate the policy impact of hiring more female teachers.^{19} Because a social planner would care about both increasing overall test scores as well as reducing the gender gap, we focus our discussions on our default specifications shown in Table 5 and present the results in Table 8 as a further robustness check on the matching result.^{20}
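The reason β_{3} drops out can be seen in a small sketch. Teacher fixed effects are equivalent to subtracting each teacher's mean from every variable, and because a teacher's gender does not vary within teacher, the demeaned female-teacher indicator is identically zero, whereas its interaction with student gender still varies. The rows and the helper `demean_within_teacher` below are hypothetical, and the sketch assumes teacher fixed effects are the only fixed effects in the model.

```python
from collections import defaultdict

# Hypothetical student-year rows: (teacher_id, teacher_is_female, student_is_female).
rows = [
    ("t1", 1, 1), ("t1", 1, 0), ("t1", 1, 1),
    ("t2", 0, 1), ("t2", 0, 0), ("t2", 0, 0),
]

def demean_within_teacher(rows, value):
    """Subtract each teacher's mean of value(row), i.e. the within
    transformation implied by teacher fixed effects."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(value(row))
    means = {teacher: sum(vals) / len(vals) for teacher, vals in groups.items()}
    return [value(row) - means[row[0]] for row in rows]

# Regressor attached to beta3: constant within teacher, so nothing is left.
female_teacher = demean_within_teacher(rows, lambda r: r[1])
# Regressor attached to beta1: varies across students of the same teacher.
interaction = demean_within_teacher(rows, lambda r: r[1] * r[2])

print("female teacher, demeaned:", female_teacher)
print("interaction, demeaned:   ", [round(x, 3) for x in interaction])
```

The first list contains only zeros, so no coefficient on teacher gender can be estimated, while the second retains within-teacher variation that identifies β_{1}.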

### D. Attendance

We also study the impact of a teacher-student gender match on student attendance, using high-quality attendance data measured through unannounced visits to schools (as opposed to administrative data, which are less reliable). We find no significant effect of a gender match on student attendance (Table 9). We do find that female teachers are slightly more effective at increasing attendance overall (by approximately 0.6 percent), but there is no differential impact by student gender. This result is interesting because the rhetoric of hiring female teachers is often based on the belief that having female teachers increases the safety and comfort of girls in school and that their presence therefore encourages girls to attend school. Our results suggest, however, that the mechanism for the positive impact of a gender match on test scores is less likely to be due to effects on the extensive margin of increased school participation but more due to intensive margin increases in the effectiveness of classroom interactions between teachers and students.

This result could reflect a setting in which total primary school enrollment for both boys and girls is over 98 percent (Pratham 2012), so that the role of female teachers in increasing the attendance of female students may be more limited. Additionally, we observe attendance conditional upon enrollment rather than effects on enrollment into the school itself. Nevertheless, our results suggest that even after achieving gender parity in school enrollment there may be continued benefits to a policy of preferred hiring of female teachers due to their effectiveness in reducing gender gaps in test scores.

### E. Contribution of Fewer Female Teachers in Higher Grades to Growth in the Gender Gap

Finally, we calculate what proportion of the growing gender gap calculated in Table 2 can be attributed to girls being less likely to have a female teacher as they advance through primary school. Regressing the probability of a female teacher on the grade taught (with school fixed effects), we find that there is a four percentage point reduction in the probability of a student having a female teacher at each higher grade. Multiplying the reduced probability of a female teacher by the cost to girls of not having a female teacher in a given year (β_{1} + β_{3}), and dividing this by the total annual increase in the test score gender gap (estimated in Table 2), we estimate that the reduced likelihood of female teachers in higher grades accounts for 9 percent of the annual growth in the gender gap in math and 21 percent in language. The fraction of the growing gender gap in language that is accounted for by this channel is higher than in math because the absolute magnitude of the annual growth in the gender gap is lower in language. Using estimates without school fixed effects, these figures would be 8 percent and 15 percent, respectively (because the overall trend in the gender gap is slightly larger without school fixed effects—see Table 2).
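The structure of this calculation can be sketched as follows. The inputs are the rounded, pooled figures quoted elsewhere in the text (a four percentage point drop per grade, a 0.036σ/year cost to girls, and gap growth of roughly 0.02σ/year in math and 0.01σ/year in language). They are an assumption made for illustration, so the outputs only approximate the reported 9 and 21 percent, which presumably rest on unrounded, subject-specific estimates. The function name `share_of_gap_growth` is hypothetical.

```python
def share_of_gap_growth(drop_in_prob_per_grade, cost_to_girls, annual_gap_growth):
    """Fraction of the annual growth in the gender gap attributable to the
    falling probability of having a female teacher in higher grades."""
    return drop_in_prob_per_grade * cost_to_girls / annual_gap_growth

drop_in_prob = 0.04    # four percentage points per grade (from the text)
cost_to_girls = 0.036  # beta1 + beta3, pooled across subjects (rounded)

# Rounded annual growth in the gender gap by subject (σ/year), an assumption
# taken from the summary figures in the text rather than from Table 2 directly.
gap_growth = {"math": 0.02, "language": 0.01}

shares = {
    subject: share_of_gap_growth(drop_in_prob, cost_to_girls, growth)
    for subject, growth in gap_growth.items()
}
for subject, share in shares.items():
    print(f"{subject}: {share:.1%} of annual gap growth")
```

Because the numerator is the same for both subjects in this sketch, the larger share in language follows entirely from the smaller denominator, which is the point made in the text.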

## V. Conclusion

We study gender gaps in primary school learning outcomes in a low-income setting using one of the richest data sets on primary education in a developing country. We find that at the start of primary school, girls in rural public schools have a slight advantage in the local language (approximately 0.05σ) and are at par in math with the boys in the same schools. However, girls lose this advantage in both language (by 0.01σ/year) and in math (by 0.02σ/year) as they progress through the schooling system.

While these trends likely reflect a broad set of household, school, and social factors, one specific school-level policy that has been posited as a promising channel for mitigating gender gaps is the greater use of female teachers in low-income settings. Though this policy has been widely recommended and adopted, there has been very little well-identified evidence to support the claim. In this paper, we present some of the first well-identified empirical tests of this hypothesis in a developing country setting using an extremely rich data set that is representative of the rural public primary school system in the Indian state of Andhra Pradesh.

Our results suggest that female (and male) teachers are relatively more effective when teaching to their own gender, that learning for girls increases when they are taught by female teachers relative to male teachers, and that boys do not suffer adverse effects when taught by female teachers relative to male teachers even when controlling for additional observable teacher characteristics.

Our results are similar to those of other studies that find positive effects on achievement, as measured by test scores, in both high- and low-income countries. Our pattern on gender matching and the magnitudes of our effects are very similar to those found by Dee (2007) in the United States and by Rawal and Kingdon (2010) in India. Both these studies find positive effects for girls and no adverse effects for boys, with the shared female effect ranging from 0.03 to 0.06 standard deviations.

While we find suggestive evidence that the mechanism of impact is through more effective classroom interactions (as opposed to increased teacher-student contact time), our data do not allow us to explore the further granularity of the specific mechanisms through which shared gender may influence learning (such as role model effects, greater empathy, and closer identification between teachers and students of the same gender). We find no evidence that characteristics thought to be correlated with teaching effectiveness (education or training), service conditions (salary, multigrade teaching), or teacher effort (absence) differentially affect girls and thus help explain our gender matching effects. Thus, even though we are unable to identify the mechanism for *why* female teachers differentially affect girls, we are at least able to show that the effect is not explained by differences (by teacher gender) in other common teacher recruitment characteristics. In other words, being female appears to be correlated with other unobservable characteristics that are correlated with classroom effectiveness but that are not currently the basis for teacher recruitment.

From a policy perspective, our estimates suggest that expanding the hiring of female teachers—both at the margin of the current patterns of hiring (assuming that the marginal female teacher hired has the same characteristics as the average female teacher) and also when holding other typical recruitment characteristics constant—may be a useful tool for bridging gender gaps in learning levels and trajectories in primary schools, at no cost to boys. However, this result may not hold beyond primary school because the unobservable characteristics that are correlated with being female and teaching effectiveness may not be equally salient in higher grades.^{21} Further decomposing the reduced form effects of gender matching and having a female teacher could help in crafting more nuanced policies to more efficiently bridge gender gaps in schooling in developing countries.

## Footnotes

Karthik Muralidharan is an associate professor in the department of economics at the University of California, San Diego.

Ketki Sheth is an assistant professor in economics at the University of California, Merced. The authors thank Prashant Bharadwaj, Julie Cullen, Gordon Dahl, Craig McIntosh, two anonymous referees, and several seminar participants for comments. The authors also thank the AP RESt team and the Azim Premji Foundation for collecting the data used in this paper and thank Venkatesh Sundararaman for the overall support provided to the AP RESt project. Financial assistance for the data collection was provided by the Government of Andhra Pradesh, the U.K. Department for International Development (DFID), the Azim Premji Foundation, and The World Bank. The findings, interpretations, and conclusions expressed in this paper are those of the authors and do not necessarily represent the views of any of the organizations that supported the data collection. The data used in this article can be obtained beginning July 2016 through June 2019 from Karthik Muralidharan, 9500 Gilman Drive, San Diego CA, kamurali{at}ucsd.edu.

Supplementary materials are freely available online at: http://uwpress.wisc.edu/journals/journals/jhr-supplementary.html

↵1. Note that these estimates may be biased due to differential migration to private schools and differential absence on the day of the test by student gender. However, as we discuss later, these estimates are robust to bounding by reweighting to account for this concern.

↵2. Analogous to gender, studies in the United States have also looked at the effect of sharing the ethnicity of a teacher and have generally found positive effects on such educational outcomes as dropouts, pass rates, and grades at the community college level, and teacher perceptions and student achievement in school-going children (Dee 2004, 2005; Fairlie, Hoffman, and Oreopoulos 2011). We do not focus on caste and religion because the fractions of teachers and students in the relevant categories are small (typically less than 20 percent) and as a result the fraction of “matches” is usually less than 5 percent (and often much smaller), which makes the estimates less stable to the series of robustness checks that we use in this paper to ensure that the estimates of the “match” are well-identified.

↵3. Thus, if this approach finds that a girl in eighth grade who has a female language teacher and a male math teacher does better in language, the interpretation of the point estimate is confounded by the possibility that the girl is also more likely to have had female language teachers in earlier grades (especially if teacher gender is correlated with subjects taught across grades, which is likely to be true).

↵4. The original state of AP was divided into two states on June 2, 2014. Because this division took place after our study, we use the term AP to refer to the original undivided state.

↵5. These interventions are described in Muralidharan and Sundararaman (2011).

↵6. The appendix can be found online at http://jhr.uwpress.org/.

↵7. This is exactly analogous to thinking about the impact of attrition in randomized experiments, where treatment effects are typically not biased by attrition, as long as there is no differential attrition by treatment status (which in our case is the matching indicator).

↵8. Fewer than 3 percent of students with test scores have no recorded gender.

↵9. While there are a few observable differences between the boys and girls in the sample, including these in the estimation will only matter if there are differential interactions between these household characteristics and teacher gender across boys and girls. We verify that our results are robust to the inclusion of household characteristics (see Table A4) but prefer to not include household characteristics in our main estimating equations because doing so reduces the sample size by 30 percent and it is possible that the remaining sample may have some nonrandom attrition.

↵10. Note that this could also mean that girls with lower test scores are slightly more likely to stay in the sample, whereas their male counterparts are more likely to be absent from testing.

↵11. It is worth noting that the entire literature on gender gaps in test scores is based on samples of students who are tested in schools, and we know of no annual student-level panel data set on test scores (in any country) that can account for differential sorting into private schools and differential rates of dropping out or attendance. The only way to do this would be to have a household panel on test scores using a representative sample of households, and no such data exist to the best of our knowledge. Thus, our reweighted estimates (that account for differential attrition probability in our panel data) are likely to be the most reliable estimates of the evolution of gender gaps in test scores over grades in developing countries.

↵12. In the case of first grade where there is no lagged score (because there was no testing prior to enrolling in school), we set the normalized lagged score to zero. Our results on the impact of “gender matching” on test score gains are unchanged if we drop first grade from the analysis.

↵13. Because the data are drawn from schools that were exposed to various experimentally assigned programs, all estimates include dummy variables indicating the treatments assigned to the school. This turns out to not matter in practice because our main specifications of interest use school fixed effects, which makes the treatment status of the school irrelevant for identification purposes.

↵14. As we note earlier, this is the approach used in most of the existing studies in this literature. However, an important weakness of this approach is that it is based on levels and not value-added, which makes the estimates difficult to interpret without knowing the gender of teachers for each subject in earlier grades.

↵15. We avoid using a student fixed effects estimate because the identifying variation in a specification with student fixed effects would come from changes in teacher gender in different grades. However, girls having higher value-added in lower grades and the probability of female teachers being assigned to lower grades would create an upward bias in the “matching estimate.”

↵16. All coefficients and tests continue to be of similar statistical significance under specifications with standard errors clustered at the school or teacher level.

↵17. In other words, since the fraction of girls in the sample is roughly half (*λ*_{g} = 0.51), the positive effect of female teachers on girls is not large enough for half of this effect to be significant.

↵18. In the interest of space, we only show these results for the subset of characteristics that are significantly different across teacher gender (see Table 1, Panel B). The estimate of β_{1} is unchanged and significant for interactions with all other teacher characteristics in Table 1, Panel B (such as religion and caste) as well.

↵19. A positive and significant β_{1} is possible even if female teachers reduced gender gaps by being less effective at teaching boys while being no more effective at teaching girls than male teachers (or even less so). Thus, it is essential to use β_{3} in conjunction with β_{1} to estimate the overall impact of hiring more female teachers.

↵20. Note that identification concerns regarding β_{3} are also addressed by the results in Table 3, where we see no significant difference in the initial test scores of students assigned to a female teacher in any of the six columns. Also, our default specification uses school-grade fixed effects, mitigating concerns of omitted variables correlated with teacher gender and student test score gains both across and within schools across grades. Table 8 provides further suggestive evidence that the estimates of β_{1}, β_{2}, and β_{3} in Tables 5 and 6 are unbiased, because β_{1} and β_{2} are unchanged when we include teacher fixed effects.

↵21. We do not find higher effects of female teachers on female students in earlier grades, suggesting that at least within primary school there is no evidence of a declining “gender match” or “female teacher” effect from the first to the fifth grade. Nevertheless, we cannot extrapolate these results beyond primary school.

- Received August 2013.
- Accepted December 2014.