Abstract
This study examines gender peer effects on students’ academic and noncognitive outcomes. We use a nationally representative survey of middle school students in China and focus on schools that randomly assign students to classrooms. Our findings show that having a higher proportion of female peers in class improves students’ test scores and noncognitive outcomes, which include their social acclimation and general satisfaction in school. A further decomposition of channels suggests that teacher behavior, greater student effort, and the improved classroom environment are the primary channels through which peers’ gender influences student outcomes.
I. Introduction
We investigate how classroom gender composition affects students’ academic and noncognitive outcomes. Researchers and policymakers have long believed that peer effects (for example, along the dimensions of gender, race, ability, and social background) play important roles in determining student outcomes (for example, Sacerdote 2001; Zimmerman 2003; Angrist and Lang 2004; Arcidiacono and Nicholson 2005; Ammermueller and Pischke 2009; Carrell, Fullerton, and West 2009; Gould, Lavy, and Paserman 2009).1 Understanding the interaction of gender in the educational production function is particularly relevant for the optimal grouping of students in schools and classrooms and may shed light on the debate concerning single-sex and coeducational schools.
Along this line, previous studies have emphasized the influence of the presence of girls on peers’ academic outcomes (for example, Hoxby 2000; Whitmore 2005; Lavy and Schlosser 2011; Black, Devereux, and Salvanes 2013; Hu 2015). However, little is known about how peers influence other students’ noncognitive outcomes. We attempt to fill this gap in the literature by using unique data on individual students’ mental stress, social acclimation, and general satisfaction in school. These outcomes are valuable not only because they provide a more comprehensive view of students’ development through schooling but also because they are good predictors of students’ long-run well-being. Since Jencks et al. (1979), studies have extensively documented the importance of noncognitive skills in explaining long-term life outcomes and labor market success (Heckman and Rubinstein 2001; Heckman, Pinto, and Savelyev 2013; Bertrand and Pan 2013). We therefore aim to expand the boundaries of student outcomes and explicitly consider students’ noncognitive skills as an output of the educational production process.
Another contribution of our work to the literature lies in our decomposition of the mechanism. Lavy and Schlosser (2011) examine several channels through which peers influence student learning that go beyond the focus on the gender dimension and suggest that a further important step is to quantify the relative weight of each channel. We exploit rich questionnaires from a nationally representative survey of Chinese middle schools that includes student and teacher behaviors and classroom environments. We use a method following Heckman, Pinto, and Savelyev (2013) and Gelbach (2016) to quantify the importance of each channel.
A common challenge in uncovering peer effects in school is the nonrandom grouping of students. Students with similar backgrounds or characteristics tend to associate with one another, and peer groups tend to be self-selected.2 For our research question, if there are unobserved characteristics of students that are associated with both gender composition in the classroom and students’ outcomes, the estimates of gender peer effects would be biased. To address this identification problem, researchers often exploit cross-cohort variation (Hoxby 2000; Gould, Lavy, and Paserman 2009; Lavy and Schlosser 2011; Black, Devereux, and Salvanes 2013) or use random assignment (Sacerdote 2001; Zimmerman 2003; Carrell, Fullerton, and West 2009; Kremer, Duflo, and Dupas 2011; Chetty et al. 2011; Shue 2013; Hu 2015). Here, we rely on unique information on classroom assignments, which was obtained from the survey questionnaire, and focus on middle schools in which students are randomly assigned to classrooms.
We use the China Education Panel Survey 2014 (CEPS 2014), which is a nationally representative survey of middle school students and teachers, to estimate how peers’ gender composition, as measured by the proportion of female classmates, affects students’ academic and noncognitive outcomes. We restrict the sample to schools that randomly place students in classrooms. Students in our refined sample cannot self-select into classrooms, and those assigned to the same classroom stay together for learning and extracurricular activities throughout the three years of middle school. A balancing test and robustness checks further confirm randomized assignment. The main outcome variables include students’ test scores, obtained from school administrators, and noncognitive outcomes obtained from their survey responses regarding mental stress, social acclimation, and general satisfaction in school.
We find that having a higher proportion of female peers in the classroom positively affects students’ academic and noncognitive outcomes. Specifically, a ten percentage point increase in the proportion of female classmates raises students’ test scores by 10.2 percent of a standard deviation and improves their social acclimation and satisfaction in school by 7.7 percent of a standard deviation. These results are robust after controlling for student and teacher characteristics. We also find some heterogeneity of gender peer effects. For instance, the positive effect on test scores is stronger among male students, or when the teacher is male.
By further exploring the mechanisms behind the benefits of having female peers, we find support for four channels: a more interactive teaching style, more time allocated to teaching-related tasks, an improved classroom environment, and greater effort exerted by students in learning. In particular, when there are more female students in class, teachers behave differently: they tend to introduce more discussions with and among students, allocate more time to teaching and grading, and be more patient with and responsible for their students. Students also report that the environment is friendlier and more satisfying and that they devote more hours to homework and tutorials. These changes associated with gender composition may account for the observed benefits for students’ learning and noncognitive outcomes. We do not find strong support for ability-based spillover from female students.
In the study closest to ours, Lavy and Schlosser (2011) find positive gender peer effects on cognitive outcomes and examine students’ behavioral outcomes as mechanisms. Our study also demonstrates a significant benefit for test scores, but we treat noncognitive outcomes as an output of the educational production function and as important measures of student development. Accordingly, we include a richer set of noncognitive measures, such as mental stress and satisfaction in school, which Lavy and Schlosser do not examine. Another difference is that in examining the mechanisms, Lavy and Schlosser find that gender composition in the classroom changes students’ perceived classroom environment and interstudent and teacher–student relations, but not students’ or teachers’ own behaviors. In contrast, our analysis shows that teacher and student behavior varies with gender composition in the classroom. In addition, they are unable to identify the relative weight of each mechanism and instead emphasize the importance of further studies to “distinguish between peer effects that result from changes in individual behavior and peer effects that result from externalities on the classroom environment” (Lavy and Schlosser 2011, p. 32). By exploiting rich questions from the survey, we find evidence for both individual behaviors and the classroom environment and further decompose the weight of each channel.
More broadly, our results contribute to the understanding of peer effects in the educational production function. In this literature, the definition of “peers” varies by context, including peer cohorts within the same school (Angrist and Lang 2004; Arcidiacono and Nicholson 2005; Ammermueller and Pischke 2009; Gould, Lavy, and Paserman 2009); roommates in college dormitories (Sacerdote 2001; Zimmerman 2003); and peer groups in military academies (Carrell, Fullerton, and West 2009; Lyle 2007). Peer effects have been found in several dimensions. Along the ability dimension, several studies have found that peers’ abilities have positive effects on student achievement (see, for example, Sacerdote 2001; Zimmerman 2003; Ammermueller and Pischke 2009). Along the racial and social background dimensions, Angrist and Lang (2004) evaluate the effects of the METCO Program, which assigned minority students to schools in affluent suburbs of Boston, and find modest and short-lived peer effects. Gould, Lavy, and Paserman (2009) show that the overall presence of immigrants in a grade adversely affects students’ academic achievement.
It is important to note that peer effects could be related to context and culture. In the Chinese context, students usually spend significant amounts of time with the same set of peers, that is, their classmates. They follow the same class schedules and take all lectures together. Moreover, peers participate as a group in study sessions, extracurricular activities, and field trips. Therefore, peer effects might be particularly pronounced in Chinese schools. Using Chinese school settings, Lu and Anderson (2014) and Hu (2015) also find a positive effect of having female peers on test scores, while Ding and Lehrer (2007), Park et al. (2015), and Ma and Shi (2014) show significant benefits for academic achievement of having high-ability peers. Our study, by using representative Chinese survey data, confirms the gains in test scores and further documents new findings in the domain of noncognitive outcomes and also investigates potential channels.
In particular, our findings regarding the mechanisms—classroom environment, teacher behavior, and student effort—echo prior studies that also combine test scores and survey data to investigate how peer effects operate. For instance, Stinebrickner and Stinebrickner (2006) use administrative and survey data from Berea College and find that peers’ actions and beliefs may change a student’s effort in studying, as well as their beliefs and time use. Booij, Leuven, and Oosterbeek (2017) manipulate the composition of undergraduate tutorial groups and find improved academic achievement among low- and medium-ability students by switching from ability-mixing to tracking groups. Feld and Zölitz (2017) use randomly assigned university class sections and find that low-achieving students are harmed by high-achieving peers. Both of these studies highlight the channel of student interactions and involvement in the classroom, through which peer ability influences student outcomes.
Our findings add another piece of evidence that peer interaction and student effort are important avenues by which peer effects operate in classrooms, and we demonstrate that teachers’ behavior also changes with students’ gender composition and has great explanatory power for student outcomes. Understanding and decomposing the mechanisms can shed light on practical and affordable opportunities to improve student outcomes without actually changing the fraction of female students. For example, when there are fewer female students than desired, instructors may consider behaving more patiently and responsibly in their interactions with students. School administrators may also consider other ways to make the classroom friendlier (for example, encouraging group activities within and after classes) and boost student motivation (for example, strengthening incentives) to achieve benefits similar to those gained by having more female peers.
II. Data and Variables
Our main data source is the 2014 China Education Panel Survey (CEPS). This is a nationally representative survey that includes middle schools from 28 counties and city districts and collects rich information from students, teachers, parents, and school principals using questionnaires.3
We exploit a novel question on the survey that asks school principals and teachers how students are assigned to classes, and we restrict our estimation sample to schools that randomly assign students. The refined sample includes 8,988 students across 208 classrooms in 67 schools.4 Table 1 presents summary statistics for our main variables: students’ academic and noncognitive outcomes, their own and their peers’ gender, and basic demographics.
Table 1. Summary Statistics
We measure peers’ gender by the proportion of female peers in the same class. Typically, in middle schools in China, students are assigned to classes at the beginning of the seventh grade and take the same courses throughout their three years in middle school. Peers in the same class interact extensively for both academic and nonacademic purposes. During a regular school day, students remain in the same classroom all day, and teachers come to the classroom to deliver lectures in each subject. Students also participate in a variety of exercises and activities together, such as study sessions, sports events, and field trips. As shown in Table 1, approximately 49 percent of the students in the sample are female; not surprisingly, this is also the average proportion of female peers that a given student has. Online Appendix Figures 1A and 1B plot the original and conditional distribution of the proportion of females, respectively, and suggest sufficient variation in gender composition across classrooms.5
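As an illustration of this measure, the sketch below computes the share of female classmates for each student in one classroom, leaving the student out of the calculation as the definition in Section III specifies. The function name and the 0/1 input list are hypothetical and are not part of the CEPS data.

```python
def leave_one_out_female_share(is_female):
    """Share of females among each student's classmates, excluding the student.

    `is_female` is a hypothetical list of 0/1 flags for one classroom.
    """
    n = len(is_female)
    if n < 2:
        raise ValueError("need at least two students to define peers")
    total_female = sum(is_female)
    # Subtract the student's own flag, then divide by the number of peers.
    return [(total_female - f) / (n - 1) for f in is_female]


# Toy classroom with 2 girls and 3 boys.
print(leave_one_out_female_share([1, 1, 0, 0, 0]))  # [0.25, 0.25, 0.5, 0.5, 0.5]
```

In the toy classroom, girls see a lower female share than boys do, simply because each girl is removed from her own peer group.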
Students’ academic performance is measured by their test scores (provided by schools’ administrative offices) in three core courses: Chinese, mathematics, and English. These subjects are compulsory for all middle school students and are the main components of the high school entrance examination (zhongkao). Within a school, all teachers of a given course use a similar syllabus and give the same exams during a common testing period.6 Therefore, test scores in the core courses are consistent and reasonable measures of academic achievement for students in the same grade in the same school. In addition, we supplement the test scores with students’ self-assessed performance scores. Specifically, students are asked to report whether they have difficulties in learning each subject on a scale of 1 (a lot) to 4 (not at all). As shown in Table 1, the sample mean is 81.2 for students’ test scores and 2.47 for students’ self-assessment scores. It is worth noting that both measures have large standard deviations, which suggests wide dispersion among students. In our regression analyses, to facilitate interpretation, we normalize scores within each subject–grade–school cell to obtain a mean of zero and a standard deviation of one.
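The within-cell normalization can be sketched as follows. This is a minimal illustration, assuming that each score carries a subject–grade–school key and that the population standard deviation is used; the text does not say which variance estimator is applied, and the function and variable names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_groups(rows):
    """rows: list of (group_key, score). Returns z-scores in the input order."""
    by_group = defaultdict(list)
    for key, score in rows:
        by_group[key].append(score)
    # Mean and (population) standard deviation of every cell.
    stats = {k: (mean(v), pstdev(v)) for k, v in by_group.items()}
    out = []
    for key, score in rows:
        m, s = stats[key]
        out.append((score - m) / s if s > 0 else 0.0)
    return out


# Hypothetical scores keyed by (school, grade, subject).
rows = [
    (("school_1", 7, "chinese"), 60),
    (("school_1", 7, "chinese"), 80),
    (("school_1", 7, "math"), 70),
    (("school_1", 7, "math"), 80),
    (("school_1", 7, "math"), 90),
]
print(standardize_within_groups(rows))
```

Because each cell is centered and scaled separately, a coefficient on standardized scores reads as a fraction of the within-cell standard deviation.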
Measures of noncognitive outcomes are obtained from students’ responses to eight survey items. Four questions ask about their mental stress. Students are asked to report the frequency, during the previous seven days, of four feelings on a scale from 1 (never) to 5 (always): (i) depressed, (ii) blue, (iii) unhappy, and (iv) life is meaningless. Two questions ask about their general satisfaction in school. Students are asked to rate how much they agree with the following statements on a scale from 1 (strongly disagree) to 4 (strongly agree): (v) “School life is fulfilling,” and (vi) “I feel confident about my future.” Finally, two questions ask about their social acclimation (that is, how frequently they participate in various activities) on a scale from 1 (never) to 6 (always): (vii) going to museums, zoos, or science parks with classmates from school and (viii) going to movies, plays, or sporting events with classmates from school.7
We follow the Kling, Liebman, and Katz (2007) aggregation method to obtain the overall effect of peers’ gender on students’ noncognitive outcomes. Specifically, we first conduct a principal component analysis (PCA) to classify the eight survey items into two categories: (i) the level of mental stress and (ii) the level of social acclimation and satisfaction in school. Next, we create an overall index for each category. This aggregation improves statistical power to detect effects that are consistent across specific outcomes when each individual outcome also has idiosyncratic variation. We estimate the overall effect in the main analysis and report the effects on each individual noncognitive measure in the Online Appendix. To facilitate interpretation, we normalize each index to obtain a mean of zero and a standard deviation of one.
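One common way to build such an index is to standardize each item, average the standardized items for each student, and then standardize the average. The sketch below does this under the assumption of equal weights; the exact weighting used in the paper is not spelled out here, the PCA classification step is not shown, and all names are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list to mean zero and (population) standard deviation one."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def summary_index(items):
    """items: one list per survey item, all in the same student order.

    Assumes every item varies and is coded so that higher means 'more' of the
    same underlying trait (assumption: reverse-code items beforehand).
    """
    z_cols = [zscores(col) for col in items]
    raw = [mean(student) for student in zip(*z_cols)]  # equal-weight average
    return zscores(raw)  # index with mean 0 and standard deviation 1


# Two hypothetical items answered by four students.
index = summary_index([[1, 2, 3, 4], [2, 2, 4, 4]])
print([round(v, 3) for v in index])
```

Averaging standardized items keeps any single noisy item from dominating the index, which is the source of the power gain described above.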
The data also contain a rich set of students’ predetermined characteristics, such as age, ethnicity, local residency status, whether they are an only child, whether they attended preschool, whether they skipped a grade in primary school, whether they repeated a grade in primary school, their parents’ education, and their (baseline) noncognitive measure during primary school.8 We use these predetermined characteristics to conduct a balancing test and also include them in our regressions as further controls.
III. Estimation Strategy
To investigate how gender composition affects student outcomes, we implement the linear-in-means model, which has been widely adopted in the literature (for example, Sacerdote 2001; Guryan, Kroft, and Notowidigdo 2009). Specifically, we use the following regression model:
$$
Y_{ics} = \beta_0 + \beta_1\,\mathit{Peerfem}_{-ics} + \beta_2\,\mathit{Female}_{ics} + X_{ics}'\gamma + A_{-ics}'\delta + \lambda_{sg} + \varepsilon_{ics} \tag{1}
$$
where $Y_{ics}$ are the measures of academic and noncognitive outcomes for student $i$ in class $c$ of school $s$; $\mathit{Peerfem}_{-ics}$ is the proportion of females in student $i$’s class, excluding student $i$; $\mathit{Female}_{ics}$ indicates whether student $i$ is female; $X_{ics}$ includes student $i$’s predetermined characteristics and teacher controls; $A_{-ics}$ are peers’ ability controls, including baseline academic ability for male and female peers separately;9 $\lambda_{sg}$ is the school–grade fixed effect; and $\varepsilon_{ics}$ is the error term. We cluster standard errors at the class level, accounting for correlation in outcomes for students in the same class.
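To make the estimator concrete, the sketch below fits a stripped-down version of the model with a single regressor (the female peer share), school–grade fixed effects removed by demeaning, and standard errors clustered by class. It leaves out the other controls and applies no small-sample correction, so it is an illustration under those assumptions rather than the paper's estimation code. All names and data are hypothetical.

```python
from collections import defaultdict


def within_fe_slope(y, x, fe, cluster):
    """Slope of y on x after demeaning within `fe` groups, plus a
    cluster-robust (sandwich) standard error without finite-sample correction."""
    def demean(v):
        sums, counts = defaultdict(float), defaultdict(int)
        for g, val in zip(fe, v):
            sums[g] += val
            counts[g] += 1
        return [val - sums[g] / counts[g] for g, val in zip(fe, v)]

    yd, xd = demean(y), demean(x)
    sxx = sum(a * a for a in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx
    resid = [b - beta * a for a, b in zip(xd, yd)]
    # Sum the score x * e within each cluster before squaring.
    score = defaultdict(float)
    for c, a, e in zip(cluster, xd, resid):
        score[c] += a * e
    se = (sum(s * s for s in score.values()) ** 0.5) / sxx
    return beta, se


# Hypothetical data: two school-grade cells, two classes each, true slope 2.
fe = ["A"] * 4 + ["B"] * 4
cluster = ["A1", "A1", "A2", "A2", "B1", "B1", "B2", "B2"]
x = [0.4, 0.4, 0.6, 0.6, 0.3, 0.3, 0.5, 0.5]
y = [2 * xi + (1.0 if g == "A" else 5.0) for xi, g in zip(x, fe)]
beta, se = within_fe_slope(y, x, fe, cluster)
print(round(beta, 6), round(se, 6))  # slope is close to 2 by construction
```

Demeaning within school–grade cells means the slope is identified only from differences across classes in the same cell, which is where the random assignment operates.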
The coefficient of interest is β1, which captures gender peer effects on students’ academic and noncognitive outcomes. An unbiased estimation requires that conditional on all of the controls in the equation (student and teacher characteristics, peers’ ability, school–grade fixed effects), our regressor of interest, Peerfem–ics (proportion of female peers), is uncorrelated with the error term, εics. A possible threat to the identifying assumption is that students may select into classes through unobservable factors, and therefore β1 may reflect the sorting of students with certain characteristics rather than the effect of peers’ gender.
To address this concern, we focus on schools that randomly assign students to classes, in the spirit of Sacerdote (2001); Zimmerman (2003); Carrell, Fullerton, and West (2009); and Shue (2013). Gong, Lu, and Song (2018) use the same research setting to investigate the effect of teacher gender on student outcomes. In the next two subsections, we provide institutional information regarding how students are assigned to classes and perform validity checks on random assignment. Another concern is endogenous school choice. While random class assignment is conducted within schools, students’ school choices may not be random. To address possible nonrandom matching between students and schools, we include school–grade fixed effects λsg in all specifications; therefore, the identification comes from within each school–grade unit and across randomly assigned classrooms. Lastly, students’ and teachers’ predetermined characteristics and peers’ ability controls further improve the balance between classrooms and our estimation efficiency.
A. Class Assignment and Estimation Sample
Our key research question concerns the effect of peers’ gender on student outcomes. Understanding how students are assigned to classrooms is vital to our estimation and analysis. Middle schools in China use different strategies to assign students. Some schools administer placement exams prior to enrollment and use students’ scores and/or rankings to assign them to classrooms. Others assign students on the basis of whether they are local residents or migrants, and a third type assigns students according to the primary schools they attended.
More recently, an increasing number of primary and secondary schools have begun to use random assignment to classrooms. This approach is heavily encouraged by the Ministry of Education and local governments to ensure equal and fair opportunities for all students during their compulsory education years. Schools that adopt random assignment typically rely on a computer program to implement the randomization. Alternatively, when the enrollment size is relatively small and manageable, parents of incoming students are invited to draw lots to determine their child’s class placement. After students are assigned to classes, teachers draw lots to determine which classes they will teach and manage.
The CEPS asks school principals and teachers about class assignment, which allows us to identify and focus on schools in which students are randomly assigned to classrooms. In particular, we restrict the estimation sample to schools that satisfy three conditions. First, the school principal reports that students are randomly assigned to classrooms. Second, once class assignment is determined prior to the beginning of the seventh grade, the school does not rearrange classrooms for the following three years. Third, all head teachers in a grade report that students in their grade are not assigned by test scores.10 Based on these criteria, our refined sample contains 67 schools, 208 classrooms, and 8,988 students, which accounts for approximately 59.8 percent of the original CEPS sample.11
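The three sample conditions amount to a simple filter over schools. The sketch below shows one way to express it; the field names are hypothetical stand-ins rather than actual CEPS variable names, and the head-teacher reports are pooled at the school level for brevity.

```python
def uses_random_assignment(school):
    """Return True if a school record meets all three sample conditions.

    `school` is a hypothetical dict; the keys are illustrative only.
    """
    return (
        school["principal_reports_random"]            # condition 1
        and not school["rearranges_classes_later"]    # condition 2
        and not any(school["head_teacher_reports_score_based"])  # condition 3
    )


schools = [
    {"id": 1, "principal_reports_random": True, "rearranges_classes_later": False,
     "head_teacher_reports_score_based": [False, False]},
    {"id": 2, "principal_reports_random": True, "rearranges_classes_later": False,
     "head_teacher_reports_score_based": [False, True]},
]
print([s["id"] for s in schools if uses_random_assignment(s)])  # [1]
```

The second school is dropped because one head teacher contradicts the principal's report, which is the cross-check the third condition is meant to enforce.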
To the extent that students in our estimation sample are all randomly assigned to classrooms and remain with the same peers for the next three years, our sample should mitigate any potential concerns regarding self-selection of students to classrooms and/or peers. Nevertheless, we provide further validity checks in the next section.
B. Verifying Random Class Assignment
To confirm that students in our sample are randomly assigned to classrooms, we conduct a balancing test among students with varying proportions of female peers. If class assignment is indeed random, students who have different proportions of female peers should be similar in terms of their observed characteristics. We regress each of the students’ predetermined characteristics (gender, age, minority, local residence, only child, whether they attended preschool, whether they repeated or skipped a grade in primary school, baseline noncognitive measures during primary school, and parents’ education) on the proportion of female peers in their classroom.12
Table 2 presents results of the balancing test. Column 1 reports the unconditional estimates, and Column 4 reports the conditional estimates with school–grade fixed effects. The unconditional estimates show that some student predetermined characteristics, such as whether they repeated a grade in primary school and their parents’ education, vary with the proportion of female peers. Yet most of the differences become much smaller and statistically insignificant after we control for school–grade fixed effects in Column 4. The only exceptions are being an only child and the predetermined noncognitive measure, but the magnitudes of the differences are very small. For example, while the estimate on only child is significant, the coefficient implies that a ten percentage point increase in the fraction of female peers is associated with only a 2.8 percentage point increase in the likelihood that the focal student is the only child in the family.
Table 2. Balancing Test for Predetermined Characteristics
It is worth noting that the balancing test for student i’s own gender may encounter a potential bias caused by sampling peers without replacement. Because a student cannot be assigned to herself, the sampling of peers is conducted without replacement. In our setting, an immediate problem is that the peers of a female student are chosen from a group with fewer females than the peers of a male student from the same class. Guryan, Kroft, and Notowidigdo (2009) discuss this issue, and we follow their proposed solution to further control for the mean of the sampling pool, that is, the proportion of female peers in the same grade at the same school excluding student i. Results are reported in the lower panel of Table 2; the estimate of Peerfem–ics is very small (0.01) and statistically insignificant.
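The mechanical bias can be seen with a short calculation. Suppose, as illustrative notation that does not appear in the original, that a school–grade pool has $N$ students of whom $F$ are female. Once the focal student is removed, the female share of the remaining pool depends on the focal student's own gender:

```latex
\[
\underbrace{\frac{F-1}{N-1}}_{\text{pool share facing a female student}}
\;<\;
\underbrace{\frac{F}{N-1}}_{\text{pool share facing a male student}},
\qquad
\frac{F}{N-1}-\frac{F-1}{N-1}=\frac{1}{N-1}.
\]
```

The gap of $1/(N-1)$ shrinks as the pool grows but never vanishes, which is why own gender and the female peer share are negatively correlated even under random assignment, and why controlling for the leave-one-out pool mean addresses it.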
Moreover, we follow the literature (Lim and Meer 2017, 2018; Carrell and West 2010; Carrell, Sacerdote, and West 2013) and conduct a permutation test with a resampling approach. First, for each classroom within a school, we randomly draw 10,000 synthetic classrooms of the same size from the sample of all students in the school–grade block. We do this for all student characteristics (that is, gender, age, minority, local residence, only child, whether attended preschool, whether repeated a grade in primary school, whether skipped a grade in primary school, baseline noncognitive measures during primary school, and parents’ education). Second, for each student characteristic and each classroom within a school–grade, we calculate the average value of the characteristic within the classroom. We then obtain an empirical p-value, that is, the proportion of the 10,000 resampled classrooms whose average for the corresponding characteristic (for example, the female student dummy) is lower than that of the observed classroom. Last, for all 13 predetermined characteristics, including students’ own gender, a χ2 test does not reject the hypothesis that the p-values are uniformly distributed. Overall, we do not find evidence of nonrandom placement of students into classrooms by predetermined characteristics, including baseline academic ability and own gender.
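The resampling step for a single classroom and a single characteristic might look like the sketch below. It assumes that synthetic classrooms are drawn without replacement from the school–grade pool, which the text does not state explicitly. The function name, seed, and data are hypothetical.

```python
import random


def empirical_p_value(pool, class_size, observed_mean, draws=10_000, seed=0):
    """Share of synthetic classrooms whose mean lies below the observed mean.

    `pool` holds one characteristic (e.g., a 0/1 female dummy) for every
    student in the school-grade block.
    """
    rng = random.Random(seed)
    below = 0
    for _ in range(draws):
        synthetic = rng.sample(pool, class_size)  # assumption: no replacement
        if sum(synthetic) / class_size < observed_mean:
            below += 1
    return below / draws


# Hypothetical school-grade block of 100 students and one observed class of 25.
pool = [1] * 52 + [0] * 48
observed = [1] * 14 + [0] * 11
p = empirical_p_value(pool, len(observed), sum(observed) / len(observed))
print(round(p, 3))
```

Under random assignment these p-values should be spread evenly between 0 and 1 across classrooms, which is what the χ2 test of uniformity checks.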
Altogether, results from the balancing test suggest that student characteristics are well balanced across classrooms with different fractions of female peers, which lends further support to our identification assumption that students in our sample were randomly assigned to classrooms.
IV. Main Results
A. Gender Peer Effects on Academic Performance
We first examine the gender peer effect on students’ academic outcomes using regression Model 1. Table 3 reports the estimated effects of female peer proportion on students’ test scores in core courses. To facilitate interpretation, we normalize test scores by school, grade, and subject to obtain a mean of zero and standard deviation of one. All regressions include subject and school–grade fixed effects.
Table 3. Gender Peer Effect on Test Score
As shown in Table 3, Column 1, the coefficient for the proportion of female peers is positive and statistically significant, which suggests that on average, when students have more female peers in the class, they tend to achieve higher test scores. After controlling for predetermined characteristics of the focal student (Column 2), teachers (Column 3), and the academic ability of female and male peers (Column 4), we find that the effect is consistently positive and statistically significant at the 1 percent level.
To appreciate the economic significance of the effects, we use the more conservative estimate from Column 4, which controls for student and teacher characteristics and peers’ ability. The coefficient, 1.019, suggests that a ten percentage point (approximately 1.25 standard deviations) increase in the proportion of female classmates raises a student’s test score by 10.19 percent of a standard deviation.
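The arithmetic behind this interpretation is spelled out below. Since the peer share is measured as a proportion, a ten percentage point change corresponds to 0.10. The symbol $\Delta$ is notation introduced here for illustration, and the implied standard deviation of the peer share is backed out from the figures in the text rather than reported directly.

```latex
\[
\Delta Y \;=\; \hat{\beta}_1 \times \Delta\,\mathit{Peerfem}
\;=\; 1.019 \times 0.10 \;=\; 0.1019 \ \text{SD}
\;\approx\; 10.19\% \ \text{of a standard deviation},
\]
\[
\text{implied SD of the female peer share} \;\approx\; \frac{0.10}{1.25} \;=\; 0.08 .
\]
```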
In addition to test scores, which are an objective measure of academic performance, we also examine a subjective measure of academic performance, that is, students’ self-assessed scores regarding their learning effectiveness. Online Appendix Table 1 presents the results. In contrast to the positive effect on test scores, gender peer effects on self-assessment scores are very small and statistically insignificant. Taking the two sets of results together, our findings suggest that having more female classmates improves students’ academic performance, but not necessarily their perceived performance and/or confidence in learning.
B. Gender Peer Effects on Noncognitive Outcomes
To examine gender peer effects on students’ noncognitive outcomes, we focus on two indexes generated from eight items on the student questionnaire (Section II details construction of the indexes). One index measures students’ mental stress, and the other measures their social acclimation and general satisfaction in school. Both indexes are normalized to have a mean of zero and standard deviation of one. By definition, lower scores for mental stress and higher scores for social acclimation and general satisfaction indicate better outcomes.
Columns 1–3 in Table 4 present the estimated effects on students’ mental stress. Across the specifications, the estimated impact is small in magnitude and statistically insignificant, which suggests that having more female peers does not appear to influence students’ mental stress levels.
Table 4. Gender Peer Effect on Noncognitive Measure
Columns 4–6 in Table 4 report the estimated effects on students’ social acclimation and general satisfaction in school. Overall, we find a positive effect of having more female classmates on students’ outcomes along this dimension. The effect remains robust after controlling for student and teacher characteristics, as well as for peers’ ability.
In Online Appendix Table 2 we present estimated effects on the eight noncognitive variables used to construct the indexes. Generally, the findings are consistent with the baseline effect. For instance, having more female peers in the classroom causes students to feel that school life is fulfilling and increases social interactions among students. The effects on mental stress items, such as feeling blue or unhappy, are very small and not statistically different from zero.
Overall, our results consistently suggest that having a higher proportion of female peers in the classroom improves students’ social acclimation and general satisfaction in school.
C. Robustness Checks
In this section, we conduct several empirical exercises to test for random assignment, check whether our results are mainly driven by spillover from female students’ academic advantage or by teachers’ differential teaching and grading when more female students are present, examine sample attrition, and explore students’ behavioral outcomes that may be related to our noncognitive measures.
1. A further test for randomization
Our identification strategy relies on the random assignment of students to classrooms. We selected the sample using a strong criterion for random assignment, that is, cross-checking each principal’s report with those of the respective head teachers. A balancing test of student baseline characteristics also provides reassurance in this regard. Nevertheless, we conduct a further test to examine whether our regression sample might be contaminated by schools that in fact adopt nonrandom assignment rules and therefore bias the estimates.
In this empirical exercise, we randomly drop schools from the sample and examine whether regression results change dramatically. If our baseline sample contains mostly randomized classrooms, estimates using the reduced sample should not seriously deviate from our baseline estimates. To maintain sufficient sample size, we drop two schools each time, and conduct a total of 2,211 regressions for each outcome variable. Online Appendix Figure 2 plots the distribution of estimates for test scores, mental stress, and social acclimation and satisfaction separately. We find that all distributions are centered around the respective baseline estimates. Upper and lower bounds also lie in the same direction as the baseline estimates. These findings suggest that our baseline results are unlikely to be severely biased by the possible inclusion of nonrandomized classrooms.
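The figure of 2,211 regressions matches the number of distinct pairs that can be formed from the 67 sample schools, which suggests that every pair of schools is dropped once. That reading is an assumption, and the sketch below only enumerates the reduced samples using hypothetical school identifiers.

```python
from itertools import combinations
from math import comb


def leave_two_out_samples(school_ids):
    """Yield each reduced sample obtained by dropping one pair of schools."""
    for dropped in combinations(school_ids, 2):
        yield [s for s in school_ids if s not in dropped]


n_schools = 67
print(comb(n_schools, 2))  # 2211

# Each reduced sample keeps 65 of the 67 schools.
first = next(leave_two_out_samples(list(range(n_schools))))
print(len(first))  # 65
```

Re-estimating the model on each reduced sample and plotting the resulting coefficients gives the distributions described above.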
2. Effects from female students’ ability spillover
Our main results document the overall effect of having female peers. One concern is that the effects may come from the spillover of female students’ academic ability and performance, given that the literature has established girls’ advantage in test scores during primary and middle school.
We address this issue from various angles and provide evidence that the effects are unlikely to be solely driven by girls’ academic ability. First, we compare female and male student characteristics and baseline academic ability in Online Appendix Table 3. Not surprisingly, there are some gender differences, but the magnitudes are economically small, and the pattern of academic performance before middle school is mixed. Although male students are more likely to repeat grades, they are also more likely to skip grades. Second, when we control for the academic ability of female and male peers, the main results remain similar to the baseline and statistically significant (Table 3, Column 4), which suggests that peers’ ability does not explain all of the effects of having more female peers.
Third, we examine the effects on test scores by subject. The premise is that if academic peer effects can explain our findings, then the subject in which female students demonstrate greater advantage should also show stronger effects from having more female peers. In Online Appendix Table 4, the coefficients on female dummy show a gender gap in test scores for each subject, which suggests that girls lead boys by 0.58 standard deviation in Chinese, 0.539 in English, and only 0.148 in math. In contrast, the benefit of having more female peers—the coefficients on proportion of female peers—is largest for math. In other words, the pattern of academic peer effects goes against that of gender peer effects. It seems unlikely, therefore, that our findings can be entirely explained by academic peer effects that are correlated with gender.13
3. Teacher assignment, differential teaching, and grading
One concern about the effect on test scores is that it may not reflect better academic achievement, but rather differential teaching and grading by teachers. For instance, if teachers grade more leniently or use a different syllabus when there are more female students in the classroom, we would also observe a positive effect on test scores associated with more female peers.
To address this concern, we first conduct a balancing test on teachers’ characteristics in Online Appendix Table 5, that is, we regress teachers’ predetermined characteristics (gender, education, certificate, experience, title, tenure, etc.) on female peer proportion, controlling for school–grade fixed effects. We find that most estimates are small and statistically insignificant, suggesting no strong correlation between teachers’ observable characteristics and the percentage of female students in the classroom. We also include these teacher controls in all regressions, and our estimates remain stable.
Second, while it is difficult to verify the grading and teaching policies of each school in our sample, we provide some anecdotal evidence of consistent teaching and grading across teachers and classrooms in the same grade in the same school. As part of the compulsory education, middle school curricula are designed and enforced by the Ministry of Education at the national level. All schools are required to follow the curriculum for each grade, and teachers cannot arbitrarily change the courses, difficulty level, teaching hours, or scheduled outlines on their own. Education administrators at the province and city level also strictly enforce the implementation and management of coursework and usually recommend group preparation for teachers who teach the same subject within a school and grade. Group preparation is organized in regular meetings, in which members receive a detailed plan that includes teaching materials, assignments, and tests, and they revise as needed in a collective manner. Also, teachers of the same subject are required to grade midterm and final exams as a group. In some cities, schools in the same district organize uniform examinations and grading for the same subject and grade.
Third, we offer further suggestive evidence by examining differences across subjects. Of the three core subjects (math, Chinese, and English), math presumably has more objective components and grading rubrics than the other two. Our premise is that if teachers were to grade differently when there are more female students in the classroom, such a bias would be more likely to affect test scores in Chinese and English. As shown in Online Appendix Table 4, we observe a positive effect of having more female peers in all subjects. It is strongest in math, with an estimated coefficient of 1.299 (significant at the 1 percent level). This finding is difficult to reconcile with teachers grading differently according to the gender composition of the classroom.
4. Sample attrition
There are missing values in student outcome variables and predetermined characteristics. We next address the sample attrition problem and check whether peer gender is correlated with the likelihood of missing variables, which could result in biased estimates of gender peer effects. We regress the attrition dummy (whether a variable is missing) on peer gender, student gender, and school–grade fixed effects. As shown in Online Appendix Table 6, the coefficients on peer gender (proportion of female peers) are all close to zero and statistically insignificant, which indicates that our main results are not driven by sample attrition.
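The attrition check amounts to a fixed-effects regression of a missing-value dummy on peer and own gender. The sketch below is a simplified illustration on simulated data, not the CEPS sample: the variable names, sample sizes, and 10 percent missing rate are assumptions, and it reports point estimates only, whereas an applied analysis would also compute (typically clustered) standard errors.

```python
import numpy as np


def within_ols(y, X, groups):
    """OLS with group fixed effects via the within transformation.

    Demeans y and each column of X inside every group, then runs least
    squares on the demeaned data. Returns the slope coefficients.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    y_dm = y.copy()
    X_dm = X.copy()
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y[mask].mean()
        X_dm[mask] -= X[mask].mean(axis=0)
    coef, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return coef


# Simulated data: 40 school-grade cells, 4 classrooms each, 25 students per classroom.
rng = np.random.default_rng(0)
n_cells, n_classes, n_students = 40, 4, 25
cell = np.repeat(np.arange(n_cells), n_classes * n_students)
class_share = rng.uniform(0.3, 0.7, size=n_cells * n_classes)
female_share = np.repeat(class_share, n_students)   # proportion of female peers
female = rng.integers(0, 2, size=cell.size)         # own gender dummy
# Missingness drawn independently of gender composition (assumed 10 percent rate).
missing = (rng.uniform(size=cell.size) < 0.1).astype(float)

coef = within_ols(missing, np.column_stack([female_share, female]), cell)
print("coefficient on female peer share:", coef[0])
```

Because missingness is simulated independently of classroom composition, the printed coefficient should be close to zero, which is the pattern the attrition test looks for.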
5. Related behavioral outcomes
In constructing the index for noncognitive outcomes, we focus on four variables for mental stress and four variables for social acclimation and satisfaction in school. We also identify two behavioral outcomes that may relate to students’ noncognitive factors: frequency of being late for school and dropping classes. We estimate the effect of having more female peers on these two behavioral outcomes and find a lower likelihood of being late for school or dropping classes (Online Appendix Table 7, Columns 1–2). These findings are consistent with improved noncognitive outcomes.14
V. Mechanisms
We find positive and significant effects of having female peers on students’ academic and noncognitive outcomes. In this section, we explore potential mechanisms and, in particular, focus on how teacher behavior, classroom environment, and student behavior may change when there are more female students in the classroom. We are aware that it is difficult to exhaust all relevant mechanisms or rule out the possibility that other mechanisms are in play. Accordingly, we conduct a decomposition analysis, which shows that these channels can explain a great deal of the gender peer effect.
A. Teacher Behavior: Teaching Style and Effort
Here we examine how teacher behavior, such as teaching style and effort exerted at work, varies with the gender composition of students in the classroom.
First, it is possible that teachers tailor their teaching style and communication strategies, or provide feedback differently, according to student gender—which, in turn, affects student outcomes. To assess the relevance of this mechanism, we construct an index of teacher feedback using PCA and two questions from the student survey: (i) “The teacher always praises me,” and (ii) “The teacher always asks me to answer questions in class.” Students are asked to rate to what extent they agree with the statement on a scale from 1 (strongly disagree) to 4 (strongly agree). Similarly, we use two items from the teacher questionnaire to construct the teaching style index: (i) “I introduce discussion among students in lectures,” and (ii) “I interact with students in lectures.” Teachers are asked to rate how often they adopt these methods in class on a scale from 1 (never) to 5 (always). We also include two variables on teacher behavior and effort. Parents are asked to rate the head teacher of the classroom based on whether they are patient and responsible, and teachers are asked to report how many hours they spend teaching and grading homework.
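A two-item PCA index of this kind can be sketched as follows. The responses are made up for illustration, and the specific choices (standardizing each item with the population standard deviation, taking the first principal component, fixing its sign, and rescaling the score to mean zero and unit variance) are assumptions about one reasonable implementation rather than the authors' exact procedure.

```python
import numpy as np


def pca_index(items):
    """First-principal-component index from survey items, scaled to mean 0, SD 1."""
    items = np.asarray(items, dtype=float)
    z = (items - items.mean(axis=0)) / items.std(axis=0)   # standardize each item
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loading = vt[0]                                        # first principal axis
    if loading.sum() < 0:        # fix the sign so higher answers raise the index
        loading = -loading
    score = z @ loading
    return (score - score.mean()) / score.std()


# Made-up answers on the 1-4 agreement scale:
# column 0 = "The teacher always praises me",
# column 1 = "The teacher always asks me to answer questions in class".
responses = np.array([[4, 4], [3, 4], [2, 2], [1, 2], [3, 3], [4, 3]])
index = pca_index(responses)
print(index.round(3))
```

With two positively correlated items, the first component weights them equally, so the index behaves like a standardized average of the two standardized answers.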
Table 5 reports the estimation results. Note that the columns use student-level (Column 1), parent-level (Column 2), and teacher-level (Columns 3 and 4) data, and therefore the number of observations varies across specifications. Results show that when there are more female students, teachers are more patient with and responsible for students. They also tend to spend more time on teaching and grading and adopt a more interactive teaching style—that is, by inviting students to engage in discussions among themselves and with the teacher during class. The effect on teaching style is large in magnitude but not precisely estimated, possibly due to lack of power. We do not find a significant impact on how teachers give feedback to students. Overall, there is evidence that teachers behave differently when there are more female students in the classroom—they adopt a more interactive teaching style and exert more effort.
Table 5. Mechanism: Teacher Behaviors
B. Classroom Environment
A second possible mechanism is that students’ gender composition affects the general environment in the classroom, which influences students’ academic achievement, social acclimation, and satisfaction in school. To investigate this potential channel, we construct an index of classroom environment using two survey items from the student questionnaire: (i) “I feel that my classmates are friendly to me,” and (ii) “I feel that our classroom has a satisfying atmosphere.” Students are asked to rate the extent to which they agree with the statements on a scale from 1 (strongly disagree) to 4 (strongly agree). We normalize responses with a mean of zero and a standard deviation of one and fit regression Equation 1.
The results, as shown in Table 6, Column 1, demonstrate that when more female peers are present in the classroom, students report a significantly more friendly and satisfying classroom environment. The improved classroom environment may render learning more effective and enjoyable, and thus benefit students’ academic achievement. A friendlier environment can also facilitate student interaction and support a feeling of being well adjusted among school peers. Our findings also echo prior studies, such as Booij, Leuven, and Oosterbeek (2017) and Feld and Zölitz (2017), which find that peer composition affects student interaction and involvement in the classroom.
Table 6. Mechanism: Classroom Environment and Student Effort
C. Student Behavior: Learning Effort
Last, we analyze how peers’ gender affects student behavior. In particular, peers’ gender might affect students’ motivation and effort exerted in learning, which in turn influence their academic outcomes. There is evidence that peers affect student effort in terms of studying and their use of time (Stinebrickner and Stinebrickner 2006). On the questionnaire, students are asked to report how many hours they spend each week on homework and tutorials. We use this information to investigate how gender composition in the classroom affects students’ effort in learning.
Column 2 in Table 6 reports the estimation results, which suggest that students spend more time on homework and tutorials when they have more female peers. The effects are economically and statistically significant. We also notice that female students tend to exert greater effort than male students. A possible explanation is that when there are more female peers, students feel greater peer pressure to work hard.
D. Decomposition of Mechanisms
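Our findings show that gender peer effects may work through teachers’ teaching style and effort, classroom environment, and student effort, which in turn influence student outcomes. To further understand how much each channel explains gender peer effects, as well as their combined explanatory power, we follow Heckman, Pinto, and Savelyev (2013) and Gelbach (2016) and apply a decomposition method. In particular, we denote $M^j$ as the mechanism variable $j$ and consider the following estimation specification, in which $\alpha^j_1$ is the coefficient on the proportion of female peers and the controls are those of Equation 1:
$$M^j = \alpha^j_0 + \alpha^j_1\,\text{FemalePeers} + \text{controls} + u^j \qquad (2)$$
Next, we include all relevant mechanism variables in Equation 1 and consider the following specification:
$$Y = \beta'_0 + \beta'_1\,\text{FemalePeers} + \sum_j \gamma_j M^j + \text{controls} + \varepsilon' \qquad (3)$$
Gelbach (2016) shows that
$$\beta_1 = \beta'_1 + \sum_j \gamma_j \alpha^j_1 \qquad (4)$$
where $\beta_1$ is the baseline coefficient on the proportion of female peers in Equation 1. This suggests that mechanism $j$’s component is $\gamma_j \alpha^j_1$, and the remaining unexplained part is $\beta'_1$. For each mechanism, we compute its explanatory power for the gender peer effect by $\gamma_j \alpha^j_1 / \beta_1$.15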
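The decomposition can be illustrated on simulated data. In this sketch, which is not the authors' code, the baseline coefficient on female peer proportion is split into an unexplained part plus one component per mechanism, where each component is the mechanism's slope in the full regression times the effect of female peer proportion on that mechanism. All variable names and parameter values are hypothetical, and only an intercept is used as a control.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients with an intercept prepended; returns [const, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


# Simulated data (illustrative only): one regressor of interest and two mechanisms.
rng = np.random.default_rng(1)
n = 5000
peers = rng.uniform(0.3, 0.7, n)              # proportion of female peers
m1 = 0.8 * peers + rng.normal(0, 1, n)        # mechanism 1 (e.g., student effort)
m2 = 0.5 * peers + rng.normal(0, 1, n)        # mechanism 2 (e.g., classroom environment)
y = 1.0 * peers + 0.6 * m1 + 0.3 * m2 + rng.normal(0, 1, n)

beta_base = ols(y, peers)[1]                  # baseline effect, mechanisms omitted
full = ols(y, np.column_stack([peers, m1, m2]))
beta_full, gammas = full[1], full[2:]         # unexplained part and mechanism slopes
alphas = np.array([ols(m, peers)[1] for m in (m1, m2)])  # peers -> each mechanism

components = gammas * alphas                  # contribution of each mechanism
shares = components / beta_base               # share of the baseline effect explained
print("shares:", shares.round(3), "unexplained:", round(beta_full / beta_base, 3))
```

The identity holds exactly in sample as long as the auxiliary and full regressions use the same observations and controls, so the mechanism shares and the unexplained share sum to one.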
Figure 1, Panel A plots the estimated decomposition of gender peer effects on academic outcomes into teachers’ teaching style, time spent on teaching and grading, and patience and responsibility; students’ effort; classroom environment; and other factors. We find that for the overall effect on test scores, teachers’ effort (time spent on teaching and grading) explains approximately 4.4 percent of the effects; teachers’ patience and responsibility explains around 7.9 percent; classroom environment explains 2.8 percent; and student effort explains 8.6 percent. Taken together, they explain 23.7 percent of gender peer effects on test scores. The remainder is unexplained by these mechanisms.
Figure 1. Decomposition of Mechanisms
Figure 1, Panel B presents the decomposition of gender peer effects on social acclimation and general satisfaction. As with test scores, teachers’ behavior (teaching style, time allocation, and responsibility) explains in total approximately 10.4 percent of the effect on students’ social acclimation. Classroom environment accounts for 9.0 percent of the effect, and student effort explains a smaller share (2.7 percent) of the improvement in social acclimation and satisfaction in school.
Overall, we find evidence that having more female students in the classroom motivates teachers to allocate more time to teaching and grading and to lecture more interactively, increases student effort, and improves the classroom environment. Together, these channels explain roughly a quarter of the gender peer effect on test scores (23.7 percent). Our findings are consistent with those of Lavy and Schlosser (2011), who find that an increased proportion of female peers reduces the level of disciplinary problems, improves interstudent and teacher–student relationships, and reduces teacher fatigue. Our analysis innovates by measuring not only students’ perceptions of the school environment, but also individual-level behaviors, such as students’ effort in learning, teachers’ time spent working, and teaching style.
Understanding the mechanisms of gender peer effects is important for policy design. To the extent that the number of female students in a school is fixed, the benefit of having more female peers in one class could be offset by the cost of having fewer female peers in another class. Understanding the sources of gender peer effects sheds light on more practical and affordable opportunities—in particular, teacher behavior, classroom environment, and student effort—to improve student outcomes. For instance, when there are fewer female students than desired, instructors may consider behaving more patiently and responsibly toward students and adding more discussion during lectures. In assigning teachers to classrooms, principals can take student gender composition into account, in that teachers who tend to be more patient and to actively engage students may be able to compensate for the lower proportion of female students in the classroom. In the same vein, head teachers may consider other ways to make the classroom friendlier (for example, by encouraging group activities within and after classes) and boost student motivation (for example, by strengthening incentives) to achieve benefits similar to those of having more female peers.
VI. Heterogeneity in Gender Peer Effects
Finally, we explore how gender peer effects vary by student and teacher characteristics: students’ own gender, parents’ education, and teacher gender and experience. We include interaction terms between female peer proportion and the corresponding variable and report results in Table 7.
Table 7. Heterogeneous Effects
We find differential peer gender effects between female and male students. While a higher proportion of female peers improves test scores for both female and male students, the effect is much stronger for male students (Table 7, Column 1). The effect size suggests that a ten percentage point increase in the proportion of female students increases average test scores of boys and girls by 14.0 percent and 6.5 percent of a standard deviation, respectively. For noncognitive outcomes, girls appear to benefit more from having female peers than boys. The effect on boys’ mental stress is not statistically significant, while girls are less likely to suffer from mental stress (Table 7, Column 5). The effect on social acclimation and general satisfaction is positive for both female and male students; female students tend to benefit more, although the difference is not statistically significant (Table 7, Column 9). This is consistent with Lavy and Schlosser (2011), who find that when more female peers are present, girls tend to report better interstudent relationships and social adjustment in class.
The heterogeneity in the effects on test scores may reflect differences in the returns to increased study effort. As shown in Online Appendix Tables 8A and 8B, when we decompose the mechanisms of peer effects for male and female students separately, a major difference is the channel through effort. When more female peers are present in the classroom, both male and female students allocate more time to study, but the increased effort translates into better academic performance only for boys. A possible explanation is that female students already spend more time studying at baseline than male students (on average, 27.5 hours on homework and tutorials per week, versus 24 hours for male students in our sample), so the marginal benefit from additional effort may diminish.
There is also evidence of gender peer effects interacting with teacher’s gender. Table 7, Column 3 suggests that while there are overall positive effects from female peers, the gain is larger when students have a male teacher. This suggests that having a female teacher might be a substitute for having more female peers. We do not find heterogeneous effects by teachers’ experience or parents’ education on either students’ test scores or their noncognitive outcomes.
VII. Conclusion
In this study we use a nationally representative survey of middle school students to investigate gender peer effects on students’ academic performance and noncognitive outcomes. By employing information about classroom assignment within schools, we are able to restrict the sample to schools that randomly assign students to classrooms and therefore estimate the causal relationship between peer gender composition and student outcomes.
Our results show that having a higher proportion of female peers in the classroom significantly raises students’ test scores and improves social acclimation and general satisfaction in school. By exploring the potential mechanisms through which peers’ gender plays a role, we find evidence that teachers behave more patiently and responsibly toward students and spend more time on teaching and grading. We also find an improved classroom environment and greater student effort exerted in learning. These mechanisms collectively explain a significant fraction of the identified peer gender effects.
Our findings make several contributions to the literature and have some policy implications. First, while most previous literature focuses on the effects of the school environment on students’ academic outcomes, our study provides more evidence on the impact on students’ noncognitive outcomes, which are important factors in explaining academic achievement, labor market success, and other significant life outcomes (Heckman and Rubinstein 2001; Heckman, Pinto, and Savelyev 2013; Segal 2013; Bertrand and Pan 2013).
Second, we provide rich evidence for the mechanisms that drive these effects: teacher behavior and teaching style, classroom environment, and student effort. Understanding these mechanisms sheds light on educational policies designed to improve student outcomes. For example, our decomposition exercise shows that teacher behavior and classroom environment explain a considerable amount of gender peer effects on test scores. One implication could be that to compensate for the small share of female students in certain classes, schools could assign teachers by considering their work style, attitudes, and motivation/effort. Also, improving the classroom environment might achieve outcomes similar to those of having more female peers.
Footnotes
The authors appreciate comments from Nicola Bianchi, Jonathan Guryan, Jessica Pan, Ivan Png, Songfa Zhong, three anonymous referees, and scholars and seminar participants at Northwestern University, National University of Singapore, Asian Meeting of the Econometric Society, and International Symposium on Contemporary Labor Economics. The authors acknowledge financial support from the Singapore MOE AcRF Tier 1 (R313000129115), Natural Science Foundation of China (71803027), Ministry of Education in China (18YJC790139), and Shanghai Pujiang Talent Program (18PJC012). The views expressed in this paper are those of the authors and do not necessarily represent the views of Singapore Ministry of Education, Natural Science Foundation of China, Ministry of Education in China, or Shanghai science and technology commission. The data used in this paper are available at the website of China Education Panel Survey: https://ceps.ruc.edu.cn/. The authors are willing to assist.
Supplementary materials are freely available online at: http://uwpress.wisc.edu/journals/journals/jhr-supplementary.html
↵1. See Epple and Romano (2011) for an extensive review of the literature.
↵2. Manski (1993) documents three types of effects that can generate similar peer outcomes: (i) correlated effects arise when individuals with similar backgrounds self-select into the same group; (ii) exogenous effects arise when individuals’ predetermined characteristics affect their peers’ outcomes; and (iii) endogenous effects arise when individuals’ outcomes directly affect their peers’ outcomes. Since we are interested in the effects of gender, which is predetermined, the endogenous effects are not applicable here. The focus of our identification strategy is to separate exogenous effects from correlated effects.
↵3. The CEPS is the first and largest nationally representative survey in China to focus on secondary school students and teachers. The survey began in 2013 and applied a stratified sampling design: 28 counties/city districts are chosen nationwide, with four middle schools and multiple (but not all) classrooms within each school chosen to represent a given county/city district. For a given classroom that is chosen, the survey covers all students, the head teacher, and the main-subject teachers.
↵4. The CEPS is a longitudinal survey starting with students in Grades 7 and 9 in the 2013–2014 academic year. We use the first wave for this paper because the second wave had not been released when we conducted the research. In addition, the second wave has a low retention rate, as it loses track of all Grade 9 students (who had graduated from middle school) and around 15 percent of the Grade 7 students. Some classrooms are reported to have had changes in student composition since the first wave, which could contaminate our estimation. As such, we use the first wave of the survey to ensure consistency and accuracy.
↵5. In addition, the corresponding 1 − R² from the regression of peer female proportion on school–grade fixed effects and all control variables equals 0.243, suggesting sufficient variation in gender composition across classrooms.
↵6. Exams are graded in a rigorous and consistent manner. During the grading process, each student’s name, class, and ID are hidden from the grader. Within a grade in the same school, teachers divide the grading work so that the same question is typically graded by the same teacher using a consistent rubric.
↵7. These measures are comparable to indicators used in the literature. For example, Lavy (2020) measures students’ satisfaction and social adjustment in school with two survey questions: “I feel well-adjusted socially in my class,” and “I am satisfied in school.”
↵8. Noncognitive measures during primary school include: “Express opinions clearly in primary school,” a self-reported score from 1 (disagree) to 5 (agree) regarding whether they expressed opinions clearly in primary school; “Respond quickly in primary school,” a self-reported score from 1 (disagree) to 5 (agree) regarding whether they responded quickly in primary school; and “Learn new stuff quickly in primary school,” a self-reported score from 1 (disagree) to 5 (agree) regarding whether they learned new material quickly in primary school. We take an average of these three measurements and normalize to have a mean of zero and a standard deviation of one.
↵9. Student characteristics are the same set of demographic variables as described in Section II. Teacher controls include teachers’ gender, age, years of schooling, experience, professional job title, marital status, and whether they graduated from a normal college. Peers’ ability control is measured by whether they repeated grades or skipped grades in primary school. We include male peers’ ability and female peers’ ability separately.
↵10. Criteria are based on responses on the principal and teacher questionnaire. First, all school principals were asked to report which of the following assignment rules they used to place students: (a) based on pre-enrollment test scores, (b) based on students’ residential status, (c) random assignment, or (d) based on other factors. We restrict our sample to schools that use method (c), random assignment. Second, the same principals were asked whether their schools rearrange classrooms for Grades 8 and 9; we exclude those that do so. Finally, each head teacher was asked whether students in the grade level taught are assigned by test scores; again, we drop the entire grade if any head teacher answers yes.
↵11. Limiting the sample to randomized classrooms may raise concern regarding the external validity of our findings. To this end, we compare school-level characteristics between randomized and nonrandomized classrooms, such as public or private, share of rural students, share of local versus migrant students, share of teachers with a professional title, the school principal’s education background and working experience, and average age of schools (how long they have operated) in the district. Results show very similar statistics across random and nonrandom samples, which suggests that our sample restriction will not severely affect the generality of our findings.
↵12. The balancing test contains fewer observations than the summary statistics (Table 1), due to missing values for some of the student characteristics. We test sample attrition in Section IV and do not find any correlation between attrition and gender composition in the classroom.
↵13. Previous studies also discuss this issue, and our findings are consistent with theirs. Lavy and Schlosser (2011) argue that it seems unlikely that all gains in achievement are generated solely by girls’ ability spillover, as they also find positive gender peer effects in subjects in which girls’ achievement is lower than boys’. Hoxby and Weingarth (2005) show that even after controlling for peers’ lagged achievement, the positive gender peer effects are still robust.
↵14. Another survey question related to our noncognitive measure is students’ level of feeling grief. We excluded it in our construction of the mental stress index, as the variable may capture short-run, drastic changes in the environment rather than the student’s noncognitive factors. Nevertheless, the estimated effect after including “feeling grief” is similar to the baseline findings.
↵15. Note that if some unmeasured mechanisms are correlated with the observed mechanisms and/or if the observed mechanisms are measured with error, γj might be biased. Therefore, the decomposition results should be interpreted with caution.
- Received September 2018.
- Accepted August 2019.