Abstract
This study investigates how teamwork influences students’ human capital, which is defined to be academic performance and personality traits. In a rural county in China, we randomly select classes in elementary schools and form small teams within the treatment classes. Team members need to complete team activities. We find that the act of forming teams can significantly improve students’ academic performance. Teamwork also causes substantial changes in noncognitive skills. Students in the treatment classes achieve higher scores in conscientiousness, extraversion, openness, and neuroticism but lower scores in agreeableness. These changes indicate a higher level of performance motivation.
I. Introduction
Lack of motivation among students in basic education has been recognized as a worldwide phenomenon (Hanushek 2003; Glewwe and Muralidharan 2016; Muralidharan 2017; World Bank 2019). The literature on peer effects suggests that group‐based incentives work better than individual‐based incentives in improving students’ academic performance (Blimpo 2014; Li et al. 2014).1 However, the pecuniary incentives in these interventions may reduce the feasibility of implementation in developing areas. In addition, current interventions do not pay sufficient attention to noncognitive skills, which have been found to be an important component of human capital.2 Given that students often make prosocial decisions when faced with team incentives (Babcock et al. 2015), it is worth asking whether the easily available intervention of teaming up students in school activities, without pecuniary incentives, could motivate students. Students’ human capital development, which is defined as a combination of academic performance (cognitive skills) and personality traits (noncognitive skills), would also be affected by the intervention. In this study, we investigate how teamwork affects students’ human capital. The experimental setting preserves the schools’ teaching schedule and arrangement of activities. The class teachers, who are responsible for the class as a whole, report team performance to the teams daily. The class teachers also keep a confidential record of each team member’s performance. In an incentive‐compatible environment, students need to cooperate with their teammates to outperform other teams. Perceived team incentives may be stronger than individual incentives because students ideally need to consider the impact of their individual behaviors on the team. We track the changes in cognitive and noncognitive skills and explore the effects of teamwork on the human capital development of the students in our randomly formed teams.
Beyond merely quantifying the effects of teamwork, we also aim to understand the underlying mechanism.
We randomly select classes in elementary schools as the treatment groups in a rural county in China and form small teams with five to six members within the classes. The students in the teams complete daily duty tasks, attendance checks, and counts of disruptive behaviors.3 The students finish their homework individually but submit their homework as a team throughout an entire semester. The students in the control classes engage in the same tasks individually instead of as a team. Teachers provide reports on the students’ disruptive behaviors and task performance in both types of classes. However, the reports on the treatment classes are based on the performance of the entire team. The students need to adapt to teamwork when completing the team tasks. Using the difference‐in‐differences (DID) method, we find that forming teams improves cognitive and noncognitive skills of the students in the treatment classes.4 Our experimental design and estimation strategy enable us to properly identify the effects of teamwork on human capital. First, the treatment classes are strictly selected via a lottery to minimize selection bias. The descriptive statistics indicate the absence of significant differences in prior academic performance and personality traits between the students in the treatment and control groups.5 Second, the students in the control and treatment classes are required to undertake the same types and number of tasks. The only difference is that the students in the treatment classes have fixed partners in the team activities, such as daily duties. The experimental design minimizes interruption to the regular teaching schedule. The students in treatment classes are not required to stay in school longer or undertake more tasks than their counterparts in the control classes. Other factors that may potentially affect human capital are controlled, and any observed changes in human capital can be substantially attributed to the effects of teamwork.
The mechanism analysis shows that changes in behaviors are the functioning mechanism behind the improvement in academic performance. The students changed their behaviors owing to self‐policing, social stigma, and learning from role models. The setting of team appraisals as a reward functions as motivation to change personality traits.
This study contributes to the literature in three ways. First, we determine that teamwork elicits significant changes in the students’ personality traits, and the changes reflect a high level of performance motivation in the students. Unlike pioneering research on noncognitive skills that focused on the outcome of individual behaviors (for example, Evans, Oates, and Schwab 1992; Aizer 2008; Neidell and Waldfogel 2010) or mental stress and level of social acclimation and satisfaction in school (Gong, Lu, and Song 2021), we employ an alternative perspective that focuses on personality traits as direct measures of noncognitive skills (Heckman and Rubinstein 2001). Research on education provides evidence that the positive effects of teamwork can manifest in more than one dimension, depending on the team activities.6 In our experiment, we expect teamwork to induce changes beyond academic performance. We choose the “Big Five” personality traits to account for the complexity of the effects of teamwork. We find that the students in the treatment classes obtain high scores in conscientiousness, extraversion, openness, and neuroticism but low scores in agreeableness. All the changes in personality traits in our results indicate a high level of performance motivation (Judge and Ilies 2002; Hart et al. 2007).
Second, we establish that forming teams is a cost‐effective way to improve education outcomes. Most of the related studies focus on how pecuniary team incentives can be used to improve teams’ academic outcomes. Blimpo (2014) and Li et al. (2014) found that students in developing countries who received peer (team) incentives improved their academic performance.7 Instead of focusing on the effects of pecuniary incentives, we identify the effects of teamwork alone. As pecuniary incentives are absent in our experiment, the students improve their performance solely to achieve team success. The results of the heterogeneity tests suggest that teamwork exerts a large positive impact on the students in lower grade levels. Compared with pecuniary incentives, which may not be accessible to schools with constrained budgets, forming teams is more affordable and feasible.
Third, we provide new insights into the mechanisms that motivate students in teams. Our results suggest that the students in the teams have significantly fewer disruptive behaviors and better focus during lectures than their counterparts in the control classes. We further explore students’ reasons for changing their behaviors by conducting a follow‐up survey of all 15 class teachers who participated in the experiment and a 10 percent random sample of the students in the treatment classes. The results show that 93 percent of the teachers believe that giving the students information on their team performance and setting team appraisals as rewards are essential to motivate the students. The motivation is well perceived, as 90 percent of the students indicate that they put more effort into studying and observing in‐class discipline. The changes in personality traits also confirm the students’ higher performance motivation level. In addition, the students report that they changed their behaviors for intrinsic and extrinsic reasons, such as self‐policing, social stigma, and learning from role models. The improved academic performance is a result of the students’ efforts to study.
In summary, the results of this study indicate that forming teams, by itself, can benefit students’ human capital development. The effects of the intervention are reflected in not only the students’ improved academic performance but also their personality traits, which may generate positive effects in the long run. Multiple mechanisms exist behind the effectiveness of forming teams.
In the following, Section II introduces the curriculum schedule and seating arrangement in elementary schools in China. Section III describes the experimental design and data set. Section IV discusses the estimation strategy and empirical model. Sections V and VI present the empirical results and expound the functioning mechanism of the teams, respectively. Finally, Section VII concludes the paper.
II. Curriculum Schedule and Seating Arrangement in Elementary Schools in Rural China
Elementary school students in rural China are enrolled in units of classes. Once students are enrolled in a school and assigned to a class, they typically remain in the same class for the entirety of their six‐year primary education. Class transfers are possible but rare.8 On weekdays, students generally arrive at school at 8:00 a.m. and leave at 4:30 p.m. They attend six lectures with a duration of 45 minutes, with a ten‐minute intermission between lectures. Many students spend lunch breaks in class. The curriculum schedule ensures that students spend most of their school hours with their classmates. Outside of school, apart from doing homework, students spend time on activities such as extracurricular reading or chores.9
The seating arrangement in elementary schools in rural China is relatively fixed. Students are assigned to different rows according to their height, that is, short students sit in the front rows for the practical reason that tall students may block the view of short students if they sat in the front rows. Shuffling is implemented to help the students’ vision development but occurs mainly by columns instead of rows. Students generally sit in pairs, and seatmates share a desk or have individual but connected desks. Seatmate pairings remain fixed even with shuffling.
The fixed seating arrangement and curriculum schedule in elementary schools in China provide us with ample opportunities to conduct the field experiment. We randomly assigned students with similar heights to teams and ensured that the team members sat next to one another in either the same row or same column.10 We expected physical proximity to increase the students’ awareness of the teams with minimum interruption, if any, to the common seating arrangement. Applying the seating arrangement rule to both types of classes meant that the students in the treatment classes would have sat in the same position and had the same peers if they were in the control classes. The difference, if any, between the treatment classes and control classes can be attributed to forming teams.
III. Experimental Design
We conducted the experiment intervention in a rural county (L County) of Hunan Province in Central China for five months, from September 2015 to January 2016 (Wang 2021). The duration of the intervention covered the entire autumn semester of the academic year 2015–2016.11 In addition, we invited all 15 participating class teachers and a 10 percent random sample of students in the treatment classes to answer a follow‐up survey in the spring semester 2021, five years after the end of the intervention. With the permission and cooperation of the local education bureau, we randomly selected five elementary schools from the complete list of elementary schools in the county to conduct the experiment.12 The students in the participating schools were initially assigned to different classes randomly at the start of first grade. We focused on the students in the third, fourth, and fifth grades.13 We randomly chose two classes in each grade from each school. Via a lottery, we assigned one class as a treatment class, wherein we randomly formed teams. We assigned the other class as the control class, wherein no intervention was implemented. In this section, we describe how the teams were formed. In total, we had 30 classes comprising 15 treatment classes and 15 control classes. A total of 1,589 students made up the 30 classes, specifically 907 male students (57 percent), and 682 female students (43 percent). Table 1 presents the gender composition of the sample schools.14
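The class-level lottery described above can be sketched in a few lines. This is only an illustration under stated assumptions: the school and class labels (`S1`, `C1`, and so on) and the function name `assign_classes` are hypothetical, and the sketch takes the two sampled classes per school-grade pair as given.

```python
import random

def assign_classes(schools, grades, seed=0):
    """Assign one treatment and one control class per school-grade pair by lottery."""
    rng = random.Random(seed)
    assignment = {}
    for school in schools:
        for grade in grades:
            # Hypothetical labels for the two sampled classes in this school and grade.
            pair = [f"{school}-G{grade}-C1", f"{school}-G{grade}-C2"]
            treated = rng.choice(pair)  # the lottery draw
            for cls in pair:
                assignment[cls] = "treatment" if cls == treated else "control"
    return assignment

schools = ["S1", "S2", "S3", "S4", "S5"]   # five schools
assignment = assign_classes(schools, grades=[3, 4, 5])
n_treat = sum(1 for arm in assignment.values() if arm == "treatment")
print(len(assignment), n_treat)  # 30 classes in total, 15 of them treated
```

Because the draw is made within each school-grade pair, the design yields exactly 15 treatment and 15 control classes whatever the seed.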
To measure academic performance, we collected information on the students’ scores from three examinations, that is, their final examination in the spring semester of academic year 2014–2015 and the midterm examination and final examination in the autumn semester of academic year 2015–2016. We used the scores from the spring semester as the baseline measurement because they reflected the academic performance of the students before their participation in the experiment. For each examination, scores from three subjects—Chinese, mathematics, and English—were reported.15 All examinations were unified and designed by the local education bureau, and all the participating schools administered the examinations at the same time. Thus, the scores were comparable across schools.
During the intervention, we asked all the students to answer a paper‐based questionnaire twice. Online Appendix B presents the invitation letter and instructions for the students. The first‐round questionnaire was given two weeks before the start of the autumn semester, and the second‐round questionnaire was administered two weeks before the final examination. The questionnaire collected information on the students’ demographics, attitudes toward studying, and personality traits. The questionnaire also gathered basic information on the students’ parents, such as their education, occupation, and income. The students answered the questionnaires during self‐study sessions in the presence of class teachers, who then collected the forms and uploaded the detailed information to our online system. To guarantee data accuracy and completeness, the class teachers returned incomplete forms or forms with obvious errors to the corresponding students for correction until the forms were satisfactory. We supervised the entire procedure of questionnaire completion and uploading to ensure the accuracy and completeness of the information.
In order to understand the mechanisms behind the effects of forming teams, we also conducted a follow‐up survey of the class teachers and a random sample of the students in treatment classes. In the teacher’s survey, we asked the class teachers about their observations on the changes in the students’ behaviors and the general experimental implementation. In the student’s survey, we asked the students to reflect on their behavioral changes during the experiment. The questions were designed to explore the reasons behind their behavioral changes. We provide the details of the follow‐up survey in Section VI.
Forming teams and arranging seats were crucial components of the experiment. We set the team size to five to six members, which is within the range of the optimal team size noted in the literature (Drakeford 2012). We justify the practice of forming teams as follows. In Chinese classrooms, students are seated in rows from the front to the back of the classroom, with short students taking the front rows. To be comparable with the current classroom structure, we first assigned the students to three sets according to height, that is, below‐average, average, and above‐average height. In each set, every six students were grouped by lottery, and their seats were also assigned by lottery.16 We ensured that the short students sat in the front rows, which was how they would be assigned if they were in the control classes. We also ensured that the members of the same team sat next to one another either in the same row or in the same column. In consideration of the students’ visual development, seating was shuffled every two weeks, but the team members remained unchanged. Figure 1 illustrates the seating arrangement in the experiment classes.
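The height-stratified lottery can be sketched as follows. The sketch rests on assumptions that the text does not settle: the three height sets are taken to be equal-sized terciles, any leftover students in a set simply form a smaller final team, and seat assignment is not modeled. The function name `form_teams` and the example heights are hypothetical.

```python
import random

def form_teams(heights, team_size=6, seed=0):
    """Group students into teams by lottery within three height sets.

    heights maps a student id to a height in cm.
    """
    rng = random.Random(seed)
    ordered = sorted(heights, key=heights.get)  # shortest to tallest
    n = len(ordered)
    # Assumption: below-average, average, and above-average sets are terciles.
    cuts = [0, n // 3, 2 * n // 3, n]
    teams = []
    for lo, hi in zip(cuts, cuts[1:]):
        height_set = ordered[lo:hi]
        rng.shuffle(height_set)  # the lottery within one height set
        for start in range(0, len(height_set), team_size):
            # Assumption: leftovers form a smaller final team.
            teams.append(height_set[start:start + team_size])
    return teams

# Hypothetical class of 36 students with distinct heights.
heights = {f"s{i}": 120 + i for i in range(36)}
teams = form_teams(heights)
```

Stratifying before the draw keeps every team within one height band, so team members can sit together without violating the rule that shorter students sit in front.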
We selected some standard school activities as team tasks to increase interaction within the teams and raise team awareness. Such tasks included daily duties, attendance checks, disruptive behavior warnings as a team, and homework as a team. Homework was completed individually, but the team members were required to submit their homework on time as a team.17 In the schools in our experiment, examination grades were determined by only the students’ performance in examinations. Homework grades were not a component of the final grade. The academic performance in the analysis was measured by the standardized examination scores. Students took the examinations on their own. The scores reflected academic ability at the individual level. The class teachers reported the team performance daily to the class. The records included the total number of disruptive behaviors of each team and the team performance on the team tasks at the aggregate level. A confidential record of disruptive behaviors at the individual level was also kept by class teachers. This record was used in the mechanism analysis of Section VI.
Teams may have differentiated effects on students in the geographic core of a team and those on the periphery owing to different exposures to members of other teams. However, the seating arrangement and seating shuffling in our experiment ensured that the students, except those sitting in the first and last rows, sat in the core and periphery of their team for approximately the same length of time.
In the control classes, the seating arrangement was also set according to the students’ height and by lottery without forming teams. The similar seating arrangement in both types of classes implied that the students in the treatment classes would have sat at the same position if they were in the control classes. After setting the seating arrangement, we instructed the class teachers to inform us of any changes. However, no changes in the seating arrangement were reported. The students in the control classes also received a record of their individual in‐class behaviors and school task performance. Figure 2 illustrates the experimental procedure.
A feature of our sample is that, owing to limited educational resources, the same set of teachers, including both class teachers and subject teachers, was assigned to the treatment and control classes in the same grade in each school. The teachers received training before the experiment implementation. We specifically requested the teachers to treat the control classes and treatment classes equally and to form teams only in the treatment classes. We expected the training to control for the teaching quality and prevent potential spillover effects. In our follow‐up survey, all the class teachers indicated that they did not treat the treatment classes differently, such as by paying more attention or by using a different curriculum or pedagogy. The only exception was that class teachers reported the students’ performance based on team performance in the treatment classes and based on individual performance in the control classes. Nevertheless, class teachers kept a record of disruptive behaviors in both treatment classes and control classes. The teachers did not receive any pecuniary incentives from this experiment based on the students’ academic performance. We also asked the inspectors, one for each grade in a school, to observe and ensure that the teachers did not treat the control and treatment classes differently. This feature of our methodology eased concerns that the experiment might cause changes in the behaviors of the teachers.
In summary, owing to our sample selection strategy, the treatment classes and control classes were completely comparable. In the treatment classes, forming teams would raise the students’ awareness of teamwork. Their nearby classmates would have been the same ones if they were in the control classes. The students undertook the same number and types of activities on school days. Finally, we expected the teachers’ quality and behaviors to be the same in both groups of classes.
IV. Empirical Model
We employed the DID method to estimate the effects of teamwork on academic performance (cognitive skills) and personality traits (noncognitive skills). The experimental design emphasized randomness to the greatest extent possible. However, individual and family characteristics could present considerable variation at the individual level and influence students’ human capital substantially. The DID analysis can properly control for factors external to the teams that could affect the students’ cognitive and noncognitive skills. The validity of the DID analysis relied on the randomness of the treatment classes and teams. To avoid endogeneity, we enforced randomness in choosing the treatment classes and forming the teams. We conducted a between‐group t‐test on the means of the baseline scores, noncognitive skills, and individual characteristics from the first‐round questionnaire. Table 2 reports the descriptive statistics. The results showed no significant differences in the baseline scores. No significant differences were observed in the personality traits of openness, conscientiousness, and extraversion or in gender, age, and height. Significant differences were found only in the agreeableness and neuroticism dimensions of the “Big Five” personality traits. We regarded the differences in the two dimensions as a random fluctuation. In addition, our empirical analysis revealed that these initial differences did not bias our results.
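A balance check of this kind compares group means variable by variable. The text does not say which variant of the t‐test was used, so the sketch below assumes Welch's unequal‐variance statistic. The scores are hypothetical and serve only to show the computation.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference in means between samples a and b."""
    # Standard error built from the two sample variances (unequal variances allowed).
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical baseline scores, for illustration only.
treatment = [78.0, 82.5, 69.0, 91.0, 74.5, 80.0]
control = [77.0, 84.0, 70.5, 88.0, 73.0, 81.5]
t_stat = welch_t(treatment, control)
```

A statistic close to zero, as with these made‐up numbers, is what one would expect for a baseline variable when the lottery has balanced the two groups.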
We controlled for several sets of variables that influenced academic performance and personality traits. Individual characteristics were gender, age, and height. Class characteristics were school, grade, class, and team dummy variables. Family characteristics were parents’ income, extracurricular reading time, and household chore time. We also employed the baseline examination scores to control for the students’ initial study abilities. Our model is presented as Equation 1,
where $y_{it}$ denotes the standardized examination scores or noncognitive skills of student $i$ at time $t$; $X_i$ is the vector of individual characteristics, class characteristics, and family characteristics; $\mathit{treatment}_i$ represents the dummy variable “treatment classes,” where treatment classes are denoted with one and control classes are zero; and $\mathit{post}_t$ is the time dummy, where experiment implementation is denoted with one, and pre‐implementation with zero. $\beta_1$ is the unobserved fixed effects in the treatment classes, $\beta_2$ pertains to the unobserved time effects in the treatment and control classes, $\beta_3$ refers to the students’ initial study abilities, $\epsilon_{it}$ denotes the error term, and $\gamma$ is the coefficient on the cross‐term between the treatment and post dummies. $\gamma$ is the core coefficient in our research and presents the effects of teamwork.
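Read together, these definitions imply a specification of roughly the following form. The intercept $\alpha$, the baseline‐score variable $\mathit{baseline}_i$, and the coefficient vector $\delta$ on $X_i$ are notation assumed for this sketch and are not taken from the text.

```latex
y_{it} = \alpha
       + \beta_1\,\mathit{treatment}_i
       + \beta_2\,\mathit{post}_t
       + \gamma\,(\mathit{treatment}_i \times \mathit{post}_t)
       + \beta_3\,\mathit{baseline}_i
       + X_i'\delta
       + \epsilon_{it}
```

Under this form, $\gamma$ measures the change in the outcome in the treatment classes net of the change in the control classes over the same period, conditional on the controls.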
V. Empirical Results
In this section, we present the descriptive statistics of the key variables, the main results of the effects of teams—in particular, the effects of teams on cognitive and noncognitive skills—and the heterogeneity tests on the effects of teams on academic performance.
A. Descriptive Statistics of Variables
Table 3 provides the descriptive statistics of the dependent and independent variables. We measured cognitive skills using standardized scores. Standardization was applied to the average scores in Chinese, mathematics, and English to yield standardized scores with a mean of zero and a standard deviation (SD) of one.18 Noncognitive skills were represented by the “Big Five” personality traits, namely, openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism.
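The standardization step can be illustrated as follows. This is a minimal sketch under assumptions: scores are standardized over the whole supplied sample using the population SD, whereas the study may have standardized within grade or school. The subject scores are hypothetical.

```python
from statistics import mean, pstdev

def standardize(raw_scores):
    """Return z-scores with mean zero and (population) SD one."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    return [(x - mu) / sigma for x in raw_scores]

# Hypothetical (Chinese, mathematics, English) scores for four students.
subject_scores = [(80, 90, 70), (60, 75, 66), (95, 88, 93), (72, 64, 80)]
averages = [mean(s) for s in subject_scores]  # average across the three subjects
z = standardize(averages)
```

After this transformation, an estimated effect of 0.1 reads as one tenth of a standard deviation of the score distribution, which is how the coefficients in the following tables are expressed.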
B. Effects of Teamwork on Cognitive Skills
Table 4 reports the effects of forming teams on cognitive skills. As the treatment classes and control classes were paired at each grade in a school, we clustered the standard errors at the school–grade level to control for the “paired” feature (de Chaisemartin and Ramirez‐Cuellar 2020). In Panel A, we tested whether the treatment class dummy was correlated with the students’ test scores by using either the midterm examination scores or final examination scores as the dependent variables in an ordinary least squares (OLS) regression. We added the individual characteristics, school characteristics, and family characteristics to the regression stepwise. We observed a robust correlation between the treatment dummy and the dependent variables. In Panel B, we applied the DID analysis in Equation 1 to test the effects of forming teams on the test scores, again adding the control variables stepwise. Columns 1–4 display the results of the comparison between the midterm examination scores and baseline examination scores with different model specifications.19 The students took their midterm examinations two months after the start of the experiment. Forming teams significantly improved the students’ academic performance in their midterm examination. On average, being in a team increased a student’s standardized score by 0.090–0.100 SD, depending on the model specification. All the results were significant at the 1 percent level. Columns 5–8 present the results of the DID analysis comparing the baseline scores with the scores from the final examination, which was taken five months after the start of the experiment. Compared with the effects observed in the midterm examination, the teams exerted a slightly weaker effect on the students’ academic performance in the final examination. With the full model specification, the students’ scores increased by 0.085 SD on average from the baseline examination to the final examination.
In the last row of Panel B, we conducted a small‐sample inference, which confirmed the robustness of our estimates.20
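The logic of the DID estimate can be seen in its simplest form, a comparison of four cell means. The sketch below omits the control variables and the clustered standard errors used in Table 4, so it is not the estimation procedure of the paper. The function name `did_estimate` and the scores are hypothetical.

```python
from statistics import mean

def did_estimate(rows):
    """Two-by-two DID of cell means.

    rows is an iterable of (treated, post, score) with 0/1 indicators.
    """
    cells = {(g, t): [] for g in (0, 1) for t in (0, 1)}
    for treated, post, score in rows:
        cells[(treated, post)].append(score)
    m = {key: mean(vals) for key, vals in cells.items()}
    # Change in the treatment group minus change in the control group.
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

# Hypothetical standardized scores, for illustration only.
rows = [
    (1, 0, 0.00), (1, 0, 0.10), (1, 1, 0.15), (1, 1, 0.25),
    (0, 0, 0.05), (0, 0, 0.05), (0, 1, 0.10), (0, 1, 0.10),
]
gamma_hat = did_estimate(rows)  # (0.20 - 0.05) - (0.10 - 0.05), about 0.10
```

Without covariates, this difference of differences coincides with the OLS coefficient on the interaction of the treatment and post dummies. Adding controls and clustering changes the precision and possibly the point estimate, but not the interpretation.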
C. Heterogeneity Tests on Cognitive Skills
We searched for evidence of the various effects of teamwork across the individual characteristics (that is, gender, academic performance, and grade). Table 5 summarizes the heterogeneous effects.
The first two columns in Table 5 test the gender differences in the effects of teamwork on academic performance. No significant gender differences existed in our results.
We also explored whether the teams exerted different effects on the students with different academic abilities. We divided the students’ baseline scores into four tiers, that is, A (above 85), B (75–84), C (65–74), and D (below 64). Columns 3–6 in Table 5 report the effects for each tier. The results indicated that forming teams had a positive effect on all the tiers but was not significant for tiers A, B, or C. The between‐group tests suggested that no significant differences existed in forming teams for the students with different academic abilities.
Students in lower grades are typically less mature than students in higher grades in terms of cognitive development. Students in lower grades may be more malleable than students in higher grades when the new concept of teams is introduced. We hypothesized that the students in lower grades would benefit more from teams than those in the higher grades. Columns 7–9 in Table 5 report the effects of teams on the third‐, fourth‐, and fifth‐grade students, respectively. The effects of forming teams on the third‐grade students were the largest, with scores increasing by 0.186 and 0.165 SD in the midterm and final examinations, respectively. Both results were significant at the 1 percent level and thus supported our hypothesis.
D. Effects of Teamwork on Noncognitive Skills
In this section, we investigate how forming teams affected the students’ personality traits. We intentionally designed team tasks to raise the students’ awareness of teamwork. Frequent communication and interaction were observed among the students when carrying out their team tasks. We expected the students’ personalities to be influenced and shaped by the interactions. Eagerness for positive appraisal may also elicit motivation, which can be reflected in the changes in the students’ personality traits. Our questionnaire probed the students’ attitudes through several statements from which the “Big Five” personality traits were constructed (Goldberg 1990, 1992). We standardized the personality traits to have a mean of zero and a SD of one. Table 6 presents the differences in personality traits between the treatment and control classes.
The results showed significant effects of teams on the students’ personality traits. Of the five dimensions, conscientiousness and extraversion presented the most prominent differences in scores. A comparison of the two questionnaire rounds showed that team membership gave the students in the treatment classes higher scores than their counterparts in the control classes. The between‐group difference increased by 0.194 and 0.153 SD in conscientiousness and extraversion, respectively. The scores in openness also increased in the treatment classes, though the between‐group differences were not significant. Being in a team slightly decreased agreeableness in the students by 0.042 SD and increased neuroticism by 0.072 SD.
In the psychology literature, the correlation between personality traits and motivation has been extensively studied.21 Conscientiousness was found to be the trait most strongly and consistently correlated with performance motivation. Neuroticism and extraversion were also found to be positively correlated with motivation. Hart et al. (2007) distinguished intrinsic achievement motivation from extrinsic motivation. Conscientiousness, openness, and extraversion were found to be positively correlated with intrinsic achievement motivation, whereas conscientiousness and neuroticism were positively related to extrinsic achievement motivation. Agreeableness was also found to be negatively associated with extrinsic achievement motivation. Our results matched these findings and showed that forming teams provided the students with incentives to perform well in school. The high scores in conscientiousness and neuroticism and low scores in agreeableness implied that eagerness for positive appraisal created extrinsic motivation for the students. Moreover, evidence showed that the students internalized some motivations because their scores in extraversion also increased. The high motivation level may exert a large effect on the students’ labor market outcomes in the long run (Heckman and Mosso 2014).
To identify the effects of teamwork on the students’ noncognitive skills, we performed a DID analysis on the noncognitive skills, as shown in Equation 1. Table 7 presents the results of the OLS estimates and the DID estimates. Forming teams increased conscientiousness and neuroticism by 0.140 and 0.200 SD, respectively, and decreased agreeableness by 0.149 SD. The effects are comparable in size to those reported in the literature on personality interventions (Roberts et al. 2017). These outcomes suggested that being a member of a team raised the students’ awareness of teamwork and responsibility, and eagerness for positive appraisal served as a motivation to perform well.
We also conducted heterogeneity tests on gender, academic performance, and grade levels. The results are summarized in Table 8. The boys and the students with the lowest initial academic performance showed large increases in conscientiousness and extraversion scores. The students in the lower grades achieved high scores in conscientiousness. The girls and the students in the higher grades received low scores in agreeableness and high scores in neuroticism without significant changes in either conscientiousness or extraversion. The between‐group tests suggested no significant differences between the different cohorts, except that the students in the higher grades demonstrated low agreeableness and high neuroticism. The outcomes suggested that forming teams created incentives for all the students regardless of gender, academic performance, and grade.
VI. Mechanism Analysis
In this section, we search for the potential mechanisms behind the effects of forming teams. As motivation should be reflected directly in changes in behavior, we first analyzed the data on the students’ in‐class behaviors collected during the experimental intervention. Second, we analyzed the data from the follow‐up surveys of the teachers and students about their perceptions of and comments on the working mechanisms.
We focused on two types of in‐class behaviors, namely, class discipline and in‐class attention. Class discipline has an effect on study outcomes. Disruptive behaviors, such as talking without permission and pranking, can not only distract students’ attention but also cause negative externalities on their neighbors and eventually lead to poor academic performance. We collected the students’ disruptive behaviors reported by their teachers. The disruptive behaviors were talking without permission, pranking, reading comic books in class, sleeping in class, and tardiness (late to school). All the behaviors were measured as frequency per week. In‐class attention correlates with class discipline but differs by measuring the effort exerted by a student to study. For example, daydreaming students may not demonstrate any disruptive behavior, but their lack of focus hinders the effectiveness of their study. We collected self‐reported in‐class attention levels from the students. The students reported their attention level as “mostly cannot pay attention,” “not sure,” or “mostly can pay attention.”
Table 9 shows a between‐group comparison of class discipline. Initially, the students in the treatment classes had low frequencies of talking without permission and pranking but also exhibited high frequencies of reading comic books, sleeping in class, and tardiness. No significant between‐group differences in in‐class attention were observed. In the second‐round questionnaire, the between‐group t‐tests indicated that the students in the treatment classes behaved better in all dimensions, engaging in fewer disruptive behaviors than the students in the control classes. The between‐group differences were significant for talking without permission, pranking, and reading comic books. Although the between‐group differences were not significant for sleeping in class and tardiness, the students in teams still engaged in these behaviors less often than their counterparts in the control classes. Interestingly, the frequency of disruptive behaviors increased in both the treatment classes and the control classes; however, it increased faster in the control classes. Forming teams seemed to slow the deterioration of class discipline. The results also showed that the students in the treatment classes had significantly higher attention levels than their counterparts in the control classes.
We conducted a DID analysis of the students’ behavior. Class teachers of both the control classes and treatment classes kept a record of the students’ behavior at the individual level. As we controlled for the changes in the other factors influencing the students’ behaviors, such as individual characteristics and family characteristics, we were able to single out the changes in behaviors induced by teams. Table 10 reports the results. When we controlled for the other factors, we found that being a member of a team reduced the frequencies of talking without permission, pranking, and reading comic books by 1.96, 1.56, and 0.80 times per week, respectively, which accounted for 58 percent, 49 percent, and 81 percent of the corresponding disruptive behaviors reported in the second‐round questionnaire. Joining a team also increased the students’ in‐class attention significantly. The improved academic performance was a result of improved in‐class behaviors, as predicted in the literature on education (Wentzel 1991; Malecki and Elliot 2002).
We conducted a follow‐up survey in the experiment location in the spring semester of 2021. The survey covered the full sample of 15 class teachers and a 10 percent random sample (80 students) of the students in the treatment classes.22 The randomness of the survey sample was confirmed by a comparison with the treatment sample in Table 11.23 The survey questionnaire is presented in the Appendix. In the survey of the class teachers, we asked seven questions about their observations on the changes in the students’ behaviors. In addition, four questions were about the general experiment implementation. On a scale of one to five, representing “strongly agree,” “agree,” “neither agree nor disagree,” “disagree,” and “strongly disagree,” the teachers indicated whether they agreed with the statements. Table 12 reports the results. All 15 teachers indicated that they could recollect the experiment. Of these teachers, 93 percent confirmed the strong connections between the team members (Question 1), and 87 percent observed mutual supervision between the students on studying and in‐class discipline (Questions 2 and 3). The effectiveness of the teams was also confirmed by the teachers, as 80 percent believed that forming teams improved the students’ test scores (Question 6).
For the open questions, we invited the teachers to comment on the mechanism that made the teams effective, and 93 percent of the teachers believed that giving students the information on their team performance and setting team appraisals as a common team goal were essential to motivate the students. The teachers also observed intensified competition between the teams to receive positive team appraisals. Moreover, the teachers mentioned that the mechanisms listed in the questionnaire, such as satisfactory in‐class discipline and mutual help, could be helpful in improving the students’ academic performance. A total of ten teachers (67 percent of the sample) mentioned that they appreciated the opportunity to join the experiment and adopted the student teams for all their classes after the experiment ended.
In the follow‐up survey conducted on the students, we asked them to reflect on their behavioral changes during the experiment. The questions were designed to explore the reasons behind the changes. Specifically, we wanted to determine whether the changes occurred owing to self‐policing, the social stigma attached to misbehaving, or learning from role models. If the students answered that they changed their behavior because of pressure from other team members (Questions 11 and 12), then social stigma was considered the reason for the behavioral change. If the students answered that they changed their behavior voluntarily (Questions 9 and 10), then self‐policing was considered the reason for the behavioral change. Learning from role models was another potential reason investigated (Questions 13 and 14). For each statement, the students were asked to give their opinion on a scale ranging from one to five, representing “strongly agree,” “agree,” “neither agree nor disagree,” “disagree,” and “strongly disagree,” respectively. Table 13 reports the survey results of the students.
In general, joining teams had a positive effect on the students, as 93 percent considered themselves more responsible (Question 4), and 92 percent considered themselves more cooperative after joining a team (Question 15). The student teams also cultivated an environment for the students to make friends (Question 7). Forming connections with the other team members took time, but the connections were strong once they formed (Questions 1, 2, and 9). In addition, 90 percent of the students indicated that they exerted considerable efforts to study and observe in‐class discipline to obtain positive team appraisals (Question 6). The results matched the class teachers’ observations that the students were motivated to study and behave in class. The high level of performance motivation was reflected in the changes in the students’ personality traits (Judge and Ilies 2002). We tried to distinguish self‐policing from social stigma and learning from role models. Although we received more affirmative answers (strongly agree or agree) on self‐policing and learning from role models than on social stigma, the Kolmogorov–Smirnov tests in Table 13 did not reject the hypothesis that the answers came from the same distribution, which suggests that all three reasons were similarly important in changing the students’ behaviors.
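To make the distributional comparison concrete, the following sketch computes a two‐sample Kolmogorov–Smirnov statistic, that is, the largest gap between two empirical distribution functions. The function names, variable names, and answers are hypothetical and are not taken from Table 13; the sketch also stops at the statistic and does not reproduce the p‐values reported there.

```python
def ecdf(sample, x):
    """Share of observations in `sample` that are less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    grid = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in grid)

# Hypothetical five-point answers (1 = strongly agree, ..., 5 = strongly disagree).
# These are made-up values, not the responses summarized in Table 13.
self_policing = [1, 1, 2, 2, 3]
social_stigma = [1, 2, 2, 3, 4]
d_stat = ks_statistic(self_policing, social_stigma)
```

A small statistic means that the two answer distributions are close. Converting it into a p‐value requires a reference distribution, and because five‐point answers contain many ties, p‐values derived for continuous data tend to be conservative.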
The follow‐up surveys confirmed that setting team appraisals as a common team goal functioned as the mechanism to elicit the students’ efforts. Owing to their motivation, the students’ personality traits changed, and they exerted considerable efforts to study and observe in‐class discipline. In the psychological theory of group effort, a shared and highly valued team goal triggers an individual’s effort (Fishbach, Steinmetz, and Tu 2016). Our observations fit this theory well.
Another potential mechanism that induced changes in academic performance was changes in noncognitive skills. We provide evidence for the strong correlation between noncognitive skills and cognitive skills in Table 14. In Table 15, we regress the test scores on the treatment while controlling for the noncognitive skills. The treatment effect remained significant, which implied the existence of additional mechanisms influencing cognitive skills other than changes in noncognitive skills. In addition, as we observed simultaneous changes in the cognitive and noncognitive skills, determining the direction of causality between them was not feasible in our current study. Nevertheless, noncognitive skills are a potential mechanism to explore.
In summary, setting team appraisals as a common team goal was the mechanism that motivated the students. The students’ personality traits and in‐class behaviors changed as they became motivated. The improved in‐class behaviors eventually resulted in enhanced academic performance in the treatment classes.
VII. Conclusion
In this study, we investigate the effects of teamwork on the human capital development of elementary school students. The randomization in the field experiment enables us to properly identify the effects of teamwork. Furthermore, we conduct a follow‐up survey to identify the important mechanism behind forming teams.
We find that forming teams, by itself, increases the students’ academic performance by 0.085–0.100 SD, an effect that is salient among the students in lower grades. With respect to noncognitive skills, joining teams increases the students’ scores in conscientiousness, extraversion, and neuroticism and lowers their scores in agreeableness. We observe no significant change in the students’ openness. Using the psychology literature as a reference, we interpret the results as suggesting that forming teams motivates the students to perform well, owing to their eagerness to receive positive team appraisals. In the mechanism analysis, we provide evidence that the students in the treatment classes demonstrate better in‐class behaviors than their counterparts in the control classes. The improved academic performance is a result of the students’ efforts to study. In the follow‐up survey, 93 percent of the teachers believe that providing feedback to the students and setting team appraisals as a common team goal are essential to motivate students. In addition, 90 percent of the students indicate that they exerted considerable efforts to study and observe in‐class discipline to receive positive team appraisals. The survey results show that changing behaviors voluntarily (self‐policing), pressure from teammates (social stigma), and learning from role models are the reasons behind the students’ behavioral changes.
Overall, forming teams is a cost‐effective way to provide incentives that can be easily scaled up in resource‐strained environments. Human capital, that is, cognitive and noncognitive skills, can be improved through teamwork. In future studies, exploring effective ways to organize teams would be interesting.
Multiple extensions can be conducted based on our current findings. Determining the effects of including academic activities, such as cooperative learning, on the effectiveness of student teams would also be interesting. Investigations on the mechanisms, such as determining the influence of changes in noncognitive skills on changes in cognitive skills, can provide a new avenue for conducting behavior intervention.
Appendix
Follow‐up Survey
We perform a runs test on the selected sample. The results are presented in Appendix Table A1. We observe no evidence of a violation of randomness.
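For readers unfamiliar with the procedure, a runs test of the Wald–Wolfowitz type can be sketched as follows. The sketch assumes that the sample is coded as a 0/1 sequence in listing order, which may differ from how the test was actually set up for Appendix Table A1; the function name `runs_test` and the toy sequence are hypothetical.

```python
import math

def runs_test(indicator):
    """Wald-Wolfowitz runs test for a 0/1 sequence; returns (runs, expected, z)."""
    n1 = sum(1 for v in indicator if v == 1)
    n2 = len(indicator) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("both values must appear in the sequence")
    # A new run starts whenever two neighboring entries differ.
    runs = 1 + sum(1 for a, b in zip(indicator, indicator[1:]) if a != b)
    n = n1 + n2
    # Mean and variance of the number of runs under random ordering.
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Made-up selection indicator (1 = selected), for illustration only.
selected = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]
runs, expected, z = runs_test(selected)
```

A value of z far from zero would indicate too few or too many runs relative to a random ordering. The normal approximation behind z is intended for longer sequences than this toy example.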
In the survey on the class teachers, we asked them seven questions about their observations on the changes in the students’ behaviors. In addition, four questions were about the general experiment implementation. The questions are as follows:
1. I observed a close connection between the student team members.
2. I observed mutual help for studying among the student team members.
3. I observed mutual supervision on studying among the student team members.
4. I observed mutual supervision on in‐class discipline among the student team members.
5. I observed strengthened awareness of competition among the students.
6. The student teams were helpful in improving academic performance.
7. The student teams were helpful in maintaining in‐class discipline.
8. I paid more attention to the treatment classes than to the control classes during the experiment.
9. I made a separate curriculum for treatment classes.
10. I adopted a different teaching pedagogy for the treatment classes.
11. I organized team activities in the control classes during the experiment.
In the survey on the students, we instructed them to reflect on their behavioral changes during the experiment. The questions are as follows:
1. After being assigned to a team, it took me some time to adjust to the team activities.
2. After being assigned to a team, it took me some time to make friends with the other team members.
3. If possible, I would like to have the team activities continue after the experiment.
4. After being assigned to a team, I believed that I should be responsible for the team appraisals.
5. After being assigned to a team, I considered whether my actions would affect the team appraisals.
6. After being assigned to a team, I exerted efforts to receive positive team appraisals.
7. The student teams provided me with an environment to make friends easily.
8. I liked interacting with the team members even after school.
9. After being assigned to a team, I studied hard voluntarily.
10. After being assigned to a team, I observed in‐class discipline voluntarily.
11. After being assigned to a team, I studied hard because the team members would put pressure on me if I did not do so.
12. After being assigned to a team, I observed in‐class discipline because the team members would put pressure on me if I did not do so.
13. After being assigned to a team, I studied hard because one of the team members was my role model.
14. After being assigned to a team, I observed in‐class discipline because one of the team members was my role model.
15. My awareness of cooperation increased after I joined the team.
Footnotes
1. Peer effects have been extensively studied in the literature and have been found at different stages of education and in different types of groups (Burke and Sass 2013; Jain and Kapoor 2015; Booij, Leuven, and Oosterbeek 2017; Feld and Zölitz 2017). Ding and Lehrer (2007), Carman and Zhang (2012), and Li et al. (2014) provided evidence on peer effects in Chinese high schools, middle schools, and elementary schools, respectively. Peer effects can be found in academic activities (Carrell, Fullerton, and West 2009; Lu and Anderson 2015; Li, Mak, and Wang 2019) and nonacademic activities (Lavy and Sand 2019). See Epple and Romano (2011) and Sacerdote (2011) for detailed reviews on peer effects.
2. In a growing literature, noncognitive skills have become an important component of the definition of human capital explicitly (Currie and Almond 2011) and implicitly (Cunha and Heckman 2007).
3. Daily duty tasks are greeting teachers and classmates in the morning, cleaning classrooms, and organizing exercises during class intermissions.
4. The experimental design emphasized randomness as much as possible. However, individual and family characteristics, in which absolute randomness cannot be achieved, have substantial effects on students’ human capital. Compared with a direct between‐group comparison, the DID method can help us properly control such factors.
5. In our sample, all the participating schools had classes in the treatment and control groups. The students in the treatment groups had slightly higher scores in neuroticism and lower scores in agreeableness than those in the control groups. These between‐group differences increased with the introduction of forming teams; thus, the initial difference in neuroticism and agreeableness did not bias our estimation of the other traits.
6. Slavin (1980), Johnson and Johnson (2002), Hänze and Berger (2007), and Drakeford (2012), among other scholars, proposed that study teams can help improve the students’ learning attitudes, school participation, academic performance, and social skills.
7. In Li et al. (2014), the pecuniary incentive for teams is RMB 200 (approximately USD 29) per class. In the work of Blimpo (2014), the individual incentive varies from USD 10 to USD 30.
8. We did not observe any class transfers during the experiment period.
9. After‐school classes are uncommon in rural China. In our sample, fewer than 1 percent (11 out of 1,589) of the students were enrolled in after‐school classes.
10. Section III provides detailed information on the seating arrangement. We are aware of the potential endogeneity between the students’ academic performance and height (Case and Paxson 2008; Vogl 2014) and thus controlled for the students’ height in the analysis. We found that possible endogeneity did not bias our results. Another potential endogeneity is that people may feel more comfortable communicating with people of a similar height. However, as the seating arrangement in the control classes was also based on height, the students in the treatment classes would have been assigned to the same rows if they were in the control classes. As the question in this study is whether teams are effective in improving students’ human capital, the current experimental design can properly identify the effects of teamwork because the same seating rules were applied to the treatment classes and control classes.
11. In Chinese elementary schools, the autumn semester starts on September 1 and ends around January 15.
12. Online Appendix A provides the cooperation agreement.
13. We excluded first‐ and second‐grade students because their literacy and comprehension capabilities may be insufficient for the completion of the questionnaires. We also excluded sixth‐grade students because they were in transition to middle school, which could cause difficulties in the follow‐up survey.
14. Male students were overrepresented in our sample for two possible reasons. First, the county where we conducted the experiment had an extremely high percentage of people working as migrant workers in urban areas. As parents move to urban areas, they are likely to bring their daughters with them out of safety concerns. Therefore, boys may be overrepresented among the children left behind. Second, gender favoritism at birth is more severe in rural areas than in urban areas. Such favoritism would result in a higher percentage of boys in the population.
15. Chinese, mathematics, and English are the three main subjects in Chinese elementary schools. The students in the participating schools do not study English before third grade; thus, we exclude the English scores from the baseline measurement for the third‐grade students.
16. Tall students may block the view of short students sitting behind them. To solve this problem, we teamed up the students according to height. This arrangement may sacrifice a certain level of randomness in the seating arrangement but was a practical necessity. To minimize the potential endogeneity between the students’ height and academic achievement, we controlled for seating arrangement and height in the empirical analysis.
17. Although team activities, such as cooperative learning, which requires students to undertake a certain level of teaching, can increase interaction among students, such activities can also alter the teachers’ behaviors in the experimental classes. To identify the specific effects of teams, we limited the team activities to those that did not alter the teachers’ behaviors.
18. We used three standardized scores in our analysis, that is, standardized final exam scores, standardized midterm exam scores, and standardized baseline scores. For each examination, we first averaged the scores of the three subjects and then standardized the averaged scores. Only the scores in Chinese and mathematics were used to construct the baseline scores of the third grade.
19. As we did not ask the students to answer the questionnaire before their midterm examinations to update their individual and family characteristics, we used the information collected in the first‐round questionnaire.
20. Finite sample inference p‐values were computed with the Stata command ritest (Hess 2017). We took 1,000 permutations on the variable “treatment × post.” The permutation was implemented at the school–grade level in order to keep the pairwise data structure.
22. All 15 class teachers and 73 of the 80 sampled students answered the survey (student attrition rate of 9 percent).
23. Appendix Table A1 provides additional evidence on the randomness of the survey sample.
- Received January 2021.
- Accepted February 2022.
This open access article is distributed under the terms of the CC‐BY‐NC‐ND license (https://creativecommons.org/licenses/by-nc-nd/4.0) and is freely available online at: https://jhr.uwpress.org.