Abstract
This paper studies how college majors are chosen, focusing on the underlying gender gap. I collect a data set of Northwestern University sophomores that contains their subjective expectations about choice-specific outcomes, and estimate a model where majors are chosen under uncertainty. Enjoying coursework and gaining parents’ approval are the most important determinants in the choice for both genders. However, males and females differ in their preferences in the workplace, with males caring about the pecuniary outcomes in the workplace much more than females. The gender gap is mainly due to gender differences in preferences and tastes, and not because females are underconfident about their academic ability or fear monetary discrimination. The findings in this paper make a case for policies that change attitudes toward gender roles.
I. Introduction
The difference in choice of college majors between males and females is quite dramatic. In 1999–2000, among recipients of bachelor’s degrees in the United States, 13 percent of women majored in education compared to 4 percent of men, and only 2 percent of women majored in engineering compared to 12 percent of men (2001 Baccalaureate and Beyond Longitudinal Study). Figure 1 highlights the differences in gender composition of undergraduate majors of 1999–2000 bachelor’s degree recipients (see also Turner and Bowen 1999; Dey and Hill 2007).
Figure 1. Gender Composition of Majors of 1999–2000 Bachelor’s Degree Recipients Employed Full-Time in 2001.
These markedly different choices in college major between males and females have significant economic and social impacts. Figure 2 shows that large earnings premiums exist across majors. For example, in 2000–2001, a year after graduation in the United States, the average education major employed full-time earned only 60 percent as much as one who majored in engineering (also see Garman and Loury 1995; Arcidiacono 2004, for a discussion of earnings differences across majors). Paglin and Rufolo (1990) and Brown and Corcoran (1997) find that differences in major account for a substantial part of the gender gap in the earnings of individuals with several years of college education. Moreover, Xie and Shauman (2003) show that, controlling for major, the gap between men and women in their likelihood of pursuing graduate degrees and careers in science and engineering is smaller. The gender differences in choice of major have recently been at the center of hot debate on the reasons behind women’s underrepresentation in science and engineering (Barres 2006).
Figure 2. Average Income of 1999–2000 Bachelor’s Degree Recipients Employed Full-Time in 2001 by Major.
There are at least two plausible explanations for these differences. First, innately disparate abilities between males and females may predispose each group to choose different fields (Kimura 1999). However, studies of mathematically gifted individuals reveal differences in choices across gender, even for very talented individuals (Lubinski and Benbow 1992). Moreover, the gender gap in mathematics achievement and aptitude is small and declining (Xie and Shauman 2003; Goldin et al. 2006), and gender differences in mathematical achievement cannot explain the higher relative likelihood of majoring in sciences and engineering for males (Turner and Bowen 1999; Xie and Shauman 2003). These studies suggest gender differences in preferences and / or beliefs as a second possible explanation for the gender gap in the choice of major.1 However, no systematic attempt has been made to study these preferences and beliefs. This paper attempts to fill this void.
In this paper, I estimate a choice model of college major in order to understand how undergraduates choose college majors, and to explain the underlying gender differences. The choice of major is treated as a decision made under uncertainty: uncertainty about personal tastes, individual abilities, and realizations of outcomes related to choice of major. Such outcomes may include the associated economic returns and lifestyle as well as the successful completion of the major. My choice model is motivated by the theoretical model outlined in Altonji (1993), which treats education as a choice made under uncertainty. I, however, do not model the choice of college. In addition, because I do not have the data needed to estimate a dynamic model, I assume that individuals maximize current expected utility, and estimate a static choice model.2
The standard economic literature on decisions made under uncertainty generally assumes that individuals, after comparing the expected outcomes from various choices, choose the option that maximizes their expected utility. Given the choice data, the goal is to infer the parameters of the utility function. Since expectations about the choice-specific outcomes are unknown, the literature makes assumptions on expectations to infer the decision rules. This approach does not allow for the possibility that subjective expectations may be different from objective measures, assumes that formation of expectations is homogeneous, and makes nonverifiable assumptions on expectations-formation rules. Since observed choices might be consistent with several combinations of expectations and preferences, and the list of underlying assumptions may not be valid (Manski 1993), this is problematic. The solution to this identification problem is to use additional data on expectations, and that is precisely what I do.
I have designed and conducted a survey to elicit subjective expectations from 161 Northwestern University sophomores regarding choice of major. Though Northwestern University is a selective institution, the interest in understanding gender differences in major choice is driven by the underrepresentation of women in science and engineering; since individuals attending elite universities are more likely to make it to the higher echelons of science and engineering, I believe that Northwestern University is an appropriate setting to explore these issues.
In contrast to most studies on schooling choices that ignore uncertainty (exceptions include Bamberger 1986 and Arcidiacono 2004), I estimate a random utility model of college major choice allowing for heterogeneity in beliefs. My approach also differs from the existing literature by accounting for the nonpecuniary aspects of the choice. Though the importance of nonprice determinants in the choice of majors has been highlighted in a few studies (Fiorito and Dauffenbach 1982; Easterlin 1995; Weinberger 2004), few studies have jointly modeled the pecuniary and nonpecuniary determinants of the choice (Arcidiacono 2004; Beffy, Fougere, and Maurel 2012).3 The approach in this paper allows me to quantify the contributions of both pecuniary and nonpecuniary outcomes to the choice. Moreover, the model is rich enough to explain gender differences in choices. Estimation of the choice model reveals that the most important outcomes in the choice of major are enjoying coursework, enjoying work at potential jobs, and gaining the approval of parents. Nonpecuniary outcomes explain about half of the choice behavior for males and more than three-fourths of the choice for females. Males and females have similar preferences at college, but differ in their preferences regarding the workplace: nonpecuniary outcomes in the workplace (enjoying work at the jobs, and reconciling work and family) matter a lot more to females than males.
On the methodology side, this paper adds to the recent literature on subjective expectations (see Manski 2004 for an overview of this literature). In the last decade or so, economists have increasingly undertaken the task of collecting and describing subjective data. Studies have shown that subjective data tend to be good predictors of behavior (Euwals et al. 1998; Hurd et al. 2004). Recently, expectations data have been employed to estimate decision models (van der Klaauw and Wolpin 2008; Delavande 2008; van der Klaauw 2012). The choice model used in this paper is motivated by the framework in Delavande (2008), which estimates a model of birth control choice for women. This paper contributes to this literature by providing an extensive description of students’ expectations about major-specific outcomes, and by using subjective expectations data to estimate a choice model. One recent paper that is closely related to the methodology used in this study is Arcidiacono, Hotz, and Kang (2012).4 In their sample of Duke University male undergraduates, they find that ability and earnings are the main determinants of college major choice.
Finally, this paper is related to the literature that focuses on the underlying reasons for the gender gap in science and engineering. For policy interventions, an important question is whether gender differences in choices are driven by differences in preferences or in beliefs. For example, if the gender gap existed because of gender differences in beliefs about ability and self-confidence, then policy interventions like single-sex classes could possibly reduce the gap. On the other hand, if gender differences in preferences explained the gender gap in college major choice, such a policy may not be very effective. Existing studies on schooling choices have either focused on gender differences in preferences (Daymont and Andrisani 1984), or gender differences in beliefs (Valian 1998; Weinberger 2004), but not both. The framework developed in this paper makes a clear distinction between preferences and beliefs.5 This allows me to decompose the gender gap in major choice into differences in beliefs and differences in preferences. First, I find that gender differences in beliefs about ability constitute a small and insignificant part of the gap. This implies that explanations based entirely on the assumption that women have lower self-confidence relative to men (Long 1986) can be rejected in my data. Gender differences in beliefs about earnings, and about reconciling work and family in engineering, explain about 1 percent of the gap. Second, the majority of the gender gap in majors that I consider can be explained by gender differences in beliefs about enjoying studying different fields, and differences in preferences. I simulate an environment in which the female subjective belief distribution about ability and future earnings is replaced with that of males; in the case of engineering, this reduces the gap by only about 15 percent (as opposed to a reduction in the gap of about 50 percent if the simulation is done on beliefs about enjoying coursework).
These results suggest that simply raising expectations for women in science, as claimed by Valian (1998), may not be enough, and that wage discrimination and underconfidence with regard to academic ability may not be the main reasons why women are less likely to major in science and engineering. Instead, changing females’ preferences may reduce this gap. The results in this paper would make a case for policy interventions that change attitudes about gender roles. One possible way to accomplish that is to implement policies that encourage females to enter science and engineering academia, since there is evidence that indicates that gender differences in preferences are mutable (Spencer et al. 1999), and that female professors may change female students’ beliefs and preferences toward science and engineering (Carrell, Page, and West 2010).
The paper is organized as follows. Section II outlines the choice model and the identification strategy. Section III describes the institutional setup of Weinberg College of Arts & Sciences, outlines the data collection methodology, and briefly describes the subjective data. Section IV outlines the econometric framework used for estimation. Section V presents the estimation results for the choice model. Section VI conducts analysis to understand the sources of gender differences in choice of major. Finally, Section VII discusses policy implications of the results and concludes.
II. Choice Model
At time t, individual i is confronted with the decision to choose a college major from her choice set Ci. Individuals are forward-looking, and their choice depends not only on the current state of the world but also on what they expect will happen in the future. Individual i derives utility Uit(a,c,Xit) from choosing a college major. Utility is a function of a vector of outcomes a that are realized in college, a vector of outcomes c that are realized after graduating from college, and individual characteristics Xit. I assume the vector a includes the outcomes:
a1 successfully completing (graduating) a field of study in four years
a2 graduating with a GPA of at least 3.5 in the field of study
a3 enjoying the coursework
a4 hours per week spent on the coursework
a5 parents approve of the major
while the vector c consists of:
c1 get an acceptable job immediately upon graduation
c2 enjoy working at the jobs available after graduation
c3 able to reconcile work and family at the available jobs
c4 hours per week spent working at the available jobs
c5 social status of the available jobs
c6 income at the available jobs
The rationale for choosing this particular set of outcomes is as follows: a1 and a2 are outcomes that capture ability in college, an important determinant of major choice (Altonji 1993; Arcidiacono 2004). Outcomes a4 and c4 are measures of effort an individual may have to exert in college and in the workplace, respectively. These are included since effort is an important input in the (education) production function (Stinebrickner and Stinebrickner 2008b), and because there is a tradeoff between leisure and effort. Outcomes a3 and c2 capture tastes for the environment and work in college and in the workplace, respectively. The motivation for including these is to directly incorporate nonprice aspects of major choice, which have been acknowledged to be important determinants of major choice (Arcidiacono 2004; Weinberger 2004). a5 is included because of anecdotal evidence that suggests that parents drive their children to choose certain majors (Lewin 2002). c1 and c6 are included to capture the monetary motivations for choosing a major (Bamberger 1986; Altonji 1993). c5 captures the social signaling value of choosing a particular major (Bagwell and Bernheim 1996). Finally, c3 is included because of gender differences in preferences for work flexibility, with women preferring careers in which labor force interruptions and reduced hours of work are less costly (Polachek 1981; Bertrand et al. 2010).6
Both vectors, a and c, are uncertain at time t; individual i possesses subjective beliefs Pikt(a,c) about the outcomes associated with choice of major k for all k ∈ Ci.7 If an individual chooses major m, then standard revealed preference argument (assuming that indifference between alternatives occurs with zero probability) implies that:
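$$\int U_{it}(a, c, X_{it}) \, dP_{imt}(a, c) > \int U_{it}(a, c, X_{it}) \, dP_{ikt}(a, c) \quad \forall k \in C_i, \; k \neq m \tag{1}$$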
The goal is to infer the preference parameters from observed choices. However, the expectations of the individual about the choice-specific outcomes are also unknown. One could infer the preference parameters by imposing assumptions on expectations. This is problematic because we do not know how individuals form expectations, particularly in the context of schooling choices. The information-processing rule has varied considerably among studies of schooling behavior, and most assume that individuals form their expectations in the same way.8 First, there is little reason to think that individuals form their expectations in the same way. Second, different combinations of preferences and expectations may lead to the same choice (Manski 2002). To illustrate this, let us assume that only two majors exist. Let us further assume that it is easier to get a college degree in the first major, but that it offers lower-paying jobs relative to the second major. An individual choosing the first major is consistent with the following two underlying states of the world: (1) she cares only about getting a college degree, or (2) she values only the job prospects but wrongly believes that the first major will get her a high-paying job. If one observes only the choice, then clearly one cannot discriminate between the two possibilities. To cope with the problem of joint inference on preferences and expectations, I elicit subjective probabilities directly from individuals (that is, Pikt(a,c) ∀k ∈ Ci, are elicited directly from the respondent). An additional advantage of this approach is that it allows me to account for the nonpecuniary determinants of the choice, data that are generally not available otherwise.
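The two-major illustration above can be made concrete with a short Python sketch. Everything in it is an assumption made for illustration: the outcome labels, the linear form of expected utility, and all numerical weights and beliefs are hypothetical and are not taken from the survey. The sketch shows only that two different preference and belief combinations can produce the same observed choice.

```python
# Hypothetical illustration of the joint-inference problem: two majors and
# two outcomes (finishing the degree, landing a high-paying job).
# All weights and beliefs below are made up for this sketch.

def expected_utility(weights, beliefs):
    """Linear expected utility: preference weight times subjective probability."""
    return sum(weights[outcome] * beliefs[outcome] for outcome in weights)

def chosen_major(weights, beliefs_by_major):
    """Return the major with the highest subjective expected utility."""
    return max(
        beliefs_by_major,
        key=lambda major: expected_utility(weights, beliefs_by_major[major]),
    )

# State (1): she cares only about getting a degree and holds accurate beliefs.
weights_1 = {"degree": 1.0, "high_pay": 0.0}
beliefs_1 = {
    "major_1": {"degree": 0.95, "high_pay": 0.30},
    "major_2": {"degree": 0.70, "high_pay": 0.80},
}

# State (2): she cares only about pay but wrongly believes major 1 pays well.
weights_2 = {"degree": 0.0, "high_pay": 1.0}
beliefs_2 = {
    "major_1": {"degree": 0.95, "high_pay": 0.90},
    "major_2": {"degree": 0.70, "high_pay": 0.80},
}

choice_1 = chosen_major(weights_1, beliefs_1)
choice_2 = chosen_major(weights_2, beliefs_2)
print(choice_1, choice_2)  # both states lead to the same observed choice
```

With choice data alone, the two states are observationally equivalent; once the beliefs are elicited, only the weights remain to be inferred.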
The exact utility specification is outlined in Section IV, which presents the econometric framework. I first describe the data collection methodology.
III. Data
To estimate the model of college major choice, one needs to elicit the subjective beliefs about the outcomes associated with a major, Pikt(a, c), for each major (∀k ∈ Ci) in individual i’s choice set. Since the range of majors available to students and institutional details vary considerably across colleges, one standard survey cannot be used to collect data in different settings. Therefore, in this paper, as a first step toward understanding how college majors are chosen and what explains the underlying gender differences, I focus on Northwestern University. I collect data on 161 Northwestern University students. This section describes the institutional details at Northwestern, the data collection method, and the nature of the subjective data.
A. Institutional Details
For the purposes of this study, I focus on students who are in the process of choosing a major but have not necessarily chosen one. There are several reasons for this criterion. First, students who are in the process of choosing a major are actively thinking about the occurrence of outcomes associated with the major, and hence their responses to subjective questions related to the choice of major are more likely to be meaningful. Second, interviewing students who have already chosen their major raises the issue of cognitive dissonance (Festinger 1957). More specifically, students who have already chosen their major could rationalize their choice of major by devaluing their beliefs for outcomes associated with the majors they considered but rejected, and upgrading their beliefs for outcomes associated with the major that they chose. This systematic measurement error in elicited subjective beliefs would be problematic, and plugging in such beliefs in Equation 1 would result in biased estimates of the preference parameters (see Zafar 2011a, for a discussion of this). Northwestern University requires students to declare their major by the end of their sophomore year. Surveying juniors and seniors would exacerbate issues arising from cognitive dissonance. On the other hand, freshmen may have little idea of what major they want to pursue when they first arrive in college, and may not have thought about the likelihood of the various outcomes conditional on the choice of major (that is, their beliefs may have greater measurement error). Therefore, in order to minimize the above-mentioned biases, I restrict my sample to Northwestern University sophomores.
The study is further restricted to schools at Northwestern University that accord students flexibility in choosing a major. For example, a student in the School of Journalism has to declare her major at the time of admission and can change her major only by a special request to the school. For such a student, the choice of college and major is jointly determined. Since I model the choice of major conditional on deciding to attend Northwestern University, such students are not eligible for the study. I further assume that the choice set for an individual is exogenous. This eliminates students in smaller schools at Northwestern since this assumption would have to be relaxed for them. Therefore, I restrict the study to the Weinberg College of Arts & Sciences (WCAS) at Northwestern. All sophomores with at least one major in the WCAS were eligible for the study.9
1. Choice Set
At the time of the study, WCAS offered a total of 41 majors. To estimate the choice model, one needs to elicit the subjective probabilities of the outcomes for each major in one’s choice set (that is, for the major that the individual is pursuing, as well as for all the other majors in the individual’s choice set). In order to limit the size of the choice set, I pool similar majors together. Table 1 shows the majors divided into various categories. Categories a through g span the majors offered in WCAS. Categories h through l span undergraduate majors offered by other schools at Northwestern University. There is a tradeoff between the number of categories and the length of the survey. This categorization is fairly fine and also seems reasonable. Also, since degree completion rates are very high at Northwestern University with more than 95 percent of students completing a bachelor’s degree in six years or fewer, it is assumed that dropping out of college is not an element of the choice set.
Table 1. List of Majors
For a student pursuing a single major in WCAS, it is assumed that her choice set includes all the categories that span WCAS majors (a–g), and category k (that is, the majors offered in the School of Engineering); this was done precisely to elicit subjective beliefs about the outcomes associated with majoring in engineering. Therefore, any student with a single major is assumed to have eight categories in her choice set.10
B. Data Collection
A sample of eligible sophomores and their email addresses was provided by the Northwestern Office of the Registrar. Students were recruited by email, and flyers were posted on campus.11 The emails and flyers explicitly asked for sophomores with an intended major in WCAS. Prospective participants were told that the survey was about the choice of college majors and that they would get $10 for completing the 45-minute electronic survey. It was emphasized that students need not have declared their majors to participate in the study. The survey was conducted from November 2006 to February 2007, which corresponds to the first half of the students’ sophomore year. Respondents were required to come to the Kellogg Experimental Laboratory to take the electronic survey.
A total of 161 WCAS sophomores were surveyed, of whom 92 were females. Table 2 shows how the characteristics of the sample compare with those of the sophomore class. The first column shows that the sample looks similar to the population in most aspects. However, a few differences stand out: (1) students of Asian ethnicity are overrepresented in my sample; (2) 56 percent of the survey takers had declared their major, whereas the corresponding number for the sophomore population was 47 percent;12 and (3) it seems that survey takers, especially male students, have higher GPAs than their population counterparts. Since the focus of the paper is on gender differences, and both males and females in my sample have higher GPAs on average, this should not bias the results in any obvious way. It is also clear that, as shown in Columns 2 and 3 of the table, male and female respondents in the sample are fairly similar along the various dimensions.13
Table 2. Sample Characteristics
Table A1 in the Appendix shows that the distribution of (intended) WCAS majors for sample respondents is similar to that of an earlier cohort. A few notable patterns stand out in the table: The proportion of males who (intend to) major in social sciences II (which includes economics) is twice the corresponding proportion of women both in my sample and in the graduating class of 2006. This pattern is reversed in the case of Social Sciences I and Literature and Fine Arts. The proportion of females who (intend to) major in literature and fine arts is more than three times the corresponding proportion of males.
In addition to stating their (intended) choice, respondents were also asked to rank the majors in their choice set. The exact question was: “Put yourself in the hypothetical situation where you have not yet chosen a field of study to major in. Rank the following fields of study according to how likely you think you will major in that field of study.” Majors were ranked on a 1–9 scale, where 9 denotes the most preferred major. Table A2 shows that the mean ranking of majors corresponds favorably to the distribution of (intended) majors in the sample (shown in Table A1). For example, Social Sciences I and Social Sciences II receive the highest mean ranking, and also are the most common majors in the sample; similarly, engineering, math, and computer science receive the lowest mean rank and are also the least commonly chosen majors in the sample. Table A2 also highlights the gender differences in major rankings. Consistent with gender differences in major choice documented in the literature as well as those in the sample (Table A1), male respondents, on average, rank math and computer science, social sciences II, and engineering significantly higher than their female counterparts, while the reverse is true for area studies and literature and fine arts. The large standard deviations indicate that there is substantial heterogeneity in my sample in preferences over these majors. However, the preference ranking data seem to be congruent with the stated choice data: in fact, for 151 of the 161 respondents, the stated (intended) major is reported as being the most or second most preferred major by the student.
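The congruence check between rankings and stated choices described above can be sketched in a few lines of Python. The respondent records below are hypothetical, the function name is an assumption, and ties in ranks are not handled; the sketch only illustrates the kind of tabulation involved, not the procedure actually used to produce the reported figure.

```python
# Hypothetical check of whether a stated (intended) major is among a
# respondent's two highest-ranked majors. On the survey scale, a higher
# rank means a more preferred major. All records below are made up.

def stated_major_in_top_two(rankings, stated_major):
    """rankings maps major category -> rank; ties are broken arbitrarily."""
    top_two = sorted(rankings, key=rankings.get, reverse=True)[:2]
    return stated_major in top_two

respondents = [
    ({"social sciences II": 9, "engineering": 4, "area studies": 7}, "social sciences II"),
    ({"social sciences II": 6, "engineering": 9, "area studies": 8}, "area studies"),
    ({"social sciences II": 9, "engineering": 8, "area studies": 3}, "area studies"),
]

congruent = sum(
    stated_major_in_top_two(rankings, stated) for rankings, stated in respondents
)
print(f"{congruent} of {len(respondents)} respondents are congruent")
```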
The 45-minute survey collected demographic and background information (including parents’ and siblings’ occupations and college majors, source of college funding, etc.), and data relevant for the estimation of the choice model.
C. Subjective Data
The subjective beliefs, Pikt(a, c) ∀k ∈ Ci, are elicited directly from the respondent for the set of outcomes in vectors a and c, described in Section II. Note that {ar}r={1,2,3,5} and {cq}q={1,2,3} are binary, while outcomes a4 and {cq}q={4,5,6} are continuous. For each major in the individual’s choice set, the survey elicited the probability of the occurrence of the binary outcomes, that is, Pikt(ar = 1) for r = {1, 2, 3, 5}, and Pikt(cq = 1) for q = {1, 2, 3}. Expected values were elicited for the continuous outcomes, that is, Eikt(a4) and Eikt(cq) for q = {4, 6}.14
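One way to organize the elicited beliefs for a single respondent and major category is sketched below in Python. The class name, field names, and example numbers are hypothetical and are not part of the survey instrument; the sketch assumes that binary outcomes are recorded as percent chances on a 0 to 100 scale and that continuous outcomes are recorded as expected values. Outcome c5 is left out because the passage above lists expected values only for a4, c4, and c6.

```python
from dataclasses import dataclass

# Outcome labels follow Section II; the container itself is an assumption.
BINARY_OUTCOMES = ("a1", "a2", "a3", "a5", "c1", "c2", "c3")
EXPECTED_VALUE_OUTCOMES = ("a4", "c4", "c6")

@dataclass
class MajorBeliefs:
    """Beliefs of one respondent about one major category (hypothetical layout)."""
    major: str
    percent_chance: dict   # binary outcome label -> percent chance in [0, 100]
    expected_value: dict   # continuous outcome label -> elicited expected value

    def probabilities(self):
        """Convert percent-chance answers to probabilities, validating the range."""
        probs = {}
        for name in BINARY_OUTCOMES:
            pct = self.percent_chance[name]
            if not 0 <= pct <= 100:
                raise ValueError(f"{name}: percent chance {pct} outside [0, 100]")
            probs[name] = pct / 100
        return probs

# Made-up answers for illustration only.
example = MajorBeliefs(
    major="engineering",
    percent_chance={"a1": 85, "a2": 60, "a3": 40, "a5": 90, "c1": 80, "c2": 50, "c3": 45},
    expected_value={"a4": 25.0, "c4": 50.0, "c6": 70000.0},
)
print(example.probabilities()["a2"])
```

Storing one such record per major category in the choice set gives the inputs needed to evaluate expected utility for every alternative.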
Questions eliciting the subjective probabilities of major-specific outcomes are based on the use of percentages. An advantage of asking probabilistic questions relative to approaches that employ a Likert-scale or a simple binary response (yes / no; true / false) is that responses are interpersonally comparable, more informative, and allow the respondent to express uncertainty (Juster 1966; Manski 2004).15 As is standard in studies that collect subjective data, a short introduction (similar to the one in Delavande 2008) was read and handed to the respondents at the start of the survey. Respondents had to answer two practice questions before starting the survey to make sure they understood how to answer questions based on the use of percentages. To give a sense of how probabilistic beliefs were elicited, the following question was used to elicit the belief for the binary outcome a2:
“If you were majoring in [X], what do you think is the percent chance that you will graduate with a GPA of at least 3.5 (on a scale of 4)?”16
Wording for the question that elicited expected income was similar to that in Dominitz and Manski (1996). In addition, the subjective beliefs of being active in the full-time labor force at the age of 30 and 40, and E(Y0), the expected income at the age of 30 if one were to drop out of college were also elicited. The questions eliciting beliefs about major-specific outcomes can be viewed in Section A1 of the Appendix. The 15 questions that elicit beliefs about major-specific outcomes were asked for each major category in the student’s choice set.
D. The Data
The use of subjective data in economics is fairly recent, and there is a genuine interest in a descriptive analysis of such data. However, since the main goal of the paper is to understand how expectations combine with preferences to explain the underlying gender gap in college majors, I do not undertake the task of describing the data in detail here. Interested readers are referred to Zafar (2011a), which analyzes the data in detail. Employing the data used in this paper as well as data from a followup survey administered to a sample of the same respondents, it analyzes the extent to which various cognitive biases (cognitive dissonance, insufficient mental effort, undefined expectations) may affect subjective data and overall finds little evidence of such biases. Given the large proportion of respondents who had already declared their major at the time of the survey, a bias that would be of particular concern is cognitive dissonance / ex-post rationalization (Bertrand and Mullainathan 2001; Bound, Brown, and Mathiowetz 2001; Benitez-Silva et al. 2004). For such respondents, this bias would imply that they report expectations that rationalize their choice, that is, they upgrade (devalue) beliefs for outcomes associated with their chosen (not chosen) majors. Then, using such data in choice models would yield estimates of preference parameters that are upward-biased. By analyzing revisions in beliefs for the various outcomes across the various majors, Zafar (2011a) does not find systematically different revision patterns for respondents who had declared their major(s) and those who had not. Therefore, endogeneity of beliefs does not seem to be a concern in this setting, where choice of major was easily reversible when students were first surveyed.17
I next describe the responses to two representative questions. Table 3 presents the gender-specific subjective belief distribution of graduating with a GPA of at least 3.5 in engineering and in literature and fine arts. The table shows that respondents use the entire scale from zero to 100. There is substantial heterogeneity in beliefs both within and between genders. About 60 percent of males and only 30 percent of females think that the percent chance of graduating with a GPA of at least 3.5 in engineering is greater than 50 percent. On the other hand, nearly 95 percent of males and 90 percent of females believe that they would be able to graduate with a GPA of at least 3.5 with a probability of more than 0.5 in literature and fine arts. This is consistent with the fact that it’s harder to do well in engineering than in literature and fine arts; average GPA of Northwestern engineering graduates of 2006 was 3.43, while that of literature and fine arts was 3.56. Whereas the gender-specific belief distributions are similar for literature and fine arts, that is not the case for engineering: The male belief distribution of graduating with a GPA of at least 3.5 in engineering first order stochastically dominates the corresponding female distribution, suggesting that females are less confident than men in their ability in engineering.
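First-order stochastic dominance between two empirical belief distributions can be checked directly, as the Python sketch below illustrates. The two samples are hypothetical percent-chance answers, not the survey responses, and the function names are assumptions. Distribution A dominates distribution B when the cumulative distribution function of A lies at or below that of B everywhere and strictly below it somewhere.

```python
# Hypothetical check of first-order stochastic dominance (FOSD) between two
# samples of percent-chance answers. The data below are made up.

def empirical_cdf(sample, x):
    """Share of observations in the sample that are less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)

def first_order_dominates(sample_a, sample_b):
    """True if A dominates B: F_A(x) <= F_B(x) for all x, strictly for some x.

    Empirical CDFs are step functions that change only at observed values,
    so comparing them on the pooled set of observed values is sufficient.
    """
    grid = sorted(set(sample_a) | set(sample_b))
    gaps = [empirical_cdf(sample_b, x) - empirical_cdf(sample_a, x) for x in grid]
    return all(gap >= 0 for gap in gaps) and any(gap > 0 for gap in gaps)

male_beliefs = [40, 55, 60, 70, 80, 90]      # hypothetical answers
female_beliefs = [20, 30, 45, 50, 65, 75]    # hypothetical answers

print(first_order_dominates(male_beliefs, female_beliefs))
print(first_order_dominates(female_beliefs, male_beliefs))
```

In this made-up example the first sample places more mass on high percent chances at every threshold, which is the pattern the text describes for engineering.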
Table 3. Percent Chance of Graduating with a GPA of at least 3.5 if Majoring in Engineering and in Literature and Fine Arts
The top panel of Figure 3 presents the gender-specific distribution of beliefs for being able to reconcile work and family in social sciences II, while the lower panel shows the corresponding distributions in engineering. The male and female belief distributions appear similar in the case of social sciences II, suggesting that males and females perceive similar likelihood of being able to reconcile work and family when employed in jobs available in social sciences II. However, in the case of engineering, the male belief distribution of reconciling work and family first order stochastically dominates the corresponding female distribution. For example, about half of the males believe that the probability of being able to reconcile family and work in jobs in engineering is greater than 0.75, while less than 20 percent of females expect that to be the case.
Gender-specific Distributions of Beliefs of Reconciling Work and Family at Jobs
The different gender-specific belief distributions underscore the heterogeneity in beliefs between the two genders, and the substantial heterogeneity in beliefs (both within and between genders) questions the accuracy of restrictions imposed on expectations in the literature. Responses to these questions are consistent with evidence that women tend to be less confident than men (Weinberger 2004; Niederle and Vesterlund 2007), and that women do not choose certain fields because of the lack of flexibility of associated jobs (Bertrand et al. 2010). However, without estimating the choice model, it would be premature to conclude that these gender differences in beliefs explain the gender gap in choice of college major. Moreover, the purpose of estimating a choice model is to quantify the importance of these differences.
IV. Econometric Model
This section outlines the econometric framework. I focus on the model for single major choice only.18
Recall that utility, Uit(a, c, Xit), is a function of a 5 × 1 vector of outcomes a realized in college, a 6 × 1 vector of outcomes c realized after graduating from college, and individual characteristics Xit. The individual maximizes her current subjective expected utility19; she chooses major m at time t if: m = argmaxk∈Ci ∫Uit(a, c, Xit)dPikt(a, c). As explained in section III.C, the outcomes {ar}r={1,2,3,5} and {cq}q={1,2,3} are binary, while outcomes a4, and {cq}q={4,5,6} are continuous. I change the notation slightly and define b to be a 7 × 1 vector of all binary outcomes, that is, b = {a1, a2, a3, a5, c1, c2, c3}, and d to be a 4 × 1 vector of all continuous outcomes, that is, d = {a4, c4, c5, c6}.20 The utility can now be written as a function of outcomes b, d, and characteristics Xit. Since it would be difficult to elicit the joint probability distribution Pikt(b, d), I assume that utility is additively separable in the outcomes:
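$$U_{ikt}(b, d, X_{it}) = \sum_{r=1}^{7} u_r(b_r, X_{it}) + \sum_{q=1}^{4} \gamma_q(X_{it})\, d_q + \varepsilon_{ikt} \tag{2}$$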
where ur(br, Xit) is the utility associated with the binary outcome br for an individual with characteristics Xit, γq(Xit) is a constant for the continuous outcome dq for an individual with characteristics Xit, and εikt is a random term. The utility is the same for all individuals with identical observable characteristics Xit and identical realizations of outcomes (b,d), up to the random term. Equation 1 can now be written as:
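$$m = \arg\max_{k \in C_i} \left\{ \sum_{r=1}^{7} \int u_r(b_r, X_{it})\, dP_{ikt}(b_r) + \sum_{q=1}^{4} \gamma_q(X_{it}) \int d_q\, dP_{ikt}(d_q) + \varepsilon_{ikt} \right\}$$
An individual i with subjective beliefs {Pikt(br), Pikt(dq)} for r ∈ {1, .., 7}, q ∈ {1, .., 4} and ∀k ∈ Ci chooses major m at time t with probability:
$$\Pr\left(m \mid X_{it}, \{P_{ikt}(b_r), P_{ikt}(d_q)\}\right) = \Pr\Bigg( \sum_{r=1}^{7} \int u_r(b_r, X_{it})\, dP_{imt}(b_r) + \sum_{q=1}^{4} \gamma_q(X_{it}) \int d_q\, dP_{imt}(d_q) + \varepsilon_{imt} > \sum_{r=1}^{7} \int u_r(b_r, X_{it})\, dP_{ikt}(b_r) + \sum_{q=1}^{4} \gamma_q(X_{it}) \int d_q\, dP_{ikt}(d_q) + \varepsilon_{ikt} \;\; \forall k \in C_i,\ k \neq m \Bigg) \tag{3}$$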
For the binary outcomes in b, dPimt(br) = Pimt(br = 1) if br = 1, and dPimt(br) = Pimt(br = 0) otherwise, for r ∈ {1, .., 7}. The likelihood of br equalling one (Pikt(br = 1)) is elicited directly from the respondents for ∀r ∈ {1, .., 7} and ∀k ∈ Ci. Then ∫ ur(br, Xit)dPimt(br) is equivalent to Pimt(br = 1)ur(br = 1, Xit) + [1 – Pimt(br = 1)] ur(br = 0, Xit). I define Δur(Xit) ≡ ur(br = 1, Xit) – ur(br = 0, Xit), that is, it is the difference in utility between outcome br happening and not happening for an individual with characteristics Xit. For the continuous outcomes in d, instead of the probability distribution, the expected value of the outcome Eikt(dq) = ∫ dqdPikt(dq) is elicited ∀q ∈ {1, .., 4}.21,22 Then the expected utility that individual i derives from choosing major m at time t is:
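$$\int U_{imt}(b, d, X_{it})\, dP_{imt}(b, d) = \sum_{r=1}^{7} \left[ P_{imt}(b_r = 1)\, \Delta u_r(X_{it}) + u_r(b_r = 0, X_{it}) \right] + \sum_{q=1}^{4} \gamma_q(X_{it})\, E_{imt}(d_q) + \varepsilon_{imt} \tag{4}$$
{Δur(Xit)}r∈{1,..,7} and {γq(Xit)}q∈{1,..,4} are the parameters to be estimated; Δur(Xit) is the change in utility from the occurrence of outcome br for an individual with characteristics Xit, while γq(Xit) is the parameter in the utility function for the continuous outcome dq for an individual with characteristics Xit. The subjective probabilities Pikt(br = 1), the expected values Eikt(dq), and Eikt(I) ∀k ∈ Ci are elicited directly from the respondent. In order to ensure strict preferences between choices, {εikt} are assumed to have a continuous distribution. The exact parametric restrictions on the random terms required for identifying the model parameters are discussed next.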
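To make the mapping from elicited beliefs to expected utility concrete, the sketch below computes the deterministic part of expected utility for each major in a choice set. It is an illustration under stated assumptions rather than the paper's estimation code: the function name, the major labels, and all numbers are hypothetical, the example uses two binary outcomes and one continuous outcome instead of seven and four, and the term that is common to all majors (the utility when each binary outcome does not occur) is omitted because it does not affect the comparison across majors.

```python
def expected_utility(p_binary, e_cont, delta_u, gamma):
    """Deterministic part of expected utility for one major.

    p_binary: elicited probabilities P(b_r = 1), one per binary outcome
    e_cont:   elicited expected values E(d_q), one per continuous outcome
    delta_u:  utility differences Delta u_r (hypothetical values here)
    gamma:    coefficients on the continuous outcomes (hypothetical values here)
    """
    binary_part = sum(p * du for p, du in zip(p_binary, delta_u))
    continuous_part = sum(e * g for e, g in zip(e_cont, gamma))
    return binary_part + continuous_part


# Hypothetical beliefs of one respondent for three majors:
# (probabilities of two binary outcomes, expected value of one continuous outcome).
beliefs = {
    "major_A": ([0.9, 0.6], [1.0]),
    "major_B": ([0.5, 0.8], [2.0]),
    "major_C": ([0.3, 0.4], [3.0]),
}
delta_u = [2.0, 1.0]  # assumed preference parameters, for illustration only
gamma = [0.1]

utilities = {m: expected_utility(p, e, delta_u, gamma) for m, (p, e) in beliefs.items()}
best = max(utilities, key=utilities.get)
print(utilities, best)
```

Absent the random term, the respondent in this example would choose the major with the highest value in `utilities`; the random terms make the choice probabilistic from the econometrician's point of view.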
V. Choice Model Estimation
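I initially assume that the utility function does not depend on individual characteristics, that is, the utility function for the binary outcomes ur(br, Xit) and the coefficients on continuous outcomes do not depend on individual characteristics, Xit. I also drop the time subscript in the analysis that follows. Then using Equation 4, Equation 3 becomes:
$$\Pr\left(m \mid \{P_{ik}(b_r), E_{ik}(d_q)\}\right) = \Pr\Bigg( \sum_{r=1}^{7} P_{im}(b_r = 1)\, \Delta u_r + \sum_{q=1}^{4} \gamma_q\, E_{im}(d_q) + \varepsilon_{im} > \sum_{r=1}^{7} P_{ik}(b_r = 1)\, \Delta u_r + \sum_{q=1}^{4} \gamma_q\, E_{ik}(d_q) + \varepsilon_{ik} \;\; \forall k \in C_i,\ k \neq m \Bigg) \tag{5}$$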
Under the assumption that the random terms {εikt} are independent for every individual i and choice k, and that they have a Type I extreme value distribution, {εikt – εimt} has a standard logistic distribution. As mentioned earlier, respondents were asked to rank the majors in their choice set in addition to stating their (intended) choice. The stated preference data provide more information that can be used for estimation of the model parameters.23,24 Under the assumptions of standard logit, the probability of any ranking of alternatives can be written as a product of logits. For example, consider the case where an individual’s choice set is {a, b, c, d}. Suppose she ranks the alternatives b, d, c, a from best to worst. Under the assumption that the εik’s are iid and Type I distributed, the probability of observing this preference ordering can be written as the product of the probability of choosing alternative b from {a, b, c, d}, the probability of choosing d from {a, c, d}, and the probability of choosing c from the remaining {a, c}. If Uij = βxij + εij denotes the utility i gets from choosing j for j ∈ {a, b, c, d}, then the probability of observing b ≻ d ≻ c ≻ a is simply (Luce and Suppes 1965):
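$$\Pr(b \succ d \succ c \succ a) = \frac{\exp(\beta x_{ib})}{\sum_{j \in \{a, b, c, d\}} \exp(\beta x_{ij})} \cdot \frac{\exp(\beta x_{id})}{\sum_{j \in \{a, c, d\}} \exp(\beta x_{ij})} \cdot \frac{\exp(\beta x_{ic})}{\sum_{j \in \{a, c\}} \exp(\beta x_{ij})} \tag{6}$$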
This expression follows from the Independence of Irrelevant Alternatives (IIA) property embedded in the Type I extreme value distribution assumption. The elicited subjective probabilities, {Pik(br = 1)}, and elicited expected values, {Eik(dq)}, described in Section III are used in estimation. The parameters of interest are {Δur} and {γq}, and they are identified under these parametric assumptions. Column 1 of Table 4 presents the maximum-likelihood estimates using stated preference data. The relative magnitudes of the estimated Δur show the importance of the binary outcomes in the choice. The difference in utility levels is largest and positive for enjoying coursework. Enjoying work at the jobs and gaining approval of parents are significant determinants of major choice: Both coefficients are about one-half that of enjoying coursework. Graduating with a GPA of at least 3.5 and status of the jobs are also significant determinants in the choice. The difference in utility levels for other binary outcomes is not significantly different from zero. The coefficient on income is positive but not significantly different from zero, suggesting that it is not important in the choice.
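The product-of-logits form of the ranking probability can be sketched in a few lines. The code below is an illustrative implementation of the standard rank-ordered logit, not the author's estimation routine: the function name and the utility values are hypothetical, and only the log-probability of one respondent's ranking is computed. A likelihood would sum such terms over respondents.

```python
import math


def ranking_log_probability(v, ranking):
    """Log-probability of a complete ranking under the rank-ordered logit.

    v:       dict mapping each alternative to its deterministic utility
    ranking: alternatives ordered from best to worst
    """
    remaining = list(ranking)
    logp = 0.0
    # The worst alternative is "chosen" from a singleton set with probability one.
    while len(remaining) > 1:
        top = remaining[0]
        vmax = max(v[k] for k in remaining)  # subtract the max for numerical stability
        log_denom = vmax + math.log(sum(math.exp(v[k] - vmax) for k in remaining))
        logp += v[top] - log_denom
        remaining.pop(0)  # the chosen alternative leaves the choice set
    return logp


# Hypothetical deterministic utilities for the choice set {a, b, c, d}.
v = {"a": 0.2, "b": 1.0, "c": 0.5, "d": 0.8}
logp = ranking_log_probability(v, ["b", "d", "c", "a"])
print(math.exp(logp))
```

With equal utilities for all four alternatives, every ranking has probability 1/4 × 1/3 × 1/2 = 1/24, which is a convenient check on the implementation.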
Single Major Choice- Estimation of Homogeneous Preferences
Column 2 of Table 4 presents the estimates using stated choice data. As with the case of estimates using preference data (Column 1 of the table), the difference in utility levels is largest and positive for enjoying coursework. As with preference data, gaining approval of parents, enjoying working at the jobs, and social status of the jobs are significant determinants of the choice. However, graduating within four years is now also a significant determinant of the choice. The coefficient on income is negative but not statistically different from zero. In short, the set of outcomes that are significant determinants of the choice are largely similar when using either preference data or choice data.
I next allow the utility function for the binary outcomes and the coefficients on continuous outcomes to vary by one particular individual characteristic of interest: gender.25 Columns 1 and 2 of Table 5 present the maximum-likelihood estimates based on Equation 6 for the male and female subsamples, respectively. For both genders, the difference in utility levels is largest and positive for enjoying coursework. For males, the second most important outcome is the social status of the jobs: A unit increase in the status of the jobs changes the utility by as much as a 6 percent increase in the probability of enjoying coursework. Approval of parents is the third most important outcome for males. For females, enjoying work at the jobs is the second most important outcome. Two other important outcomes for females are gaining approval of parents and graduating with a GPA of at least 3.5. Both have a positive coefficient that is about one-third the magnitude of the coefficient on enjoying coursework. The last two columns show the model estimates using choice data. Results are qualitatively similar to those using preference data for male students. However, for females, while outcomes that were significant using preference data (enjoying coursework, parents’ approval, and enjoying work at the jobs) continue to be significant, graduating within four years is now also a significant outcome. In fact, the coefficient on this outcome is twice the magnitude of the coefficient on the next most important outcome, enjoying coursework. Therefore, the positive coefficient on graduating within four years for the full sample (in Column 2 of Table 4) is driven by the female respondents in the sample. Regardless of whether the model is estimated using preference or choice data, the coefficient on income at age 30 is indistinguishable from zero for both genders.
Major Choice- Estimation of Gender-Specific Preferences
In order to get a measure of the magnitude of the estimated parameters, the natural thing would be to do willingness-to-pay calculations, that is, translate the differences in utility levels into the amount of earnings that an individual would be willing to forgo at the age of 30 in order to experience that outcome.26 However, since expected income at age 30 is not significant, the standard errors on such calculations are huge, and the results are not very meaningful. Instead of presenting the willingness-to-pay calculations, I outline a different decomposition method to gain insight into the relative importance of the various outcomes in the choice. For illustration, suppose that Pr(choice = j) = F(Xjβ) and that X includes two variables, X1 and X2. Given the parameter estimates, $\hat{\beta}_1$ and $\hat{\beta}_2$, the contribution of X1 to the choice is defined as:
$$M_{X_1} = \frac{1}{N} \sum_{i=1}^{N} F\left(X_{1ij}\hat{\beta}_1 + X_{2ij}\hat{\beta}_2\right) - \frac{1}{N} \sum_{i=1}^{N} F\left(X_{2ij}\hat{\beta}_2\right) \tag{7}$$
where the first term is the average probability of majoring in choice j as predicted by the model, and the second term is the average predicted probability of majoring in j if outcome X1 were not considered. The difference in the two terms is a measure of the importance of X1 in the choice. The relative contribution of X1 to the choice is then RX1 = MX1 / (MX1 + MX2). Multiple parameters can be set to zero simultaneously to quantify their joint contribution to the choice. However, since the model is not linear, generally MX1+X2 ≠ MX1 + MX2. Table 6 presents the results of this decomposition strategy using the estimates from Table 4. Each cell shows the relative contribution (R) of the outcome to the choice; bootstrap standard errors based on 1,000 repetitions are also reported in parentheses in the table. Column 1 in Panel A of Table 6 shows the decomposition results for the pooled sample using estimates based on preference data. Nearly three-fourths of the choice is driven by the nonpecuniary outcomes.27 Once the decomposition is made finer in Panel B, one can see that gaining parents’ approval and enjoying coursework jointly explain about 45 percent of the choice. Pecuniary outcomes associated with college (hours per week spent on coursework, graduating with a GPA of at least 3.5, and graduating in four years) and workplace (finding a job upon graduation, hours per week spent at work, income at the age of 30, and the social status of the jobs) each account for about 20 percent of the choice. Results are qualitatively similar in Column 2 which uses estimates based on choice data: for example, nonpecuniary outcomes explain about 70 percent of the choice now, compared to 74 percent when using preference data.
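The decomposition can be illustrated with a small numerical sketch. The code below is a simplified example under stated assumptions, not the computation behind Table 6: it takes F to be the logit choice probability of a single alternative j, uses hypothetical data for two respondents, three majors, and two outcomes, and uses hypothetical coefficient values. All function names are illustrative.

```python
import math


def choice_probs(x_i, beta):
    """Logit probabilities over alternatives for one individual.
    x_i holds one attribute vector per alternative."""
    v = [sum(b * x for b, x in zip(beta, xk)) for xk in x_i]
    vmax = max(v)
    ev = [math.exp(vk - vmax) for vk in v]
    total = sum(ev)
    return [e / total for e in ev]


def avg_prob(X, beta, j):
    """Average predicted probability of alternative j across individuals."""
    return sum(choice_probs(x_i, beta)[j] for x_i in X) / len(X)


def contribution(X, beta, j, drop):
    """M for the outcomes indexed by `drop`: average predicted probability of j
    minus the same average with those coefficients set to zero."""
    restricted = [0.0 if p in drop else b for p, b in enumerate(beta)]
    return avg_prob(X, beta, j) - avg_prob(X, restricted, j)


def relative_contributions(X, beta, j):
    """R for each outcome: its M divided by the sum of the single-outcome M's."""
    m = [contribution(X, beta, j, {p}) for p in range(len(beta))]
    return [mp / sum(m) for mp in m]


# Hypothetical beliefs: 2 respondents x 3 majors x 2 outcomes (NOT the survey data).
X = [
    [[0.9, 0.7], [0.4, 0.5], [0.2, 0.3]],
    [[0.8, 0.6], [0.5, 0.6], [0.3, 0.2]],
]
beta = [2.0, 1.0]  # assumed coefficients, for illustration only

m1 = contribution(X, beta, 0, {0})
m2 = contribution(X, beta, 0, {1})
m_joint = contribution(X, beta, 0, {0, 1})
print(relative_contributions(X, beta, 0), m1 + m2, m_joint)
```

In this example the joint contribution differs from the sum of the two single-outcome contributions, which reflects the nonlinearity noted in the text.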
Decomposition Analysis
Columns 1a and 1b of Table 7 show the decomposition results using the preference-based estimates from the male and female subsamples, respectively. Nonpecuniary outcomes explain nearly 55 percent of the choice for males, but more than 85 percent of the choice for females. Gaining parents’ approval and enjoying coursework are the most important outcomes for both females and males. However, males and females differ in their preferences for outcomes in the workplace: Relative to the nonpecuniary outcomes in the workplace (reconciling work and family, and enjoying work at the job), males value the pecuniary aspects of the workplace substantially more than females do. For males, pecuniary aspects of the workplace explain as much as four times as much of the choice as nonpecuniary aspects of the workplace do; for females, the two are equally important. The conclusions are similar when we look at the decomposition results based on choice data in Columns 2a and 2b of the table.
Decomposition Analysis, by Gender
Since results are qualitatively similar when using either preference or choice data and because preference ranking data contain more information than simple choice data, in the analysis that follows, I primarily report and discuss results based on preference data.
A. Robustness Checks
A large proportion of respondents had declared their major at the time of the survey. As alluded to in Section III.D, their beliefs are not found to be biased. However, their preferences may still be different because of some (cognitive) biases. Column 3 of Table 4 shows the model estimates for this subsample only, while Column 3 of Table 6 shows the decomposition results based on these estimates. The parameter estimates and decomposition results are virtually identical to those for the full sample, suggesting that the concern of cognitive dissonance does not seem to apply to the sample.28
As an additional robustness check, since the sample is drawn from the College of Arts and Sciences (WCAS), I reestimate the model under the assumption that the choice set includes majors offered in WCAS only, that is, I drop individual-major observations for all non-WCAS majors (including engineering). Model estimates based on preference and choice data, using only WCAS majors as being in the choice set, are reported in Columns 4 and 5 of Table 4. As can be seen, the estimates are similar to those in Columns 1 and 2 of the table, indicating that including non-WCAS majors in the choice set does not bias the model estimates. Decomposition results based on these estimates, reported in Columns 4 and 5 of Table 6, are qualitatively similar to those obtained by keeping the non-WCAS majors in the choice set, reported in Columns 1 and 2 of Table 6. Gender-specific model estimates using the restricted choice set yield decomposition results that are qualitatively similar to those using the unrestricted choice set (comparing Columns 3a and 3b with Columns 1a and 1b respectively in Table 7).
As a further robustness check, the preference parameters in the model are also estimated excluding respondents who are pursuing more than one major. The results remain qualitatively the same.
These estimation results are based on the assumption that the random terms {εik} are independent for every individual i and choice k. I relax this assumption and also estimate a mixed logit model which allows for unobserved heterogeneity in preferences for some outcomes (as in Revelt and Train 1998). Though the estimates show that there is substantial heterogeneity in the preferences for several outcomes, the relative magnitudes of the estimates are similar to those here (results available from the author upon request).
1. Understanding the Unimportance of Future Income in the Choice of Majors
While pecuniary aspects of the workplace are a significant determinant of males’ choices, it is rather surprising that income is not significant for either males or females in Table 5.
One possibility is that the variable “social status of the jobs” may be picking up the effect of income. The Spearman correlation between the two variables is significant but only 0.21, and estimating the model excluding social status does not result in the coefficient on income becoming significant for either gender.
Another possible reason for the insignificance of income could be that students are not aware of earnings differences across majors. Table 8 presents the average and median beliefs of the respondents. Individuals majoring in a field may have better information about their chosen field and may have beliefs different from those of individuals not majoring in it; therefore, I split survey responses by whether the respondent intends to major in the category about which the question is asked. Because Northwestern University does not follow its alumni, I use the 2003 average annual salaries for 1993 college graduates from selective colleges in the Baccalaureate & Beyond Longitudinal Study (B&B: 1993 / 2003) for comparison purposes.29 These statistics are presented in Columns 1 and 2 of Table 8. The average and median beliefs of respondents majoring in the field are similar to those who do not major in that field. Survey respondents, both males and females, seem to be aware of income differences across majors (Columns 3–6). However, both report median and average salaries larger than those for the B&B sample.30 Though the descriptive analysis of respondents’ expectations of income in different majors in Table 8 indicates that students are aware of the income differences across majors, the variation in their subjective responses is much larger than in the actual data (for males in particular). This indicates that the insignificance of income might be driven by the noise in the reported expectations. To mitigate this, the model is estimated using the ordinal ranking of income (instead of expected income). The coefficient on (ranked) income is now significant for the males, but continues to be insignificant for females. The 90 percent confidence interval of Rγ4 is [3.8 percent, 29.2 percent] for males. 
The overall contribution of income and social status, however, does not change since ranked income picks up a substantial part of the contribution of status toward the choice (ranked income and status are highly correlated). Therefore, none of the conclusions change. However, this seems to suggest that income is a significant determinant for males at least.
Expected Annual Earnings at the Age of 30
Another concern might be that the sample contains few students choosing high-paying majors, and that is driving the result. That, however, is not the case: As shown in Table 8, students perceive social sciences II and natural sciences to be the highest paying majors, and Table A1 shows that nearly half of the sample intends to major in either of these categories.
Finally the insignificance of income in the choice could in part be due to the risk-neutrality assumption embedded in the model specification. This assumption was made so that it would suffice to elicit the expected value for the continuous outcomes. In the absence of this assumption, I would have had to elicit multiple points on the subjective income distribution for each major in one’s choice set (as in Dominitz and Manski 1996), which would not have been feasible for the purposes of this study. Because several studies have concluded that women are more risk averse than men in their choices (Eckel and Grossman 2008; Croson and Gneezy 2009), results in the current study regarding gender differences in income preferences could be a consequence of the risk-neutrality assumption. To test for this, I estimate an alternative model where the utility function in Equation 2 is modified such that it continues to be separable in all outcomes, and linear in all outcomes excluding income, but CRRA in income at age 30. To estimate the model, I use the population variation in major-specific earnings.31 Consistent with the literature, the model estimates do show that women are more risk averse than men. Column 6 of Table 6 shows the decomposition results for this alternate specification. 25.8 percent of the choice is now explained by pecuniary attributes, compared to 24.95 percent when using the linear specification, as shown in Column 1 of the table. The estimates are not statistically different. As can be seen in Panel C of the table, the contribution of age 30 income only increases slightly from 3.76 percent to 4.06 percent. Columns 4a and 4b of Table 7 report the gender-specific decomposition results for this alternate model. When compared to the corresponding estimates in Columns 1a and 1b, the relative contribution of pecuniary attributes remains similar. 
Panel C of the table shows that assuming risk-neutrality understates the relative importance of income at age 30 for women, and overstates it for men. Given the higher estimated concavity of the utility function for women, this should not be surprising. Overall, it seems that assuming risk neutrality does not introduce any significant bias in the model estimates and the relative contribution of the various attributes, and that the main conclusions remain unchanged. Therefore, in the next section, which investigates gender differences, I maintain the risk-neutrality assumption and use the estimates obtained from the preference ranking data.
VI. Understanding Gender Differences
Section V shows that males and females primarily differ in their preferences for the workplace outcomes, with females valuing the nonpecuniary aspects of the job (reconciling work and family, and enjoying work) relatively more than males. Section III.D highlights gender differences in beliefs for a few outcomes in certain fields of study. These results would suggest that gender differences in preferences and / or beliefs about reconciling work and family may be one reason why females do not choose engineering—this would make a convincing case for programs that provide support to female students and faculty in STEM (science, technology, engineering, and math) careers to balance work and family in different forms, such as the NIH Science award, the Association for Women in Science Education fellowship, and the Sloan Foundation pretenure leave program. However, since reconciling work and family is one among many determinants of major choice, without undertaking a decomposition analysis, it is not possible to know how important beliefs and preferences about this particular outcome are in explaining the gender differences in college majors. It is also not clear how much of the gender gap in the choice of college majors is driven by differences in preferences and how much is due to differences in distributions of subjective beliefs. This distinction is important, because males and females identical in their preferences will make different career choices if there are gender differences in beliefs about success in different occupations (Breen and Garcia-Penalosa 2002). Therefore, any policy recommendations will depend on whether the gender gap exists because of innate differences or because of social biases and discrimination. For example, if the gender gap existed because of gender differences in beliefs about ability and self-confidence, then policy interventions like single-sex classes could possibly reduce the gap. 
Similarly, if the gender gap existed because of females underestimating returns to certain lucrative majors (and not because of differences in preferences for the pecuniary returns), then information campaigns targeted at disseminating information about returns to different college majors may lead to a reduction in the gender gap. On the other hand, if differences in preferences are found to be the main contributing factor behind the gender gap, the relevant policy interventions would be less obvious. In this section, I delve into the underlying causes for the gender gap in more detail.
A common way to explore differences between groups (in this case, the two genders) in a linear framework is to express the difference in the average predicted value of the dependent variable as:
$$\bar{Y}^{M} - \bar{Y}^{F} = \left(\bar{X}^{M} - \bar{X}^{F}\right)\hat{\beta}^{M} + \bar{X}^{F}\left(\hat{\beta}^{M} - \hat{\beta}^{F}\right)$$
where $\bar{Y}^{j}$ is the average predicted value of the dependent variable, $\bar{X}^{j}$ is a vector of average values of the independent variables, and $\hat{\beta}^{j}$ is a vector of the estimated coefficients for gender j ∈ {(M)ale,(F)emale}. The first term on the right-hand side is the gender difference in mean levels of the outcome (where Y corresponds to the probability of choosing a major) due to different observable characteristics (in the context of the model, the characteristics, X, correspond to the subjective beliefs), while the second term is the difference due to different effects of the characteristics, that is, the $\hat{\beta}^{j}$ (this corresponds to the preference parameters estimated in Section V). This technique is attributed to Oaxaca (1973). However, in the current context, the probability of choosing a given major, Y, is nonlinear in βX, which makes the decomposition less straightforward. To identify the contribution of gender differences in specific variables (beliefs) and coefficients (preferences) to the gender gap in major choice, I use a decomposition method proposed by Fairlie (2005). Details of the decomposition are provided in section A2 of the Appendix. For the purposes of the decomposition, I use the parameter estimates shown in Columns 2 and 3 of Table 4. Results of this decomposition are presented in Table 9 for four different majors. The last row of the table shows that both expectations and preferences contribute to the gender gap for all major categories. The contributions of preferences and beliefs to the gap differ by field. The majority of the gender gap in literature and fine arts and in social sciences II is due to gender differences in beliefs, while gender differences in preferences explain the majority of the gap in engineering and in social sciences I.
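The logic of splitting the gap into a beliefs component and a preferences component can be sketched for a nonlinear choice model as follows. This is a simplified aggregate illustration only: it is not the full Fairlie (2005) procedure, which additionally attributes the beliefs component to individual variables, and it is not the code behind Table 9. Using the male coefficients as the reference for the counterfactual is an assumption of this sketch, and all data, coefficient values, and function names are hypothetical.

```python
import math


def choice_probs(x_i, beta):
    """Logit probabilities over alternatives for one individual."""
    v = [sum(b * x for b, x in zip(beta, xk)) for xk in x_i]
    vmax = max(v)
    ev = [math.exp(vk - vmax) for vk in v]
    total = sum(ev)
    return [e / total for e in ev]


def avg_prob(X, beta, j):
    """Average predicted probability of alternative j across individuals."""
    return sum(choice_probs(x_i, beta)[j] for x_i in X) / len(X)


def twofold_decomposition(X_m, X_f, beta_m, beta_f, j):
    """Split the male-female gap in the average predicted probability of j.

    beliefs:     male vs. female beliefs, both evaluated at male preferences
    preferences: male vs. female preferences, both evaluated at female beliefs
    """
    p_mm = avg_prob(X_m, beta_m, j)
    p_fm = avg_prob(X_f, beta_m, j)  # counterfactual: female beliefs, male preferences
    p_ff = avg_prob(X_f, beta_f, j)
    return {"gap": p_mm - p_ff, "beliefs": p_mm - p_fm, "preferences": p_fm - p_ff}


# Hypothetical beliefs (respondents x majors x outcomes) and coefficients.
X_m = [[[0.8, 0.6], [0.5, 0.7]], [[0.7, 0.5], [0.4, 0.6]]]
X_f = [[[0.5, 0.6], [0.6, 0.8]], [[0.4, 0.4], [0.5, 0.7]]]
beta_m = [2.0, 0.5]
beta_f = [1.0, 1.5]

result = twofold_decomposition(X_m, X_f, beta_m, beta_f, j=0)
print(result)
```

By construction the two components sum to the total gap; the shares depend on which group's coefficients serve as the reference.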
Decomposition Analysis to explain gender differences
I discuss the decomposition results for engineering in some detail. These results are presented in Columns 1 and 5 of Table 9. The model predicts that, on average, males are more than twice as likely as females to major in engineering (an average male probability of 0.104 versus 0.045 for females); 60 percent of this gap is due to gender differences in preferences for various outcomes. Moreover, nearly 27 percent of the gap is due to gender differences in beliefs about enjoying coursework. Interestingly, gender differences in beliefs about future earnings, reconciling work and family, and academic ability are insignificant and constitute less than 5 percent of the gap.32 While Section III.D shows that females’ beliefs about ability and reconciling work and family in engineering are less optimistic than, and quite different from, those of males, these different beliefs do not seem to explain any notable part of the gender gap in choice of engineering.
An alternative to this decomposition strategy is to simulate different environments to determine how the gender gap would change under different scenarios. Column 1 of Table 10 shows the gender gap predicted by the model for the various major categories, and Columns 2–6 of the table show how the gender gap would change if the female subjective belief distribution were replaced by that of the males for the various outcomes.33 For example, the purpose of the simulation in Column 2 is to determine how much of the gap is due to females having less self-confidence in their ability (relative to men). I continue to focus the discussion on engineering. The results are in line with those in Table 9. If female expectations about ability were raised to the same level as those of males through some policy intervention, the gender gap in engineering would decrease by less than 14 percent. The gender gap virtually stays the same if female expectations for either future earnings or reconciling work and family were forced to be the same as those of males. Finally, the gender gap decreases by nearly 50 percent if the female beliefs about enjoying coursework in engineering were replaced with those of males.
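A counterfactual of this kind can be sketched as follows. The code is an illustration under strong assumptions and not the simulation used for Table 10: here each female respondent's beliefs about one outcome are replaced by those of a randomly drawn male respondent, which is only one possible way to impose the male belief distribution. All names and numbers are hypothetical.

```python
import math
import random


def choice_probs(x_i, beta):
    """Logit probabilities over alternatives for one individual."""
    v = [sum(b * x for b, x in zip(beta, xk)) for xk in x_i]
    vmax = max(v)
    ev = [math.exp(vk - vmax) for vk in v]
    total = sum(ev)
    return [e / total for e in ev]


def avg_prob(X, beta, j):
    """Average predicted probability of alternative j across individuals."""
    return sum(choice_probs(x_i, beta)[j] for x_i in X) / len(X)


def simulated_gap(X_m, X_f, beta_m, beta_f, j, attr, seed=0):
    """Return (baseline gap, counterfactual gap) in the probability of major j
    when female beliefs about outcome `attr` are replaced by male beliefs."""
    rng = random.Random(seed)
    p_m = avg_prob(X_m, beta_m, j)
    baseline = p_m - avg_prob(X_f, beta_f, j)
    X_cf = []
    for x_i in X_f:
        donor = rng.choice(X_m)  # assumed matching rule: a random male respondent
        X_cf.append([xk[:attr] + [dk[attr]] + xk[attr + 1:] for xk, dk in zip(x_i, donor)])
    counterfactual = p_m - avg_prob(X_cf, beta_f, j)
    return baseline, counterfactual


# Hypothetical example: males and females differ only in beliefs about outcome 0
# for major 0, and have identical (assumed) preference parameters.
X_m = [[[1.0, 0.5], [0.0, 0.5], [0.0, 0.5]] for _ in range(4)]
X_f = [[[0.2, 0.5], [0.0, 0.5], [0.0, 0.5]] for _ in range(3)]
beta = [1.0, 1.0]

baseline, counterfactual = simulated_gap(X_m, X_f, beta, beta, j=0, attr=0)
print(baseline, counterfactual)
```

In this constructed example the gap is driven entirely by beliefs about a single outcome, so equalizing those beliefs closes the gap; in the paper's data, preferences also differ and most counterfactuals close only part of it.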
Simulations of the Change in Gender Gap under Different Environments
If women being less overconfident than men (Niederle and Vesterlund 2007, and references therein) and women being low in self-confidence (Long 1986; Valian 1998) were the main explanations for the underlying gender gap, one would expect gender differences in beliefs about academic ability to be important in explaining the gender difference in major choices. However, Columns 1–4 of Table 9 and Column 2 of Table 10 show that gender differences in beliefs about ability (more precisely, beliefs about graduating in four years, and beliefs of graduating with a GPA of at least 3.5) are insignificant and explain a small part of the gender gap. Therefore, explanations based entirely on the hypothesis that women are underrepresented in sciences and engineering because they have lower self-confidence can be rejected in my data. Another striking observation is that gender differences in beliefs about future earnings and being able to reconcile work and family explain virtually none of the gender gap, suggesting that gender differences in perceptions of wage discrimination on the job and of work-life balance (the main motivation of several fellowships targeted toward females) are not responsible for the gender gap in fields like engineering.
However, I find that gender differences in beliefs about enjoying coursework in the various fields are significant and explain a large part of the gap. Students may enjoy studying a field if they believe they will do well in it. There is some evidence of this: the Spearman correlation between beliefs of graduating with a GPA of at least 3.5 and of enjoying the field is about 0.7 for females and 0.6 for males, which is sizable but well below one. As the estimates in Table 5 show, both graduating with a GPA of at least 3.5 and enjoying coursework are significant determinants of major choice, particularly for females; this suggests that enjoying coursework captures something more than simply doing well in the coursework. In section A3 of the Appendix, I show that female students’ beliefs about enjoying coursework and enjoying work at the jobs are positively correlated with beliefs about the fraction of females taking classes in that field and negatively correlated with the perceptions of females being treated poorly in the jobs. It is not clear how to interpret these correlations: It could be that females prefer fields that value female-specific attributes and where females are treated more favorably (Cejka and Eagly 1999 find that occupations that are female-dominated are those where female-specific attributes are perceived to be essential for success), or it could be that females are treated more favorably at those jobs precisely because those are “female” occupations. Unfortunately, with the available data, it is not possible to choose between these competing causal explanations. It is unclear what kind of policy would bring about a change in females’ beliefs about enjoying coursework and enjoying working at the jobs because these gender differences could be a consequence of innate gender differences in attitudes (Baron-Cohen 2003), or due to social biases including discrimination (Valian 1998).34
Table 9 also shows that gender differences in preferences explain the bulk of the gender gap in engineering. While preferences are usually taken as primitive and stable in economic analyses (Becker and Stigler 1977), gender differences in them could arise from differences in tastes, as well as from gender discrimination. For example, parents who know that females would be discriminated against in male-dominated occupations could try to shape the preferences of their female children so that they are more comfortable in female-dominated occupations (Altonji and Blank 1999). The question of understanding the sources of gender differences in preferences is beyond the scope of this paper.
Overall, these findings suggest that females are less likely to major in engineering not because they are underconfident about their academic ability, low in self-confidence, or fear wage discrimination in the labor market. Instead, it is because they believe that they will not enjoy taking courses in engineering, and because they have different preferences. These results do not directly support policy interventions such as single-sex schools and programs that are targeted to make fields like engineering more flexible in terms of work and life balance. However, one possible channel through which females may start enjoying fields like engineering could be by changing social attitudes and gender stereotypes. For example, Carrell et al. (2010) find that being assigned female professors changes female students’ preferences for math and science (while finding little effect of professor gender on male students’ choices).
VII. Discussion and Conclusion
Gender differences in major choice are extremely complex, and no simple explanation can be provided for them. The analysis presented in this paper attempts to enhance our understanding of these issues. Since little is known about how youth choose college majors and why the observed gender gap exists, I first estimate a model of college major choice with a focus on explaining the gender gap. I find that outcomes most important in choice of major are enjoying coursework, gaining approval of parents, and enjoying work at the jobs. Nonpecuniary determinants explain about half of the choice for males and more than three-fourths of the choice for females. Males and females have similar preferences regarding outcomes at college, but differ in their tastes regarding the workplace. For outcomes in the workplace, nonpecuniary outcomes are valued much more by females.
On the methodology side, this paper shows that elicited subjective data can be used to infer decision rules in environments where expectations are crucial. This is particularly relevant in cases where the goal is to explain group differences in choices under uncertainty and where expectations may differ across groups (in unknown ways).
The paper sheds some light on the reasons for the gender gap in college major choice. Gender differences in beliefs about ability and future earnings are insignificant in explaining the gender gap. A policy intervention that raised the expectations of females about ability and future earnings in engineering to the same level as those of males would decrease the gender gap by only about 15 percent. This result has two implications: (1) just raising the expectations of women may not be enough to eradicate the gap, and (2) hypotheses that claim that the gap could be explained by women having low self-esteem and being less overconfident than men can be rejected by my data. Most of the gender gap is due to gender differences in tastes and preferences for various outcomes: simply replacing females’ beliefs about enjoying coursework with those of the males decreases the gender gap in engineering by almost half. Gender differences in beliefs about enjoying coursework as well as in preferences may exist because of differences in tastes or because of gender discrimination. While richer data are needed to answer this question, existing evidence suggests that one possible channel through which these differences may be narrowed is female professors acting as role models for female students, thereby changing their preferences for math and science (Carrell et al. 2010). Therefore, a possible policy implication of the findings in this paper is to encourage policies that increase the representation of females in academic science and engineering, since these female professors may change female students’ beliefs and preferences toward STEM coursework and careers.
The analysis in this paper has some limitations. First, the study is based on data from Northwestern University only. The heterogeneity in subjective expectations underscores the need to elicit similar data at different undergraduate institutions and at a larger scale in order to make policy recommendations. However, since the range of majors available to students and institutional details vary considerably across colleges, this is a challenging task because one cannot simply replicate the survey design employed in this study.35 Second, individuals may find it optimal to experiment with different majors to learn about their ability and match quality (Altonji 1993; Malamud 2010; and Stinebrickner and Stinebrickner 2008a, 2011). Because of insufficient data, this study does not focus on this aspect, assuming instead that individuals maximize current expected utility.36
Finally, more work is needed to understand how students form beliefs. The choice model uses the heterogeneity in subjective responses for identification. For the purposes of the choice model estimation, one does not need to understand the factors driving this heterogeneity in beliefs. But any successful policy intervention to bridge the gender gap in college majors depends on a better understanding of why students have different beliefs and, in particular, why males and females differ in their beliefs for some outcomes. Since progress in understanding how people form and update expectations requires richer longitudinal data, there is limited work in this area. Zafar (2011b), using a panel of subjective beliefs collected from a subsample of students in this study, reaches the conclusion that students have fairly precise beliefs about the various major-specific outcomes and that little updating of beliefs occurs in college (over a one-year period); this suggests that students have well-formed expectations by the time they arrive in college. Similarly, Jacob and Wilder (2011) analyze a panel of expectations of students starting in high school, and argue that policies should be targeted at middle-school students to increase post-secondary enrollment. Hoffmann and Oreopoulos (2009) find that the effect of same-sex instructors on academic achievement is smaller at higher levels of education, suggesting that gender role models may matter more at earlier ages when cognitive and noncognitive abilities are being developed. Xie and Shauman (2003) find that differences in parents’ expectations of their children’s achievement by gender start emerging very early on. All these pieces of evidence indicate that gender differences in beliefs, preferences, and associated gender roles may start developing in earlier years of schooling. Therefore, it may be useful to focus on students’ expectations (and preferences) at earlier stages of their schooling.
Appendix A1 Survey Questionnaire
The following set of questions was asked for EACH of the relevant major categories. For example, the questions below were asked for the category of Natural Sciences.
Q1. If you were majoring in natural sciences, what would be your most likely major?
Q2. If you were majoring in natural sciences, what do you think is the percent chance that you will successfully complete this major in four years (from the time that you started college)? (Successfully complete means to complete a bachelors)
NOTE: In answering these questions fully place yourself in the (possibly) hypothetical situation. For example, for this question, your answer should be the percent chance that you think you will successfully complete your major in natural sciences in four years IF you were (FORCED) to major in it.
Q3. If you were majoring in natural sciences, what do you think is the percent chance that you will graduate with a GPA of at least 3.5 (on a scale of 4)?
Q4. If you were majoring in natural sciences, what do you think is the percent chance that you will enjoy the coursework?
Q5. If you were majoring in natural sciences, how many hours per week on average do you think you will need to spend on the coursework?
Q6. If you were majoring in natural sciences, what do you think is the percent chance that your parents and other family members would approve of it?
Q7. If you were majoring in natural sciences, what do you think is the percent chance that you could find a job (that you would accept) immediately upon graduation?
Q8. If you obtained a bachelors in natural sciences, what do you think is the percent chance that you will go to graduate school in natural sciences some time in the future?
Q9. What do you think was the average annual starting salary of Northwestern graduates (of 2006) with bachelor’s degrees in natural sciences?
Now look ahead to when you will be 30 YEARS OLD. Think about the kinds of jobs that will be available for you and that you will accept if you successfully graduate in natural sciences.
NOTE that there are some jobs that you can get irrespective of what your field of study is. For example, one could be a janitor irrespective of their field of study. However, one could not get into medical school (and hence become a doctor) if they were to major in journalism.
Your answers SHOULD take into account whether you think you would get some kind of advanced degree after your bachelors if you majored in natural sciences.
Q10. What kind of jobs are you thinking of?
Q11. Look ahead to when you will be 30 YEARS OLD. If you majored in natural sciences, what do you think is the percent chance that you will enjoy working at the kinds of jobs that will be available to you?
Q12. Look ahead to when you will be 30 YEARS OLD. If you majored in natural sciences, what do you think is the percent chance that you will be able to reconcile work and your social life / family at the kinds of jobs that will be available to you?
Q13. Look ahead to when you will be 30 YEARS OLD. If you majored in natural sciences, how many hours per week on average do you think you will need to spend working at the kinds of jobs that will be available to you?
When answering the next two questions, please ignore the effects of price inflation on earnings. That is, assume that one dollar today is worth the same as one dollar when you are 30 years old and when you are 40 years old.
Q14. Look ahead to when you will be 30 years old. Think about the kinds of jobs that will be available to you and that you will accept if you graduate in natural sciences. What is the average amount of money that you think you will earn per year by the time you are 30 YEARS OLD?
Q15. Now look ahead to when you will be 40 years old. Think about the kinds of jobs that will be available to you and that you will accept if you graduate in natural sciences. What is the average amount of money that you think you will earn per year by the time you are 40 YEARS OLD?
Appendix A2 Decomposition Algorithm
The goal is to quantify how much of the gender gap in college majors is due to differences in observable characteristics (subjective beliefs), that is, X, and how much is due to preference parameters, that is, β. The probability of choosing a major, Y, equals F(Xβ), where F(.) is a nonlinear function of beliefs and preference parameters. In that case, the average of Y generally does not equal F evaluated at the average of X. The gender difference in this nonlinear case can be written as:
where Nj is the sample size of gender j. The first expression in the square brackets represents part of the gender gap that is due to gender differences in distributions of X (that is, the beliefs), and the second expression represents the part due to differences in the group processes determining levels of Y (that is, the preferences). To identify the contribution of gender differences in specific variables (beliefs) and coefficients (preferences) to the gender gap, I use a decomposition method proposed by Fairlie (2005). Contributions of a single variable / coefficient are calculated by replacing the relevant variable of one gender with that of the other gender sequentially, one by one. For illustration, suppose Yj = F(Xjβj) for j ∈ {F, M} and that X includes two variables, X1 and X2. Moreover, let NM = NF = N and assume there exists a natural one-to-one matching of female and male observations. The independent contribution of X1 to the gender gap is given as:
and that of X2 is given as:
Thus, the contribution of a variable to the gap is equal to the change in the average predicted probability from replacing the female distribution with the male distribution of that variable while holding the distributions of the other variable constant. One important thing to note is that, unlike in the linear case, the independent contributions of X1 and X2 depend on the value of the other variable. Therefore, the order of switching the distributions can be important in calculating the contribution to the gender gap. Similarly, the independent contribution of β1 to the gap is given by:
and that of β2 is given as:
Distribution of WCAS Majors
Preference Ranking over Majors
The illustration above assumes an equal number of observations for females and males. However, the sample has more females than males. Since the decomposition requires one-to-one matching of female and male observations, I use the following simulation process: From the female subsample, I randomly draw 60 samples with the same number of observations as in the male subsample. Then I sort the female and male data by the predicted probabilities and calculate separate decomposition estimates. The mean value of estimates from the separate decompositions is calculated and used to approximate the results from the entire female sample. As in Fairlie (2005), standard errors are approximated using the delta method.
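The matching-and-substitution procedure described in this appendix can be sketched in code. The following is a minimal illustration, not the estimation code used in the paper: it assumes a logistic form for F, a single reference coefficient vector `beta` applied to both genders (also used for sorting), and a fixed switching order in which the first variable is switched first. The function names, the default seed, and the example data are hypothetical. Beliefs are switched one variable at a time, so the per-variable contributions add up to the difference in average predicted probabilities between the matched samples.

```python
import math
import random


def logistic(z):
    """Assumed nonlinear link F; the paper's F may differ."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted(row, beta):
    """Predicted choice probability F(X * beta) for one respondent."""
    return logistic(sum(x * b for x, b in zip(row, beta)))


def belief_contributions(males, females, beta):
    """Per-variable contributions for one-to-one matched samples.

    For variable j, the contribution is the average change in predicted
    probability from replacing the female value of j with the male value,
    with variables before j held at female values and variables after j
    held at male values (the assumed switching order).
    """
    k = len(beta)
    totals = [0.0] * k
    for m, f in zip(males, females):
        for j in range(k):
            with_male_j = list(f[:j]) + list(m[j:])
            with_female_j = list(f[:j + 1]) + list(m[j + 1:])
            totals[j] += predicted(with_male_j, beta) - predicted(with_female_j, beta)
    return [t / len(males) for t in totals]


def simulated_decomposition(males, females, beta, draws=60, seed=0):
    """Average the decomposition over random female subsamples.

    Each draw takes a female subsample of the same size as the male sample,
    sorts both groups by predicted probability to match them one-to-one,
    and computes the contributions; the mean over draws is returned.
    """
    rng = random.Random(seed)
    males_sorted = sorted(males, key=lambda r: predicted(r, beta))
    sums = [0.0] * len(beta)
    for _ in range(draws):
        subsample = rng.sample(females, len(males))
        subsample.sort(key=lambda r: predicted(r, beta))
        contrib = belief_contributions(males_sorted, subsample, beta)
        sums = [s + c for s, c in zip(sums, contrib)]
    return [s / draws for s in sums]
```

For example, `simulated_decomposition(males, females, beta=(1.5, 0.5))` returns one averaged contribution per belief variable. Because the order of switching matters in the nonlinear case, rerunning the sketch with the variables reordered is a simple sensitivity check.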
Appendix A3 Beliefs about Enjoying Coursework and Work
In a quest to understand why females are less likely to enjoy studying and working in fields like engineering, in a follow-up survey (taken by 117 of the 161 original survey takers), respondents were asked their beliefs about each gender being treated poorly at the jobs that would be available in the different major categories. The question was worded as follows: “What do you think is the percent chance that X (where X = {Male, Female}) would be treated poorly in jobs that are available in each of the following fields?”
Columns 1 and 2 of Table A3 report the fraction of females that survey respondents believe take classes in the various majors. Column 3 reports the average number of females who graduated in the various majors in 2005 and 2006 (source: IPEDS 2005 and IPEDS 2006). Survey respondents seem to be well-informed about the relative fraction of females in the various majors. The responses to the question about males and females being treated poorly are shown in Columns 4–7 of Table A3. Several notable patterns stand out: First, male respondents believe that females are treated more poorly than males in jobs in all fields except education, literature & fine arts, and music studies; these three fields correspond to the three most female-dominated fields (in college) as reported by males in Column 1 of the table. Second, females believe that they would be treated more poorly than males at jobs in all fields except education, the field that females believe has the highest fraction of females. Third, for both the male and female respondents, the largest difference in females being treated poorly relative to males is for engineering and for math and computer sciences, the two categories with the lowest fraction of females (as reported by both males and females). Finally, both males and females believe that education is the category in which males would be treated the worst. There is a significant correlation of −0.35 between females’ beliefs about being treated poorly at the jobs and the fraction of females in the major’s classes.
Perceptions of Monetary and Nonmonetary Discrimination
I reestimate the single-major choice model initially estimated in Section V to check whether the inclusion of the new variables “females treated poorly at the jobs” and “males treated poorly at the jobs” affects the parameter estimates. The estimates for the determinants that were already included in the initial model stay almost the same, while the new variables are insignificant. This is because beliefs of males and females being treated poorly at the jobs are strongly correlated with beliefs about enjoying coursework and enjoying work at the jobs, and, therefore, their impact on the choice is already being captured indirectly. Indeed, the variable “females treated poorly at the jobs” only shows up significantly for females and the entire sample (at the 1 percent and 10 percent level, respectively) in a model that excludes both enjoying coursework and enjoying work at the jobs. Moreover, the new variables do not improve the model’s explanatory power for the entire sample and for females; relative to the initial model, the Wald χ² (a measure of goodness-of-fit that compares the likelihood ratio chi-squared of the model with that of the null model) does not change by much.
Footnotes
Basit Zafar is an economist in the Research and Statistics group at the Federal Reserve Bank of New York.
↵1. There is another possible explanation for gender differences in schooling choices: Males and females may differ in their skill accumulation for a given level of innate ability or a given schooling level (Bertrand, Goldin, and Katz 2010).
↵2. Under the assumption that individuals maximize current expected utility, one does not need to take into account that individuals may find it optimal to experiment with different majors. However, experimentation could be important in this context to learn about one’s ability and match quality (see Malamud 2010; Stinebrickner and Stinebrickner 2008a, 2011). Modeling it is beyond the scope of this paper. In the conclusion, I discuss the implications of this assumption for the results regarding gender differences in major choice.
↵3. However, studies that do model the nonpecuniary determinants generally treat the tastes as a black box. Given the approach used in this paper of collecting data on various nonpecuniary motivations (such as reconciling work and family, enjoying coursework, enjoying working at the jobs), I can shed further light on what these tastes actually include. Secondly, identification of tastes in these models generally requires some parametric assumptions on the underlying distribution, and assumes tastes are orthogonal to other components in the model (for example, see Arcidiacono, Hotz, and Kang 2012). I can relax this assumption, since beliefs for pecuniary as well as nonpecuniary determinants are directly observed.
↵4. There are some differences between the two studies. One, their sample only includes male students, and hence they cannot study gender differences in college majors. Second, while I elicit students’ expected earnings conditional on each major, they collect data on students’ expected earnings for major and career combinations. However, in their choice model, they use the weighted average of expected earnings across careers conditional on major, as I implicitly do in this paper. Finally, it should be pointed out—as their paper also mentions explicitly—they use the setup in this paper to inform their survey design.
↵5. Using an approach similar in spirit to that of this paper, Stinebrickner and Stinebrickner (2011) use longitudinal data on subjective beliefs of undergraduates at Berea College to study the role of learning in college major choice, with a particular focus on math and science.
↵6. This is perhaps one of the main motivations behind certain scholarships, such as the one awarded by the Association for Women in Science Education to graduate students in Science who have interrupted their education to raise a family, or the NSF ADVANCE program that supports projects to ensure that women with STEM (science, technology, engineering, and math) degrees consider academia as a viable and attractive career option.
↵7. Though each major has an objective probability for (a, c), there’s no reason to believe that subjective beliefs will be the same as the objective probabilities. I use the terms “beliefs” and “expectations” interchangeably; both refer to beliefs about outcomes that will only be realized in the future.
↵8. For example, Freeman (1971) assumes that income expectation formation of college students is myopic—that is, the youth believe that they will obtain the mean income realized by the members of a specified earlier cohort who made that choice. Arcidiacono (2004), in his dynamic model of college and major choice, makes several assumptions about various outcomes; for example, in his model, all individuals have the same expectations about the probability of working, conditional on sex and major. The list of studies that explicitly (or implicitly) make assumptions about expectations formation is long, and there is no evidence that prevailing expectations assumptions are correct.
↵9. A student could have a second major in any other school. She could take part in the study as long as she was pursuing a major in WCAS.
↵10. For an individual with a second major, the choice set is conditional on whether both her majors are in WCAS and the School of Engineering, or not. Conditional on the student’s majors being in WCAS and the School of Engineering, the choice set is the same as that of a single major respondent. Conditional on one of the majors being in a school other than WCAS or the School of Engineering, the choice set includes all major categories that span WCAS, category k, and the category which includes the student’s non-WCAS major. For example, the choice set for a student with a major in WCAS and the School of Education would be categories a–g, i, and k.
↵11. Emails advertising the survey also were sent out by WCAS undergraduate advisors, Economics professors teaching large core classes, and deans of some schools (other than WCAS).
↵12. However, the population statistic was for the end of the fall quarter of the sophomore year, while the data collection spanned two quarters (fall and winter). Since students may declare their majors at any time during the academic year, these two numbers were most likely very similar by the end of the data collection period.
↵13. There might be a concern that this sampling strategy would yield a selected sample. I do not find much evidence of this based on observables. Moreover, since gender differences are the focus of the paper, results would only be biased if one believes that factors that lead students to take the survey differ between males and females, of which there is no obvious evidence. To the extent that Asians are overrepresented in the sample, all the analysis in the paper is robust to the exclusion of this group. Another concern could be that survey-takers might be motivated by pecuniary incentives to take the survey. Given that I later find that nonpecuniary outcomes explain a majority of the choice, this should only bias the magnitude of the importance of nonpecuniary outcomes downward.
↵14. Social status of the jobs, c5, in the various majors was elicited as an ordinal ranking. The analysis treats these ordinal responses as cardinal. In hindsight, this question should have been asked in terms of subjective expectations of getting a high-status job, since the ordinal ranking does not reveal the respondent’s uncertainty about the outcome.
↵15. Studies that examine the role of nonpecuniary influences in the choice of schooling (Fiorito and Dauffenbach 1982; Daymont and Andrisani 1984; Easterlin 1995; Weinberger 2004) all use questions that employ a Likert scale.
↵16. The model does not include the GPA distribution as a variable of interest since it would be infeasible to ask respondents about the subjective major-specific GPA distributions. A threshold of 3.5 was chosen based on feedback from various departments. Since the same threshold is needed across all majors, it had to be such that it was relevant across all majors. For example, if it is extremely easy (hard) to get a GPA of at least three (3.8) in some fields of study, then using such thresholds would not be useful since responses would not vary substantially across respondents.
↵17. However, in Section V, as a robustness check, the choice model is reestimated for the subsample of students who have already declared their major. As I discuss there, results are qualitatively similar for the two groups.
Arcidiacono et al. (2012), who use a similar approach and collect subjective expectations data from students to estimate a choice model of college majors, find no evidence of cognitive dissonance biasing their parameter estimates, in a sample of Duke students that includes students much closer to graduation (juniors and seniors) than those in this study (who are all sophomores).
↵18. 48 percent of the sample respondents claim to pursue more than one major. However, since each respondent submitted a preference ordering over the majors in their choice set, these respondents are included in the sample. To understand why individuals choose more than one major, Zafar (2012) estimates a separate model.
↵19. By assuming that individuals maximize their current expected utility, I rule out that students may experiment with majors in order to learn about the match quality. While experimentation is important in the context of major choice (Altonji 1993; Arcidiacono 2004; Malamud 2010; and Stinebrickner and Stinebrickner 2008a, 2011), that is beyond the scope of this paper (primarily because of data limitations).
↵20. The vectors a and c (as well as b and d) are the set of outcomes common to all majors. It is the joint probability distribution of these outcomes Pikt(a,c) (or Pikt(b,d)) which is indexed by major k. The vector b consists of b1 = graduating in four years, b2 = graduating with a GPA of at least 3.5, b3 = enjoying the coursework, b4 = parents approving of the major, b5 = getting a job on graduation, b6 = enjoying work at the jobs, and b7 = being able to reconcile work and family at the jobs. The vector d consists of d1 = average hours per week spent on coursework, d2 = average hours per week spent at the job, d3 = social status of the job, and d4 = expected income at the age of 30.
↵21. A consequence of the linear utility specification is that the individual is risk-neutral with regards to the continuous outcomes. Hence, only the expected value for the continuous outcomes needs to be elicited. This assumption is made for practical considerations—relaxing this would require eliciting multiple points on the respondent’s subjective earnings distribution for each major, which is not feasible.
↵22. The expected income at age 30, Eikt(d4), is a weighted average of the age 30 expected income the respondent believes will be realized if she majors in major k and if she drops out of school, with the weight being the probability of successfully graduating in major k. This is further scaled by the subjective belief of being active in the labor force at age 30. Age 30 dropout earnings and labor force status are elicited directly from the respondent, and are assumed to be independent of one’s field of study.
↵23. Kapteyn et al. (2007) use a similar approach to estimate preference parameters for retirement.
↵24. One concern with using stated preference data is that an individual may not have complete preferences over all alternatives that are available to her. In the case that a complete ranking does not exist, it is possible that the lower end of her preferences is noise. To check the sensitivity of the results, the model was also estimated by using the ranking of the four most preferred choices only. The results (available from the author upon request) are comparable to those obtained from using the complete preference data.
↵25. I do not allow preference parameters to vary by other individual characteristics in this paper. Interested readers are referred to an earlier and longer version of the paper, Zafar (2009), which analyzes heterogeneity in preferences by observables, such as parents’ income.
↵26. For example, the amount that an individual would be willing to forgo in earnings at the age of 30 for a 2 percent change in the probability of outcome j is (0.02 × Δuj) / γ4.
↵27. The classification of outcomes into pecuniary and nonpecuniary is somewhat subjective. Outcomes classified as being nonpecuniary are gaining parents’ approval, enjoying coursework, reconciling work and family, and enjoying work at the jobs. The remaining outcomes are classified as being pecuniary.
↵28. Arcidiacono et al. (2012) also do not find evidence of preference parameters of upperclassmen—students who have declared their major—being different from those who are in earlier years in college.
↵29. Colleges with high selectivity and the same Carnegie Code classification as Northwestern were used for comparison. Assuming students graduate from college at the age of 22, this would be their salary at age 32.
↵30. It could be that the survey respondents are self-enhancing their own salary expectations. However, there are at least three legitimate reasons why respondents’ earnings expectations may be different from the earnings statistics in the B&B sample. First, even though I have restricted the B&B sample to selective institutions, Northwestern graduates may work at jobs very different from those of graduates from comparable institutions. Second, respondents might think that future earnings distributions will differ from the current ones. Third, respondents may have private information (other than gender) about themselves that justifies having different expectations.
↵31. More specifically, I assume that the utility function in Equation 2 is now:
where ρ(Xit) is the relative risk aversion (RRA) coefficient for age 30 earnings for an individual with characteristics Xit. The expected utility from major k in Equation 4 is modified now so that it is:
To estimate the model, I use the variation in major-specific income in the population of current age 30 college graduates (described in Wiswall and Zafar 2012), and use this variation and the respondent’s expected income to obtain individual field-specific parameters of the income distribution by assuming that income is distributed log-normal. The model parameters are obtained by simulated maximum likelihood, by taking 100 draws from each individual’s income distributions. I estimate the model parameters for the full sample, and separately by gender. The RRA coefficient estimate for males is 3.2, while that for females is 4.3 (difference not statistically significant). These estimates are in the middle of the range of previous estimates (see Wiswall and Zafar 2012), and also are consistent with women being more risk averse than men.
This approach of using the population variation in earnings is not entirely appropriate since respondents’ subjective distribution of income may be quite different from the population income distribution for various legitimate reasons. However, the purpose of this exercise is primarily a robustness check to test how sensitive model estimates and results are to the risk neutrality assumption.
↵32. I observe only the beliefs about academic ability, not actual academic ability. However, Chemers et al. (2001) show that confidence in one’s ability is strongly related to academic performance. Moreover, it is the beliefs that matter when an individual is making a choice under uncertainty.
↵33. I sort the female and male subsamples according to the predicted probability of majoring in that field and then replace the female subjective belief about ability with that of the corresponding male. Since there are more females than males, I use a simulation method similar to the one used for the decomposition (see Appendix).
↵34. An example of the latter is that women might believe that these fields are not gender-neutral but constructed in accordance with the traditional male role, and that they therefore would be treated poorly in the workplace. For example, Traweek (1988) argues that an aggressive behavior is a necessary ingredient for achieving success in science, and Niederle and Vesterlund (2007) show that women tend to shy away from competitive environments. In that case, even if women perceive no gender difference in ability and compensation, their beliefs about how much they will enjoy studying engineering and science will be affected.
35. Arcidiacono et al. (2012), using an approach similar to that in this paper, reach somewhat different conclusions with regard to the determinants of college major choice. In their sample of male Duke University undergraduate students, they find that both ability and earnings are important in the choice of major.
36. It is unclear how ruling out experimentation with majors affects the results in this paper with regard to the importance of preferences versus beliefs in the choice of majors between the two genders. It is well documented that students switch down to less challenging majors (Arcidiacono 2004; Stinebrickner and Stinebrickner 2011). The literature on gender differences in overconfidence (Niederle and Vesterlund 2007) would suggest that male students are more likely to be overconfident about their abilities and prospects in the challenging majors and, relative to females, more likely to start out in them and then eventually switch down. Since students in this study are surveyed relatively early in their college careers, this gender bias in overconfidence would possibly lead to larger differences in beliefs between the genders at an earlier stage, and would also lead to pecuniary outcomes explaining a larger proportion of the choice for males earlier in their college years (since the challenging majors, such as engineering and the sciences, are also associated with higher pecuniary returns). How that may affect the relative contribution of beliefs and preferences to the gender gap in choices is then ambiguous.
It should also be pointed out that, to my knowledge, no evidence exists on gender differences in major switching. While data on major switches for Northwestern students are not available, a simple analysis of major switches in the NLSY97 (spanning data from 1997 to 2009) reveals no significant gender differences: females are more likely to switch from their initial major and to switch more times, but these differences are not statistically significant. Moreover, while the gender distributions of initial and final majors are very different, the changes in the distribution of majors do not vary systematically by gender. This suggests that assuming respondents maximize current expected utility, and hence ruling out switching of majors, is not overly restrictive, in the sense that it is unlikely to affect the conclusions reached in the study.
- Received September 2011.
- Accepted June 2012.