Abstract
This paper exploits temporal and spatial variation in the implementation of nine city- and four state-level U.S. sick pay mandates to assess their labor market consequences. We use the synthetic control group method and traditional difference-in-differences models along with the Quarterly Census of Employment and Wages to estimate the causal effects of mandated sick pay on employment and wages. We do not find much evidence that employment or wages were significantly affected by the mandates that typically allow employees to earn one hour of paid sick leave per work week, up to seven days per year. Employment decreases of 2 percent lie outside the 92 percent confidence interval and wage decreases of 3 percent lie outside the 95 percent confidence interval.
I. Introduction
Paid sick leave was an integral part of the first social insurance scheme in the world. The Sickness Insurance Law of 1883 implemented federally mandated employer-provided health insurance in Germany, which covered up to 13 weeks of paid sick leave along with medical care. Insurance against wage losses due to health shocks was a crucial element of health insurance at that time, and it was valued by employees and unions alike. Given the limited availability of expensive medical treatments in the 19th century, expenditures for paid sick leave initially accounted for more than half of all health insurance expenditures (Busse and Riesberg 2004). Other European countries followed soon after and also implemented paid sick leave. Today, virtually every European country provides universal access to paid sick leave.
The United States, Canada, and Japan are the only industrialized countries that do not provide universal access to paid sick leave. In these countries, sick pay is largely provided as a fringe benefit by employers on a voluntary basis (Heymann et al. 2009). In the United States, coverage rates are around 65 percent among full-time workers, while low-income, part-time, and service sector workers have coverage rates of less than 20 percent (Lovell 2003; Boots, Martinson, and Danziger 2009; Susser and Ziebarth 2016). Susser and Ziebarth (2016) estimate that, in a given week of the year, the total demand for paid sick leave sums to 10 percent of the workforce in the United States. In addition to concerns about inequality, worker well-being, and work productivity, a lack of sick leave coverage can induce contagious employees to work while sick and spread diseases (Pichler and Ziebarth 2017).
In the past decade, support for sick pay mandates has grown substantially in the United States. At the city level, the first sick pay mandates were implemented in San Francisco (2007), Washington, DC (2008), Seattle (2012), New York City (2014), Portland (2014), Newark (2014), Philadelphia (2015), and Oakland (2015). Several dozen cities, including Pittsburgh, Santa Monica, Los Angeles, and Chicago, have followed more recently (for an overview, see A Better Balance 2018).
At the state level, Connecticut was the first to mandate paid sick leave, in 2012. However, the bill excludes businesses with fewer than 50 full-time employees and applies only to the service sector; as a result, it covers only about 20 percent of the workforce (Miller and Williams 2015; Connecticut Department of Labor 2015). In contrast, California passed a much more comprehensive bill, covering all employees, effective July 1, 2015. Massachusetts and Oregon also passed relatively comprehensive sick pay mandates, effective July 2015 and January 2016, respectively, but these exempt small businesses. In addition, Vermont, Arizona, and Washington State passed sick leave legislation very recently. Table 1 lists all citywide (nine in total) and statewide (four in total) sick pay mandates that we evaluate here.
Table 1. Overview of Employer Sick Pay Mandates in the United States
At the federal level, the Healthy Families Act, reintroduced in Congress in 2015, proposes a federal sick pay mandate that would cover employees in businesses with more than 15 employees (U.S. Congress 2015). Similar to the mandates already in place at the state and city levels, the Healthy Families Act proposes that employees “earn” one hour of paid sick leave per 30 hours worked, up to 56 hours (or seven days) per year. Paid sick leave, at the standard wage rate of 100 percent, could then be taken in the case of one's own sickness or the sickness of a relative, in most cases a child.
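The accrual rule can be illustrated with a short sketch. It assumes, for illustration only, that credit accrues at one hour per 30 hours worked up to the 56-hour annual cap, and it ignores carry-over, waiting periods, and firm-size thresholds; `accrued_sick_hours` is a hypothetical helper, not part of any statute or library.

```python
def accrued_sick_hours(hours_worked: float,
                       hours_per_credit: float = 30.0,
                       annual_cap: float = 56.0) -> float:
    """Paid sick leave credit (in hours) earned over one year.

    Illustrative rule: one hour of credit per `hours_per_credit` hours
    worked, capped at `annual_cap` hours. Carry-over, waiting periods,
    and firm-size exemptions are ignored.
    """
    if hours_worked < 0:
        raise ValueError("hours_worked must be non-negative")
    return min(hours_worked / hours_per_credit, annual_cap)


# Part-time: 20 hours/week for 50 weeks = 1,000 hours -> about 33.3 hours of credit.
# Full-time: 40 hours/week for 52 weeks = 2,080 hours -> 69.3 hours, capped at 56.
for weekly_hours, weeks in [(20, 50), (40, 52)]:
    annual_hours = weekly_hours * weeks
    print(weekly_hours, annual_hours, round(accrued_sick_hours(annual_hours), 1))
```

Under these assumptions, the annual cap binds for a full-time employee working 2,080 hours per year, whereas the part-time employee stays below it.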
The main source of controversy is the possibility that government-mandated sick pay could hurt employment or wage growth. The standard textbook analysis of mandated benefits (Summers 1989) argues that employer mandates may be more efficient than direct government provision of benefits funded by higher taxes, as long as employees value the benefit and would accept lower wages in return. Gruber (1994) studies the impact of mandated maternity benefits on employment and wages in the United States. He argues that a group-specific mandate may deviate from the textbook case because antidiscrimination laws or social norms may prevent wages from adjusting downward for a specific identifiable group. Using the Current Population Survey (CPS), Gruber (1994) finds significant wage decreases for women of childbearing age but no significant impact on labor supply.
The case of mandated sick pay may also deviate from the textbook example. Assuming flexible wages and absent administrative costs, earning one hour of paid sick leave per 30 hours worked equals a wage increase of 1/30 or 3.3 percent per week for full-time employees. However, such a static calculation assumes that all employees would fully exhaust their annual sick leave credit and would have worked sick with full productivity (or taken unpaid leave) in the counterfactual scenario. Empirically assessing and directly measuring labor productivity under the two scenarios is extremely challenging (if not impossible). To our knowledge, empirical causal evidence on how work productivity changes when employees gain access to paid sick leave is lacking. It seems likely that sick employees cannot maintain full work productivity when working sick and that employees on sick leave will (partly) compensate for their lost productivity after their recovery. Hence, the calculated static wage increase of 3.3 percent appears to be an upper bound for marginal firms.
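The static calculation can be written out explicitly. The `utilization` parameter is a hypothetical addition for illustration: it is the assumed share of accrued credit actually taken as paid leave in place of hours that would otherwise have been worked at full productivity. Setting it to 1 reproduces the 3.3 percent upper bound.

```python
def static_cost_increase(hours_per_credit: float = 30.0,
                         utilization: float = 1.0) -> float:
    """Proportional rise in labor cost per hour worked (static calculation).

    Each hour worked earns 1 / hours_per_credit hours of credit. A share
    `utilization` of that credit is assumed to be taken as leave paid at the
    full wage that produces no output; administrative costs are ignored.
    """
    if hours_per_credit <= 0 or not 0.0 <= utilization <= 1.0:
        raise ValueError("need hours_per_credit > 0 and 0 <= utilization <= 1")
    return utilization / hours_per_credit


print(f"{static_cost_increase():.1%}")                 # full exhaustion: 3.3%
print(f"{static_cost_increase(utilization=0.5):.1%}")  # half the credit used: 1.7%
```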
When ignoring administrative costs, changes in work productivity, and psychological costs or benefits, the textbook example predicts that sick pay mandates would reduce wage growth. However, if wages cannot adjust flexibly because of social norms, antidiscrimination laws, or minimum wages, or if employees do not value sick leave, marginal employees might not get hired or might even get fired. In addition, when small businesses are exempt from the mandate, some employers could reduce their workforce or split up their firms. In sum, under several plausible scenarios, the predictions of the standard textbook example may not hold in reality. Whether wages and employment are significantly affected by sick pay mandates then becomes essentially an empirical question.
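The two scenarios can be illustrated with the standard partial-equilibrium model of mandated benefits in the spirit of Summers (1989) and Gruber (1994). The sketch is ours and rests on simplifying assumptions: the mandate raises labor costs by a share c of the wage, employees value each dollar of the benefit at alpha dollars, and labor supply and demand have constant elasticities eta_s and eps_d (the values 0.3 and 0.5 below are purely hypothetical). With flexible wages, the first-order wage change is −c(eps_d + alpha·eta_s)/(eta_s + eps_d) and the employment change is −c·eta_s·eps_d(1 − alpha)/(eta_s + eps_d); if the wage cannot fall, employment is demand-determined and drops by eps_d·c.

```python
from dataclasses import dataclass


@dataclass
class MandateEffects:
    wage_change: float        # proportional change in the wage (d ln W)
    employment_change: float  # proportional change in employment (d ln L)


def mandate_incidence(c: float, alpha: float, eta_s: float, eps_d: float,
                      rigid_wage: bool = False) -> MandateEffects:
    """First-order effects of a mandated benefit in a competitive labor market.

    c      : cost of the benefit to employers as a share of the wage
    alpha  : employees' valuation of one dollar of the benefit (0 <= alpha <= 1)
    eta_s  : labor supply elasticity (> 0)
    eps_d  : absolute value of the labor demand elasticity (> 0)
    """
    if rigid_wage:
        # The wage cannot fall, so employment is determined by labor demand,
        # which responds to the full cost increase c.
        return MandateEffects(0.0, -eps_d * c)
    denom = eta_s + eps_d
    wage = -(eps_d + alpha * eta_s) / denom * c
    employment = -eta_s * eps_d * (1.0 - alpha) / denom * c
    return MandateEffects(wage, employment)


c = 1 / 30  # static upper-bound cost from the back-of-the-envelope calculation
valued = mandate_incidence(c, alpha=1.0, eta_s=0.3, eps_d=0.5)
unvalued = mandate_incidence(c, alpha=0.0, eta_s=0.3, eps_d=0.5)
rigid = mandate_incidence(c, alpha=0.0, eta_s=0.3, eps_d=0.5, rigid_wage=True)
for label, eff in [("fully valued", valued), ("not valued", unvalued),
                   ("not valued, rigid wage", rigid)]:
    print(f"{label}: wage {eff.wage_change:+.2%}, employment {eff.employment_change:+.2%}")
```

Setting alpha = 1 reproduces the textbook result of full pass-through to wages with no employment change, whereas the rigid-wage case shifts the entire adjustment to employment.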
This study empirically assesses how city- and state-level sick pay mandates affected wages and employment in the United States. We use the Quarterly Census of Employment and Wages (QCEW) by the Bureau of Labor Statistics (BLS) for this evaluation. The QCEW is a census of all establishments that are covered by Unemployment Insurance and contains 97 percent of nonfarm employment in the United States. Our first QCEW data set records total monthly employment and quarterly wages at the county(-industry) level from January 2001 to June 2016. The second data set records total monthly employment and quarterly wages at the state–industry–firm-size level from January 2001 to June 2016. Econometrically, we exploit the quasi-random nature of the implementation of the sick pay mandates across U.S. regions and over time. To mimic pretreatment trends as closely as possible, we follow Abadie and Gardeazabal (2003) and Abadie, Diamond, and Hainmueller (2010) and build synthetic control groups using untreated regional units. To test hypotheses with single and multiple events, we use the approach in Dube and Zipperer (2015) and Firpo and Possebom (2018). In a recent review of the state of applied econometrics, Athey and Imbens (2017) call the synthetic control group method (SCGM) the most important innovation in program evaluation in the last 15 years.1
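The logic of placebo-based hypothesis testing in this literature can be sketched as follows. This is a simplified single-event version in the spirit of Abadie, Diamond, and Hainmueller (2010): each donor is treated as if it had been reformed, and the treated unit's ratio of post- to pre-reform root mean squared prediction error (RMSPE) is ranked among the placebo ratios. It is not the exact multiple-event procedure of Dube and Zipperer (2015) or Firpo and Possebom (2018), and all gap series are hypothetical.

```python
import math


def rmspe(gaps):
    """Root mean squared prediction error of a sequence of gaps
    (observed minus synthetic outcome)."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))


def placebo_p_value(treated_gaps, placebo_gaps, t0):
    """Permutation p-value from post/pre-reform RMSPE ratios.

    treated_gaps : gaps of the treated unit for periods 0, ..., T-1
    placebo_gaps : one gap sequence per donor unit, each estimated as if that
                   donor had been treated at the same date
    t0           : number of pre-reform periods (pre-reform RMSPE must be > 0)
    """
    def ratio(gaps):
        return rmspe(gaps[t0:]) / rmspe(gaps[:t0])

    treated_ratio = ratio(treated_gaps)
    ratios = [ratio(g) for g in placebo_gaps] + [treated_ratio]
    # Share of all units (treated included) with a ratio at least as large.
    return sum(r >= treated_ratio for r in ratios) / len(ratios)


pre = [0.01, -0.01, 0.01, -0.01]  # hypothetical pre-reform gaps (t0 = 4)
placebos = [pre + [0.01 * k, -0.01 * k] for k in range(1, 20)]  # 19 hypothetical donors
treated = pre + [0.5, -0.5]
print(placebo_p_value(treated, placebos, t0=4))  # 0.05: the treated ratio is the largest of 20
```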
The setting of this paper is well suited for the application of the SCGM. First, when evaluating reforms at the county level, we can build synthetic controls using a large pool of more than 3,000 U.S. counties. To our knowledge, this is one of the very first papers to select donors from the entire pool of U.S. counties, which allows us to replicate the labor market dynamics of the treated counties very closely. Second, because the treated units are rather small and geographically dispersed, the assumption of no general equilibrium or spillover effects to neighboring regions seems justified. Third, we can match the labor market dynamics of the treated units over a long prereform period. Fourth, we evaluate sick pay mandates in nine counties and four states. All these U.S. regions were treated with similar reforms, and the policies were implemented in a staggered fashion over a decade. Moreover, the regions are heterogeneous in terms of size and local labor markets and thus provide broad common support. As a result, the findings should have external validity for other U.S. counties with a similar industrial structure and policy environment. Finally, the main identification assumption of no systematic unobserved postreform labor market shocks is weak(er) when evaluating 13 reforms over a decade.
Our findings do not provide much evidence that either employment or wages significantly and systematically decreased (or increased) postreform. The main point estimates have ambiguous signs and are relatively small in size. Joint tests show that employment decreases of 2 percent or more lie outside the 92 percent confidence interval. Wage decreases of 3 percent or more lie outside the 95 percent confidence interval, which lets us rule out large administrative and psychological costs (see the static back-of-the-envelope calculation above). Moreover, the results are very robust to alternative SCGM matching algorithms and to focusing on the most affected industries (such as the construction and hospitality sectors). The findings are also supported by traditional difference-in-differences (DD) models and event studies. In sum, this paper follows the call of Abadie (2018) for “a visible reporting and discussion of non-significant results in empirical practice.”
The next section summarizes the literature. Section III discusses the U.S. sick pay mandates in more detail, and Section IV explains the data. The empirical approach and identifying assumptions are in Section V. Section VI discusses the empirical findings, and Section VII concludes.
II. Research on Sick Leave
Economic research on sick leave focuses almost exclusively on countries outside the United States. The main reasons have been a lack of policy variation and a lack of appropriate data. For example, high-quality administrative sick leave data exist in most Scandinavian countries (Andrén 2007; Markussen et al. 2011; Dale-Olsen 2014), but, in the United States, actual sick leave behavior is largely unobservable. There are a few exceptions. Gilleskie (1998) exploits 1987 MEPS data to structurally model work absence behavior and simulate the effects of alternative policies. According to Gilleskie (1998), about a quarter of all male employees would not take sick leave when ill. Susser and Ziebarth (2016) use the representative 2011 ATUS Leave Supplement to estimate that, in a given week of the year, two percent of U.S. employees, mostly low-income female employees, would go to work sick. In almost one-half of all cases, the reasons indicated for such presenteeism behavior were directly related to a lack of sick leave coverage. Ahn and Yelowitz (2016) confirm that U.S. employees take more sick leave when they have sick leave coverage. Colla et al. (2014) find that, in San Francisco, 73 percent of all firms offered sick pay voluntarily before the mandate in 2006 and that this share had increased to 91 percent by 2009. Some reports suggest that the early mandates in San Francisco and Washington, DC did not have negative employment effects (Boots, Martinson, and Danziger 2009; Petro 2010). Using 2009–2012 data from the American Community Survey, Ahn and Yelowitz (2015) come to a similar conclusion for Connecticut.2
Van Kammen (2015) also uses the QCEW to evaluate U.S. sick pay mandates but adopts a different methodological approach, and his findings also differ from those in this paper. Using data from 2003–2014, Van Kammen (2015) restricts the sample to counties with more than 10,000 employees and 144 county–month observations, uses higher-order lags of employment as instruments, and finds evidence of generally negative employment effects (although they partly lack precision). However, using a balanced sample with selected county–industry observations and interacting the treatment effects with the average number of sick days at the industry level in 2010, he concludes that “the effect of paid sick days mandates is to shift employment from the least constrained industries [utilities] to the most constrained [accommodation and food services], having a practically small effect on overall county employment” (Van Kammen 2015, page 17).
Outside the United States, several empirical papers estimate the causal effects of variation in sick pay. These studies find that employees adjust their intensive labor supply in response (Johansson and Palme 2005; Ziebarth and Karlsson 2010, 2014; De Paola, Scoppa, and Pupo 2014; Dale-Olsen 2014; Fevang, Markussen, and Røed 2014). The focus of these papers naturally differs from others that study extensive labor supply effects of disability insurance (Autor and Duggan 2006; Kostol and Mogstad 2014; Borghans, Gielen, and Luttmer 2014; Burkhauser, Daley, and Ziebarth 2016). Rather, it is closer to U.S. studies on work-related accidents and diseases covered by Workers’ Compensation (Meyer, Viscusi, and Durbin 1995; McInerney and Bronchetti 2012; Hansen 2016; Powell and Seabury 2018).
Other papers on sick leave investigate the role of general determinants (Markussen et al. 2011; Dale-Olsen 2014), probation periods, known to reduce absenteeism (Riphahn 2004; Ichino and Riphahn 2005), culture (Ichino and Maggi 2000), gender (Ichino and Moretti 2009; Gilleskie 2010; Herrmann and Rockoff 2012), income taxes (Dale-Olsen 2013), union membership (Goerke and Pannenberg 2015), and unemployment (Askildsen, Bratberg, and Nilsen 2005; Nordberg and Røed 2009; Pichler 2015). There is also research on the impact of sick leave on earnings (Sandy and Elliott 2005; Markussen 2012). In addition, some papers study the phenomenon of presenteeism explicitly (Brown and Sessions 2004; Pauly et al. 2008; Barmby and Larguem 2009; Pichler 2015; Pichler and Ziebarth 2017).
Finally, note that paid sick leave differs from paid vacation or paid maternity leave in both scope and aim (Rossin-Slater, Ruhm, and Waldfogel 2013; Lalive et al. 2014; Baum and Ruhm 2016; Dahl et al. 2016; Thomas 2018). Whereas sick leave coverage is an insurance against wage losses due to health shocks, paid vacation and maternity leave mostly aim at balancing family and work and address gender inequality in the workplace. Sick pay mandates can also be justified from a public health perspective because access to paid sick leave reduces contagious presenteeism and the negative externalities associated with the spread of contagious diseases (Pichler and Ziebarth 2017; Stearns and White 2018).
III. U.S. Sick Pay Mandates
The United States is one of three OECD countries without universal access to paid sick leave. About half of the workforce lacks access to paid sick leave, particularly low-income employees in the service sector (Heymann et al. 2009; Susser and Ziebarth 2016).
The only existing federal law is the Family and Medical Leave Act of 1993 (FMLA). It provides unpaid leave in the case of pregnancy, one's own illness, or the illness of a family member to employees who work at least 1,250 hours annually in businesses with at least 50 employees (see, for example, Tominey 2016). Jorgensen and Appelbaum (2014) find that 49 million U.S. employees, 44 percent of all private sector employees, are ineligible for FMLA. The findings of Susser and Ziebarth (2016) also suggest that many low-wage and service sector employees are either not aware of their FMLA rights or are not covered by the law. As a result of the sick pay mandates analyzed in this paper, most employees in the treated regions who lacked firm-provided sick pay gained access to sick leave coverage.
Table 1 provides a summary of the mandates evaluated by this paper. The details of the bills differ from city to city and state to state, but basically all sick pay mandates are employer mandates. Several mandates exclude small firms or offer exemptions. Employees “earn” a sick pay credit (typically one hour per 30–40 hours worked up to seven days per year), and, if unused, the credit rolls over to the next calendar year. Because employees need to accrue sick pay credit, most mandates explicitly state a 90-day accrual period in addition to waiting periods when changing jobs. Moreover, several bills that exempt small businesses do require them to let their employees accrue unpaid sick days instead of paid sick days (Massachusetts Attorney General’s Office 2016). Several laws explicitly prohibit employers from requiring a doctor’s note (Polsky 2016; New York City Consumer Affairs 2016).
As Table 1 shows, San Francisco was the first city to mandate paid sick leave effective February 5, 2007.3 Washington, DC enacted its mandate effective November 13, 2008 and expanded the mandate on February 22, 2014 to include temporary workers and tipped employees. Seattle (September 1, 2012), Portland (January 1, 2014), New York City (April 1, 2014), and Philadelphia (May 13, 2015) followed.
Connecticut was the first state to mandate paid sick leave on January 1, 2012. However, the law only applies to service sector employees in nonsmall businesses and covers only about 20 percent of the workforce. The mandates of California (July 1, 2015), Massachusetts (July 1, 2015), and Oregon (January 1, 2016) are much more comprehensive (see Table 1).
IV. Quarterly Census of Employment and Wages (QCEW)
The paper makes use of publicly available data from the QCEW, provided by the Bureau of Labor Statistics (BLS 2018). The QCEW is based on an establishment census. It includes all establishments covered by Unemployment Insurance (UI), which together account for 97 percent of all U.S. civilian employment.4 Using the quarterly UI contribution reports filed by the establishments, the BLS calculates the number of actually filled jobs per month, as well as the average weekly wage per quarter.
The BLS reports the data at different levels of spatial and temporal disaggregation. To evaluate reforms at the county and state level (see Table 1), we generate two data sets, one at the county level and one at the state level. Both the county- and state-level data are available from January 2001 to June 2016. The raw data are reported by industry. Because the mandates mostly apply to the private sector, we generate variables that measure private sector employment and private sector wages. While the QCEW is available by state–industry–firm size at the state level, at the county level it is only available by county–industry.
A. County-Level Data
Table 2 provides the summary statistics for the county-level data. The table shows summary statistics for 3,062 counties.5 For employment, the data are at the monthly level, yielding a total of 548,992 county–month observations.6 For wages, the data are at the quarterly level with a total of 182,992 county–quarter observations. Population counts are at the annual level, with a total of 44,267 county–year observations (U.S. Census Bureau 2016a).
Table 2. Quarterly Census of Employment and Wages (QCEW), County Level: 2001–2016Q2
We generate several outcome variables for the county-level analysis. The first main outcome variable is “Private Sector Employment,” which we obtain by dividing the total number of filled jobs at the monthly county level by the annual county level population. This yields private sector jobs as a share of the county population for each U.S. county on a monthly basis. Table 2 shows that the average private sector employment share is 27.1 percent; the average public sector employment share is 7.7 percent. This means that, on average, for every 100 residents in a county in the United States, 27 private sector jobs paying UI contributions are officially reported.
Note that individuals who hold multiple jobs are counted for every job that they hold. In addition, filled jobs are assigned to counties by the physical address of the establishment, not by the county of residence of the jobholder. These two features (in addition to economic prosperity) explain why some counties have significantly higher employment ratios than others, and why some even have employment ratios above 100 percent. Whereas the minimum value for Private Sector Employment is only 1.1 percent, the maximum value is 404 percent (Table 2).
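A minimal sketch of how the employment outcome combines monthly job counts with annual population. The county identifiers and all numbers are hypothetical; the second county illustrates how a commuter destination can exceed 100 jobs per 100 residents. A real implementation would read the QCEW and Census files instead of literal dictionaries.

```python
# Hypothetical monthly private sector job counts, keyed by (county, year, month).
jobs = {
    ("county_A", 2014, 1): 27_100,
    ("county_A", 2014, 2): 27_300,
    ("county_B", 2014, 1): 4_040,
}
# Hypothetical annual population, keyed by (county, year).
population = {
    ("county_A", 2014): 100_000,
    ("county_B", 2014): 1_000,
}


def employment_share(jobs, population):
    """Private sector jobs per 100 residents for every county-month.

    Each monthly job count is divided by the population of the same county in
    the same calendar year, so all months of a year share one denominator.
    County-months without a population figure are skipped.
    """
    return {
        (county, year, month): 100.0 * n / population[(county, year)]
        for (county, year, month), n in jobs.items()
        if (county, year) in population
    }


for key, share in sorted(employment_share(jobs, population).items()):
    print(key, round(share, 1))
```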
As shown in Table 2, while the county-level data do not allow us to differentiate by firm size, we can differentiate by industry. Consequently, in extended analyses, we will test whether mandated sick pay resulted in weaker job growth in the construction (1.5 jobs per 100 pop.) and the leisure and hospitality industries (3.3 jobs per 100 pop.), as these two sectors were most affected by the mandates—prereform, around 70 percent of all employees in these sectors did not have access to paid sick leave (Susser and Ziebarth 2016).
The second main outcome variable is “Weekly Wages.” Employers report total quarterly gross compensation, including bonus payments and stock options.7 Gross wages are then derived by dividing total quarterly compensation by total quarterly employment. Dividing additionally by the number of weeks in a quarter yields the 182,992 county–quarter observations of weekly wages in Table 2. The average weekly wage is $599 (or about $31,200 per year), ranging from $155 to $4,542. Because consumer price indexes are not available at the county level, we use nominal wages throughout the paper. However, we net out wage seasonalities by regressing the wage dynamics of each county on a full set of quarter–year fixed effects.
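The weekly wage computation can be sketched as below. The sketch assumes the BLS convention for the QCEW average weekly wage, in which total quarterly wages are divided by average monthly employment over the quarter's three months and then by the 13 weeks in a quarter; the input values are hypothetical.

```python
from typing import Sequence


def average_weekly_wage(total_quarterly_wages: float,
                        monthly_employment: Sequence[float]) -> float:
    """Average weekly wage per job for one county-quarter.

    Assumed convention: total quarterly wages divided by average monthly
    employment over the quarter's three months, divided by 13 weeks.
    """
    if len(monthly_employment) != 3:
        raise ValueError("expected employment for the quarter's three months")
    avg_employment = sum(monthly_employment) / 3
    return total_quarterly_wages / avg_employment / 13


# Hypothetical county-quarter: $77.87 million in wages, about 10,000 jobs per month.
print(round(average_weekly_wage(77_870_000, (9_900, 10_000, 10_100))))  # 599
```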
In addition to these two main outcome measures, we use the industry structure variables in Table 2 to build synthetic control counties that resemble the treatment counties as closely as possible.
B. State-Level Data
Table 3 provides the summary statistics for the state-level data. When considering all 51 states (including the District of Columbia), we obtain 8,981 state–month observations for employment and 2,992 state–quarter observations for wages. Using the state-level data, this paper evaluates the sick pay mandates in Connecticut, California, Massachusetts, and Oregon.8 The Connecticut mandate only applies to private firms with more than 49 employees in the service sector, and the mandates in Massachusetts and Oregon only apply to private firms with more than nine employees (Table 1). Because the QCEW data are broken down by industry and firm size at the state level, we carve out employment and wage dynamics for private service sector firms with more than 49 employees in Connecticut. For Oregon and Massachusetts, we generate all outcome variables for private firms with more than nine employees.9
Table 3. Quarterly Census of Employment and Wages (QCEW), State Level: 2001–2016Q2
Analogous to the county-level data, the upper panel of Table 3 shows that, overall, private sector employment was 37.3 and public sector employment 8.0 per 100 population at the state level. The average weekly wage was $805, and the state population was on average 5.6 million.
The lower panel of Table 3 lists “Private Service Sector Employment >49 Employees” as one main outcome variable for Connecticut. Across all U.S. states, there were 15.2 jobs in the service sector in establishments with more than 49 employees for every 100 residents of a state. “Private Sector Employment >9 Employees” (31.2 per 100 pop.) is a main outcome variable for Oregon and Massachusetts. In contrast, general “Private Sector Employment” is one main outcome measure for California, where the mandate does not exempt small businesses.
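The carve-out of mandate-relevant employment can be sketched as a filter-and-sum over industry-by-size cells. The `Cell` layout, the service-sector flag, and the size-class lower bounds are hypothetical simplifications; mapping the published QCEW industry and size codes onto them is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One hypothetical state-industry-size cell for a given month."""
    industry: str
    is_service: bool        # hypothetical flag for service sector industries
    size_lower_bound: int   # smallest number of employees in the size class
    employment: int


def covered_employment(cells, min_size, service_only):
    """Total employment in cells covered by a mandate with the given exemptions."""
    return sum(
        cell.employment
        for cell in cells
        if cell.size_lower_bound >= min_size and (cell.is_service or not service_only)
    )


cells = [
    Cell("retail", True, 1, 40_000),
    Cell("retail", True, 50, 120_000),
    Cell("manufacturing", False, 10, 30_000),
    Cell("manufacturing", False, 50, 90_000),
]
connecticut_style = covered_employment(cells, min_size=50, service_only=True)  # >49 employees, services
oregon_ma_style = covered_employment(cells, min_size=10, service_only=False)   # >9 employees, all industries
print(connecticut_style, oregon_ma_style)
```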
C. Treatment and Control Regions
1. Treatment regions
Table 1 lists treated cities and states. Although we evaluate all regions listed and provide graphs for all of them, some regions (for example, employment in Washington, DC, or wages in New York City and Hudson County) illustrate cases in which the SCGM is not a valid evaluation method due to a poor fit. In the case of Washington, DC, for example, the fit is poor for three reasons. First, Washington, DC has a unique employment structure, with many nonprofit, public sector, and lobbying jobs. Thus, finding appropriate control counties for Washington, DC is very challenging. Second, the original mandate in Washington, DC had many exemptions that are difficult to model with our data (for example, no healthcare or restaurant workers). Moreover, Washington, DC extended the mandate in September 2014, but retroactively effective as of February 2014. Third, the first Washington, DC mandate became effective shortly after the Great Recession hit in October 2008. This makes it very challenging to disentangle labor market effects due to the mandate from the confounding effects of the Great Recession. Given its unique employment structure, the recession also affected Washington, DC differently from most other U.S. counties.10 To deal with special cases such as Washington, DC, we experiment with several alternative SCGM modeling approaches but remain cautious when drawing conclusions.
Table 1 lists all city mandates along with the relevant counties. However, city and county boundaries are not always identical. First, San Francisco is straightforward because its city and county boundaries coincide.
Second, we do not separately evaluate the five boroughs of New York City (NYC),11 but, using the simple employment and wage averages of all five, aggregate them to one regional unit for three reasons. (i) The five boroughs together represent the entire area where the law formally applied. (ii) Employment ratios and wages in Manhattan are extremely high, and they are relatively low in the other boroughs. Moreover, most people who work in NYC live in one of the four surrounding counties and commute to Manhattan. (iii) New York City can be seen as one integrated labor market and not five separate ones. For these reasons, we treat NYC as one statistical unit.
Third, in the case of Portland, Seattle, Newark, and Jersey City, the county boundaries are not identical to the city boundaries where the mandate formally applied. Portland lies almost entirely within Multnomah County, but small portions fall into Clackamas and Washington Counties, which also include large(r) parts that do not belong to Portland. Seattle, Newark, and Jersey City each lie within the county that we use as the treatment unit. For example, in 2014, King County had 2,079,967 residents, but Seattle had only 668,342. Essex County had 795,723 residents, but Newark had only 280,579. And Hudson County had 669,115 residents in 2014, but Jersey City had only 262,146 (U.S. Census Bureau 2016b). The fact that these three cities make up only about one-third of their county populations (32 to 39 percent) simply means that we evaluate the intent-to-treat (ITT) effect for the entire county rather than the effect for the core city alone, as we do for San Francisco and NYC.
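The 2014 population figures above translate into the following city shares of county population, which indicate how much of each treatment county's population lives where the mandate formally applied. This is a simple arithmetic check that uses no data beyond the figures quoted in the text.

```python
# 2014 population figures reported in the text (U.S. Census Bureau 2016b):
# (city residents, county residents).
populations = {
    "Seattle / King County": (668_342, 2_079_967),
    "Newark / Essex County": (280_579, 795_723),
    "Jersey City / Hudson County": (262_146, 669_115),
}

# Share of each treatment county's residents who live in the mandating city.
shares = {name: city / county for name, (city, county) in populations.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```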
If businesses relocated (due to the mandate) just outside the city boundaries but within the treatment county boundaries, our method would not be able to identify such “border jumping” for Portland, Seattle, Newark, and Jersey City. However, comparing the results for these treatment counties with San Francisco and NYC indirectly tests whether firms relocated just outside the city boundaries to circumvent the mandate. This hypothesis would be reinforced, for example, if we found negative employment effects for the core cities (San Francisco and NYC) but no impact for entire counties that surround core cities (Portland, Seattle, Newark, Jersey City).
2. Control regions
We employ the SCGM to model an ideal hypothetical control region for each treatment region. Table 2 lists private sector employment, public sector employment, production employment, and service sector employment, along with employment shares of specific industries, such as manufacturing, education and health services, or leisure and hospitality. In some modeling approaches, we use all of these industry structure variables to find suitable control “donor” counties. In other words, in addition to matching prereform outcome dynamics, the SCGM algorithm then selects control counties whose labor market and population structures are similar to those of the treatment counties. Tables A1, A2, A6, and A7 list all donor counties and states chosen to replicate the pretreatment employment and wage dynamics of each treatment region as closely as possible. Section V below provides more details on the estimation procedure. We also provide evidence from traditional DD models that use all nontreated counties or states jointly as control units.
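For intuition, the traditional DD comparison can be sketched in its simplest two-group, two-period form. This is a simplified illustration with hypothetical data, not the paper's regression specification, which pools many units and periods.

```python
from statistics import mean


def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Canonical two-group, two-period difference-in-differences estimate:
    the change in the treated mean minus the change in the control mean."""
    return (mean(treated_post) - mean(treated_pre)) - (mean(control_post) - mean(control_pre))


# Hypothetical log outcomes; the control group pools all untreated units.
estimate = did_2x2(
    treated_pre=[3.30, 3.31], treated_post=[3.32, 3.33],
    control_pre=[3.20, 3.22, 3.21], control_post=[3.23, 3.25, 3.24],
)
print(f"{estimate:+.3f}")  # about -0.010
```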
D. Sample Selection
The baseline data sets in Tables 2 and 3 are already restricted as follows. For each treatment region, we focus on four pretreatment years (48 months or 16 quarters). Moreover, depending on when exactly the mandate was enacted (Table 1), the length of the postreform period differs across treatment regions.
The county-level data set in Table 2 contains 3,062 unique counties. For the county-level SCGM analysis (not the state-level SCGM analysis or the traditional DD analysis), we additionally preselect suitable donor counties. (We do this because running the SCGM with 3,062 donor counties would be technically infeasible due to multiple equilibria and too many degrees of freedom.) Specifically, we separately rank all 3,062 available counties along three dimensions: county population, private sector employment, and private sector wages. We first select the counties ranked within 500 ranks above or below the treated county on the first dimension, “county population.” Next, we apply the same procedure to the second and third dimensions, private sector employment and private sector wages. Finally, we keep the counties that overlap on all three dimensions, that is, those that fall within a ranking bandwidth of ±500 ranks on each dimension. This preselection procedure results in about 200 potential control counties for each treatment county (for exact values, see the denominator in Column 5 of Table 4).
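The rank-based donor preselection can be sketched as below. The toy data, the function names, and the tie handling (ties keep their input order) are hypothetical; only the logic of keeping counties within the rank bandwidth of the treated county on every dimension follows the text. The toy example uses a bandwidth of 2 instead of 500.

```python
from typing import Dict, List, Set


def ranks(values: Dict[str, float]) -> Dict[str, int]:
    """Rank counties on one dimension (0 = smallest value; ties keep input order)."""
    order = sorted(values, key=values.get)
    return {county: r for r, county in enumerate(order)}


def preselect_donors(treated: str, dimensions: List[Dict[str, float]],
                     bandwidth: int = 500) -> Set[str]:
    """Counties within `bandwidth` ranks of the treated county on every dimension."""
    donors = None
    for values in dimensions:
        r = ranks(values)
        close = {c for c in values
                 if c != treated and abs(r[c] - r[treated]) <= bandwidth}
        donors = close if donors is None else donors & close
    return donors if donors is not None else set()


# Hypothetical toy data: treated county "T" and five candidate counties.
population = {"T": 5, "A": 4, "B": 6, "C": 1, "D": 9, "E": 5.5}
employment = {"T": 30, "A": 29, "B": 31, "C": 30.5, "D": 10, "E": 50}
wages = {"T": 600, "A": 610, "B": 590, "C": 605, "D": 100, "E": 620}
print(sorted(preselect_donors("T", [population, employment, wages], bandwidth=2)))  # ['A', 'B', 'C']
```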
Table 4. Synthetic Control Group Method: The Effect of Sick Pay Mandates on Employment
V. The Synthetic Control Group Method
To assess the causal effects of the sick pay mandates on employment and wages, we use the Abadie and Gardeazabal (2003) SCGM, along with traditional DD models in robustness checks. The SCGM uses fractions of several natural control units to build an ideal—synthetic—control group whose prereform outcome dynamics mimic those of the treatment group (Abadie, Diamond, and Hainmueller 2010). Given the assumptions discussed below, differences in postreform outcome dynamics between the treatment and the synthetic control group then yield evidence on causal reform effects (Athey and Imbens 2017).
In our context, following Table 1, the treatment units are counties or states that implemented sick pay mandates; the potential control units consist of the remaining U.S. counties or states. Because we analyze each treatment unit separately, the notation below refers to a single treatment and J control units.
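Let $y_{it}^0$ denote the natural logarithm of the outcome that would have been observed in region $i$ at time $t$ in the absence of the sick pay mandate. Moreover, $y_{it}^1$ denotes the natural logarithm of the outcome for the treated region $i$ at time $t$, where the sick pay mandate was implemented at time $T_0 + 1$. We assume that the mandate has no effect before its implementation, that is, $y_{it}^1 = y_{it}^0$ for all $t \leq T_0$.
Abadie, Diamond, and Hainmueller (2010) suggest that the following factor model represents the counterfactual $y_{it}^0$:
$$y_{it}^0 = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \varepsilon_{it} \tag{1}$$
where $\delta_t$ is a common time effect, $Z_i$ is a vector of observed covariates, $\theta_t$ is a vector of time-dependent coefficients, $\lambda_t$ is a vector of unobserved common factors, $\mu_i$ is a vector of unknown factor loadings, and $\varepsilon_{it}$ are unobserved transitory shocks with zero mean.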
The SCGM allows for some degree of treatment endogeneity because the treatment can be correlated with unobservables. However, the method still requires several identification assumptions.
First, one necessary assumption in our case is that employment rates and wages in the control regions are not affected by the treatment, which implies the absence of spatial labor market spillovers. When evaluating counties, the treatment units are rather small and thus unlikely to trigger large labor market spillover effects. Moreover, in most cases, the treated counties are geographically distant from the control counties. Tables A1, A2, A6, and A7 list all counties and states used to build the synthetic control units. For example, the donor counties used to evaluate employment dynamics in King County (WA) illustrate that the “no spatial labor market spillover” assumption is not very restrictive: to replicate King County (WA), the control donors are Fulton (GA), Somerset (NJ), Mecklenburg (NC), Durham (NC), Denver (CO), Madison (AL), Harris (TX), Midland (TX), Winnebago (WI), and Mercer (NJ).
Second, as in traditional DD models, no unobserved shocks should affect the outcome differently for treatment and control groups in postreform periods. In our case, shocks violating this assumption would be other labor market policies that are correlated with sick pay mandates in treated regions (but not in control regions). The SCGM may account for such shocks better than traditional methods because the synthetic control units are, by construction, built to replicate the outcome dynamics of the treated unit (including the unobservables that drive these dynamics).
Third, again as in traditional DD models, treatment-induced migration could bias the estimates. If employment prospects worsened due to sick pay mandates and employees lost their jobs, they might migrate to more prosperous counties. Likewise, firms could relocate in response to mandates. For several reasons, economic migration is unlikely to be a major confounder in our context. First, our data and outcome measures allow us to test directly for such migration patterns; in fact, testing for changes in employment rates is precisely one objective of this paper. Recall that we use official population data and normalize employment by population. Because we stratify the effects by the time since implementation, we would detect negative employment effects due to migration over time. Second, when evaluating county effects, the treatment counties are unlikely to contaminate the donor counties (which are chosen out of a total of 3,062 U.S. counties) through worker or firm relocation. Third, in robustness checks, we also test for spillover effects on neighboring counties and states.
Finally, in most SCGM settings, only a single treatment unit is evaluated. We rely on 13 different treatment regions (counties and states of different sizes). A single unobserved shock may confound the estimate for one county or state. But it is very unlikely that 13 treatment units, with staggered treatments from 2007 to 2015, were all coincidentally affected by random unobserved labor market shocks unrelated to the mandates.
A. Implementation
SCGM requires the estimation of two sets of weights: (i) $V$ is a diagonal weighting matrix determining the relative predictive power of the covariates $Z_i$ and of the pretreatment outcomes $y_i$, and (ii) $W = (w_2, \ldots, w_{J+1})'$ is a vector of nonnegative weights attached to the $J$ control units (indexed $j = 2, \ldots, J+1$; the treated unit is indexed 1). The criterion to be minimized is:
$$\min_{W} \; (X_1 - X_0 W)' \, V \, (X_1 - X_0 W) \quad \text{subject to} \quad w_j \geq 0, \;\; \sum_{j=2}^{J+1} w_j = 1 \tag{2}$$
where $X_1$ (a vector) and $X_0$ (a matrix with one column per control unit) contain the averages over the pretreatment elements of $Z_i$ and $y_i$ for the treated and control units, respectively. In our case, $X_1$ and $X_0$ include the variables in Tables 2 and 3. This means that, for the main county- and state-level analysis, $X_1$ and $X_0$ include private sector employment and its subcategories (service sector employment and production sector employment), along with public sector employment and private sector wages. To avoid criticism of overfitting, we include these variables only at the following points in time before the treatment: 36 months, 24 months, 12 months, and one month. In industry-structure robustness checks, we additionally use the employment shares in manufacturing, professional and business services, education and health services, trade, transportation and utilities, as well as leisure and hospitality.12
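For a given $V$, minimizing Equation 2 yields the optimal weights $W^*(V)$. $V$ itself is chosen among all diagonal positive definite matrices such that the resulting $W^*(V)$ minimizes the root mean squared prediction error (RMSPE) of the outcome over the prereform periods:
$$\text{RMSPE}_{\text{pre}} = \left( \frac{1}{T_0} \sum_{t=1}^{T_0} \Big( y_{1t} - \sum_{j=2}^{J+1} w_j^*(V)\, y_{jt} \Big)^2 \right)^{1/2} \tag{3}$$
where $y_{1t}$ is the outcome of the treated unit, $y_{jt}$ ($j = 2, \ldots, J+1$) are the outcomes of the control units, and $T_0$ represents the number of prereform time periods, that is, in our case 48 months or 16 quarters. In alternative specifications, we use six instead of four prereform years and, in a further variant, additionally stop minimizing the RMSPE 24 months prior to the treatment.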
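To make the two nested steps in Equations 2 and 3 concrete, here is a simplified Python sketch. It assumes predictors and outcomes are already in logs, solves the inner problem for $W^*(V)$ by projected gradient descent on the simplex (a solver chosen for this illustration, not necessarily the one used in the paper), and replaces the full outer optimization over $V$ with a search over a small set of random diagonal candidates. All function names are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def solve_weights(X1, X0, v_diag, n_iter=2000):
    """W*(V): minimize (X1 - X0 w)' V (X1 - X0 w) over the simplex, V = diag(v_diag),
    by projected gradient descent with a fixed 1/L step size."""
    H = X0.T @ (v_diag[:, None] * X0)             # X0' V X0
    b = X0.T @ (v_diag * X1)                      # X0' V X1
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H).max())
    w = np.full(X0.shape[1], 1.0 / X0.shape[1])   # start from equal weights
    for _ in range(n_iter):
        grad = 2.0 * (H @ w - b)                  # gradient of the quadratic criterion
        w = project_to_simplex(w - step * grad)
    return w

def fit_synthetic_control(X1, X0, y1_pre, Y0_pre, n_candidates=20, seed=0):
    """Choose a diagonal V from random candidates (plus the identity) so that
    W*(V) minimizes the prereform RMSPE of the outcome, as in Equation 3."""
    rng = np.random.default_rng(seed)
    k = X1.size
    candidates = [np.ones(k)] + [rng.dirichlet(np.ones(k)) for _ in range(n_candidates)]
    best = None
    for v in candidates:
        w = solve_weights(X1, X0, v)
        rmspe = float(np.sqrt(np.mean((y1_pre - Y0_pre @ w) ** 2)))
        if best is None or rmspe < best["rmspe_pre"]:
            best = {"v": v, "weights": w, "rmspe_pre": rmspe}
    return best

# Simulated data: 6 predictors, 16 prereform quarters, 3 donor units.
rng = np.random.default_rng(1)
X0 = rng.normal(size=(6, 3))
Y0_pre = rng.normal(size=(16, 3))
w_true = np.array([0.6, 0.4, 0.0])
fit = fit_synthetic_control(X0 @ w_true, X0, Y0_pre @ w_true, Y0_pre)
print(fit["weights"].round(3), fit["rmspe_pre"])
```

Because the simulated treated unit is an exact convex combination of the donors, the selected weights should be close to (0.6, 0.4, 0) and the prereform RMSPE close to zero; with real data, the fit is never exact.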
B. Treatment Effects and Inference
In addition to calculating the RMSPE for the prereform periods, we also calculate the RMSPE for the postreform periods and take the ratio of the two, as suggested by Abadie, Diamond, and Hainmueller (2010). Whereas the prereform RMSPE indicates how well the synthetic control group fits the treated unit, the ratio of the post- to the prereform RMSPE indicates the relative size of a possible treatment effect. Assuming a stable model fit over time, an RMSPE Ratio ($\text{RMSPE}_{\text{post}}/\text{RMSPE}_{\text{pre}}$) greater than 1 indicates a larger post- than prereform RMSPE and thus a possible treatment effect.
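The following sketch computes the RMSPE Ratio from a gap series (treated minus synthetic outcome, in logs) and illustrates why the ratio alone cannot reveal the sign of an effect. The series are made-up numbers for illustration, and the function names are hypothetical.

```python
import numpy as np

def rmspe(gap):
    """Root mean squared prediction error of a gap series."""
    gap = np.asarray(gap, dtype=float)
    return float(np.sqrt(np.mean(gap ** 2)))

def rmspe_ratio(gap, t0):
    """Postreform RMSPE divided by prereform RMSPE; the first t0 periods are prereform."""
    gap = np.asarray(gap, dtype=float)
    return rmspe(gap[t0:]) / rmspe(gap[:t0])

pre = np.array([0.01, -0.01, 0.01, -0.01])  # prereform RMSPE = 0.01
post_up = np.full(3, 0.03)                  # treated 3 log points above synthetic
post_down = -post_up                        # same size, opposite sign
r_up = rmspe_ratio(np.r_[pre, post_up], t0=4)
r_down = rmspe_ratio(np.r_[pre, post_down], t0=4)
print(r_up, r_down)  # both approximately 3: the ratio is blind to the sign
```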
However, the RMSPE Ratio measures only the relative size of the treatment effect; its sign remains ambiguous. Therefore, we calculate the Percent Treatment Effect (PTE) as
(4)
and the Level Treatment Effect (LTE) as
(5)
In principle, the sign of the treatment effect could change over time, in which case positive and negative effects would partly cancel each other out. Even then, the PTE and LTE would provide evidence on the cumulative sign and size of the effect over all postreform periods.
To conduct inference, we follow Abadie, Diamond, and Hainmueller (2010) and run placebo estimates.13 Because we assess multiple treatments at different points in time, we first construct placebo estimates for each treatment unit separately. Then we rank the treated and all placebo estimates by their RMSPE Ratios. Following Abadie, Diamond, and Hainmueller (2010), the rank of the true treatment estimate relative to the $N$ placebo estimates then determines the p-value for the null hypothesis of no treatment effect ($H_0: \text{RMSPE Ratio}_{\text{Treat}} \leq \text{RMSPE Ratio}_{\text{Placebo}}$). Mathematically, the p-value is the percentile rank of the treated unit's RMSPE Ratio in the empirical distribution of all RMSPE Ratios obtained from the treated and placebo runs:
$$p = \frac{1 + \sum_{j=1}^{N} \mathbb{1}\{\text{RMSPE Ratio}_j \geq \text{RMSPE Ratio}_{\text{Treat}}\}}{N + 1}$$
where the sum runs over the $N$ placebo runs. For example, if the true treatment county had the highest RMSPE Ratio among 99 + 1 (placebo + treatment) counties, the p-value would be 1/100 = 0.01, the treatment effect would be highly significant, and the null hypothesis of no treatment effect could be rejected. In the results section, we carry out this testing procedure for the RMSPE Ratio (Firpo and Possebom 2018). Finally, we follow Dube and Zipperer (2015) and calculate joint p-values based on the sum of the single p-values using the Irwin–Hall distribution.
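A minimal sketch of the rank-based placebo p-value, assuming the RMSPE Ratios of the treated unit and of the $N$ placebo runs have already been computed; the function name is hypothetical and the placebo ratios are illustrative numbers.

```python
import numpy as np

def placebo_p_value(ratio_treated, ratios_placebo):
    """Rank-based p-value: share of all runs (N placebos plus the treated unit)
    whose RMSPE Ratio is at least as large as the treated unit's."""
    ratios_placebo = np.asarray(ratios_placebo, dtype=float)
    n_at_least = 1 + int(np.sum(ratios_placebo >= ratio_treated))  # treated unit counts itself
    return n_at_least / (ratios_placebo.size + 1)

placebos = np.linspace(1.0, 4.0, 99)    # 99 illustrative placebo RMSPE Ratios
print(placebo_p_value(5.0, placebos))   # 0.01 (treated ratio ranks first of 100)
print(placebo_p_value(0.5, placebos))   # 1.0  (treated ratio ranks last)
```

Without ties, this equals the treated unit's rank (1 = largest ratio) divided by the total number of runs, which is how Column 5 of Table 4 is constructed.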
As in the standard parametric case, p-values can be statistically insignificant for two reasons: either there is no effect, or we do not have enough statistical power to detect one. To assess the statistical power of our estimates, we compute p-values under alternative hypotheses to gauge how narrow the confidence intervals are. To do so, we follow the basic procedures in Dube and Zipperer (2015) and Firpo and Possebom (2018). We set the hypothetical average treatment effect over all postreform periods equal to z percent. Next, we recalculate the RMSPE Ratio with this hypothetical average treatment effect of z percent.14 Then we carry out all N placebo estimates as above to assess the probability that our treated unit (with the artificially set z percent treatment effect) originates from that distribution. Accordingly, we calculate p-values and test the null hypothesis that the treatment effect equals zero. Using the notation above, this means that we calculate modified p-values in which the treated unit's RMSPE Ratio is replaced by the RMSPE Ratio obtained after imposing the z percent effect.15 To provide additional intuition: in the SCGM setting, placebo estimates are usually produced to check whether the treated unit differs from the placebo units. The placebo units are, by definition, nontreated units with a treatment effect of zero. Here, this basic idea is modified, and we assign an artificial treatment effect of z percent. Then, as in the standard case, we assess the likelihood that the artificially treated unit stems from the distribution of nontreated placebo units. This procedure shows whether we have enough power to reject the null of no effect, given that we assigned a known treatment effect of z percent.
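The power check can be sketched as follows. The exact procedure in footnote 14 is not reproduced here, so the sketch ASSUMES that the artificial effect is imposed by re-centering the treated unit's postreform log gap so that its mean equals ln(1 + z). The gap series are simulated and the function names are hypothetical.

```python
import numpy as np

def rmspe_ratio(gap, t0):
    """Postreform RMSPE divided by prereform RMSPE of a log-gap series."""
    gap = np.asarray(gap, dtype=float)
    return np.sqrt(np.mean(gap[t0:] ** 2)) / np.sqrt(np.mean(gap[:t0] ** 2))

def power_p_value(gap_treated, gaps_placebo, t0, z):
    """Rank-based p-value for H0 'no effect' after imposing an artificial
    average postreform effect of z (e.g., z = -0.02 for a 2 percent decrease).
    ASSUMPTION: the postreform gap is re-centered to have mean ln(1 + z)."""
    shifted = np.asarray(gap_treated, dtype=float).copy()
    post = shifted[t0:]
    shifted[t0:] = post - post.mean() + np.log1p(z)
    r_treat = rmspe_ratio(shifted, t0)
    r_placebo = np.array([rmspe_ratio(g, t0) for g in gaps_placebo])
    return (1 + int(np.sum(r_placebo >= r_treat))) / (r_placebo.size + 1)

# Simulated log gaps: 99 placebo runs and one treated unit; 12 pre- and 8 postreform periods.
rng = np.random.default_rng(0)
t0 = 12
gaps_placebo = rng.normal(scale=0.01, size=(99, 20))
gap_treated = rng.normal(scale=0.01, size=20)
for z in (0.0, -0.02, -0.10):
    print(z, power_p_value(gap_treated, gaps_placebo, t0, z))
```

A small p-value for a given z means that an effect of that size would have been detected, that is, it lies outside the corresponding confidence interval.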
VI. Results
Section VI.A begins by evaluating the overall employment and wage effects of the city-level mandates using our county-level QCEW data set and the SCGM. As robustness checks, we run four alternative SCGM modeling approaches, which yield similar findings and provide better fits for a few counties (and worse fits for others). Next, we provide complementary evidence from traditional DD models and event studies. The subsequent subsection provides heterogeneity tests by investigating effects in the construction and hospitality sectors at the county level, the sectors that were particularly affected by the mandates. We also test for spillover effects of the policy on counties neighboring the treatment counties. Then, Section VI.B provides evidence on labor market effects at the state level. Analogous to the city-level case, we also investigate alternative modeling approaches, the construction and hospitality sectors, and evidence for spillover effects. Section VI.C discusses potential explanations for possible heterogeneity in effect sizes.
A. Labor Market Effects of City-Level Mandates
1. County-level employment and wage effects using the standard SCGM approach
Figure 1 shows the evolution of county-level employment in five treatment counties listed in Table 1. (The equivalent graphs for the remaining four counties are in Figure A1 in the Online Appendix.) In the left column of Figure 1, the solid lines represent the treatment counties, and the dashed lines represent the synthetic control counties. The composition of each synthetic control county, that is, the weights W of the J control counties, is reported in Table A1. The solid vertical lines at point zero on the x-axes mark the months when the sick pay mandates went into effect and were enforced. The dotted lines to the left indicate when the bills were passed; they allow us to check for anticipation effects. The dotted lines to the right indicate the end of the accrual periods.
Employment Ratios in Treated vs. Synthetic Control Counties
Notes: The left column compares treated counties (solid lines) to the synthetic control counties (dashed lines). The composition of the synthetic control counties is in Table A1 in the Online Appendix. All SCGM analyses are in logs; graphs in the left column display the exponentiated values in levels. The right column shows the difference of the logarithm of the employment ratios between treatment and synthetic control groups along with placebo estimates for counties with prereform RMSPEs smaller than two times the prereform RMSPE of the treated county (gray lines). The left dashed vertical lines indicate when the law was passed, the middle solid vertical lines indicate when the law became effective, and the right dashed vertical lines indicate when the probation period was over. For more information about the sick pay reforms, see Table 1.
Source: QCEW (Bureau of Labor Statistics (BLS) 2018) and own calculation and illustration.
Figures 1 and A1 illustrate, first, substantial differences in employment rates: whereas San Francisco and King County have employment rates of around 50 percent of the population, the rates for NYC and Philadelphia are below 40 percent. Second, the employment dynamics of treated and synthetic control counties are nearly identical in the prereform periods, suggesting that the SCGM produces valid counterfactuals (one obvious exception is Washington, DC, in Figure A1). Third, it is visually difficult to identify sizable and systematic reform-related employment effects: in postreform periods, the employment dynamics of treated and synthetic control counties appear very similar for all cities displayed. Fourth, to quantitatively evaluate the SCGM fit between treated and control units, to assess potential employment effects, and to conduct inference, we follow Abadie and Gardeazabal (2003) and Abadie, Diamond, and Hainmueller (2010) and show all relevant statistics in Panel A of Table 4.
Column 1 of Table 4 shows the Employment Ratio, $Y_{it}^1$, defined as private sector employment as a share of the county population, averaged over all prereform periods.
Column 2 shows the RMSPEs for the prereform years as specified in Equation 3. Because we take the logarithm of the outcome variable before minimizing, the values in Column 2 can be interpreted approximately as percentage deviations from the outcome variable. With the exception of Washington, DC (which we disregard due to its poor fit but show for completeness in Figure A1), all prereform RMSPEs are very low, at around 1 percent of the outcome measure. This implies that the SCGM replicates the employment dynamics of the treatment counties very well. For comparison, in their evaluation of the effects of a tobacco control program on cigarette consumption in California, Abadie, Diamond, and Hainmueller (2010) obtain a prereform RMSPE of 3 relative to a mean of about 100.
Column 3 shows the RMSPEs for the postreform years, which appear to be slightly larger than the prereform RMSPEs. This is confirmed by the RMSPE Ratios in Column 4, which divide Column 3 by Column 2. The RMSPE Ratios lie between 1 for Alameda County and 4 for San Francisco. (For comparison, Abadie, Diamond, and Hainmueller (2010) report a ratio of 11.4 and significant results.)
Next, we conduct inference using placebo methods (see Section V). For each treatment county, as described in Section IV.D, we select nontreated placebo counties with similar labor markets and demographics. Then we replicate the standard SCGM procedure for each placebo county, pretending it had been treated at the same time as the real treatment county. Column 5 illustrates the calculation of the p-values for the hypothesis $H_0: \text{RMSPE Ratio}_{\text{Treat}} \leq \text{RMSPE Ratio}_{\text{Placebo}}$, which is simply the rank of the RMSPE Ratio of the treated county divided by the number of Total Counties Assessed. In other words, after calculating the RMSPE Ratio for each placebo county and ranking all of them, we can assess the position of the true RMSPE Ratio of the treated county in the distribution of the test statistic (Abadie, Diamond, and Hainmueller 2010). As seen in Column 5, the total number of SCGM runs for each treatment county varies between 83 and 199 (placebo + 1). Moreover, the ranks of the true treatment counties lie between 23 (NYC) and 139 (Alameda). Accordingly, except for NYC (p = 0.13), none of the p-values is even close to being statistically significant at conventional levels.
We also calculate the sum of all p-values (excluding Washington, DC because of a poor pre-RMSPE fit) and then evaluate their joint p-value based on the Irwin–Hall distribution (Dube and Zipperer 2015). The overall p-value for the county-level employment effect is 0.25.
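The joint test can be sketched with the closed-form Irwin–Hall CDF. Under the null, and assuming the individual p-values are independent and uniformly distributed on (0, 1), their sum follows an Irwin–Hall distribution, so the joint p-value is the probability of a sum at least as small as the observed one. The function names are hypothetical, and the alternating sum below loses numerical precision when the number of p-values is large.

```python
from math import comb, factorial, floor

def irwin_hall_cdf(x, n):
    """P(U_1 + ... + U_n <= x) for n independent Uniform(0, 1) variables."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)

def joint_p_value(p_values):
    """Probability that n independent uniform p-values sum to at most the observed sum."""
    return irwin_hall_cdf(sum(p_values), len(p_values))

print(joint_p_value([0.5, 0.5]))            # 0.5
print(joint_p_value([0.2, 0.4, 0.6, 0.8]))  # 0.5 (illustrative values; the sum equals n/2)
```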
The right columns of Figures 1 and A1 display the permutation inference using placebo tests. Following the convention in the literature, the graphs plot the differences in the logarithms of the employment ratios (solid black) along with the differences for all placebo SCGM runs (gray) with a good fit ($\text{RMSPE}_{\text{Placebo}} \leq 2 \times \text{RMSPE}_{\text{Treat}}$). In prereform periods, the solid black line fluctuates very closely around the horizontal zero line, implying that the synthetic control units closely track the employment dynamics of the treatment units. After the reform, indicated by the solid black vertical line, employment differentials between treated and control counties remain very small and flat for most counties. One exception is San Francisco, where the differential even appears to be positive, although it is not statistically significant.
Column 6 of Table 4 shows the Percent Treatment Effect (PTE), and Column 7 shows the Level Treatment Effect (LTE) for the postreform periods; the LTE is measured in units of the outcome, private sector employment as a share of the county population. As seen, the signs of the calculated treatment effects are ambiguous (four are negative and four are positive), and none is statistically significant at conventional levels.
Finally, Columns 8 and 9 test whether we have enough statistical power to reject potential employment decreases of 3 and 2 percent, respectively (see Section V.B). Both columns provide p-values for the hypothesis $H_0: \text{PTE}_{\text{Treat}} = 0$; inference is again based on the rank of the RMSPE Ratio (Firpo and Possebom 2018). The bottom of Panel A provides the joint p-value for all counties (excluding Washington, DC), which is 0.08 for the 2 percent decrease in Column 9, implying that employment decreases of 2 percent lie outside the 92 percent confidence interval.
Next, we evaluate wage effects using the graphs and test statistics in Figures 2 and A2 and in Panel A of Table 5. The structure follows that of the employment analysis. Recall that the wages are quarterly nominal wages that have been adjusted for seasonal fluctuations (Section IV).
Weekly Wages in Treated vs. Synthetic Control Counties
Notes: The left column compares treated counties (solid lines) to the synthetic control counties (dashed lines). The composition of the synthetic control counties is in Table A2 in the Online Appendix. All SCGM analyses are in logs; graphs in the left column display the exponentiated values in levels. The right column shows the difference of the logarithm of the weekly wages between treated and synthetic control groups along with placebo estimates for counties with prereform RMSPEs smaller than two times the prereform RMSPE of the treated county (gray lines). The left dashed vertical lines indicate when the law was passed, the middle solid vertical lines indicate when the law became effective, and the right dashed vertical lines indicate when the probation period was over. For more information about the sick pay reforms, see Table 1.
Source: QCEW (Bureau of Labor Statistics (BLS) 2018) and own calculation and illustration.
Synthetic Control Group Method—Effect of Sick Pay Mandates on Weekly Wages
Figures 2 and A2 show positive wage dynamics, reflecting rising nominal wages. Not only do wage levels differ substantially between local labor markets, but so do the slopes representing wage growth. This is why we decided against further manipulation of the raw data, such as deflating wages with the consumer price index. First, the SCGM is able to precisely replicate local and time-variant differences in wage dynamics; in fact, the method is very well suited for such purposes. Second, because no monthly (or quarterly) county-level CPI measure is available, one would have to convert nominal wages into presumably “real” wages using a common deflator, which would not capture the properties of the local labor markets appropriately.
As illustrated by the many prereform RMSPEs below 0.03 (Column 2 of Table 5), most treatment regions show very good pretreatment fits between the treated and the synthetic control counties. However, using this standard SCGM approach, we could not find synthetic control groups with “acceptable” fits for NYC and Hudson County.16 The reason is the atypical wage levels in NYC (by far the highest wages among all treatment regions; Column 1 of Table 5) and in Jersey City (Hudson County).
Furthermore, Table 5 shows that the RMSPE Ratios (Column 5) are statistically insignificant and that the PTEs in Column 6 fluctuate without any clear pattern between −0.5% (Essex County) and +5.4% (Alameda County). Overall, there is not much evidence that mandating sick pay significantly weakened wage growth. Visually, it is hard to detect substantial and systematic wage effects (Figures 2 and A2). According to the joint county tests (which exclude NYC and Hudson County because of their poor prereform RMSPE fit), wage decreases of 3 percent lie outside the 95 percent confidence interval. In Section VI.C, we provide a detailed discussion of expected effect sizes. Because the mandates require employers to provide one hour of paid sick leave for every 30–40 hours worked, a static calculation that ignores administrative and psychological (“business climate”) costs would yield wage decreases of up to 3.3 percent for marginal firms.
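The 3.3 percent figure follows from simple arithmetic. The sketch below assumes that every accrued sick hour is used and that its full cost is passed on to wages; like the static calculation in the text, it ignores administrative and psychological costs. The function name is hypothetical.

```python
def static_wage_cost(hours_worked_per_sick_hour):
    """Cost of paid sick leave relative to the wage bill for hours worked,
    assuming one paid sick hour accrues per `hours_worked_per_sick_hour`
    hours worked and every accrued hour is used (an upper bound)."""
    return 1.0 / hours_worked_per_sick_hour

for h in (30, 40):
    print(f"1 sick hour per {h} hours worked: {static_wage_cost(h):.1%}")
# 1 sick hour per 30 hours worked: 3.3%
# 1 sick hour per 40 hours worked: 2.5%
```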
2. Alternative SCGM modeling approaches
Figure 3 visually compares our standard modeling approach with four alternative SCGM modeling approaches for five counties, for employment (left column) and wages (right column). Figure A3 shows the results for the remaining four treatment counties. The black solid lines are the benchmark and depict the differences in outcome dynamics using our standard SCGM procedure; that is, they equal the right columns of Figures 1 and 2. (Note that we reset the SCGM algorithm and the selection of donor counties at the beginning of each of these alternative modeling approaches.)
Alternative SCGM Modeling—Employment (Left) and Wage (Right) Effects
Notes: The lines always show the difference of the employment ratios (left column) and weekly wages (right column) between treatment and synthetic control groups. The solid black lines show our standard modeling approach (right columns of Figures 1 and 2). The black dashed lines select synthetic control counties based on additional industry structure variables (Tables 2 and 3). The gray solid lines use six instead of four pretreatment years; the gray dashed lines use six instead of four years but stop applying the SCGM algorithm two years before the law’s enactment. The light gray solid lines use employment in levels, relative to employment in T–1. The left dashed vertical lines indicate when the law was passed and the solid vertical lines when the law became effective.
Source: QCEW (Bureau of Labor Statistics (BLS) 2018) and own calculation and illustration.
The black dashed lines represent an approach that uses additional covariates on the industry structure of the county to select synthetic control and placebo counties (see Table 2 for the covariates and Section V.B for the procedure). The gray solid lines use six instead of four pretreatment years, and the gray dashed lines use six instead of four pretreatment years but stop applying the SCGM algorithm two years before the law’s enactment. Finally, the light gray solid lines use a modified outcome variable based on the log difference between the outcome in the current period and in the period before the law’s enactment. The main statistical indicators for each approach are in Table A3 (Online Appendix).
In summary, the findings of our main modeling approach are fairly robust to these alternative modeling approaches: all lines fluctuate closely together and mostly in parallel. However, in a few instances, these alternative approaches, in particular taking the log difference between the current and the prereform period, improve the fit for counties that are poorly fitted by the standard approach. This applies to employment in Washington, DC (Figure A3, left column, third row) and wages in Hudson County (Figure A3, right column, last row). Substantively, these alternative approaches with improved fit corroborate the main finding of no employment or wage effects. Also, while the fit for a few treatment counties can be improved with alternative SCGM approaches, it worsens for other treatment counties. We conclude that the standard modeling approach performs reasonably well for the majority of counties, but alternative SCGM techniques can help to improve the fit where it does not.
3. Traditional Difference-in-Differences Approaches
This subsection runs traditional DD models as robustness checks. We use the baseline QCEW data set as in Table 2 and keep all nontreated counties as control counties in the sample. Then we exploit variation in the implementation of city-level mandates across counties and over time by estimating the following model:
$$y_{it} = \beta \, (\text{TreatedCounty}_i \times \text{Law}_t) + Z_{it}' \pi + \delta_t + \rho_i + \rho_s \times t + \varepsilon_{it} \tag{6}$$
where $y_{it} = \ln(Y_{it})$ is the logarithm of the outcome variables in county $i$ at time $t$ as above. $\text{TreatedCounty}_i$ is a dummy that indicates counties that implemented sick pay mandates, and $\text{Law}_t$ is a postreform dummy. Thus, $\beta$ represents the standard average DD treatment effect for postreform periods. In some specifications, we additionally estimate a second treatment coefficient, $\gamma$, that captures the slope of the treatment effect over postreform periods (for example, to allow for the possibility that the effects increase slowly over time; see Lafortune, Rothstein, and Schanzenbach 2018 for a similar application). $\delta_t$ represents month–year or quarter–year fixed effects, $\rho_i$ are county fixed effects, and $\rho_s \times t$ are state-specific time trends. $Z_{it}$ is a vector of county–year specific control variables with coefficient vector $\pi$, and $\varepsilon_{it}$ is the error term.
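A stripped-down version of Equation 6 can be estimated with plain least squares. This sketch is illustrative only: it uses simulated data, keeps only county and month fixed effects (no covariates $Z_{it}$, no state-specific trends, no slope term), and does not compute clustered standard errors. The function name `twfe_dd` is hypothetical.

```python
import numpy as np

def twfe_dd(y, unit, period, treated_post):
    """OLS estimate of beta in y = beta * treated_post + unit FE + period FE + e.
    Fixed effects enter as dummies (first category dropped) plus an intercept."""
    n = y.size
    _, u_idx = np.unique(unit, return_inverse=True)
    _, t_idx = np.unique(period, return_inverse=True)
    D_unit = np.eye(u_idx.max() + 1)[u_idx][:, 1:]   # county dummies
    D_time = np.eye(t_idx.max() + 1)[t_idx][:, 1:]   # month dummies
    X = np.column_stack([treated_post.astype(float), np.ones(n), D_unit, D_time])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]

# Simulated county-month panel: 50 counties, 24 months; counties 0-4 adopt a
# mandate in month 12 with a true effect of -0.01 log points (about -1 percent).
rng = np.random.default_rng(0)
n_units, n_periods = 50, 24
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
treated_post = (unit < 5) & (period >= 12)
y = (rng.normal(size=n_units)[unit] + rng.normal(size=n_periods)[period]
     - 0.01 * treated_post + rng.normal(scale=0.001, size=unit.size))
beta_hat = twfe_dd(y, unit, period, treated_post)
print(round(beta_hat, 4))
```

The interaction term is the only regressor that varies within both counties and months, which is why the fixed effects absorb the main effects of $\text{TreatedCounty}_i$ and $\text{Law}_t$.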
Table 6 shows the results for 14 DD models, where each column in each panel represents one model as in Equation 6. Panel A shows the findings for employment, and Panel B those for wages. Odd- and even-numbered columns differ in the sets of covariates included, as indicated in the table notes. We run three main specifications for both wages and employment, where $\text{LawEffective}_t$ represents the month when the mandate became effective, $\text{LawPassed}_t$ the month when the bill was passed, and $\text{ProbationOver}_t$ the month when the accrual period was over (see Section III). In addition, Column 7 reports two specifications (one per panel) in which we simultaneously control for $\text{TreatedCounty}_i \times \text{LawPassed}_t$ and $\text{TreatedCounty}_i \times \text{LawEffective}_t$.17
Traditional DD Models—Effect of Mandates on Employment and Wages at the County Level
None of the 14 main DD coefficient estimates is statistically different from zero. Moreover, the point estimates, expressed approximately in percent of the outcome, are relatively small, on the order of 0.5 percent in absolute value. In addition, their signs are not consistently positive or negative but alternate.
Finally, we plot two standard event studies in Figure 4. Technically, we replace the binary $\text{Law}_t$ indicator in Equation 6 with a set of event-time indicators counting the months (or quarters) before and after the reform became effective. (Note that we also control for state-specific time trends in the event study specification.) The point estimates for these event-time dummies are plotted in Figure 4, where zero on the x-axis indicates when the mandate became effective.
Event Studies from Traditional DD Models for County-Level Estimates
Notes: The graphs show event studies based on traditional DD models similar to Equation 6, where the treated county dummy is replaced by a time indicator that counts from 48 months before to 36 months after the enactment of the city-level sick pay mandates. The error terms are clustered at the county level, and the gray areas depict 95% confidence intervals. For more information about the sick pay reforms, see Table 1. Source: QCEW (Bureau of Labor Statistics (BLS) 2018) and own calculation and illustration.
In line with the findings in Table 6, Figure 4 shows relatively smooth estimates without much trending. Almost none of the point estimates, either before or after the reform, is statistically different from zero, as illustrated by the gray areas representing the 95 percent confidence intervals, which include zero nearly throughout.
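The event-study variant can be sketched in the same way, replacing the single treatment dummy with event-time indicators and omitting k = -1 as the reference period. The sketch uses noise-free simulated data, omits state-specific trends and clustering, and assumes the window covers every observed event time (otherwise the endpoints would need to be binned). The function name is hypothetical.

```python
import numpy as np

def event_study(y, unit, period, start, window):
    """Event-time coefficients with unit and period fixed effects.

    unit, period: integer codes 0..N-1 and 0..T-1 for each observation
    start:        first treated period per unit (-1 = never treated)
    window:       event times k to estimate; k = -1 is the omitted reference.
    ASSUMPTION: `window` covers every event time observed for treated units.
    """
    n = y.size
    s = start[unit]          # first treated period of each observation's unit
    rel = period - s         # event time (only meaningful where s >= 0)
    ks = [k for k in window if k != -1]
    E = np.column_stack([((s >= 0) & (rel == k)).astype(float) for k in ks])
    D_unit = np.eye(unit.max() + 1)[unit][:, 1:]
    D_time = np.eye(period.max() + 1)[period][:, 1:]
    X = np.column_stack([E, np.ones(n), D_unit, D_time])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(ks, coef[: len(ks)]))

# Noise-free simulation: 30 units, 20 periods, three units treated at t = 8, 10, 12,
# with a constant postreform effect of -0.01 and no pretrends.
rng = np.random.default_rng(0)
n_units, n_periods = 30, 20
start = np.full(n_units, -1)
start[:3] = [8, 10, 12]
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
post = (start[unit] >= 0) & (period >= start[unit])
y = rng.normal(size=n_units)[unit] + rng.normal(size=n_periods)[period] - 0.01 * post
est = event_study(y, unit, period, start, window=range(-12, 12))
print({k: round(float(v), 4) for k, v in est.items() if k in (-2, 0, 5)})
```

Flat coefficients before k = 0 are the event-study analogue of the parallel prereform lines in the SCGM graphs.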
4. Evidence from the construction and hospitality sectors
Now, we zoom into specific sectors of the economy. So far, we have evaluated employment and wage effects for entire counties and all sectors. However, it is known that some sectors were more affected by the mandates than others. In particular, in the construction and service sector industries, prereform sick leave coverage rates had been very low, at only around 30 percent (Susser and Ziebarth 2016).
Figure 5 and Figures A4–A7 show results for employment in construction and hospitality. Due to space constraints, we graphically show only results for San Francisco, Philadelphia, King County, NYC, Multnomah, and Essex County (the remaining graphs are available upon request). The test statistics for all counties and states are in Table A4. For each sector and outcome variable, we reset the SCGM algorithm and the selection of donor counties using only labor market outcomes of the specific sector under consideration.
Construction and Hospitality—Employment in San Francisco and Philadelphia
Notes: The left column compares treated counties (solid lines) to the synthetic control counties (dashed lines). All SCGM analyses are in logs; graphs in the left column display the exponentiated values in levels. The right column shows the difference of the logarithm of the employment ratios between treatment and synthetic control groups along with placebo estimates for counties with prereform RMSPEs smaller than two times the prereform RMSPE of the treated county (gray lines). The left dashed vertical lines indicate when the law was passed, the middle solid vertical lines indicate when the law became effective, and the right dashed vertical lines indicate when the probation period was over. Table A4 in the Online Appendix shows the corresponding statistics. For more information about the sick pay reforms, see Table 1.
Source: QCEW (Bureau of Labor Statistics (BLS) 2018) and own calculation and illustration.
First of all, focusing on specific industries within counties comes at the cost of slightly worse, but still acceptable, prereform RMSPE fits (Columns 2 and 7 of Table A4), an impression that the graphical evidence confirms. Particularly for construction employment in Philadelphia and King County, the SCGM does not perform well. Given the very low levels of construction employment in these two counties (<1 percent) and the fact that we had to drop counties with zero construction employment (because we take the log before applying the SCGM), this is perhaps not surprising. For the other counties, we again see little evidence that employment systematically and significantly increased or decreased.
For employment in the hospitality sector, the prereform RMSPE fit is better. However, the results still show no consistent pattern. Judged by the test statistics, most PTEs are positive (Table A4). The statistic for Washington, DC, is significant at the 6 percent level (with a mediocre fit), and the statistic for San Francisco is significant at the 9 percent level. Graphically, there appears to be suggestive evidence that employment in the hospitality sector may even have increased in San Francisco (Figure 5, right column, third row). No such evidence exists for the other counties.
The case is very similar for wage dynamics in both sectors, as shown by Figures 6, A6, and A7, as well as Table A4. When focusing on cases with a good prereform RMSPE fit, the evidence suggests either no effects (Philadelphia, Essex, and King County for construction) or weak suggestive evidence of rising wages (King and Essex County for hospitality).
Construction and Hospitality—Weekly Wages in San Francisco and Philadelphia
Notes: The left column compares treated counties (solid line) to the synthetic control counties (dashed line). All SCGM analyses are in logs; graphs in the left column display the exponentiated values in levels. The right column shows the difference in the logarithm of the weekly wage between treated and synthetic control groups along with placebo estimates for counties with prereform RMSPEs smaller than two times the prereform RMSPE of the treated county (gray lines). The left dashed vertical lines indicate when the law was passed, the middle solid vertical lines indicate when the law became effective, and the right dashed vertical lines indicate when the probation period was over. Table A4 in the Online Appendix shows the corresponding statistics. For more information about the sick pay reforms, see Table 1.
Source: QCEW (Bureau of Labor Statistics (BLS) 2018) and own calculation and illustration.
5. Testing for spillover effects on neighboring counties
The final robustness check tests whether there is any evidence for spillover effects of the mandates on neighboring counties. Although we do not find much evidence for systematic employment or wage effects, it is conceivable that some businesses relocated just outside the county boundaries to circumvent the mandate. It is also conceivable that hypothetical (positive or negative) labor market effects spread to neighboring counties.
Table A5 in the Online Appendix shows the results of the spillover tests for neighboring counties. First, with a few exceptions, Columns 1 and 5 show very good prereform RMSPE fits. Second, as shown by Columns 3 and 7, the PTEs do not have consistent signs: 18 of the 38 tested neighboring counties have negative employment PTEs, and the remaining 20 have positive ones.
Thorough statistical inference would require calculating placebo estimates for each of the 38 neighboring counties, with around 150 placebos per neighbor (roughly 5,700 SCGM estimations in total). To avoid this computational burden, we rely on the empirical distributions of the placebo estimates from Tables 4 and 5. These distributions suggest that the cutoff for a p-value of 0.05 is an RMSPE Ratio of 5.1 for employment and 6.7 for wages. The largest RMSPE Ratios in Table A5 are those for Contra Costa, CA (7.5) and Santa Clara, CA (6.7) for employment (with no values exceeding five for wages); these two estimates are probably statistically significant. The point estimates suggest negative employment effects in these counties, which could imply that firms (and/or employees) relocated to the neighboring counties San Francisco or Alameda, where the mandates applied. On the other hand, we advise caution, as these are only two out of 38 cases, which lies within a conventional 6 percent statistical error probability rate.
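The inference logic can be sketched as follows: the RMSPE Ratio divides the postreform RMSPE of the log gap by its prereform RMSPE, and a placebo-based p-value is the share of units (treated plus placebos) with a ratio at least as large as the treated one. The sketch also checks the employment ratios reported above against the empirical 5 percent cutoff of 5.1. It is a minimal illustration with toy gaps and hypothetical function names, not the authors' code.

```python
import math

# Hypothetical helper names; toy data. A sketch of placebo-based inference
# with the RMSPE Ratio, not the authors' implementation.

def rmspe(gaps):
    """Root mean squared prediction error of a sequence of log gaps."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))

def rmspe_ratio(gaps, n_pre):
    """Postreform RMSPE divided by prereform RMSPE; the first n_pre periods are prereform."""
    return rmspe(gaps[n_pre:]) / rmspe(gaps[:n_pre])

def placebo_p_value(treated_ratio, placebo_ratios):
    """Share of all units, treated unit included, whose ratio is at least the treated ratio."""
    ratios = [treated_ratio, *placebo_ratios]
    return sum(r >= treated_ratio for r in ratios) / len(ratios)

# Toy gaps: small prereform deviations, larger postreform deviations.
gaps = [0.01, -0.01, 0.01, -0.01, 0.05, 0.05]
ratio = rmspe_ratio(gaps, n_pre=4)
print(round(ratio, 6), placebo_p_value(ratio, [1.0, 2.0, 3.0, 6.0]))

# Number of SCGM runs that full inference for all neighbors would require.
print(38 * 150)  # 5700

# Employment RMSPE Ratios reported in the text, compared with the empirical cutoff.
EMPLOYMENT_CUTOFF_5PCT = 5.1
reported = {"Contra Costa, CA": 7.5, "Santa Clara, CA": 6.7}
flagged = [county for county, r in reported.items() if r >= EMPLOYMENT_CUTOFF_5PCT]
print(flagged)
```

Using the empirical cutoff from the main placebo runs is what saves the roughly 5,700 extra estimations; the trade-off is that it assumes the neighbors' placebo distributions resemble those of the treated counties.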
B. Labor Market Effects of State-Level Mandates
1. State-level employment and wage effects
The graphical evidence for the state-level results is in Figure 7 (employment) and Figure 8 (wages). Panels B of Tables 4 (employment) and 5 (wages) show the test statistics analogous to the city-level case. Note that we are able to differentiate by firm size and industry and can therefore focus only on employment and wage effects in treated firms and industries, that is, private service sector firms with more than 49 employees in Connecticut and private sector firms with more than nine employees in Massachusetts and Oregon. As above, Figures A8 and A9 (Online Appendix) provide robustness checks using four different modeling approaches.
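Restricting the state-level analysis to treated firms requires firm-size data, which, according to Footnote 9, are reported only for the first quarter of each year; values for the other quarters are imputed by holding the first-quarter size shares fixed. The sketch below illustrates both steps under simplifying assumptions: it covers only the size thresholds named in the paragraph above, ignores Connecticut's additional service-sector restriction, and uses hypothetical size-class keys, toy numbers, and hypothetical function names.

```python
# Treated-firm thresholds as stated in the text (Connecticut: more than 49 employees,
# private service sector; Massachusetts and Oregon: more than nine employees).
# The industry restriction for Connecticut is not modeled. Names are hypothetical.
MIN_TREATED_SIZE = {"CT": 50, "MA": 10, "OR": 10}

def is_treated_size_class(state, lower_bound):
    """A size class counts as treated if its smallest firm size meets the state threshold."""
    return lower_bound >= MIN_TREATED_SIZE[state]

def impute_quarters(q1_by_class, quarterly_totals):
    """Split each quarter's total across size classes using first-quarter shares held fixed."""
    q1_total = sum(q1_by_class.values())
    shares = {c: v / q1_total for c, v in q1_by_class.items()}
    return [{c: share * total for c, share in shares.items()} for total in quarterly_totals]

# Toy example: size classes keyed by their smallest firm size (1-49 and 50+ employees).
q1 = {1: 60.0, 50: 40.0}
for quarter in impute_quarters(q1, [100.0, 110.0, 105.0, 120.0]):
    treated = sum(v for lb, v in quarter.items() if is_treated_size_class("CT", lb))
    print(round(treated, 2))
```

Holding the first-quarter shares fixed means the imputed treated series moves one-for-one with the state total within a year; any within-year shift between size classes is not captured.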
Employment Ratios in Treated vs. Synthetic Control States
Notes: The left column compares treated states (solid line) to synthetic control states (dashed line). The composition of the synthetic control states is in Table A6 in the Online Appendix. All SCGM analyses are in logs; graphs in the left column display the exponentiated values in levels. The right column shows the difference in the logarithm of the employment ratios between treatment and synthetic control groups along with placebo estimates for states with prereform RMSPEs smaller than two times the prereform RMSPE of the treated state (gray lines). The left dashed vertical lines indicate when the law was passed, the middle solid vertical lines indicate when the law became effective, and the right dashed vertical lines indicate when the probation period was over. In Connecticut, the treatment group consists of private sector firms with at least 50 employees; in Oregon and Massachusetts, the treatment group consists of private sector firms with at least 10 employees. For more information about the sick pay reforms, see Table 1.
Source: QCEW (Bureau of Labor Statistics (BLS) 2018) and own calculation and illustration.
Weekly Wages in Treated vs. Synthetic Control States
Notes: The left column compares treated states (solid line) to the synthetic control states (dashed line). The composition of the synthetic control states is in Table A7 in the Online Appendix. All SCGM analyses are in logs; graphs in the left column display the exponentiated values in levels. The right column shows the difference in the logarithm of the weekly wages between treatment and synthetic control groups along with placebo estimates for states with prereform RMSPEs smaller than two times the prereform RMSPE of the treated state (gray lines). The left dashed vertical lines indicate when the law was passed, the middle solid vertical lines indicate when the law became effective, and the right dashed vertical lines indicate when the probation period was over. In Connecticut, the treatment group consists of private sector firms with at least 50 employees; in Oregon and Massachusetts, the treatment group consists of private sector firms with at least 10 employees. For more information about the sick pay reforms, see Table 1.
Source: QCEW (Bureau of Labor Statistics (BLS) 2018) and own calculation and illustration.
Again, the graphs and test statistics provide clear and consistent evidence. First, the standard SCGM modeling approach performs well in most cases. In the few instances in which the fit is poor (for example, wages in California), the alternative modeling approaches clearly improve the fit. Second, neither the graphs nor the test statistics provide evidence of systematic labor market effects of substantial size and significance. None of the RMSPE Ratios are statistically significant at conventional levels (Panel B of Tables 4 and 5).18 Third, if the effects were significant, most of them would suggest positive rather than negative employment and wage effects.
2. Evidence from the construction and hospitality sectors
In the Online Appendix, we show separate findings for the construction and hospitality industry in Connecticut (Figure A10), California (Figure A11), Massachusetts (Figure A12), and Oregon (Figure A13).
Again, the findings from the city-level mandates hold up. With some exceptions (for example, wages in Massachusetts), the standard SCGM performs well and closely replicates the labor market dynamics of the treatment states using weighted combinations of other states, as shown in Table A4. Moreover, neither the visual nor the statistical analysis provides evidence of significant employment or wage effects in any of the states or sectors.
3. Testing for spillover effects on exempt firms and sectors
As a final test, we investigate whether firms within a state that are exempt (because of their size or industry) may have been affected by the mandates. In other words, we replicate the spillover analysis from above, but instead of testing for effects on neighboring regions, we test for effects on exempt firms and sectors in the same state (results are in Table A8). The largest RMSPE Ratio is 4.0 for Connecticut, which is comparable to other insignificant ratios in Tables 4 and 5. Thus, the results are again robust in the sense that we are unable to identify statistically significant and systematic labor market effects for exempt industries and firms.
C. Discussion of Effect Sizes
As shown by Tables 4 and 5, there is, overall, very little evidence that employment or wages varied systematically as a result of the city- or state-level sick pay mandates. The SCGM inference procedure (see Section V.B) almost never allows us to conclude that employment and wage dynamics have been significantly different in treated cities or states. Moreover, the signs and sizes of the PTEs and LTEs (Columns 6 and 7) do not follow a consistent pattern, which, in our opinion, corroborates the main conclusion of no systematic employment or wage effects.
As discussed in the Introduction, the standard textbook model predicts negative wage effects as a result of mandated sick pay. Under several assumptions, a static calculation yields wage decreases of up to 3.3 percent. Relaxing these assumptions yields ambiguous predictions for wages. For example, whether and how sick leave affects work productivity is crucial, but there is no causal empirical evidence on this question. It is plausible that overall work productivity either increases or decreases when employees gain access to paid sick leave. Similarly, assumptions about (unobserved) administrative and psychological (“business climate”) costs appear to be crucial when making predictions about employment effects. Ultimately, we take the view that employment and wage effects are an empirical question, and we do not find evidence for systematic employment and wage effects.
There may be a few exceptions, though. In Section VI.B, we find suggestive evidence for positive wage effects in the construction sector of Hudson County (NJ) and Alameda County (CA). Both effects are marginally significant and have good SCGM fits. Similarly, we find suggestive evidence for positive employment effects in the hospitality sector of San Francisco (CA)19 and positive wage effects in the hospitality sector of King County (WA). (On the other hand, one could argue that these are only four marginal cases out of 52 in Table A4, a share of about 8 percent that lies within a 10 percent false positive rate.)
There are several ways to rationalize positive employment and wage effects of sick pay mandates. First, sick pay mandates may correct market inefficiencies and effectively reduce negative externalities, such as infection rates among coworkers or customers (Pichler and Ziebarth 2017). In fact, paradoxically, overall sick leave rates may fall when employees gain access to paid sick leave (Stearns and White 2018). If overall firm productivity rises as a result of the mandate, this could explain stronger wage growth. Second, wages may simply be unable to adjust downward flexibly because of, for example, minimum wage laws. Third, in a standard model of labor supply and demand, a higher wage (and higher employment) can result from a downward-shifting labor supply curve (for example, because jobs become more attractive for employees) combined with an upward-shifting labor demand curve (for example, because customers demand more services); see Boeri and van Ours (2008). Finally, there is anecdotal evidence from qualitative employer surveys conducted primarily in San Francisco after the first mandate was implemented in 2007. Boots, Martinson, and Danziger (2009) interviewed 26 employers and found that most of them implemented the mandate with “minimal to moderate effects on their overall business and their bottom line.” Moreover, “about half of the employers […] tried to offset or minimize their recent increased labor costs” by “changes in other benefits or delayed wage increases […]” (page 8).
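The static calculation can be illustrated with simple arithmetic. This is a back-of-the-envelope sketch, not the authors' derivation. It assumes that employees accrue one hour of paid sick leave per 30 hours worked (the fastest accrual rate mentioned in Section VII), that all accrued leave is used, and that employers shift the full cost onto wages. Under these assumptions, labor costs per hour worked rise by 1/30, or about 3.3 percent, which appears consistent with the bound quoted above. The exact wage cut that fully offsets this increase is 1/31, slightly smaller. The annual cap of seven days lowers the figure further; the sketch assumes 8-hour days and 2,080 hours worked per year for a full-time worker.

```python
# Back-of-the-envelope sketch; accrual rate, hours, and day length are assumptions.

def labor_cost_increase(hours_worked, hours_per_accrued_hour=30, cap_hours=None):
    """Paid sick hours per hour worked, assuming all accrued leave is taken."""
    paid_sick_hours = hours_worked / hours_per_accrued_hour
    if cap_hours is not None:
        paid_sick_hours = min(paid_sick_hours, cap_hours)
    return paid_sick_hours / hours_worked

def offsetting_wage_cut(cost_increase):
    """Wage cut that leaves total pay per hour worked unchanged: 1 - 1 / (1 + c)."""
    return 1 - 1 / (1 + cost_increase)

FULL_TIME_HOURS = 52 * 40  # assumption: 2,080 hours worked per year
CAP_HOURS = 7 * 8          # assumption: seven 8-hour sick days per year

for cap in (None, CAP_HOURS):
    c = labor_cost_increase(FULL_TIME_HOURS, cap_hours=cap)
    print(f"cap={cap}: cost +{c:.2%}, offsetting wage cut {offsetting_wage_cut(c):.2%}")
```

Without the cap, the cost increase is 1/30 of labor costs; with the seven-day cap binding, it falls to 56/2,080 hours. Either way, these are upper bounds under full pass-through and full use of leave.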
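How many marginally significant results would chance alone produce? A rough check treats the number of rejections as binomial. This assumes, unrealistically, that the tests are independent and that all null hypotheses are true. The sketch applies this check to the 4 of 52 cases at a 10 percent level discussed here and to the 2 of 38 neighboring counties from the spillover analysis at a 5 percent level. Because the synthetic control estimates share donor units, the independence assumption is only an approximation.

```python
from math import comb

# Rough false-positive check; assumes independent tests with all nulls true.

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k false positives among n tests."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

cases = [("Table A4 cases", 4, 52, 0.10), ("neighboring counties", 2, 38, 0.05)]
for label, k, n, alpha in cases:
    expected = n * alpha
    print(f"{label}: expected {expected:.1f} false positives; "
          f"P(at least {k}) = {prob_at_least(k, n, alpha):.2f}")
```

Under these assumptions, about 5 false positives would be expected among 52 tests at 10 percent, and at least four would occur with probability of roughly 0.78. At least two of 38 at 5 percent would occur with probability of roughly 0.57.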
VII. Discussion and Conclusion
Using the SCGM, this paper systematically evaluates the labor market consequences of nine city-level and four state-level sick pay mandates in the United States. The setting is well suited for the SCGM. First, especially when evaluating counties, we have a very rich pool of donor counties (in fact, thousands of them) that we can exploit to build synthetic control counties that closely track the labor market dynamics of the treated counties. We also rely on many pretreatment observations. Matching treated and control labor market dynamics over long prereform periods strengthens the plausibility of the SCGM's identifying assumptions. Because several of our treated units are very small and geographically dispersed, we can also plausibly assume the absence of general equilibrium and spillover effects from treated to control regions. Additionally, because we rely on many different treatment units with diverse labor markets, our findings have a broad range of common support and arguably high external validity. Moreover, having many treatment regions reduces the likelihood that unobserved shocks systematically confounded postreform labor market dynamics.
Opponents of sick pay mandates are mainly concerned about negative employment or wage effects. We do not find much evidence that employment and wage growth have been substantially and significantly dampened by mandating that employers allow employees to earn paid sick leave. This may be a function of how the U.S. laws are designed. In fact, they seem to be more incentive compatible than their European counterparts and to minimize shirking behavior, a main concern of opponents. The reason for this incentive compatibility is that paid sick days are personalized and employees “earn” them. For every 30–40 hours worked (that is, for every week a full-time employee works), employees earn one hour of paid sick leave. This means that employees earn about one day of paid sick leave for every two months worked, up to (typically) seven days per year. Unused sick days roll over to the next year. Because earned sick days represent a personalized insurance credit for future health shocks that are likely to occur (for example, the flu or a child’s illness), similar to health savings accounts, we expect shirking to play a minimal role for most employees.
However, wages and employment could still be significantly affected by administrative burdens or by psychological effects if employers overestimate the actual relevance of the mandates for their businesses. Our results suggest that this was very likely not the case. Our estimates allow us to rule out employment losses of more than 2 percent and wage reductions of more than 3 percent at conventional significance levels. While even higher statistical precision would always be desirable, we agree with Abadie (2018) that much can be learned from such nonsignificant findings, especially in this policy-relevant context. In our opinion, the overall findings from nine city-level and four state-level mandates, in conjunction with the lack of systematically positive or negative point estimates (and rather small effect sizes), further corroborate our null findings.
Our findings suggest that neither employment nor wage growth has been significantly affected by U.S. sick pay mandates. However, the limitations of this study should be kept in mind, and more research is required. Although we evaluate nine city-level and four state-level mandates, these regions are not a random sample of all U.S. regions. They tend to be relatively prosperous, governed by Democrats, and to have more labor market regulations, higher minimum wages, and stricter employment protections. It is thus unclear whether the conclusions would also hold in less prosperous regions and in regions with fewer labor market regulations.
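The accrual rule can be made concrete with a small calculation. The sketch assumes a 40-hour work week, 8-hour sick days, and 52 weeks of work per year; none of these is stated in the laws quoted here. It compares the two ends of the 30–40 hour accrual range with the typical seven-day cap. The function names are hypothetical.

```python
# Accrual arithmetic; weekly hours, day length, and weeks per year are assumptions.

def weeks_to_earn_one_day(hours_per_accrued_hour, weekly_hours=40, hours_per_day=8):
    """Weeks of full-time work needed to accrue one sick day."""
    return hours_per_day * hours_per_accrued_hour / weekly_hours

def annual_sick_days(hours_per_accrued_hour, weekly_hours=40, weeks=52,
                     cap_days=7, hours_per_day=8):
    """Sick days accrued in a year of work, subject to the annual cap."""
    accrued_hours = weekly_hours * weeks / hours_per_accrued_hour
    return min(accrued_hours / hours_per_day, cap_days)

for rate in (30, 40):
    print(f"1 hour per {rate} hours: one day every {weeks_to_earn_one_day(rate):.0f} weeks, "
          f"{annual_sick_days(rate):.1f} days per year")
```

Under these assumptions, a full-time employee earns one sick day every 6 to 8 weeks, roughly the "one day for every two months" described above. At the faster rate, the seven-day cap binds within the year.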
Footnotes
The authors thank Philip Armour, Jonathan H. Cantor, Katherine Carman, Alexander Colvin, Davide Dragone, Italo Lopez Garcia, Rick Geddes, Laszlo Goerke, Peter Hudomiet, Peter Kuhn, Rafael Lalive, Martin Karlsson, Joao Montez, Kathleen Mullen, Sean Nicholson, Sarah Prenovitz, Stephanie Rennane, Dominic Rohner, Seth Seabury, Troy D. Smith, Eric Sjöberg, Stefan Staubli, Pascal St-Amour, J.H. Verkerke, Norman Waitzman, Jeffrey Wenger, Aaron Yelowitz, and the anonymous referees for excellent comments and suggestions that helped to improve the quality of this paper significantly. In particular, they thank Eric Chyn, Lauren Hersch Nicholas, and Stewart J. Schwab for excellent discussions of this paper, as well as participants in research seminars at Cornell University (PAM), HEC Lausanne, the University of Linz (Economics Department), the University of Innsbruck, RAND Corporation in Santa Monica, CA, the University of Utah, the 12th Annual Conference on Empirical Legal Studies in Ithaca, NY, the 2017 Essen Health Conference, the Nordic Health Economics Study Group meeting (NHESG) in Uppsala, the 2017 meetings of the Southern Economic Association in Tampa, FL, and the Verein für Socialpolitik (VfS) in Münster for their helpful comments and suggestions. The authors thank Katherine Wen for editing this paper. Generous funding from the Robert Wood Johnson Foundation’s Policies for Action Program (#74921) and the W.E. Upjohn Institute for Employment Research’s Early Career Research Awards (ECRA) program #17-155-15 is gratefully acknowledged. Neither the authors nor their employers have relevant or material financial interests that relate to the research described in this paper. They take responsibility for all remaining errors in and shortcomings of the paper. The data used in this article are available online: Quarterly Census of Employment and Wages, https://www.bls.gov/cew/datatoc.htm (accessed July 31, 2019).
Supplementary materials are freely available online at: http://uwpress.wisc.edu/journals/journals/jhr-supplementary.html
1. Other papers that apply the SCGM or variants include Billmeier and Nannicini (2013); Bohn, Lofstrom, and Raphael (2014); Bauhoff (2014); Bassok, Fitzpatrick, and Loeb (2014); Karlsson and Pichler (2015); and Restrepo and Rieger (2016).
2. Similar to the findings in this paper, Colla, Dow, and Dube (2017) do not find evidence that the 2008 employer health benefit mandate for nonsmall employers had a substantial effect on employment and wages in San Francisco.
3. In the case of San Francisco, two laws that went into effect January 2008 could potentially confound a clean assessment of the sick pay mandate. First, the minimum wage increased in predetermined steps annually from
4. Not included are the self-employed, army members, railroad employees, most elected officials, and most farm workers.
5. In total, the United States has 3,143 counties or county-equivalents. The missing counties in our data are counties without any official establishment location, for example, in very rural counties in Alaska.
6. To obtain one consistent baseline data set, we do not include all available data points from January 2001 to June 2016 but only include observations that we also use in the traditional difference-in-differences models (Section VI.A), where we only consider data points up to 48 months prior to the treatment.
7. We sent an inquiry to the Bureau of Labor Statistics to double-check which fringe benefits are included in the reported quarterly wage and received the following response on May 30, 2018: “Covered employers’ contributions to old-age, survivors, and disability insurance; health insurance; UI; workers compensation; and private pension and welfare funds are not reported as wages. Employee contributions for the same purposes, however, as well as money withheld for income taxes, union dues, and so forth, are reported, even though they are deducted from the worker’s gross pay.”
8. We do not include Washington, DC, in the state-level analysis because the synthetic control group fit was superior with counties.
9. Because the data by industry and firm size are only reported for the first quarter of each year, we impute values for the other quarters assuming that the first quarter ratios of, for example, <50 employees vs. >49 employees, remain stable in the other three quarters. For two firm size categories in Delaware, we impute missing values for 2014.
10. As another example, Jersey City (Hudson County) has many small entrepreneurial businesses and a large finance, insurance, and real estate industry. It lies just across the Hudson River opposite Manhattan.
11. These are Manhattan, Kings County (Brooklyn), Bronx County, Richmond County (Staten Island), and Queens County. We experimented with excluding Manhattan when averaging. The results are very similar and are available upon request.
12. In our main analysis, we do not consider the full set of industry-structure variables in $X_j$ due to memory and computing constraints.
13. An alternative would be subsampling methods (Politis and Romano 1994; Saia 2017).
14. Because we only set the average effects to z percent, this method implicitly keeps the original variance between treated and synthetic control units.
15. Dube and Zipperer (2015) propose a similar test based on elasticities. Moreover, in a previous version (see Pichler and Ziebarth 2016), we construct the test statistic using the LTE and the PTE. However, as pointed out by an anonymous referee, Firpo and Possebom (2018) show in a simulation exercise that using the RMSPE Ratio is preferable because of statistical power.
16. There is no firm threshold for an “acceptable” fit defined by the literature. We consider the fit acceptable for an RMSPE < 0.1. Using our “alternative modeling approaches” below, we find acceptable fits. The same is true when we disregard Manhattan when evaluating NYC (available upon request). These alternative approaches also yield no evidence for treatment effects.
17. This is feasible as the time elapsed between the passage and the implementation of the mandates varies across regions. However, we abstained from additionally controlling for $\text{TreatedCounty}_t \times \text{ProbationOver}_t$ as the accrual period is almost always 90 days; thus we would run into multicollinearity issues, particularly with the quarterly wage data.
18. Note that the Irwin–Hall joint tests for counties and states at the bottom of Tables 4 and 5 implicitly assume that the cities and states were similarly affected by the mandates. However, because the industry structures and exceptions differ across regions (Table 1), this is not necessarily the case.
19. See Footnote 3 for a discussion of potential confounding factors.
- Received January 2017.
- Accepted August 2018.