Tidy Tuesday: Union Membership in the United States

Tidy Tuesday
Author

Adam Cseresznye

Published

September 5, 2023

Photo by Claudio Schwarz

Happy Labor Day! 🛠️ As we celebrate the achievements of workers and the contributions they’ve made to society, what better way to delve into the world of labor and employment than with a fresh Tidy Tuesday dataset? This week’s data comes from the comprehensive “Union Membership, Coverage, and Earnings from the CPS” dataset, courtesy of Barry Hirsch from Georgia State University, David Macpherson from Trinity University, and William Even from Miami University.

This dataset is a rich source of information on union membership, coverage, and earnings. So grab your coffee, get comfortable, and let’s explore the labor data together on this Tidy Tuesday journey!

Import data

Code
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import geopandas

# Lets-Plot: plotting API, geocoding helpers, and mapping utilities
from lets_plot import *
from lets_plot.geo_data import *
from lets_plot.mapping import as_discrete

LetsPlot.setup_html()  # render plots inline in the notebook

There are three distinct datasets to investigate: one with demographic information, one with wage data, and one that breaks the figures down by state. Let’s read them all in.

Code
demographic_df = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-05/demographics.csv"
)
wages_df = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-05/wages.csv"
)
states_df = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-05/states.csv"
)

Data dictionaries

Here are the corresponding data dictionaries:

Demographics data:

Data sources:

  • 1973-1981: May Current Population Survey (CPS)
  • 1982: No union questions available
  • 1983-2022: CPS Outgoing Rotation Group (ORG) Earnings Files

The definition of union membership was expanded in 1977 to include “employee associations similar to a union”.

variable     class      description
year         double     When the data was collected.
sample_size  double     The number of wage and salary workers ages 16 and over who were surveyed.
employment   double     Wage and salary employment in thousands.
members      double     Employed workers who are union members in thousands.
covered      double     Workers covered by a collective bargaining agreement in thousands.
p_members    double     Percent of employed workers who are union members.
p_covered    double     Percent of employed workers who are covered by a collective bargaining agreement.
facet        character  The sector or demographic group contained in this row of data.
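
To make the relationship between the count columns and the percentage columns concrete, here is a minimal sketch on made-up numbers. Only the column names come from the data dictionary; the values, the `toy_demographics` frame, and the `p_members_check` column are hypothetical, and whether the real file stores the share on a 0-100 or a 0-1 scale is something to verify against the data.

```python
import pandas as pd

# Toy rows with invented numbers; only the column names follow the data dictionary.
toy_demographics = pd.DataFrame(
    {
        "year": [2000, 2001],
        "employment": [1000.0, 1200.0],  # wage and salary employment, in thousands
        "members": [150.0, 150.0],       # union members, in thousands
    }
)

# Share of employed workers who are union members, expressed on a 0-100 scale.
toy_demographics["p_members_check"] = (
    toy_demographics["members"] / toy_demographics["employment"] * 100
)

print(toy_demographics)
```

The same division applied to `covered` would give a check for `p_covered`.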

Wages data:

Data sources:

  • 1973-1981: May Current Population Survey (CPS)
  • 1982: No union questions available
  • 1983-2022: CPS Outgoing Rotation Group (ORG) Earnings Files

The definition of union membership was expanded in 1977 to include “employee associations similar to a union”.

variable                     class      description
year                         double     When the data was collected.
sample_size                  double     The number of wage and salary workers ages 16 and over who were surveyed and provided earnings and hours worked information.
wage                         double     Mean hourly earnings in nominal dollars.
at_cap                       double     Percent of workers with weekly earnings at the top code of $999 through 1988, $1923 in 1989-97, and $2885 beginning in 1998, with individuals assigned mean earnings above the cap based on annual estimates of the gender-specific Pareto distribution.
union_wage                   double     Mean wage among union members.
nonunion_wage                double     Mean wage among nonunion workers.
union_wage_premium_raw       double     The percentage difference between the union and nonunion wage.
union_wage_premium_adjusted  double     Estimated as exp(b)-1 where b is the regression coefficient on a union membership variable (equal to 1 if union and 0 otherwise) from a semi-logarithmic wage equation, with controls included for worker/job characteristics. Included in the all-worker wage equation are the control variables: years of schooling, potential years of experience [proxied by age minus years of schooling minus 6] and its square [both interacted with gender], and categorical variables for marital status, race and ethnicity, gender, part-time, large metropolitan area, state, public sector, broad industry, and broad occupation. Controls are omitted, as appropriate, for estimates within sectors or by demographic group [i.e., by class, gender, race, or industry sector]. Workers who do not report earnings but instead have them imputed [i.e., assigned] by the Census are removed from the estimation samples in all years, except 1994 and 1995 when imputed earners cannot be identified. Inclusion of imputed earners causes union wages to be understated, nonunion wages overstated, and union-nonunion wage differences understated. For 1994-95, the sample includes imputed earners and estimates in those years have been adjusted to remove the bias from imputation.
facet                        character  The sector or demographic group contained in this row of data.
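
The two premium columns are easier to read with a small worked example. The sketch below is illustrative only: the helper names `raw_premium` and `adjusted_premium` are hypothetical, the wages and the coefficient `b` are invented, and measuring the raw gap relative to the nonunion wage is an assumption, since the dictionary only says "percentage difference".

```python
import math


def raw_premium(union_wage: float, nonunion_wage: float) -> float:
    """Relative gap between union and nonunion mean wages.

    Assumption: the gap is measured relative to the nonunion wage.
    """
    return (union_wage - nonunion_wage) / nonunion_wage


def adjusted_premium(b: float) -> float:
    """Turn a coefficient b from a semi-logarithmic wage equation into exp(b) - 1."""
    return math.expm1(b)


# Invented inputs, purely to show the arithmetic.
print(f"raw premium: {raw_premium(30.0, 25.0):.1%}")
print(f"adjusted premium for b = 0.15: {adjusted_premium(0.15):.1%}")
```

For small coefficients, exp(b) - 1 stays close to b itself, which is why the two are often read interchangeably; the gap widens as b grows.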

States data:

Data source: Current Population Survey (CPS) Outgoing Rotation Group (ORG) Earnings Files

variable            class      description
state_census_code   double     Census state code used in CPS
state               character  State name.
sector              character  Employment sector.
observations        double     CPS sample size.
employment          double     Wage and salary employment in thousands.
members             double     Employed workers who are union members in thousands.
covered             double     Workers covered by a collective bargaining agreement in thousands.
p_members           double     Percent of employed workers who are union members.
p_covered           double     Percent of employed workers who are covered by a collective bargaining agreement.
state_abbreviation  character  State abbreviation.
year                double     Year of the survey.

As in the previous Tidy Tuesday post, I believe this dataset lets us address the following questions:

  • What are the overarching trends in the labor force, particularly regarding union memberships?
  • Do specific demographic groups or occupations display a higher likelihood of union membership?
  • Do union members experience any financial advantages or benefits compared to non-union workers?
  • Which states have the highest number of union members or affiliated unions?

Do specific demographic groups or occupations display a higher likelihood of union membership?

Next, our investigation will focus on identifying potential demographic disparities among workforce members and uncovering the professions with the highest likelihood of union membership. We’ll utilize the “p_members” column, which represents the percentage of employed workers who are union members, to delve into these aspects.

As evident in Figure 2, a substantial decrease in union membership is observable across nearly all demographic groups over the decades.

Code
(
    demographic_df.query("facet.str.contains('demographics')")
    # drop the "demographics:" prefix and the space it leaves behind
    .assign(
        facet=lambda df: df.facet.str.replace("demographics:", "", regex=False).str.strip()
    )
    .pipe(
        lambda df: ggplot(df, aes("year", "p_members", color="facet"))
        + geom_line(
            size=1.5,
        )
        + labs(
            title="Demographic Trends in Union Membership as a Percentage of the Total Workforce Over Time",
            subtitle="Almost all demographic groups exhibit a significant decline in union membership over the decades.",
            x="",
            y="Percent of employed workers who are union members",
            caption="© 2023 by Barry T. Hirsch, David A. Macpherson, and William E. Even.",
        )
        + theme(
            plot_subtitle=element_text(size=12, face="italic"),
            plot_title=element_text(size=15, face="bold"),
        )
        + ggsize(800, 500)
    )
)
Figure 2: Demographic Trends in Union Membership as a Percentage of the Total Workforce Over Time
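
The filtering step behind this figure can be tried in isolation. Below is a minimal sketch on a toy frame; the facet labels are modelled on the ones used in this post, but the rows, the numbers, and the names `toy` and `demographic_rows` are made up for illustration.

```python
import pandas as pd

# Toy frame; the facet labels imitate the real ones, the numbers are invented.
toy = pd.DataFrame(
    {
        "facet": [
            "all wage and salary workers",
            "demographics: female",
            "demographics: male",
            "private sector: construction",
        ],
        "p_members": [10.0, 9.5, 10.5, 12.0],
    }
)

# Keep only the rows whose facet label mentions "demographics" ...
is_demographic = toy["facet"].str.contains("demographics", regex=False)

# ... then strip the prefix (and the leftover space) so legends read cleanly.
demographic_rows = toy.loc[is_demographic].assign(
    facet=lambda df: df["facet"]
    .str.replace("demographics:", "", regex=False)
    .str.strip()
)

print(demographic_rows["facet"].tolist())
```

Negating the mask with `~is_demographic` gives the complementary, sector-level rows.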

While union membership is declining in several sectors, some public sector occupations, such as postal service, police, and local government jobs, maintain the highest and most consistent levels of union participation (Figure 3).

Code
(
    demographic_df.query(
        "(~facet.str.contains('demographics')) and (~facet.str.contains('all'))"
    ).pipe(
        lambda df: ggplot(df, aes("year", "p_members", color="facet"))
        + geom_line(
            size=1.5,
        )
        + labs(
            title="Workforce Sector-Specific Trends in Union Membership",
            subtitle="Public sector occupations demonstrate higher union participation rates.",
            x="",
            y="Percent of employed workers who are union members",
            caption="© 2023 by Barry T. Hirsch, David A. Macpherson, and William E. Even.",
        )
        + theme(
            plot_subtitle=element_text(size=12, face="italic"),
            plot_title=element_text(size=15, face="bold"),
        )
        + ggsize(800, 500)
    )
)
Figure 3: Workforce Sector-Specific Trends in Union Membership

Do union members experience any financial advantages or benefits compared to non-union workers?

Figure 4 gives a direct comparison of mean hourly wages for union members versus non-union workers.

Code
(
    wages_df.query("facet == 'all wage and salary workers'")
    # keep only the year and the two wage columns we want to compare
    .loc[:, ["year", "union_wage", "nonunion_wage"]]
    # reshape to long format: one row per (year, wage type)
    .melt(id_vars="year", var_name="Description")
    .pipe(
        lambda df: ggplot(df, aes("year", "value", color="Description"))
        + geom_line(
            size=1.5,
        )
        + labs(
            title="Differences in Mean Hourly Earnings for Union and Non-Union Workers",
            subtitle="Union workers consistently display higher mean wages in comparison to non-union members.",
            x="",
            y="Mean hourly earnings in nominal dollars",
            caption="© 2023 by Barry T. Hirsch, David A. Macpherson, and William E. Even.",
        )
        + theme(
            plot_subtitle=element_text(size=12, face="italic"),
            plot_title=element_text(size=15, face="bold"),
        )
    )
)
Figure 4: Differences in Mean Hourly Earnings for Union and Non-Union Workers
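
The reshaping step is what lets a single `color` aesthetic draw both lines. Here is a small sketch of what `melt` does, using an invented two-year frame; `toy_wages` and `long_wages` are hypothetical names and the wage values are not real data.

```python
import pandas as pd

# Invented wide-format rows: one column per wage type.
toy_wages = pd.DataFrame(
    {
        "year": [2020, 2021],
        "union_wage": [30.0, 31.0],
        "nonunion_wage": [25.0, 26.0],
    }
)

# Long format: one row per (year, wage type); values land in a "value" column.
long_wages = toy_wages.melt(id_vars="year", var_name="Description")

print(long_wages)
```

Each wage column becomes a label in `Description`, so the plotting layer can map that column to color.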

As with our previous figures, we will now delve into demographic variations and job sector disparities. To unveil these potential distinctions, we will use the “union_wage_premium_raw” column, which indicates the percentage difference between union and non-union wages.

The largest wage gaps between unionized and non-unionized workers (Figure 5) are in the construction sector, where the raw premium has fluctuated over the decades from below 40% to as high as 80%. Union members earn more in almost all sectors; the exceptions are federal, manufacturing, and wholesale/retail workers, particularly in recent years.

Code
(
    wages_df.query(
        "(~facet.str.contains('demographics')) and (~facet.str.contains('all'))"
    ).pipe(
        lambda df: ggplot(df, aes("year", "union_wage_premium_raw", color="facet"))
        + geom_line(
            size=1.5,
        )
        + labs(
            title="Sector-specific Differences in Mean Hourly Earnings for Union and Non-Union Workers",
            subtitle="The disparities between wages range from -20% to +80%.",
            x="",
            y="The percentage difference between the union and nonunion wage",
            caption="© 2023 by Barry T. Hirsch, David A. Macpherson, and William E. Even.",
        )
        + theme(
            plot_subtitle=element_text(size=12, face="italic"),
            plot_title=element_text(size=15, face="bold"),
        )
        + ggsize(800, 500)
        + geom_hline(yintercept=0, linetype=5, color="#6c6c6c", size=1)
    )
)
Figure 5: Sector-specific Differences in Mean Hourly Earnings for Union and Non-Union Workers
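
Ranges like the ones read off this figure can also be checked numerically. The sketch below shows one way to do that with a group-by on toy data; the sector labels and premium values are invented, and `toy_premiums` and `premium_range` are hypothetical names.

```python
import pandas as pd

# Invented premiums for two made-up sectors across three years.
toy_premiums = pd.DataFrame(
    {
        "facet": ["sector A"] * 3 + ["sector B"] * 3,
        "year": [2000, 2010, 2020] * 2,
        "union_wage_premium_raw": [0.40, 0.80, 0.55, 0.05, -0.02, -0.10],
    }
)

# Minimum and maximum premium per sector, largest maximum first.
premium_range = (
    toy_premiums.groupby("facet")["union_wage_premium_raw"]
    .agg(["min", "max"])
    .sort_values("max", ascending=False)
)

print(premium_range)
```

A negative `min` flags a sector where, in at least one year, union members earned less on average than non-union workers.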

Beyond the gender-specific differences, Figure 6 also shows how education level relates to the wage gap between unionized and non-unionized workers. Workers without a college education tend to benefit more from union membership, whereas union members with higher education levels generally earn less than their non-union counterparts.

Code
(
    wages_df.query("facet.str.contains('demographics')")
    # drop the "demographics:" prefix and the space it leaves behind
    .assign(
        facet=lambda df: df.facet.str.replace("demographics:", "", regex=False).str.strip()
    )
    .pipe(
        lambda df: ggplot(df, aes("year", "union_wage_premium_raw", color="facet"))
        + geom_line(
            size=1.5,
        )
        + labs(
            title="Demographics-specific Differences in Mean Hourly Earnings for Union and Non-Union Workers",
            subtitle="It appears that females tend to benefit more from union membership in comparison to males, \nparticularly in terms of the percentage difference between union and non-union wages.",
            x="",
            y="The percentage difference between the union and nonunion wage",
            caption="© 2023 by Barry T. Hirsch, David A. Macpherson, and William E. Even.",
        )
        + theme(
            plot_subtitle=element_text(size=12, face="italic"),
            plot_title=element_text(size=15, face="bold"),
        )
        + ggsize(800, 500)
        + geom_hline(yintercept=0, linetype=5, color="#6c6c6c", size=1)
    )
)
Figure 6: Demographics-specific Differences in Mean Hourly Earnings for Union and Non-Union Workers

Which states have the highest number of union members or affiliated unions?

To illustrate variations in union membership across states, we will use Lets-Plot’s capabilities for visualizing spatial data. Because the dataset is extensive, we will focus on a subset: the most recent year, 2022, across all sectors.

As evident in Figure 7, certain states like New York, Alaska, Hawaii, Washington, California, and Oregon have a significant portion of their workforce as union members. Conversely, states like Utah, South Dakota, and the Carolinas display lower percentages of union membership among their workforce.

Code
filtered_data = states_df.query("sector == 'Total' and year == 2022")
boundaries = (
    geocode_states(filtered_data["state"])
    .scope("USA")
    .inc_res(4)
    .get_boundaries(resolution=4)
)

(
    ggplot()
    + geom_livemap()
    + geom_polygon(
        aes(color="p_members", fill="p_members"),
        data=filtered_data,  # reuse the 2022 "Total" subset defined above
        map=boundaries,
        alpha=0.9,
        map_join=[["state"], ["state"]],
        show_legend=False,
        size=0.01,
    )
    + theme(
        axis_title="blank", axis_text="blank", axis_ticks="blank", axis_line="blank"
    )
    + scale_fill_brewer(palette="Greens")
    + ggsize(800, 500)
)
Figure 7: Percentage of Union Members Across States
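
The map is easier to read alongside a ranked list. The sketch below applies the same filter to a toy frame and picks out the highest and lowest shares; the state labels and numbers are placeholders, and `toy_states`, `latest_total`, `highest`, and `lowest` are hypothetical names.

```python
import pandas as pd

# Placeholder states and invented membership shares.
toy_states = pd.DataFrame(
    {
        "state": ["State A", "State B", "State C", "State D"],
        "sector": ["Total"] * 4,
        "year": [2022] * 4,
        "p_members": [0.22, 0.03, 0.15, 0.08],
    }
)

# Same subset as used for the map: all sectors combined, most recent year.
latest_total = toy_states.query("sector == 'Total' and year == 2022")

# Two highest and two lowest membership shares.
highest = latest_total.nlargest(2, "p_members")["state"].tolist()
lowest = latest_total.nsmallest(2, "p_members")["state"].tolist()

print("highest:", highest)
print("lowest:", lowest)
```

Swapping `toy_states` for `states_df` should give the real ranking, assuming the column names match the data dictionary above.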

There you have it! I hope you found this week’s Tidy Tuesday analysis insightful. I would encourage everyone to dive deeper into this dataset as I’ve only scratched the surface; there’s a wealth of knowledge waiting to be uncovered here.

Happy coding, and see you in the next exploration!