This repository contains the data and R code for our PS170A final project, which analyzes whether the demographic characteristics of state legislators elected in 2022 influenced the demographic composition of candidate pools in 2024 state house races.
Does electing legislators from underrepresented groups (women and non-white individuals) lead to more diverse candidate pools in subsequent elections? This study examines state house districts in Georgia, Michigan, and Nevada to test theories of descriptive representation and candidate emergence.
PS170A/
├── code/
│ ├── 1_County_FIPS_Extraction.R # Assigns counties to districts
│ ├── 2_Census_Demographic_Data.R # Pulls district demographics from Census
│ ├── 3_Race_Gender_Algorithm.R # Predicts race/gender of candidates
│ ├── 4_Candidate_Demographic_Percentages.R # Aggregates candidate demographics
│ ├── 5_Final_Merge.R # Merges all datasets
│ ├── 6_Regression_Analysis.R # Runs regression models
│ └── Random_Sample.R # Validation sampling
├── data/
│ ├── original_data/ # Raw input data
│ └── modified_data/ # Processed intermediate files
├── results/
│ ├── plot_full_nonwhite_legislator.png # Race effects, full sample
│ ├── plot_full_woman_legislator.png # Gender effects, full sample
│ ├── plot_noninc_nonwhite_legislator.png # Race effects, non-incumbent sample
│ ├── plot_noninc_woman_legislator.png # Gender effects, non-incumbent sample
│ ├── regression_table_full.pdf # Regression results, full sample
│ └── regression_table_nonincumbent.pdf # Regression results, non-incumbent sample
├── .Renviron.example # Template for API keys
├── PS170A_Descriptive_Representation.pdf # Final paper (PDF)
├── PS170A_Final_Poster.pdf # Final presentation (PDF)
└── install_packages.R # R package installation script
The easiest way to install the required R packages is to run the provided installation script:
source("install_packages.R")
Or install manually:
# Install CRAN packages
install.packages(c(
"tidyverse",
"here",
"wru",
"tidycensus",
"sf",
"tigris",
"readxl",
"writexl",
"modelsummary",
"modelr",
"devtools"
))
# Install genderizeR from GitHub (archived from CRAN)
devtools::install_github("kalimu/genderizeR")
Note: The genderizeR package was archived from CRAN and must be installed from GitHub.
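If some of these packages are already installed, installation can be limited to the ones that are missing. The sketch below is an illustration only: missing_packages() is a hypothetical helper that is not part of install_packages.R, and it relies on base R's requireNamespace() to test whether each package is available.

```r
# Hypothetical helper (not part of this repository):
# return the packages from `pkgs` that are not currently installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# CRAN packages listed in the manual installation step above
cran_pkgs <- c(
  "tidyverse", "here", "wru", "tidycensus", "sf", "tigris",
  "readxl", "writexl", "modelsummary", "modelr", "devtools"
)

to_install <- missing_packages(cran_pkgs)
print(to_install)

# Uncomment to install whatever is missing:
# install.packages(to_install)
```

genderizeR is left out of the vector on purpose, since it is installed from GitHub rather than CRAN.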
This project requires two API keys:
- Genderize.io API Key - For gender prediction from first names
  - Sign up at: https://genderize.io
- Census API Key - For pulling demographic data and WRU race prediction
  - Sign up at: https://api.census.gov/data/key_signup.html
To configure:
- Copy .Renviron.example to .Renviron in the project root
- Add your API keys to the .Renviron file:
  GENDERIZE_API_KEY=your_key_here
  CENSUS_API_KEY=your_key_here
- Restart R for changes to take effect
Important: Never commit your .Renviron file to version control.
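After restarting R, it can help to confirm that both keys were picked up before running any script. The following is a minimal sketch using base R's Sys.getenv(); the variable names come from the .Renviron entries above, while missing_keys() is a hypothetical helper introduced here for illustration.

```r
# Environment variables expected in .Renviron
required_keys <- c("GENDERIZE_API_KEY", "CENSUS_API_KEY")

# Hypothetical helper: return the names of variables that are unset or empty.
missing_keys <- function(keys) {
  values <- Sys.getenv(keys, unset = "")
  keys[!nzchar(values)]
}

unset <- missing_keys(required_keys)
if (length(unset) > 0) {
  message("Missing API keys: ", paste(unset, collapse = ", "))
} else {
  message("Both API keys are set.")
}
```

The check only reports which variables are empty; it never prints the key values themselves.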
The scripts use the here package for portable file paths. Before running:
- Open R in the project root directory
- Run here::here() to verify it points to the correct location
- If needed, create a .here file or .Rproj file in the root to anchor the project
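One way to confirm that a directory really is the project root is to check for the top-level folders shown in the repository layout. The sketch below uses only base R; check_project_root() is a hypothetical helper, and the folder names code, data, and results are taken from the layout above.

```r
# Hypothetical helper: report which expected top-level folders exist under `root`.
check_project_root <- function(root, expected = c("code", "data", "results")) {
  present <- dir.exists(file.path(root, expected))
  names(present) <- expected
  present
}

# Check the current working directory
print(check_project_root(getwd()))
```

With the here package installed, the same check can be run as check_project_root(here::here()); any FALSE entry suggests that R was started in the wrong directory.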
Execute the R scripts in the following order:
- 1_County_FIPS_Extraction.R - Downloads shapefiles and assigns each district to its primary county
- 2_Census_Demographic_Data.R - Pulls district-level race and gender demographics from the Census ACS
- 3_Race_Gender_Algorithm.R - Predicts race and gender for all candidates and legislators using genderize.io and WRU BISG
- 4_Candidate_Demographic_Percentages.R - Calculates the percentage of non-white and female candidates per district
- 5_Final_Merge.R - Merges legislator data, candidate data, competitiveness, and demographics
- 6_Regression_Analysis.R - Runs OLS regression models and generates output tables/plots
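The scripts can also be run in sequence from a single call. The sketch below is one possible driver, not something shipped with the repository: run_pipeline() is a hypothetical helper, and the file names are assumed to match those in the code/ folder of the repository layout.

```r
# Script names as listed under code/ in the repository layout (assumed)
pipeline_scripts <- c(
  "1_County_FIPS_Extraction.R",
  "2_Census_Demographic_Data.R",
  "3_Race_Gender_Algorithm.R",
  "4_Candidate_Demographic_Percentages.R",
  "5_Final_Merge.R",
  "6_Regression_Analysis.R"
)

# Hypothetical helper: source each script in order, stopping at the first
# file that cannot be found.
run_pipeline <- function(script_dir, scripts = pipeline_scripts) {
  for (script in scripts) {
    path <- file.path(script_dir, script)
    if (!file.exists(path)) stop("Script not found: ", path)
    message("Running ", script)
    source(path)
  }
  invisible(scripts)
}

# Example call from the project root (requires the here package):
# run_pipeline(here::here("code"))
```

Because source() stops on the first error, a failure in an early step prevents later scripts from running on incomplete intermediate files.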
- Candidate/Legislator Data: State election records for GA, MI, NV
- District Competitiveness: Election margin of victory data
- District Demographics: American Community Survey (ACS) 5-year estimates (2022)
- Geographic Data: TIGER/Line Shapefiles from U.S. Census Bureau
- Gender Prediction: Uses the genderize.io API based on first names
- Race Prediction: Uses the WRU package's Bayesian Improved Surname Geocoding (BISG) method, incorporating surname, geography, and party registration
- Statistical Analysis: OLS regression with controls for district demographics and electoral competitiveness
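To illustrate the kind of model described in the last item, the sketch below fits a bivariate and a full OLS model with base R's lm(). The data are simulated and every column name (nonwhite_legislator, pct_nonwhite_pop, margin_of_victory, pct_nonwhite_candidates) is hypothetical; nothing here reproduces the project's data, specification, or results.

```r
set.seed(170)
n <- 200

# Simulated district-level data (illustration only, not project data)
districts <- data.frame(
  nonwhite_legislator = rbinom(n, 1, 0.3),   # 1 if the 2022 legislator is non-white
  pct_nonwhite_pop    = runif(n, 0, 100),    # district demographic control
  margin_of_victory   = runif(n, 0, 60)      # competitiveness control
)

# Simulated outcome: share of non-white candidates in the 2024 pool
districts$pct_nonwhite_candidates <- with(
  districts,
  10 + 15 * nonwhite_legislator + 0.4 * pct_nonwhite_pop -
    0.05 * margin_of_victory + rnorm(n, sd = 10)
)

# Bivariate model: outcome on legislator characteristic only
bivariate <- lm(pct_nonwhite_candidates ~ nonwhite_legislator, data = districts)

# Full model: adds district demographics and competitiveness as controls
full <- lm(
  pct_nonwhite_candidates ~ nonwhite_legislator + pct_nonwhite_pop +
    margin_of_victory,
  data = districts
)

print(summary(full)$coefficients)
```

Comparing the coefficient on nonwhite_legislator between the two models shows how much of the bivariate association is absorbed by the controls.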
The analysis produces:
- Regression tables comparing bivariate and full models (saved as PDF files in results/)
- Scatter plots with regression lines showing the relationship between legislator diversity and candidate pool diversity
- Separate analyses for full sample and non-incumbent candidates only
- Jessica Persano
- Xuanting Fan
- Jolie Anderson
This project is licensed under the MIT License - see the LICENSE file for details.
- UCLA Political Science Department (PS 170A)
- U.S. Census Bureau for demographic data
- genderize.io for gender prediction API
- Authors of the wru package for race prediction methodology