
chrisconlon/Liquor_Replication_JPE


Replication Package: Conlon and Rao

"The Cost of Curbing Externalities with Market Power: Alcohol Regulations and Tax Alternatives" Christopher T. Conlon and Nirupama S. Rao Forthcoming, Journal of Political Economy

Overview -- Replication options

This repository contains all code and non-proprietary data to replicate the tables and figures in the paper. The analysis estimates demand for distilled spirits in Connecticut using NielsenIQ retail scanner data and DISCUS shipment data, then simulates counterfactual tax policies. The replication package allows for three different entry points:

  1. Entry Point #1: this starts in cleaning_code/ and requires access to the NielsenIQ data from the Kilts Center at Chicago Booth. It processes the raw data using the kiltsreader package. We include obfuscated DISCUS data, as well as code in cleaning_code/reference/ that illustrates how the DISCUS and NielsenIQ data are combined in the original paper.

  2. Entry Point #2: this can be started WITHOUT access to the NielsenIQ data. It skips the cleaning_code/ steps, starts with the contents of estimation_data/ and proc_data/ pre-populated, and begins with the estimation (in pyblp) of the demand model. Estimation of the full demand model can take around 24 hours on a standard computer.

  3. Entry Point #3: this skips the estimation of the demand model, and begins with the contents of estimation_outputs pre-populated. Because slow_paper.pickle is too large for GitHub, it must be manually downloaded and placed in estimation_outputs/ before starting. This entry point focuses on simulating counterfactuals (which can still take several hours) and begins in counterfactual_code.

Summary

If you do not have access to the raw NielsenIQ/DISCUS data, you can still replicate all estimation and counterfactual results:

  1. Clone this repository
  2. Download estimation results from Dataverse ([DOI link TBD]):
    • slow_paper.pickle → place in estimation_outputs/
  3. Install dependencies: pip install -r requirements.txt
  4. Run counterfactuals:
    cd counterfactual_code
    bash run_counterfactuals.sh
  5. Compile the paper PDF:
    bash paper/compile_paper.sh

This reproduces all tables, figures, and the paper PDF. See Detailed Instructions below.
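Before running Step 4, it is worth confirming that slow_paper.pickle actually landed in estimation_outputs/ and is not a truncated download. A hypothetical helper for this check (the function name and size threshold are ours, not part of the package):

```python
from pathlib import Path

def pickle_is_ready(path="estimation_outputs/slow_paper.pickle", min_mb=50):
    """Return True if the Dataverse pickle is present and plausibly complete.

    The real file is roughly 100 MB, so anything much smaller is likely a
    truncated or failed download. The threshold is an illustrative guess.
    """
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_mb * 1_000_000
```

Run it from the repository root before launching run_counterfactuals.sh.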

Requirements

Software

  • Python 3.9+
  • R 4.0+ (only needed for Entry Point 1: cleaning step 1d)
  • PyBLP 0.13+

Install Python dependencies:

pip install -r requirements.txt

Install R dependencies (only needed for Entry Point 1):

Rscript install_r_dependencies.R

R packages used: tidyverse, arrow, fixest, broom, ranger, latex2exp

Proprietary Data (Entry Point 1 only)

The cleaning pipeline requires NielsenIQ data, which must be obtained under a separate licensing agreement with the Kilts Center.

DISCUS shipment data (used in the original analysis) has been pre-processed into obfuscated firm-level shares provided in our_inputs/. See cleaning_code/reference/ for the original DISCUS processing scripts.

NielsenIQ data is not required for Entry Point 2.

Estimation Results (Dataverse)

The estimation pickle files (~100MB each) are hosted on Dataverse rather than in this repository:

  • Dataverse DOI: [TBD]
  • Download slow_paper.pickle and place in estimation_outputs/

Repository Structure

PH_replication/
├── cleaning_code/          # Stage 1: Data processing pipeline
├── estimation_code/        # Stage 2: Demand estimation (PyBLP)
├── counterfactual_code/    # Stage 3: Counterfactuals, tables & figures
├── paper/                  # LaTeX source + compile_paper.sh to build PDF
├── our_inputs/             # Author-constructed crosswalks, prices, product info
├── estimation_data/        # Estimation-ready datasets
├── estimation_outputs/     # Estimation results (.pickle from Dataverse, .parquet)
├── proc_data/              # Processed data (2 files provided; rest generated by Stage 1)
├── results/                # Counterfactual simulation results
├── tables_and_figures/     # Final tables (.tex) and figures (.pdf)
├── app_figures/            # Appendix tables and figures
├── raw_data/               # Public raw data (Census, TTB, NIH)
├── requirements.txt        # Python dependencies
├── install_r_dependencies.R # R dependencies
├── LICENSE                 # MIT License
└── README.md

Replication Instructions

The pipeline has three stages. Entry Point 2 users start at Stage 2; Entry Point 3 users start at Stage 3.

Stage 1: Data Cleaning (cleaning_code/)

Requires: NielsenIQ data (Entry Point 1 only)

  1. Configure data paths in cleaning_code/lp2023_config.py:
    • Set nielsen_root to your Kilts Center data directory
  2. Run:
    cd cleaning_code
    bash run_cleaning.sh

Pre-computed quantity data (monthly for prices, DISCUS-calibrated quarterly for estimation) and obfuscated DISCUS firm shares are provided in our_inputs/, so raw DISCUS data is not required. The original DISCUS processing scripts are archived in cleaning_code/reference/ for transparency.
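The only setting documented for Step 1 is nielsen_root. A hypothetical sketch of the relevant line in cleaning_code/lp2023_config.py (everything here besides the variable name is a placeholder):

```python
# lp2023_config.py (sketch): only `nielsen_root` is documented in this
# README; the path below is a placeholder for your Kilts Center directory.
from pathlib import Path

nielsen_root = Path("/path/to/kilts/nielsen")
```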

Script overview:

  0_read_kilts.py: Read raw NielsenIQ data (scanner + panel). Configure paths in lp2023_config.py. Run separately before the pipeline.
  1c_monthly_prices.py: Merge price data (wholesale, manufacturer, retail) with pre-computed quantities.
  1d_impute_manuf.R: Impute missing manufacturer prices (R).
  1e_combine_monthly.py: Combine monthly price and quantity data.
  2_combine_quarterly.py: Merge pre-computed quarterly quantities with price data.
  3_process_agents.py: Process NielsenIQ panelist data; estimate income distributions.
  4_micro_moments_by_year.py: Calculate micro moments for demand estimation.
  5_data_prep_quarterly.py: Prepare final estimation datasets, instruments, and ownership matrices.
  6_appendix_checks.py: Validation checks (appendix tables).

Outputs: estimation_data/, proc_data/

Stage 2: Demand Estimation (estimation_code/)

Requires: estimation_data/ (provided in repo)

Estimates demand using PyBLP. The main specification is slow (10+ hours, ~4 GB of memory) but runs on a laptop.

Script overview (with approximate resources):

  estimate_demand_logit.py: Plain logit specification (~15 seconds; laptop OK).
  estimate_demand_nested.py: Nested logit specification (~5 minutes; laptop OK).
  estimate_demand_pi.py: Logit with demographic interactions (~20 minutes; laptop OK).
  estimate_demand_MAIN.py --fast: Full RCNL model, fast mode (~1.5 hours on an M2 Mac; laptop OK).
  estimate_demand_MAIN.py: Full RCNL model, main specification (~10+ hours on an M2 Mac; laptop marginal).
  post_estimation.py: Compute marginal costs and markups (minutes).

To run all specifications at once:

cd estimation_code
bash run_estimation.sh       # runs all specs (logit, nested, pi, RCNL fast & slow, post-estimation)

Individual scripts accept command-line arguments (see --help). Results are saved to estimation_outputs/. Pre-computed results are available on Dataverse.
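For intuition on why the plain logit specification above is so fast: it requires no numerical search, because Berry's (1994) inversion recovers mean utilities analytically as δ_j = ln s_j − ln s_0. The package itself uses pyblp; the sketch below is an illustrative numpy version on synthetic, noise-free data (real estimation would instrument for endogenous prices rather than run OLS):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic market: J products with mean utility delta = beta0 + alpha * price.
J, beta0, alpha = 50, 1.0, -2.0
prices = rng.uniform(1, 3, J)
delta = beta0 + alpha * prices

# Logit shares with an outside good whose utility is normalized to zero.
expd = np.exp(delta)
shares = expd / (1 + expd.sum())
s0 = 1 - shares.sum()

# Berry (1994) inversion: ln s_j - ln s_0 recovers delta exactly.
delta_hat = np.log(shares) - np.log(s0)

# OLS of delta on [1, price] recovers (beta0, alpha) in this noise-free setup.
X = np.column_stack([np.ones(J), prices])
beta_hat = np.linalg.lstsq(X, delta_hat, rcond=None)[0]
```

The nested logit and RCNL specifications add inner loops and nonlinear parameters, which is where the hours of runtime come from.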

Stage 3: Counterfactuals and Figures (counterfactual_code/)

Requires: estimation_outputs/ (download from Dataverse), estimation_data/, proc_data/, results/

cd counterfactual_code
bash run_counterfactuals.sh

The script runs five sub-stages:

Stage 3.1: Descriptive tables and figures

  0_nielsen_figures.py: Price index figure (Figure 1)
  1_descriptive_tables.py: Summary statistics (Tables 1--2)
  2_manufacturer_plots.py: Manufacturer pricing (Figures 2--3)
  2b_price_plots.py: Wholesale/retail price comparison (Figures 4--5)

Stage 3.2: Estimation result tables

  4_parameter_tables.py: Parameter estimates (Tables 3--4)
  4b_appendix_micro_tables.py: Micro moment fit and κ distribution (App Tables A1--A2)
  5_diversion_plot.py: Diversion ratios (Figure 6, App Tables A3, A7)
  12_panel_regressions.py: NIAAA consumption and CBP retail regressions (App Tables A9--A10). Requires pyfixest.
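As background for the diversion-ratio outputs: under plain logit, diversion from product j to product k collapses to s_k / (1 − s_j), a standard benchmark against which richer RCNL diversion patterns are compared. A small illustrative helper (ours, not from the package):

```python
import numpy as np

def logit_diversion(shares):
    """Diversion ratio matrix for plain logit: D[j, k] = s_k / (1 - s_j).

    Row j gives the fraction of product j's lost demand that each rival k
    captures when j becomes less attractive; the remainder flows to the
    outside good, so each row plus s_0 / (1 - s_j) sums to one.
    """
    s = np.asarray(shares, dtype=float)
    D = s[np.newaxis, :] / (1.0 - s[:, np.newaxis])
    np.fill_diagonal(D, 0.0)  # no diversion from a product to itself
    return D
```

With shares (0.2, 0.3, 0.1), diversion from product 1 to product 2 is 0.3 / 0.8 = 0.375.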

Stage 3.3: Counterfactual simulations (~4.5 hours on an M2 Mac laptop)

  3a_compute_counterfactual.py --all: Tax counterfactuals for all 24 markets
  3b_collate_cf.py: Collate parallel results
  3c_compute_counterfactual_both_tax.py: Combined ad valorem + specific tax
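The scripts above re-solve the pricing equilibrium under alternative taxes using pyblp across all 24 markets. The basic mechanics can be illustrated with a deliberately simplified, hypothetical single-product logit monopolist (the function name and parameters a, b are illustrative, not the paper's model): a specific tax enters marginal cost, and the pricing first-order condition is re-solved.

```python
import numpy as np

def monopoly_price(cost, tax, a=2.0, b=1.5, tol=1e-10):
    """Equilibrium price of a single-product logit monopolist facing a
    specific (per-unit) tax. Demand: s(p) = sigmoid(a - b * p).
    FOC: p = cost + tax + 1 / (b * (1 - s(p))), solved by damped iteration.
    """
    p = cost + tax + 1.0  # starting guess
    for _ in range(10_000):
        s = 1.0 / (1.0 + np.exp(-(a - b * p)))
        p_new = cost + tax + 1.0 / (b * (1.0 - s))
        if abs(p_new - p) < tol:
            return p_new
        p = 0.5 * p + 0.5 * p_new  # damping for stability
    raise RuntimeError("no convergence")

# Pass-through of a specific tax: compare equilibria with and without it.
p0 = monopoly_price(cost=1.0, tax=0.0)
p1 = monopoly_price(cost=1.0, tax=0.5)
passthrough = (p1 - p0) / 0.5
```

The paper's simulations replace this toy FOC with the full RCNL demand system and multi-product ownership structure, which is why they take hours rather than milliseconds.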

Stage 3.4: Counterfactual figures and tables

  6_counterfactual_plots.py: Counterfactual figures (Figures 7--8), welfare tables (Tables 6--8)

Stage 3.5: Appendix notebooks

  7_appendix_sales_prices.ipynb: Sales and price distributions
  8_appendix_time_variation.ipynb: Time variation in estimates
  9_appendix_plot_both_tax.ipynb: Combined tax counterfactual plots
  10_tagging.ipynb: Product tagging regressions
  11_extra_response_plots.ipynb: Response heatmaps and additional plots

All output is written to tables_and_figures/ and app_figures/.

Stage 4: Compiling the Paper PDF (paper/)

After running the counterfactual pipeline, compile the paper:

bash paper/compile_paper.sh

This script:

  1. Copies all pipeline-generated figures and tables from tables_and_figures/ and app_figures/ into paper/Figures/ and paper/Tables/
  2. Runs pdflatex + bibtex to produce paper/00_main.pdf

To sync outputs without recompiling:

bash paper/compile_paper.sh --no-compile

Three tables are manually maintained in paper/Tables/ and are not regenerated by the pipeline:

  • table5_taxes.tex — policy tax rate descriptions
  • ph_panelreg.tex — panel regression (run 12_panel_regressions.py separately; requires pyfixest)
  • app_tab_rho_sensitivity.tex — ρ sensitivity (deferred; requires long rho-grid computation)

Prerequisites: pdflatex and bibtex (e.g. via TeX Live or MacTeX).

Key Input Files

our_inputs/

  master_product_info.parquet: Product characteristics (size, proof, category, manufacturer). Key: master_prod_id.
  crosswalk_nielsen.parquet: Maps (upc, upc_ver_uc) to master_prod_id.
  crosswalk_wholesale.parquet: Maps (firm_id, firm_prod_id) to master_prod_id.
  all_quantity_monthly.parquet: Pre-computed monthly quantity data for price processing (step 1c).
  all_quantity_quarterly.parquet: DISCUS-calibrated quarterly quantities for estimation (step 2).
  discus_shares_quarterly.parquet: Obfuscated DISCUS firm-level shares for ownership matrix computation.
  prices_monthly_cleaned.parquet: Monthly posted wholesale prices from CT DCP.
  manuf_prices_raw.parquet: Raw manufacturer pricing data.
  manufacturer_info.parquet: Manufacturer-level information.
  firm_ids.xlsx: Wholesaler ID to company name mapping.
  state_taxes.xlsx: 2013 state-level excise tax rates.
  clearance_list.xlsx: Product-periods flagged as clearance items.
  default_prices.xlsx: Default prices for missing entries.
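The two crosswalk files map vendor-specific identifiers onto the common master_prod_id key. A hypothetical pandas sketch of how such a join works, with toy stand-ins for the parquet files (the real pipeline would use pd.read_parquet on the files listed above):

```python
import pandas as pd

# Toy stand-ins for crosswalk_nielsen.parquet and the scanner data.
crosswalk = pd.DataFrame({
    "upc": [111, 222], "upc_ver_uc": [1, 1], "master_prod_id": [7, 8],
})
scanner = pd.DataFrame({
    "upc": [111, 222, 333], "upc_ver_uc": [1, 1, 1], "units": [10, 5, 2],
})

# Left-merge on the composite key keeps unmatched UPCs so they can be
# inspected (or dropped) rather than silently discarded.
merged = scanner.merge(crosswalk, on=["upc", "upc_ver_uc"], how="left")
unmatched = merged[merged["master_prod_id"].isna()]
```

Here UPC 333 would surface in `unmatched` as a product missing from the crosswalk.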

raw_data/

  pcyr1970-2020.txt: NIAAA per-capita alcohol consumption by state and beverage type (1970--2020).
  cps_asec_age.parquet: IPUMS CPS-ASEC age and person weights, 1983--2010.
  cps_asec_income.parquet: IPUMS CPS-ASEC household income and weights, 1983--2010.
  cbp_liquor_state.parquet: Census County Business Patterns state-level liquor retail, 1986--2010.

License

MIT License. See LICENSE.

Citation

@article{conlon_rao_liquor,
  title={The Cost of Curbing Externalities with Market Power: Alcohol Regulations and Tax Alternatives},
  author={Conlon, Christopher T. and Rao, Nirupama S.},
  journal={Journal of Political Economy},
  year={Forthcoming}
}
