"The Cost of Curbing Externalities with Market Power: Alcohol Regulations and Tax Alternatives" Christopher T. Conlon and Nirupama S. Rao Forthcoming, Journal of Political Economy
This repository contains all code and non-proprietary data to replicate the tables and figures in the paper. The analysis estimates demand for distilled spirits in Connecticut using NielsenIQ retail scanner data and DISCUS shipment data, then simulates counterfactual tax policies. The replication package allows for three different entry points:
- **Entry Point #1**: starts in `cleaning_code/` and requires access to the NielsenIQ data from the Kilts Center at Chicago Booth. It processes the raw data using the `kiltsreader` package. We include sufficiently obfuscated DISCUS data, as well as code in `cleaning_code/reference/` that illustrates how the DISCUS and NielsenIQ data are combined in the original paper.
- **Entry Point #2**: can be started WITHOUT access to the NielsenIQ data. It skips the `cleaning_code/` steps, starts with the contents of `estimation_data/` and `proc_data/` pre-populated, and begins with estimation (in `pyblp`) of the demand model. Estimation of the full demand model can take around 24 hours on a standard computer.
- **Entry Point #3**: skips estimation of the demand model and begins with the contents of `estimation_outputs/` pre-populated. Because `slow_paper.pickle` is too large for GitHub, it must be manually downloaded and placed in `estimation_outputs/` before starting. This entry point focuses on simulating counterfactuals (which can still take several hours) and begins in `counterfactual_code/`.
If you do not have access to the raw NielsenIQ/DISCUS data, you can still replicate all estimation and counterfactual results:
- Clone this repository
- Download estimation results from Dataverse ([DOI link TBD]): `slow_paper.pickle` → place in `estimation_outputs/`
- Install dependencies: `pip install -r requirements.txt`
- Run counterfactuals: `cd counterfactual_code && bash run_counterfactuals.sh`
- Compile the paper PDF: `bash paper/compile_paper.sh`
This reproduces all tables, figures, and the paper PDF. See Detailed Instructions below.
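Since the Dataverse pickle must be placed by hand, it is easy to start the pipeline with the file missing or half-downloaded. The helper below is a minimal sketch, not part of the repository (`pickle_in_place` is a hypothetical name), that checks the file exists and is non-empty before you launch the multi-hour counterfactual run:

```python
from pathlib import Path

def pickle_in_place(outputs_dir, name="slow_paper.pickle"):
    """Return True if the downloaded estimation pickle exists and is non-empty."""
    target = Path(outputs_dir) / name
    return target.is_file() and target.stat().st_size > 0

# e.g. pickle_in_place("estimation_outputs") from the repository root
```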
- Python 3.9+
- R 4.0+ (only needed for Entry Point 1, cleaning step `1d`)
- PyBLP 0.13+
Install Python dependencies:

```bash
pip install -r requirements.txt
```

Install R dependencies (only needed for Entry Point 1):

```bash
Rscript install_r_dependencies.R
```

R packages used: tidyverse, arrow, fixest, broom, ranger, latex2exp.
The cleaning pipeline requires NielsenIQ data, which is available only under a separate licensing agreement:
- NielsenIQ: Retail scanner and consumer panel data for Connecticut (2007--2014). Available through the Kilts Center at Chicago Booth.
DISCUS shipment data (used in the original analysis) has been pre-processed into obfuscated firm-level shares provided in `our_inputs/`. See `cleaning_code/reference/` for the original DISCUS processing scripts.
NielsenIQ data is not required for Entry Point 2.
The estimation pickle files (~100 MB each) are hosted on Dataverse rather than in this repository:
- Dataverse DOI: [TBD]
- Download `slow_paper.pickle` and place it in `estimation_outputs/`
```
PH_replication/
├── cleaning_code/              # Stage 1: Data processing pipeline
├── estimation_code/            # Stage 2: Demand estimation (PyBLP)
├── counterfactual_code/        # Stage 3: Counterfactuals, tables & figures
├── paper/                      # LaTeX source + compile_paper.sh to build PDF
├── our_inputs/                 # Author-constructed crosswalks, prices, product info
├── estimation_data/            # Estimation-ready datasets
├── estimation_outputs/         # Estimation results (.pickle from Dataverse, .parquet)
├── proc_data/                  # Processed data (2 files provided; rest generated by Stage 1)
├── results/                    # Counterfactual simulation results
├── tables_and_figures/         # Final tables (.tex) and figures (.pdf)
├── app_figures/                # Appendix tables and figures
├── raw_data/                   # Public raw data (Census, TTB, NIH)
├── requirements.txt            # Python dependencies
├── install_r_dependencies.R    # R dependencies
├── LICENSE                     # MIT License
└── README.md
```
The pipeline has three stages. Entry Point 2 users start at Stage 2; Entry Point 3 users start at Stage 3.
Requires: NielsenIQ data (Entry Point 1 only)
- Configure data paths in `cleaning_code/lp2023_config.py`: set `nielsen_root` to your Kilts Center data directory
- Run:

```bash
cd cleaning_code
bash run_cleaning.sh
```
Pre-computed quantity data (monthly for prices, DISCUS-calibrated quarterly for estimation) and obfuscated DISCUS firm shares are provided in `our_inputs/`, so raw DISCUS data is not required. The original DISCUS processing scripts are archived in `cleaning_code/reference/` for transparency.
| Script | Description |
|---|---|
| `0_read_kilts.py` | Read raw NielsenIQ data (scanner + panel). Configure paths in `lp2023_config.py`. Run separately before the pipeline. |
| `1c_monthly_prices.py` | Merge price data (wholesale, manufacturer, retail) with pre-computed quantities. |
| `1d_impute_manuf.R` | Impute missing manufacturer prices (R). |
| `1e_combine_monthly.py` | Combine monthly price and quantity data. |
| `2_combine_quarterly.py` | Merge pre-computed quarterly quantities with price data. |
| `3_process_agents.py` | Process NielsenIQ panelist data; estimate income distributions. |
| `4_micro_moments_by_year.py` | Calculate micro moments for demand estimation. |
| `5_data_prep_quarterly.py` | Prepare final estimation datasets, instruments, and ownership matrices. |
| `6_appendix_checks.py` | Validation checks (appendix tables). |
Outputs: `estimation_data/`, `proc_data/`
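Among the objects step 5 prepares are the ownership matrices used downstream. As a hedged illustration of the standard construction only (toy firm IDs, not the repository's actual code): product j and product k share an entry of 1 whenever they have the same owner.

```python
import numpy as np

# Toy firm assignment for five products (hypothetical IDs, for illustration only).
firm_ids = np.array([1, 1, 2, 3, 3])

# Standard ownership matrix: O[j, k] = 1 if products j and k share an owner.
ownership = (firm_ids[:, None] == firm_ids[None, :]).astype(float)
```

In the actual pipeline the firm assignments come from the obfuscated DISCUS shares in `our_inputs/`, one matrix per market.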
Requires: `estimation_data/` (provided in repo)

Estimates demand using PyBLP. The main specification is slow (10+ hours, ~4 GB of memory) but runs on a laptop.
| Script | Description | Resources |
|---|---|---|
| `estimate_demand_logit.py` | Plain logit specification | ~15 seconds, laptop OK |
| `estimate_demand_nested.py` | Nested logit specification | ~5 minutes, laptop OK |
| `estimate_demand_pi.py` | Logit with demographic interactions | ~20 minutes, laptop OK |
| `estimate_demand_MAIN.py --fast` | Full RCNL model, fast mode | ~1.5 hours, laptop OK (M2 Mac) |
| `estimate_demand_MAIN.py` | Full RCNL model (main specification) | ~10+ hours (M2 Mac), laptop marginal |
| `post_estimation.py` | Compute marginal costs and markups | Minutes |
All scripts accept command-line arguments (see `--help`). Example:

```bash
cd estimation_code
bash run_estimation.sh  # runs all specs (logit, nested, pi, RCNL fast & slow, post-estimation)
```

Results are saved to `estimation_outputs/`. Pre-computed results are available on Dataverse.
Requires: `estimation_outputs/` (download from Dataverse), `estimation_data/`, `proc_data/`, `results/`
```bash
cd counterfactual_code
bash run_counterfactuals.sh
```

The script runs five stages:
Stage 3.1: Descriptive tables and figures
| Script | Output |
|---|---|
| `0_nielsen_figures.py` | Price index figure (Figure 1) |
| `1_descriptive_tables.py` | Summary statistics (Tables 1--2) |
| `2_manufacturer_plots.py` | Manufacturer pricing (Figures 2--3) |
| `2b_price_plots.py` | Wholesale/retail price comparison (Figures 4--5) |
Stage 3.2: Estimation result tables
| Script | Output |
|---|---|
| `4_parameter_tables.py` | Parameter estimates (Tables 3--4) |
| `4b_appendix_micro_tables.py` | Micro moment fit and κ distribution (App. Tables A1--A2) |
| `5_diversion_plot.py` | Diversion ratios (Figure 6, App. Tables A3, A7) |
| `12_panel_regressions.py` | NIAAA consumption and CBP retail regressions (App. Tables A9--A10). Requires `pyfixest`. |
Stage 3.3: Counterfactual simulations (~4.5 hours on an M2 Mac laptop)
| Script | Output |
|---|---|
| `3a_compute_counterfactual.py --all` | Tax counterfactuals for all 24 markets |
| `3b_collate_cf.py` | Collate parallel results |
| `3c_compute_counterfactual_both_tax.py` | Combined ad valorem + specific tax |
Stage 3.4: Counterfactual figures and tables
| Script | Output |
|---|---|
| `6_counterfactual_plots.py` | Counterfactual figures (Figures 7--8), welfare tables (Tables 6--8) |
Stage 3.5: Appendix notebooks
| Notebook | Output |
|---|---|
| `7_appendix_sales_prices.ipynb` | Sales and price distributions |
| `8_appendix_time_variation.ipynb` | Time variation in estimates |
| `9_appendix_plot_both_tax.ipynb` | Combined tax counterfactual plots |
| `10_tagging.ipynb` | Product tagging regressions |
| `11_extra_response_plots.ipynb` | Response heatmaps and additional plots |
All output is written to `tables_and_figures/` and `app_figures/`.
After running the counterfactual pipeline, compile the paper:
```bash
bash paper/compile_paper.sh
```

This script:

- Copies all pipeline-generated figures and tables from `tables_and_figures/` and `app_figures/` into `paper/Figures/` and `paper/Tables/`
- Runs `pdflatex` + `bibtex` to produce `paper/00_main.pdf`
To sync outputs without recompiling:
```bash
bash paper/compile_paper.sh --no-compile
```

Three tables are manually maintained in `paper/Tables/` and are not regenerated by the pipeline:

- `table5_taxes.tex`: policy tax rate descriptions
- `ph_panelreg.tex`: panel regressions (run `12_panel_regressions.py` separately; requires `pyfixest`)
- `app_tab_rho_sensitivity.tex`: ρ sensitivity (deferred; requires a long rho-grid computation)
Prerequisites: `pdflatex` and `bibtex` (e.g., via TeX Live or MacTeX).
| File | Description |
|---|---|
| `master_product_info.parquet` | Product characteristics (size, proof, category, manufacturer). Key: `master_prod_id`. |
| `crosswalk_nielsen.parquet` | Maps (`upc`, `upc_ver_uc`) to `master_prod_id`. |
| `crosswalk_wholesale.parquet` | Maps (`firm_id`, `firm_prod_id`) to `master_prod_id`. |
| `all_quantity_monthly.parquet` | Pre-computed monthly quantity data for price processing (step 1c). |
| `all_quantity_quarterly.parquet` | DISCUS-calibrated quarterly quantities for estimation (step 2). |
| `discus_shares_quarterly.parquet` | Obfuscated DISCUS firm-level shares for ownership matrix computation. |
| `prices_monthly_cleaned.parquet` | Monthly posted wholesale prices from CT DCP. |
| `manuf_prices_raw.parquet` | Raw manufacturer pricing data. |
| `manufacturer_info.parquet` | Manufacturer-level information. |
| `firm_ids.xlsx` | Wholesaler ID to company name mapping. |
| `state_taxes.xlsx` | 2013 state-level excise tax rates. |
| `clearance_list.xlsx` | Product-periods flagged as clearance items. |
| `default_prices.xlsx` | Default prices for missing entries. |
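To illustrate how the crosswalks are meant to be used, here is a minimal sketch joining a miniature, hypothetical NielsenIQ crosswalk to product characteristics on `master_prod_id` (only the key column names come from the table above; the toy values and the `category` column are invented for illustration, and the real files would be read with `pandas.read_parquet`):

```python
import pandas as pd

# Miniature stand-ins for crosswalk_nielsen.parquet and master_product_info.parquet.
crosswalk = pd.DataFrame({
    "upc": [111, 222],
    "upc_ver_uc": [1, 1],
    "master_prod_id": [10, 20],
})
products = pd.DataFrame({
    "master_prod_id": [10, 20],
    "category": ["vodka", "whiskey"],
})

# Each (upc, upc_ver_uc) pair maps to exactly one master product,
# so the merge should be many-to-one on master_prod_id.
merged = crosswalk.merge(products, on="master_prod_id", how="left", validate="m:1")
```

The `validate="m:1"` argument makes pandas raise an error if the product table ever contains duplicate `master_prod_id` keys, which is a cheap sanity check when working with the real files.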
| File | Description |
|---|---|
| `pcyr1970-2020.txt` | NIAAA per-capita alcohol consumption by state and beverage type (1970--2020). |
| `cps_asec_age.parquet` | IPUMS CPS-ASEC: age and person weights, 1983--2010. |
| `cps_asec_income.parquet` | IPUMS CPS-ASEC: household income and weights, 1983--2010. |
| `cbp_liquor_state.parquet` | Census County Business Patterns: state-level liquor retail, 1986--2010. |
MIT License. See LICENSE.
```bibtex
@article{conlon_rao_liquor,
  title={The Cost of Curbing Externalities with Market Power: Alcohol Regulations and Tax Alternatives},
  author={Conlon, Christopher T. and Rao, Nirupama S.},
  journal={Journal of Political Economy},
  year={Forthcoming}
}
```