
chrisconlon/Liquor_Replication_JPE


Replication Package: Conlon and Rao

"The Cost of Curbing Externalities with Market Power: Alcohol Regulations and Tax Alternatives" Christopher T. Conlon and Nirupama S. Rao Forthcoming, Journal of Political Economy

Overview -- Replication options

This repository contains all code and non-proprietary data to replicate the tables and figures in the paper. The analysis estimates demand for distilled spirits in Connecticut using NielsenIQ retail scanner data and DISCUS shipment data, then simulates counterfactual tax policies. The replication package allows for three different entry points:

  1. Entry Point #1: this starts in cleaning_code/ and requires access to the NielsenIQ data from the Kilts Center at Chicago Booth. It processes the raw data using the kiltsreader package. We include obfuscated DISCUS data, as well as code in cleaning_code/reference/ that illustrates how the DISCUS and NielsenIQ data are combined in the original paper.

  2. Entry Point #2: this can be started WITHOUT access to the NielsenIQ data. It skips the cleaning_code/ steps, starts with the contents of estimation_data/ and proc_data/ pre-populated, and begins with the estimation (in pyblp) of the demand model. Estimation of the full demand model can take around 24 hours on a standard computer.

  3. Entry Point #3: this skips the estimation of the demand model, and begins with the contents of estimation_outputs pre-populated. Because slow_paper.pickle is too large for GitHub, it must be manually downloaded and placed in estimation_outputs/ before starting. This entry point focuses on simulating counterfactuals (which can still take several hours) and begins in counterfactual_code.

Summary

If you do not have access to the raw NielsenIQ/DISCUS data, you can still replicate all estimation and counterfactual results:

  1. Clone this repository
  2. Download estimation results from Dataverse ([DOI link TBD]):
    • slow_paper.pickle → place in estimation_outputs/
  3. Install dependencies: pip install -r requirements.txt
  4. Run counterfactuals:
    cd counterfactual_code
    bash run_counterfactuals.sh
  5. Compile the paper PDF:
    bash paper/compile_paper.sh

This reproduces all tables, figures, and the paper PDF. See Detailed Instructions below.
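Before running Step 4, it is worth confirming that slow_paper.pickle actually landed in estimation_outputs/ and is not a truncated download. A hypothetical helper for this check (the function name and size threshold are ours, not part of the package):

```python
from pathlib import Path

def pickle_is_ready(path="estimation_outputs/slow_paper.pickle", min_mb=50):
    """Return True if the Dataverse pickle is present and plausibly complete.

    The real file is roughly 100 MB, so anything much smaller is likely a
    truncated or failed download. The threshold is an illustrative guess.
    """
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_mb * 1_000_000
```

Run it from the repository root before launching run_counterfactuals.sh.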

Requirements

Software

  • Python 3.9+
  • R 4.0+ (only needed for Entry Point 1: cleaning step 1d)
  • PyBLP 0.13+

Install Python dependencies:

pip install -r requirements.txt

Install R dependencies (only needed for Entry Point 1):

Rscript install_r_dependencies.R

R packages used: tidyverse, arrow, fixest, broom, ranger, latex2exp

Proprietary Data (Entry Point 1 only)

The cleaning pipeline requires NielsenIQ data, which must be obtained under a separate licensing agreement with the Kilts Center.

DISCUS shipment data (used in the original analysis) has been pre-processed into obfuscated firm-level shares provided in our_inputs/. See cleaning_code/reference/ for the original DISCUS processing scripts.

NielsenIQ data is not required for Entry Point 2.

Estimation Results (Dataverse)

The estimation pickle files (~100MB each) are hosted on Dataverse rather than in this repository:

  • Dataverse DOI: [TBD]
  • Download slow_paper.pickle and place in estimation_outputs/

Repository Structure

PH_replication/
├── cleaning_code/          # Stage 1: Data processing pipeline
├── estimation_code/        # Stage 2: Demand estimation (PyBLP)
├── counterfactual_code/    # Stage 3: Counterfactuals, tables & figures
├── paper/                  # LaTeX source + compile_paper.sh to build PDF
├── our_inputs/             # Author-constructed crosswalks, prices, product info
├── estimation_data/        # Estimation-ready datasets
├── estimation_outputs/     # Estimation results (.pickle from Dataverse, .parquet)
├── proc_data/              # Processed data (2 files provided; rest generated by Stage 1)
├── results/                # Counterfactual simulation results
├── tables_and_figures/     # Final tables (.tex) and figures (.pdf)
├── app_figures/            # Appendix tables and figures
├── raw_data/               # Public raw data (Census, TTB, NIH)
├── requirements.txt        # Python dependencies
├── install_r_dependencies.R # R dependencies
├── LICENSE                 # MIT License
└── README.md

Replication Instructions

The pipeline has three stages. Entry Point 2 users start at Stage 2; Entry Point 3 users start at Stage 3.

Stage 1: Data Cleaning (cleaning_code/)

Requires: NielsenIQ data (Entry Point 1 only)

  1. Configure data paths in cleaning_code/lp2023_config.py:
    • Set nielsen_root to your Kilts Center data directory
  2. Run:
    cd cleaning_code
    bash run_cleaning.sh

Pre-computed quantity data (monthly for prices, DISCUS-calibrated quarterly for estimation) and obfuscated DISCUS firm shares are provided in our_inputs/, so raw DISCUS data is not required. The original DISCUS processing scripts are archived in cleaning_code/reference/ for transparency.
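The only setting documented for Step 1 is nielsen_root. A hypothetical sketch of the relevant line in cleaning_code/lp2023_config.py (everything here besides the variable name is a placeholder):

```python
# lp2023_config.py (sketch): only `nielsen_root` is documented in this
# README; the path below is a placeholder for your Kilts Center directory.
from pathlib import Path

nielsen_root = Path("/path/to/kilts/nielsen")
```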

Script overview:

  0_read_kilts.py: Read raw NielsenIQ data (scanner + panel). Configure paths in lp2023_config.py. Run separately before the pipeline.
  1c_monthly_prices.py: Merge price data (wholesale, manufacturer, retail) with pre-computed quantities.
  1d_impute_manuf.R: Impute missing manufacturer prices (R).
  1e_combine_monthly.py: Combine monthly price and quantity data.
  2_combine_quarterly.py: Merge pre-computed quarterly quantities with price data.
  3_process_agents.py: Process NielsenIQ panelist data; estimate income distributions.
  4_micro_moments_by_year.py: Calculate micro moments for demand estimation.
  5_data_prep_quarterly.py: Prepare final estimation datasets, instruments, and ownership matrices.
  6_appendix_checks.py: Validation checks (appendix tables).

Outputs: estimation_data/, proc_data/

Stage 2: Demand Estimation (estimation_code/)

Requires: estimation_data/ (provided in repo)

Estimates demand using PyBLP. The main specification is slow (10+ hours, ~4 GB of memory) but runs on a laptop.

Script overview (with approximate resources):

  estimate_demand_logit.py: Plain logit specification (~15 seconds; laptop OK).
  estimate_demand_nested.py: Nested logit specification (~5 minutes; laptop OK).
  estimate_demand_pi.py: Logit with demographic interactions (~20 minutes; laptop OK).
  estimate_demand_MAIN.py --fast: Full RCNL model, fast mode (~1.5 hours on an M2 Mac; laptop OK).
  estimate_demand_MAIN.py: Full RCNL model, main specification (~10+ hours on an M2 Mac; laptop marginal).
  post_estimation.py: Compute marginal costs and markups (minutes).

To run all specifications at once:

cd estimation_code
bash run_estimation.sh       # runs all specs (logit, nested, pi, RCNL fast & slow, post-estimation)

Individual scripts accept command-line arguments (see --help). Results are saved to estimation_outputs/. Pre-computed results are available on Dataverse.
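For intuition on why the plain logit specification above is so fast: it requires no numerical search, because Berry's (1994) inversion recovers mean utilities analytically as δ_j = ln s_j − ln s_0. The package itself uses pyblp; the sketch below is an illustrative numpy version on synthetic, noise-free data (real estimation would instrument for endogenous prices rather than run OLS):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic market: J products with mean utility delta = beta0 + alpha * price.
J, beta0, alpha = 50, 1.0, -2.0
prices = rng.uniform(1, 3, J)
delta = beta0 + alpha * prices

# Logit shares with an outside good whose utility is normalized to zero.
expd = np.exp(delta)
shares = expd / (1 + expd.sum())
s0 = 1 - shares.sum()

# Berry (1994) inversion: ln s_j - ln s_0 recovers delta exactly.
delta_hat = np.log(shares) - np.log(s0)

# OLS of delta on [1, price] recovers (beta0, alpha) in this noise-free setup.
X = np.column_stack([np.ones(J), prices])
beta_hat = np.linalg.lstsq(X, delta_hat, rcond=None)[0]
```

The nested logit and RCNL specifications add inner loops and nonlinear parameters, which is where the hours of runtime come from.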

Stage 3: Counterfactuals and Figures (counterfactual_code/)

Requires: estimation_outputs/ (download from Dataverse), estimation_data/, proc_data/, results/

cd counterfactual_code
bash run_counterfactuals.sh

The script runs five sub-stages:

Stage 3.1: Descriptive tables and figures

  0_nielsen_figures.py: Price index figure (Figure 1)
  1_descriptive_tables.py: Summary statistics (Tables 1--2)
  2_manufacturer_plots.py: Manufacturer pricing (Figures 2--3)
  2b_price_plots.py: Wholesale/retail price comparison (Figures 4--5)

Stage 3.2: Estimation result tables

  4_parameter_tables.py: Parameter estimates (Tables 3--4)
  4b_appendix_micro_tables.py: Micro moment fit and κ distribution (App Tables A1--A2)
  5_diversion_plot.py: Diversion ratios (Figure 6, App Tables A3, A7)
  12_panel_regressions.py: NIAAA consumption and CBP retail regressions (App Tables A9--A10). Requires pyfixest.
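As background for the diversion-ratio outputs: under plain logit, diversion from product j to product k collapses to s_k / (1 − s_j), a standard benchmark against which richer RCNL diversion patterns are compared. A small illustrative helper (ours, not from the package):

```python
import numpy as np

def logit_diversion(shares):
    """Diversion ratio matrix for plain logit: D[j, k] = s_k / (1 - s_j).

    Row j gives the fraction of product j's lost demand that each rival k
    captures when j becomes less attractive; the remainder flows to the
    outside good, so each row plus s_0 / (1 - s_j) sums to one.
    """
    s = np.asarray(shares, dtype=float)
    D = s[np.newaxis, :] / (1.0 - s[:, np.newaxis])
    np.fill_diagonal(D, 0.0)  # no diversion from a product to itself
    return D
```

With shares (0.2, 0.3, 0.1), diversion from product 1 to product 2 is 0.3 / 0.8 = 0.375.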

Stage 3.3: Counterfactual simulations (~4.5 hours on an M2 Mac laptop)

  3a_compute_counterfactual.py --all: Tax counterfactuals for all 24 markets
  3b_collate_cf.py: Collate parallel results
  3c_compute_counterfactual_both_tax.py: Combined ad valorem + specific tax
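The scripts above re-solve the pricing equilibrium under alternative taxes using pyblp across all 24 markets. The basic mechanics can be illustrated with a deliberately simplified, hypothetical single-product logit monopolist (the function name and parameters a, b are illustrative, not the paper's model): a specific tax enters marginal cost, and the pricing first-order condition is re-solved.

```python
import numpy as np

def monopoly_price(cost, tax, a=2.0, b=1.5, tol=1e-10):
    """Equilibrium price of a single-product logit monopolist facing a
    specific (per-unit) tax. Demand: s(p) = sigmoid(a - b * p).
    FOC: p = cost + tax + 1 / (b * (1 - s(p))), solved by damped iteration.
    """
    p = cost + tax + 1.0  # starting guess
    for _ in range(10_000):
        s = 1.0 / (1.0 + np.exp(-(a - b * p)))
        p_new = cost + tax + 1.0 / (b * (1.0 - s))
        if abs(p_new - p) < tol:
            return p_new
        p = 0.5 * p + 0.5 * p_new  # damping for stability
    raise RuntimeError("no convergence")

# Pass-through of a specific tax: compare equilibria with and without it.
p0 = monopoly_price(cost=1.0, tax=0.0)
p1 = monopoly_price(cost=1.0, tax=0.5)
passthrough = (p1 - p0) / 0.5
```

The paper's simulations replace this toy FOC with the full RCNL demand system and multi-product ownership structure, which is why they take hours rather than milliseconds.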

Stage 3.4: Counterfactual figures and tables

  6_counterfactual_plots.py: Counterfactual figures (Figures 7--8), welfare tables (Tables 6--8)

Stage 3.5: Appendix notebooks

  7_appendix_sales_prices.ipynb: Sales and price distributions
  8_appendix_time_variation.ipynb: Time variation in estimates
  9_appendix_plot_both_tax.ipynb: Combined tax counterfactual plots
  10_tagging.ipynb: Product tagging regressions
  11_extra_response_plots.ipynb: Response heatmaps and additional plots

All output is written to tables_and_figures/ and app_figures/.

Stage 4: Compiling the Paper PDF (paper/)

After running the counterfactual pipeline, compile the paper:

bash paper/compile_paper.sh

This script:

  1. Copies all pipeline-generated figures and tables from tables_and_figures/ and app_figures/ into paper/Figures/ and paper/Tables/
  2. Runs pdflatex + bibtex to produce paper/00_main.pdf

To sync outputs without recompiling:

bash paper/compile_paper.sh --no-compile

Three tables are manually maintained in paper/Tables/ and are not regenerated by the pipeline:

  • table5_taxes.tex — policy tax rate descriptions
  • ph_panelreg.tex — panel regression (run 12_panel_regressions.py separately; requires pyfixest)
  • app_tab_rho_sensitivity.tex — ρ sensitivity (deferred; requires long rho-grid computation)

Prerequisites: pdflatex and bibtex (e.g. via TeX Live or MacTeX).

Key Input Files

our_inputs/

  master_product_info.parquet: Product characteristics (size, proof, category, manufacturer). Key: master_prod_id.
  crosswalk_nielsen.parquet: Maps (upc, upc_ver_uc) to master_prod_id.
  crosswalk_wholesale.parquet: Maps (firm_id, firm_prod_id) to master_prod_id.
  all_quantity_monthly.parquet: Pre-computed monthly quantity data for price processing (step 1c).
  all_quantity_quarterly.parquet: DISCUS-calibrated quarterly quantities for estimation (step 2).
  discus_shares_quarterly.parquet: Obfuscated DISCUS firm-level shares for ownership matrix computation.
  prices_monthly_cleaned.parquet: Monthly posted wholesale prices from CT DCP.
  manuf_prices_raw.parquet: Raw manufacturer pricing data.
  manufacturer_info.parquet: Manufacturer-level information.
  firm_ids.xlsx: Wholesaler ID to company name mapping.
  state_taxes.xlsx: 2013 state-level excise tax rates.
  clearance_list.xlsx: Product-periods flagged as clearance items.
  default_prices.xlsx: Default prices for missing entries.
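The two crosswalk files map vendor-specific identifiers onto the common master_prod_id key. A hypothetical pandas sketch of how such a join works, with toy stand-ins for the parquet files (the real pipeline would use pd.read_parquet on the files listed above):

```python
import pandas as pd

# Toy stand-ins for crosswalk_nielsen.parquet and the scanner data.
crosswalk = pd.DataFrame({
    "upc": [111, 222], "upc_ver_uc": [1, 1], "master_prod_id": [7, 8],
})
scanner = pd.DataFrame({
    "upc": [111, 222, 333], "upc_ver_uc": [1, 1, 1], "units": [10, 5, 2],
})

# Left-merge on the composite key keeps unmatched UPCs so they can be
# inspected (or dropped) rather than silently discarded.
merged = scanner.merge(crosswalk, on=["upc", "upc_ver_uc"], how="left")
unmatched = merged[merged["master_prod_id"].isna()]
```

Here UPC 333 would surface in `unmatched` as a product missing from the crosswalk.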

raw_data/

  pcyr1970-2020.txt: NIAAA per-capita alcohol consumption by state and beverage type (1970--2020).
  cps_asec_age.parquet: IPUMS CPS-ASEC age and person weights, 1983--2010.
  cps_asec_income.parquet: IPUMS CPS-ASEC household income and weights, 1983--2010.
  cbp_liquor_state.parquet: Census County Business Patterns state-level liquor retail, 1986--2010.

License

MIT License. See LICENSE.

Citation

@article{conlon_rao_liquor,
  title={The Cost of Curbing Externalities with Market Power: Alcohol Regulations and Tax Alternatives},
  author={Conlon, Christopher T. and Rao, Nirupama S.},
  journal={Journal of Political Economy},
  year={Forthcoming}
}
