adding weights to regressions with up to 2 FE dimensions #12
jamesbrandecon wants to merge 5 commits into main
Conversation
@grantmcdermott if you can, it'd help to confirm that weighted regs match feols on your end. Next task is to figure out whether I can refactor the dbplyr stuff to avoid writing temp tables, which I think will be hard but probably necessary. This is also the second step of #11, at least for TWFE; HDFE will take more work.
Awesome. I think the next step is to set up a test suite and CI so that we can iterate more systematically and securely. I've got a couple of things on today, but will try to do that when I get a sec. Then we can merge into this branch and add weighted tests too before diving into the code.
I agree! It was already unwieldy to make sure I hadn't broken anything. Happy to defer to you on test structure/content.
Tests and CI added. If you merge from main locally and then push, they should auto-propagate. (You can also add additional tests as part of this PR. They need to be saved as
Thanks! Any attachment to the current test structure? Locally, I'm simulating one small-ish dataset and then running regs with many different options and comparing to feols as the source of truth. So
No attachment, but it does need to follow the
Just a quick note: the CI will fail because you've got new undeclared dependencies
To clarify: just add (It doesn't need to be in
I'll try harder on testing later today -- sorry for the CI-breaking emails. Need to learn local testing in R!
No worries! I actually don't get the breaking emails (since it's your PR). The workflow is quite nice once you get used to it. I might add a Makefile when I get a chance, but two tips for local testing in the meantime:
Thanks! Now I get the flow. I hadn't realized we weren't importing dplyr yet -- is that something you want to avoid? I was relying heavily on dplyr/dbplyr and pipes to get alternating projections to work. |
I'd prefer not to add it as a dependency if we can avoid it. At the same time, I obviously don't want to make our lives harder than they need to be. Regardless, maybe it's a good idea to touch base over a call at this point? The weights functionality sounds very useful regardless of where we land on HDFE; I don't want to lose that through too much integration with the alternating projections functionality yet...

Pure SQL alternating projections

Speaking of which, here is my largely vibe-coded "pure SQL" implementation of alternating projections. Right now, this is just a standalone function that does the AP (using temp tables) and then calls
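For anyone skimming the thread, here is a minimal sketch of what the temp-table flavour of AP can look like. It's written against SQLite via Python's sqlite3 purely for illustration; all names (sql_demean, _demean, _means) are hypothetical, and the actual implementation in this repo is in R/SQL and may differ:

```python
import sqlite3

def sql_demean(conn, table, value_col, fe_cols, tol=1e-8, max_iter=100):
    """Alternating projections in SQL: repeatedly subtract group means
    for each fixed-effect column until the residual stops changing."""
    cur = conn.cursor()
    # Work on a temp copy so the source table is untouched.
    cur.execute("DROP TABLE IF EXISTS _demean")
    cur.execute(
        "CREATE TEMP TABLE _demean AS SELECT rowid AS rid, "
        + value_col + " AS v, " + ", ".join(fe_cols) + " FROM " + table
    )
    for _ in range(max_iter):
        before = cur.execute("SELECT SUM(v * v) FROM _demean").fetchone()[0]
        for fe in fe_cols:
            # Materialise this FE's group means in a temp table first,
            # so we never read from the table we are updating.
            cur.execute("DROP TABLE IF EXISTS _means")
            cur.execute(
                "CREATE TEMP TABLE _means AS SELECT " + fe
                + " AS g, AVG(v) AS gmean FROM _demean GROUP BY " + fe
            )
            cur.execute(
                "UPDATE _demean SET v = v - "
                "(SELECT gmean FROM _means WHERE _means.g = _demean." + fe + ")"
            )
        after = cur.execute("SELECT SUM(v * v) FROM _demean").fetchone()[0]
        if abs(before - after) < tol:
            break
    return cur.execute("SELECT rid, v FROM _demean ORDER BY rid").fetchall()
```

The temp-table round trip per FE per iteration is exactly the overhead the comments above are worried about; a dialect with `UPDATE ... FROM` or window functions could fold each pass into a single statement.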
Ok yeah let's talk live! We can email to schedule. Fwiw, three quick points:
@grantmcdermott for now, can you unblock me on adding dplyr as a dependency and let me build this stuff out (weights and AP)? Always able to go back and take other approaches if I make it modular.
126e86d to d1a1092 (Compare)
d1a1092 to f18f12b (Compare)
@grantmcdermott I am still making sure the speed is OK, but I would be interested in any comments you have here. I needed weights for a couple of the things I want to add next, so I went back and had Codex undo all the dplyr stuff. Hopefully that lets us finally merge this.
Thanks @jamesbrandecon. Realistically, I won't be able to review until after the weekend (at least). Quick question, though: what do you mean by "alternating projections for demean strategy" in the commits above? Is this full-blown, FWL-style iterative AP? Or something else?
@grantmcdermott If I am thinking correctly (I independently came to this conclusion in the previous version above, which I mostly discarded here), demeaning with num_FEs > 2 is best done via alternating projections. I've admittedly never looked deeply into AP algorithms, so maybe I'm missing something, but I just loop over each FE dimension and subtract out the group mean, repeating until convergence.
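That loop is the standard within-transformation sweep. An in-memory sketch of the same idea, in Python/numpy purely for illustration (the function name and arguments here are made up; fe_ids is a list of integer-coded group arrays, one per FE dimension):

```python
import numpy as np

def demean_ap(y, fe_ids, tol=1e-10, max_iter=1000):
    """Alternating projections: sweep over each fixed-effect dimension,
    subtracting group means, until the vector stops changing."""
    v = np.asarray(y, dtype=float).copy()
    for _ in range(max_iter):
        prev = v.copy()
        for ids in fe_ids:  # one pass per FE dimension
            # Group means via bincount, then subtract each obs's group mean.
            counts = np.bincount(ids)
            sums = np.bincount(ids, weights=v)
            v -= (sums / counts)[ids]
        if np.max(np.abs(v - prev)) < tol:
            break
    return v
```

With a single FE this converges in one pass (ordinary within-demeaning); with two or more FEs the outer loop is what does the work, and balanced designs converge almost immediately.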
Hi @jamesbrandecon,
I'm finally getting around to reviewing this. There are a lot of file changes (although most of these are cosmetic), and the main dbreg.R file has a lot of changes too (which makes me a bit nervous).
However, reading through, I'm realising that much of the main diff is inflated. Specifically, the rebase appears to have inlined several shared helpers (parse_vcov_args, setup_db_connection, parse_regression_formula) that previously (and still?) live in utils.R. Can we revert to calling the helpers and just add the new weight-validation logic on top? It would make the PR much easier to review and keep dbreg() / dbbinsreg() sharing the same code paths.
vcov = vcov_parsed$vcov_type
cluster = vcov_parsed$cluster_var
# Parse vcov: can be string or formula (for clustering)
Why replace the previous convenience function? This whole chunk used to be much shorter:
# Parse vcov/cluster arguments using shared helper
vcov_parsed = parse_vcov_args(vcov, cluster)
vcov = vcov_parsed$vcov_type
cluster = vcov_parsed$cluster_var
term_labels = fml_parsed$term_labels
has_interactions = fml_parsed$has_interactions
fe = fml_parsed$fe
own_conn = FALSE
Same issue for this chunk as the previous one: why inline all of the code instead of using the previous convenience function(s)? The old code was:
# Set up database connection and data source using shared helper
db_setup = setup_db_connection(conn, table, data, path)
conn = db_setup$conn
own_conn = db_setup$own_conn
from_statement = db_setup$from_statement
# Parse formula using shared helper
fml_parsed = parse_regression_formula(fml)
fml = fml_parsed$fml
yvar = fml_parsed$yvar
xvars = fml_parsed$xvars
term_labels = fml_parsed$term_labels
has_interactions = fml_parsed$has_interactions
fe = fml_parsed$fe
First step of #11
I am running local tests of every combination of no FEs, 1 FE, weights, and no weights, and the output matches feols exactly.
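For anyone wondering what the weighted case actually changes: only the projection step, where each plain group mean is replaced by a weighted group mean; that's what makes the demeaned WLS line up with feols-style observation weights. An illustrative single-pass sketch in Python/numpy (the name wdemean and its signature are hypothetical, not the package's API):

```python
import numpy as np

def wdemean(y, w, ids):
    """One weighted-demeaning pass: subtract the weighted group mean
    of y within each level of a single fixed effect."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    wsum = np.bincount(ids, weights=w)        # sum of weights per group
    wysum = np.bincount(ids, weights=w * y)   # weighted sum of y per group
    return y - (wysum / wsum)[ids]            # subtract weighted group mean
```

In SQL the same step is `AVG(v)` swapped for `SUM(w * v) / SUM(w)` in the group-by; the alternating-projections outer loop is unchanged.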