Performance on 3+ fixed effects slow #50
With strategy = "compress", dbreg is very slow on some data compared to feols. I'm not sure how much of this is an "issue" with this package versus a testament to how quick fixest is. The main problem seems to be the third fixed effect: with up to two, we can use strategy = within/demean, and it's either a bit quicker or much quicker.
I was also getting "matrix too big for memory"-type errors from dbreg on some regressions that took ~20 minutes with feols. Perhaps this is also related to strategy = "compress"?
strategy = "mundlak" is fast on everything I've tested, but it would be nice to have a drop-in replacement for feols.
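For context on why a Mundlak-style fit is not automatically a drop-in replacement, here is a minimal base-R sketch. It assumes that "mundlak" refers to the Mundlak device (adding group means of the regressors as ordinary covariates instead of absorbing fixed effects); that is an assumption about the strategy name, not something checked against the dbreg source. The variable names (`g`, `x_bar`, `b_mundlak`, `b_within`) are illustrative only.

```r
# Sketch only: one-way Mundlak device vs. the within (demeaned) estimator.
# All names here are illustrative and are not part of dbreg or fixest.
set.seed(1)
n <- 2000
g <- rep(1:20, each = 100)        # a single grouping variable
x <- rnorm(n) + g / 10            # regressor correlated with the group
y <- 0.5 * x + g / 5 + rnorm(n)

# Mundlak device: add the group mean of x as an ordinary covariate
x_bar <- ave(x, g)
b_mundlak <- coef(lm(y ~ x + x_bar))[["x"]]

# Within estimator: demean y and x by group, then regress without an intercept
x_dm <- x - ave(x, g)
y_dm <- y - ave(y, g)
b_within <- coef(lm(y_dm ~ 0 + x_dm))[["x_dm"]]

# The two slope estimates agree up to floating-point error
same_slope <- isTRUE(all.equal(b_mundlak, b_within))
```

With a single grouping variable the two slopes coincide. With two or more fixed effects that equivalence is not guaranteed in general, so coefficients from a Mundlak-style fit may differ from what feols reports.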
(This was run on #49, since I also got matrix inversion errors when running on master. Might be related?)
```r
# From PR 49: on main it errors out with matrix inversion.
# pak::pak("grantmcdermott/dbreg#49")
library(duckdb)
library(fixest)
library(dplyr)
library(dbplyr)
library(dbreg)

# From 100k to 500k rows, increasing by 100k
n_rows <- (1:5) * 1e5

# Time one dbreg fit and one feols fit on the same simulated data
benchmark_single_regression <- function(n_rows) {
  set.seed(44)
  df <- data.frame(
    fe1 = rep(1:(n_rows / 100), each = 100),
    fe2 = rep(1:(n_rows / 200), each = 200),
    fe3 = rep(1:(n_rows / 500), each = 500),
    x = rnorm(n_rows),
    y = rbinom(n_rows, 1, 0.2)
  )

  con <- dbConnect(duckdb())
  # Close the DuckDB connection when the function exits
  on.exit(dbDisconnect(con, shutdown = TRUE), add = TRUE)
  df_db <- copy_to(con, df, overwrite = TRUE)

  # Time dbreg on the DuckDB table
  t1 <- Sys.time()
  dbreg(
    table = df_db,
    strategy = "auto",
    verbose = TRUE,
    y ~ x | fe1 + fe2 + fe3
  )
  t2 <- Sys.time()
  time_dbreg <- as.numeric(t2 - t1, units = "secs")

  # Time feols on the in-memory data frame
  t1 <- Sys.time()
  feols(
    data = df,
    y ~ x | fe1 + fe2 + fe3
  )
  t2 <- Sys.time()
  time_feols <- as.numeric(t2 - t1, units = "secs")

  return(data.frame(
    n_rows = n_rows,
    type = c("feols", "dbreg"),
    time = c(time_feols, time_dbreg)
  ))
}

results <- lapply(n_rows, benchmark_single_regression) |> dplyr::bind_rows()
```
| Number of rows | feols (seconds) | dbreg (seconds) |
|---|---|---|
| 1e+05 | 0.018 | 1.451 |
| 2e+05 | 0.035 | 30.890 |
| 3e+05 | 0.044 | 105.122 |
| 4e+05 | 0.057 | 258.883 |
| 5e+05 | 0.077 | 550.779 |
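
Two properties of the simulated data may matter for these timings and for the matrix errors. The base-R sketch below rebuilds the smallest design (1e5 rows) and checks them. It does not call dbreg, and the helper `nested_in()` is hypothetical. Whether dbreg's "compress" path actually builds explicit dummy columns for the fixed effects is an assumption that has not been verified here.

```r
# Rebuild the smallest benchmark design (1e5 rows) using base R only.
n_rows <- 1e5
set.seed(44)
df <- data.frame(
  fe1 = rep(1:(n_rows / 100), each = 100),
  fe2 = rep(1:(n_rows / 200), each = 200),
  fe3 = rep(1:(n_rows / 500), each = 500),
  x   = rnorm(n_rows)
)

# Hypothetical helper: TRUE if every level of `inner` maps to one level of `outer`
nested_in <- function(inner, outer) {
  all(tapply(outer, inner, function(v) length(unique(v))) == 1)
}

# (1) Is fe1 nested inside fe2 and inside fe3?
fe1_in_fe2 <- nested_in(df$fe1, df$fe2)
fe1_in_fe3 <- nested_in(df$fe1, df$fe3)

# (2) How many distinct values of x are there? Rows can only be merged by a
#     group-by on (x, fe1, fe2, fe3) if x repeats.
n_distinct_x <- length(unique(df$x))

# (3) How many levels does each fixed effect have?
n_fe_levels <- sapply(df[c("fe1", "fe2", "fe3")], function(v) length(unique(v)))
```

In this design fe1 is nested in both fe2 and fe3, so dummy columns for fe2 and fe3 would be perfectly collinear with those for fe1. If the fixed effects are estimated as explicit dummies, that rank deficiency could be related to the matrix inversion errors. Because x is continuous, grouping on (x, fe1, fe2, fe3) leaves essentially one group per row, so there is little to compress. The total number of fixed-effect levels also grows with the row count: 1,700 at 1e5 rows and 8,500 at 5e5 rows.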