Performance on 3+ fixed effects slow #50

@wklimowicz

Performance with strategy = "compress" is very slow on some data compared to feols. I'm not sure how much this is an "issue" with this package versus a testament to how quick fixest is. The bottleneck seems to be the third fixed effect: with up to two we can use strategy = "within"/"demean", and it's either a bit quicker or much quicker.

I was also getting "matrix too big for memory" type errors from dbreg on some regressions that took ~20m with feols. Perhaps that is also related to strategy = "compress"?
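To put a rough number on the memory concern, here is a back-of-the-envelope sketch in base R. It assumes (this is not confirmed anywhere in this issue) that the fixed effects end up as one dummy column per level in a dense matrix of doubles, and it uses the level counts from the benchmark data below. `dense_design_gb` is a hypothetical helper written for illustration, not a dbreg function.

```r
# Hypothetical helper: size in GB of a dense dummy-variable design matrix.
# Assumptions: 8 bytes per cell, one column per FE level plus one for x,
# and no row reduction from compression (x is continuous, so rows are unique).
dense_design_gb <- function(n_rows) {
  n_levels <- n_rows / 100 + n_rows / 200 + n_rows / 500  # fe1 + fe2 + fe3
  n_cols <- n_levels + 1                                  # plus the regressor x
  n_rows * n_cols * 8 / 1e9
}

n_rows <- (1:5) * 1e5
sizes <- data.frame(n_rows = n_rows, dense_gb = dense_design_gb(n_rows))
print(sizes)
```

Under these assumptions the matrix grows quadratically with the number of rows, from about 1.4 GB at 100k rows to about 34 GB at 500k rows. A sparse representation would be far smaller, so this is only an upper bound on what might be going on.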

strategy = "mundlak" runs fast on everything I've tested, but it would be nice to have a drop-in replacement for feols.
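For context on why a Mundlak-style approach is cheap, here is a minimal base R sketch of the general idea with a single grouping variable: adding the group mean of x as a regressor gives the same coefficient on x as including the group dummies. This illustrates the textbook device only. How dbreg implements strategy = "mundlak" internally is not something I have checked, and all variable names here are made up for the example.

```r
set.seed(1)
n <- 2000
g <- sample(1:50, n, replace = TRUE)     # one (unbalanced) grouping variable
x <- rnorm(n) + g / 10                   # regressor correlated with the group
y <- 0.5 * x + rnorm(50)[g] + rnorm(n)   # group effect plus noise

x_bar <- ave(x, g)                       # group mean of x (the Mundlak term)

# Coefficient on x with the group mean added vs. with explicit group dummies
b_mundlak <- coef(lm(y ~ x + x_bar))[["x"]]
b_within  <- coef(lm(y ~ x + factor(g)))[["x"]]

all.equal(b_mundlak, b_within)
```

With one grouping variable the two estimates agree up to numerical tolerance, and the Mundlak version only needs two regressors instead of 50 dummies. With several fixed effects the equivalence is not guaranteed in general, which is part of why it is not a drop-in replacement.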

(This was run on #49, since I also got matrix inversion errors when running on master; might be related?)

# From PR 49: on main it errors out with matrix inversion.
# pak::pak("grantmcdermott/dbreg#49")

library(duckdb)
library(fixest)
library(dplyr)
library(dbplyr)
library(dbreg)

# From 100k to 500k rows, increasing by 100k
n_rows <- (1:5)*1e5

benchmark_single_regression <- function(n_rows) {
  set.seed(44)

  df <- data.frame(
    fe1 = rep(1:(n_rows / 100), each = 100),
    fe2 = rep(1:(n_rows / 200), each = 200),
    fe3 = rep(1:(n_rows / 500), each = 500),
    x = rnorm(n_rows),
    y = rbinom(n_rows, 1, 0.2)
  )

  con <- dbConnect(duckdb())
  # Close the DuckDB connection when the function exits
  on.exit(dbDisconnect(con, shutdown = TRUE), add = TRUE)
  df_db <- copy_to(con, df, overwrite = TRUE)

  # Time dbreg on the DuckDB table
  t1 <- Sys.time()
  dbreg(
    table = df_db,
    strategy = "auto",
    verbose = TRUE,
    y ~ x | fe1 + fe2 + fe3
  )
  t2 <- Sys.time()
  time_dbreg <- as.numeric(t2 - t1, units = "secs")

  # Time feols on the in-memory data frame
  t1 <- Sys.time()
  feols(
    data = df,
    y ~ x | fe1 + fe2 + fe3
  )
  t2 <- Sys.time()
  time_feols <- as.numeric(t2 - t1, units = "secs")

  return(data.frame(
    n_rows = n_rows,
    type = c("feols", "dbreg"),
    time = c(time_feols, time_dbreg)
  ))
}

results <- lapply(n_rows, benchmark_single_regression) |> dplyr::bind_rows()

| Number of rows | feols (seconds) | dbreg (seconds) |
|---|---|---|
| 1e+05 | 0.018 | 1.451 |
| 2e+05 | 0.035 | 30.890 |
| 3e+05 | 0.044 | 105.122 |
| 4e+05 | 0.057 | 258.883 |
| 5e+05 | 0.077 | 550.779 |
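One property of the benchmark data that may matter: because each fixed effect is built with rep(..., each = k) and 100 divides both 200 and 500, fe2 and fe3 are both functions of fe1. A quick base R check, using a hypothetical helper `is_function_of_fe1` written for this example:

```r
n_rows <- 1e5
fe1 <- rep(1:(n_rows / 100), each = 100)
fe2 <- rep(1:(n_rows / 200), each = 200)
fe3 <- rep(1:(n_rows / 500), each = 500)

# TRUE if every level of fe1 maps to exactly one level of `fe`,
# i.e. `fe` is fully determined by fe1
is_function_of_fe1 <- function(fe) {
  all(tapply(fe, fe1, function(v) length(unique(v))) == 1)
}

nested <- c(fe2 = is_function_of_fe1(fe2), fe3 = is_function_of_fe1(fe3))
print(nested)
```

Both checks come back TRUE, so the dummy columns for fe2 and fe3 are perfectly collinear with those for fe1, and y ~ x | fe1 + fe2 + fe3 spans the same space as y ~ x | fe1. That might be connected to the matrix inversion errors on main, though I have not verified this. A benchmark with non-nested fixed effects could behave differently.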
