Wow this is pretty cool, thanks! Going to tinker with this some more when I have time. I wonder if we can even go further and share the in-memory data object without copying. |
c0ae624 to 48c1264
|
Hm, I tried a bit, but apparently this is not so simple because of the sandbox in V8. Every hint I could find made the sandbox bark and terminate the session over a potential security violation. But I asked Gemini to check the existing functions, and this is what I see:

tmp <- tempfile(fileext = ".js")
curl::curl_download(
  url = "https://unpkg.com/underscore@1.13.7/underscore-min.js",
  destfile = tmp)
test <- function(backend = NULL, tmp = NULL) {
  ct <- V8::v8(backend = backend)
  ct$source(tmp)
  ct$call("_.filter", nycflights13::flights, V8::JS("function(x){return x.arr_delay > 720}"))
}
res <- microbenchmark::microbenchmark(
  test(backend = "arrow", tmp = tmp),
  test(backend = "jsonlite", tmp = tmp),
  times = 25, unit = "ms"
); res
#> Warning in microbenchmark::microbenchmark(test(backend = "arrow", tmp = tmp), :
#> less accurate nanosecond times to avoid potential integer overflows
#> Unit: milliseconds
#> expr min lq mean median
#> test(backend = "arrow", tmp = tmp) 287.8694 303.9558 354.6052 316.310
#> test(backend = "jsonlite", tmp = tmp) 2621.3725 2676.6829 2761.8196 2703.873
#> uq max neval
#> 322.2923 1267.931 25
#> 2750.2416 4039.942 25

But this is a better comparison: the call function above is tweaked so that it no longer constructs a function with the entire data included in the function body. The following should be a fair one-to-one comparison. Both backends create identical objects in V8, so they are directly comparable. To get there, the arrow backend creates a table from the Arrow table read from the IPC stream, and when importing back it creates an Arrow table in order to return an IPC stream.

test <- function(backend = NULL) {
  ct <- V8::v8(backend = backend)
  ct$assign("flights", nycflights13::flights)
  ct$get("flights")
}
res <- microbenchmark::microbenchmark(
  test(backend = "arrow"),
  test(backend = "jsonlite"),
  times = 5, unit = "ms"
); res
#> Warning in microbenchmark::microbenchmark(test(backend = "arrow"), test(backend
#> = "jsonlite"), : less accurate nanosecond times to avoid potential integer
#> overflows
#> Unit: milliseconds
#> expr min lq mean median uq
#> test(backend = "arrow") 6986.343 7040.548 7451.106 7641.929 7677.567
#> test(backend = "jsonlite") 7421.892 7603.722 7905.523 7674.219 7795.122
#> max neval
#> 7909.143 5
#> 9032.658 5 |
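For reference, the R side of an Arrow IPC round trip can be sketched in isolation. This is only an illustration under assumptions: it uses the `arrow` package directly (`write_to_raw()` and `read_ipc_stream()`), a small made-up data frame instead of `nycflights13::flights`, and it does not touch V8 or the backend code in this PR.

```r
library(arrow)

# Small stand-in data frame (hypothetical; the benchmarks above use
# nycflights13::flights).
df <- data.frame(
  id = 1:3,
  arr_delay = c(12.5, 730, 5),
  carrier = c("AA", "UA", "DL")
)

# Serialize the data frame to an Arrow IPC stream held in a raw vector.
ipc_bytes <- arrow::write_to_raw(df, format = "stream")

# Read the raw IPC stream back into a data frame.
roundtrip <- arrow::read_ipc_stream(ipc_bytes, as_data_frame = TRUE)

# Compare values with the original.
all.equal(as.data.frame(roundtrip), df)
```

The raw vector is what would have to cross the R/JavaScript boundary; how the bytes are decoded on the JavaScript side is outside this sketch.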
|
Getting the data is the slow part (probably because the conversion from the JavaScript table to an Arrow table happens in JS). It might be faster with a wasm build of Arrow.

test <- function(backend = NULL) {
  ct <- V8::v8(backend = backend)
  ct$assign("flights", nycflights13::flights)
  # ct$get("flights")
}
res <- microbenchmark::microbenchmark(
  test(backend = "arrow"),
  test(backend = "jsonlite"),
  times = 5, unit = "ms"
); res
#> Warning in microbenchmark::microbenchmark(test(backend = "arrow"), test(backend
#> = "jsonlite"), : less accurate nanosecond times to avoid potential integer
#> overflows
#> Unit: milliseconds
#> expr min lq mean median uq
#> test(backend = "arrow") 180.601 186.3381 384.6102 195.7693 219.9445
#> test(backend = "jsonlite") 2322.650 2326.7051 2604.6440 2390.2020 2410.7080
#> max neval
#> 1140.398 5
#> 3572.955 5 |
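Instead of commenting out `ct$get()`, the two directions can also be timed separately within one context. A minimal sketch, assuming only the released V8 package with its default JSON exchange (no `backend` argument, which exists only in this PR) and using `mtcars` as a small stand-in for the flights data:

```r
library(V8)

# One context with the default (JSON-based) data exchange.
ct <- V8::v8()

# Time R -> JavaScript and JavaScript -> R separately.
t_assign <- system.time(ct$assign("cars", mtcars))
t_get <- system.time(out <- ct$get("cars"))

# Elapsed seconds for each direction.
c(assign = t_assign[["elapsed"]], get = t_get[["elapsed"]])
```

With a data set this small the timings are mostly noise; the point is only to show where the split between the two phases would go.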
|
Avoiding JSON speeds things up:

test <- function(backend = NULL) {
  ct <- V8::v8(backend = backend)
  ct$assign("flights", nycflights13::flights)
  ct$get("flights")
}
res <- microbenchmark::microbenchmark(
  test(backend = "arrow"),
  test(backend = "jsonlite"),
  times = 5, unit = "ms"
); res
#> Warning in microbenchmark::microbenchmark(test(backend = "arrow"), test(backend
#> = "jsonlite"), : less accurate nanosecond times to avoid potential integer
#> overflows
#> Unit: milliseconds
#> expr min lq mean median uq
#> test(backend = "arrow") 4645.056 4645.183 4741.127 4655.773 4667.055
#> test(backend = "jsonlite") 7717.207 8181.782 8765.938 8965.483 9028.921
#> max neval
#> 5092.569 5
#> 9936.294 5 |
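To get a feel for the two encodings outside of V8, the same data frame can be serialized both ways on the R side and the payload sizes compared. This is a rough sketch with a made-up data frame; it says nothing about the decoding cost in JavaScript, and which payload comes out smaller depends on the data.

```r
library(arrow)
library(jsonlite)

# Hypothetical stand-in data; not the flights table from the benchmarks.
df <- data.frame(
  id = 1:1000,
  arr_delay = (0:999) / 3,
  carrier = rep(c("AA", "UA"), 500)
)

# Row-wise JSON text. Note that toJSON() rounds doubles to 4 digits by default.
json_txt <- jsonlite::toJSON(df)

# Columnar binary Arrow IPC stream as a raw vector.
ipc_bytes <- arrow::write_to_raw(df, format = "stream")

# Payload sizes in bytes.
c(json = nchar(json_txt, type = "bytes"), arrow_ipc = length(ipc_bytes))
```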
Hi @jeroen,
this is a draft for an arrow backend. I just tried to include what I experimented with yesterday in #6. I made it pass the tests, but not everything can be done with this arrow backend. Since it can be called without any PR, this is just elaborate toying around. I have no real-world use case for this.