experiment: stream CSV import via DuckDB Appender to show progress #52

@kueda

Description

Opening a large archive (100GB+) can take 10+ minutes with no meaningful feedback during the DuckDB import step, which is likely the dominant bottleneck. The current create_from_core_files uses a single CREATE TABLE AS SELECT * FROM read_csv(...) — one atomic SQL call with no progress API.

Proposed approach

Replace the read_csv SQL with Rust-side CSV streaming into DuckDB's Appender API in batches. Thread a progress: f64 callback (0.0–1.0) up through create_from_core_files → Archive::open → open_archive, and surface it as real percentage progress on the loading screen.
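A minimal sketch of the callback threading, with hypothetical signatures (the real functions take more arguments and return Results):

```rust
// Hypothetical sketch: a progress callback (0.0..=1.0) threaded through the
// open chain. Names mirror the issue; bodies are placeholders.
fn create_from_core_files<F: Fn(f64)>(_path: &str, on_progress: F) {
    on_progress(0.0);
    // ... stream rows via the Appender, reporting fractions as batches flush ...
    on_progress(1.0);
}

fn open_archive(path: &str, report: impl Fn(f64)) {
    // Each layer just forwards the closure downward.
    create_from_core_files(path, |p| report(p));
}
```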

Key changes

  • src-tauri/Cargo.toml: add csv = "1.3.1"
  • ArchiveOpenProgress::CreatingDatabase: add progress: f64 field; change internal progress channel from String to the enum directly
  • Database::create_from_core_files: add on_progress: F callback parameter; count rows first (cheap line scan), build CREATE TABLE DDL from sniffed columns + TYPE_OVERRIDES, stream rows via Appender in 50k-row batches, calling on_progress after each flush
  • dwca/archive.rs: thread progress closure into create_from_core_files
  • +page.svelte: read progress.progress on creatingDatabase events; update archiveLoadingProgress derivation to use actual value in the 40–95% range
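The batching and progress-reporting logic from the list above can be sketched in isolation. Here `append` and `flush` are closures standing in for duckdb's `Appender::append_row` / `Appender::flush`, so the sketch carries no DuckDB dependency; the 50k batch size and the `total_rows == 0` guard are from the issue:

```rust
const BATCH_SIZE: usize = 50_000;

// Stream rows into a sink in fixed-size batches, reporting a 0.0..=1.0
// fraction after each flush. `total_rows` comes from the cheap line scan.
fn stream_rows<R>(
    rows: impl Iterator<Item = R>,
    total_rows: usize,
    mut append: impl FnMut(R),
    mut flush: impl FnMut(),
    on_progress: impl Fn(f64),
) {
    // Guard: an archive with no rows reports completion immediately.
    if total_rows == 0 {
        on_progress(1.0);
        return;
    }
    let mut done = 0usize;
    for row in rows {
        append(row);
        done += 1;
        if done % BATCH_SIZE == 0 {
            flush();
            on_progress(done as f64 / total_rows as f64);
        }
    }
    // Flush the final partial batch, if any.
    if done % BATCH_SIZE != 0 {
        flush();
        on_progress(done as f64 / total_rows as f64);
    }
}
```

Keeping this loop generic over the sink also makes the batching and progress arithmetic testable without a database.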

Type conversion per row

  • VARCHAR: pass as &str
  • DOUBLE (decimalLatitude, decimalLongitude): parse::<f64>().ok() → Option<f64>, empty → None
  • BOOLEAN (captive, hasCoordinate, etc.): match "true"/"1" → Some(true), "false"/"0" → Some(false), "" → None
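The conversions above as standalone helpers (function names are illustrative, not from the codebase):

```rust
// DOUBLE columns (decimalLatitude, decimalLongitude): empty or
// unparsable fields map to NULL (None).
fn parse_double(field: &str) -> Option<f64> {
    field.parse::<f64>().ok()
}

// BOOLEAN columns (captive, hasCoordinate, etc.): accept word and numeric
// forms case-insensitively; anything else maps to NULL (and should warn).
fn parse_boolean(field: &str) -> Option<bool> {
    match field.to_ascii_lowercase().as_str() {
        "true" | "1" => Some(true),
        "false" | "0" => Some(false),
        _ => None,
    }
}
```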

Extension tables are left on read_csv for now (usually much smaller than occurrences).

Risks

  • Risk: Appender column order diverges from the CREATE TABLE DDL. Mitigation: build both from a single shared Vec<(col, type)>.
  • Risk: BOOLEAN format varies across archives. Mitigation: lowercase and check both forms; warn on unrecognised values.
  • Risk: archives with no rows. Mitigation: guard on total_rows == 0 and call on_progress(1.0) immediately.
  • Risk: regression in NULL / empty-column-drop behaviour. Mitigation: existing tests cover this; add an explicit test for NULL handling in the Appender path.

Why this is an experiment

Estimated 50k–100k tokens to implement, with the wide range driven by how finicky the Appender type handling turns out to be. The improvement is real but only matters for very large archives. Worth doing if those users are a priority.
