Conversation
b6c0dec to
bb179a6
Compare
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
bb179a6 to
3c56d41
Compare
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
89355de to
0734ed6
Compare
quickwit/quickwit-metastore/migrations/postgresql/26_add-split-soft-deleted-doc-ids.down.sql
Outdated
Show resolved
Hide resolved
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
c86fbfb to
817c329
Compare
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
rdettai-sk
left a comment
There was a problem hiding this comment.
Not 100% through with the review (I only glanced over the search part). I think an integration test would be really nice.
eb8afbf to
c2300ad
Compare
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
c2300ad to
a2b75d8
Compare
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
rdettai-sk
left a comment
There was a problem hiding this comment.
My biggest remaining concern is regarding the list splits endpoint. It is actually very slow (couple of seconds) on big indexes.
| metastore: &MetastoreServiceClient, | ||
| progress: &Progress, | ||
| ) { | ||
| let list_splits_request = match ListSplitsRequest::try_from_index_uid(index_uid.clone()) { |
There was a problem hiding this comment.
This can gather a lot of splits from the metastore and needs to be called at every merge upload. We should try to cap the overhead. We can easily:
- filter on the current node
- filter on published splits only
I think we can also filter on immature splits. Not 100% sure how that works, but it would be the most efficient.
There was a problem hiding this comment.
of course if we bring it to the metastore we can more easily get only the splits we need 😄
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
a24023f to
d945645
Compare
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
There was a problem hiding this comment.
Pull request overview
This PR introduces end-to-end soft deletion of documents in Quickwit: recording deleted Tantivy doc IDs in the metastore, excluding them from search/count paths, and ensuring merges physically drop these documents.
Changes:
- Adds a new metastore RPC (
SoftDeleteDocuments) and storessoft_deleted_doc_idsin split metadata. - Updates search execution (including count-only paths) to exclude soft-deleted doc IDs via a Tantivy query wrapper.
- Adds a new REST endpoint
POST /{index_id}/soft-deleteand updates merge logic to hard-delete soft-deleted docs during split merges.
Reviewed changes
Copilot reviewed 49 out of 49 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| quickwit/quickwit-serve/src/soft_delete_api/mod.rs | Registers soft-delete API module/export. |
| quickwit/quickwit-serve/src/soft_delete_api/handler.rs | Implements REST endpoint to soft-delete by query (search → metastore write) + tests. |
| quickwit/quickwit-serve/src/rest.rs | Wires soft-delete routes into API v1 router. |
| quickwit/quickwit-serve/src/lib.rs | Adds soft_delete_api module. |
| quickwit/quickwit-search/src/tests.rs | Adds integration-style tests ensuring soft-deletes are excluded from search/count paths. |
| quickwit/quickwit-search/src/soft_delete_query.rs | Adds Tantivy Query wrapper excluding a set of doc IDs. |
| quickwit/quickwit-search/src/root.rs | Adjusts metadata-count computation to subtract soft-deleted docs. |
| quickwit/quickwit-search/src/retry/search.rs | Updates tests/fixtures for new soft_deleted_doc_ids field in split offsets. |
| quickwit/quickwit-search/src/retry/mod.rs | Updates tests/fixtures for new soft_deleted_doc_ids field in split offsets. |
| quickwit/quickwit-search/src/list_fields_cache.rs | Updates tests/fixtures for new soft_deleted_doc_ids field in split offsets. |
| quickwit/quickwit-search/src/lib.rs | Plumbs split soft_deleted_doc_ids into SplitIdAndFooterOffsets. |
| quickwit/quickwit-search/src/leaf.rs | Wraps queries with SoftDeleteQuery and adjusts count paths to subtract soft-deletes. |
| quickwit/quickwit-search/src/leaf_cache.rs | Adds soft-delete length to cache key to invalidate on new deletions. |
| quickwit/quickwit-search/src/collector.rs | Minor comment cleanup. |
| quickwit/quickwit-search/src/cluster_client.rs | Updates tests/fixtures for new soft_deleted_doc_ids field in split offsets. |
| quickwit/quickwit-proto/src/getters.rs | Generates getters for SoftDeleteDocumentsRequest. |
| quickwit/quickwit-proto/src/codegen/quickwit/quickwit.search.rs | Adds soft_deleted_doc_ids to SplitIdAndFooterOffsets proto struct. |
| quickwit/quickwit-proto/src/codegen/quickwit/quickwit.metastore.rs | Adds new metastore messages + RPC plumbing for soft-delete. |
| quickwit/quickwit-proto/protos/quickwit/search.proto | Adds soft_deleted_doc_ids to split offsets message. |
| quickwit/quickwit-proto/protos/quickwit/metastore.proto | Adds SoftDeleteDocuments RPC and request/response messages. |
| quickwit/quickwit-metastore/test-data/split-metadata/v0.9.json | Updates regression JSON to include soft_deleted_doc_ids. |
| quickwit/quickwit-metastore/test-data/split-metadata/v0.9.expected.json | Updates expected regression JSON to include soft_deleted_doc_ids. |
| quickwit/quickwit-metastore/test-data/split-metadata/v0.8.expected.json | Updates expected regression JSON to include soft_deleted_doc_ids. |
| quickwit/quickwit-metastore/test-data/split-metadata/v0.7.expected.json | Updates expected regression JSON to include soft_deleted_doc_ids. |
| quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.json | Updates file-backed index regression JSON to include soft-delete fields. |
| quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.expected.json | Updates expected file-backed index regression JSON. |
| quickwit/quickwit-metastore/test-data/file-backed-index/v0.8.expected.json | Updates expected file-backed index regression JSON. |
| quickwit/quickwit-metastore/test-data/file-backed-index/v0.7.expected.json | Updates expected file-backed index regression JSON. |
| quickwit/quickwit-metastore/src/tests/split.rs | Adds metastore tests for soft-delete behavior, idempotency, limits, and atomicity. |
| quickwit/quickwit-metastore/src/tests/mod.rs | Adds soft-delete tests to metastore test suite macro. |
| quickwit/quickwit-metastore/src/split_metadata.rs | Adds soft_deleted_doc_ids to split metadata model + debug/test updates. |
| quickwit/quickwit-metastore/src/split_metadata_version.rs | Adds backward-compatible serialization for soft_deleted_doc_ids. |
| quickwit/quickwit-metastore/src/metastore/postgres/utils.rs | Adds split_ids filtering to Postgres list-splits query builder. |
| quickwit/quickwit-metastore/src/metastore/postgres/model.rs | Adds Postgres model enum variant for soft-deleted doc IDs column. |
| quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs | Implements soft_delete_documents for Postgres metastore. |
| quickwit/quickwit-metastore/src/metastore/mod.rs | Introduces per-split soft-delete limit + ListSplitsQuery::split_ids filter. |
| quickwit/quickwit-metastore/src/metastore/file_backed/mod.rs | Implements soft_delete_documents in file-backed metastore service. |
| quickwit/quickwit-metastore/src/metastore/file_backed/file_backed_index/mod.rs | Implements atomic soft-delete mutation + split-id filtering in file-backed index. |
| quickwit/quickwit-metastore/src/metastore/control_plane_metastore.rs | Proxies soft_delete_documents through control-plane metastore. |
| quickwit/quickwit-indexing/src/soft_delete_query.rs | Adds Tantivy query used to hard-delete soft-deleted doc IDs during merges. |
| quickwit/quickwit-indexing/src/models/split_attrs.rs | Tracks replaced splits with soft-delete snapshots; initializes split metadata with empty soft-deletes. |
| quickwit/quickwit-indexing/src/models/publisher_message.rs | Extends publish message to include replaced-splits snapshot data. |
| quickwit/quickwit-indexing/src/models/mod.rs | Re-exports ReplacedSplit model. |
| quickwit/quickwit-indexing/src/models/indexed_split.rs | Updates builder to use replaced_splits. |
| quickwit/quickwit-indexing/src/lib.rs | Registers new soft_delete_query module. |
| quickwit/quickwit-indexing/src/actors/uploader.rs | Propagates replaced-splits snapshot into SplitsUpdate (removes old HashSet-based dedup path). |
| quickwit/quickwit-indexing/src/actors/publisher.rs | Adds best-effort detection/logging for soft-delete race during merge publish. |
| quickwit/quickwit-indexing/src/actors/packager.rs | Updates tests/fixtures for replaced_splits. |
| quickwit/quickwit-indexing/src/actors/merge_executor.rs | Applies soft-deleted-doc hard deletes during merges and adjusts split attrs accordingly; adds tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs
Outdated
Show resolved
Hide resolved
quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs
Outdated
Show resolved
Hide resolved
| .fetch_all(tx.as_mut()) | ||
| .await | ||
| .map_err(|sqlx_error| convert_sqlx_err(&index_uid.index_id, sqlx_error))?; | ||
|
|
There was a problem hiding this comment.
The Postgres implementation silently ignores split IDs that are missing or not in Published state (they simply won't be returned by FETCH_SPLITS_METADATA_QUERY), and still returns Ok(...). This diverges from the file-backed metastore which returns NotFound/FailedPrecondition; consider explicitly checking that all requested split IDs were fetched and returning an error otherwise (or documenting the intended behavior).
| // Ensure that all requested split IDs were found and are in `Published` state. | |
| // The query above only returns existing, published splits; any missing IDs would | |
| // otherwise be silently ignored, diverging from the file-backed metastore behavior. | |
| let fetched_split_ids: BTreeSet<&str> = | |
| rows.iter().map(|(split_id, _)| split_id.as_str()).collect(); | |
| for requested_split_id in &requested_split_ids { | |
| if !fetched_split_ids.contains(requested_split_id) { | |
| return Err(MetastoreError::NotFound { | |
| entity: EntityKind::Split { | |
| split_id: (*requested_split_id).to_string(), | |
| }, | |
| }); | |
| } | |
| } |
There was a problem hiding this comment.
I think this divergence is acceptable
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
3f6b1e1 to
e6ce383
Compare
Description
This PR allow to soft delete documents from an index
Tasks