Skip to content

feat: Soft Delete documents#16

Open
Darkheir wants to merge 12 commits intosekoiafrom
feat/soft_delete_documents
Open

feat: Soft Delete documents#16
Darkheir wants to merge 12 commits intosekoiafrom
feat/soft_delete_documents

Conversation

@Darkheir
Copy link
Copy Markdown
Collaborator

@Darkheir Darkheir commented Mar 16, 2026

Description

This PR allow to soft delete documents from an index

Tasks

  • Update quickwit-proto with new messages
  • Update metastore to store soft deleted documents
  • Update quickwit search to ignore soft deleted documents
  • Add new REST endpoint to soft delete documents
  • Add Elasticsearch compatible endpoint to delete documents
  • Ignore deleted documents on merge of splits
  • Enforce limit of soft deleted events for a given split

@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from b6c0dec to bb179a6 Compare March 17, 2026 08:20
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from bb179a6 to 3c56d41 Compare March 17, 2026 08:39
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from 89355de to 0734ed6 Compare March 18, 2026 09:49
@Darkheir Darkheir requested a review from rdettai-sk March 18, 2026 11:25
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from c86fbfb to 817c329 Compare March 19, 2026 14:36
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir requested a review from rdettai-sk March 19, 2026 17:00
Copy link
Copy Markdown
Collaborator

@rdettai-sk rdettai-sk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% through with the review (I only glanced over the search part). I think an integration test would be really nice.

@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch 6 times, most recently from eb8afbf to c2300ad Compare March 26, 2026 13:19
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from c2300ad to a2b75d8 Compare March 26, 2026 13:32
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir requested a review from rdettai-sk March 26, 2026 14:34
Copy link
Copy Markdown
Collaborator

@rdettai-sk rdettai-sk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest remaining concern is regarding the list splits endpoint. It is actually very slow (couple of seconds) on big indexes.

metastore: &MetastoreServiceClient,
progress: &Progress,
) {
let list_splits_request = match ListSplitsRequest::try_from_index_uid(index_uid.clone()) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can gather a lot of splits from the metastore and needs to be called at every merge upload. We should try to cap the overhead. We can easily:

  • filter on the current node
  • filter on published splits only

I think we can also filter on immature splits. Not 100% sure how that works, but it would be the most efficient.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course if we bring it to the metastore we can more easily get only the splits we need 😄

Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from a24023f to d945645 Compare March 27, 2026 19:08
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@rdettai-sk rdettai-sk marked this pull request as ready for review March 31, 2026 13:53
Copilot AI review requested due to automatic review settings March 31, 2026 13:53
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces end-to-end soft deletion of documents in Quickwit: recording deleted Tantivy doc IDs in the metastore, excluding them from search/count paths, and ensuring merges physically drop these documents.

Changes:

  • Adds a new metastore RPC (SoftDeleteDocuments) and stores soft_deleted_doc_ids in split metadata.
  • Updates search execution (including count-only paths) to exclude soft-deleted doc IDs via a Tantivy query wrapper.
  • Adds a new REST endpoint POST /{index_id}/soft-delete and updates merge logic to hard-delete soft-deleted docs during split merges.

Reviewed changes

Copilot reviewed 49 out of 49 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
quickwit/quickwit-serve/src/soft_delete_api/mod.rs Registers soft-delete API module/export.
quickwit/quickwit-serve/src/soft_delete_api/handler.rs Implements REST endpoint to soft-delete by query (search → metastore write) + tests.
quickwit/quickwit-serve/src/rest.rs Wires soft-delete routes into API v1 router.
quickwit/quickwit-serve/src/lib.rs Adds soft_delete_api module.
quickwit/quickwit-search/src/tests.rs Adds integration-style tests ensuring soft-deletes are excluded from search/count paths.
quickwit/quickwit-search/src/soft_delete_query.rs Adds Tantivy Query wrapper excluding a set of doc IDs.
quickwit/quickwit-search/src/root.rs Adjusts metadata-count computation to subtract soft-deleted docs.
quickwit/quickwit-search/src/retry/search.rs Updates tests/fixtures for new soft_deleted_doc_ids field in split offsets.
quickwit/quickwit-search/src/retry/mod.rs Updates tests/fixtures for new soft_deleted_doc_ids field in split offsets.
quickwit/quickwit-search/src/list_fields_cache.rs Updates tests/fixtures for new soft_deleted_doc_ids field in split offsets.
quickwit/quickwit-search/src/lib.rs Plumbs split soft_deleted_doc_ids into SplitIdAndFooterOffsets.
quickwit/quickwit-search/src/leaf.rs Wraps queries with SoftDeleteQuery and adjusts count paths to subtract soft-deletes.
quickwit/quickwit-search/src/leaf_cache.rs Adds soft-delete length to cache key to invalidate on new deletions.
quickwit/quickwit-search/src/collector.rs Minor comment cleanup.
quickwit/quickwit-search/src/cluster_client.rs Updates tests/fixtures for new soft_deleted_doc_ids field in split offsets.
quickwit/quickwit-proto/src/getters.rs Generates getters for SoftDeleteDocumentsRequest.
quickwit/quickwit-proto/src/codegen/quickwit/quickwit.search.rs Adds soft_deleted_doc_ids to SplitIdAndFooterOffsets proto struct.
quickwit/quickwit-proto/src/codegen/quickwit/quickwit.metastore.rs Adds new metastore messages + RPC plumbing for soft-delete.
quickwit/quickwit-proto/protos/quickwit/search.proto Adds soft_deleted_doc_ids to split offsets message.
quickwit/quickwit-proto/protos/quickwit/metastore.proto Adds SoftDeleteDocuments RPC and request/response messages.
quickwit/quickwit-metastore/test-data/split-metadata/v0.9.json Updates regression JSON to include soft_deleted_doc_ids.
quickwit/quickwit-metastore/test-data/split-metadata/v0.9.expected.json Updates expected regression JSON to include soft_deleted_doc_ids.
quickwit/quickwit-metastore/test-data/split-metadata/v0.8.expected.json Updates expected regression JSON to include soft_deleted_doc_ids.
quickwit/quickwit-metastore/test-data/split-metadata/v0.7.expected.json Updates expected regression JSON to include soft_deleted_doc_ids.
quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.json Updates file-backed index regression JSON to include soft-delete fields.
quickwit/quickwit-metastore/test-data/file-backed-index/v0.9.expected.json Updates expected file-backed index regression JSON.
quickwit/quickwit-metastore/test-data/file-backed-index/v0.8.expected.json Updates expected file-backed index regression JSON.
quickwit/quickwit-metastore/test-data/file-backed-index/v0.7.expected.json Updates expected file-backed index regression JSON.
quickwit/quickwit-metastore/src/tests/split.rs Adds metastore tests for soft-delete behavior, idempotency, limits, and atomicity.
quickwit/quickwit-metastore/src/tests/mod.rs Adds soft-delete tests to metastore test suite macro.
quickwit/quickwit-metastore/src/split_metadata.rs Adds soft_deleted_doc_ids to split metadata model + debug/test updates.
quickwit/quickwit-metastore/src/split_metadata_version.rs Adds backward-compatible serialization for soft_deleted_doc_ids.
quickwit/quickwit-metastore/src/metastore/postgres/utils.rs Adds split_ids filtering to Postgres list-splits query builder.
quickwit/quickwit-metastore/src/metastore/postgres/model.rs Adds Postgres model enum variant for soft-deleted doc IDs column.
quickwit/quickwit-metastore/src/metastore/postgres/metastore.rs Implements soft_delete_documents for Postgres metastore.
quickwit/quickwit-metastore/src/metastore/mod.rs Introduces per-split soft-delete limit + ListSplitsQuery::split_ids filter.
quickwit/quickwit-metastore/src/metastore/file_backed/mod.rs Implements soft_delete_documents in file-backed metastore service.
quickwit/quickwit-metastore/src/metastore/file_backed/file_backed_index/mod.rs Implements atomic soft-delete mutation + split-id filtering in file-backed index.
quickwit/quickwit-metastore/src/metastore/control_plane_metastore.rs Proxies soft_delete_documents through control-plane metastore.
quickwit/quickwit-indexing/src/soft_delete_query.rs Adds Tantivy query used to hard-delete soft-deleted doc IDs during merges.
quickwit/quickwit-indexing/src/models/split_attrs.rs Tracks replaced splits with soft-delete snapshots; initializes split metadata with empty soft-deletes.
quickwit/quickwit-indexing/src/models/publisher_message.rs Extends publish message to include replaced-splits snapshot data.
quickwit/quickwit-indexing/src/models/mod.rs Re-exports ReplacedSplit model.
quickwit/quickwit-indexing/src/models/indexed_split.rs Updates builder to use replaced_splits.
quickwit/quickwit-indexing/src/lib.rs Registers new soft_delete_query module.
quickwit/quickwit-indexing/src/actors/uploader.rs Propagates replaced-splits snapshot into SplitsUpdate (removes old HashSet-based dedup path).
quickwit/quickwit-indexing/src/actors/publisher.rs Adds best-effort detection/logging for soft-delete race during merge publish.
quickwit/quickwit-indexing/src/actors/packager.rs Updates tests/fixtures for replaced_splits.
quickwit/quickwit-indexing/src/actors/merge_executor.rs Applies soft-deleted-doc hard deletes during merges and adjusts split attrs accordingly; adds tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

.fetch_all(tx.as_mut())
.await
.map_err(|sqlx_error| convert_sqlx_err(&index_uid.index_id, sqlx_error))?;

Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Postgres implementation silently ignores split IDs that are missing or not in Published state (they simply won't be returned by FETCH_SPLITS_METADATA_QUERY), and still returns Ok(...). This diverges from the file-backed metastore which returns NotFound/FailedPrecondition; consider explicitly checking that all requested split IDs were fetched and returning an error otherwise (or documenting the intended behavior).

Suggested change
// Ensure that all requested split IDs were found and are in `Published` state.
// The query above only returns existing, published splits; any missing IDs would
// otherwise be silently ignored, diverging from the file-backed metastore behavior.
let fetched_split_ids: BTreeSet<&str> =
rows.iter().map(|(split_id, _)| split_id.as_str()).collect();
for requested_split_id in &requested_split_ids {
if !fetched_split_ids.contains(requested_split_id) {
return Err(MetastoreError::NotFound {
entity: EntityKind::Split {
split_id: (*requested_split_id).to_string(),
},
});
}
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this divergence is acceptable

Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from 3f6b1e1 to e6ce383 Compare March 31, 2026 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants