fix(aggs): top_hits panics with unknown fields by Totodore · Pull Request #2803 · quickwit-oss/tantivy

Totodore · 2026-01-14T11:04:54Z

Related to this comment:
quickwit-oss/quickwit#6088 (comment)

The current top_hits implementation panics when non-existent fields are specified in docvalue_fields. This leads to non-ergonomic errors in Quickwit, for example when a query references non-existent docvalue_fields.
This PR also fixes a bug where non-glob fast fields were incorrectly matched because the dot notation of columns was incorrectly escaped.
Finally, it makes the allowed fields of TopHitsAggregationReq pub, so that a TopHitsAggregationReq can be instantiated and modified.
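
To illustrate the intended behaviour (an error rather than a panic, and dots treated as literal characters for non-glob fields), here is a minimal standalone sketch. It does not use tantivy's API: `UnknownFieldError`, `glob_match` and `resolve_doc_value_fields` are hypothetical names, and columns are modelled as plain strings.

```rust
use std::fmt;

/// Hypothetical error type: the requested field matched no column.
#[derive(Debug, PartialEq)]
pub struct UnknownFieldError(pub String);

impl fmt::Display for UnknownFieldError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "field `{}` matches no fast field column", self.0)
    }
}

/// Matches `text` against `pattern`, where `*` stands for any run of
/// characters. Every other character, including `.`, is literal.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        // No wildcard: plain equality, so "a.b" never matches "axb".
        return pattern == text;
    }
    let first = parts[0];
    let last = parts[parts.len() - 1];
    if text.len() < first.len() + last.len()
        || !text.starts_with(first)
        || !text.ends_with(last)
    {
        return false;
    }
    // The middle parts must appear in order between the prefix and suffix.
    let mut rest = &text[first.len()..text.len() - last.len()];
    for part in &parts[1..parts.len() - 1] {
        match rest.find(part) {
            Some(idx) => rest = &rest[idx + part.len()..],
            None => return false,
        }
    }
    true
}

/// Resolves every requested field against the known column names and
/// returns an error, instead of panicking, when one of them matches nothing.
pub fn resolve_doc_value_fields(
    requested: &[&str],
    columns: &[&str],
) -> Result<Vec<String>, UnknownFieldError> {
    let mut resolved = Vec::new();
    for &field in requested {
        let before = resolved.len();
        resolved.extend(
            columns
                .iter()
                .copied()
                .filter(|column| glob_match(field, column))
                .map(str::to_owned),
        );
        if resolved.len() == before {
            return Err(UnknownFieldError(field.to_string()));
        }
    }
    Ok(resolved)
}
```

With this shape, a caller such as Quickwit could surface the `Err` as a regular query error. How the real implementation lists columns is a separate question, discussed in the review below.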

fulmicoton · 2026-01-14T13:12:49Z

src/aggregation/metric/top_hits.rs

-                        .any(|(name, _)| name.as_str() == field)
-                {
-                    return Ok(vec![field.to_owned()]);
+                        .find(|(name, _)| &name.replace(JSON_PATH_SEGMENT_SEP_STR, ".") == field)


that's not the way to do things. Can you look into the code and see how it is done elsewhere?

I used FastFieldReaders::resolve_field, which required making it pub(crate): I don't know the type of the column when checking whether the field exists, so I can't use column_opt. Is that OK, or should I use something else?

There are a bunch of aggregations relying on columns.

Could you elaborate?

There are many aggregations relying on columns, including aggregations that do not know the type of the targeted columns beforehand. Term aggregation, for instance. Before increasing the visibility of a method, could you see if you can do what is done there?

The thing I was missing is that this aggregation type implementation is terrible and misleading.


fulmicoton · 2026-01-14T17:59:11Z

src/aggregation/metric/top_hits.rs

            unsupported_err("version")?;
        }

        self.doc_value_fields = self


this is pretty terrible.

Can you answer the question what is "self.doc_value_fields"?

Can you investigate whether we can fix this subaggregation in a deeper way?

You mean the full reallocation? In that case I agree and I can refactor that.

The only purpose of this seems to be mapping globbed patterns to real fields; non-glob patterns should not trigger clones though.

No the reallocation is not the problem.

We need to be quantitative in this kind of discussion: this allocation will happen once per segment. This is negligible. Same thing for FxHashMap vs HashMap in quickwit and your concern about calling a method that is public but does "more".

On the other hand, the following code (again this is not your doing) is a massive footgun.

If you tried to explain to someone what doc_value_fields is, it would look something like this.

"well before we run validation, it is a human readable string that describes a user input, but after validation it is a string that tantivy uses to address the regular field or a json field that it uses internally"

It is even exposed to the external world, because aggregations make it possible to get the list of columns used in an aggregation, as it is used for IO in Quickwit.
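
One way to remove this ambiguity, sketched here purely as an illustration, is to give the two meanings two distinct types so that a user pattern can never be mistaken for a resolved column. All names below (`FieldPattern`, `ResolvedColumn`, `TopHitsRequest`, `ValidatedTopHits`) are hypothetical and not tantivy types, and only a trailing `*` is supported to keep the sketch short.

```rust
/// What the user typed in `docvalue_fields`; may end with `*` (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FieldPattern(String);

/// A concrete column name obtained after validation (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResolvedColumn(String);

impl FieldPattern {
    /// Exact match, or prefix match when the pattern ends with `*`.
    fn matches(&self, name: &str) -> bool {
        match self.0.strip_suffix('*') {
            Some(prefix) => name.starts_with(prefix),
            None => self.0 == name,
        }
    }
}

/// The request as received from the user: patterns only.
pub struct TopHitsRequest {
    patterns: Vec<FieldPattern>,
}

/// The request after validation: resolved columns only.
pub struct ValidatedTopHits {
    columns: Vec<ResolvedColumn>,
}

impl TopHitsRequest {
    pub fn new(patterns: &[&str]) -> Self {
        let patterns = patterns
            .iter()
            .map(|p| FieldPattern(p.to_string()))
            .collect();
        TopHitsRequest { patterns }
    }

    /// Produces a new value instead of overwriting the user input in place.
    pub fn validate(&self, available: &[&str]) -> Result<ValidatedTopHits, String> {
        let mut columns = Vec::new();
        for pattern in &self.patterns {
            let matches: Vec<ResolvedColumn> = available
                .iter()
                .copied()
                .filter(|name| pattern.matches(name))
                .map(|name| ResolvedColumn(name.to_owned()))
                .collect();
            if matches.is_empty() {
                return Err(format!("no column matches `{}`", pattern.0));
            }
            columns.extend(matches);
        }
        Ok(ValidatedTopHits { columns })
    }
}

impl ValidatedTopHits {
    pub fn column_names(&self) -> Vec<&str> {
        self.columns.iter().map(|c| c.0.as_str()).collect()
    }
}
```

Because `validate` returns a different type, the question "what is doc_value_fields?" has a single answer at each stage: patterns before validation, columns after.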

fulmicoton · 2026-01-14T17:59:37Z

src/aggregation/metric/top_hits.rs

-                        .any(|(name, _)| name.as_str() == field)
-                {
+                if !field.contains('*') {
+                    reader.resolve_field(field)?.ok_or_else(|| {


I suspect dynamic_column_handles is better (as it is public). I am not sure though

dynamic_column_handles does a bit more work and internally calls resolve_field:
https://docs.rs/tantivy/latest/src/tantivy/fastfield/readers.rs.html#238-251

OK, but it is not public :-/

Changing to pub(crate) is not so bad but unnecessary here.

fulmicoton · 2026-01-14T18:00:44Z

src/aggregation/metric/top_hits.rs

                let pattern = globbed_string_to_regex(field)?;
                let fields = reader
+                    .columnar()
                    .iter_columns()?


I think you mean to use it in Quickwit? I suspect it might not work if you have a large number of columns.

fulmicoton · 2026-01-14T18:01:07Z

src/aggregation/metric/top_hits.rs

-                );
+
+                if fields.is_empty() {
+                    return Err(TantivyError::SchemaError(format!(


fulmicoton · 2026-01-14T18:05:13Z

src/aggregation/metric/top_hits.rs

                    .iter_columns()?
                    .map(|(name, _)| {
                        // normalize path from internal fast field repr
                        name.replace(JSON_PATH_SEGMENT_SEP_STR, ".")


This seems wrong too
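
The comment does not say what exactly is wrong. One possible concern, offered here only as an assumption, is that replacing the internal separator with a dot is lossy: a JSON key that itself contains a dot becomes indistinguishable from a nested path. The sketch below is standalone; `PATH_SEP` stands in for the separator behind JSON_PATH_SEGMENT_SEP_STR and its value (`\u{1}`) is assumed, and both function names are hypothetical.

```rust
/// Stand-in for the internal JSON path separator (value assumed for this sketch).
const PATH_SEP: char = '\u{1}';

/// The normalization from the diff above: separator replaced by a dot.
pub fn naive_display(internal: &str) -> String {
    internal.replace(PATH_SEP, ".")
}

/// A variant that escapes literal dots inside a segment as `\.` so the
/// result can still be mapped back to a single internal path.
pub fn escaped_display(internal: &str) -> String {
    internal
        .split(PATH_SEP)
        .map(|segment| segment.replace('.', "\\."))
        .collect::<Vec<_>>()
        .join(".")
}
```

For example, the nested path `attrs` / `a` / `b` and the two-segment path `attrs` / `a.b` both render as `attrs.a.b` with `naive_display`, while `escaped_display` keeps them apart.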

fulmicoton · 2026-01-14T18:06:30Z

@Totodore the whole sub aggregation is terrible. We should have never merged this PR originally. Can you approach it with clean eyes and clean it up?

Totodore · 2026-01-15T08:16:54Z

Thanks for your feedback @fulmicoton. I'll try to provide a full refactor based on what I can learn from other aggregation implementations (terms, for example) rather than a small patch.

fulmicoton · 2026-01-15T12:14:13Z

@Totodore That would be awesome! Thank you!

For the most obscure part:
Quickwit needs to pre-fetch all columns from S3 before running a search. For this reason, aggregations need to explicitly declare the list of columns they rely on. We need to make sure that the function returns the columns in the expected format.

Totodore · 2026-01-25T12:35:43Z

@fulmicoton I dug into this a bit and I don't really see how we could keep the pattern matching on dynamic fields that docvalue_fields offers while pre-fetching columns in Quickwit.

I think there are three solutions:

1. Remove the pattern matching on column names feature entirely.
2. Keep it but make it unusable from Quickwit, as we cannot pre-fetch the fast fields if there are wildcards.
3. Update get_fast_field_names to take a handle to the column list, so that the patterns can be resolved dynamically when Quickwit collects all the fast fields of the aggregation.
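
As a rough illustration of the last option, the sketch below passes a column-listing callback to the function that reports fast field names, and only invokes it when at least one pattern contains a wildcard. It is an assumption-laden sketch: `fast_field_names_with_columns` and `prefix_glob` are hypothetical names, the real signature of get_fast_field_names is not shown in this thread, and only trailing `*` patterns are handled.

```rust
use std::collections::BTreeSet;

/// Exact match, or prefix match when the pattern ends with `*`.
fn prefix_glob(pattern: &str, name: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => name.starts_with(prefix),
        None => pattern == name,
    }
}

/// Returns the column names an aggregation relies on.
/// `list_columns` is only called when a pattern contains a wildcard, so
/// requests made of plain field names never pay for listing the columns.
pub fn fast_field_names_with_columns<F>(patterns: &[&str], list_columns: F) -> BTreeSet<String>
where
    F: FnOnce() -> Vec<String>,
{
    let (globs, exact): (Vec<&str>, Vec<&str>) =
        patterns.iter().copied().partition(|p| p.contains('*'));

    // Plain names are declared as-is.
    let mut names: BTreeSet<String> = exact.into_iter().map(str::to_owned).collect();

    if !globs.is_empty() {
        for column in list_columns() {
            if globs.iter().any(|glob| prefix_glob(glob, &column)) {
                names.insert(column);
            }
        }
    }
    names
}
```

The cost of this design is that the caller must be able to list the columns before pre-fetching them, which ties back to the earlier concern about splits with a large number of columns.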

Totodore added 3 commits January 14, 2026 12:02

fix(aggs): top_hits panics with unknown fields

f507a29

fix(aggs): top_hits panics with unknown fields

dc570f8

fix(aggs): top_hits incorrect escaping when matching non glob fields

862b4cf

Totodore marked this pull request as ready for review January 14, 2026 12:33

fix(aggs): make top hits agg req fields pub

d0700a8

fulmicoton reviewed Jan 14, 2026


fix(aggs): use FastFieldReaders::resolve_field to check if a field …

4d986ba

…exists

Totodore requested a review from fulmicoton January 14, 2026 15:03

fulmicoton reviewed Jan 14, 2026

