Skip to content

Ranking Parent Documents with Nested Features#946

Open
dhruva-r wants to merge 5 commits intomainfrom
dhruva_parents_documents_can_reference_nested_fields
Open

Ranking Parent Documents with Nested Features#946
dhruva-r wants to merge 5 commits intomainfrom
dhruva_parents_documents_can_reference_nested_fields

Conversation

@dhruva-r
Copy link
Copy Markdown
Contributor

Jira Ticket

Summary of Changes

  • Add _CHILDREN. prefix support in SegmentDocLookup, allowing scoring plugins running on parent documents to access aggregated doc values from their child (nested) documents (e.g., _CHILDREN.appointments.price
    returns all appointment prices across a parent's children)
  • Store _nested_child_count as a new meta field on parent documents at index time so ChildAggregatedDocValues can compute the child range without backward scanning
  • Filter children by nested path using BitSetProducer, so indexes with multiple nested types (e.g., appointments and reviews) return only values from the target path
  • Add ChildFilter proto message on SearchRequest, enabling query-time filtering of which children are included in _CHILDREN. aggregation (e.g., only appointments with price >= 50)
  • Cache the child field iterator in ChildAggregatedDocValues for efficiency, with a lastParentDocId guard to prevent incorrect backward iteration on repeated access for the same document

Tests

@dhruva-r dhruva-r marked this pull request as ready for review April 1, 2026 21:29
@dhruva-r dhruva-r requested review from vim345 and waziqi89 April 1, 2026 21:29
@dhruva-r
Copy link
Copy Markdown
Contributor Author

dhruva-r commented Apr 1, 2026

If a different approach is required or if this approach is inefficient let me know.

Copy link
Copy Markdown
Contributor

@vim345 vim345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. I just had one comment.

Also, AI mentioned the following about missing test coverage:

 What's missing:

  1. No test for the issue #1 bug path — there's no test that verifies the _nested_child_count doc values accessor is reused across multiple documents in the same segment (i.e. no performance regression test, and the bug wouldn't cause a test failure — just extra allocations).
  2. No test for a parent with zero children — the childCount <= 0 early-return path in setDocId is untested.
  3. No test for documents without nested fields — i.e. calling _CHILDREN. on a non-parent document in a mixed index.
  4. No test for the lastParentDocId caching guard — the early-return if (parentDocId == lastParentDocId) return; is never exercised by a test that calls setDocId twice with the same doc ID.
  5. ChildFilterIntegrationTest uses scores to encode results rather than fetching actual field values, which makes assertions somewhat fragile and hard to read — though this is a style concern rather than a correctness gap.

Is it possible to address these as well?

return;
}

LoadedDocValues<?> childCountDocValues = childCountFieldDef.getDocValues(leafContext);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getDocValues(leafContext) is called on every setDocId invocation rather than once in the constructor (as is done for childFieldDocValues). This allocates a new accessor on every parent document scored. It should be initialized once in the constructor and reused like childFieldDocValues.

Is it possible for you to do some benchmarking to see what the performance looks like with or without the above suggestion?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants