Unify the validation pipeline for full and partial data columns #16465

Open
aarshkshah1992 wants to merge 7 commits into tests/tests-for-partial-broadcaster from fix/unify-validation-pipeline


aarshkshah1992 (Contributor) commented Mar 4, 2026

What type of PR is this?

Uncomment one line below and remove others.
Feature

What does this PR do? Why is it needed?

This PR unifies the validation pipelines for full data columns and partial data columns so they both satisfy the same set of validation requirements once the partial data column is full. It also ensures that we can get a verified data column from a partial data column only from the verification package after all the validation requirements have been satisfied.

We account for the fact that partial data column headers and cells arrive separately and in incremental parts.
Also, cells that we read from the EL are trusted and do not need to be verified.
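
A minimal sketch of the gating idea described above, assuming a simplified verifier: a verified column can be obtained only once every requirement has been recorded as satisfied, and requirements can be recorded at different times as headers and cells arrive. All type names, function names, and the three requirement values are hypothetical placeholders, not the identifiers or checks used in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// requirement names one validation check. The three values below are
// hypothetical placeholders, not the checks used by the PR.
type requirement string

const (
	reqHeaderValid    requirement = "header valid"
	reqCellsVerified  requirement = "cells verified"
	reqColumnComplete requirement = "column complete"
)

var allRequirements = []requirement{reqHeaderValid, reqCellsVerified, reqColumnComplete}

var errUnsatisfied = errors.New("unsatisfied validation requirement")

// verifiedColumn is only ever constructed by verifier.Verified.
type verifiedColumn struct{ index uint64 }

// verifier records which requirements have passed for one column.
type verifier struct {
	index     uint64
	satisfied map[requirement]bool
}

func newVerifier(index uint64) *verifier {
	return &verifier{index: index, satisfied: make(map[requirement]bool)}
}

// Satisfy marks a requirement as passed. Header and cell checks can be
// recorded at different times, as the parts arrive.
func (v *verifier) Satisfy(r requirement) { v.satisfied[r] = true }

// Verified returns the verified column only when every requirement has passed.
func (v *verifier) Verified() (verifiedColumn, error) {
	for _, r := range allRequirements {
		if !v.satisfied[r] {
			return verifiedColumn{}, fmt.Errorf("%w: %s", errUnsatisfied, r)
		}
	}
	return verifiedColumn{index: v.index}, nil
}

func main() {
	v := newVerifier(7)
	v.Satisfy(reqHeaderValid)
	if _, err := v.Verified(); err != nil {
		fmt.Println("not yet:", err)
	}
	v.Satisfy(reqCellsVerified)
	v.Satisfy(reqColumnComplete)
	col, err := v.Verified()
	fmt.Println(col.index, err)
}
```

In this sketch a full column and a completed partial column would both go through the same `Verified` gate; only the order in which requirements are satisfied differs.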

Acknowledgements

  • I have read CONTRIBUTING.md.
  • I have included a uniquely named changelog fragment file.
  • I have added a description with sufficient context for reviewers to understand this PR.
  • I have tested that my changes work as expected and I added a testing plan to the PR description (if applicable).

@aarshkshah1992 aarshkshah1992 changed the title unify the validation pipeline for full and partial data columns [WIP] Unify the validation pipeline for full and partial data columns Mar 4, 2026
@aarshkshah1992 aarshkshah1992 changed the title [WIP] Unify the validation pipeline for full and partial data columns Unify the validation pipeline for full and partial data columns Mar 5, 2026
@aarshkshah1992 aarshkshah1992 requested a review from kasey March 5, 2026 12:46
aarshkshah1992 (Contributor, Author) commented:

@kasey This is now ready for review.

	return NewVerifiedRODataColumn(rodc), true
// IsComplete returns true if all cells are now present in this column.
func (p *PartialDataColumn) IsComplete() bool {
	return uint64(len(p.KzgCommitments)) == p.Included.Count()
Collaborator commented:

I would reverse these casts: len(p.KzgCommitments) == int(p.Included.Count()). It's weird for Count to return a uint64 in the first place; int is the standard type for counting things.

Collaborator commented:

In fact I just made a note to self to modify Count to return an int - looking at all current usages, it is either compared to another instance of Count, or to an int that has to be cast to uint64 to accommodate this odd choice.
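
A rough sketch of what the comparison could look like if Count returned an int, as proposed above. The `cellBitmap` and `partialColumn` types are hypothetical stand-ins that mirror only the two fields used by IsComplete; they are not the real types behind p.Included or PartialDataColumn.

```go
package main

import "fmt"

// cellBitmap is a hypothetical stand-in for the type of p.Included.
// Its Count returns an int, which is the change proposed in the review.
type cellBitmap []bool

// Count reports how many cells are marked as present.
func (b cellBitmap) Count() int {
	n := 0
	for _, set := range b {
		if set {
			n++
		}
	}
	return n
}

// partialColumn mirrors only the fields that IsComplete reads.
type partialColumn struct {
	KzgCommitments [][]byte
	Included       cellBitmap
}

// IsComplete returns true if all cells are now present in this column.
// No cast is needed because both sides are already ints.
func (p *partialColumn) IsComplete() bool {
	return len(p.KzgCommitments) == p.Included.Count()
}

func main() {
	p := &partialColumn{
		KzgCommitments: make([][]byte, 3),
		Included:       cellBitmap{true, true, false},
	}
	fmt.Println(p.IsComplete())
	p.Included[2] = true
	fmt.Println(p.IsComplete())
}
```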

// errSidecarParentNotSeen means RequireSidecarParentSeen failed.
errSidecarParentNotSeen = errors.New("parent root has not been seen")
// ErrSidecarParentSlotUnavailable means that looking up a sidecar parent's slot failed.
ErrSidecarParentSlotUnavailable = errors.New("parent slot unavailable")
Collaborator commented:

The wording here is confusing - there's no case where we know the parent but not the slot - the root cause of not knowing the slot is that the parent isn't in forkchoice. I would change this to ErrSidecarParentUnknown = errors.New("parent not found in forkchoice").
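
For illustration, a small sketch of the proposed sentinel and how callers could match it with errors.Is. Only the error name and message come from the comment above; `parentSlot`, `isParentUnknown`, and the map standing in for forkchoice are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSidecarParentUnknown uses the name and message proposed in the review.
var ErrSidecarParentUnknown = errors.New("parent not found in forkchoice")

// parentSlot is a hypothetical lookup. The map stands in for forkchoice:
// a root that is absent from it has no known slot.
func parentSlot(known map[[32]byte]uint64, root [32]byte) (uint64, error) {
	slot, ok := known[root]
	if !ok {
		// Wrap the sentinel so callers can still match it with errors.Is.
		return 0, fmt.Errorf("root %#x: %w", root[:4], ErrSidecarParentUnknown)
	}
	return slot, nil
}

// isParentUnknown reports whether err was caused by a missing parent.
func isParentUnknown(err error) bool {
	return errors.Is(err, ErrSidecarParentUnknown)
}

func main() {
	known := map[[32]byte]uint64{{1}: 42}
	if slot, err := parentSlot(known, [32]byte{1}); err == nil {
		fmt.Println("slot:", slot)
	}
	_, err := parentSlot(known, [32]byte{2})
	fmt.Println(isParentUnknown(err), err)
}
```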

type PartialColumnVerifier struct {
	DataColumnsVerifier
	Column              *blocks.PartialDataColumn
	verifiedCellByIndex map[uint64]bool
Collaborator commented:

Instead of tracking this with a separate map, can we use pv.Column.Included? That would avoid the extra conversion back and forth and the requirement to explicitly call MarkIncludedCellsVerified. Everything is simpler if we can maintain these invariants:

  • only verified cells are set in pv.Column
  • pv.Column.Included is updated when those cells are set, so that it also represents the set of verified cells.

I think this suggestion is half-baked because I haven't worked through the untrusted cell path yet. One thought is for PartialColumnVerifier to have separate *blocks.PartialDataColumn fields for verified (and/or trusted) and unverified cells. When we verify an unverified PartialDataColumn, we swap the reference so they both point to the same thing.
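
One way the two invariants could be enforced is to funnel every write through a single setter, so the inclusion set and the stored cells cannot drift apart. The sketch below assumes a simplified column with a bool slice in place of the real bitmap; `verifiedColumn`, `verifyFn`, `AddUntrusted`, and `AddTrusted` are hypothetical names, and the verification callback is a placeholder for the real proof check.

```go
package main

import (
	"errors"
	"fmt"
)

// verifiedColumn is a hypothetical simplification of the column held by the
// verifier. Invariant: included[i] is true exactly when cells[i] holds a
// verified or trusted cell.
type verifiedColumn struct {
	cells    [][]byte
	included []bool
}

func newVerifiedColumn(n int) *verifiedColumn {
	return &verifiedColumn{cells: make([][]byte, n), included: make([]bool, n)}
}

// verifyFn stands in for the real cell proof check.
type verifyFn func(index int, cell []byte) bool

var (
	errCellRejected = errors.New("cell failed verification")
	errOutOfRange   = errors.New("cell index out of range")
)

// set is the only place that writes a cell, so both fields change together.
func (c *verifiedColumn) set(i int, cell []byte) {
	c.cells[i] = cell
	c.included[i] = true
}

// AddUntrusted stores the cell only if verify accepts it.
func (c *verifiedColumn) AddUntrusted(i int, cell []byte, verify verifyFn) error {
	if i < 0 || i >= len(c.cells) {
		return errOutOfRange
	}
	if c.included[i] {
		return nil // already holds a verified cell at this index
	}
	if !verify(i, cell) {
		return errCellRejected
	}
	c.set(i, cell)
	return nil
}

// AddTrusted stores a cell that needs no verification, such as one read from the EL.
func (c *verifiedColumn) AddTrusted(i int, cell []byte) error {
	if i < 0 || i >= len(c.cells) {
		return errOutOfRange
	}
	c.set(i, cell)
	return nil
}

// VerifiedCount reports how many cells are set, and therefore verified.
func (c *verifiedColumn) VerifiedCount() int {
	n := 0
	for _, ok := range c.included {
		if ok {
			n++
		}
	}
	return n
}

func main() {
	col := newVerifiedColumn(3)
	nonEmpty := func(_ int, cell []byte) bool { return len(cell) > 0 }
	fmt.Println(col.AddUntrusted(0, []byte{1}, nonEmpty))
	fmt.Println(col.AddUntrusted(1, nil, nonEmpty))
	fmt.Println(col.AddTrusted(2, []byte{3}))
	fmt.Println(col.VerifiedCount())
}
```

With this shape there is no separate "mark as verified" step to forget: a cell that is present is verified by construction.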

  var shouldRepublish bool

- if ourDataColumn == nil && hasMessage {
+ if ourVerifier == nil && hasMessage {
Collaborator commented:

The logical flow of this method is clunky, and it's very long, due to these giant if statements. I would like to see large portions of this method refactored into a set of smaller methods, with early returns used to telegraph the flow more clearly.
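
As a sketch of the shape being asked for, the snippet below splits one long handler into small helpers with early returns. It does not reproduce the method under review: the `incoming` struct, the helper names, and the republish rules are hypothetical and chosen only to show the control flow.

```go
package main

import "fmt"

// incoming is a hypothetical summary of the state the handler inspects.
type incoming struct {
	haveVerifier bool // corresponds to ourVerifier != nil
	hasMessage   bool
	cells        int // cells carried by the message
	newCells     int // cells in the message that we did not already hold
}

// handlePartial decides whether to republish. Each case returns as soon as
// it is resolved, so no branch needs to be nested inside another.
func handlePartial(in incoming) (shouldRepublish bool) {
	if !in.hasMessage {
		return false
	}
	if !in.haveVerifier {
		return startFromMessage(in)
	}
	return mergeIntoExisting(in)
}

// startFromMessage handles the first message seen for a column.
// Placeholder rule: republish if the message carried any cells.
func startFromMessage(in incoming) bool {
	return in.cells > 0
}

// mergeIntoExisting handles a message for a column we already track.
// Placeholder rule: republish only if the message added something new.
func mergeIntoExisting(in incoming) bool {
	return in.newCells > 0
}

func main() {
	fmt.Println(handlePartial(incoming{hasMessage: false}))
	fmt.Println(handlePartial(incoming{hasMessage: true, cells: 4}))
	fmt.Println(handlePartial(incoming{hasMessage: true, haveVerifier: true, cells: 4}))
}
```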

log.WithError(err).WithFields(logrus.Fields{
	"topic":          topicID,
	"columnIndex":    columnIndex,
	"numCommitments": len(header.KzgCommitments),
Collaborator commented:

With these big multi-line log statements, I find it helpful to add a method to the logging helpers, as typically the same log fields are used across different cases. Actually, I think we could make some changes to rpcWithFrom to make it more ergonomic in multiple ways. I think the type name could also be improved.
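
A small sketch of such a helper, kept to the standard library so it runs on its own. The `columnLogCtx` type and its `Fields` method are hypothetical; the field keys are the ones from the log statement above. Since logrus.Fields is defined as map[string]interface{}, the returned map could be passed as `logrus.Fields(ctx.Fields())`.

```go
package main

import "fmt"

// columnLogCtx is a hypothetical holder for values that the same handler
// logs repeatedly across different cases.
type columnLogCtx struct {
	Topic          string
	ColumnIndex    uint64
	NumCommitments int
}

// Fields builds the shared log fields once, so each call site does not have
// to repeat the multi-line literal.
func (c columnLogCtx) Fields() map[string]interface{} {
	return map[string]interface{}{
		"topic":          c.Topic,
		"columnIndex":    c.ColumnIndex,
		"numCommitments": c.NumCommitments,
	}
}

func main() {
	ctx := columnLogCtx{Topic: "example-topic", ColumnIndex: 5, NumCommitments: 6}
	// With logrus this would read:
	// log.WithError(err).WithFields(logrus.Fields(ctx.Fields())).Debug("...")
	fmt.Println(ctx.Fields())
}
```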
