Unify the validation pipeline for full and partial data columns#16465
Unify the validation pipeline for full and partial data columns#16465aarshkshah1992 wants to merge 7 commits intotests/tests-for-partial-broadcasterfrom
Conversation
…ter' into fix/unify-validation-pipeline
|
@kasey This is now ready for review. |
| return NewVerifiedRODataColumn(rodc), true | ||
| // IsComplete returns true if all cells are now present in this column. | ||
| func (p *PartialDataColumn) IsComplete() bool { | ||
| return uint64(len(p.KzgCommitments)) == p.Included.Count() |
There was a problem hiding this comment.
I would reverse these casts to len(p.KzgCommitments) == int(p.Included.Count()) . It's weird for Count to return a uint64 in the first place, int is the standard for counting things.
There was a problem hiding this comment.
In fact I just made a note to self to modify Count to return an int - looking at all current usages, it is either compared to another instance of Count, or to an int that has to be cast to uint64 to accommodate this odd choice.
| // errSidecarParentNotSeen means RequireSidecarParentSeen failed. | ||
| errSidecarParentNotSeen = errors.New("parent root has not been seen") | ||
| // ErrSidecarParentSlotUnavailable means that looking up a sidecar parent's slot failed. | ||
| ErrSidecarParentSlotUnavailable = errors.New("parent slot unavailable") |
There was a problem hiding this comment.
The wording here is confusing - there's no case where we know the parent but not the slot - the root cause of not knowing the slot is that the parent isn't in forkchoice. I would change this to ErrSidecarParentUnknown = errors.New("parent not found in forkchoice").
| type PartialColumnVerifier struct { | ||
| DataColumnsVerifier | ||
| Column *blocks.PartialDataColumn | ||
| verifiedCellByIndex map[uint64]bool |
There was a problem hiding this comment.
Instead of tracking this with a separate map, can we use the pv.Column.Included? That would avoid extra conversion back and forth and the requirement to explicitly call MarkIncludedCellsVerified. Everything is simpler if we can maintain these invariants:
- only verified cells are set in
pv.Column pv.Column.Includedis updated when those cells are set, so that it also represents the set of verified cells.
I think this suggestion is half-baked because I'm not working through the untrusted cell path yet. One thought is to PartialColumnVerifier to have separate *blocks.PartialDataColumn fields for verified (and/or trusted) and unverified cells. When we verify an unverified PartialDataColumn, we swap the reference so they both point to the same thing.
| var shouldRepublish bool | ||
|
|
||
| if ourDataColumn == nil && hasMessage { | ||
| if ourVerifier == nil && hasMessage { |
There was a problem hiding this comment.
The logical flow of this method is clunky, and it's very long, due to these giant if statements. I would like to see large portions of this method refactored into a set of smaller methods, with early returns used to telegraph the flow more clearly.
| log.WithError(err).WithFields(logrus.Fields{ | ||
| "topic": topicID, | ||
| "columnIndex": columnIndex, | ||
| "numCommitments": len(header.KzgCommitments), |
There was a problem hiding this comment.
With these big multi-line log statements, I find it helpful to a method to logging helpers, as typically the same log fields are used across different cases. Actually I think we could make some changes to rpcWithFrom to make it more ergonomic in multiple ways - I think the type name could also be improved.
What type of PR is this?
What does this PR do? Why is it needed?
This PR unifies the validation pipelines for full data columns and partial data columns so they both satisfy the same set of validation requirements once the partial data column is full. It also ensures that we can get a verified data column from a partial data column only from the verification package after all the validation requirements have been satisfied.
We account for the fact that partial data column headers and cells arrive separately and in incremental parts.
Also, cells that we read from the EL are trusted and do not need to be verified.
Acknowledgements