Add support for row lineage in v3 by dttung2905 · Pull Request #735 · apache/iceberg-go

dttung2905 · 2026-02-17T18:39:13Z

This PR should fully support the read path and partially support the write path.

Unsupported on the write path:

- Rewrite/compaction: When an overwrite or rewrite copies existing rows into new data files, existing non-null _row_id and _last_updated_sequence_number values are not copied into the new files. Row lineage is preserved for appends and for the metadata/manifest list; it is not yet preserved when rewriting data files.
- Explicit null columns on append: New data files do not write _row_id/_last_updated_sequence_number as null columns (they are omitted). The spec allows this, and writing them is not planned for this PR.

A data file with only new rows for the table may omit the _last_updated_sequence_number and _row_id. If the columns are missing, readers should treat both columns as if they exist and are set to null for all rows.
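A minimal sketch of the read-side rule quoted above, assuming the v3 inheritance semantics: a null `_row_id` resolves to the file's `first_row_id` plus the row position, and a null `_last_updated_sequence_number` resolves to the file's data sequence number. The function names `resolveRowID` and `resolveLastUpdatedSeq` are hypothetical and are not part of iceberg-go.

```go
package main

import "fmt"

// resolveRowID returns the effective _row_id for the row at position pos.
// stored is nil when the column is missing from the file or null for the row.
// (Hypothetical helper for illustration only.)
func resolveRowID(stored *int64, firstRowID, pos int64) int64 {
	if stored != nil {
		return *stored // an explicit value always wins
	}
	return firstRowID + pos
}

// resolveLastUpdatedSeq returns the effective _last_updated_sequence_number.
// A nil stored value inherits the data sequence number of the file.
func resolveLastUpdatedSeq(stored *int64, dataSeqNum int64) int64 {
	if stored != nil {
		return *stored
	}
	return dataSeqNum
}

func main() {
	// A file with first_row_id 1000 and no lineage columns at all:
	// every row is treated as null and inherits its values.
	for pos := int64(0); pos < 3; pos++ {
		fmt.Println(resolveRowID(nil, 1000, pos), resolveLastUpdatedSeq(nil, 7))
	}
}
```

A reader that synthesizes these columns can apply the same rule per row regardless of whether the column is physically absent or present with nulls.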

laskoviymishka

The read path structure is solid and the Java alignment is largely correct — field IDs, doc strings, manifest list writer semantics, and the Arrow synthesis pipeline all check out.

Three issues need to land before this merges.

First row ID inheritance diverges from Java spec (manifest.go ReadEntry). Java's idAssigner unconditionally executes nextRowId += file.recordCount() for every file — null or explicit. The Go implementation only advances nextFirstRowID when FirstRowIDField == nil, so a file with an explicit first_row_id silently resets the baseline for all subsequent null files in the same manifest, producing overlapping row ID ranges. The fix and the *int64 cleanup land together: initialize nextFirstRowID eagerly in NewManifestReader, then unconditionally advance after the conditional assign.
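The fix described in this comment can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the `dataFile` type and `assignFirstRowIDs` function are hypothetical stand-ins, and the behavior shown (advance the counter for every file) follows the reviewer's description rather than a check against the Java source.

```go
package main

import "fmt"

// dataFile is a hypothetical stand-in for the data file in a manifest entry.
type dataFile struct {
	firstRowID  *int64 // nil means "inherit from the manifest"
	recordCount int64
}

// assignFirstRowIDs fills in missing first row IDs, starting from the
// manifest's first row ID. The counter is initialized eagerly and advanced
// for every file, whether its ID was inherited or explicit.
func assignFirstRowIDs(manifestFirstRowID int64, files []dataFile) {
	next := manifestFirstRowID
	for i := range files {
		if files[i].firstRowID == nil {
			id := next
			files[i].firstRowID = &id
		}
		next += files[i].recordCount // unconditional advance
	}
}

func main() {
	explicit := int64(100)
	files := []dataFile{
		{recordCount: 10},
		{firstRowID: &explicit, recordCount: 5},
		{recordCount: 7},
	}
	assignFirstRowIDs(1000, files)
	for _, f := range files {
		fmt.Println(*f.firstRowID, f.recordCount)
	}
}
```

With the unconditional advance, the third file starts at 1015; if the counter advanced only for files with a nil ID, it would start at 1010.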

Wrong sequence number for DataSequenceNumber (scanner.go PlanFiles). e.SequenceNum() is the manifest entry's metadata sequence number; _last_updated_sequence_number per spec requires the data sequence number — entry.dataSequenceNumber() in Java, e.FileSequenceNum() in Go. These are identical for freshly ADDED entries but diverge for EXISTING entries carried across compacted manifests, where the bug silently inflates the reported sequence number.

ManifestFile.FirstRowId() must be FirstRowID() before this public interface is merged. The PR already correctly renames the struct field to FirstRowID; the exported method should follow the same Go acronym convention. Fixing a public interface post-merge requires a breaking change.

laskoviymishka

One more thing: there is a memory leak. Aside from that, all good.

Same root cause as #762 — NewArray() starts at refcount 1, NewRecordBatch retains to refcount 2, local refs are never dropped so memory is never freed. Two places: the production release loop in synthesizeRowLineageColumns and the test setup in TestSynthesizeRowLineageColumns. The test fix is as important as the production fix — NewCheckedAllocator would have caught this immediately and prevents regressions of the same class.
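The reference-counting pattern behind this leak can be modeled without Arrow. The sketch below is a toy model only: `refCounted`, `batch`, `newArray`, and `newBatch` are hypothetical types and functions that mimic the behavior described in the comment (a new array starts at refcount 1, and the batch retains each column), and they are not the arrow-go API.

```go
package main

import "fmt"

// refCounted is a toy stand-in for a reference-counted array.
type refCounted struct {
	refs  int
	freed bool
}

// newArray mimics a constructor that hands the caller one reference.
func newArray() *refCounted { return &refCounted{refs: 1} }

func (r *refCounted) Retain() { r.refs++ }

func (r *refCounted) Release() {
	r.refs--
	if r.refs == 0 {
		r.freed = true // memory would be returned to the allocator here
	}
}

// batch is a toy stand-in for a record batch that retains its columns.
type batch struct{ cols []*refCounted }

func newBatch(cols []*refCounted) *batch {
	for _, c := range cols {
		c.Retain() // refcount goes from 1 to 2
	}
	return &batch{cols: cols}
}

func (b *batch) Release() {
	for _, c := range b.cols {
		c.Release()
	}
}

// buildBatch creates one column and wraps it in a batch. When dropLocal is
// false, the caller's own reference is never released, which is the leak.
func buildBatch(dropLocal bool) (*batch, *refCounted) {
	col := newArray()
	b := newBatch([]*refCounted{col})
	if dropLocal {
		col.Release() // the batch now holds the only reference
	}
	return b, col
}

func main() {
	leaky, leakedCol := buildBatch(false)
	leaky.Release()
	fmt.Println("leaky version freed:", leakedCol.freed) // false

	fixed, col := buildBatch(true)
	fixed.Release()
	fmt.Println("fixed version freed:", col.freed) // true
}
```

In the model, releasing the batch alone leaves the column at refcount 1, so it is never freed. A checked allocator in tests surfaces exactly this kind of outstanding reference.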

dttung2905 · 2026-03-09T22:08:44Z

@zeroshade could you help to review this PR as well?

laskoviymishka

LGTM!

zeroshade · 2026-03-15T04:06:58Z

@dttung2905 sorry for the delay, I'll give this a review tomorrow or Monday.

zeroshade · 2026-03-27T15:49:47Z

@dttung2905 is this ready for a new review?

dttung2905 · 2026-03-27T15:53:41Z

> @dttung2905 is this ready for a new review?

Yes, it is, @zeroshade.

zeroshade

Looking good so far, though there's still the outstanding question at https://github.com/apache/iceberg-go/pull/735/changes#r2943078618

laskoviymishka · 2026-03-28T16:53:07Z

metadata_columns.go

+}
+
+// IsMetadataColumn returns true if the field ID is a reserved metadata column (e.g. row lineage).
+func IsMetadataColumn(fieldID int) bool {


nit: Now that IsMetadataColumn exists, worth adding a guard in NewSchema (or update_schema.go:AddColumn) that rejects user-defined fields with reserved IDs. Could be as follow-up PR for this.

I will try to follow this up in another PR. This one is getting too big to review.
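The guard suggested in this thread could look roughly like the sketch below. It is an illustration under assumptions: `isMetadataColumn` here covers only the two row-lineage IDs (taken to be `math.MaxInt32 - 107` for `_row_id` and `math.MaxInt32 - 108` for `_last_updated_sequence_number`), while the real `IsMetadataColumn` may cover more reserved IDs. The `field` type and `validateUserFields` function are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Reserved field IDs for the row-lineage metadata columns (assumed values).
const (
	rowIDFieldID             = math.MaxInt32 - 107 // _row_id
	lastUpdatedSeqNumFieldID = math.MaxInt32 - 108 // _last_updated_sequence_number
)

// isMetadataColumn is a simplified stand-in that only knows the two
// row-lineage columns.
func isMetadataColumn(fieldID int) bool {
	return fieldID == rowIDFieldID || fieldID == lastUpdatedSeqNumFieldID
}

// field is a hypothetical, minimal schema field.
type field struct {
	ID   int
	Name string
}

// validateUserFields rejects user-defined fields that collide with
// reserved metadata column IDs.
func validateUserFields(fields []field) error {
	for _, f := range fields {
		if isMetadataColumn(f.ID) {
			return fmt.Errorf("field %q uses reserved metadata column ID %d", f.Name, f.ID)
		}
	}
	return nil
}

func main() {
	ok := []field{{ID: 1, Name: "id"}, {ID: 2, Name: "data"}}
	bad := []field{{ID: 1, Name: "id"}, {ID: rowIDFieldID, Name: "my_col"}}
	fmt.Println(validateUserFields(ok))
	fmt.Println(validateUserFields(bad))
}
```

Placing the check where user fields enter the schema (schema construction or column addition) keeps the reserved range free for columns that readers synthesize.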

dttung2905 mentioned this pull request Feb 17, 2026

feat: Wire V3 snapshot producer to row-lineage state #728

Merged

dttung2905 force-pushed the row-lineage-v3 branch from 6af257b to 4da5bf5 Compare February 20, 2026 23:01

laskoviymishka suggested changes Mar 3, 2026

dttung2905 force-pushed the row-lineage-v3 branch from 9256510 to 61787dd Compare March 3, 2026 23:01

dttung2905 requested a review from laskoviymishka March 4, 2026 16:45

laskoviymishka suggested changes Mar 4, 2026

dttung2905 force-pushed the row-lineage-v3 branch from 3b3c7e2 to b21cd14 Compare March 6, 2026 18:05

dttung2905 marked this pull request as ready for review March 6, 2026 18:06

dttung2905 requested a review from laskoviymishka March 6, 2026 18:06

laskoviymishka approved these changes Mar 9, 2026

zeroshade requested changes Mar 16, 2026

zeroshade requested changes Mar 27, 2026

laskoviymishka reviewed Mar 28, 2026

laskoviymishka mentioned this pull request Mar 28, 2026

feat(table): reject reserved metadata column IDs in user schemas #821

Draft

dttung2905 added 9 commits April 6, 2026 12:49

Add support for row lineage in v3

2526b68

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Fix CI failure

9af79e3

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Fix CI failure

ae65914

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Fixes from codereview

ed7a5a2

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Fixes from codereview

21586f0

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Fix leak

59dfe22

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Fixes from code review

5376b5c

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Fixes from code review

08b746c

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Fix PartitionField in TestV3DataManifestFirstRowIDInheritance for mai…

0b6298b

…n API

dttung2905 force-pushed the row-lineage-v3 branch from 96ef322 to 0b6298b Compare April 6, 2026 11:53
