fix(table): fast-append must inherit all parent manifests unconditionally #869
Draft
cassio-paesleme wants to merge 1 commit into apache:main from
…ally
fastAppendFiles.existingManifests() filtered parent manifests using HasAddedFiles() || HasExistingFiles(). Both methods return false when the manifest list entry has added_files_count=0 and existing_files_count=0, which is the standard Iceberg v2 representation for inherited manifests written by external writers such as Athena, Spark, and Trino.
As a result, any data written by an external writer was silently dropped from the snapshot on the next iceberg-go fast-append. Queries against the table after the append returned only the iceberg-go-written rows; all previously existing data became invisible.
A fast-append never removes or overwrites data files, so the correct behaviour is to inherit all manifests from the parent snapshot unconditionally. Remove the filter and return previous.Manifests() directly.
Fixes: data loss when appending to an Iceberg table that was previously written by Athena or other external writers.
Tested: new TestFastAppendInheritsZeroCountManifests reproduces the bug (FAIL before patch, PASS after) and the full ./table/... suite passes with no regressions.
laskoviymishka (Contributor) approved these changes on Apr 10, 2026
This is quite a nasty bug. The fix is correct and aligns with Java's FastAppend.apply(), which unconditionally inherits all parent manifests via snapshot.allManifests(ops().io()). The removed filter has no equivalent in the reference implementation.
A few follow-ups to park for later:
- Document HasAddedFiles() / HasExistingFiles(): these answer "did this snapshot add/track files via this manifest list entry?", not "does this manifest contain live files?". The naming is a semantic trap that caused this bug. Adding a clarifying doc comment on the ManifestFile interface would prevent future misuse.
- Test coverage for mergeAppendFiles: it embeds fastAppendFiles and inherits the fix, but there's no test exercising zero-count inherited manifests through the merge pipeline. Worth adding.
- End-to-end test with real manifest entries: the current test asserts manifest list completeness (correct), but doesn't write actual Avro manifest files. An integration-style test that writes entries, fast-appends, then reads back all entries would catch any read-path issues with inherited manifests.
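For the first follow-up, the doc comments could look roughly like the sketch below. This is only an illustration: countedManifest and listEntry are hypothetical names invented for this comment, not the real ManifestFile interface or its implementation, and the wording of the comments is a suggestion rather than agreed text.

```go
package main

// countedManifest is a hypothetical, trimmed-down interface used only to
// illustrate the suggested doc comments. It is NOT the real ManifestFile.
type countedManifest interface {
	// HasAddedFiles reports whether the manifest list entry carries a
	// non-zero added-files count. A false result does not mean the
	// manifest holds no live data files, so it must not be used to
	// decide whether a manifest can be dropped from a snapshot.
	HasAddedFiles() bool

	// HasExistingFiles reports whether the manifest list entry carries a
	// non-zero existing-files count. The same caveat applies: the count
	// describes the list entry, not the liveness of the manifest's files.
	HasExistingFiles() bool
}

// listEntry is a hypothetical implementation backed by the two counters.
type listEntry struct {
	addedFilesCount    int32
	existingFilesCount int32
}

func (e listEntry) HasAddedFiles() bool    { return e.addedFilesCount != 0 }
func (e listEntry) HasExistingFiles() bool { return e.existingFilesCount != 0 }

// Compile-time check that listEntry satisfies the sketched interface.
var _ countedManifest = listEntry{}
```

A zero-value listEntry returns false from both methods, which is exactly the case the comments warn about.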
Problem
fastAppendFiles.existingManifests() filters parent manifests using HasAddedFiles() || HasExistingFiles(). HasAddedFiles() returns AddedFilesCount != 0 and HasExistingFiles() returns ExistingFilesCount != 0 (v2 manifest file). Both return false when a manifest list entry has added_files_count=0 and existing_files_count=0, which is the standard Iceberg v2 representation for inherited manifests written by external writers (Athena, Spark, Trino, etc.).
As a result, any data written by an external writer is silently dropped from the snapshot on the next iceberg-go fast-append. After the append, queries return only the iceberg-go-written rows; all previously existing data becomes invisible.
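The effect of that predicate can be sketched in isolation. The snippet below is a simplified model, not the iceberg-go source: manifestEntry and filterByCounts are hypothetical names, and only the two counters relevant to the bug are modelled.

```go
package main

// manifestEntry is a hypothetical stand-in for one manifest list entry.
// It is not the iceberg-go ManifestFile type.
type manifestEntry struct {
	Path               string
	AddedFilesCount    int32
	ExistingFilesCount int32
}

// HasAddedFiles mirrors the check described above: AddedFilesCount != 0.
func (m manifestEntry) HasAddedFiles() bool { return m.AddedFilesCount != 0 }

// HasExistingFiles mirrors the check described above: ExistingFilesCount != 0.
func (m manifestEntry) HasExistingFiles() bool { return m.ExistingFilesCount != 0 }

// filterByCounts models the shape of the problematic filter: a parent
// manifest is kept only if at least one of the two counters is non-zero.
func filterByCounts(parent []manifestEntry) []manifestEntry {
	var kept []manifestEntry
	for _, m := range parent {
		if m.HasAddedFiles() || m.HasExistingFiles() {
			kept = append(kept, m)
		}
	}
	return kept
}
```

Passing two entries whose counters are both zero yields an empty result, so every parent manifest of that kind disappears from the new snapshot even though nothing was deleted.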
Root Cause Confirmed
Reproduction: create a table with Athena, insert 2 rows, append 1 row with iceberg-go. After the append the parent snapshot's manifest list entries have added_files_count=0, existing_files_count=0. Both manifests are filtered out. Athena queries the new snapshot and sees only the 1 iceberg-go row.
Diagnostic output from the new test (before fix):
This was discovered and confirmed during a production Iceberg table remediation at Docker. See docker/data-platform#406 for the full investigation.
Fix
A fast-append never removes or overwrites data files, so all parent manifests should be inherited unconditionally. Remove the filter and return previous.Manifests() directly.
Testing
- TestFastAppendInheritsZeroCountManifests reproduces the bug (FAIL before patch, PASS after): creates two zero-count manifests simulating an Athena-written snapshot, fast-appends one new data file, asserts all 3 manifests are present in the resulting snapshot.
- The full ./table/... suite passes with no regressions.
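The intended post-fix behaviour, and the shape of the assertion in the new test, can be sketched with a simplified model. The names manifestEntry and inheritAll are hypothetical and exist only for this illustration; the real change simply returns previous.Manifests(). The ordering of inherited versus new manifests below is an arbitrary choice of the sketch.

```go
package main

// manifestEntry is a hypothetical stand-in for one manifest list entry.
// It is not the iceberg-go ManifestFile type.
type manifestEntry struct {
	Path               string
	AddedFilesCount    int32
	ExistingFilesCount int32
}

// inheritAll models a fast-append after the fix: every parent manifest is
// carried over regardless of its counters, and the manifests written by
// the append are added alongside them.
func inheritAll(parent, added []manifestEntry) []manifestEntry {
	out := make([]manifestEntry, 0, len(parent)+len(added))
	out = append(out, parent...) // no filtering on counters
	out = append(out, added...)
	return out
}
```

With two zero-count parent manifests and one newly written manifest, inheritAll returns three entries, which matches the "all 3 manifests are present" assertion described above.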