Skip to content

fix(table): fast-append must inherit all parent manifests unconditionally#869

Draft
cassio-paesleme wants to merge 1 commit intoapache:mainfrom
cassio-paesleme:fix/fast-append-drops-zero-count-manifests-v2
Draft

fix(table): fast-append must inherit all parent manifests unconditionally#869
cassio-paesleme wants to merge 1 commit intoapache:mainfrom
cassio-paesleme:fix/fast-append-drops-zero-count-manifests-v2

Conversation

@cassio-paesleme
Copy link
Copy Markdown

Problem

fastAppendFiles.existingManifests() filters parent manifests using:

if m.HasAddedFiles() || m.HasExistingFiles() || m.SnapshotID() == fa.base.snapshotID {
    existing = append(existing, m)
}

HasAddedFiles() returns AddedFilesCount != 0 and HasExistingFiles() returns ExistingFilesCount != 0 (v2 manifest file). Both return false when a manifest list entry has added_files_count=0 and existing_files_count=0, which is the standard Iceberg v2 representation for inherited manifests written by external writers (Athena, Spark, Trino, etc.).

As a result, any data written by an external writer is silently dropped from the snapshot on the next iceberg-go fast-append. After the append, queries return only the iceberg-go-written rows; all previously existing data becomes invisible.

Root Cause Confirmed

Reproduction: create a table with Athena, insert 2 rows, append 1 row with iceberg-go. After the append the parent snapshot's manifest list entries have added_files_count=0, existing_files_count=0. Both manifests are filtered out. Athena queries the new snapshot and sees only the 1 iceberg-go row.

Diagnostic output from the new test (before fix):

Parent has 2 manifests:
[0] snapshot_id=... added_files=0 existing_files=0 HasAdded=false HasExisting=false
[1] snapshot_id=... added_files=0 existing_files=0 HasAdded=false HasExisting=false

This was discovered and confirmed during a production Iceberg table remediation at Docker. See docker/data-platform#406 for the full investigation.

Fix

A fast-append never removes or overwrites data files, so all parent manifests should be inherited unconditionally. Remove the filter and return previous.Manifests() directly.

Testing

  • New test TestFastAppendInheritsZeroCountManifests reproduces the bug (FAIL before patch, PASS after): creates two zero-count manifests simulating an Athena-written snapshot, fast-appends one new data file, asserts all 3 manifests are present in the resulting snapshot.
  • Full ./table/... suite passes with no regressions.

…ally

fastAppendFiles.existingManifests() filtered parent manifests using
HasAddedFiles() || HasExistingFiles(). Both methods return false when the
manifest list entry has added_files_count=0 and existing_files_count=0,
which is the standard Iceberg v2 representation for inherited manifests
written by external writers such as Athena, Spark, and Trino.

As a result, any data written by an external writer was silently dropped
from the snapshot on the next iceberg-go fast-append. Queries against
the table after the append returned only the iceberg-go-written rows;
all previously existing data became invisible.

A fast-append never removes or overwrites data files, so the correct
behaviour is to inherit all manifests from the parent snapshot
unconditionally. Remove the filter and return previous.Manifests()
directly.

Fixes: data loss when appending to an Iceberg table that was previously
written by Athena or other external writers.

Tested: new TestFastAppendInheritsZeroCountManifests reproduces the bug
(FAIL before patch, PASS after) and the full ./table/... suite passes
with no regressions.
Copy link
Copy Markdown
Contributor

@laskoviymishka laskoviymishka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite nasty bug.

:shipit: , fix is correct and aligns with Java's FastAppend.apply() which unconditionally inherits all parent manifests via snapshot.allManifests(ops().io()). The removed filter has no equivalent in the reference implementation.

A few follow-ups to park for later:

  1. Doc HasAddedFiles() / HasExistingFiles() — these answer "did this snapshot add/track files via this manifest list entry?", not "does this manifest contain live files?". The naming is a semantic trap that caused this bug. Adding a clarifying doc comment on the ManifestFile interface would prevent future misuse.
  2. Test coverage for mergeAppendFiles — it embeds fastAppendFiles and inherits the fix, but there's no test exercising zero-count inherited manifests through the merge pipeline. Worth adding.
  3. End-to-end test with real manifest entries — current test asserts manifest list completeness (correct), but doesn't write actual Avro manifest files. An integration-style test that writes entries, fast-appends, then reads back all entries would catch any read-path issues with inherited manifests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants