CASSANALYTICS-147: BufferingInputStream fails to read last chunk by lukasz-antoniak · Pull Request #193 · apache/cassandra-analytics

lukasz-antoniak · 2026-04-02T11:44:04Z

…chunk

lukasz-antoniak · 2026-04-02T11:56:54Z

...analytics-core/src/test/java/org/apache/cassandra/spark/utils/BufferingInputStreamTests.java


        int bytesToRead = chunkSize * numChunks;
-        long skipAhead = size - bytesToRead + 1;
+        long skipAhead = size - bytesToRead;


I am not sure how the change in BufferingInputStream would affect skip(), which is used during BIG index reading. All integration tests pass, though, and I think this unit test is just a simulation.
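To make the arithmetic behind the changed line concrete, here is a minimal sketch. The class name, the helper, and the numbers (a 1000-byte source, 100-byte chunks) are assumptions for illustration only and are not taken from the test. It shows that skipping `size - bytesToRead + 1` bytes leaves one byte fewer than `bytesToRead`, while skipping `size - bytesToRead` leaves exactly `bytesToRead`.

```java
// Hypothetical helper, not part of cassandra-analytics: it models only the offset arithmetic.
final class SkipAheadSketch
{
    /** Bytes left to read after skipping skipAhead bytes of a source that is size bytes long. */
    static long remainingAfterSkip(long size, long skipAhead)
    {
        return size - skipAhead;
    }

    public static void main(String[] args)
    {
        long size = 1000;    // assumed source size in bytes
        int chunkSize = 100; // assumed chunk size in bytes
        int numChunks = 3;
        int bytesToRead = chunkSize * numChunks;

        long oldSkipAhead = size - bytesToRead + 1; // the removed line
        long newSkipAhead = size - bytesToRead;     // the added line

        System.out.println(remainingAfterSkip(size, oldSkipAhead)); // prints 299
        System.out.println(remainingAfterSkip(size, newSkipAhead)); // prints 300
    }
}
```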

lukasz-antoniak · 2026-04-02T12:00:22Z

...a-four-zero-bridge/src/test/java/org/apache/cassandra/io/util/CdcRandomAccessReaderTest.java


            // Deliver data in chunks until request is fulfilled
-            while (position < actualEnd)
+            while (position <= actualEnd) // range boundaries are inclusive


According to the JavaDoc below, ranges should be considered inclusive.

    /**
     * Asynchronously request bytes for the SSTable file component in the range start-end, and pass on to the StreamConsumer when available.
     * The start-end range is inclusive.
     *
     * @param start    the start of the bytes range
     * @param end      the end of the bytes range
     * @param consumer the StreamConsumer to return the bytes to when the request is complete
     */
    void request(long start, long end, StreamConsumer consumer);
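A minimal sketch of what an inclusive range means for a chunked delivery loop, assuming an in-memory byte array as the source. `InclusiveRangeSketch`, `deliver`, and `totalBytes` are hypothetical names, not the test's actual mock. With an exclusive `position < actualEnd` condition, a one-byte request such as `start == end` would deliver nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a chunked source; only the loop bounds mirror the diff above.
final class InclusiveRangeSketch
{
    /** Delivers bytes start..end (both inclusive) of data in chunks of at most chunkSize bytes. */
    static List<byte[]> deliver(byte[] data, long start, long end, int chunkSize)
    {
        List<byte[]> chunks = new ArrayList<>();
        long actualEnd = Math.min(end, data.length - 1L); // clamp to the last valid index
        long position = start;
        while (position <= actualEnd) // inclusive: the byte at actualEnd is delivered too
        {
            int length = (int) Math.min(chunkSize, actualEnd - position + 1);
            chunks.add(Arrays.copyOfRange(data, (int) position, (int) position + length));
            position += length;
        }
        return chunks;
    }

    /** Sums the lengths of all delivered chunks. */
    static int totalBytes(List<byte[]> chunks)
    {
        int total = 0;
        for (byte[] chunk : chunks)
        {
            total += chunk.length;
        }
        return total;
    }
}
```

An inclusive request for `start..end` should therefore yield `end - start + 1` bytes in total, regardless of how the chunks are split.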

arjunashok · 2026-04-02T18:54:38Z

It seems that FileSystemSource.request() has a workaround (lines 94–99) that compensated for the old off-by-one in BufferingInputStream.requestMore().
The comment on line 97 reads: "Start-end range is inclusive but on the final request end == length so we need to exclude".

With this fix, end is now at most source.size() - 1, and length equals source.size(), so length <= end is always false.
As a result:

- The code always takes the increment = 1 path, which happens to produce the correct end - start + 1, so there is no data corruption, but it is confusing dead code.
- More importantly, close is never true, so the autoClose path in the finally block never triggers.
- For sequential reads via FileSystemSSTable (non-BTI format, where autoClose = true), the RandomAccessFile handle is never closed after the last chunk, which is a file descriptor leak.
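The reasoning above can be sketched in isolation. This is a hypothetical reconstruction based only on the description in this comment, not the actual FileSystemSource code: `close`, `bytesRead`, and `everCloses` are illustrative names, and the chunked walk assumes each request's end is clamped to `length - 1`.

```java
// Hypothetical model of the check described in the review; not the real FileSystemSource.
final class FinalRequestSketch
{
    /** The described check: a request counts as final only when end reaches length. */
    static boolean close(long length, long end)
    {
        return length <= end;
    }

    /** Bytes read for one request, following the increment logic described above. */
    static long bytesRead(long start, long end, long length)
    {
        int increment = close(length, end) ? 0 : 1;
        return end - start + increment;
    }

    /** True if any request in a chunked walk with end clamped to length - 1 sets close. */
    static boolean everCloses(long length, int chunkSize)
    {
        boolean closed = false;
        for (long start = 0; start < length; start += chunkSize)
        {
            long end = Math.min(start + chunkSize - 1, length - 1);
            closed |= close(length, end);
        }
        return closed;
    }
}
```

In this model, a pre-fix final request with `end == length` sets `close`, while no request whose end is clamped to `length - 1` ever does, even though both read the same number of bytes.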

arjunashok · 2026-04-02T18:53:25Z

...analytics-core/src/test/java/org/apache/cassandra/spark/utils/BufferingInputStreamTests.java

    }

+    @Test
+    public void testUnalignedEndReading() throws IOException


Minor: We might want to assert that returnedBuffers.size() == 2 to catch regressions where extra or missing requests are issued.
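A sketch of why a request count makes a useful regression check, assuming a source that spans one full chunk plus a partial one. The sizes and the `planRequests` helper are hypothetical and do not come from the test. Each planned request is an inclusive `{start, end}` pair.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: lists the inclusive ranges a chunked reader would request.
final class RequestCountSketch
{
    /** Returns one {start, end} pair (end inclusive) per chunk needed to cover size bytes. */
    static List<long[]> planRequests(long size, int chunkSize)
    {
        List<long[]> requests = new ArrayList<>();
        for (long start = 0; start < size; start += chunkSize)
        {
            long end = Math.min(start + chunkSize, size) - 1; // last index of this chunk
            requests.add(new long[]{ start, end });
        }
        return requests;
    }
}
```

For an assumed 150-byte source and 100-byte chunks this plans exactly two requests, 0..99 and 100..149. A count other than two would indicate an extra or missing request.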

CASSANALYTICS-147: BufferingInputStream fails to read last unaligned …

38f0ac7

…chunk

lukasz-antoniak marked this pull request as ready for review April 2, 2026 11:44

lukasz-antoniak added 2 commits April 3, 2026 09:28

Apply review comments

68f3b13

Apply review comments

ffa112c

lukasz-antoniak wants to merge 3 commits into apache:trunk from lukasz-antoniak:CASSANALYTICS-147
