
CASSANALYTICS-122: Use long for absolute times and support C* 5.0 extended localDeletionTime #183

Merged
yifan-c merged 12 commits into apache:trunk from skoppu22:ltime
Mar 26, 2026

Conversation

@skoppu22
Contributor

@skoppu22 skoppu22 commented Mar 17, 2026

  • Integers cannot hold absolute times beyond the year 2038
  • C* 5.0 fixed this by supporting an extended localDeletionTime. Cassandra Analytics needs to support this too, especially for CDC events.
  • Changed deletedAt in the Avro schema to long

CI run here : https://app.circleci.com/pipelines/github/skoppu22/cassandra-analytics/75/workflows/f6d5f03b-8373-4549-8daf-5f776f78ff96
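The 2038 limit comes from storing epoch seconds in a 32-bit int. A minimal illustration of the overflow (the constant below is 2040-01-01T00:00:00Z in epoch seconds, chosen as an arbitrary post-2038 instant):

```java
public class Y2038Demo
{
    public static void main(String[] args)
    {
        long y2040 = 2_208_988_800L;   // 2040-01-01T00:00:00Z as epoch seconds
        // Integer.MAX_VALUE seconds corresponds to 2038-01-19T03:14:07Z;
        // narrowing anything later wraps around to a negative value.
        int truncated = (int) y2040;
        System.out.println(truncated); // -2085978496
    }
}
```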

@skoppu22 skoppu22 marked this pull request as draft March 17, 2026 17:02
@skoppu22 skoppu22 marked this pull request as ready for review March 18, 2026 10:39
Contributor

@jyothsnakonisa jyothsnakonisa left a comment

Thanks for working on this, Shailaja. Looks good to me; I left a few minor comments.

Contributor

Do you wanna revert these changes or are these intentional?

Contributor Author

Intentional; the CI pipelines were failing otherwise.

Contributor

This class looks very similar to "LocalTableSchemaStore". Can you check whether you can use that instead of adding this new class? You could make small modifications to LocalTableSchemaStore if required.

Contributor Author

LocalTableSchemaStore and the new Avro transformer tests are in different modules. LocalTableSchemaStore loads avsc files from the classpath, which don't exist in these other modules. Hence I'm using TestSchemaStore, which doesn't load any schema files and instead allows schemas to be registered.
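A minimal sketch of the registration-based approach being described (hypothetical shape; the real TestSchemaStore in the PR may differ):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: instead of loading .avsc files from the classpath
// (which these other modules don't have), tests register schema JSON directly.
public class TestSchemaStore
{
    private final Map<String, String> schemasByName = new HashMap<>();

    public void register(String name, String schemaJson)
    {
        schemasByName.put(name, schemaJson);
    }

    public String get(String name)
    {
        String schema = schemasByName.get(name);
        if (schema == null)
        {
            throw new IllegalStateException("No schema registered for: " + name);
        }
        return schema;
    }
}
```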

Comment on lines +67 to +73
          CORE_MAX_PARALLEL_FORKS: 2
          CORE_TEST_MAX_HEAP_SIZE: "2048m"
          CASSANDRA_USE_JDK11: <<parameters.use_jdk11>>
        command: |
          export GRADLE_OPTS="-Xmx2g -Dorg.gradle.jvmargs=-Xmx2g"
          # Run compile/unit tests, skipping integration tests
-         ./gradlew --stacktrace clean assemble check -x cassandra-analytics-integration-tests:test -Dcassandra.analytics.bridges.sstable_format=<<parameters.sstable_format>>
+         ./gradlew --no-daemon --max-workers=2 --stacktrace clean assemble check -x cassandra-analytics-integration-tests:test -Dcassandra.analytics.bridges.sstable_format=<<parameters.sstable_format>>
Contributor

Instead of making CI pipeline changes, can you try to tag the relevant tests with @Tag("Sequential")?

--no-daemon flag makes sense.
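A sketch of what the tagging suggestion could look like on the Gradle side (the task name `sequentialTest` is an assumption; the real build files may differ):

```groovy
// Run tests tagged "Sequential" in their own single-forked task,
// and exclude them from the parallel default test task.
tasks.register('sequentialTest', Test) {
    useJUnitPlatform { includeTags 'Sequential' }
    maxParallelForks = 1
}

test {
    useJUnitPlatform { excludeTags 'Sequential' }
}
```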

Contributor Author

  • We don't know which tests are causing this, so we can't tag them. Gradle is getting killed while executing the pipelines starting with spark3-*, which run the unit tests (gradlew check). The pipelines starting with int-* run fine. This problem exists in trunk and was not created by this PR. Running CI with --info didn't give any clues, as it bloated the logs and CI killed Gradle.
  • And we will have the same problem of finding the culprit every time CI is broken.

Contributor

> This problem exists in trunk, not created by this PR

All commits are made only after CI is green, so I do not think the problem pre-exists in trunk. It is likely related to the new tests. You will have to dig into the details to determine which test should be tagged with Sequential. The general idea is to run the resource-consuming tests sequentially. @lukasz-antoniak might have some tips.

> we will have the same problem of finding the culprit every time CI is broken

I share the same concern. This is why I am conservative about modifying the CI configuration, given that the CI issue is probably related to the current patch.

Contributor Author

  • I see the same failures on trunk as well, without any changes: https://app.circleci.com/pipelines/github/skoppu22/cassandra-analytics?branch=trunk
    I am seeing this consistently on trunk; not sure if the CircleCI backend for the Europe region has a different setup by any chance.

  • Finding the test that went into trunk and caused this will be a separate task and takes time. We won't be able to merge this PR until we do that, as my trunk pipelines also never turn green without this fix.

Contributor

I'm with @skoppu22 on this one; we should band-aid here to get things working again and open another JIRA to follow up and fix it the right way. Rather than blocking all work on root-causing a regression from another commit, we should either:
a. git revert that commit, or
b. apply a band-aid workaround with a follow-up JIRA to unblock work

In my humble opinion. :)

Contributor

I am good with having follow-up work to address the root cause.

My reason was that all the prior commits were only merged with green CI; this is the policy we have been following. Admittedly, there could be reruns and some flakiness. If the build starts to fail consistently, it could mean something specific to this patch.

Contributor

Did the 5.0 support w/parameterized testing go in w/out green CI?

Contributor Author

@yifan-c definitely something went in without green CI, or something changed on the CircleCI side. As in the link I shared above, trunk pipelines are consistently failing with the same problem.

Contributor Author

Something that went in between March 2nd and March 13th broke this. My trunk pipeline didn't have this problem on March 2nd, then it was consistently failing from March 13th.

Comment on lines +284 to +291
    public static Map<String, Long> getTTL(CdcEvent event)
    {
        CdcEvent.TimeToLive ttl = event.getTtl();
        if (ttl == null)
        {
            return null;
        }
-       return mapOf(AvroConstants.TTL_KEY, ttl.ttlInSec, AvroConstants.DELETED_AT_KEY, ttl.expirationTimeInSec);
+       return mapOf(AvroConstants.TTL_KEY, (long) ttl.ttlInSec, AvroConstants.DELETED_AT_KEY, ttl.expirationTimeInSec);
Contributor

Why convert the TTL to long when the schema has it as int?
The change is probably unnecessary.

Contributor Author

Here we are creating a map with one entry (TTL_KEY, the TTL value as int) and a second entry (DELETED_AT_KEY, the expirationTimeInSec value as long), so we have to use the wider of int and long. Otherwise we would need to return Map<String, Number> so that Number can represent both int and long.
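The widening constraint can be seen in a minimal sketch (this generic `mapOf` is a hypothetical stand-in for the helper used in the snippet above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TtlMapSketch
{
    // Hypothetical stand-in for the mapOf helper: a generic pair-wise builder.
    static <K, V> Map<K, V> mapOf(K k1, V v1, K k2, V v2)
    {
        Map<K, V> map = new LinkedHashMap<>();
        map.put(k1, v1);
        map.put(k2, v2);
        return map;
    }

    public static void main(String[] args)
    {
        int ttlInSec = 600;                        // schema keeps TTL as int
        long expirationTimeInSec = 2_208_988_800L; // post-2038, needs a long
        // Without the (long) cast, ttlInSec autoboxes to Integer and the call
        // cannot be typed as Map<String, Long>; widening to long fixes the mismatch.
        Map<String, Long> ttl = mapOf("ttl", (long) ttlInSec, "deletedAt", expirationTimeInSec);
        System.out.println(ttl); // {ttl=600, deletedAt=2208988800}
    }
}
```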


@ParameterizedTest
@MethodSource("org.apache.cassandra.cdc.test.TestVersionSupplier#testVersions")
public void testBasicInsertAvroEncoding(CassandraVersion version)
Contributor

The file adds a few irrelevant test cases. Improving test coverage is in general a good thing. So I am fine with those test cases in this file.

Contributor Author

We have updated two Avro schema files, and there were no existing tests going through the Avro transformers, hence I had to add them. The testMaxSupportedTtlAvroEncoding test below in this file verifies that deletedAt is a long and that the max TTL is supported as expected on both C* 4.x and 5.0.

-maxParallelForks = Math.max(Runtime.runtime.availableProcessors() * 2, 8)
 maxHeapSize = System.getenv('CORE_TEST_MAX_HEAP_SIZE') ?: '3072m'
+maxParallelForks = System.getenv('CORE_MAX_PARALLEL_FORKS')?.toInteger()
+                   ?: Math.max(Runtime.runtime.availableProcessors() * 2, 8)
Contributor

Please try out the Sequential test tag first.

Contributor Author

@skoppu22 skoppu22 Mar 19, 2026

As I mentioned above, the pipelines are broken on trunk, we don't know which test caused that, and gradlew check is getting killed on CI.

Dataset<Row> preExpiry = bulkReaderDataFrame(UDT_TTL_TABLE).load();
assertThat(preExpiry.collectAsList()).hasSize(ROW_COUNT);

Uninterruptibles.sleepUninterruptibly(80, TimeUnit.SECONDS);
Contributor

Please find a different way and avoid sleeping the test for 80 seconds. We cannot have the CI sleep for 80 seconds.

If there is no other way, please delete all the tests in this file. They are irrelevant to the patch anyway.

Contributor Author

These tests are relevant. CqlSet, CqlMap, etc. are implemented in the C* 4.0 bridge and extended (i.e., reused) in the C* 5.0 bridge. While the 4.0 bridge expects int, the 5.0 bridge expects long. I have modified these files below to use CqlType's method, which is implemented separately for both bridges. And there are no existing tests for collection TTLs going through this modified code path.

Contributor

Just to be clear, I was not arguing the test is not valuable :p

The point is that we should not sleep for 80 seconds. Can you instead try to write data using a timestamp in the past?

Contributor Author

I have merged all 4 tests into one and reduced the sleep to 5 seconds. That way the entire test file sleeps for at most 5 seconds.

* Tests verifying Y2038 boundary behavior for {@link CqlType#tombstone} and {@link CqlType#expiring}
* in Cassandra 4.0.
*/
class CqlTypeY2038Test
Contributor

I do not think this test is valuable. Analytics/CDC does not set TTL, it only reads the TTL values from Cassandra. The TTL values are guaranteed to be valid already; otherwise the cql write fails already. Please delete this file.

Contributor Author

@skoppu22 skoppu22 Mar 19, 2026

We are just ensuring that the checkedCast code we added indeed complains when the time exceeds the range an int can accommodate. But the test can be removed, as we are sure checkedCast does the job.

Contributor

The reason for removal is mentioned in #183 (comment), and similarly in this comment too, #183 (comment)

Please remove the test.

Contributor Author

This one was removed in the previous commit

Comment on lines -115 to 117
-           rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, type().serialize(o),
-                                                  CellPath.create(ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes()))));
+           rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, type().serialize(o),
+                                               CellPath.create(ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes()))));
        }
Contributor

I think this change can be reverted once you drop CqlTypeY2038Test and revert the change in CqlType

Contributor Author

BufferCell.expiring in the 4.0 bridge expects an int for nowInSec, while the 5.0 bridge expects a long. CqlList as inherited by the C* 5.0 bridge doesn't override this method to pass a long. Hence I have modified this to use CqlType's method, which is implemented separately for both bridges and uses the appropriate type in the corresponding bridge. Without this change, we would have to override this method again in the 5.0 bridge to pass a long value.
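The per-bridge dispatch being described can be sketched roughly as follows (hypothetical names; Math.toIntExact stands in for the checkedCast mentioned elsewhere in the review):

```java
// Hypothetical sketch: callers always pass nowInSec as a long; the 4.0 bridge
// narrows it with a range check (pre-2038 only), the 5.0 bridge keeps the long.
public class BridgeSketch
{
    interface TimeBridge
    {
        String expiring(long nowInSec);
    }

    static class FourZeroBridge implements TimeBridge
    {
        public String expiring(long nowInSec)
        {
            // Throws ArithmeticException for values past 2038-01-19 (int overflow)
            int narrowed = Math.toIntExact(nowInSec);
            return "int localDeletionTime=" + narrowed;
        }
    }

    static class FiveZeroBridge implements TimeBridge
    {
        public String expiring(long nowInSec)
        {
            return "long localDeletionTime=" + nowInSec;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(new FourZeroBridge().expiring(1_700_000_000L));
        System.out.println(new FiveZeroBridge().expiring(2_208_988_800L));
    }
}
```

With this shape, collection types like CqlList and CqlMap can call the shared abstraction and let each bridge pick its own cell representation.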

Comment on lines -130 to +131
-           rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, valueType().serialize(entry.getValue()),
-                                                  CellPath.create(keyType().serialize(entry.getKey()))));
+           rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, valueType().serialize(entry.getValue()),
+                                               CellPath.create(keyType().serialize(entry.getKey()))));
Contributor

can be reverted too

Contributor Author

@skoppu22 skoppu22 Mar 19, 2026

CqlMap is implemented in the C* 4.0 bridge and not extended (i.e., it is reused as-is) in the C* 5.0 bridge. While the 4.0 bridge expects an int for nowInSec, the 5.0 bridge expects a long. I have modified this to use CqlType's method, which is implemented separately for both bridges and uses the appropriate type in the corresponding bridge. Without this change, we would have to implement CqlMap again in the 5.0 bridge and override this function to pass a long value.

Comment on lines -114 to +115
-           rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, ByteBufferUtil.EMPTY_BYTE_BUFFER,
-                                                  CellPath.create(type().serialize(o))));
+           rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, ByteBufferUtil.EMPTY_BYTE_BUFFER,
+                                               CellPath.create(type().serialize(o))));
Contributor

can revert

Contributor Author

Same as explained above for CqlMap

CHANGES.txt Outdated
@@ -1,5 +1,6 @@
0.4.0
-----
* Fix year 2038 problem using long for absolute times and support C* 5.0 extended localDeletionTime
Contributor

The year 2038 problem sounds cool, but I feel it overshadows the actual change in this patch.

I would revise the change log entry to "Support extended deletion time in CDC for Cassandra 5.0" and update the JIRA title too.

Contributor Author

OK

0.4.0
-----
* Setup CI Pipeline with GitHub Actions (CASSANALYTICS-106)
* Support extended deletion time in CDC for Cassandra 5.0
Contributor

👍 to the removal. GitHub Actions is not user facing


@yifan-c yifan-c merged commit 58ea38d into apache:trunk Mar 26, 2026
37 of 38 checks passed


4 participants