
CASSANALYTICS-122: Use long for absolute times and support C* 5.0 extended localDeletionTime #183

Merged
yifan-c merged 12 commits into apache:trunk from skoppu22:ltime
Mar 26, 2026

Conversation

@skoppu22
Contributor

@skoppu22 skoppu22 commented Mar 17, 2026

  • Integers cannot hold absolute times beyond the year 2038
  • C* 5.0 fixed this by supporting an extended localDeletionTime. Cassandra Analytics needs to support this too, especially for CDC events.
  • Changed deletedAt in the Avro schema to long

CI run here : https://app.circleci.com/pipelines/github/skoppu22/cassandra-analytics/75/workflows/f6d5f03b-8373-4549-8daf-5f776f78ff96
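The 2038 limit comes from storing epoch seconds in a 32-bit int. A minimal illustration of the overflow (the constant below is 2040-01-01T00:00:00Z in epoch seconds, chosen as an arbitrary post-2038 instant):

```java
public class Y2038Demo
{
    public static void main(String[] args)
    {
        long y2040 = 2_208_988_800L;   // 2040-01-01T00:00:00Z as epoch seconds
        // Integer.MAX_VALUE seconds corresponds to 2038-01-19T03:14:07Z;
        // narrowing anything later wraps around to a negative value.
        int truncated = (int) y2040;
        System.out.println(truncated); // -2085978496
    }
}
```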

@skoppu22 skoppu22 marked this pull request as draft March 17, 2026 17:02
@skoppu22 skoppu22 marked this pull request as ready for review March 18, 2026 10:39
Contributor

@jyothsnakonisa jyothsnakonisa left a comment

Thanks for working on this, Shailaja. Looks good to me; I left a few minor comments.

Contributor

Do you wanna revert these changes or are these intentional?

Contributor Author

Intentional; the CI pipelines were failing otherwise.

Contributor

This class looks very similar to "LocalTableSchemaStore". Can you check whether you can use that instead of adding this new class? You could make small modifications to LocalTableSchemaStore if required.

Contributor Author

LocalTableSchemaStore and the new Avro transformer tests are in different modules. LocalTableSchemaStore loads avsc files from the classpath, which don't exist in these other modules. Hence I'm using TestSchemaStore, which doesn't load any schema files and instead allows schemas to be registered.
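A minimal sketch of the registration-based approach being described (hypothetical shape; the real TestSchemaStore in the PR may differ):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: instead of loading .avsc files from the classpath
// (which these other modules don't have), tests register schema JSON directly.
public class TestSchemaStore
{
    private final Map<String, String> schemasByName = new HashMap<>();

    public void register(String name, String schemaJson)
    {
        schemasByName.put(name, schemaJson);
    }

    public String get(String name)
    {
        String schema = schemasByName.get(name);
        if (schema == null)
        {
            throw new IllegalStateException("No schema registered for: " + name);
        }
        return schema;
    }
}
```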

Comment on lines +67 to +73
          CORE_MAX_PARALLEL_FORKS: 2
          CORE_TEST_MAX_HEAP_SIZE: "2048m"
          CASSANDRA_USE_JDK11: <<parameters.use_jdk11>>
        command: |
          export GRADLE_OPTS="-Xmx2g -Dorg.gradle.jvmargs=-Xmx2g"
          # Run compile/unit tests, skipping integration tests
-         ./gradlew --stacktrace clean assemble check -x cassandra-analytics-integration-tests:test -Dcassandra.analytics.bridges.sstable_format=<<parameters.sstable_format>>
+         ./gradlew --no-daemon --max-workers=2 --stacktrace clean assemble check -x cassandra-analytics-integration-tests:test -Dcassandra.analytics.bridges.sstable_format=<<parameters.sstable_format>>
Contributor

Instead of making CI pipeline changes, can you try to tag the relevant tests with @Tag("Sequential")?

--no-daemon flag makes sense.
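A sketch of what the tagging suggestion could look like on the Gradle side (the task name `sequentialTest` is an assumption; the real build files may differ):

```groovy
// Run tests tagged "Sequential" in their own single-forked task,
// and exclude them from the parallel default test task.
tasks.register('sequentialTest', Test) {
    useJUnitPlatform { includeTags 'Sequential' }
    maxParallelForks = 1
}

test {
    useJUnitPlatform { excludeTags 'Sequential' }
}
```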

Contributor Author

  • We don't know which tests are causing this, so we can't tag them. Gradle is getting killed while executing the pipelines starting with spark3-*, which run the unit tests (gradlew check). The pipelines starting with int-* run fine. This problem exists in trunk and was not created by this PR. Running CI with --info didn't give any clues, as it bloated the logs and CI killed Gradle.
  • And we will have the same problem of finding the culprit every time CI is broken.

Contributor

> This problem exists in trunk, not created by this PR

All commits are made only after CI is green, so I do not think the problem pre-exists in trunk. It is likely related to the new tests. You will have to dig into the details to determine which test should be tagged with Sequential. The general idea is to run the resource-consuming tests sequentially. @lukasz-antoniak might have some tips.

> we will have the same problem of finding the culprit every time CI is broken

I share the same concern. This is why I am conservative about modifying the CI configuration, given that the CI issue is probably related to the current patch.

Contributor Author

  • I see the same failures on trunk as well, without any changes: https://app.circleci.com/pipelines/github/skoppu22/cassandra-analytics?branch=trunk
    I am seeing this consistently on trunk; not sure if the CircleCI backend for the Europe region has a different setup by any chance.

  • Finding the test that went into trunk and caused this will be a separate task and takes time. We won't be able to merge this PR until we do that, as my trunk pipelines also never turn green without this fix.

Contributor

I'm with @skoppu22 on this one; we should band-aid here to get things working again and open another JIRA to follow up and fix it the right way. Rather than blocking all work on root-causing a regression from another commit, we should either:
a. git revert that commit, or
b. apply a band-aid workaround with a follow-up JIRA to unblock work

In my humble opinion. :)

Contributor

I am good with having follow-up work to address the root cause.

My reason was that all the prior commits were only merged with green CI; this is the policy we have been following. Admittedly, there could be reruns and some flakiness. If the build starts to fail consistently, it could mean something specific to this patch.

Contributor

Did the 5.0 support w/parameterized testing go in w/out green CI?

Contributor Author

@yifan-c definitely something went in without green CI, or something changed on the CircleCI side. As in the link I shared above, trunk pipelines are consistently failing with the same problem.

Contributor Author

Something that went in between March 2nd and March 13th broke this. My trunk pipeline didn't have this problem on March 2nd, then it was consistently failing from March 13th.

Comment on lines +284 to +291
    public static Map<String, Long> getTTL(CdcEvent event)
    {
        CdcEvent.TimeToLive ttl = event.getTtl();
        if (ttl == null)
        {
            return null;
        }
-       return mapOf(AvroConstants.TTL_KEY, ttl.ttlInSec, AvroConstants.DELETED_AT_KEY, ttl.expirationTimeInSec);
+       return mapOf(AvroConstants.TTL_KEY, (long) ttl.ttlInSec, AvroConstants.DELETED_AT_KEY, ttl.expirationTimeInSec);
Contributor

Why convert the TTL to long when the schema has it as int?
The change is probably unnecessary.

Contributor Author

Here we are creating a map with one entry (TTL_KEY, the TTL value as int) and a second entry (DELETED_AT_KEY, the expirationTimeInSec value as long), so we have to use the wider of int and long. Otherwise we would need to return Map<String, Number> so that Number can represent both int and long.
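The widening constraint can be seen in a minimal sketch (this generic `mapOf` is a hypothetical stand-in for the helper used in the snippet above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TtlMapSketch
{
    // Hypothetical stand-in for the mapOf helper: a generic pair-wise builder.
    static <K, V> Map<K, V> mapOf(K k1, V v1, K k2, V v2)
    {
        Map<K, V> map = new LinkedHashMap<>();
        map.put(k1, v1);
        map.put(k2, v2);
        return map;
    }

    public static void main(String[] args)
    {
        int ttlInSec = 600;                        // schema keeps TTL as int
        long expirationTimeInSec = 2_208_988_800L; // post-2038, needs a long
        // Without the (long) cast, ttlInSec autoboxes to Integer and the call
        // cannot be typed as Map<String, Long>; widening to long fixes the mismatch.
        Map<String, Long> ttl = mapOf("ttl", (long) ttlInSec, "deletedAt", expirationTimeInSec);
        System.out.println(ttl); // {ttl=600, deletedAt=2208988800}
    }
}
```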


@ParameterizedTest
@MethodSource("org.apache.cassandra.cdc.test.TestVersionSupplier#testVersions")
public void testBasicInsertAvroEncoding(CassandraVersion version)
Contributor

The file adds a few irrelevant test cases. Improving test coverage is in general a good thing. So I am fine with those test cases in this file.

Contributor Author

We have updated two Avro schema files, and there were no existing tests going through the Avro transformers, hence I had to add them. The testMaxSupportedTtlAvroEncoding test below in this file verifies that deletedAt is a long and that the max TTL is supported as expected on both C* 4.x and 5.0.

-maxParallelForks = Math.max(Runtime.runtime.availableProcessors() * 2, 8)
 maxHeapSize = System.getenv('CORE_TEST_MAX_HEAP_SIZE') ?: '3072m'
+maxParallelForks = System.getenv('CORE_MAX_PARALLEL_FORKS')?.toInteger()
+                   ?: Math.max(Runtime.runtime.availableProcessors() * 2, 8)
Contributor

Please try out the Sequential test tag first.

Contributor Author

@skoppu22 skoppu22 Mar 19, 2026

As I mentioned above, the pipelines are broken on trunk, we don't know which test caused that, and gradlew check is getting killed on CI.

Dataset<Row> preExpiry = bulkReaderDataFrame(UDT_TTL_TABLE).load();
assertThat(preExpiry.collectAsList()).hasSize(ROW_COUNT);

Uninterruptibles.sleepUninterruptibly(80, TimeUnit.SECONDS);
Contributor

Please find a different way and avoid sleeping the test for 80 seconds. We cannot have the CI sleep for 80 seconds.

If there is no other way, please delete all the tests in this file. They are irrelevant to the patch anyway.

Contributor Author

These tests are relevant. CqlSet, CqlMap, etc. are implemented in the C* 4.0 bridge and extended (i.e., reused) in the C* 5.0 bridge. While the 4.0 bridge expects int, the 5.0 bridge expects long. I have modified these files below to use CqlType's method, which is implemented separately for both bridges. And there are no existing tests for collection TTLs going through this modified code path.

Contributor

Just to be clear, I was not arguing the test is not valuable :p

The point is that we should not sleep for 80 seconds. Can you instead try to write data using a timestamp in the past?

Contributor Author

I have merged all 4 tests into one and reduced the sleep to 5 seconds. That way the entire test file sleeps for at most 5 seconds.

* Tests verifying Y2038 boundary behavior for {@link CqlType#tombstone} and {@link CqlType#expiring}
* in Cassandra 4.0.
*/
class CqlTypeY2038Test
Contributor

I do not think this test is valuable. Analytics/CDC does not set TTL, it only reads the TTL values from Cassandra. The TTL values are guaranteed to be valid already; otherwise the cql write fails already. Please delete this file.

Contributor Author

@skoppu22 skoppu22 Mar 19, 2026

We are just ensuring that the checkedCast code we added indeed complains when the time exceeds the range an int can accommodate. But the test can be removed, as we are sure checkedCast does the job.

Contributor

The reason for removal is mentioned in #183 (comment), and similarly in this comment too, #183 (comment)

Please remove the test.

Contributor Author

This one was removed in the previous commit

Comment on lines -115 to 117
-           rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, type().serialize(o),
-                                                  CellPath.create(ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes()))));
+           rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, type().serialize(o),
+                                               CellPath.create(ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes()))));
        }
Contributor

I think this change can be reverted once you drop CqlTypeY2038Test and revert the change in CqlType

Contributor Author

BufferCell.expiring in the 4.0 bridge expects an int for nowInSec, while the 5.0 bridge expects a long. CqlList as inherited by the C* 5.0 bridge doesn't override this method to pass a long. Hence I have modified this to use CqlType's method, which is implemented separately for both bridges and uses the appropriate type in the corresponding bridge. Without this change, we would have to override this method again in the 5.0 bridge to pass a long value.
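The per-bridge dispatch being described can be sketched roughly as follows (hypothetical names; Math.toIntExact stands in for the checkedCast mentioned elsewhere in the review):

```java
// Hypothetical sketch: callers always pass nowInSec as a long; the 4.0 bridge
// narrows it with a range check (pre-2038 only), the 5.0 bridge keeps the long.
public class BridgeSketch
{
    interface TimeBridge
    {
        String expiring(long nowInSec);
    }

    static class FourZeroBridge implements TimeBridge
    {
        public String expiring(long nowInSec)
        {
            // Throws ArithmeticException for values past 2038-01-19 (int overflow)
            int narrowed = Math.toIntExact(nowInSec);
            return "int localDeletionTime=" + narrowed;
        }
    }

    static class FiveZeroBridge implements TimeBridge
    {
        public String expiring(long nowInSec)
        {
            return "long localDeletionTime=" + nowInSec;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(new FourZeroBridge().expiring(1_700_000_000L));
        System.out.println(new FiveZeroBridge().expiring(2_208_988_800L));
    }
}
```

With this shape, collection types like CqlList and CqlMap can call the shared abstraction and let each bridge pick its own cell representation.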

Comment on lines -130 to +131
-           rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, valueType().serialize(entry.getValue()),
-                                                  CellPath.create(keyType().serialize(entry.getKey()))));
+           rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, valueType().serialize(entry.getValue()),
+                                               CellPath.create(keyType().serialize(entry.getKey()))));
Contributor

can be reverted too

Contributor Author

@skoppu22 skoppu22 Mar 19, 2026

CqlMap is implemented in the C* 4.0 bridge and not extended (i.e., it is reused as-is) in the C* 5.0 bridge. While the 4.0 bridge expects an int for nowInSec, the 5.0 bridge expects a long. I have modified this to use CqlType's method, which is implemented separately for both bridges and uses the appropriate type in the corresponding bridge. Without this change, we would have to implement CqlMap again in the 5.0 bridge and override this function to pass a long value.

Comment on lines -114 to +115
-           rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, ByteBufferUtil.EMPTY_BYTE_BUFFER,
-                                                  CellPath.create(type().serialize(o))));
+           rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, ByteBufferUtil.EMPTY_BYTE_BUFFER,
+                                               CellPath.create(type().serialize(o))));
Contributor

can revert

Contributor Author

Same as explained above for CqlMap

CHANGES.txt Outdated
@@ -1,5 +1,6 @@
0.4.0
-----
* Fix year 2038 problem using long for absolute times and support C* 5.0 extended localDeletionTime
Contributor

The year 2038 problem sounds cool, but I feel it overshadows the actual change in this patch.

I would revise the change log entry to "Support extended deletion time in CDC for Cassandra 5.0" and update the JIRA title too.

Contributor Author

OK

0.4.0
-----
* Setup CI Pipeline with GitHub Actions (CASSANALYTICS-106)
* Support extended deletion time in CDC for Cassandra 5.0
Contributor

👍 to the removal. GitHub Actions is not user facing


@yifan-c yifan-c merged commit 58ea38d into apache:trunk Mar 26, 2026
37 of 38 checks passed


4 participants