CASSANALYTICS-122: Use long for absolute times and support C* 5.0 extended localDeletionTime (#183)
yifan-c merged 12 commits into apache:trunk
Conversation
jyothsnakonisa
left a comment
Thanks for working on this, Shailaja. Looks good to me; left a few minor comments.
Do you wanna revert these changes or are these intentional?
Intentional, CI pipelines failing otherwise.
cassandra-four-zero-bridge/src/test/java/org/apache/cassandra/spark/data/CqlTypeY2038Test.java
This class looks very similar to LocalTableSchemaStore; can you check if you can use that instead of adding this new class? You could make small modifications in LocalTableSchemaStore if required.
LocalTableSchemaStore and the new Avro transformer tests are in different modules. LocalTableSchemaStore loads avsc files from the class path, which don't exist in these other modules. Hence using TestSchemaStore instead, which doesn't load any schema files but allows registering schemas.
  CORE_MAX_PARALLEL_FORKS: 2
  CORE_TEST_MAX_HEAP_SIZE: "2048m"
  CASSANDRA_USE_JDK11: <<parameters.use_jdk11>>
  command: |
    export GRADLE_OPTS="-Xmx2g -Dorg.gradle.jvmargs=-Xmx2g"
    # Run compile/unit tests, skipping integration tests
-   ./gradlew --stacktrace clean assemble check -x cassandra-analytics-integration-tests:test -Dcassandra.analytics.bridges.sstable_format=<<parameters.sstable_format>>
+   ./gradlew --no-daemon --max-workers=2 --stacktrace clean assemble check -x cassandra-analytics-integration-tests:test -Dcassandra.analytics.bridges.sstable_format=<<parameters.sstable_format>>
Instead of making CI pipeline changes, can you try to tag the relevant tests with @Tag("Sequential")?
--no-daemon flag makes sense.
- We don't know which tests are causing this, so we can't tag them. Gradle is getting killed while executing pipelines starting with spark3-*, which run the unit tests (gradlew check). Pipelines starting with int-* are running fine. This problem exists in trunk and is not created by this PR. Running CI with --info didn't give any clues, as it was bloating the logs and CI was killing Gradle.
- And we will have the same problem of finding the culprit every time CI is broken.
This problem exists in trunk, not created by this PR
All commits are made only after CI is green, so I do not think the problem pre-exists in trunk; it is likely related to the new tests. You will have to dig into the details to determine which test should be tagged with Sequential. The general idea is to run the resource-consuming tests sequentially. @lukasz-antoniak might have some tips.
we will have the same problem of finding the culprit everytime CI is broken
I share the same concern. This is why I am conservative on modifying the CI configuration, given that the CI issue is probably relevant to the current patch.
- I see the same failures on trunk as well without any changes: https://app.circleci.com/pipelines/github/skoppu22/cassandra-analytics?branch=trunk I am seeing this consistently on trunk; not sure if the CircleCI backend for the Europe region has a different setup by any chance.
- Finding the test that went into trunk and caused this will be a separate task and will take time. We won't be able to merge this PR until we do that, as my trunk pipelines also never turn green without this fix.
I'm with @skoppu22 on this one; we should band-aid here to get things working again and open another JIRA to follow up and fix it the right way. Blocking all work on root causing a regression from another commit should either be solved by:
a. git revert on that commit, or
b. a band-aid workaround w/a follow up JIRA to unblock work
In my humble opinion. :)
I am good with having follow up work to address the root cause.
My reason was that all the prior commits were only merged with green CI. This is the policy we have been following. Admittedly, there could be reruns and some flakiness. If the build starts to fail consistently, it could mean something specific to this patch.
Did the 5.0 support w/parameterized testing go in w/out green CI?
@yifan-c definitely something went in without green CI, or something changed on the CircleCI side. As I shared in the link above, trunk pipelines are consistently failing with the same problem.
Something that went in between March 2nd and March 13th broke this. My trunk pipeline didn't have this problem on March 2nd, then has been failing consistently since March 13th.
  public static Map<String, Long> getTTL(CdcEvent event)
  {
      CdcEvent.TimeToLive ttl = event.getTtl();
      if (ttl == null)
      {
          return null;
      }
-     return mapOf(AvroConstants.TTL_KEY, ttl.ttlInSec, AvroConstants.DELETED_AT_KEY, ttl.expirationTimeInSec);
+     return mapOf(AvroConstants.TTL_KEY, (long) ttl.ttlInSec, AvroConstants.DELETED_AT_KEY, ttl.expirationTimeInSec);
Why converting TTL to long, meanwhile the schema has it as int?
The change is probably unnecessary.
Here we are creating a map: one entry is (TTL_KEY, the TTL value as an int) and the second is (DELETED_AT_KEY, the expirationTimeInSec value as a long), so we have to use the wider of int and long. Otherwise we would need to return Map<String, Number> so that Number can represent both int and long.
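The widening described above can be sketched in plain Java. This is a minimal illustration, not the project's actual code; the key constants here are hypothetical stand-ins for AvroConstants:

```java
import java.util.HashMap;
import java.util.Map;

public class TtlMapSketch {
    // Hypothetical stand-ins for AvroConstants.TTL_KEY / DELETED_AT_KEY
    static final String TTL_KEY = "ttl";
    static final String DELETED_AT_KEY = "deletedAt";

    // A single Map<String, Long> must hold both values; since the
    // expiration time is a long, the int TTL is widened with a cast.
    static Map<String, Long> ttlMap(int ttlInSec, long expirationTimeInSec) {
        Map<String, Long> map = new HashMap<>();
        map.put(TTL_KEY, (long) ttlInSec);       // int widened to long
        map.put(DELETED_AT_KEY, expirationTimeInSec);
        return map;
    }

    public static void main(String[] args) {
        // 4102444800 is past the Y2038 int boundary, hence the long values
        Map<String, Long> m = ttlMap(86400, 4102444800L);
        System.out.println(m.get(TTL_KEY) + " " + m.get(DELETED_AT_KEY));
    }
}
```

Without the cast, `map.put(TTL_KEY, ttlInSec)` would not compile against `Map<String, Long>`, which is why the alternative would be the less precise `Map<String, Number>`.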
  @ParameterizedTest
  @MethodSource("org.apache.cassandra.cdc.test.TestVersionSupplier#testVersions")
  public void testBasicInsertAvroEncoding(CassandraVersion version)
The file adds a few irrelevant test cases. Improving test coverage is in general a good thing. So I am fine with those test cases in this file.
We have updated two Avro schema files, and there are no existing tests going through the Avro transformers, hence we had to add them. The testMaxSupportedTtlAvroEncoding test below in this file verifies that deletedAt is a long and that the max TTL is supported as expected on both C* 4.x and 5.0.
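For illustration only, a hedged sketch of what a widened deletedAt field in an Avro record schema could look like (the field name, nullability, and default here are assumptions; the actual avsc files in the repo are authoritative):

```json
{
  "name": "deletedAt",
  "type": ["null", "long"],
  "default": null
}
```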
- maxParallelForks = Math.max(Runtime.runtime.availableProcessors() * 2, 8)
+ maxHeapSize = System.getenv('CORE_TEST_MAX_HEAP_SIZE') ?: '3072m'
+ maxParallelForks = System.getenv('CORE_MAX_PARALLEL_FORKS')?.toInteger()
+         ?: Math.max(Runtime.runtime.availableProcessors() * 2, 8)
Please try out the Sequential test tag first.
As I mentioned above, pipelines are broken on trunk, we don't know which test caused that, and gradlew check is getting killed on CI.
  Dataset<Row> preExpiry = bulkReaderDataFrame(UDT_TTL_TABLE).load();
  assertThat(preExpiry.collectAsList()).hasSize(ROW_COUNT);

  Uninterruptibles.sleepUninterruptibly(80, TimeUnit.SECONDS);
Please find a different way and avoid sleeping the test for 80 seconds. We cannot have the CI sleep for 80 seconds.
If there is no other way, please delete all the tests in this file. They are irrelevant to the patch anyway.
These tests are relevant. CqlSet, CqlMap, etc. are implemented in the C* 4.0 bridge and extended (i.e., reused) in the C* 5.0 bridge. While the 4.0 bridge expects an int, the 5.0 bridge expects a long. I have modified the files below to use CqlType's method, which is implemented separately for both bridges. And there are no existing tests for collections TTL going through this modified code path.
Just to be clear, I was not arguing the test is not valuable :p
The point is that we should not sleep for 80 seconds. Can you instead try to write data using a timestamp in the past?
I have merged all 4 tests into one and reduced the sleep to 5 seconds. That way the entire test file sleeps for at most 5 seconds.
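Where some waiting is unavoidable, one common way to keep it minimal is to poll the expiry condition with a deadline instead of sleeping for a fixed duration. A self-contained plain-Java sketch (no test library such as Awaitility is assumed to be on the classpath):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class AwaitExpiry {
    // Poll a condition until it holds or the deadline passes, instead of a
    // fixed sleepUninterruptibly(...). Returns true if the condition became
    // true before the timeout, so the test wakes up as soon as data expires.
    static boolean await(BooleanSupplier condition, long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(100); // short poll interval keeps the wait close to minimal
        }
        return condition.getAsBoolean(); // final check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // The condition flips true after ~300 ms; await returns shortly after,
        // rather than sleeping for the full 5-second budget.
        boolean ok = await(() -> System.currentTimeMillis() - start > 300,
                           5, TimeUnit.SECONDS);
        System.out.println(ok);
    }
}
```

The timeout then acts as an upper bound rather than a fixed cost, which matters when the suite runs under CI time pressure.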
   * Tests verifying Y2038 boundary behavior for {@link CqlType#tombstone} and {@link CqlType#expiring}
   * in Cassandra 4.0.
   */
  class CqlTypeY2038Test
I do not think this test is valuable. Analytics/CDC does not set TTL, it only reads the TTL values from Cassandra. The TTL values are guaranteed to be valid already; otherwise the cql write fails already. Please delete this file.
We are just ensuring that the checkedCast code we added indeed complains when the time exceeds the range an int can accommodate. But the test can be removed, as we are sure checkedCast does the job.
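For context, a guard of this kind typically behaves like Math.toIntExact. A hedged sketch of such a checkedCast (the real helper in the patch may differ in name and signature):

```java
public class CheckedCastSketch {
    // Narrow a long localDeletionTime to int, failing loudly instead of
    // silently wrapping past the Y2038 boundary.
    static int checkedCast(long value) {
        if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            throw new ArithmeticException("value out of int range: " + value);
        }
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(checkedCast(2147483647L)); // Integer.MAX_VALUE, fine
        try {
            checkedCast(4102444800L); // a post-2038 epoch second, out of int range
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

A plain `(int) value` cast would wrap to a negative number instead of throwing, which is exactly the silent failure mode the guard exists to prevent.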
The reason for removal is mentioned in #183 (comment), and similarly in this comment too, #183 (comment)
Please remove the test.
This one was removed in the previous commit
- rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, type().serialize(o),
-                                         CellPath.create(ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes()))));
+ rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, type().serialize(o),
+                                     CellPath.create(ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes()))));
  }
I think this change can be reverted once you drop CqlTypeY2038Test and revert the change in CqlType
BufferCell.expiring in the 4.0 bridge expects an int for nowInSec; the 5.0 bridge expects a long. CqlList in the C* 5.0 bridge doesn't override this method to pass a long. Hence I have modified this to use CqlType's method, which is implemented separately for both bridges and uses the appropriate type in each. Without this change, we would have to override this method again in the 5.0 bridge to pass a long value.
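The per-bridge dispatch described here can be sketched roughly as follows. The class and method names are simplified assumptions for illustration, not the project's actual signatures:

```java
public class BridgeDispatchSketch {
    // A shared hook each bridge implements with the nowInSec width its
    // Cassandra version expects (4.0: int internally, 5.0: long).
    interface Bridge {
        String expiring(long timestamp, int ttl, long nowInSec);
    }

    static class FourZeroBridge implements Bridge {
        public String expiring(long timestamp, int ttl, long nowInSec) {
            // The 4.0 bridge narrows to int internally, matching the
            // int-based BufferCell.expiring signature of that version
            return "4.0 cell nowInSec=" + (int) nowInSec;
        }
    }

    static class FiveZeroBridge implements Bridge {
        public String expiring(long timestamp, int ttl, long nowInSec) {
            // The 5.0 bridge keeps the long, supporting extended localDeletionTime
            return "5.0 cell nowInSec=" + nowInSec;
        }
    }

    public static void main(String[] args) {
        Bridge b40 = new FourZeroBridge();
        Bridge b50 = new FiveZeroBridge();
        System.out.println(b40.expiring(1L, 60, 1700000000L));
        System.out.println(b50.expiring(1L, 60, 4102444800L));
    }
}
```

Callers such as CqlList then invoke the shared hook and never see the width difference, which is the point of routing through CqlType instead of calling BufferCell.expiring directly.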
- rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, valueType().serialize(entry.getValue()),
-                                         CellPath.create(keyType().serialize(entry.getKey()))));
+ rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, valueType().serialize(entry.getValue()),
+                                     CellPath.create(keyType().serialize(entry.getKey()))));
CqlMap is implemented in the C* 4.0 bridge and not extended (i.e., it is reused as-is) in the C* 5.0 bridge. While the 4.0 bridge expects an int for nowInSec, the 5.0 bridge expects a long for the same. I have modified this to use CqlType's method, which is implemented separately for both bridges and uses the appropriate type in each. Without this change, we would have to implement CqlMap again in the 5.0 bridge and override this function to pass a long value.
- rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, ByteBufferUtil.EMPTY_BYTE_BUFFER,
-                                         CellPath.create(type().serialize(o))));
+ rowBuilder.addCell(CqlType.expiring(cd, timestamp, ttl, now, ByteBufferUtil.EMPTY_BYTE_BUFFER,
+                                     CellPath.create(type().serialize(o))));
Same as explained above for CqlMap
CHANGES.txt (outdated)
@@ -1,5 +1,6 @@
  0.4.0
  -----
+ * Fix year 2038 problem using long for absolute times and support C* 5.0 extended localDeletionTime
The year 2038 problem sounds cool, but I feel it overshadows the actual change in this patch.
I would revise the change log entry to "Support extended deletion time in CDC for Cassandra 5.0" and update the JIRA title too.
  0.4.0
  -----
- * Setup CI Pipeline with GitHub Actions (CASSANALYTICS-106)
  * Support extended deletion time in CDC for Cassandra 5.0
👍 to the removal. GitHub Actions is not user-facing.
deletedAt in Avro schema changed to long.
CI run here: https://app.circleci.com/pipelines/github/skoppu22/cassandra-analytics/75/workflows/f6d5f03b-8373-4549-8daf-5f776f78ff96