CASSANALYTICS-122: Use long for absolute times and support C* 5.0 extended localDeletionTime #183
Changes from all commits
cdfccd1
5baec3e
a2180fb
7459d2e
b15f491
2bacb9b
4890d64
fc56cca
30d895f
dc9d184
a3e32bd
3c02705
```diff
@@ -64,10 +64,13 @@ commands:
           JDK_VERSION: "<<parameters.jdk>>"
           INTEGRATION_MAX_PARALLEL_FORKS: 1
           INTEGRATION_MAX_HEAP_SIZE: "1500M"
+          CORE_MAX_PARALLEL_FORKS: 2
+          CORE_TEST_MAX_HEAP_SIZE: "2048m"
           CASSANDRA_USE_JDK11: <<parameters.use_jdk11>>
         command: |
+          export GRADLE_OPTS="-Xmx2g -Dorg.gradle.jvmargs=-Xmx2g"
           # Run compile/unit tests, skipping integration tests
-          ./gradlew --stacktrace clean assemble check -x cassandra-analytics-integration-tests:test -Dcassandra.analytics.bridges.sstable_format=<<parameters.sstable_format>>
+          ./gradlew --no-daemon --max-workers=2 --stacktrace clean assemble check -x cassandra-analytics-integration-tests:test -Dcassandra.analytics.bridges.sstable_format=<<parameters.sstable_format>>
```
Comment on lines +67 to +73

Contributor: Instead of making CI pipeline changes, can you try to tag the relevant tests with
Contributor: All commits are made only after CI is green. I do not think the problem pre-exists in trunk; it is likely relevant to the new tests. You will have to dig into the details to determine which test should be tagged with

I share the same concern. This is why I am conservative about modifying the CI configuration, given that the CI issue is probably relevant to the current patch.
Contributor: I'm with @skoppu22 on this one; we should band-aid here to get things working again and open another JIRA to follow up and fix it the right way. Blocking all work on root-causing a regression from another commit should either be solved by: In my humble opinion. :)
Contributor: I am good with having follow-up work to address the root cause. My reasoning was that all the prior commits were merged only with green CI. This is the policy we have been following. Admittedly, there could be reruns and some flakiness. If the build starts to fail consistently, it could mean something specific to this patch.
Contributor: Did the 5.0 support w/parameterized testing go in w/out green CI?
Author: @yifan-c definitely something went in without green CI, or something changed on the Circle CI side. As I shared in the link above, trunk pipelines are consistently failing with the same problem.
Author: Something that went in between March 2nd and March 13th broke this. My trunk pipeline didn't have this problem on March 2nd, then it started consistently failing from March 13th.
```diff
       run_integration:
         parameters:
@@ -93,10 +96,11 @@ commands:
           INTEGRATION_MAX_HEAP_SIZE: "2500M"
           CASSANDRA_USE_JDK11: <<parameters.use_jdk11>>
         command: |
+          export GRADLE_OPTS="-Xmx2g -Dorg.gradle.jvmargs=-Xmx2g"
           export DTEST_JAR="dtest-<< parameters.cassandra >>.jar"
           export CASSANDRA_VERSION=$(echo << parameters.cassandra >> | cut -d'.' -f 1,2)
           # Run compile but not unit tests (which are run in run_build)
-          ./gradlew --stacktrace clean assemble
+          ./gradlew --no-daemon --max-workers=2 --stacktrace clean assemble
           # Run integration tests in parallel
           cd cassandra-analytics-integration-tests/src/test/java
           # Get list of classnames of tests that should run on this node
```
```diff
@@ -132,7 +136,7 @@ jobs:
             name: Build dependencies for jdk11 builds
             command: |
               CASSANDRA_USE_JDK11=true ./scripts/build-dependencies.sh
-              ./gradlew codeCheckTasks
+              ./gradlew --no-daemon --max-workers=2 codeCheckTasks
         - persist_to_workspace:
             root: dependencies
             paths:
```
```diff
@@ -1,6 +1,6 @@
 0.4.0
 -----
-* Setup CI Pipeline with GitHub Actions (CASSANALYTICS-106)
+* Support extended deletion time in CDC for Cassandra 5.0
 * Flush event consumer before persisting CDC state to prevent data loss on failure (CASSANALYTICS-126)
 * Fix ReadStatusTracker to distinguish clean completion from error termination in BufferingCommitLogReader (CASSANALYTICS-129)
 * Adding CDC support for Cassandra 5.0 Commit Logs (CASSANALYTICS-60)
```

Contributor: 👍 to the removal. GitHub Actions is not user facing.
```diff
@@ -281,14 +281,14 @@ public static GenericData.Record getTTLAvro(CdcEvent event, Schema ttlSchema)
         return ttlRecord;
     }

-    public static Map<String, Integer> getTTL(CdcEvent event)
+    public static Map<String, Long> getTTL(CdcEvent event)
     {
         CdcEvent.TimeToLive ttl = event.getTtl();
         if (ttl == null)
         {
             return null;
         }
-        return mapOf(AvroConstants.TTL_KEY, ttl.ttlInSec, AvroConstants.DELETED_AT_KEY, ttl.expirationTimeInSec);
+        return mapOf(AvroConstants.TTL_KEY, (long) ttl.ttlInSec, AvroConstants.DELETED_AT_KEY, ttl.expirationTimeInSec);
     }

     public static UpdatedEvent getUpdatedEvent(CdcEvent event,
```

Comment on lines +284 to +291

Contributor: Why converting TTL to

Author: Here we are creating a map: one entry is (TTL_KEY, TTL value as int) and the second is (DELETED_AT_KEY, expirationTimeInSec as long), so we have to use the wider of int and long. Otherwise we would need to return Map<String, Number> so that Number can represent both int and long.
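The widening discussed above can be sketched in isolation. This is a minimal, hypothetical illustration (the `ttlMap` helper, key names, and values are stand-ins, not the project's actual `mapOf` or `AvroConstants`): the int-valued TTL is widened to `long` so both entries fit in a `Map<String, Long>`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TtlMapSketch
{
    // ttlInSec fits in an int, but the expiration timestamp may exceed
    // Integer.MAX_VALUE (C* 5.0 extended localDeletionTime), so the map
    // value type must be Long and the int TTL is widened on insertion.
    public static Map<String, Long> ttlMap(int ttlInSec, long expirationTimeInSec)
    {
        Map<String, Long> map = new LinkedHashMap<>();
        map.put("ttl", (long) ttlInSec);           // int widened to long
        map.put("deletedAt", expirationTimeInSec); // already a long
        return map;
    }

    public static void main(String[] args)
    {
        // An expiration after 2038-01-19 no longer fits in a signed 32-bit int
        Map<String, Long> map = ttlMap(3600, 2_400_000_000L);
        System.out.println(map.get("ttl") + " " + map.get("deletedAt"));
    }
}
```

The alternative the author mentions, `Map<String, Number>`, would avoid the cast but pushes the int-vs-long distinction onto every consumer of the map.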
```diff
@@ -145,7 +145,7 @@
         },
         {
             "name": "deletedAt",
-            "type": "int",
+            "type": "long",
             "doc": "Future timestamp in seconds"
         }
     ]
```
```diff
@@ -145,7 +145,7 @@
         },
         {
             "name": "deletedAt",
-            "type": "int",
+            "type": "long",
             "doc": "Future timestamp in seconds"
         }
     ]
```
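The schema change from `int` to `long` matters because Cassandra 5.0's extended `localDeletionTime` can represent deletion timestamps beyond the year-2038 limit of signed 32-bit epoch seconds. A quick stdlib-only sketch of the overflow that keeping `int` would cause (the example timestamp is hypothetical):

```java
public class DeletionTimeRange
{
    public static void main(String[] args)
    {
        // Largest epoch-seconds value a signed 32-bit int can hold:
        // 2147483647 == 2038-01-19T03:14:07Z
        System.out.println(Integer.MAX_VALUE);

        // A deletedAt landing past 2038, e.g. a long TTL applied today
        long extendedDeletionTime = 2_200_000_000L;

        // Narrowing to int silently wraps to a negative, nonsensical
        // timestamp, which is why the Avro field must be a long
        System.out.println((int) extendedDeletionTime);
    }
}
```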
New file (+178 lines): AvroByteRecordTransformerTest

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.cassandra.cdc;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.MethodSource;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.cassandra.bridge.CassandraVersion;
import org.apache.cassandra.bridge.CdcBridgeFactory;
import org.apache.cassandra.cdc.api.KeyspaceTypeKey;
import org.apache.cassandra.cdc.avro.AvroByteRecordTransformer;
import org.apache.cassandra.cdc.avro.AvroConstants;
import org.apache.cassandra.cdc.avro.AvroSchemas;
import org.apache.cassandra.cdc.avro.CqlToAvroSchemaConverter;
import org.apache.cassandra.cdc.avro.TestSchemaStore;
import org.apache.cassandra.cdc.msg.CdcEvent;
import org.apache.cassandra.cdc.test.CdcTestBase;
import org.apache.cassandra.cdc.test.CdcTester;
import org.apache.cassandra.spark.data.CqlField;
import org.apache.cassandra.spark.data.CqlTable;
import org.apache.cassandra.spark.utils.test.TestSchema;

import static org.apache.cassandra.cdc.test.CdcTester.testWith;
import static org.assertj.core.api.Assertions.assertThat;

/**
 * Tests that exercise the CDC-to-Avro byte-serialization pipeline ({@code cdc_bytes.avsc}).
 */
public class AvroByteRecordTransformerTest extends CdcTestBase
{
    private static final int NUM_ROWS = 50;

    private CqlToAvroSchemaConverter getConverter(CassandraVersion version)
    {
        CqlToAvroSchemaConverter converter = CdcBridgeFactory.getCqlToAvroSchemaConverter(version);
        assertThat(converter).isNotNull();
        return converter;
    }

    /**
     * Build the Avro byte transformer by converting the CQL table schema
     * into an Avro schema and registering it in the test schema store.
     */
    private AvroByteRecordTransformer buildTransformer(CqlToAvroSchemaConverter converter,
                                                       CqlTable cqlTable,
                                                       TestSchemaStore schemaStore)
    {
        Schema avroSchema = converter.convert(cqlTable);
        String namespace = cqlTable.keyspace() + "." + cqlTable.table();
        schemaStore.registerSchema(namespace, avroSchema);

        Function<KeyspaceTypeKey, CqlField.CqlType> typeLookup = key -> bridge.parseType(key.type);
        return new AvroByteRecordTransformer(schemaStore, typeLookup);
    }

    /**
     * Deserialize a byte payload using the table's Avro schema from the schema store.
     */
    private GenericRecord deserializePayload(ByteBuffer payloadBytes, GenericDatumReader<GenericRecord> reader) throws IOException
    {
        byte[] bytes = new byte[payloadBytes.remaining()];
        payloadBytes.get(bytes);
        return reader.read(null, DecoderFactory.get().binaryDecoder(bytes, null));
    }

    @ParameterizedTest
    @MethodSource("org.apache.cassandra.cdc.test.TestVersionSupplier#testVersions")
    public void testTtlDeletedAtByteAvroEncoding(CassandraVersion version)
    {
        AvroSchemas.registerLogicalTypes();
        CqlToAvroSchemaConverter converter = getConverter(version);

        int ttlSeconds = 3600;
        long beforeTestEpochSec = TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis());

        TestSchema.Builder schemaBuilder = TestSchema.builder(bridge)
                                                     .withPartitionKey("pk", bridge.uuid())
                                                     .withColumn("c1", bridge.bigint())
                                                     .withColumn("c2", bridge.text());

        AtomicReference<CqlTable> tableRef = new AtomicReference<>();
        testWith(bridge, cdcBridge, commitLogDir, schemaBuilder)
        .withNumRows(NUM_ROWS)
        .clearWriters()
        .withWriter((tester, rows, writer) -> {
            tableRef.set(tester.cqlTable);
            for (int i = 0; i < tester.numRows; i++)
            {
                TestSchema.TestRow testRow = CdcTester.newUniqueRow(tester.schema, rows);
                testRow.setTTL(ttlSeconds);
                writer.accept(testRow, TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis()));
            }
        })
        .withCdcEventChecker((testRows, events) -> {
            long afterTestEpochSec = TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis());
            assertThat(events).hasSize(NUM_ROWS);

            TestSchemaStore schemaStore = new TestSchemaStore();
            AvroByteRecordTransformer transformer = buildTransformer(converter, tableRef.get(), schemaStore);
            String namespace = tableRef.get().keyspace() + "." + tableRef.get().table();

            for (CdcEvent event : events)
            {
                GenericData.Record record = transformer.transform(event);
                assertThat(record).isNotNull();

                // Validate header-level fields
                assertThat(record.get(AvroConstants.SOURCE_TABLE_KEY).toString()).isEqualTo(event.table);
                assertThat(record.get(AvroConstants.SOURCE_KEYSPACE_KEY).toString()).isEqualTo(event.keyspace);
                assertThat(record.get(AvroConstants.OPERATION_TYPE_KEY).toString()).isEqualTo("INSERT");
                assertThat(record.get(AvroConstants.TIMESTAMP_KEY)).isNotNull();
                assertThat(record.get(AvroConstants.VERSION_KEY).toString()).isEqualTo(AvroConstants.CURRENT_VERSION);

                // TTL record should be present
                Object ttlField = record.get(AvroConstants.TTL_KEY);
                assertThat(ttlField).isNotNull();
                GenericRecord ttlRecord = (GenericRecord) ttlField;

                // Validate TTL value
                assertThat(ttlRecord.get(AvroConstants.TTL_KEY)).isEqualTo(ttlSeconds);

                // Validate deletedAt is a Long (confirms long type in cdc_bytes.avsc)
                Object deletedAt = ttlRecord.get(AvroConstants.DELETED_AT_KEY);
                assertThat(deletedAt).isInstanceOf(Long.class);

                // Validate deletedAt is approximately nowInSeconds + TTL
                long deletedAtValue = (Long) deletedAt;
                assertThat(deletedAtValue)
                .as("deletedAt should be approximately nowInSeconds + TTL")
                .isBetween(beforeTestEpochSec + ttlSeconds, afterTestEpochSec + ttlSeconds);

                // Validate payload: bytes in the byte schema need deserialization to verify content
                Object payloadObj = record.get(AvroConstants.PAYLOAD_KEY);
                assertThat(payloadObj).isInstanceOf(ByteBuffer.class);
                GenericRecord payloadRecord;
                try
                {
                    payloadRecord = deserializePayload((ByteBuffer) payloadObj,
                                                       schemaStore.getReader(namespace, null));
                }
                catch (IOException e)
                {
                    throw new RuntimeException("Failed to deserialize payload", e);
                }
                assertThat(payloadRecord.get("pk")).isNotNull();
            }
        })
        .run();
    }
}
```
Comment: Do you wanna revert these changes or are these intentional?

Author: Intentional, CI pipelines failing otherwise.