CASSANALYTICS-24 and CASSANALYTICS-25 SSTable version based bridge determination by skoppu22 · Pull Request #169 · apache/cassandra-analytics

skoppu22 · 2026-02-10T17:12:02Z

Jira tasks: CASSANALYTICS-24 and CASSANALYTICS-25

This PR adds sstable version based bridge determination feature to C* analytics. This feature allows C* analytics to choose bridge (which C* version jar to load) based on sstable versions existing on the cluster, instead of strictly depending on running C* versions. This is needed especially to support C* 5.0, which supports multiple compatibility modes.

CI run: https://app.circleci.com/pipelines/github/skoppu22/cassandra-analytics/37/workflows/da025a29-00c0-41f2-95e8-63c2b7f1f276

sarankk

Thanks Shailaja, took a first pass. Reviewed half the files.

sarankk · 2026-02-10T23:53:57Z

cassandra-analytics-common/src/main/java/org/apache/cassandra/bridge/CassandraVersion.java

        this.name = name;
        this.jarBaseName = jarBaseName;
        this.sstableFormats = new HashSet<>(Arrays.asList(sstableFormats));
+        this.nativeSStableVersions = Collections.unmodifiableList(Arrays.asList(nativeSStableVersions));


Nit

Suggested change

this.nativeSStableVersions = Collections.unmodifiableList(Arrays.asList(nativeSStableVersions));

this.nativeSStableVersions = List.of(nativeSStableVersions);

sarankk · 2026-02-19T06:07:05Z

cassandra-analytics-common/src/main/java/org/apache/cassandra/bridge/CassandraVersion.java

+    THREEZERO(30, "3.0", "three-zero", new String[]{"big"},
+              new String[]{
+                  // Cassandra 3.x native sstable versions
+                  // order is important, used to determine the latest version


Array based ordering will be hard to debug in case of issues and hard to maintain as well.

After reading SSTableVersionAnalyzer we could do away without the internal ordering between the sstable versions. Since we are still interested in the associated cassandra version. We could treat the sstable versions within the Cassandra version on same level. Wdyt?

Yes, regardless of which sstable version found in a given Cassandra version, we choose that cassandra version as long as there is no sstable version from higher cassandra version. So the order of sstable versions within a cassandra version doesn't matter. Updated the sorting logic not to depend on array index.

sarankk · 2026-03-24T17:58:11Z

...andra-analytics-common/src/main/java/org/apache/cassandra/bridge/SSTableVersionAnalyzer.java

+
+/**
+ * Analyzes SSTable versions on a cluster to determine the appropriate
+ * Cassandra bridge to load for bulk write operations.


We are using version analyzer for both read and write

Suggested change

* Cassandra bridge to load for bulk write operations.

* Cassandra bridge to load for bulk read/write operations.

sarankk · 2026-03-24T18:31:47Z

...andra-analytics-common/src/main/java/org/apache/cassandra/bridge/SSTableVersionAnalyzer.java

+            if (!v2Opt.isPresent())
+            {
+                throw new IllegalArgumentException(
+                    String.format("Unknown SSTable version: %s. Cannot determine Cassandra version.", v2));


Instead of throwing even if one format is incorrect, we could log these as errors and throw in the end if no max version is found.

We should error and abort job if encountered unrecognized sstable versions. We can log error and ask the user to run the job using fallback mechanism if the user thinks it is safe to run the job. I added error to log this.

sarankk · 2026-03-24T19:54:53Z

cassandra-analytics-common/src/main/java/org/apache/cassandra/bridge/CassandraVersion.java

+
+        // Calculate previous major version: (majorVersion - 1) * 10
+        // E.g., 5 -> 40, 4 -> 30, 3 -> 20
+        return (majorVersion - 1) * 10;


Nit: How about we keep an ordered list of CassandraVersion enums and return the previous one, instead of computing the version.

C* 5.0 supports can read C* 4.0, 4.1, 4.2 ...
C* 4.x can read C* 3.0, 3.1, ..
We cannot add all minor versions and won't be able to keep up with C* release cycle if we do so. Also this PR goal is to make C* analytics independent of C* version numbers. Hence dynamically calculating this makes analytics independent of C* releases.

sarankk · 2026-03-24T20:01:08Z

cassandra-analytics-common/src/test/java/org/apache/cassandra/bridge/CassandraVersionTest.java

+    void testGetSSTableVersionIndexValidVersion()
+    {
+        int index = CassandraVersion.FOURZERO.getSSTableVersionIndex("big-na");
+        assertThat(index).isEqualTo(0);


We could avoid the index checks if we treat the SSTable versions within a Cassandra version on the same level.

Removed this test

sarankk · 2026-03-24T20:29:00Z

...a-analytics-common/src/test/java/org/apache/cassandra/bridge/SSTableVersionAnalyzerTest.java

+public class SSTableVersionAnalyzerTest
+{
+    @Test
+    void testDetermineBridgeVersionForWriteFallbackDisabledSingleVersion()


Nit: Could we make these tests parameterized, passing in expected result as well?

sarankk · 2026-03-24T20:33:21Z

...andra-analytics-common/src/main/java/org/apache/cassandra/bridge/SSTableVersionAnalyzer.java

+        if (sstableVersionsOnCluster == null || sstableVersionsOnCluster.isEmpty())
+        {
+            throw new IllegalStateException(String.format(
+                "Unable to retrieve SSTable versions from cluster. " +


Why not support fallback here and return Cassandra version based bridge?

Here we are asking user to run the job using fallback mechanism if they thinks that's appropriate. We do not use fallback mechanism when spark.cassandra_analytics.bridge.disable_sstable_version_based is false. There must be a reason why sstable version based bridge selection is failing. User can evaluate the cluster state and try with fallback mechanism only if they think that's fine for their use case or cluster situation.

sarankk · 2026-03-24T20:37:11Z

cassandra-analytics-sidecar-client/src/main/java/org/apache/cassandra/clients/Sidecar.java

+                                                                                     Set<SidecarInstance> instances)
+    {
+        return instances.stream()
+                .map(instance -> client


Nit: alignment

sarankk · 2026-03-24T20:38:23Z

cassandra-analytics-sidecar-client/src/main/java/org/apache/cassandra/clients/Sidecar.java

+     * @param instances all Sidecar instances
+     * @return completable futures with GossipInfoResponse
+     */
+    public static List<CompletableFuture<GossipInfoResponse>> gossipInfoFromAllNodes(SidecarClient client,


Not for addressing in this PR, but would be good to have a cluster wide gossip call available in Sidecar.

sarankk · 2026-03-24T21:17:02Z

...ics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/CassandraBulkWriterContext.java

+        return CassandraClusterInfo.getLowestCassandraVersion(conf, null);
+    }
+
+    @Override
+    protected Set<String> getSSTableVersionsOnCluster(@NotNull BulkSparkConf conf)
+    {
+        return CassandraClusterInfo.getSSTableVersionsOnCluster(conf, null);
+    }
+
+    @Override
+    protected ClusterInfo buildClusterInfo(CassandraVersion bridgeVersion)
+    {
+        return new CassandraClusterInfo(bulkSparkConf(), bridgeVersion);


We seem to buildCassandraContext thrice among these 3 method, each cassandra context initializes its own sidecar client. We should avoid this and combine them.

Looking into the possibility, seems difficult as we need to build context once and store somewhere to reuse it.

Fixed this. There is a chicken and egg problem here. Context is needed to determine the bridge, and context contains determined bridge version. Hence creating preliminary context initially (with null bridge version) which is used for bridge determination, then we create final context with the bridge value. Also made changes to reuse nodesettings.

sarankk · 2026-03-24T21:21:28Z

...rg/apache/cassandra/spark/bulkwriter/cloudstorage/coordinated/CassandraClusterInfoGroup.java

+        for (String clusterId : coordinatedWriteConf.clusters().keySet())
+        {
+            Set<String> sstableVersions = CassandraClusterInfo.getSSTableVersionsOnCluster(conf, clusterId);
+            aggregatedSSTableVersions.addAll(sstableVersions);


the check we do comparing lowest and highest Cassandra version in getLowestCassandraVersion is missing when we get bridge based on SSTable versions.

We do that above in getLowestCassandraVersion function, line 393 sorts versions, line 396 picks the lowest and line 397 picks the highest.
Whereas this function is getSSTableVersionsOnCluster, to get all sstable versions on the cluster. Then bridge determination logic sorts them, picks highest version and determines bridge version accordingly.

skoppu22 added 8 commits February 9, 2026 12:41

bridge determination

28b813a

fix checkstyle

e582376

set bridge version on executors

67ac531

refactor

58f86ca

Merge branch 'apache:trunk' into bridged

940e610

Merge branch 'apache:trunk' into bridged

05a503e

Merge branch 'trunk' into bridged

899313b

Merge branch 'trunk' into bridged

9984f24

sarankk reviewed Mar 24, 2026

View reviewed changes

skoppu22 added 5 commits March 25, 2026 19:20

comments

1944668

fix build

c127fbf

reuse context

8e9c69a

Merge branch 'trunk' into bridged

5c15f4c

fix broken merge

5d164ca

	this.nativeSStableVersions = Collections.unmodifiableList(Arrays.asList(nativeSStableVersions));
	this.nativeSStableVersions = List.of(nativeSStableVersions);

	* Cassandra bridge to load for bulk write operations.
	* Cassandra bridge to load for bulk read/write operations.

Conversation

skoppu22 commented Feb 10, 2026

Uh oh!

sarankk left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants