CASSANALYTICS-137: Add end to end test with BTI format sstable #185

Open
yifan-c wants to merge 3 commits into apache:trunk from yifan-c:CASSANALYTICS-137/bti-test

Conversation

@yifan-c
Contributor

@yifan-c yifan-c commented Mar 20, 2026

Patch by Yifan Cai; Reviewed by TBD for CASSANALYTICS-137

Contributor

@frankgh frankgh left a comment

looks good

{
    System.setProperty("spark.cassandra_analytics.cassandra.version", "5.0.0");
    System.setProperty("cassandra.analytics.bridges.sstable_format", "bti");
}
Contributor

Can we limit the test to C* 5+?

Suggested change
}
}
@Override
protected void beforeClusterProvisioning()
{
    assumeThat(SimpleCassandraVersion.create(testVersion.version()).major)
    .as("BTI is only supported in Cassandra 5+")
    .isGreaterThanOrEqualTo(MIN_VERSION_FOR_BTI);
}

Member

@lukasz-antoniak lukasz-antoniak Mar 21, 2026

Alternatively:

static
{
    System.setProperty("cassandra.analytics.bridges.sstable_format", "bti");
}

@Override
protected void beforeClusterProvisioning()
{
    assumeThat(CassandraVersion.fromVersion(TestUtils.getDTestClusterVersion().getValue())
                               .orElseThrow()
                               .supportedSstableFormats())
    .as("BTI sstable format is not supported")
    .contains("bti");
}

Contributor Author

Thank you both. I liked the check on the feature name (bti) better.

/**
 * A simple test that runs a sample read/write Cassandra Analytics job using BTI format SSTable.
 */
class CassandraAnalyticsSimpleBtiTest extends CassandraAnalyticsSimpleTest
Contributor

Can you please read the SSTables after the bulk write and verify that the data files use the bti-da format in their file names? I have a draft of this:


    /**
     * Verifies that all SSTable data files across all nodes in the cluster use the BTI format (bti-da).
     * BTI format data files follow the naming pattern: {@code da-<generation>-bti-Data.db}
     */
    private void verifySSTableFormat()
    {
        boolean foundDataFiles = false;
        for (int i = 1; i <= cluster.size(); i++)
        {
            if (cluster.get(i).isShutdown())
            {
                continue;
            }
            Set<String> dataFileNames = findSSTableDataFiles(i);
            for (String fileName : dataFileNames)
            {
                foundDataFiles = true;
                assertThat(fileName)
                    .as("SSTable data file should be in BTI format (bti-da) on node %d: %s", i, fileName)
                    .contains("-bti-");
                // Verify the version component is 'da' for Cassandra 5.0 BTI format
                assertThat(fileName)
                    .as("SSTable data file should have version 'da' for Cassandra 5.0 BTI format on node %d: %s",
                        i, fileName)
                    .matches("da-\\d+-bti-Data\\.db");
            }
        }
        assertThat(foundDataFiles)
            .as("Expected to find SSTable data files on at least one node")
            .isTrue();
    }

    /**
     * Finds all SSTable data files for the test table on the given cluster node.
     *
     * @param nodeIndex the 1-based node index in the cluster
     * @return set of data file names found
     */
    private Set<String> findSSTableDataFiles(int nodeIndex)
    {
        String[] dataDirs = (String[]) cluster.get(nodeIndex)
                                              .config()
                                              .getParams()
                                              .get("data_file_directories");
        String dataDir = dataDirs[0];
        Path keyspacePath = Paths.get(dataDir, TEST_KEYSPACE);

        if (!Files.exists(keyspacePath))
        {
            return Collections.emptySet();
        }

        try (Stream<Path> walkStream = Files.walk(keyspacePath))
        {
            return walkStream
                .filter(Files::isRegularFile)
                .map(path -> path.getFileName().toString())
                .filter(name -> name.endsWith("-Data.db"))
                .collect(Collectors.toSet());
        }
        catch (IOException e)
        {
            throw new RuntimeException("Failed to list SSTable data files on node " + nodeIndex, e);
        }
    }
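
For reference, the file-name check from the draft can be exercised on its own, without a cluster. The sketch below is only an illustration: the class name SSTableNameCheck and the method isDataFileOf are hypothetical, and it assumes data file names follow the layout <version>-<id>-<format>-Data.db. It also assumes the id part may be either a plain number or a lowercase alphanumeric identifier with underscores (as with UUID-based SSTable identifiers), which is looser than the \d+ used in the draft.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the patch under review.
public final class SSTableNameCheck
{
    // Assumed layout: <version>-<id>-<format>-Data.db
    // group(1) = two-letter version, group(2) = id, group(3) = format name
    private static final Pattern DATA_FILE =
        Pattern.compile("([a-z]{2})-([0-9a-z_]+)-([a-z]+)-Data\\.db");

    private SSTableNameCheck()
    {
    }

    /**
     * Returns true when fileName is a data file whose format and version
     * components equal the expected values, e.g. "bti" and "da".
     */
    public static boolean isDataFileOf(String fileName, String format, String version)
    {
        Matcher m = DATA_FILE.matcher(fileName);
        return m.matches()
               && m.group(1).equals(version)
               && m.group(3).equals(format);
    }

    public static void main(String[] args)
    {
        System.out.println(isDataFileOf("da-1-bti-Data.db", "bti", "da")); // prints "true"
        System.out.println(isDataFileOf("nb-1-big-Data.db", "bti", "da")); // prints "false"
    }
}
```

Checking the version and format as separate groups keeps the two assertions in the draft distinguishable in failure messages, while a single pattern still rejects names that do not end in -Data.db.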
