Skip to content

CASSANALYTICS-26: Support vector data type#144

Open
lukasz-antoniak wants to merge 6 commits intoapache:trunkfrom
lukasz-antoniak:CASSANALYTICS-26
Open

CASSANALYTICS-26: Support vector data type#144
lukasz-antoniak wants to merge 6 commits intoapache:trunkfrom
lukasz-antoniak:CASSANALYTICS-26

Conversation

@lukasz-antoniak
Copy link
Copy Markdown
Member

@lukasz-antoniak lukasz-antoniak marked this pull request as ready for review February 18, 2026 06:50

public abstract CqlField.CqlList list(CqlField.CqlType type);

public abstract CqlField.CqlVector vector(CqlField.CqlType type, int dimentions);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: dimentions -> dimensions

public abstract class CassandraTypes
{
public static final Pattern COLLECTION_PATTERN = Pattern.compile("^(set|list|map|tuple)<(.+)>$", Pattern.CASE_INSENSITIVE);
public static final Pattern VECTOR_PATTERN = Pattern.compile("^(vector)<(.+),(.*)>$", Pattern.CASE_INSENSITIVE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the last group (.*) could be improved. We do not want to match 0 characters, right?
I'd propose the pattern ^(vector)<(.+),(.+)>$.

Comment on lines +247 to +248
throw new UnsupportedOperationException("Only native, collection, tuples or UDT data types are supported, "
+ "unsupported data type: " + cqlType.toString());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for cassandra 5, vector is also supported, so the error message ("Only native, collection, tuples or UDT data types are supported") is misleading.

Comment on lines +120 to +130
for (Object o : (List<?>) value)
{
if (ttl != NO_TTL)
{
rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, type().serialize(o),
CellPath.create(TimeUUID.Generator.nextTimeUUID().toBytes())));
}
else
{
rowBuilder.addCell(BufferCell.live(cd, timestamp, type().serialize(o), randomCellPath()));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think vector is a single cell data type. But the for-loop adds multiple cells. Interesting that the tests are passing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. This method is only used for creating test mutations for CDC.


@ParameterizedTest
@MethodSource("org.apache.cassandra.bridge.VersionRunner#bridges")
public void testVector(CassandraBridge bridge)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this test uses bridge to write sstables to disk and doesn't test complete bulk writer and bulk reader path. Can you add atleast one dtest using bulkWriterDataFrameWriter and bulkReaderDataFrame to test e2e writing and reading of vectors?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure do we need a vector converter in SqlToCqlTypeConverter.java similar to what we did for Tuples in this PR https://github.com/apache/cassandra-analytics/pull/174/changes. A dtest using bulk writer and reader will expose it if a converter is needed

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good catch! Still working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants