CASSANALYTICS-26: Support vector data type#144
CASSANALYTICS-26: Support vector data type#144lukasz-antoniak wants to merge 6 commits intoapache:trunkfrom
Conversation
|
|
||
| public abstract CqlField.CqlList list(CqlField.CqlType type); | ||
|
|
||
| public abstract CqlField.CqlVector vector(CqlField.CqlType type, int dimentions); |
There was a problem hiding this comment.
typo: dimentions -> dimensions
| public abstract class CassandraTypes | ||
| { | ||
| public static final Pattern COLLECTION_PATTERN = Pattern.compile("^(set|list|map|tuple)<(.+)>$", Pattern.CASE_INSENSITIVE); | ||
| public static final Pattern VECTOR_PATTERN = Pattern.compile("^(vector)<(.+),(.*)>$", Pattern.CASE_INSENSITIVE); |
There was a problem hiding this comment.
I think the last group (.*) could be improved. We do not want to match 0 characters, right?
I'd propose the pattern ^(vector)<(.+),(.+)>$.
| throw new UnsupportedOperationException("Only native, collection, tuples or UDT data types are supported, " | ||
| + "unsupported data type: " + cqlType.toString()); |
There was a problem hiding this comment.
nit: for cassandra 5, vector is also supported, so the error message ("Only native, collection, tuples or UDT data types are supported") is misleading.
| for (Object o : (List<?>) value) | ||
| { | ||
| if (ttl != NO_TTL) | ||
| { | ||
| rowBuilder.addCell(BufferCell.expiring(cd, timestamp, ttl, now, type().serialize(o), | ||
| CellPath.create(TimeUUID.Generator.nextTimeUUID().toBytes()))); | ||
| } | ||
| else | ||
| { | ||
| rowBuilder.addCell(BufferCell.live(cd, timestamp, type().serialize(o), randomCellPath())); | ||
| } |
There was a problem hiding this comment.
I think vector is a single cell data type. But the for-loop adds multiple cells. Interesting that the tests are passing.
There was a problem hiding this comment.
Ah. This method is only used for creating test mutations for CDC.
|
|
||
| @ParameterizedTest | ||
| @MethodSource("org.apache.cassandra.bridge.VersionRunner#bridges") | ||
| public void testVector(CassandraBridge bridge) |
There was a problem hiding this comment.
I believe this test uses bridge to write sstables to disk and doesn't test complete bulk writer and bulk reader path. Can you add atleast one dtest using bulkWriterDataFrameWriter and bulkReaderDataFrame to test e2e writing and reading of vectors?
There was a problem hiding this comment.
Not sure do we need a vector converter in SqlToCqlTypeConverter.java similar to what we did for Tuples in this PR https://github.com/apache/cassandra-analytics/pull/174/changes. A dtest using bulk writer and reader will expose it if a converter is needed
There was a problem hiding this comment.
Very good catch! Still working on it.
33f4d30 to
2ca9e3f
Compare
Fixes CASSANALYTICS-26.