feat: Add array data type support#433
Conversation
| let (val, next) = reader.read_long(cursor); | ||
| let decimal = Decimal::from_unscaled_long(val, precision, scale) | ||
| .expect("Failed to create decimal from unscaled long"); | ||
| let decimal = |
There was a problem hiding this comment.
i think we should convert all expect to typed errors so malformed array payload handling does not leave panic backdoor
| // silently widen key semantics. | ||
| if matches!( | ||
| field_type, | ||
| DataType::Array(_) | DataType::Map(_) | DataType::Row(_) |
There was a problem hiding this comment.
this applies to all array/map/row, so i understand this PR only adds Array, but i think better to add all of them.
if there is objection, i can remove
|
@charlesdong1991 Thanks for the great pr. I'll find some time to review |
There was a problem hiding this comment.
Pull request overview
Adds end-to-end ARRAY column support to the Fluss Rust client, including a Java-compatible binary array representation and integration across row encoders/decoders, Arrow interop, and public APIs.
Changes:
- Introduces
FlussArray/FlussArrayWriter(BinaryArray layout) and wires it intoDatum,InternalRow, and runtime field/value writers. - Adds compacted row read/write support for arrays (length-prefixed BinaryArray bytes) and rejects
ARRAYas a key column type. - Extends Arrow conversion logic and updates user/docs + tests for primitive, nullable, empty, and nested arrays.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| website/docs/user-guide/rust/data-types.md | Documents ARRAY<T> type mapping and how to construct/read arrays. |
| website/docs/user-guide/rust/api-reference.md | Adds InternalRow::get_array and documents FlussArray API. |
| crates/fluss/src/row/mod.rs | Exposes binary_array module and re-exports FlussArray; adds InternalRow::get_array. |
| crates/fluss/src/row/field_getter.rs | Adds Array field getter + tests. |
| crates/fluss/src/row/encode/compacted_key_encoder.rs | Adds test ensuring array types are rejected in key encoding. |
| crates/fluss/src/row/datum.rs | Adds Datum::Array and Arrow ListBuilder append/conversion logic. |
| crates/fluss/src/row/compacted/compacted_row_writer.rs | Adds write_array (delegates to length-prefixed bytes). |
| crates/fluss/src/row/compacted/compacted_row_reader.rs | Makes deserialization fallible and adds array decoding via FlussArray::from_bytes. |
| crates/fluss/src/row/compacted/compacted_row.rs | Propagates fallible deserialization and adds InternalRow::get_array + array tests. |
| crates/fluss/src/row/compacted/compacted_key_writer.rs | Explicitly rejects complex key types and adds write_array to the delegated writer surface. |
| crates/fluss/src/row/column.rs | Implements ColumnarRow::get_array by converting Arrow ListArray → FlussArray; adds tests. |
| crates/fluss/src/row/binary_array.rs | New Java-compatible binary array implementation + writer + tests. |
| crates/fluss/src/row/binary/binary_writer.rs | Adds BinaryWriter::write_array and InnerValueWriter::Array. |
| crates/fluss/src/record/arrow.rs | Adds Arrow List builder support and from_arrow_type mapping for list element conversion. |
| bindings/cpp/src/types.rs | Propagates Datum::Array handling in type resolution and row ownership conversion. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
@charlesdong1991 LGTM overall, left minor comments
If you fixed smth in the meantime - good, sorry for the noise, as I started reviewing before the new code :)
|
Hi @fresh-borzoni thanks a lot! |
leekeiabstraction
left a comment
There was a problem hiding this comment.
Thank you for the PR! Left some comments. PTAL
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 TY, looked through again, left comments
PTAL
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 TY, Looked through, LGTM overall, one nit comment 👍
|
Thanks both! PTAL @fresh-borzoni @leekeiabstraction @luoyuxia |
|
Test case changes looks good to me Edit: for some reason I mistaken the diff between commits as overall diff. |
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 TY, LGTM
leekeiabstraction
left a comment
There was a problem hiding this comment.
Approved. TY for the PR!
|
@charlesdong1991 Hi, is it possible to add IT for covering the support for array? Or we can merge this first, but not expose it to user in document util we add it to make sure it work. |
|
|
||
| Element getters mirror `InternalRow` typed getters and return `Result<T>`. For example, use `get_int()`, `get_long()`, and `get_double()` for primitive elements, and `get_string()`, `get_binary()`, `get_decimal()`, `get_timestamp_ntz()`, `get_timestamp_ltz()`, and `get_array()` for variable-length or nested elements. | ||
|
|
||
| TODO: `FlussArray` currently exposes the fallible getter surface as the stable API. Infallible fast-path variants may be added later as non-breaking extensions if needed. |
There was a problem hiding this comment.
why left todo in user-face document?
There was a problem hiding this comment.
it was added to address a comment in Copilot: 66359e4 for the design decision we have now that fluss array only has fallible getters now, and we might add infallible variants in future as non-breaking change.
But i think indeed, it's confusing to leave TODO in user facing doc, and let me just remove this part completely, and move the note to codebase instead @luoyuxia
Purpose
Linked issue: close #386
Brief change log
Add end-to-end ARRAY column support in fluss-rust
Tests
Yes all tests are passed locally